Bootstrappers
- class Bootstrappers
Container class for bootstrap sampling classes.
Notes
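Bootstrapping is a resampling method used to estimate the uncertainty of metric results. The metric is recomputed on many resampled versions of the timeseries, and the spread of those bootstrap values describes the metric's sampling variability. The bootstrapping methods available in TEEHR are:
- Gumboot
- CircularBlock
- Stationary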
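To make the resampling idea concrete, here is a minimal pure-Python sketch of the plain (i.i.d.) bootstrap, which none of the three classes uses directly. Drawing individual timesteps independently destroys the autocorrelation present in hydrologic timeseries, which is why the methods above resample blocks instead. The function `iid_bootstrap` and the error values are hypothetical and are not part of TEEHR.

```python
import random
import statistics


def iid_bootstrap(values, metric, reps=1000, seed=None):
    """Evaluate `metric` on `reps` resamples of `values` drawn with replacement."""
    rng = random.Random(seed)
    n = len(values)
    results = []
    for _ in range(reps):
        sample = [values[rng.randrange(n)] for _ in range(n)]  # independent draws
        results.append(metric(sample))
    return results


# Hypothetical simulated-minus-observed errors for one location.
errors = [0.5, -0.2, 1.1, 0.3, -0.7, 0.9, 0.0, -0.4]
boot = iid_bootstrap(errors, statistics.mean, reps=500, seed=42)
cuts = statistics.quantiles(boot, n=20)  # 19 cut points: 5%, 10%, ..., 95%
print(f"90% interval for mean error: {cuts[0]:.3f} to {cuts[-1]:.3f}")
```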
Methods
- class CircularBlock(*, return_type: str | pyspark.sql.types.ArrayType | pyspark.sql.types.MapType = None, reps: int = 1000, seed: int | None = None, quantiles: typing.List[float] | None = None, name: str = 'CircularBlock', include_value_time: bool = False, func: typing.Callable = <function create_circularblock_func>, random_state: numpy.random.mtrand.RandomState | None = None, block_size: int = 365)
Circular block bootstrapping using the CircularBlockBootstrap class from the arch Python package.
- Parameters:
  - random_state (RandomState, optional) – The random state for the random number generator. Default value is None.
  - block_size (int) – The block length, in timesteps, for the CircularBlockBootstrap. Default value is 365.
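The circular block scheme can be sketched in a few lines of pure Python. This is an illustration of the technique, not the code of arch's `CircularBlockBootstrap` or of TEEHR. Each resample is built from fixed-length blocks that start at random positions and wrap from the end of the series back to the start, so every observation is equally likely to appear. The helper names are hypothetical. If the timeseries is daily, the default `block_size` of 365 makes each block span roughly one year.

```python
import random


def circular_block_indices(n, block_size, rng):
    """Return n indices built from fixed-length blocks that wrap around the series end."""
    indices = []
    while len(indices) < n:
        start = rng.randrange(n)  # any position can start a block
        indices.extend((start + k) % n for k in range(block_size))  # wrap past the end
    return indices[:n]  # trim the last block so the resample has length n


def circular_block_bootstrap(values, metric, block_size=365, reps=1000, seed=None):
    """Evaluate `metric` on `reps` circular-block resamples of `values`."""
    rng = random.Random(seed)
    n = len(values)
    return [
        metric([values[i] for i in circular_block_indices(n, block_size, rng)])
        for _ in range(reps)
    ]


series = list(range(10))  # hypothetical toy series
idx = circular_block_indices(len(series), block_size=4, rng=random.Random(0))
print(idx)  # consecutive runs of length 4 (the last one trimmed), wrapping 9 -> 0
```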
- class Gumboot(*, return_type: str | pyspark.sql.types.ArrayType | pyspark.sql.types.MapType = None, reps: int = 1000, seed: int | None = None, quantiles: typing.List[float] | None = None, name: str = 'Gumboot', include_value_time: bool = True, func: typing.Callable = <function create_gumboot_func>, boot_year_file: str | pathlib.Path | None = None, water_year_month: int = 10)
Gumboot bootstrapping.
This is a partial implementation of the Gumboot R package: a non-overlapping block bootstrap in which each block is one water year. Synthetic timeseries are constructed by randomly resampling whole water years from the input timeseries with replacement, and the specified performance metric is calculated on each synthetic timeseries for the number of bootstrap replications given by reps.
If quantiles is None (the default), the full array of bootstrap metric values is returned (dimensions: [reps, 1]). Otherwise, the specified quantiles of the bootstrap metric values are returned as a dictionary.
- See Also: Clark et al. (2021), “The abuse of popular performance metrics in hydrologic modeling”, Water Resources Research, <doi:10.1029/2020WR029001>
- Parameters:
  - boot_year_file (Union[str, Path, None]) – The file path to the boot year CSV file. Default value is None.
  - water_year_month (int) – The month (1–12) in which the water year starts. Default value is 10 (October).
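The sketch below shows one way a date can be mapped to a water year from a start month like `water_year_month`. It assumes the USGS convention, in which a water year is named for the calendar year in which it ends. TEEHR's own labeling convention is not stated here, so treat the hypothetical `water_year` helper as an illustration only.

```python
from datetime import date


def water_year(d, water_year_month=10):
    """Return the water-year label of date d.

    Assumed convention (USGS style): a water year is named for the calendar year
    in which it ends, so with water_year_month=10, 1 Oct 2020 to 30 Sep 2021 is
    water year 2021. With water_year_month=1 it equals the calendar year.
    """
    if water_year_month > 1 and d.month >= water_year_month:
        return d.year + 1
    return d.year


print(water_year(date(2020, 10, 1)))  # 2021
print(water_year(date(2021, 9, 30)))  # 2021
```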
- class Stationary(*, return_type: str | pyspark.sql.types.ArrayType | pyspark.sql.types.MapType = None, reps: int = 1000, seed: int | None = None, quantiles: typing.List[float] | None = None, name: str = 'Stationary', include_value_time: bool = False, func: typing.Callable = <function create_stationary_func>, random_state: numpy.random.mtrand.RandomState | None = None, block_size: int = 365)
Stationary bootstrapping using the StationaryBootstrap class from the arch Python package.
- Parameters:
  - random_state (RandomState, optional) – The random state for the random number generator. Default value is None.
  - block_size (int) – The average block length, in timesteps, for the StationaryBootstrap; individual block lengths are random. Default value is 365.
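Unlike the circular block method, the stationary bootstrap draws random block lengths whose mean is `block_size`. The pure-Python sketch below illustrates that idea. It is not arch's `StationaryBootstrap` implementation or TEEHR's `create_stationary_func`, and the helper names are hypothetical. At each step a new block starts with probability 1 / `block_size`, so block lengths follow a geometric distribution with mean `block_size`.

```python
import random


def stationary_indices(n, block_size, rng):
    """Indices for one stationary-bootstrap resample of a length-n series.

    At each step a new block starts at a uniformly random position with
    probability 1 / block_size; otherwise the current block continues to the
    next index, wrapping past the end. Block lengths are therefore geometric
    with mean block_size.
    """
    p_new = 1.0 / block_size
    idx = [rng.randrange(n)]
    while len(idx) < n:
        if rng.random() < p_new:
            idx.append(rng.randrange(n))  # start a new block
        else:
            idx.append((idx[-1] + 1) % n)  # extend the current block circularly
    return idx


def stationary_bootstrap(values, metric, block_size=365, reps=1000, seed=None):
    """Evaluate `metric` on `reps` stationary-bootstrap resamples of `values`."""
    rng = random.Random(seed)
    n = len(values)
    return [
        metric([values[i] for i in stationary_indices(n, block_size, rng)])
        for _ in range(reps)
    ]


series = [float(x) for x in range(30)]  # hypothetical toy series
print(stationary_bootstrap(series, max, block_size=5, reps=3, seed=0))
```

Randomizing the block length keeps the resampled series stationary, at the cost of blocks that occasionally split or run long.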