teehr.evaluation.fetch.Fetch.nwm_operational_grids
- Fetch.nwm_operational_grids(nwm_configuration: str, output_type: str, variable_name: str, start_date: str | datetime, ingest_days: int, nwm_version: SupportedNWMOperationalVersionsEnum, calculate_zonal_weights: bool = True, location_id_prefix: str | None = None, data_source: SupportedNWMDataSourcesEnum | None = 'GCS', kerchunk_method: SupportedKerchunkMethod | None = 'local', prioritize_analysis_valid_time: bool | None = False, t_minus_hours: List[int] | None = None, ignore_missing_file: bool | None = True, overwrite_output: bool | None = False, timeseries_type: TimeseriesTypeEnum = 'primary', starting_z_hour: int | None = None, ending_z_hour: int | None = None, write_mode: TableWriteEnum = 'append', zonal_weights_filepath: Path | str | None = None)
Fetch NWM operational gridded data, calculate zonal statistics (currently only the mean is available) of the selected variable for the given zones, and load the results into the TEEHR dataset.
Data is fetched for the location IDs in the locations table having a given location_id_prefix. All dates and times within the files and in the cache file names are in UTC.
The zonal weights file, which contains the fraction of each grid pixel that overlaps each zone, is required. It can be calculated and saved to the cache directory if it does not already exist.
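To make the role of the weights concrete, here is a minimal sketch of a weighted zonal mean for a single zone. It is not TEEHR's implementation: the function name `zonal_mean`, the `(row, col, fraction)` weight layout, and the choices to skip NaN pixels and normalize by the sum of the remaining weights are all assumptions made for illustration.

```python
import math


def zonal_mean(grid, weights):
    """Weighted mean of grid values for one zone (illustrative only).

    grid    : 2D list of pixel values, indexed as grid[row][col].
    weights : list of (row, col, fraction) tuples, where fraction is the
              overlap weight of that pixel for the zone (assumed layout).
    """
    total = 0.0
    weight_sum = 0.0
    for row, col, fraction in weights:
        value = grid[row][col]
        if math.isnan(value):
            # Assumption: missing pixels are dropped from the average.
            continue
        total += value * fraction
        weight_sum += fraction
    if weight_sum == 0.0:
        return math.nan
    return total / weight_sum


grid = [[1.0, 2.0], [3.0, 4.0]]
weights = [(0, 0, 1.0), (0, 1, 0.5), (1, 0, 0.5)]
result = zonal_mean(grid, weights)  # (1.0 + 1.0 + 1.5) / 2.0 = 1.75
```

Pixels that only partly overlap the zone contribute in proportion to their weight, which is why the weights file must exist before any zonal statistic can be computed.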
- Parameters:
nwm_configuration (str) – NWM forecast category (e.g., “analysis_assim”, “short_range”, …).
output_type (str) – Output component of the nwm_configuration (e.g., “channel_rt”, “reservoir”, …).
variable_name (str) – Name of the NWM data variable to download (e.g., “streamflow”, “velocity”, …).
start_date (str or datetime) – Date to begin data ingest. Str formats can include YYYY-MM-DD or MM/DD/YYYY.
ingest_days (int) – Number of days to ingest data after start date.
nwm_version (SupportedNWMOperationalVersionsEnum) – The NWM operational version: “nwm12”, “nwm20”, “nwm21”, “nwm22”, or “nwm30”. Note that there is no change in NWM configuration between version 2.1 and 2.2, and they are treated as the same version. They are both allowed here for convenience. Availability of each version:
    v1.2: 2018-09-17 - 2019-06-18
    v2.0: 2019-06-19 - 2021-04-19
    v2.1/2.2: 2021-04-20 - 2023-09-18
    v3.0: 2023-09-19 - present
calculate_zonal_weights (bool) – Flag specifying whether or not to calculate zonal weights. True = calculate; False = use existing file. Default is True.
location_id_prefix (Optional[str]) – Prefix to include when filtering the locations table for polygon primary_location_id. Default is None, in which case all locations are included.
data_source (Optional[SupportedNWMDataSourcesEnum]) – Specifies the remote location from which to fetch the data: “GCS” (default), “NOMADS”, or “DSTOR”. Currently only “GCS” is implemented.
kerchunk_method (Optional[SupportedKerchunkMethod]) – When data_source = “GCS”, specifies the preference in creating Kerchunk reference json files.
    “local” (default) – create new json files from netcdf files in GCS and save them to a local directory. Creation is skipped for files that already exist locally.
    “remote” – read the CIROH pre-generated jsons from s3, ignoring any that are unavailable.
    “auto” – read the CIROH pre-generated jsons from s3, and create any that are unavailable, storing them locally.
prioritize_analysis_valid_time (Optional[bool]) – A boolean flag that determines the method of fetching analysis data. When False (default), all hours of the reference time are included in the output. When True, only the hours within t_minus_hours are included.
t_minus_hours (Optional[Iterable[int]]) – Specifies the look-back hours to include if an assimilation nwm_configuration is specified.
ignore_missing_file (bool) – Flag specifying whether or not to fail if a missing NWM file is encountered. True = skip and continue; False = fail.
overwrite_output (bool) – Flag specifying whether or not to overwrite output files if they already exist. True = overwrite; False = fail.
timeseries_type (str) – Whether to consider as the “primary” or “secondary” timeseries. Default is “primary”.
starting_z_hour (Optional[int]) – The starting z_hour to include in the output. If None, all z_hours are included for the first day. Default is None. Must be between 0 and 23.
ending_z_hour (Optional[int]) – The ending z_hour to include in the output. If None, all z_hours are included for the last day. Default is None. Must be between 0 and 23.
write_mode (TableWriteEnum, optional (default: "append")) – The write mode for the table. Options are “append” or “upsert”. If “append”, the Evaluation table will be appended with new data that does not already exist. If “upsert”, existing data will be replaced and new data that does not exist will be appended.
zonal_weights_filepath (Optional[Union[Path, str]]) – The path to the zonal weights file. If None and calculate_zonal_weights is False, the weights file must exist in the cache for the configuration. Default is None.
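Because each nwm_version covers a fixed date window, it can help to check a start date against the availability ranges before fetching. The sketch below hard-codes the ranges listed for nwm_version above; the table `NWM_VERSION_RANGES` and the helper `nwm_version_for` are hypothetical names, not part of TEEHR, and returning “nwm22” for the shared v2.1/2.2 window is an arbitrary choice.

```python
from datetime import date

# Availability windows copied from the nwm_version description above.
# An end date of None means "present".
NWM_VERSION_RANGES = [
    ("nwm12", date(2018, 9, 17), date(2019, 6, 18)),
    ("nwm20", date(2019, 6, 19), date(2021, 4, 19)),
    ("nwm22", date(2021, 4, 20), date(2023, 9, 18)),  # v2.1 and v2.2 share this window
    ("nwm30", date(2023, 9, 19), None),
]


def nwm_version_for(day):
    """Return the version string whose window contains `day`, or None."""
    for version, first, last in NWM_VERSION_RANGES:
        if day >= first and (last is None or day <= last):
            return version
    return None


version = nwm_version_for(date(2022, 6, 1))  # falls in the v2.1/2.2 window
```

A return value of None signals that the date predates the earliest listed operational version.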
Note
Data in the cache is cleared before each call to the fetch method. So if a long-running fetch is interrupted before the data is automatically loaded into the Evaluation, it should be loaded or cached manually. This will prevent it from being deleted when the fetch job is resumed.
Notes
The NWM variables, including nwm_configuration, output_type, and variable_name, are stored as a pydantic model in grid_config_models.py.
The cached forecast and assimilation data is grouped and saved one file per reference time, using the file name convention “YYYYMMDDTHH”.
All dates and times within the files and in the file names are in UTC.
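The “YYYYMMDDTHH” naming convention can be reproduced with the standard library, which is handy when looking for a particular reference time in the cache. This is a sketch under stated assumptions: the helper names are hypothetical, naive datetimes are assumed to already be in UTC, only the file stem is produced (the extension is not specified here), and `step_hours` is an illustrative parameter because the cycle frequency depends on the NWM configuration.

```python
from datetime import datetime, timedelta, timezone


def cache_file_stem(reference_time):
    """Format a reference time as YYYYMMDDTHH in UTC."""
    if reference_time.tzinfo is not None:
        # Convert aware datetimes to UTC; naive ones are assumed to be UTC.
        reference_time = reference_time.astimezone(timezone.utc)
    return reference_time.strftime("%Y%m%dT%H")


def reference_time_stems(start, ingest_days, step_hours=1):
    """List the stems for every reference time in the ingest window."""
    count = ingest_days * 24 // step_hours
    return [
        cache_file_stem(start + timedelta(hours=i * step_hours))
        for i in range(count)
    ]


stems = reference_time_stems(datetime(2020, 12, 18), ingest_days=1, step_hours=6)
# stems == ["20201218T00", "20201218T06", "20201218T12", "20201218T18"]
```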
Examples
Here we will calculate mean areal precipitation using operational NWM forcing data for the polygons in the locations table. Pixel weights (fraction of pixel overlap) are calculated for each polygon and stored in the evaluation cache directory.
(see generate_weights_file() for the weights calculation).
>>> from datetime import datetime
>>> from pathlib import Path
>>> import teehr
>>> ev = teehr.Evaluation()
>>> ev.fetch.nwm_operational_grids(
...     nwm_configuration="forcing_short_range",
...     output_type="forcing",
...     variable_name="RAINRATE",
...     start_date=datetime(2000, 1, 1),
...     ingest_days=1,
...     zonal_weights_filepath=Path(Path.home(), "nextgen_03S_weights.parquet"),
...     nwm_version="nwm22",
...     data_source="GCS",
...     kerchunk_method="auto"
... )
Note
NWM data can also be fetched outside of a TEEHR Evaluation by calling the method directly.
>>> from pathlib import Path
>>> from teehr.fetching.nwm.nwm_grids import nwm_grids_to_parquet
Perform the calculations, writing to the specified directory.
>>> nwm_grids_to_parquet(
...     nwm_configuration="forcing_short_range",
...     output_type="forcing",
...     variable_name="RAINRATE",
...     start_date="2020-12-18",
...     ingest_days=1,
...     zonal_weights_filepath=Path(Path.home(), "nextgen_03S_weights.parquet"),
...     json_dir=Path(Path.home(), "temp/parquet/jsons/"),
...     output_parquet_dir=Path(Path.home(), "temp/parquet"),
...     nwm_version="nwm21",
...     data_source="GCS",
...     kerchunk_method="auto",
...     t_minus_hours=[0, 1, 2],
...     ignore_missing_file=True,
...     overwrite_output=True
... )
See also
teehr.fetching.nwm.nwm_grids.nwm_grids_to_parquet()