teehr.Fetch.nwm_forecast_grids

Fetch.nwm_forecast_grids(nwm_configuration: str, output_type: str, variable_name: str, start_date: str | datetime, ingest_days: int, zonal_weights_filepath: Path | str, nwm_version: SupportedNWMOperationalVersionsEnum, data_source: SupportedNWMDataSourcesEnum | None = 'GCS', kerchunk_method: SupportedKerchunkMethod | None = 'local', prioritize_analysis_valid_time: bool | None = False, t_minus_hours: List[int] | None = None, ignore_missing_file: bool | None = True, overwrite_output: bool | None = False, location_id_prefix: str | None = None, timeseries_type: TimeseriesTypeEnum = 'primary')

Fetch NWM operational gridded data, calculate zonal statistics (currently only the mean is available) of the selected variable for the given zones, and load the results into the TEEHR dataset.

Data is fetched for all location IDs in the locations table, and all dates and times within the files and in the cache file names are in UTC.

Parameters:

nwm_configuration (str) – NWM forecast category. (e.g., “analysis_assim”, “short_range”, …).
output_type (str) – Output component of the nwm_configuration. (e.g., “channel_rt”, “reservoir”, …).
variable_name (str) – Name of the NWM data variable to download. (e.g., “streamflow”, “velocity”, …).
start_date (str or datetime) – Date to begin data ingest. Str formats can include YYYY-MM-DD or MM/DD/YYYY.
ingest_days (int) – Number of days to ingest data after start date.
zonal_weights_filepath (str or Path) – Path to the file containing the fraction of pixel overlap for each zone.
nwm_version (SupportedNWMOperationalVersionsEnum) – The NWM operational version, either “nwm22” or “nwm30”.
data_source (Optional[SupportedNWMDataSourcesEnum]) – Specifies the remote location from which to fetch the data: “GCS” (default), “NOMADS”, or “DSTOR”. Currently only “GCS” is implemented.
kerchunk_method (Optional[SupportedKerchunkMethod]) – When data_source = “GCS”, specifies how the Kerchunk reference JSON files are obtained. “local” (default): create new JSON files from the netCDF files in GCS and save them to a local directory, skipping any that already exist locally. “remote”: read the CIROH pre-generated JSON files from S3, ignoring any that are unavailable. “auto”: read the CIROH pre-generated JSON files from S3 and create any that are unavailable, storing them locally.
prioritize_analysis_valid_time (Optional[bool]) – A boolean flag that determines the method of fetching analysis data. When False (default), all hours of the reference time are included in the output. When True, only the hours within t_minus_hours are included.
t_minus_hours (Optional[Iterable[int]]) – Specifies the look-back hours to include if an assimilation nwm_configuration is specified.
ignore_missing_file (bool) – Flag specifying whether or not to fail if a missing NWM file is encountered. True = skip and continue; False = fail.
overwrite_output (bool) – Flag specifying whether or not to overwrite output files if they already exist. True = overwrite; False = fail.
location_id_prefix (Union[str, None]) – Optional location ID prefix to add (prepend) or replace.
timeseries_type (str) – Whether to consider as the “primary” or “secondary” timeseries. Default is “primary”.
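
As an illustration of how the pixel-overlap fractions described under zonal_weights_filepath relate to the zonal mean, the following is a minimal sketch in plain Python. The function name, the dictionary layout of the weights, and the handling of missing values are assumptions made for this example; they are not taken from the TEEHR source, and the actual weights file layout may differ.

```python
import math


def zonal_weighted_mean(grid, cell_weights):
    """Weighted mean of the grid cells overlapping one zone.

    Hypothetical helper for illustration only.
    grid: 2D list of cell values, indexed as grid[row][col].
    cell_weights: dict mapping (row, col) to the fraction of that
        pixel overlapping the zone.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (row, col), weight in cell_weights.items():
        value = grid[row][col]
        if math.isnan(value):
            # Assumption: cells with missing data are left out of the mean.
            continue
        weighted_sum += value * weight
        weight_total += weight
    if weight_total == 0.0:
        return math.nan
    return weighted_sum / weight_total


# A 2x2 grid and one zone covering three cells, two of them partially.
grid = [[1.0, 2.0], [3.0, 4.0]]
weights = {(0, 0): 1.0, (0, 1): 0.5, (1, 1): 0.5}
mean_value = zonal_weighted_mean(grid, weights)
```

Here the fully covered cell counts twice as much as each half-covered cell, so the result is (1.0 + 1.0 + 2.0) / 2.0.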

Notes

The NWM variables, including nwm_configuration, output_type, and variable_name, are stored as a pydantic model in grid_config_models.py.

The cached forecast and assimilation data is grouped and saved one file per reference time, using the file name convention “YYYYMMDDTHH”.
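
The “YYYYMMDDTHH” naming convention can be reproduced with a standard strftime pattern. This is a sketch only: the helper name is hypothetical, and treating naive datetimes as UTC is an assumption based on the statement that all file names are in UTC. The file extension is not stated here, so only the stem is built.

```python
from datetime import datetime, timezone


def reference_time_file_stem(reference_time):
    """Format a reference time as 'YYYYMMDDTHH' (hypothetical helper)."""
    if reference_time.tzinfo is None:
        # Assumption: naive datetimes are already expressed in UTC.
        reference_time = reference_time.replace(tzinfo=timezone.utc)
    return reference_time.astimezone(timezone.utc).strftime("%Y%m%dT%H")


stem = reference_time_file_stem(datetime(2020, 12, 18, 6))
```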

Additionally, the location_id values in the zonal weights file are used as location IDs in the output of this function, unless a prefix is specified. In that case the prefix is prepended to each location_id value that has no prefix, or it replaces the existing prefix. It is assumed that the location_id follows the pattern ‘[prefix]-[unique id]’.
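
The prepend-or-replace behavior for location_id_prefix can be sketched as follows. The function is a hypothetical illustration, not the TEEHR implementation; in particular, splitting on the first hyphen is an assumption that follows from the ‘[prefix]-[unique id]’ pattern.

```python
def apply_location_id_prefix(location_id, prefix=None):
    """Prepend or replace the prefix of a location ID (hypothetical helper)."""
    if prefix is None:
        # No prefix requested: keep the ID from the weights file as-is.
        return location_id
    if "-" in location_id:
        # Existing prefix: keep only the unique ID after the first hyphen.
        _, unique_id = location_id.split("-", 1)
    else:
        # No existing prefix: the whole value is the unique ID.
        unique_id = location_id
    return f"{prefix}-{unique_id}"


replaced = apply_location_id_prefix("wbd-03S", "ngen")
prepended = apply_location_id_prefix("12345", "ngen")
unchanged = apply_location_id_prefix("wbd-03S")
```

Note that under this assumption a unique ID that itself contains a hyphen but has no prefix would be misread as prefixed, which is why the ‘[prefix]-[unique id]’ pattern matters.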

All dates and times within the files and in the file names are in UTC.

Examples

Here we will calculate mean areal precipitation using NWM forcing data for some watersheds (polygons) using a pre-calculated weights file (see: generate_weights_file() for weights calculation).

>>> from datetime import datetime
>>> from pathlib import Path
>>> import teehr
>>> ev = teehr.Evaluation()

>>> ev.fetch.nwm_forecast_grids(
>>>     nwm_configuration="forcing_short_range",
>>>     output_type="forcing",
>>>     variable_name="RAINRATE",
>>>     start_date=datetime(2000, 1, 1),
>>>     ingest_days=1,
>>>     zonal_weights_filepath=Path(Path.home(), "nextgen_03S_weights.parquet"),
>>>     nwm_version="nwm22",
>>>     data_source="GCS",
>>>     kerchunk_method="auto"
>>> )

Note

NWM data can also be fetched outside of a TEEHR Evaluation by calling the method directly.

>>> from pathlib import Path
>>> from teehr.fetching.nwm.nwm_grids import nwm_grids_to_parquet

Perform the calculations, writing to the specified directory.

>>> nwm_grids_to_parquet(
>>>     nwm_configuration="forcing_short_range",
>>>     output_type="forcing",
>>>     variable_name="RAINRATE",
>>>     start_date="2020-12-18",
>>>     ingest_days=1,
>>>     zonal_weights_filepath=Path(Path.home(), "nextgen_03S_weights.parquet"),
>>>     json_dir=Path(Path.home(), "temp/parquet/jsons/"),
>>>     output_parquet_dir=Path(Path.home(), "temp/parquet"),
>>>     nwm_version="nwm22",
>>>     data_source="GCS",
>>>     kerchunk_method="auto",
>>>     t_minus_hours=[0, 1, 2],
>>>     ignore_missing_file=True,
>>>     overwrite_output=True
>>> )