nwm_operational_grids

Fetch NWM operational gridded data, calculate zonal statistics (currently only the mean is available) of the selected variable for the given zones, and load the results into the TEEHR dataset.

Data is fetched for the location IDs in the locations table having a given location_id_prefix. All dates and times within the files and in the cache file names are in UTC.

The zonal weights file, which contains the fraction of each grid pixel that overlaps each zone, is required. It can be calculated and saved to the cache directory if it does not already exist.
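To illustrate how such weights are typically used, a zonal mean can be computed as the overlap-weighted average of the pixel values a zone touches, sum(v_i * w_i) / sum(w_i). The following is a minimal plain-Python sketch under stated assumptions: the `zonal_weighted_mean` function and the `(zone_id, row, col, fraction)` tuple layout are hypothetical and do not reflect the actual schema of TEEHR's weights file or its internal implementation.

```python
# Hypothetical sketch of an overlap-weighted zonal mean. The weights layout
# (zone_id, row, col, fraction) is an assumption, not TEEHR's file schema.
from collections import defaultdict


def zonal_weighted_mean(grid, weights):
    """Return {zone_id: weighted mean of the grid values in that zone}.

    grid    -- 2-D list of values indexed as grid[row][col]
    weights -- iterable of (zone_id, row, col, fraction) tuples, where
               fraction is the share of the pixel that overlaps the zone
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for zone_id, row, col, fraction in weights:
        # Accumulate value * overlap fraction and the fractions themselves.
        weighted_sum[zone_id] += grid[row][col] * fraction
        weight_total[zone_id] += fraction
    # Normalise by the total overlap so partially covered pixels count less.
    return {
        zone_id: weighted_sum[zone_id] / weight_total[zone_id]
        for zone_id in weighted_sum
        if weight_total[zone_id] > 0
    }
```

For example, a zone that fully covers a pixel with value 1.0 and half of a pixel with value 2.0 gets (1.0 * 1.0 + 2.0 * 0.5) / 1.5, roughly 1.33, rather than the unweighted 1.5.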

Parameters:

nwm_configuration (str) – NWM forecast category (e.g., “analysis_assim”, “short_range”, …).
output_type (str) – Output component of the nwm_configuration (e.g., “channel_rt”, “reservoir”, …).
variable_name (str) – Name of the NWM data variable to download (e.g., “streamflow”, “velocity”, …).
nwm_version (SupportedNWMOperationalVersionsEnum) – The NWM operational version: “nwm12”, “nwm20”, “nwm21”, “nwm22”, or “nwm30”. The NWM configuration did not change between versions 2.1 and 2.2, so they are treated as the same version; both values are accepted here for convenience.

Availability of each version:
- v1.2: 2018-09-17 - 2019-06-18
- v2.0: 2019-06-19 - 2021-04-19
- v2.1/2.2: 2021-04-20 - 2023-09-18
- v3.0: 2023-09-19 - present
start_date (Union[str, datetime, pd.Timestamp]) – Date and time to begin data ingest. Str formats can include YYYY-MM-DD HH:MM or MM/DD/YYYY HH:MM.
end_date (Optional[Union[str, datetime, pd.Timestamp]]) – Date and time to end data ingest. Str formats can include YYYY-MM-DD HH:MM or MM/DD/YYYY HH:MM. If not provided, ingest_days must be provided.
ingest_days (Optional[int]) – Number of days of data to ingest after start_date. This is deprecated in favor of end_date and will be removed in a future release. If both are provided, ingest_days takes precedence. If not provided, end_date must be specified.
calculate_zonal_weights (bool) – Flag specifying whether or not to calculate zonal weights. True = calculate; False = use existing file. Default is False.
location_id_prefix (Optional[str]) – Prefix to include when filtering the locations table for polygon primary_location_id. Default is None, in which case all locations are included.
data_source (Optional[SupportedNWMDataSourcesEnum]) – Specifies the remote location from which to fetch the data: “GCS” (default), “NOMADS”, or “DSTOR”. Currently only “GCS” is implemented.
kerchunk_method (Optional[SupportedKerchunkMethod]) – When data_source = “GCS”, specifies how the Kerchunk reference JSON files are obtained. “local” (default) creates new JSON files from the NetCDF files in GCS and saves them to a local directory, skipping any file that already exists locally. “remote” reads the CIROH pre-generated JSONs from S3, ignoring any that are unavailable. “auto” reads the CIROH pre-generated JSONs from S3 and creates any that are unavailable, storing them locally.
prioritize_analysis_value_time (Optional[bool]) – A boolean flag that determines the method of fetching analysis-assimilation data. When False (default), all non-overlapping value_time hours (prioritizing the most recent reference_time) are included in the output. When True, only the hours within t_minus_hours are included.
t_minus_hours (Optional[Iterable[int]]) – Specifies the look-back hours to include if an assimilation nwm_configuration is specified. Only utilized if assimilation data is requested and prioritize_analysis_value_time is True.
ignore_missing_file (bool) – Flag specifying whether or not to fail if a missing NWM file is encountered. True = skip and continue; False = fail.
overwrite_output (bool) – Flag specifying whether or not to overwrite output files if they already exist. True = overwrite; False = fail.
timeseries_type (str) – Whether to treat the data as the “primary” or “secondary” timeseries. Default is “secondary”, unless the configuration is an analysis containing assimilation, in which case the default is “primary”.
starting_z_hour (Optional[int]) – The starting z_hour to include in the output. If None, z_hours for the first day are determined by start_date. Default is None. Must be between 0 and 23.
ending_z_hour (Optional[int]) – The ending z_hour to include in the output. If None, z_hours for the last day are determined by end_date if provided, otherwise all z_hours are included in the final day. Default is None. Must be between 0 and 23.
write_mode (TableWriteEnum, optional (default: “append”)) – The write mode for the table. Options are “append” or “upsert”. If “append”, new data that does not already exist in the Evaluation table is appended. If “upsert”, existing data is replaced and new data that does not already exist is appended.
zonal_weights_filepath (Optional[Union[Path, str]]) – The path to the zonal weights file. If None and calculate_zonal_weights is False, the weights file must exist in the cache for the configuration. Default is None.
drop_duplicates (bool) – Whether to drop duplicates in the data. Default is True.
drop_overlapping_assimilation_values (Optional[bool] = True) – Whether to drop assimilation values that overlap in value_time. Default is True. If True, values that overlap in value_time are dropped, keeping those with the most recent reference_time. In this case, all reference_time values are set to None. If False, overlapping values are kept and reference_time is retained.
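Three of the rules in the parameter list above can be sketched in plain Python: picking an NWM version from the availability windows listed under nwm_version, resolving the ingest end when ingest_days and end_date are both given, and dropping overlapping assimilation values while keeping the most recent reference_time. The helper names (`nwm_version_for`, `resolve_end_date`, `drop_overlapping_values`) and the dict-per-row layout are hypothetical; this is an illustration, not TEEHR's implementation, and treating overlaps per location_id is an assumption.

```python
# Hypothetical helpers sketching rules from the parameter list.
# They are illustrations only, not part of the TEEHR API.
from datetime import date, datetime, timedelta

# Availability windows copied from the nwm_version description.
# An end of None means the version is still current ("present").
NWM_VERSION_WINDOWS = [
    ("nwm12", date(2018, 9, 17), date(2019, 6, 18)),
    ("nwm20", date(2019, 6, 19), date(2021, 4, 19)),
    ("nwm22", date(2021, 4, 20), date(2023, 9, 18)),  # "nwm21" is equivalent
    ("nwm30", date(2023, 9, 19), None),
]


def nwm_version_for(day):
    """Return the version whose availability window contains `day`."""
    if isinstance(day, datetime):
        day = day.date()  # datetime and date cannot be compared directly
    for version, start, end in NWM_VERSION_WINDOWS:
        if day >= start and (end is None or day <= end):
            return version
    raise ValueError(f"No operational NWM version covers {day}")


def resolve_end_date(start_date, end_date=None, ingest_days=None):
    """Return the ingest end; ingest_days wins when both are given."""
    if ingest_days is not None:
        return start_date + timedelta(days=ingest_days)
    if end_date is None:
        raise ValueError("Provide either end_date or ingest_days")
    return end_date


def drop_overlapping_values(rows):
    """Keep one row per (location_id, value_time), preferring the most
    recent reference_time, then set reference_time to None.

    rows: iterable of dicts with "location_id", "value_time",
    "reference_time" and "value" keys (an assumed layout).
    """
    latest = {}
    for row in rows:
        key = (row["location_id"], row["value_time"])
        kept = latest.get(key)
        if kept is None or row["reference_time"] > kept["reference_time"]:
            latest[key] = row
    ordered = sorted(latest.values(), key=lambda r: (r["location_id"], r["value_time"]))
    return [{**row, "reference_time": None} for row in ordered]


if __name__ == "__main__":
    print(nwm_version_for(datetime(2020, 12, 18)))  # nwm20
    print(resolve_end_date(datetime(2022, 1, 1), ingest_days=1))  # 2022-01-02 00:00:00
```

Because the windows are contiguous and non-overlapping, a linear scan is enough here; a date before 2018-09-17 raises an error instead of silently picking a version.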

Note

The cache is cleared before each call to the fetch method. If a long-running fetch is interrupted before the data is automatically loaded into the Evaluation, load the cached data manually or save a copy of it elsewhere; otherwise it will be deleted when the fetch job is resumed.

Notes

The NWM variables, including nwm_configuration, output_type, and variable_name, are stored as a pydantic model in grid_config_models.py.

The cached forecast and assimilation data are grouped and saved as one file per reference time, using the file name convention “YYYYMMDDTHH”.
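The “YYYYMMDDTHH” convention maps directly onto standard strftime/strptime format codes. This is a minimal sketch; the helper names `cache_file_stem` and `parse_cache_file_stem` are hypothetical and not part of TEEHR, and naive datetimes are assumed to already be in UTC, as stated above.

```python
# Hypothetical helpers for the "YYYYMMDDTHH" cache file-name convention.
from datetime import datetime, timezone


def cache_file_stem(reference_time):
    """Format a reference time as YYYYMMDDTHH in UTC.

    Naive datetimes are assumed to already be in UTC; aware datetimes
    are converted to UTC first.
    """
    if reference_time.tzinfo is not None:
        reference_time = reference_time.astimezone(timezone.utc)
    return reference_time.strftime("%Y%m%dT%H")


def parse_cache_file_stem(stem):
    """Recover the (naive, UTC) reference time from a YYYYMMDDTHH stem."""
    return datetime.strptime(stem, "%Y%m%dT%H")


if __name__ == "__main__":
    print(cache_file_stem(datetime(2020, 12, 18, 6)))  # 20201218T06
```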

All dates and times within the files and in the file names are in UTC.

Examples

Here we will calculate mean areal precipitation using operational NWM forcing data for the polygons in the locations table. Pixel weights (fraction of pixel overlap) are calculated for each polygon and stored in the evaluation cache directory.

(See generate_weights_file() for details of the weights calculation.)

>>> from datetime import datetime
>>> from pathlib import Path
>>> import teehr
>>> ev = teehr.Evaluation()

>>> ev.fetch.nwm_operational_grids(
...     nwm_configuration="forcing_short_range",
...     output_type="forcing",
...     variable_name="RAINRATE",
...     start_date=datetime(2022, 1, 1),
...     end_date=datetime(2022, 1, 2),
...     zonal_weights_filepath=Path(Path.home(), "nextgen_03S_weights.parquet"),
...     nwm_version="nwm22",
...     data_source="GCS",
...     kerchunk_method="auto"
... )

Note

NWM data can also be fetched outside of a TEEHR Evaluation by calling the underlying function directly.

>>> from pathlib import Path
>>> from teehr.fetching.nwm.nwm_grids import nwm_grids_to_parquet

Perform the calculations, writing to the specified directory.

>>> nwm_grids_to_parquet(
...     nwm_configuration="forcing_short_range",
...     output_type="forcing",
...     variable_name="RAINRATE",
...     start_date="2020-12-18",
...     end_date="2020-12-19",
...     zonal_weights_filepath=Path(Path.home(), "nextgen_03S_weights.parquet"),
...     json_dir=Path(Path.home(), "temp/parquet/jsons/"),
...     output_parquet_dir=Path(Path.home(), "temp/parquet"),
...     nwm_version="nwm20",
...     data_source="GCS",
...     kerchunk_method="auto",
...     t_minus_hours=[0, 1, 2],
...     ignore_missing_file=True,
...     overwrite_output=True
... )
