teehr.Fetch.nwm_forecast_points#

Fetch.nwm_forecast_points(nwm_configuration: str, output_type: str, variable_name: str, start_date: str | datetime, ingest_days: int, nwm_version: SupportedNWMOperationalVersionsEnum, data_source: SupportedNWMDataSourcesEnum | None = 'GCS', kerchunk_method: SupportedKerchunkMethod | None = 'local', prioritize_analysis_valid_time: bool | None = False, t_minus_hours: List[int] | None = None, process_by_z_hour: bool | None = True, stepsize: int | None = 100, ignore_missing_file: bool | None = True, overwrite_output: bool | None = False, timeseries_type: TimeseriesTypeEnum = 'secondary')[source]#

Fetch operational NWM point data and load into the TEEHR dataset.

Data is fetched for all secondary location IDs in the locations crosswalk table. All dates and times within the files and in the cache file names are in UTC.

Parameters:
  • nwm_configuration (str) – NWM forecast category. (e.g., “analysis_assim”, “short_range”, …).

  • output_type (str) – Output component of the nwm_configuration. (e.g., “channel_rt”, “reservoir”, …).

  • variable_name (str) – Name of the NWM data variable to download. (e.g., “streamflow”, “velocity”, …).

  • start_date (str or datetime) – Date to begin data ingest. String formats include “YYYY-MM-DD” and “MM/DD/YYYY”.

  • ingest_days (int) – Number of days to ingest data after start date.

  • nwm_version (SupportedNWMOperationalVersionsEnum) – The NWM operational version: “nwm22” or “nwm30”.

  • data_source (Optional[SupportedNWMDataSourcesEnum]) – Specifies the remote location from which to fetch the data: “GCS” (default), “NOMADS”, or “DSTOR”. Currently, only “GCS” is implemented.

  • kerchunk_method (Optional[SupportedKerchunkMethod]) – When data_source = “GCS”, specifies the preference for creating Kerchunk reference JSON files. “local” (default) creates new JSON files from the NetCDF files in GCS and saves them to a local directory; if the files already exist locally, creation is skipped. “remote” reads the CIROH pre-generated JSONs from S3, ignoring any that are unavailable. “auto” reads the CIROH pre-generated JSONs from S3 and creates any that are unavailable, storing them locally.

  • prioritize_analysis_valid_time (Optional[bool]) – A boolean flag that determines the method of fetching analysis data. When False (default), all hours of the reference time are included in the output. When True, only the hours within t_minus_hours are included.

  • t_minus_hours (Optional[List[int]]) – Specifies the look-back hours to include if an assimilation nwm_configuration is specified.

  • process_by_z_hour (Optional[bool]) – A boolean flag that determines the method of grouping files for processing. The default is True, which groups by day and z_hour. False groups files sequentially into chunks, whose size is determined by stepsize. This allows users to potentially process data more efficiently, but runs the risk of splitting forecasts into separate output files.

  • stepsize (Optional[int]) – The number of json files to process at one time. Used if process_by_z_hour is set to False. Default value is 100. Larger values can result in greater efficiency but require more memory.

  • ignore_missing_file (Optional[bool]) – Flag specifying whether or not to fail if a missing NWM file is encountered. True = skip and continue; False = fail.

  • overwrite_output (Optional[bool]) – Flag specifying whether or not to overwrite output files if they already exist. True = overwrite; False = fail.

  • timeseries_type (TimeseriesTypeEnum) – Whether to treat the fetched data as the “primary” or “secondary” timeseries. Default is “secondary”.
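To make the process_by_z_hour=False behavior concrete, here is an illustrative sketch (not TEEHR's actual implementation) of sequential chunking by stepsize. The file names and chunk_files helper are hypothetical; the point is that chunks are cut purely by count, so files belonging to one forecast reference time can land in different chunks.

```python
def chunk_files(json_files, stepsize=100):
    """Group files sequentially into chunks of at most `stepsize` files."""
    return [json_files[i:i + stepsize] for i in range(0, len(json_files), stepsize)]

# Hypothetical list of Kerchunk reference JSON file names.
files = [f"file_{i:03d}.json" for i in range(250)]
chunks = chunk_files(files, stepsize=100)
print([len(c) for c in chunks])  # → [100, 100, 50]
```

A larger stepsize means fewer, larger processing groups (more memory, potentially more efficient), as described above.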

Notes

The NWM variables, including nwm_configuration, output_type, and variable_name, are stored as pydantic models in point_config_models.py.

The cached forecast and assimilation data is grouped and saved as one file per reference time, using the file name convention “YYYYMMDDTHH”.
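As a quick illustration of that naming convention (a sketch, using an assumed example reference time, not a TEEHR API call), the cache file name can be derived from a UTC reference time with strftime:

```python
from datetime import datetime

# Example reference time in UTC (hypothetical value).
reference_time = datetime(2023, 3, 18, 12)

# "YYYYMMDDTHH" cache file name convention described above.
cache_filename = reference_time.strftime("%Y%m%dT%H")
print(cache_filename)  # → 20230318T12
```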

Examples

Here we fetch operational streamflow forecasts for NWM v2.2 from GCS, and load them into the TEEHR dataset.

>>> from datetime import datetime
>>> import teehr
>>> ev = teehr.Evaluation()
>>> ev.fetch.nwm_forecast_points(
...     nwm_configuration="short_range",
...     output_type="channel_rt",
...     variable_name="streamflow",
...     start_date=datetime(2000, 1, 1),
...     ingest_days=1,
...     nwm_version="nwm22",
...     data_source="GCS",
...     kerchunk_method="auto"
... )

Note

NWM data can also be fetched outside of a TEEHR Evaluation by calling the method directly.

>>> from teehr.fetching.nwm.nwm_points import nwm_to_parquet

Fetch and format the data, writing to the specified directory.

>>> from pathlib import Path
>>> nwm_to_parquet(
...     nwm_configuration="short_range",
...     output_type="channel_rt",
...     variable_name="streamflow",
...     start_date="2023-03-18",
...     ingest_days=1,
...     location_ids=LOCATION_IDS,
...     json_dir=Path(Path.home(), "temp/parquet/jsons/"),
...     output_parquet_dir=Path(Path.home(), "temp/parquet"),
...     nwm_version="nwm22",
...     data_source="GCS",
...     kerchunk_method="auto",
...     prioritize_analysis_valid_time=True,
...     t_minus_hours=[0, 1, 2],
...     process_by_z_hour=True,
...     stepsize=STEPSIZE,
...     ignore_missing_file=True,
...     overwrite_output=True,
... )

See also

teehr.fetching.nwm.nwm_points.nwm_to_parquet()