Fetching USGS and NWM Streamflow Data

Overview

In this guide we’ll demonstrate fetching USGS streamflow observations from the National Water Information System (NWIS) and National Water Model (NWM) streamflow forecasts from Google Cloud Storage (GCS). This example uses a pre-generated Evaluation dataset stored in TEEHR’s example data module. It contains a single USGS gage location and the corresponding NWM location ID.

Note: For demonstration purposes several cells below are shown in markdown form. If you want to download this notebook and run them yourself, you will need to convert them to code cells.

For a refresher on loading location and location crosswalk data into a new Evaluation, refer back to the Loading Data section of the user guide.

Set up the example Evaluation

from datetime import datetime  # used for start_date/end_date in the fetch calls below
from pathlib import Path
import shutil

import teehr
from teehr.example_data.setup_nwm_streamflow_example import setup_nwm_example

# Tell Bokeh to output plots in the notebook
from bokeh.io import output_notebook
output_notebook()

# Define the directory where the Evaluation will be created.
test_eval_dir = Path.home() / "temp" / "10_fetch_nwm_data"
shutil.rmtree(test_eval_dir, ignore_errors=True)

# Setup the example evaluation using data from the TEEHR repository.
setup_nwm_example(tmpdir=test_eval_dir)

# Initialize the evaluation.
ev = teehr.Evaluation(dir_path=test_eval_dir)

This example Evaluation only contains a single location, the USGS gage on the New River at Radford, VA.

locations_gdf = ev.locations.to_geopandas()
locations_gdf

	id	name	geometry	created_at	updated_at	properties
0	usgs-03171000	NEW RIVER AT RADFORD, VA	POINT (-80.56922 37.14179)	2026-06-17 21:01:17.438454	2026-06-17 21:01:17.438454	None

The ID of the National Water Model reach corresponding to the USGS gage is in the location crosswalks table.

location_crosswalks_df = ev.location_crosswalks.to_pandas()
location_crosswalks_df

	primary_location_id	secondary_location_id	created_at	updated_at	properties
0	usgs-03171000	nwm30-6884666	2026-06-17 21:01:18.456133	2026-06-17 21:01:18.456133	None

Location ID Prefixes

Note that the fetching methods automatically add location ID prefixes to the USGS and NWM location IDs. The prefixed IDs must already exist in the location and location crosswalk tables. See the Loading Local Data user guide for more information on specifying the location ID prefix when loading data.
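As a rough illustration of what a prefixed ID looks like, the sketch below builds and splits IDs in the `prefix-id` form seen in the tables above. The helpers `add_prefix` and `split_prefix` are hypothetical and are not part of TEEHR; the `-` separator is inferred from the example IDs.

```python
def add_prefix(raw_id: str, prefix: str, sep: str = "-") -> str:
    """Return raw_id with the prefix added, leaving already-prefixed IDs unchanged."""
    tag = f"{prefix}{sep}"
    return raw_id if raw_id.startswith(tag) else f"{tag}{raw_id}"


def split_prefix(location_id: str, sep: str = "-") -> tuple[str, str]:
    """Split a prefixed ID into (prefix, raw_id) at the first separator."""
    prefix, _, raw_id = location_id.partition(sep)
    return prefix, raw_id


# Hypothetical helpers applied to the IDs used in this example Evaluation.
usgs_id = add_prefix("03171000", "usgs")
nwm_id = add_prefix("6884666", "nwm30")
print(usgs_id, nwm_id)
print(split_prefix(nwm_id))
```

If the printed IDs do not match the values in the locations and location crosswalks tables, the fetched data would have nothing to attach to.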

Fetching USGS streamgage data

Since the example Evaluation already contains the location and location crosswalk IDs for USGS and NWM locations, we can make use of TEEHR’s built-in tools to fetch USGS and NWM streamflow data.

First we’ll fetch USGS data from the National Water Information System (NWIS). TEEHR uses the USGS dataretrieval Python package under the hood.

Note that the USGS and NWM streamflow timeseries data have been pre-loaded into this example Evaluation. However, you can still download this notebook and execute the methods yourself.

ev.fetch.usgs_streamflow() is a method that fetches USGS data for the primary locations in the Evaluation. It requires the start and end times of the data to fetch and has several optional arguments. For more details, see the API documentation for the method.

We’ll fetch streamflow data at this gage during the 2024 Hurricane Helene event.

# Convert this to a code cell to run locally
ev.fetch.usgs_streamflow(
    start_date=datetime(2024, 9, 26),
    end_date=datetime(2024, 10, 1)
)

TEEHR automatically loads the data into the Evaluation. USGS data is loaded as primary timeseries by default.

df = ev.primary_timeseries.to_pandas()
df

	reference_time	value_time	configuration_name	unit_name	variable_name	value	location_id	created_at	updated_at
0	NaT	2024-09-26 01:00:00	usgs_observations	m^3/s	streamflow_hourly_inst	274.956573	usgs-03171000	2026-06-17 21:01:20.599288	2026-06-17 21:01:20.599288
1	NaT	2024-09-26 02:00:00	usgs_observations	m^3/s	streamflow_hourly_inst	270.425873	usgs-03171000	2026-06-17 21:01:20.599288	2026-06-17 21:01:20.599288
2	NaT	2024-09-26 04:00:00	usgs_observations	m^3/s	streamflow_hourly_inst	286.000153	usgs-03171000	2026-06-17 21:01:20.599288	2026-06-17 21:01:20.599288
3	NaT	2024-09-27 00:00:00	usgs_observations	m^3/s	streamflow_hourly_inst	481.386383	usgs-03171000	2026-06-17 21:01:20.599288	2026-06-17 21:01:20.599288
4	NaT	2024-09-27 13:00:00	usgs_observations	m^3/s	streamflow_hourly_inst	1639.545410	usgs-03171000	2026-06-17 21:01:20.599288	2026-06-17 21:01:20.599288
...	...	...	...	...	...	...	...	...	...
116	NaT	2024-09-30 07:00:00	usgs_observations	m^3/s	streamflow_hourly_inst	436.079437	usgs-03171000	2026-06-17 21:01:20.599288	2026-06-17 21:01:20.599288
117	NaT	2024-09-30 09:00:00	usgs_observations	m^3/s	streamflow_hourly_inst	433.247742	usgs-03171000	2026-06-17 21:01:20.599288	2026-06-17 21:01:20.599288
118	NaT	2024-09-30 13:00:00	usgs_observations	m^3/s	streamflow_hourly_inst	336.970490	usgs-03171000	2026-06-17 21:01:20.599288	2026-06-17 21:01:20.599288
119	NaT	2024-09-30 16:00:00	usgs_observations	m^3/s	streamflow_hourly_inst	334.138794	usgs-03171000	2026-06-17 21:01:20.599288	2026-06-17 21:01:20.599288
120	NaT	2024-09-30 17:00:00	usgs_observations	m^3/s	streamflow_hourly_inst	336.970490	usgs-03171000	2026-06-17 21:01:20.599288	2026-06-17 21:01:20.599288

121 rows × 9 columns

Appending, Upserting, and Overwriting data

The fetching methods accept a write_mode argument that controls how fetched data is combined with an existing table. With "append" (the default), new data is added to the table. With "upsert", existing data is replaced with new values and any additional data is added. With "overwrite", any existing partitions receiving new data are overwritten.

For example:

ev.fetch.usgs_streamflow(
    start_date=datetime(2024, 10, 1),
    end_date=datetime(2024, 10, 5),
    write_mode="append"
)
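The difference between appending and upserting can be sketched with plain Python dictionaries keyed by location and value time. This is a conceptual illustration only, not TEEHR's implementation: the functions `append_rows` and `upsert_rows` are hypothetical, the values are made up, and the sketch assumes that append leaves rows with an existing key untouched. Partition-level overwrite is not modelled here.

```python
from datetime import datetime

# Existing table: (location_id, value_time) -> value (illustrative values).
existing = {
    ("usgs-03171000", datetime(2024, 9, 26, 1)): 1.0,
    ("usgs-03171000", datetime(2024, 9, 26, 2)): 2.0,
}
# Newly fetched rows: one key overlaps the existing table, one is new.
incoming = {
    ("usgs-03171000", datetime(2024, 9, 26, 2)): 20.0,
    ("usgs-03171000", datetime(2024, 9, 26, 3)): 3.0,
}


def append_rows(table: dict, new_rows: dict) -> dict:
    """Add rows with new keys; keep existing values (assumed append behaviour)."""
    out = dict(table)
    for key, value in new_rows.items():
        out.setdefault(key, value)
    return out


def upsert_rows(table: dict, new_rows: dict) -> dict:
    """Replace values for matching keys and add rows with new keys."""
    out = dict(table)
    out.update(new_rows)
    return out


appended = append_rows(existing, incoming)
upserted = upsert_rows(existing, incoming)
print(sorted(appended.values()))
print(sorted(upserted.values()))
```

In this sketch both results have three rows; they differ only in the value stored for the overlapping 02:00 timestamp.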

Fetching NWM streamflow data

ev.fetch.nwm_operational_points() is a method that fetches near real-time NWM point data (e.g., streamflow) from Google Cloud Storage. This method fetches data for the secondary location IDs listed in the location_crosswalks table, and automatically loads the time series into the secondary_timeseries Evaluation table.

There are several required arguments to define when using the method, including the NWM configuration, output type, NWM variable name, start and end dates, and others. Several optional arguments are also available.

We’ll now fetch streamflow forecasts for the NWM location corresponding to the USGS gage.

Note

The tools for fetching NWM data in TEEHR can take advantage of Dask. If you have dask.distributed installed, start a Dask cluster for improved performance when fetching NWM data.

# Convert this to a code cell to run locally
from dask.distributed import Client
client = Client()

# Convert this to a code cell to run locally
ev.fetch.nwm_operational_points(
    nwm_configuration="medium_range_mem1",
    output_type="channel_rt_1",
    variable_name="streamflow",
    start_date=datetime(2024, 9, 26),
    end_date=datetime(2024, 9, 26),
    nwm_version="nwm30",
)

Here we are fetching NWM version 3.0 Medium Range streamflow forecasts, ensemble member 1, issued on 2024-09-26, the first day of the USGS period fetched above.

A list of available NWM configurations for point data is shown below. Appropriate values for the output_type and variable_name arguments depend on the specified nwm_configuration value.

More information on NWM configurations can be found here: https://water.noaa.gov/about/nwm

from teehr.fetching.models.nwm30_point import ConfigurationsEnum
list(ConfigurationsEnum.__members__)

['analysis_assim',
 'analysis_assim_no_da',
 'analysis_assim_extend',
 'analysis_assim_extend_no_da',
 'analysis_assim_long',
 'analysis_assim_long_no_da',
 'analysis_assim_hawaii',
 'analysis_assim_hawaii_no_da',
 'analysis_assim_puertorico',
 'analysis_assim_puertorico_no_da',
 'analysis_assim_alaska',
 'analysis_assim_alaska_no_da',
 'analysis_assim_extend_alaska',
 'analysis_assim_extend_alaska_no_da',
 'short_range',
 'short_range_hawaii',
 'short_range_puertorico',
 'short_range_hawaii_no_da',
 'short_range_puertorico_no_da',
 'short_range_alaska',
 'medium_range_mem1',
 'medium_range_mem2',
 'medium_range_mem3',
 'medium_range_mem4',
 'medium_range_mem5',
 'medium_range_mem6',
 'medium_range_mem7',
 'medium_range_no_da',
 'medium_range_alaska_mem1',
 'medium_range_alaska_mem2',
 'medium_range_alaska_mem3',
 'medium_range_alaska_mem4',
 'medium_range_alaska_mem5',
 'medium_range_alaska_mem6',
 'medium_range_alaska_no_da',
 'medium_range_blend',
 'medium_range_blend_alaska',
 'long_range_mem1',
 'long_range_mem2',
 'long_range_mem3',
 'long_range_mem4']
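Because each ensemble member is a separate configuration name, fetching a full ensemble means calling the fetch method once per member. The sketch below selects the member configurations for one forecast family from a list of names. The helper `ensemble_members` is hypothetical, and `config_names` is a hand-copied subset of the list above rather than the enum itself.

```python
import re

# Subset of the configuration names listed above, copied by hand.
config_names = [
    "short_range",
    "medium_range_mem1",
    "medium_range_mem2",
    "medium_range_mem3",
    "medium_range_no_da",
    "medium_range_alaska_mem1",
    "medium_range_blend",
    "long_range_mem1",
]


def ensemble_members(names: list[str], family: str) -> list[str]:
    """Return '<family>_memN' names sorted by member number N."""
    pattern = re.compile(rf"{re.escape(family)}_mem(\d+)")
    found = []
    for name in names:
        match = pattern.fullmatch(name)
        if match:
            found.append((int(match.group(1)), name))
    return [name for _, name in sorted(found)]


members = ensemble_members(config_names, "medium_range")
print(members)
```

Each returned name could then be passed as nwm_configuration in its own fetch call, keeping in mind that the appropriate output_type may also change with the configuration.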

df = ev.secondary_timeseries.to_pandas()
df

	reference_time	value_time	configuration_name	unit_name	variable_name	value	location_id	member	created_at	updated_at
0	2024-09-26 06:00:00	2024-09-26 17:00:00	nwm30_medium_range	m^3/s	streamflow_hourly_inst	104.399994	nwm30-6884666	1	2026-06-17 21:01:22.940912	2026-06-17 21:01:22.940912
1	2024-09-26 06:00:00	2024-09-26 19:00:00	nwm30_medium_range	m^3/s	streamflow_hourly_inst	84.909996	nwm30-6884666	1	2026-06-17 21:01:22.940912	2026-06-17 21:01:22.940912
2	2024-09-26 06:00:00	2024-09-27 05:00:00	nwm30_medium_range	m^3/s	streamflow_hourly_inst	62.669998	nwm30-6884666	1	2026-06-17 21:01:22.940912	2026-06-17 21:01:22.940912
3	2024-09-26 06:00:00	2024-09-27 09:00:00	nwm30_medium_range	m^3/s	streamflow_hourly_inst	62.699997	nwm30-6884666	1	2026-06-17 21:01:22.940912	2026-06-17 21:01:22.940912
4	2024-09-26 06:00:00	2024-09-28 06:00:00	nwm30_medium_range	m^3/s	streamflow_hourly_inst	84.839996	nwm30-6884666	1	2026-06-17 21:01:22.940912	2026-06-17 21:01:22.940912
...	...	...	...	...	...	...	...	...	...	...
955	2024-09-26 00:00:00	2024-10-04 07:00:00	nwm30_medium_range	m^3/s	streamflow_hourly_inst	259.299988	nwm30-6884666	1	2026-06-17 21:01:22.940912	2026-06-17 21:01:22.940912
956	2024-09-26 00:00:00	2024-10-04 20:00:00	nwm30_medium_range	m^3/s	streamflow_hourly_inst	219.709991	nwm30-6884666	1	2026-06-17 21:01:22.940912	2026-06-17 21:01:22.940912
957	2024-09-26 00:00:00	2024-10-04 22:00:00	nwm30_medium_range	m^3/s	streamflow_hourly_inst	214.509995	nwm30-6884666	1	2026-06-17 21:01:22.940912	2026-06-17 21:01:22.940912
958	2024-09-26 00:00:00	2024-10-05 02:00:00	nwm30_medium_range	m^3/s	streamflow_hourly_inst	204.720001	nwm30-6884666	1	2026-06-17 21:01:22.940912	2026-06-17 21:01:22.940912
959	2024-09-26 00:00:00	2024-10-05 21:00:00	nwm30_medium_range	m^3/s	streamflow_hourly_inst	167.160004	nwm30-6884666	1	2026-06-17 21:01:22.940912	2026-06-17 21:01:22.940912

960 rows × 10 columns
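Each forecast row carries both a reference_time and a value_time. Assuming, as is conventional for forecast data, that reference_time is the time the forecast was issued and value_time is the time the value is valid for, the forecast lead time is their difference. The sketch below computes it for two rows taken from the table above; `lead_time_hours` is a hypothetical helper.

```python
from datetime import datetime, timedelta

# (reference_time, value_time) pairs copied from rows 0 and 959 of the table above.
rows = [
    (datetime(2024, 9, 26, 6), datetime(2024, 9, 26, 17)),
    (datetime(2024, 9, 26, 0), datetime(2024, 10, 5, 21)),
]


def lead_time_hours(reference_time: datetime, value_time: datetime) -> float:
    """Hours between forecast issue time and the time the value is valid for."""
    return (value_time - reference_time) / timedelta(hours=1)


lead_times = [lead_time_hours(ref, val) for ref, val in rows]
print(lead_times)  # [11.0, 237.0]
```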

Now you can calculate metrics comparing the NWM forecasts to the USGS observations by using the joined_timeseries_view.
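Conceptually, a joined timeseries pairs each secondary (forecast) value with the primary (observed) value at the crosswalked location and the same value time, and metrics are computed over those pairs. The sketch below shows that idea in plain Python with made-up values. It is not TEEHR's implementation or API: `join_timeseries` and `relative_bias` are hypothetical helpers.

```python
from datetime import datetime

# secondary_location_id -> primary_location_id, as in the location crosswalks table.
crosswalk = {"nwm30-6884666": "usgs-03171000"}

# Observations: (location_id, value_time) -> value. Values are illustrative only.
primary = {
    ("usgs-03171000", datetime(2024, 9, 27, 0)): 480.0,
    ("usgs-03171000", datetime(2024, 9, 27, 1)): 500.0,
}
# Forecasts: (location_id, reference_time, value_time, value). Illustrative only.
secondary = [
    ("nwm30-6884666", datetime(2024, 9, 26, 0), datetime(2024, 9, 27, 0), 240.0),
    ("nwm30-6884666", datetime(2024, 9, 26, 0), datetime(2024, 9, 27, 1), 250.0),
    ("nwm30-6884666", datetime(2024, 9, 26, 0), datetime(2024, 9, 27, 2), 260.0),
]


def join_timeseries(primary, secondary, crosswalk):
    """Pair each forecast with the observation at the same location and value time."""
    pairs = []
    for sec_id, _ref_time, value_time, sec_value in secondary:
        obs = primary.get((crosswalk.get(sec_id), value_time))
        if obs is not None:  # forecasts without a matching observation are dropped
            pairs.append((obs, sec_value))
    return pairs


def relative_bias(pairs):
    """Sum of (forecast - observed) divided by the sum of observed values."""
    return sum(sim - obs for obs, sim in pairs) / sum(obs for obs, _ in pairs)


pairs = join_timeseries(primary, secondary, crosswalk)
print(len(pairs), relative_bias(pairs))
```

With these illustrative numbers, two of the three forecast values have a matching observation, and the forecasts are half the observed flow, giving a relative bias of -0.5.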

ev.spark.stop()

Additional NWM fetching methods in TEEHR

NWM Retrospective Point Data

NWM retrospective point data is available for versions 2.0, 2.1, and 3.0:

`Fetch.nwm_retrospective_points`	Fetch NWM retrospective point data and load into the TEEHR dataset.

NWM Retrospective and Forecast Gridded Data

NWM gridded data can also be fetched in TEEHR. Gridded data is summarized to zones (polygons) as the zonal mean.

`Fetch.nwm_operational_grids`	Fetch NWM operational gridded data, calculate zonal statistics (currently only mean is available) of selected variable for given zones, and load into the TEEHR dataset.
`Fetch.nwm_retrospective_grids`	Fetch NWM retrospective gridded data, calculate zonal statistics (currently only mean is available) of selected variable for given zones, and load into the TEEHR dataset.
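To make the zonal mean concrete, the sketch below averages grid cell values by zone, given a second grid of the same shape that labels each cell with a zone ID. It is a simplified, unweighted illustration and not TEEHR's implementation; real tools may weight cells by how much of each cell a polygon covers. The function `zonal_mean` and the example grids are hypothetical.

```python
from collections import defaultdict


def zonal_mean(values, zones):
    """Return {zone_id: mean of cell values in that zone}; None marks cells outside all zones."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is None:
                continue
            totals[zone] += value
            counts[zone] += 1
    return {zone: totals[zone] / counts[zone] for zone in totals}


# A 2 x 3 grid of values and the zone each cell belongs to (illustrative).
grid = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]
zone_ids = [
    ["A", "A", "B"],
    ["A", None, "B"],
]

means = zonal_mean(grid, zone_ids)
print(means)
```

Here zone A averages the cells 1.0, 2.0, and 4.0, zone B averages 3.0 and 6.0, and the unlabelled cell is ignored.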