Fetching and Downloading#
TEEHR provides methods to fetch data from external sources and download data from the TEEHR warehouse. This section covers both approaches.
Fetching External Data#
The fetch component retrieves data from USGS and the National Water Model (NWM),
validates it, and loads it directly into your evaluation.
USGS Streamflow Data#
Fetch observed streamflow data from USGS gages:
import teehr
from datetime import datetime
ev = teehr.LocalReadWriteEvaluation(dir_path="./my_eval", create_dir=True)
# First, load your USGS gage locations
ev.locations.load_spatial("./data/usgs_gages.geojson")
# Fetch streamflow data for all locations in the locations table
ev.fetch.usgs_streamflow(
start_date=datetime(2024, 1, 1),
end_date=datetime(2024, 1, 31),
service="iv", # "iv" = hourly instantaneous, "dv" = daily
filter_to_hourly=True, # Drop 15-minute data
convert_to_si=True # Convert from ft³/s to m³/s
)
See also: Fetch.usgs_streamflow()
The data is automatically:
Fetched from USGS NWIS
Converted to the TEEHR schema
Validated
Loaded into the
primary_timeseriestable
Parameters#
start_date,end_dateTime range to fetch (inclusive start, exclusive end).
service"iv"- Instantaneous values (15-min or hourly)"dv"- Daily mean values
chunk_byProcess data in chunks to manage memory:
"location_id","day","week","month","year", orNonefilter_to_hourlyWhen True, keeps only values on the hour (drops 15-minute data).
convert_to_siConvert from ft³/s to m³/s.
Edge Cases: Sub-Locations#
Some USGS sites have multiple measurement points. To handle these, use the underlying function directly with a dictionary:
from teehr.fetching.usgs.usgs import usgs_to_parquet
# Site with sub-locations
sites = [
{"site_no": "USGS-02449838", "description": "Main Gage"},
"USGS-01234567" # Regular site
]
usgs_to_parquet(
sites=sites,
start_date="2024-01-01",
end_date="2024-01-31",
output_parquet_dir="./cache/usgs"
)
# Then load the resulting Parquet files into TEEHR
ev.primary_timeseries.load_parquet("./cache/usgs/*.parquet")
NWM Retrospective Data (Points)#
Fetch NWM retrospective streamflow simulations at point locations:
# First, set up crosswalks mapping USGS to NWM location IDs
ev.location_crosswalks.load_csv("./data/usgs_nwm_crosswalk.csv")
# Fetch NWM v3.0 retrospective data
ev.fetch.nwm_retrospective_points(
nwm_version="nwm30",
variable_name="streamflow",
start_date=datetime(2020, 1, 1),
end_date=datetime(2020, 12, 31),
chunk_by="month"
)
See also: Fetch.nwm_retrospective_points()
Supported NWM Versions#
Version |
Date Range |
|---|---|
|
1993-01-01 to 2018-12-31 |
|
1979-01-01 to 2020-12-31 |
|
1979-02-01 to 2023-01-31 |
Supported Variables#
streamflow- Channel streamflow (m³/s)velocity- Channel velocity (m/s)
NWM Operational Data (Points)#
Fetch real-time NWM forecast data:
ev.fetch.nwm_operational_points(
nwm_version="nwm30",
forecast_configuration="analysis_assim",
variable_name="streamflow",
start_date=datetime(2024, 1, 1),
end_date=datetime(2024, 1, 7)
)
See also: Fetch.nwm_operational_points()
Forecast Configurations#
analysis_assim- Analysis and assimilationshort_range- Short-range forecast (0-18 hours)medium_range- Medium-range forecast (0-10 days)
NWM Gridded Data#
Fetch gridded NWM data (e.g., forcing variables) and compute zonal statistics:
# Fetch gridded precipitation and compute zonal means
ev.fetch.nwm_retrospective_grids(
nwm_version="nwm30",
variable_name="RAINRATE",
start_date=datetime(2020, 6, 1),
end_date=datetime(2020, 6, 30),
calculate_zonal_weights=True, # Compute weights first time
domain="CONUS"
)
See also: Fetch.nwm_retrospective_grids()
Zonal Weights#
When calculate_zonal_weights=True, TEEHR computes area-weighted averages
for each location’s drainage basin. The weights are cached for subsequent calls.
For custom polygons, provide a weights file:
ev.fetch.nwm_retrospective_grids(
nwm_version="nwm30",
variable_name="RAINRATE",
start_date="2020-06-01",
end_date="2020-06-30",
zonal_weights_filepath="./data/custom_weights.parquet"
)
Gridded Variables#
Forcing variables available:
RAINRATE- Precipitation rateT2D- 2-meter temperatureLWDOWN- Longwave radiationSWDOWN- Shortwave radiationQ2D- Specific humidityU2D,V2D- Wind componentsPSFC- Surface pressure
Downloading from TEEHR Warehouse#
The download component retrieves pre-processed data from the TEEHR data warehouse
via the TEEHR-HUB REST API. This is useful for quickly setting up evaluations with
curated datasets. Starting in May 2026 the API now requires authentication through the
use of API keys or bearer tokens, which can be obtained by contacting the TEEHR team.
Configure the API#
Authentication credentials and the API base URL are read automatically from
environment variables at startup, so calling configure() is optional for
common cases:
Environment variable |
Purpose |
|---|---|
|
Override the default API base URL |
|
API key sent as |
|
Bearer token sent as |
Set these in your shell or .env file rather than embedding secrets in code:
export TEEHR_DOWNLOAD_API_BASE_URL="some-alternative-api-endpoint"
export TEEHR_DOWNLOAD_API_KEY="your-api-key-here"
Then use the download component without passing credentials in Python:
# No configure() call needed — credentials are read from env vars
ev.download.locations(prefix="usgs", load=True)
# configure() can still override individual settings explicitly
ev.download.configure(
api_base_url="https://api.teehr.rtiamanzi.org",
verify_ssl=False,
timeout=120
)
See also: Download.configure()
Download Locations#
# Preview available locations
locs_df = ev.download.locations(prefix="usgs", limit=100)
print(locs_df.head())
# Download and load directly into evaluation
ev.download.locations(prefix="usgs", load=True)
# Filter by bounding box
ev.download.locations(
prefix="usgs",
bbox=[-85, 30, -80, 35], # [minx, miny, maxx, maxy]
load=True
)
# Include attributes in response (for preview only)
locs_with_attrs = ev.download.locations(
prefix="usgs",
include_attributes=True
)
See also: Download.locations()
Download Domain Data#
# Download all configurations
ev.download.configurations(load=True)
# Download specific unit
ev.download.units(name="m3/s", load=True)
# Download variables
ev.download.variables(load=True)
# Download attributes (definitions)
ev.download.attributes(type="continuous", load=True)
See also: Download.configurations(),
Download.units(),
Download.variables(),
Download.attributes()
Download Location Attributes#
# Download attributes for specific locations
ev.download.location_attributes(
location_id=["usgs-01010000", "usgs-01020000"],
load=True
)
# Download all attributes for locations with a prefix
ev.download.location_attributes(load=True)
See also: Download.location_attributes()
Download Crosswalks#
# Download crosswalks for NWM v3.0
ev.download.location_crosswalks(
secondary_location_id_prefix="nwm30",
load=True
)
See also: Download.location_crosswalks()
Download Timeseries#
# Download primary (observed) timeseries
ev.download.primary_timeseries(
primary_location_id=["usgs-01010000"],
start_date="2020-01-01",
end_date="2020-12-31",
load=True
)
# Download secondary (simulated) timeseries
ev.download.secondary_timeseries(
configuration_name="nwm30_retrospective",
primary_location_id=["usgs-01010000"],
start_date="2020-01-01",
end_date="2020-12-31",
load=True
)
See also: Download.primary_timeseries(),
Download.secondary_timeseries()
Complete Workflow Examples#
Example 1: New Evaluation with TEEHR Warehouse Data#
import teehr
# Create evaluation
ev = teehr.LocalReadWriteEvaluation(dir_path="./nwm_eval", create_dir=True)
# Download curated data from warehouse
ev.download.configure()
# Get domain tables
ev.download.units(load=True)
ev.download.variables(load=True)
ev.download.configurations(load=True)
ev.download.attributes(load=True)
# Get locations and crosswalks
ev.download.locations(primary_location_id=["usgs-01010000"], load=True)
ev.download.location_attributes(primary_location_id=["usgs-01010000"], load=True)
ev.download.location_crosswalks(secondary_location_id_prefix="nwm30", load=True)
# Get timeseries data
ev.download.primary_timeseries(
configuration_name="usgs_observations",
primary_location_id=["usgs-01010000"],
start_date="2020-01-01",
end_date="2020-12-31",
load=True
)
ev.download.secondary_timeseries(
configuration_name="nwm30_retrospective",
primary_location_id=["usgs-01010000"],
start_date="2020-01-01",
end_date="2020-12-31",
load=True
)
Example 2: Fresh Data from USGS and NWM#
import teehr
from datetime import datetime
ev = teehr.LocalReadWriteEvaluation(dir_path="./fresh_eval", create_dir=True)
# Load your location data
ev.locations.load_spatial("./data/my_gages.geojson")
ev.location_crosswalks.load_csv("./data/my_crosswalk.csv")
# Fetch fresh USGS data
ev.fetch.usgs_streamflow(
start_date=datetime(2024, 1, 1),
end_date=datetime(2024, 3, 31)
)
# Fetch corresponding NWM retrospective data
ev.fetch.nwm_retrospective_points(
nwm_version="nwm30",
variable_name="streamflow",
start_date=datetime(2020, 1, 1),
end_date=datetime(2022, 12, 31),
chunk_by="year"
)