Fetching and Downloading#
TEEHR provides methods to fetch data from external sources and download data from the TEEHR warehouse. This section covers both approaches.
Fetching External Data#
The fetch component retrieves data from USGS and the National Water Model (NWM),
validates it, and loads it directly into your evaluation.
USGS Streamflow Data#
Fetch observed streamflow data from USGS gages:
```python
import teehr
from datetime import datetime

ev = teehr.LocalReadWriteEvaluation(dir_path="./my_eval", create_dir=True)

# First, load your USGS gage locations
ev.locations.load_spatial("./data/usgs_gages.geojson")

# Fetch streamflow data for all locations in the locations table
ev.fetch.usgs_streamflow(
    start_date=datetime(2024, 1, 1),
    end_date=datetime(2024, 1, 31),
    service="iv",           # "iv" = instantaneous, "dv" = daily mean
    filter_to_hourly=True,  # Drop 15-minute data
    convert_to_si=True      # Convert from ft³/s to m³/s
)
```
See also: Fetch.usgs_streamflow()
The data is automatically:

- Fetched from USGS NWIS
- Converted to the TEEHR schema
- Validated
- Loaded into the `primary_timeseries` table
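The unit conversion applied when `convert_to_si=True` is the standard cubic-feet-to-cubic-metres factor. A minimal sketch of the arithmetic (the helper name is illustrative, not part of the TEEHR API):

```python
FT3_TO_M3 = 0.3048 ** 3  # one cubic foot expressed in cubic metres (~0.0283168)

def cfs_to_cms(q_cfs: float) -> float:
    """Convert discharge from ft³/s to m³/s."""
    return q_cfs * FT3_TO_M3

print(round(cfs_to_cms(1000.0), 2))  # 28.32
```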
Parameters#
- `start_date`, `end_date`: Time range to fetch (inclusive start, exclusive end).
- `service`: `"iv"` for instantaneous values (15-min or hourly); `"dv"` for daily mean values.
- `chunk_by`: Process data in chunks to manage memory: `"location_id"`, `"day"`, `"week"`, `"month"`, `"year"`, or `None`.
- `filter_to_hourly`: When `True`, keeps only values on the hour (drops 15-minute data).
- `convert_to_si`: Convert from ft³/s to m³/s.
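Conceptually, `chunk_by` splits the requested period into windows that are fetched one at a time, so only one chunk's worth of data is held in memory. A rough pure-Python sketch of monthly chunking (illustrative only, not TEEHR's implementation):

```python
from datetime import datetime

def month_chunks(start: datetime, end: datetime) -> list[tuple[datetime, datetime]]:
    """Split [start, end) into consecutive (chunk_start, chunk_end) month windows."""
    chunks = []
    cur = datetime(start.year, start.month, 1)
    while cur < end:
        # First day of the following month (bool adds 1 to the year in December)
        nxt = datetime(cur.year + (cur.month == 12), cur.month % 12 + 1, 1)
        chunks.append((max(cur, start), min(nxt, end)))
        cur = nxt
    return chunks

for lo, hi in month_chunks(datetime(2024, 1, 15), datetime(2024, 3, 10)):
    print(lo.date(), "->", hi.date())
```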
Edge Cases: Sub-Locations#
Some USGS sites have multiple measurement points. To handle these, use the underlying function directly with a dictionary:
```python
from teehr.fetching.usgs.usgs import usgs_to_parquet

sites = [
    {"site_no": "02449838", "description": "Main Gage"},  # Site with sub-locations
    "01234567"  # Regular site
]

usgs_to_parquet(
    sites=sites,
    start_date="2024-01-01",
    end_date="2024-01-31",
    output_parquet_dir="./cache/usgs"
)

# Then load the resulting Parquet files into TEEHR
ev.load.primary_timeseries.from_parquet("./cache/usgs/*.parquet")
```
NWM Retrospective Data (Points)#
Fetch NWM retrospective streamflow simulations at point locations:
```python
# First, set up crosswalks mapping USGS to NWM location IDs
ev.location_crosswalks.load_csv("./data/usgs_nwm_crosswalk.csv")

# Fetch NWM v3.0 retrospective data
ev.fetch.nwm_retrospective_points(
    nwm_version="nwm30",
    variable_name="streamflow",
    start_date=datetime(2020, 1, 1),
    end_date=datetime(2020, 12, 31),
    chunk_by="month"
)
```
See also: Fetch.nwm_retrospective_points()
Supported NWM Versions#
| Version | Date Range |
|---|---|
| `nwm20` | 1993-01-01 to 2018-12-31 |
| `nwm21` | 1979-01-01 to 2020-12-31 |
| `nwm30` | 1979-02-01 to 2023-01-31 |
Supported Variables#
- `streamflow`: Channel streamflow (m³/s)
- `velocity`: Channel velocity (m/s)
NWM Operational Data (Points)#
Fetch real-time NWM forecast data:
```python
ev.fetch.nwm_operational_points(
    nwm_version="nwm30",
    forecast_configuration="analysis_assim",
    variable_name="streamflow",
    start_date=datetime(2024, 1, 1),
    end_date=datetime(2024, 1, 7)
)
```
See also: Fetch.nwm_operational_points()
Forecast Configurations#
- `analysis_assim`: Analysis and assimilation
- `short_range`: Short-range forecast (0-18 hours)
- `medium_range`: Medium-range forecast (0-10 days)
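For orientation, each `short_range` cycle covers 18 hourly lead times. A small sketch of the valid times one cycle spans (the helper is illustrative, not part of the TEEHR API):

```python
from datetime import datetime, timedelta

def short_range_valid_times(reference_time: datetime) -> list[datetime]:
    """Valid times for one NWM short_range cycle: lead times of 1-18 hours."""
    return [reference_time + timedelta(hours=h) for h in range(1, 19)]

vts = short_range_valid_times(datetime(2024, 1, 1, 0))
print(len(vts), vts[0], vts[-1])  # 18 hourly steps, 01:00 through 18:00 UTC
```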
NWM Gridded Data#
Fetch gridded NWM data (e.g., forcing variables) and compute zonal statistics:
```python
# Fetch gridded precipitation and compute zonal means
ev.fetch.nwm_retrospective_grids(
    nwm_version="nwm30",
    variable_name="RAINRATE",
    start_date=datetime(2020, 6, 1),
    end_date=datetime(2020, 6, 30),
    calculate_zonal_weights=True,  # Compute weights first time
    domain="CONUS"
)
```
See also: Fetch.nwm_retrospective_grids()
Zonal Weights#
When calculate_zonal_weights=True, TEEHR computes area-weighted averages
for each location’s drainage basin. The weights are cached for subsequent calls.
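The weighted average itself is simple arithmetic: each grid cell's value is weighted by the fraction of the basin area it covers. A toy sketch (the dictionary layout and names are illustrative, not the cached weights format):

```python
def zonal_mean(cell_values: dict, weights: dict) -> float:
    """Area-weighted mean of grid-cell values over one basin."""
    total = sum(weights.values())
    return sum(cell_values[c] * w for c, w in weights.items()) / total

# A basin split 75% / 25% between two grid cells
print(zonal_mean({"cell_a": 2.0, "cell_b": 4.0}, {"cell_a": 0.75, "cell_b": 0.25}))  # 2.5
```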
For custom polygons, provide a weights file:
```python
ev.fetch.nwm_retrospective_grids(
    nwm_version="nwm30",
    variable_name="RAINRATE",
    start_date="2020-06-01",
    end_date="2020-06-30",
    zonal_weights_filepath="./data/custom_weights.parquet"
)
```
Gridded Variables#
Forcing variables available:
- `RAINRATE`: Precipitation rate
- `T2D`: 2-meter temperature
- `LWDOWN`: Longwave radiation
- `SWDOWN`: Shortwave radiation
- `Q2D`: Specific humidity
- `U2D`, `V2D`: Wind components
- `PSFC`: Surface pressure
Downloading from TEEHR Warehouse#
The download component retrieves pre-processed data from the TEEHR data warehouse
via the TEEHR-HUB REST API. This is useful for quickly setting up evaluations with
curated datasets.
Configure the API#
```python
# Default configuration (public TEEHR warehouse)
ev.download.configure()

# Or specify a custom endpoint
ev.download.configure(
    api_base_url="https://api.teehr.rtiamanzi.org",
    verify_ssl=True
)
```
See also: Download.configure()
Download Locations#
```python
# Preview available locations
locs_df = ev.download.locations(prefix="usgs", limit=100)
print(locs_df.head())

# Download and load directly into evaluation
ev.download.locations(prefix="usgs", load=True)

# Filter by bounding box
ev.download.locations(
    prefix="usgs",
    bbox=[-85, 30, -80, 35],  # [minx, miny, maxx, maxy]
    load=True
)

# Include attributes in response (for preview only)
locs_with_attrs = ev.download.locations(
    prefix="usgs",
    include_attributes=True
)
```
See also: Download.locations()
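If you have point coordinates and need the `[minx, miny, maxx, maxy]` list that the `bbox` parameter expects, a small helper suffices (illustrative, not part of TEEHR):

```python
def bbox_of(points: list[tuple[float, float]]) -> list[float]:
    """Bounding box [minx, miny, maxx, maxy] for (lon, lat) pairs."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [min(xs), min(ys), max(xs), max(ys)]

print(bbox_of([(-84.5, 33.7), (-81.0, 32.1), (-83.2, 34.9)]))
# [-84.5, 32.1, -81.0, 34.9]
```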
Download Domain Data#
```python
# Download all configurations
ev.download.configurations(load=True)

# Download specific unit
ev.download.units(name="m3/s", load=True)

# Download variables
ev.download.variables(load=True)

# Download attributes (definitions)
ev.download.attributes(type="continuous", load=True)
```
See also: Download.configurations(),
Download.units(),
Download.variables(),
Download.attributes()
Download Location Attributes#
```python
# Download attributes for specific locations
ev.download.location_attributes(
    location_id=["usgs-01010000", "usgs-01020000"],
    load=True
)

# Download all attributes for locations with a prefix
ev.download.location_attributes(load=True)
```
See also: Download.location_attributes()
Download Crosswalks#
```python
# Download crosswalks for NWM v3.0
ev.download.location_crosswalks(
    secondary_location_id_prefix="nwm30",
    load=True
)
```
See also: Download.location_crosswalks()
Download Timeseries#
```python
# Download primary (observed) timeseries
ev.download.primary_timeseries(
    primary_location_id=["usgs-01010000"],
    start_date="2020-01-01",
    end_date="2020-12-31",
    load=True
)

# Download secondary (simulated) timeseries
ev.download.secondary_timeseries(
    configuration_name="nwm30_retrospective",
    primary_location_id=["usgs-01010000"],
    start_date="2020-01-01",
    end_date="2020-12-31",
    load=True
)
```
See also: Download.primary_timeseries(),
Download.secondary_timeseries()
Complete Workflow Examples#
Example 1: New Evaluation with TEEHR Warehouse Data#
```python
import teehr

# Create evaluation
ev = teehr.LocalReadWriteEvaluation(dir_path="./nwm_eval", create_dir=True)

# Download curated data from warehouse
ev.download.configure()

# Get domain tables
ev.download.units(load=True)
ev.download.variables(load=True)
ev.download.configurations(load=True)
ev.download.attributes(load=True)

# Get locations and crosswalks
ev.download.locations(primary_location_id=["usgs-01010000"], load=True)
ev.download.location_attributes(primary_location_id=["usgs-01010000"], load=True)
ev.download.location_crosswalks(secondary_location_id_prefix="nwm30", load=True)

# Get timeseries data
ev.download.primary_timeseries(
    configuration_name="usgs_observations",
    primary_location_id=["usgs-01010000"],
    start_date="2020-01-01",
    end_date="2020-12-31",
    load=True
)
ev.download.secondary_timeseries(
    configuration_name="nwm30_retrospective",
    primary_location_id=["usgs-01010000"],
    start_date="2020-01-01",
    end_date="2020-12-31",
    load=True
)
```
Example 2: Fresh Data from USGS and NWM#
```python
import teehr
from datetime import datetime

ev = teehr.LocalReadWriteEvaluation(dir_path="./fresh_eval", create_dir=True)

# Load your location data
ev.locations.load_spatial("./data/my_gages.geojson")
ev.location_crosswalks.load_csv("./data/my_crosswalk.csv")

# Fetch USGS observations for the evaluation period
ev.fetch.usgs_streamflow(
    start_date=datetime(2020, 1, 1),
    end_date=datetime(2022, 12, 31)
)

# Fetch NWM retrospective data for the same period
ev.fetch.nwm_retrospective_points(
    nwm_version="nwm30",
    variable_name="streamflow",
    start_date=datetime(2020, 1, 1),
    end_date=datetime(2022, 12, 31),
    chunk_by="year"
)
```