Fetching and Downloading#

TEEHR provides methods to fetch data from external sources and download data from the TEEHR warehouse. This section covers both approaches.

Fetching External Data#

The fetch component retrieves data from USGS and the National Water Model (NWM), validates it, and loads it directly into your evaluation.

USGS Streamflow Data#

Fetch observed streamflow data from USGS gages:

import teehr
from datetime import datetime

ev = teehr.LocalReadWriteEvaluation(dir_path="./my_eval", create_dir=True)

# First, load your USGS gage locations
ev.locations.load_spatial("./data/usgs_gages.geojson")

# Fetch streamflow data for all locations in the locations table
ev.fetch.usgs_streamflow(
    start_date=datetime(2024, 1, 1),
    end_date=datetime(2024, 1, 31),
    service="iv",              # "iv" = hourly instantaneous, "dv" = daily
    filter_to_hourly=True,     # Drop 15-minute data
    convert_to_si=True         # Convert from ft³/s to m³/s
)

See also: Fetch.usgs_streamflow()

The data is automatically:

  1. Fetched from USGS NWIS

  2. Converted to the TEEHR schema

  3. Validated

  4. Loaded into the primary_timeseries table

Parameters#

start_date, end_date

Time range to fetch (inclusive start, exclusive end).

service
  • "iv" - Instantaneous values (15-min or hourly)

  • "dv" - Daily mean values

chunk_by

Process data in chunks to manage memory: "location_id", "day", "week", "month", "year", or None

filter_to_hourly

When True, keeps only values on the hour (drops 15-minute data).

convert_to_si

Convert from ft³/s to m³/s.
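The last two flags are simple post-processing steps. A minimal sketch of their effect, assuming hourly filtering keeps only on-the-hour timestamps and using the standard 1 ft³ = 0.0283168 m³ factor (this is illustrative, not TEEHR's internal code):

```python
from datetime import datetime

# Conversion factor: 1 cubic foot = 0.0283168 cubic meters.
CFS_TO_CMS = 0.0283168

def filter_and_convert(records):
    """Keep on-the-hour values and convert ft3/s to m3/s.

    `records` is a list of (timestamp, value_cfs) pairs. Sketch of what
    filter_to_hourly=True and convert_to_si=True do conceptually.
    """
    return [
        (ts, round(val * CFS_TO_CMS, 6))
        for ts, val in records
        if ts.minute == 0 and ts.second == 0
    ]

raw = [
    (datetime(2024, 1, 1, 0, 0), 100.0),   # kept: on the hour
    (datetime(2024, 1, 1, 0, 15), 101.0),  # dropped: 15-minute value
    (datetime(2024, 1, 1, 0, 30), 102.0),  # dropped
    (datetime(2024, 1, 1, 1, 0), 103.0),   # kept
]
hourly_si = filter_and_convert(raw)  # two hourly values, now in m³/s
```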

Edge Cases: Sub-Locations#

Some USGS sites have multiple measurement points (sub-locations). To handle these, call the underlying function directly, passing each such site as a dictionary alongside regular site-number strings:

from teehr.fetching.usgs.usgs import usgs_to_parquet

# Site with sub-locations
sites = [
    {"site_no": "02449838", "description": "Main Gage"},
    "01234567"  # Regular site
]

usgs_to_parquet(
    sites=sites,
    start_date="2024-01-01",
    end_date="2024-01-31",
    output_parquet_dir="./cache/usgs"
)

# Then load the resulting Parquet files into TEEHR
ev.load.primary_timeseries.from_parquet("./cache/usgs/*.parquet")

NWM Retrospective Data (Points)#

Fetch NWM retrospective streamflow simulations at point locations:

# First, set up crosswalks mapping USGS to NWM location IDs
ev.location_crosswalks.load_csv("./data/usgs_nwm_crosswalk.csv")

# Fetch NWM v3.0 retrospective data
ev.fetch.nwm_retrospective_points(
    nwm_version="nwm30",
    variable_name="streamflow",
    start_date=datetime(2020, 1, 1),
    end_date=datetime(2020, 12, 31),
    chunk_by="month"
)

See also: Fetch.nwm_retrospective_points()
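The chunk_by="month" argument keeps each retrieval to about a month of data at a time, which bounds memory use for long windows. A rough sketch of how a multi-year window breaks into monthly chunks (illustrative only, not TEEHR's internal code; chunk boundaries here follow an exclusive-end convention, with the final chunk clamped to the requested end date):

```python
from datetime import datetime

def month_chunks(start, end):
    """Split [start, end] into per-calendar-month (start, end) pairs."""
    chunks = []
    current = start
    while current <= end:
        # First day of the following month.
        if current.month == 12:
            next_month = datetime(current.year + 1, 1, 1)
        else:
            next_month = datetime(current.year, current.month + 1, 1)
        chunks.append((current, min(end, next_month)))
        current = next_month
    return chunks

# A one-year window yields twelve monthly chunks.
chunks = month_chunks(datetime(2020, 1, 1), datetime(2020, 12, 31))
```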

Supported NWM Versions#

Version    Date Range
nwm20      1993-01-01 to 2018-12-31
nwm21      1979-01-01 to 2020-12-31
nwm30      1979-02-01 to 2023-01-31
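Requests outside a version's coverage will fail or return no data, so it can be worth checking the window up front. A hypothetical pre-flight helper based on the coverage above (check_nwm_range is not part of the TEEHR API):

```python
from datetime import datetime

# Retrospective coverage per NWM version, from the table above.
NWM_RANGES = {
    "nwm20": (datetime(1993, 1, 1), datetime(2018, 12, 31)),
    "nwm21": (datetime(1979, 1, 1), datetime(2020, 12, 31)),
    "nwm30": (datetime(1979, 2, 1), datetime(2023, 1, 31)),
}

def check_nwm_range(nwm_version, start_date, end_date):
    """Return True if the requested window lies inside the version's
    retrospective coverage. Hypothetical helper, not part of TEEHR."""
    lo, hi = NWM_RANGES[nwm_version]
    return lo <= start_date and end_date <= hi

ok = check_nwm_range("nwm30", datetime(2020, 1, 1), datetime(2020, 12, 31))
```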

Supported Variables#

  • streamflow - Channel streamflow (m³/s)

  • velocity - Channel velocity (m/s)

NWM Operational Data (Points)#

Fetch real-time NWM forecast data:

ev.fetch.nwm_operational_points(
    nwm_version="nwm30",
    forecast_configuration="analysis_assim",
    variable_name="streamflow",
    start_date=datetime(2024, 1, 1),
    end_date=datetime(2024, 1, 7)
)

See also: Fetch.nwm_operational_points()

Forecast Configurations#

  • analysis_assim - Analysis and assimilation

  • short_range - Short-range forecast (0-18 hours)

  • medium_range - Medium-range forecast (0-10 days)

NWM Gridded Data#

Fetch gridded NWM data (e.g., forcing variables) and compute zonal statistics:

# Fetch gridded precipitation and compute zonal means
ev.fetch.nwm_retrospective_grids(
    nwm_version="nwm30",
    variable_name="RAINRATE",
    start_date=datetime(2020, 6, 1),
    end_date=datetime(2020, 6, 30),
    calculate_zonal_weights=True,  # Compute weights on the first call
    domain="CONUS"
)

See also: Fetch.nwm_retrospective_grids()

Zonal Weights#

When calculate_zonal_weights=True, TEEHR computes area-weighted averages for each location’s drainage basin. The weights are cached for subsequent calls.

For custom polygons, provide a weights file:

ev.fetch.nwm_retrospective_grids(
    nwm_version="nwm30",
    variable_name="RAINRATE",
    start_date="2020-06-01",
    end_date="2020-06-30",
    zonal_weights_filepath="./data/custom_weights.parquet"
)
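The statistic behind a zonal-weights file is just an area-weighted mean over the grid cells that overlap each basin. A minimal sketch of that computation (not the library's implementation; `weights` maps a grid-cell id to the fraction of the basin falling in that cell):

```python
def zonal_mean(cell_values, weights):
    """Area-weighted mean of grid-cell values over one basin.

    cell_values: grid-cell id -> value (e.g., RAINRATE for that cell)
    weights:     grid-cell id -> fractional basin area in that cell
    """
    total_weight = sum(weights.values())
    weighted_sum = sum(cell_values[cell] * w for cell, w in weights.items())
    return weighted_sum / total_weight

# Basin covered 60/40 by two cells with rain rates 2.0 and 5.0:
# the weighted mean is 0.6 * 2.0 + 0.4 * 5.0 ≈ 3.2.
mean_rain = zonal_mean({"c1": 2.0, "c2": 5.0}, {"c1": 0.6, "c2": 0.4})
```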

Gridded Variables#

Forcing variables available:

  • RAINRATE - Precipitation rate

  • T2D - 2-meter temperature

  • LWDOWN - Longwave radiation

  • SWDOWN - Shortwave radiation

  • Q2D - Specific humidity

  • U2D, V2D - Wind components

  • PSFC - Surface pressure

Downloading from TEEHR Warehouse#

The download component retrieves pre-processed data from the TEEHR data warehouse via the TEEHR-HUB REST API. This is useful for quickly setting up evaluations with curated datasets.

Configure the API#

# Default configuration (public TEEHR warehouse)
ev.download.configure()

# Or specify a custom endpoint
ev.download.configure(
    api_base_url="https://api.teehr.rtiamanzi.org",
    verify_ssl=True
)

See also: Download.configure()

Download Locations#

# Preview available locations
locs_df = ev.download.locations(prefix="usgs", limit=100)
print(locs_df.head())

# Download and load directly into evaluation
ev.download.locations(prefix="usgs", load=True)

# Filter by bounding box
ev.download.locations(
    prefix="usgs",
    bbox=[-85, 30, -80, 35],  # [minx, miny, maxx, maxy]
    load=True
)

# Include attributes in response (for preview only)
locs_with_attrs = ev.download.locations(
    prefix="usgs",
    include_attributes=True
)

See also: Download.locations()
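The bbox filter uses the [minx, miny, maxx, maxy] convention, i.e. x is longitude and y is latitude. A small sketch of the containment test this implies (illustrative only; the actual filtering happens in the warehouse API):

```python
def in_bbox(lon, lat, bbox):
    """Point-in-box test using the [minx, miny, maxx, maxy] convention."""
    minx, miny, maxx, maxy = bbox
    return minx <= lon <= maxx and miny <= lat <= maxy

bbox = [-85, 30, -80, 35]             # the example box from above
inside = in_bbox(-82.5, 32.0, bbox)   # gage inside the box
outside = in_bbox(-90.0, 32.0, bbox)  # gage west of the box
```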

Download Domain Data#

# Download all configurations
ev.download.configurations(load=True)

# Download specific unit
ev.download.units(name="m3/s", load=True)

# Download variables
ev.download.variables(load=True)

# Download attributes (definitions)
ev.download.attributes(type="continuous", load=True)

See also: Download.configurations(), Download.units(), Download.variables(), Download.attributes()

Download Location Attributes#

# Download attributes for specific locations
ev.download.location_attributes(
    location_id=["usgs-01010000", "usgs-01020000"],
    load=True
)

# Download all attributes for locations with a prefix
ev.download.location_attributes(load=True)

See also: Download.location_attributes()

Download Crosswalks#

# Download crosswalks for NWM v3.0
ev.download.location_crosswalks(
    secondary_location_id_prefix="nwm30",
    load=True
)

See also: Download.location_crosswalks()

Download Timeseries#

# Download primary (observed) timeseries
ev.download.primary_timeseries(
    primary_location_id=["usgs-01010000"],
    start_date="2020-01-01",
    end_date="2020-12-31",
    load=True
)

# Download secondary (simulated) timeseries
ev.download.secondary_timeseries(
    configuration_name="nwm30_retrospective",
    primary_location_id=["usgs-01010000"],
    start_date="2020-01-01",
    end_date="2020-12-31",
    load=True
)

See also: Download.primary_timeseries(), Download.secondary_timeseries()

Complete Workflow Examples#

Example 1: New Evaluation with TEEHR Warehouse Data#

import teehr

# Create evaluation
ev = teehr.LocalReadWriteEvaluation(dir_path="./nwm_eval", create_dir=True)

# Download curated data from warehouse
ev.download.configure()

# Get domain tables
ev.download.units(load=True)
ev.download.variables(load=True)
ev.download.configurations(load=True)
ev.download.attributes(load=True)

# Get locations and crosswalks
ev.download.locations(primary_location_id=["usgs-01010000"], load=True)
ev.download.location_attributes(primary_location_id=["usgs-01010000"], load=True)
ev.download.location_crosswalks(secondary_location_id_prefix="nwm30", load=True)

# Get timeseries data
ev.download.primary_timeseries(
    configuration_name="usgs_observations",
    primary_location_id=["usgs-01010000"],
    start_date="2020-01-01",
    end_date="2020-12-31",
    load=True
)
ev.download.secondary_timeseries(
    configuration_name="nwm30_retrospective",
    primary_location_id=["usgs-01010000"],
    start_date="2020-01-01",
    end_date="2020-12-31",
    load=True
)

Example 2: Fresh Data from USGS and NWM#

import teehr
from datetime import datetime

ev = teehr.LocalReadWriteEvaluation(dir_path="./fresh_eval", create_dir=True)

# Load your location data
ev.locations.load_spatial("./data/my_gages.geojson")
ev.location_crosswalks.load_csv("./data/my_crosswalk.csv")

# Fetch USGS observations for the evaluation period
ev.fetch.usgs_streamflow(
    start_date=datetime(2020, 1, 1),
    end_date=datetime(2022, 12, 31)
)

# Fetch corresponding NWM retrospective data
ev.fetch.nwm_retrospective_points(
    nwm_version="nwm30",
    variable_name="streamflow",
    start_date=datetime(2020, 1, 1),
    end_date=datetime(2022, 12, 31),
    chunk_by="year"
)