.. _fetching: ======================== Fetching and Downloading ======================== TEEHR provides methods to fetch data from external sources and download data from the TEEHR warehouse. This section covers both approaches. Fetching External Data ====================== The ``fetch`` component retrieves data from USGS and the National Water Model (NWM), validates it, and loads it directly into your evaluation. USGS Streamflow Data -------------------- Fetch observed streamflow data from USGS gages: .. code-block:: python import teehr from datetime import datetime ev = teehr.LocalReadWriteEvaluation(dir_path="./my_eval", create_dir=True) # First, load your USGS gage locations ev.locations.load_spatial("./data/usgs_gages.geojson") # Fetch streamflow data for all locations in the locations table ev.fetch.usgs_streamflow( start_date=datetime(2024, 1, 1), end_date=datetime(2024, 1, 31), service="iv", # "iv" = hourly instantaneous, "dv" = daily filter_to_hourly=True, # Drop 15-minute data convert_to_si=True # Convert from ft³/s to m³/s ) See also: :meth:`Fetch.usgs_streamflow() ` The data is automatically: 1. Fetched from USGS NWIS 2. Converted to the TEEHR schema 3. Validated 4. Loaded into the ``primary_timeseries`` table Parameters ^^^^^^^^^^ ``start_date``, ``end_date`` Time range to fetch (inclusive start, exclusive end). ``service`` - ``"iv"`` - Instantaneous values (15-min or hourly) - ``"dv"`` - Daily mean values ``chunk_by`` Process data in chunks to manage memory: ``"location_id"``, ``"day"``, ``"week"``, ``"month"``, ``"year"``, or ``None`` ``filter_to_hourly`` When True, keeps only values on the hour (drops 15-minute data). ``convert_to_si`` Convert from ft³/s to m³/s. Edge Cases: Sub-Locations ^^^^^^^^^^^^^^^^^^^^^^^^^ Some USGS sites have multiple measurement points. To handle these, use the underlying function directly with a dictionary: .. code-block:: python from teehr.fetching.usgs.usgs import usgs_to_parquet # Site with sub-locations sites = [ {"site_no": "02449838", "description": "Main Gage"}, "01234567" # Regular site ] usgs_to_parquet( sites=sites, start_date="2024-01-01", end_date="2024-01-31", output_parquet_dir="./cache/usgs" ) # Then load the resulting Parquet files into TEEHR ev.load.primary_timeseries.from_parquet("./cache/usgs/*.parquet") NWM Retrospective Data (Points) ------------------------------- Fetch NWM retrospective streamflow simulations at point locations: .. code-block:: python # First, set up crosswalks mapping USGS to NWM location IDs ev.location_crosswalks.load_csv("./data/usgs_nwm_crosswalk.csv") # Fetch NWM v3.0 retrospective data ev.fetch.nwm_retrospective_points( nwm_version="nwm30", variable_name="streamflow", start_date=datetime(2020, 1, 1), end_date=datetime(2020, 12, 31), chunk_by="month" ) See also: :meth:`Fetch.nwm_retrospective_points() ` Supported NWM Versions ^^^^^^^^^^^^^^^^^^^^^^ .. list-table:: :header-rows: 1 * - Version - Date Range * - ``nwm20`` - 1993-01-01 to 2018-12-31 * - ``nwm21`` - 1979-01-01 to 2020-12-31 * - ``nwm30`` - 1979-02-01 to 2023-01-31 Supported Variables ^^^^^^^^^^^^^^^^^^^ - ``streamflow`` - Channel streamflow (m³/s) - ``velocity`` - Channel velocity (m/s) NWM Operational Data (Points) ----------------------------- Fetch real-time NWM forecast data: .. code-block:: python ev.fetch.nwm_operational_points( nwm_version="nwm30", forecast_configuration="analysis_assim", variable_name="streamflow", start_date=datetime(2024, 1, 1), end_date=datetime(2024, 1, 7) ) See also: :meth:`Fetch.nwm_operational_points() ` Forecast Configurations ^^^^^^^^^^^^^^^^^^^^^^^ - ``analysis_assim`` - Analysis and assimilation - ``short_range`` - Short-range forecast (0-18 hours) - ``medium_range`` - Medium-range forecast (0-10 days) NWM Gridded Data ---------------- Fetch gridded NWM data (e.g., forcing variables) and compute zonal statistics: .. code-block:: python # Fetch gridded precipitation and compute zonal means ev.fetch.nwm_retrospective_grids( nwm_version="nwm30", variable_name="RAINRATE", start_date=datetime(2020, 6, 1), end_date=datetime(2020, 6, 30), calculate_zonal_weights=True, # Compute weights first time domain="CONUS" ) See also: :meth:`Fetch.nwm_retrospective_grids() ` Zonal Weights ^^^^^^^^^^^^^ When ``calculate_zonal_weights=True``, TEEHR computes area-weighted averages for each location's drainage basin. The weights are cached for subsequent calls. For custom polygons, provide a weights file: .. code-block:: python ev.fetch.nwm_retrospective_grids( nwm_version="nwm30", variable_name="RAINRATE", start_date="2020-06-01", end_date="2020-06-30", zonal_weights_filepath="./data/custom_weights.parquet" ) Gridded Variables ^^^^^^^^^^^^^^^^^ Forcing variables available: - ``RAINRATE`` - Precipitation rate - ``T2D`` - 2-meter temperature - ``LWDOWN`` - Longwave radiation - ``SWDOWN`` - Shortwave radiation - ``Q2D`` - Specific humidity - ``U2D``, ``V2D`` - Wind components - ``PSFC`` - Surface pressure Downloading from TEEHR Warehouse ================================ The ``download`` component retrieves pre-processed data from the TEEHR data warehouse via the TEEHR-HUB REST API. This is useful for quickly setting up evaluations with curated datasets. Configure the API ----------------- .. code-block:: python # Default configuration (public TEEHR warehouse) ev.download.configure() # Or specify a custom endpoint ev.download.configure( api_base_url="https://api.teehr.rtiamanzi.org", verify_ssl=True ) See also: :meth:`Download.configure() ` Download Locations ------------------ .. code-block:: python # Preview available locations locs_df = ev.download.locations(prefix="usgs", limit=100) print(locs_df.head()) # Download and load directly into evaluation ev.download.locations(prefix="usgs", load=True) # Filter by bounding box ev.download.locations( prefix="usgs", bbox=[-85, 30, -80, 35], # [minx, miny, maxx, maxy] load=True ) # Include attributes in response (for preview only) locs_with_attrs = ev.download.locations( prefix="usgs", include_attributes=True ) See also: :meth:`Download.locations() ` Download Domain Data -------------------- .. code-block:: python # Download all configurations ev.download.configurations(load=True) # Download specific unit ev.download.units(name="m3/s", load=True) # Download variables ev.download.variables(load=True) # Download attributes (definitions) ev.download.attributes(type="continuous", load=True) See also: :meth:`Download.configurations() `, :meth:`Download.units() `, :meth:`Download.variables() `, :meth:`Download.attributes() ` Download Location Attributes ---------------------------- .. code-block:: python # Download attributes for specific locations ev.download.location_attributes( location_id=["usgs-01010000", "usgs-01020000"], load=True ) # Download all attributes for locations with a prefix ev.download.location_attributes(load=True) See also: :meth:`Download.location_attributes() ` Download Crosswalks ------------------- .. code-block:: python # Download crosswalks for NWM v3.0 ev.download.location_crosswalks( secondary_location_id_prefix="nwm30", load=True ) See also: :meth:`Download.location_crosswalks() ` Download Timeseries ------------------- .. code-block:: python # Download primary (observed) timeseries ev.download.primary_timeseries( primary_location_id=["usgs-01010000"], start_date="2020-01-01", end_date="2020-12-31", load=True ) # Download secondary (simulated) timeseries ev.download.secondary_timeseries( configuration_name="nwm30_retrospective", primary_location_id=["usgs-01010000"], start_date="2020-01-01", end_date="2020-12-31", load=True ) See also: :meth:`Download.primary_timeseries() `, :meth:`Download.secondary_timeseries() ` Complete Workflow Examples ========================== Example 1: New Evaluation with TEEHR Warehouse Data --------------------------------------------------- .. code-block:: python import teehr # Create evaluation ev = teehr.LocalReadWriteEvaluation(dir_path="./nwm_eval", create_dir=True) # Download curated data from warehouse ev.download.configure() # Get domain tables ev.download.units(load=True) ev.download.variables(load=True) ev.download.configurations(load=True) ev.download.attributes(load=True) # Get locations and crosswalks ev.download.locations(primary_location_id=["usgs-01010000"], load=True) ev.download.location_attributes(primary_location_id=["usgs-01010000"], load=True) ev.download.location_crosswalks(secondary_location_id_prefix="nwm30", load=True) # Get timeseries data ev.download.primary_timeseries( configuration_name="usgs_observations", primary_location_id=["usgs-01010000"], start_date="2020-01-01", end_date="2020-12-31", load=True ) ev.download.secondary_timeseries( configuration_name="nwm30_retrospective", primary_location_id=["usgs-01010000"], start_date="2020-01-01", end_date="2020-12-31", load=True ) Example 2: Fresh Data from USGS and NWM --------------------------------------- .. code-block:: python import teehr from datetime import datetime ev = teehr.LocalReadWriteEvaluation(dir_path="./fresh_eval", create_dir=True) # Load your location data ev.locations.load_spatial("./data/my_gages.geojson") ev.location_crosswalks.load_csv("./data/my_crosswalk.csv") # Fetch fresh USGS data ev.fetch.usgs_streamflow( start_date=datetime(2024, 1, 1), end_date=datetime(2024, 3, 31) ) # Fetch corresponding NWM retrospective data ev.fetch.nwm_retrospective_points( nwm_version="nwm30", variable_name="streamflow", start_date=datetime(2020, 1, 1), end_date=datetime(2022, 12, 31), chunk_by="year" )