Download#

class Download(ev)[source]#

A component class for downloading data from the TEEHR-Cloud data warehouse.

Methods

attributes

Fetch attributes from the warehouse API.

configurations

Fetch configurations from the warehouse API.

configure

Configure the warehouse API connection settings.

evaluation_subset

Download a subset of evaluation data from the warehouse API.

location_attributes

Fetch location attributes from the warehouse API.

location_crosswalks

Fetch location crosswalks from the warehouse API.

locations

Fetch locations from the warehouse API as a GeoDataFrame.

primary_timeseries

Fetch primary timeseries from the warehouse API.

secondary_timeseries

Fetch secondary timeseries from the warehouse API.

units

Fetch units from the warehouse API.

variables

Fetch variables from the warehouse API.

Attributes

DEFAULT_TIMEOUT

attributes(name: str | None = None, type: str | None = None, page_size: int = 10000, load: bool = False, write_mode: str = 'append', timeout: int | None = None, **kwargs) → DataFrame | None[source]#

Fetch attributes from the warehouse API.

Parameters:
  • name (str, optional) – Filter by attribute name

  • type (str, optional) – Filter by attribute type (“categorical” or “continuous”)

  • page_size (int, optional) – Number of attributes to fetch per API request. Decrease if timeout errors are encountered. Default: 10000.

  • load (bool, optional) – If True, load the downloaded data into the local evaluation “attributes” table. Default: False

  • write_mode (str, optional) – Write mode when loading. Options: “append”, “upsert”, “create_or_replace”. Default: “append”

  • timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).

  • **kwargs – Additional query parameters to pass to the API

Returns:

Union[pd.DataFrame, None] – DataFrame containing attribute definitions, or None if load=True

Examples

>>> # Fetch all categorical attributes
>>> attrs = ev.download.attributes(type="categorical")
>>> # Fetch and load into local evaluation
>>> attrs = ev.download.attributes(load=True)
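Several of the methods below note that page_size should be decreased when timeout errors are encountered. One way to automate that is a backoff loop; the sketch below is a hypothetical helper (not part of TEEHR), and the timeout exception type and the fetch callable are assumptions:

```python
def fetch_with_backoff(fetch, page_size=10000, min_page_size=100, **kwargs):
    """Retry ``fetch`` with a halved page_size after each timeout.

    ``fetch`` stands in for any download method, e.g. ``ev.download.attributes``;
    the ``TimeoutError`` exception type is an assumption.
    """
    while page_size >= min_page_size:
        try:
            return fetch(page_size=page_size, **kwargs)
        except TimeoutError:
            page_size //= 2  # smaller pages mean smaller, faster responses
    raise TimeoutError("request timed out even at the minimum page_size")
```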
configurations(name: str | None = None, type: str | None = None, page_size: int = 10000, load: bool = False, write_mode: str = 'append', timeout: int | None = None, **kwargs) → DataFrame | None[source]#

Fetch configurations from the warehouse API.

Parameters:
  • name (str, optional) – Filter by configuration name

  • type (str, optional) – Filter by configuration type (“primary” or “secondary”)

  • page_size (int, optional) – Number of configurations to fetch per API request. Decrease if timeout errors are encountered. Default: 10000.

  • load (bool, optional) – If True, load the downloaded data into the local evaluation “configurations” table. Default: False

  • write_mode (str, optional) – Write mode when loading. Options: “append”, “upsert”, “create_or_replace”. Default: “append”

  • timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).

  • **kwargs – Additional query parameters to pass to the API

Returns:

Union[pd.DataFrame, None] – DataFrame containing configuration definitions, or None if load=True

Examples

>>> # Fetch all primary configurations
>>> configs = ev.download.configurations(type="primary")
>>> # Fetch and load into local evaluation
>>> configs = ev.download.configurations(load=True)
configure(api_base_url: str | None = None, api_port: int | None = None, verify_ssl: bool = True, timeout: int = 60) → Download[source]#

Configure the warehouse API connection settings.

Parameters:
  • api_base_url (str, optional) – Base URL for the TEEHR warehouse API. Default: “https://api.teehr.rtiamanzi.org”

  • api_port (int, optional) – Port number for the API. If provided, will be appended to the base URL (e.g., “https://api.teehr.rtiamanzi.org:8443”).

  • verify_ssl (bool, optional) – Whether to verify SSL certificates when making requests. Default: True

  • timeout (int, optional) – Default request timeout in seconds for all download methods. Default: 60

Returns:

Download – Returns self for method chaining

Examples

>>> ev.download.configure(
...     api_base_url="https://api.teehr.rtiamanzi.org",
...     api_port=8443,
...     verify_ssl=True,
...     timeout=120
... )
>>> locations = ev.download.locations(prefix="usgs")
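The api_port behavior described above (appending the port to the base URL) amounts to something like the following sketch; this illustrates the documented behavior and is not the library’s actual implementation:

```python
def build_base_url(api_base_url, api_port=None):
    """Append the port to the base URL, as documented for ``configure()``."""
    if api_port is not None:
        return f"{api_base_url}:{api_port}"
    return api_base_url
```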
evaluation_subset(start_date: str | datetime | Timestamp, end_date: str | datetime | Timestamp, primary_configuration_name: str, secondary_configuration_name: str, location_ids: str | List[str] | None = None, prefix: str | None = None, bbox: List[float] | None = None, page_size: int = 10000, timeout: int | None = None) → None[source]#

Download a subset of evaluation data from the warehouse API.

Parameters:
  • start_date (Union[str, datetime, pd.Timestamp]) – Start date for timeseries query. Accepts ISO 8601 string, datetime, or pd.Timestamp.

  • end_date (Union[str, datetime, pd.Timestamp]) – End date for timeseries query. Accepts ISO 8601 string, datetime, or pd.Timestamp.

  • primary_configuration_name (str) – Name of the primary configuration to include.

  • secondary_configuration_name (str) – Name of the secondary configuration to include.

  • location_ids (str or list of str, optional) – Location ID or list of location IDs to include in the subset.

  • prefix (str, optional) – Filter locations by ID prefix (e.g., “usgs”, “nwm30”).

  • bbox (list of float, optional) – Bounding box to filter locations by spatial extent, in the format [minx, miny, maxx, maxy].

  • page_size (int, optional) – Number of series items to fetch per API request for timeseries. Decrease if timeout errors are encountered. Default: 10000

  • timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).

Returns:

None – Loads the subset data into local Iceberg tables.

Examples

>>> ev.download.evaluation_subset(
...     prefix="usgs",
...     bbox=[-120.0, 35.0, -119.0, 36.0],
...     start_date="2005-01-01",
...     end_date="2020-01-02",
...     primary_configuration_name="usgs_observations",
...     secondary_configuration_name="nwm30_retrospective",
...     page_size=5000
... )
location_attributes(location_id: str | List[str] | None = None, attribute_name: str | None = None, page_size: int = 10000, load: bool = False, write_mode: str = 'append', timeout: int | None = None, **kwargs) → DataFrame | None[source]#

Fetch location attributes from the warehouse API.

Parameters:
  • location_id (str or list of str, optional) – Filter by location ID(s)

  • attribute_name (str, optional) – Filter by attribute name

  • page_size (int, optional) – Number of location attributes to fetch per API request. Decrease if timeout errors are encountered. Default: 10000.

  • load (bool, optional) – If True, load the downloaded data into the local evaluation “location_attributes” table. Default: False

  • write_mode (str, optional) – Write mode when loading. Options: “append”, “upsert”, “create_or_replace”. Default: “append”

  • timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).

  • **kwargs – Additional query parameters to pass to the API

Returns:

Union[pd.DataFrame, None] – DataFrame containing location attribute values, or None if load=True

Examples

>>> # Fetch attributes for specific locations
>>> loc_ids = ["usgs-01010000", "usgs-01010500"]
>>> loc_attrs = ev.download.location_attributes(
...     location_id=loc_ids
... )
>>> # Fetch and load into local evaluation
>>> loc_attrs = ev.download.location_attributes(load=True)
location_crosswalks(primary_location_id: str | List[str] | None = None, secondary_location_id: str | List[str] | None = None, primary_location_id_prefix: str | None = None, secondary_location_id_prefix: str | None = None, page_size: int = 10000, load: bool = False, write_mode: str = 'append', timeout: int | None = None, **kwargs) → DataFrame | None[source]#

Fetch location crosswalks from the warehouse API.

Parameters:
  • primary_location_id (str or list of str, optional) – Filter by primary location ID(s)

  • secondary_location_id (str or list of str, optional) – Filter by secondary location ID(s)

  • primary_location_id_prefix (str, optional) – Filter crosswalks by primary location ID prefix (e.g., “usgs”). Passed as a query parameter to the API. Default: None

  • secondary_location_id_prefix (str, optional) – Filter crosswalks by secondary location ID prefix (e.g., “nwm30”). Passed as a query parameter to the API. Default: None

  • page_size (int, optional) – Number of location crosswalks to fetch per API request. Decrease if timeout errors are encountered. Default: 10000.

  • load (bool, optional) – If True, load the downloaded data into the local evaluation “location_crosswalks” table. Default: False

  • write_mode (str, optional) – Write mode when loading. Options: “append”, “upsert”, “create_or_replace”. Default: “append”

  • timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).

  • **kwargs – Additional query parameters to pass to the API

Returns:

Union[pd.DataFrame, None] – DataFrame containing location crosswalk mappings, or None if load=True

Examples

>>> # Fetch crosswalks for specific primary locations
>>> loc_ids = ["usgs-01010000", "usgs-01010500"]
>>> crosswalks = ev.download.location_crosswalks(
...     primary_location_id=loc_ids
... )
>>> # Fetch crosswalks filtered by ID prefixes
>>> crosswalks = ev.download.location_crosswalks(
...     primary_location_id_prefix="usgs",
...     secondary_location_id_prefix="nwm30"
... )
>>> # Fetch and load into local evaluation
>>> crosswalks = ev.download.location_crosswalks(load=True)
locations(prefix: str | None = None, ids: str | List[str] | None = None, bbox: List[float] | None = None, include_attributes: bool = False, page_size: int = 10000, load: bool = False, write_mode: str = 'append', timeout: int | None = None, **kwargs) → GeoDataFrame | None[source]#

Fetch locations from the warehouse API as a GeoDataFrame.

Parameters:
  • prefix (str, optional) – Filter locations by ID prefix (e.g., “usgs”, “nwm30”)

  • ids (str or list of str, optional) – Filter locations by specific IDs.

  • bbox (list of float, optional) – Bounding box to filter locations by spatial extent, in the format [minx, miny, maxx, maxy].

  • include_attributes (bool, optional) – Whether to include location attributes in the response. Default: False

  • page_size (int, optional) – Number of locations to fetch per API request. Decrease if timeout errors are encountered. Default: 10000.

  • load (bool, optional) – If True, load the downloaded data into the local evaluation “locations” table. Default: False

  • write_mode (str, optional) – Write mode when loading. Options: “append”, “upsert”, “create_or_replace”. Default: “append”

  • timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).

  • **kwargs – Additional query parameters to pass to the API

Returns:

Union[gpd.GeoDataFrame, None] – GeoDataFrame containing locations with geometry, or None if load=True

Examples

>>> # Fetch USGS locations with attributes
>>> locations = ev.download.locations(
...     prefix="usgs",
...     include_attributes=True
... )
>>> # Fetch and load into local evaluation
>>> locations = ev.download.locations(
...     prefix="usgs",
...     load=True
... )
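The bbox parameter uses the [minx, miny, maxx, maxy] convention. A quick client-side check for swapped coordinates before making a request might look like the following; this is a hypothetical helper, not something TEEHR provides:

```python
def validate_bbox(bbox):
    """Validate a [minx, miny, maxx, maxy] bounding box."""
    minx, miny, maxx, maxy = bbox
    if minx >= maxx or miny >= maxy:
        raise ValueError("bbox must satisfy minx < maxx and miny < maxy")
    return bbox
```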
primary_timeseries(primary_location_id: str | List[str], configuration_name: str | List[str], variable_name: str | None = None, start_date: str | datetime | Timestamp | None = None, end_date: str | datetime | Timestamp | None = None, load: bool = False, write_mode: str = 'append', page_size: int = 10000, timeout: int | None = None, **kwargs) → DataFrame | None[source]#

Fetch primary timeseries from the warehouse API.

Parameters:
  • primary_location_id (str or list of str) – Filter by primary location ID(s)

  • configuration_name (str or list of str) – Filter by configuration name(s)

  • variable_name (str, optional) – Filter by variable name

  • start_date (Union[str, datetime, pd.Timestamp], optional) – Start date for timeseries query. Accepts ISO 8601 string, datetime, or pd.Timestamp.

  • end_date (Union[str, datetime, pd.Timestamp], optional) – End date for timeseries query. Accepts ISO 8601 string, datetime, or pd.Timestamp. If None, only start_date is used.

  • load (bool, optional) – If True, load the downloaded data into the local evaluation “primary_timeseries” table. Default: False

  • write_mode (str, optional) – Write mode when loading. Options: “append”, “upsert”, “create_or_replace”. Default: “append”

  • page_size (int, optional) – Number of series items to fetch per API request. Decrease if timeout errors are encountered. Default: 10000.

  • timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).

  • **kwargs – Additional query parameters to pass to the API

Returns:

Union[pd.DataFrame, None] – DataFrame with TEEHR primary timeseries schema, or None if load=True

Examples

>>> # Fetch USGS observations for specific locations and date range
>>> timeseries = ev.download.primary_timeseries(
...     primary_location_id=["usgs-01010000", "usgs-01010500"],
...     configuration_name="usgs_observations",
...     start_date="1990-10-01",
...     end_date="1990-10-02"
... )
>>> # Fetch and load into local evaluation
>>> timeseries = ev.download.primary_timeseries(
...     primary_location_id=["usgs-01010000"],
...     configuration_name="usgs_observations",
...     start_date="1990-10-01",
...     end_date="1990-10-02",
...     load=True
... )
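The start_date and end_date parameters accept an ISO 8601 string, a datetime, or a pd.Timestamp. Such inputs can be normalized to a single type before use; the stdlib-only sketch below illustrates the idea (TEEHR itself may normalize differently, e.g. via pandas):

```python
from datetime import datetime

def to_datetime(value):
    """Normalize an ISO 8601 string or datetime-like value to a datetime."""
    if isinstance(value, str):
        return datetime.fromisoformat(value)  # e.g. "1990-10-01"
    return value
```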
secondary_timeseries(primary_location_id: str | List[str] | None = None, secondary_location_id: str | List[str] | None = None, configuration_name: str | List[str] | None = None, variable_name: str | None = None, start_date: str | datetime | Timestamp | None = None, end_date: str | datetime | Timestamp | None = None, load: bool = False, write_mode: str = 'append', page_size: int = 10000, timeout: int | None = None, **kwargs) → DataFrame | None[source]#

Fetch secondary timeseries from the warehouse API.

Parameters:
  • primary_location_id (str or list of str, optional) – Filter by primary location ID(s). Either primary_location_id or secondary_location_id must be provided.

  • secondary_location_id (str or list of str, optional) – Filter by secondary location ID(s). Either primary_location_id or secondary_location_id must be provided.

  • configuration_name (str or list of str, optional) – Filter by configuration name(s)

  • variable_name (str, optional) – Filter by variable name

  • start_date (Union[str, datetime, pd.Timestamp], optional) – Start date for timeseries query. Accepts ISO 8601 string, datetime, or pd.Timestamp.

  • end_date (Union[str, datetime, pd.Timestamp], optional) – End date for timeseries query. Accepts ISO 8601 string, datetime, or pd.Timestamp. If None, only start_date is used.

  • load (bool, optional) – If True, load the downloaded data into the local evaluation “secondary_timeseries” table. Default: False

  • write_mode (str, optional) – Write mode when loading. Options: “append”, “upsert”, “create_or_replace”. Default: “append”

  • page_size (int, optional) – Number of series items to fetch per API request. Decrease if timeout errors are encountered. Default: 10000.

  • timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).

  • **kwargs – Additional query parameters to pass to the API

Returns:

Union[pd.DataFrame, None] – DataFrame with TEEHR secondary timeseries schema, or None if load=True

Raises:

ValueError – If neither primary_location_id nor secondary_location_id is provided

Examples

>>> # Fetch NWM retrospective for specific locations and date range
>>> timeseries = ev.download.secondary_timeseries(
...     primary_location_id=["usgs-01010000", "usgs-01010500"],
...     configuration_name="nwm30_retrospective",
...     start_date="1990-10-01",
...     end_date="1990-10-02"
... )
>>> # Fetch and load into local evaluation
>>> timeseries = ev.download.secondary_timeseries(
...     primary_location_id=["usgs-01010000"],
...     configuration_name="nwm30_retrospective",
...     start_date="1990-10-01",
...     end_date="1990-10-02",
...     load=True
... )
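The ValueError described above can also be caught client-side before a request is made. The following sketch restates the documented requirement and is illustrative only, not TEEHR’s actual validation code:

```python
def check_location_filters(primary_location_id=None, secondary_location_id=None):
    # Per the docs: at least one of the two location ID filters must be given.
    if primary_location_id is None and secondary_location_id is None:
        raise ValueError(
            "Either primary_location_id or secondary_location_id must be provided"
        )
    return True
```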
units(name: str | None = None, page_size: int = 10000, load: bool = False, write_mode: str = 'append', timeout: int | None = None, **kwargs) → DataFrame | None[source]#

Fetch units from the warehouse API.

Parameters:
  • name (str, optional) – Filter by unit name

  • page_size (int, optional) – Number of units to fetch per API request. Decrease if timeout errors are encountered. Default: 10000.

  • load (bool, optional) – If True, load the downloaded data into the local evaluation “units” table. Default: False

  • write_mode (str, optional) – Write mode when loading. Options: “append”, “upsert”, “create_or_replace”. Default: “append”

  • timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).

  • **kwargs – Additional query parameters to pass to the API

Returns:

Union[pd.DataFrame, None] – DataFrame containing unit definitions, or None if load=True

Examples

>>> # Fetch all units
>>> units = ev.download.units()
>>> # Fetch and load into local evaluation
>>> units = ev.download.units(load=True)
variables(name: str | None = None, page_size: int = 10000, load: bool = False, write_mode: str = 'append', timeout: int | None = None, **kwargs) → DataFrame | None[source]#

Fetch variables from the warehouse API.

Parameters:
  • name (str, optional) – Filter by variable name

  • page_size (int, optional) – Number of variables to fetch per API request. Decrease if timeout errors are encountered. Default: 10000.

  • load (bool, optional) – If True, load the downloaded data into the local evaluation “variables” table. Default: False

  • write_mode (str, optional) – Write mode when loading. Options: “append”, “upsert”, “create_or_replace”. Default: “append”

  • timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).

  • **kwargs – Additional query parameters to pass to the API

Returns:

Union[pd.DataFrame, None] – DataFrame containing variable definitions, or None if load=True

Examples

>>> # Fetch all variables
>>> variables = ev.download.variables()
>>> # Fetch and load into local evaluation
>>> variables = ev.download.variables(load=True)