Download#

class Download(ev)[source]#

A component class for downloading data from the TEEHR-Cloud data warehouse.

Methods

attributes

Fetch attributes from the warehouse API.

configurations

Fetch configurations from the warehouse API.

configure

Configure the warehouse API connection settings.

evaluation_subset

Download a subset of evaluation data from the warehouse API.

location_attributes

Fetch location attributes from the warehouse API.

location_crosswalks

Fetch location crosswalks from the warehouse API.

locations

Fetch locations from the warehouse API as a GeoDataFrame.

primary_timeseries

Fetch primary timeseries from the warehouse API.

secondary_timeseries

Fetch secondary timeseries from the warehouse API.

units

Fetch units from the warehouse API.

variables

Fetch variables from the warehouse API.

Attributes

DEFAULT_TIMEOUT

attributes(name: str | None = None, type: str | None = None, page_size: int = 10000, load: bool = False, write_mode: str = 'append', timeout: int | None = None, **kwargs) → DataFrame | None[source]#

Fetch attributes from the warehouse API.

Parameters:
  • name (str, optional) – Filter by attribute name

  • type (str, optional) – Filter by attribute type (“categorical” or “continuous”)

  • page_size (int, optional) – Number of attributes to fetch per API request. Decrease if timeout errors are encountered. Default: 10000.

  • load (bool, optional) – If True, load the downloaded data into the local evaluation “attributes” table. Default: False

  • write_mode (str, optional) – Write mode when loading. Options: “append”, “upsert”, “create_or_replace”. Default: “append”

  • timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).

  • **kwargs – Additional query parameters to pass to the API

Returns:

Union[pd.DataFrame, None] – DataFrame containing attribute definitions, or None if load=True

Examples

>>> # Fetch all categorical attributes
>>> attrs = ev.download.attributes(type="categorical")
>>> # Fetch and load into local evaluation
>>> attrs = ev.download.attributes(load=True)
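Several of the methods below note that page_size should be decreased when timeout errors are encountered. One way to automate that is a backoff loop; the sketch below is a hypothetical helper (not part of TEEHR), and the timeout exception type and the fetch callable are assumptions:

```python
def fetch_with_backoff(fetch, page_size=10000, min_page_size=100, **kwargs):
    """Retry ``fetch`` with a halved page_size after each timeout.

    ``fetch`` stands in for any download method, e.g. ``ev.download.attributes``;
    the ``TimeoutError`` exception type is an assumption.
    """
    while page_size >= min_page_size:
        try:
            return fetch(page_size=page_size, **kwargs)
        except TimeoutError:
            page_size //= 2  # smaller pages mean smaller, faster responses
    raise TimeoutError("request timed out even at the minimum page_size")
```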
configurations(name: str | None = None, type: str | None = None, page_size: int = 10000, load: bool = False, write_mode: str = 'append', timeout: int | None = None, **kwargs) → DataFrame | None[source]#

Fetch configurations from the warehouse API.

Parameters:
  • name (str, optional) – Filter by configuration name

  • type (str, optional) – Filter by configuration type (“primary” or “secondary”)

  • page_size (int, optional) – Number of configurations to fetch per API request. Decrease if timeout errors are encountered. Default: 10000.

  • load (bool, optional) – If True, load the downloaded data into the local evaluation “configurations” table. Default: False

  • write_mode (str, optional) – Write mode when loading. Options: “append”, “upsert”, “create_or_replace”. Default: “append”

  • timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).

  • **kwargs – Additional query parameters to pass to the API

Returns:

Union[pd.DataFrame, None] – DataFrame containing configuration definitions, or None if load=True

Examples

>>> # Fetch all primary configurations
>>> configs = ev.download.configurations(type="primary")
>>> # Fetch and load into local evaluation
>>> configs = ev.download.configurations(load=True)
configure(api_base_url: str | None = None, api_port: int | None = None, verify_ssl: bool = True, timeout: int = 60) → Download[source]#

Configure the warehouse API connection settings.

Parameters:
  • api_base_url (str, optional) – Base URL for the TEEHR warehouse API. Default: “https://api.teehr.rtiamanzi.org”

  • api_port (int, optional) – Port number for the API. If provided, will be appended to the base URL (e.g., “https://api.teehr.rtiamanzi.org:8443”).

  • verify_ssl (bool, optional) – Whether to verify SSL certificates when making requests. Default: True

  • timeout (int, optional) – Default request timeout in seconds for all download methods. Default: 60

Returns:

Download – Returns self for method chaining

Examples

>>> ev.download.configure(
...     api_base_url="https://api.teehr.rtiamanzi.org",
...     api_port=8443,
...     verify_ssl=True,
...     timeout=120
... )
>>> locations = ev.download.locations(prefix="usgs")
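The api_port behavior described above (appending the port to the base URL) amounts to something like the following sketch; this illustrates the documented behavior and is not the library’s actual implementation:

```python
def build_base_url(api_base_url, api_port=None):
    """Append the port to the base URL, as documented for ``configure()``."""
    if api_port is not None:
        return f"{api_base_url}:{api_port}"
    return api_base_url
```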
evaluation_subset(start_date: str | datetime | Timestamp, end_date: str | datetime | Timestamp, primary_configuration_name: str, secondary_configuration_name: str, location_ids: str | List[str] | None = None, prefix: str | None = None, bbox: List[float] | None = None, page_size: int = 10000, timeout: int | None = None) → None[source]#

Download a subset of evaluation data from the warehouse API.

Parameters:
  • start_date (Union[str, datetime, pd.Timestamp]) – Start date for timeseries query. Accepts ISO 8601 string, datetime, or pd.Timestamp.

  • end_date (Union[str, datetime, pd.Timestamp]) – End date for timeseries query. Accepts ISO 8601 string, datetime, or pd.Timestamp.

  • primary_configuration_name (str) – Name of the primary configuration to include.

  • secondary_configuration_name (str) – Name of the secondary configuration to include.

  • location_ids (str or list of str, optional) – Location ID or list of location IDs to include in the subset.

  • prefix (str, optional) – Filter locations by ID prefix (e.g., “usgs”, “nwm30”).

  • bbox (list of float, optional) – Bounding box to filter locations by spatial extent, in the format [minx, miny, maxx, maxy].

  • page_size (int, optional) – Number of series items to fetch per API request for timeseries. Decrease if timeout errors are encountered. Default: 10000

  • timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).

Returns:

None – Loads the subset data into local Iceberg tables.

Examples

>>> ev.download.evaluation_subset(
...     prefix="usgs",
...     bbox=[-120.0, 35.0, -119.0, 36.0],
...     start_date="2005-01-01",
...     end_date="2020-01-02",
...     primary_configuration_name="usgs_observations",
...     secondary_configuration_name="nwm30_retrospective",
...     page_size=5000
... )
location_attributes(location_id: str | List[str] | None = None, attribute_name: str | None = None, page_size: int = 10000, load: bool = False, write_mode: str = 'append', timeout: int | None = None, **kwargs) → DataFrame | None[source]#

Fetch location attributes from the warehouse API.

Parameters:
  • location_id (str or list of str, optional) – Filter by location ID(s)

  • attribute_name (str, optional) – Filter by attribute name

  • page_size (int, optional) – Number of location attributes to fetch per API request. Decrease if timeout errors are encountered. Default: 10000.

  • load (bool, optional) – If True, load the downloaded data into the local evaluation “location_attributes” table. Default: False

  • write_mode (str, optional) – Write mode when loading. Options: “append”, “upsert”, “create_or_replace”. Default: “append”

  • timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).

  • **kwargs – Additional query parameters to pass to the API

Returns:

Union[pd.DataFrame, None] – DataFrame containing location attribute values, or None if load=True

Examples

>>> # Fetch attributes for specific locations
>>> loc_ids = ["usgs-01010000", "usgs-01010500"]
>>> loc_attrs = ev.download.location_attributes(
...     location_id=loc_ids
... )
>>> # Fetch and load into local evaluation
>>> loc_attrs = ev.download.location_attributes(load=True)
location_crosswalks(primary_location_id: str | List[str] | None = None, secondary_location_id: str | List[str] | None = None, primary_location_id_prefix: str | None = None, secondary_location_id_prefix: str | None = None, page_size: int = 10000, load: bool = False, write_mode: str = 'append', timeout: int | None = None, **kwargs) → DataFrame | None[source]#

Fetch location crosswalks from the warehouse API.

Parameters:
  • primary_location_id (str or list of str, optional) – Filter by primary location ID(s)

  • secondary_location_id (str or list of str, optional) – Filter by secondary location ID(s)

  • primary_location_id_prefix (str, optional) – Filter crosswalks by primary location ID prefix (e.g., “usgs”). Passed as a query parameter to the API. Default: None

  • secondary_location_id_prefix (str, optional) – Filter crosswalks by secondary location ID prefix (e.g., “nwm30”). Passed as a query parameter to the API. Default: None

  • page_size (int, optional) – Number of location crosswalks to fetch per API request. Decrease if timeout errors are encountered. Default: 10000.

  • load (bool, optional) – If True, load the downloaded data into the local evaluation “location_crosswalks” table. Default: False

  • write_mode (str, optional) – Write mode when loading. Options: “append”, “upsert”, “create_or_replace”. Default: “append”

  • timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).

  • **kwargs – Additional query parameters to pass to the API

Returns:

Union[pd.DataFrame, None] – DataFrame containing location crosswalk mappings, or None if load=True

Examples

>>> # Fetch crosswalks for specific primary locations
>>> loc_ids = ["usgs-01010000", "usgs-01010500"]
>>> crosswalks = ev.download.location_crosswalks(
...     primary_location_id=loc_ids
... )
>>> # Fetch crosswalks filtered by ID prefixes
>>> crosswalks = ev.download.location_crosswalks(
...     primary_location_id_prefix="usgs",
...     secondary_location_id_prefix="nwm30"
... )
>>> # Fetch and load into local evaluation
>>> crosswalks = ev.download.location_crosswalks(load=True)
locations(prefix: str | None = None, ids: str | List[str] | None = None, bbox: List[float] | None = None, include_attributes: bool = False, page_size: int = 10000, load: bool = False, write_mode: str = 'append', timeout: int | None = None, **kwargs) → GeoDataFrame | None[source]#

Fetch locations from the warehouse API as a GeoDataFrame.

Parameters:
  • prefix (str, optional) – Filter locations by ID prefix (e.g., “usgs”, “nwm30”)

  • ids (str or list of str, optional) – Filter locations by specific IDs.

  • bbox (list of float, optional) – Bounding box to filter locations by spatial extent, in the format [minx, miny, maxx, maxy].

  • include_attributes (bool, optional) – Whether to include location attributes in the response. Default: False

  • page_size (int, optional) – Number of locations to fetch per API request. Decrease if timeout errors are encountered. Default: 10000.

  • load (bool, optional) – If True, load the downloaded data into the local evaluation “locations” table. Default: False

  • write_mode (str, optional) – Write mode when loading. Options: “append”, “upsert”, “create_or_replace”. Default: “append”

  • timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).

  • **kwargs – Additional query parameters to pass to the API

Returns:

Union[gpd.GeoDataFrame, None] – GeoDataFrame containing locations with geometry, or None if load=True

Examples

>>> # Fetch USGS locations with attributes
>>> locations = ev.download.locations(
...     prefix="usgs",
...     include_attributes=True
... )
>>> # Fetch and load into local evaluation
>>> locations = ev.download.locations(
...     prefix="usgs",
...     load=True
... )
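The bbox parameter uses the [minx, miny, maxx, maxy] convention. A quick client-side check for swapped coordinates before making a request might look like the following; this is a hypothetical helper, not something TEEHR provides:

```python
def validate_bbox(bbox):
    """Validate a [minx, miny, maxx, maxy] bounding box."""
    minx, miny, maxx, maxy = bbox
    if minx >= maxx or miny >= maxy:
        raise ValueError("bbox must satisfy minx < maxx and miny < maxy")
    return bbox
```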
primary_timeseries(primary_location_id: str | List[str], configuration_name: str | List[str], variable_name: str | None = None, start_date: str | datetime | Timestamp | None = None, end_date: str | datetime | Timestamp | None = None, load: bool = False, write_mode: str = 'append', page_size: int = 10000, timeout: int | None = None, **kwargs) → DataFrame | None[source]#

Fetch primary timeseries from the warehouse API.

Parameters:
  • primary_location_id (str or list of str) – Filter by primary location ID(s)

  • configuration_name (str or list of str) – Filter by configuration name(s)

  • variable_name (str, optional) – Filter by variable name

  • start_date (Union[str, datetime, pd.Timestamp], optional) – Start date for timeseries query. Accepts ISO 8601 string, datetime, or pd.Timestamp.

  • end_date (Union[str, datetime, pd.Timestamp], optional) – End date for timeseries query. Accepts ISO 8601 string, datetime, or pd.Timestamp. If None, only start_date is used.

  • load (bool, optional) – If True, load the downloaded data into the local evaluation “primary_timeseries” table. Default: False

  • write_mode (str, optional) – Write mode when loading. Options: “append”, “upsert”, “create_or_replace”. Default: “append”

  • page_size (int, optional) – Number of series items to fetch per API request. Decrease if timeout errors are encountered. Default: 10000.

  • timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).

  • **kwargs – Additional query parameters to pass to the API

Returns:

Union[pd.DataFrame, None] – DataFrame with TEEHR primary timeseries schema, or None if load=True

Examples

>>> # Fetch USGS observations for specific locations and date range
>>> timeseries = ev.download.primary_timeseries(
...     primary_location_id=["usgs-01010000", "usgs-01010500"],
...     configuration_name="usgs_observations",
...     start_date="1990-10-01",
...     end_date="1990-10-02"
... )
>>> # Fetch and load into local evaluation
>>> timeseries = ev.download.primary_timeseries(
...     primary_location_id=["usgs-01010000"],
...     configuration_name="usgs_observations",
...     start_date="1990-10-01",
...     end_date="1990-10-02",
...     load=True
... )
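The start_date and end_date parameters accept an ISO 8601 string, a datetime, or a pd.Timestamp. Such inputs can be normalized to a single type before use; the stdlib-only sketch below illustrates the idea (TEEHR itself may normalize differently, e.g. via pandas):

```python
from datetime import datetime

def to_datetime(value):
    """Normalize an ISO 8601 string or datetime-like value to a datetime."""
    if isinstance(value, str):
        return datetime.fromisoformat(value)  # e.g. "1990-10-01"
    return value
```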
secondary_timeseries(primary_location_id: str | List[str] | None = None, secondary_location_id: str | List[str] | None = None, configuration_name: str | List[str] | None = None, variable_name: str | None = None, start_date: str | datetime | Timestamp | None = None, end_date: str | datetime | Timestamp | None = None, load: bool = False, write_mode: str = 'append', page_size: int = 10000, timeout: int | None = None, **kwargs) → DataFrame | None[source]#

Fetch secondary timeseries from the warehouse API.

Parameters:
  • primary_location_id (str or list of str, optional) – Filter by primary location ID(s). Either primary_location_id or secondary_location_id must be provided.

  • secondary_location_id (str or list of str, optional) – Filter by secondary location ID(s). Either primary_location_id or secondary_location_id must be provided.

  • configuration_name (str or list of str, optional) – Filter by configuration name(s)

  • variable_name (str, optional) – Filter by variable name

  • start_date (Union[str, datetime, pd.Timestamp], optional) – Start date for timeseries query. Accepts ISO 8601 string, datetime, or pd.Timestamp.

  • end_date (Union[str, datetime, pd.Timestamp], optional) – End date for timeseries query. Accepts ISO 8601 string, datetime, or pd.Timestamp. If None, only start_date is used.

  • load (bool, optional) – If True, load the downloaded data into the local evaluation “secondary_timeseries” table. Default: False

  • write_mode (str, optional) – Write mode when loading. Options: “append”, “upsert”, “create_or_replace”. Default: “append”

  • page_size (int, optional) – Number of series items to fetch per API request. Decrease if timeout errors are encountered. Default: 10000.

  • timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).

  • **kwargs – Additional query parameters to pass to the API

Returns:

Union[pd.DataFrame, None] – DataFrame with TEEHR secondary timeseries schema, or None if load=True

Raises:

ValueError – If neither primary_location_id nor secondary_location_id is provided

Examples

>>> # Fetch NWM retrospective for specific locations and date range
>>> timeseries = ev.download.secondary_timeseries(
...     primary_location_id=["usgs-01010000", "usgs-01010500"],
...     configuration_name="nwm30_retrospective",
...     start_date="1990-10-01",
...     end_date="1990-10-02"
... )
>>> # Fetch and load into local evaluation
>>> timeseries = ev.download.secondary_timeseries(
...     primary_location_id=["usgs-01010000"],
...     configuration_name="nwm30_retrospective",
...     start_date="1990-10-01",
...     end_date="1990-10-02",
...     load=True
... )
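The ValueError described above can also be caught client-side before a request is made. The following sketch restates the documented requirement and is illustrative only, not TEEHR’s actual validation code:

```python
def check_location_filters(primary_location_id=None, secondary_location_id=None):
    # Per the docs: at least one of the two location ID filters must be given.
    if primary_location_id is None and secondary_location_id is None:
        raise ValueError(
            "Either primary_location_id or secondary_location_id must be provided"
        )
    return True
```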
units(name: str | None = None, page_size: int = 10000, load: bool = False, write_mode: str = 'append', timeout: int | None = None, **kwargs) → DataFrame | None[source]#

Fetch units from the warehouse API.

Parameters:
  • name (str, optional) – Filter by unit name

  • page_size (int, optional) – Number of units to fetch per API request. Decrease if timeout errors are encountered. Default: 10000.

  • load (bool, optional) – If True, load the downloaded data into the local evaluation “units” table. Default: False

  • write_mode (str, optional) – Write mode when loading. Options: “append”, “upsert”, “create_or_replace”. Default: “append”

  • timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).

  • **kwargs – Additional query parameters to pass to the API

Returns:

Union[pd.DataFrame, None] – DataFrame containing unit definitions, or None if load=True

Examples

>>> # Fetch all units
>>> units = ev.download.units()
>>> # Fetch and load into local evaluation
>>> units = ev.download.units(load=True)
variables(name: str | None = None, page_size: int = 10000, load: bool = False, write_mode: str = 'append', timeout: int | None = None, **kwargs) → DataFrame | None[source]#

Fetch variables from the warehouse API.

Parameters:
  • name (str, optional) – Filter by variable name

  • page_size (int, optional) – Number of variables to fetch per API request. Decrease if timeout errors are encountered. Default: 10000.

  • load (bool, optional) – If True, load the downloaded data into the local evaluation “variables” table. Default: False

  • write_mode (str, optional) – Write mode when loading. Options: “append”, “upsert”, “create_or_replace”. Default: “append”

  • timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).

  • **kwargs – Additional query parameters to pass to the API

Returns:

Union[pd.DataFrame, None] – DataFrame containing variable definitions, or None if load=True

Examples

>>> # Fetch all variables
>>> variables = ev.download.variables()
>>> # Fetch and load into local evaluation
>>> variables = ev.download.variables(load=True)