Download#
- class Download(ev)[source]#
A component class for downloading data from the TEEHR-Cloud data warehouse.
Methods
attributes – Fetch attributes from the warehouse API.
configurations – Fetch configurations from the warehouse API.
configure – Configure the warehouse API connection settings.
evaluation_subset – Download a subset of evaluation data from the warehouse API.
location_attributes – Fetch location attributes from the warehouse API.
location_crosswalks – Fetch location crosswalks from the warehouse API.
locations – Fetch locations from the warehouse API as a GeoDataFrame.
primary_timeseries – Fetch primary timeseries from the warehouse API.
secondary_timeseries – Fetch secondary timeseries from the warehouse API.
units – Fetch units from the warehouse API.
variables – Fetch variables from the warehouse API.
Attributes
DEFAULT_TIMEOUT
- attributes(name: str | None = None, type: str | None = None, page_size: int = 10000, load: bool = False, write_mode: str = 'append', timeout: int | None = None, **kwargs) → DataFrame | None[source]#
Fetch attributes from the warehouse API.
- Parameters:
name (str, optional) – Filter by attribute name
type (str, optional) – Filter by attribute type (“categorical” or “continuous”)
page_size (int, optional) – Number of attributes to fetch per API request. Decrease if timeout errors are encountered. Default: 10000.
load (bool, optional) – If True, load the downloaded data into the local evaluation “attributes” table. Default: False
write_mode (str, optional) – Write mode when loading. Options: “append”, “upsert”, “create_or_replace”. Default: “append”
timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).
**kwargs – Additional query parameters to pass to the API
- Returns:
Union[pd.DataFrame, None] – DataFrame containing attribute definitions, or None if load=True
Examples
>>> # Fetch all categorical attributes
>>> attrs = ev.download.attributes(type="categorical")
>>> # Fetch and load into local evaluation
>>> attrs = ev.download.attributes(load=True)
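The page_size parameter on these download methods implies a paged fetch loop on the client side. A minimal sketch of that pattern, hypothetical and not TEEHR's actual client code (fetch_page is an assumed callable):

```python
# Hypothetical sketch of the paging pattern implied by page_size:
# request pages of up to page_size items until a short page signals
# the end of the result set.
def fetch_all(fetch_page, page_size=10000):
    items, offset = [], 0
    while True:
        page = fetch_page(offset=offset, limit=page_size)
        items.extend(page)
        if len(page) < page_size:
            return items
        offset += page_size
```

A smaller page_size means more, smaller requests, which is why decreasing it helps when individual requests hit the timeout.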
- configurations(name: str | None = None, type: str | None = None, page_size: int = 10000, load: bool = False, write_mode: str = 'append', timeout: int | None = None, **kwargs) → DataFrame | None[source]#
Fetch configurations from the warehouse API.
- Parameters:
name (str, optional) – Filter by configuration name
type (str, optional) – Filter by configuration type (“primary” or “secondary”)
page_size (int, optional) – Number of configurations to fetch per API request. Decrease if timeout errors are encountered. Default: 10000.
load (bool, optional) – If True, load the downloaded data into the local evaluation “configurations” table. Default: False
write_mode (str, optional) – Write mode when loading. Options: “append”, “upsert”, “create_or_replace”. Default: “append”
timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).
**kwargs – Additional query parameters to pass to the API
- Returns:
Union[pd.DataFrame, None] – DataFrame containing configuration definitions, or None if load=True
Examples
>>> # Fetch all primary configurations
>>> configs = ev.download.configurations(type="primary")
>>> # Fetch and load into local evaluation
>>> configs = ev.download.configurations(load=True)
- configure(api_base_url: str | None = None, api_port: int | None = None, verify_ssl: bool = True, timeout: int = 60) → Download[source]#
Configure the warehouse API connection settings.
- Parameters:
api_base_url (str, optional) – Base URL for the TEEHR warehouse API. Default: “https://api.teehr.rtiamanzi.org”
api_port (int, optional) – Port number for the API. If provided, it is appended to the base URL (e.g., “https://api.teehr.rtiamanzi.org:8443”).
verify_ssl (bool, optional) – Whether to verify SSL certificates when making requests. Default: True
timeout (int, optional) – Default request timeout in seconds for all download methods. Default: 60
- Returns:
Download – Returns self for method chaining
Examples
>>> ev.download.configure(
...     api_base_url="https://api.teehr.rtiamanzi.org",
...     api_port=8443,
...     verify_ssl=True,
...     timeout=120
... )
>>> locations = ev.download.locations(prefix="usgs")
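The timeout precedence documented above (a per-call timeout overrides the instance default set via configure() or __init__, which itself defaults to 60 seconds) can be sketched as follows; this is an illustration of the documented rule, not TEEHR's implementation:

```python
DEFAULT_TIMEOUT = 60  # mirrors the documented class default

def resolve_timeout(call_timeout=None, instance_timeout=None):
    """Pick the effective timeout: the per-call value wins, then the
    instance default from configure()/__init__, then 60 seconds."""
    if call_timeout is not None:
        return call_timeout
    if instance_timeout is not None:
        return instance_timeout
    return DEFAULT_TIMEOUT
```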
- evaluation_subset(start_date: str | datetime | Timestamp, end_date: str | datetime | Timestamp, primary_configuration_name: str, secondary_configuration_name: str, location_ids: str | List[str] | None = None, prefix: str | None = None, bbox: List[float] | None = None, page_size: int = 10000, timeout: int | None = None) → None[source]#
Download a subset of evaluation data from the warehouse API.
- Parameters:
start_date (Union[str, datetime, pd.Timestamp]) – Start date for timeseries query. Accepts ISO 8601 string, datetime, or pd.Timestamp.
end_date (Union[str, datetime, pd.Timestamp]) – End date for timeseries query. Accepts ISO 8601 string, datetime, or pd.Timestamp.
primary_configuration_name (str) – Name of the primary configuration to include.
secondary_configuration_name (str) – Name of the secondary configuration to include.
location_ids (str or list of str, optional) – Location ID or list of location IDs to include in the subset.
prefix (str, optional) – Filter locations by ID prefix (e.g., “usgs”, “nwm30”).
bbox (list of float, optional) – Bounding box to filter locations by spatial extent, in the format [minx, miny, maxx, maxy].
page_size (int, optional) – Number of series items to fetch per API request for timeseries. Decrease if timeout errors are encountered. Default: 10000
timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).
- Returns:
None – Loads the subset data into local Iceberg tables.
Examples
>>> ev.download.evaluation_subset(
...     prefix="usgs",
...     bbox=[-120.0, 35.0, -119.0, 36.0],
...     start_date="2005-01-01",
...     end_date="2020-01-02",
...     primary_configuration_name="usgs_observations",
...     secondary_configuration_name="nwm30_retrospective",
...     page_size=5000
... )
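bbox must be ordered [minx, miny, maxx, maxy]. A small hypothetical helper (not part of TEEHR) can catch a mis-ordered box before a request is made:

```python
def validate_bbox(bbox):
    """Check that bbox is [minx, miny, maxx, maxy] with min < max.

    Hypothetical pre-flight check, not a TEEHR API.
    """
    if len(bbox) != 4:
        raise ValueError("bbox must have exactly 4 values")
    minx, miny, maxx, maxy = bbox
    if not (minx < maxx and miny < maxy):
        raise ValueError("bbox must be [minx, miny, maxx, maxy] with min < max")
    return bbox
```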
- location_attributes(location_id: str | List[str] | None = None, attribute_name: str | None = None, page_size: int = 10000, load: bool = False, write_mode: str = 'append', timeout: int | None = None, **kwargs) → DataFrame | None[source]#
Fetch location attributes from the warehouse API.
- Parameters:
location_id (str or list of str, optional) – Filter by location ID(s)
attribute_name (str, optional) – Filter by attribute name
page_size (int, optional) – Number of location attributes to fetch per API request. Decrease if timeout errors are encountered. Default: 10000.
load (bool, optional) – If True, load the downloaded data into the local evaluation “location_attributes” table. Default: False
write_mode (str, optional) – Write mode when loading. Options: “append”, “upsert”, “create_or_replace”. Default: “append”
timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).
**kwargs – Additional query parameters to pass to the API
- Returns:
Union[pd.DataFrame, None] – DataFrame containing location attribute values, or None if load=True
Examples
>>> # Fetch attributes for specific locations
>>> loc_ids = ["usgs-01010000", "usgs-01010500"]
>>> loc_attrs = ev.download.location_attributes(
...     location_id=loc_ids
... )
>>> # Fetch and load into local evaluation
>>> loc_attrs = ev.download.location_attributes(load=True)
- location_crosswalks(primary_location_id: str | List[str] | None = None, secondary_location_id: str | List[str] | None = None, primary_location_id_prefix: str | None = None, secondary_location_id_prefix: str | None = None, page_size: int = 10000, load: bool = False, write_mode: str = 'append', timeout: int | None = None, **kwargs) → DataFrame | None[source]#
Fetch location crosswalks from the warehouse API.
- Parameters:
primary_location_id (str or list of str, optional) – Filter by primary location ID(s)
secondary_location_id (str or list of str, optional) – Filter by secondary location ID(s)
primary_location_id_prefix (str, optional) – Filter crosswalks by primary location ID prefix (e.g., “usgs”). Passed as a query parameter to the API. Default: None
secondary_location_id_prefix (str, optional) – Filter crosswalks by secondary location ID prefix (e.g., “nwm30”). Passed as a query parameter to the API. Default: None
page_size (int, optional) – Number of location crosswalks to fetch per API request. Decrease if timeout errors are encountered. Default: 10000.
load (bool, optional) – If True, load the downloaded data into the local evaluation “location_crosswalks” table. Default: False
write_mode (str, optional) – Write mode when loading. Options: “append”, “upsert”, “create_or_replace”. Default: “append”
timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).
**kwargs – Additional query parameters to pass to the API
- Returns:
Union[pd.DataFrame, None] – DataFrame containing location crosswalk mappings, or None if load=True
Examples
>>> # Fetch crosswalks for specific primary locations
>>> loc_ids = ["usgs-01010000", "usgs-01010500"]
>>> crosswalks = ev.download.location_crosswalks(
...     primary_location_id=loc_ids
... )
>>> # Fetch crosswalks filtered by ID prefixes
>>> crosswalks = ev.download.location_crosswalks(
...     primary_location_id_prefix="usgs",
...     secondary_location_id_prefix="nwm30"
... )
>>> # Fetch and load into local evaluation
>>> crosswalks = ev.download.location_crosswalks(load=True)
- locations(prefix: str | None = None, ids: str | List[str] | None = None, bbox: List[float] | None = None, include_attributes: bool = False, page_size: int = 10000, load: bool = False, write_mode: str = 'append', timeout: int | None = None, **kwargs) → GeoDataFrame | None[source]#
Fetch locations from the warehouse API as a GeoDataFrame.
- Parameters:
prefix (str, optional) – Filter locations by ID prefix (e.g., “usgs”, “nwm30”)
ids (str or list of str, optional) – Filter locations by specific IDs.
bbox (list of float, optional) – Bounding box to filter locations by spatial extent, in the format [minx, miny, maxx, maxy].
include_attributes (bool, optional) – Whether to include location attributes in the response. Default: False
page_size (int, optional) – Number of locations to fetch per API request. Decrease if timeout errors are encountered. Default: 10000.
load (bool, optional) – If True, load the downloaded data into the local evaluation “locations” table. Default: False
write_mode (str, optional) – Write mode when loading. Options: “append”, “upsert”, “create_or_replace”. Default: “append”
timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).
**kwargs – Additional query parameters to pass to the API
- Returns:
Union[gpd.GeoDataFrame, None] – GeoDataFrame containing locations with geometry, or None if load=True
Examples
>>> # Fetch USGS locations with attributes
>>> locations = ev.download.locations(
...     prefix="usgs",
...     include_attributes=True
... )
>>> # Fetch and load into local evaluation
>>> locations = ev.download.locations(
...     prefix="usgs",
...     load=True
... )
- primary_timeseries(primary_location_id: str | List[str], configuration_name: str | List[str], variable_name: str | None = None, start_date: str | datetime | Timestamp | None = None, end_date: str | datetime | Timestamp | None = None, load: bool = False, write_mode: str = 'append', page_size: int = 10000, timeout: int | None = None, **kwargs) → DataFrame | None[source]#
Fetch primary timeseries from the warehouse API.
- Parameters:
primary_location_id (str or list of str) – Filter by primary location ID(s)
configuration_name (str or list of str) – Filter by configuration name(s)
variable_name (str, optional) – Filter by variable name
start_date (Union[str, datetime, pd.Timestamp], optional) – Start date for timeseries query. Accepts ISO 8601 string, datetime, or pd.Timestamp.
end_date (Union[str, datetime, pd.Timestamp], optional) – End date for timeseries query. Accepts ISO 8601 string, datetime, or pd.Timestamp. If None, only start_date is used.
load (bool, optional) – If True, load the downloaded data into the local evaluation “primary_timeseries” table. Default: False
write_mode (str, optional) – Write mode when loading. Options: “append”, “upsert”, “create_or_replace”. Default: “append”
page_size (int, optional) – Number of series items to fetch per API request. Decrease if timeout errors are encountered. Default: 10000.
timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).
**kwargs – Additional query parameters to pass to the API
- Returns:
Union[pd.DataFrame, None] – DataFrame with TEEHR primary timeseries schema, or None if load=True
Examples
>>> # Fetch USGS observations for specific locations and date range
>>> timeseries = ev.download.primary_timeseries(
...     primary_location_id=["usgs-01010000", "usgs-01010500"],
...     configuration_name="usgs_observations",
...     start_date="1990-10-01",
...     end_date="1990-10-02"
... )
>>> # Fetch and load into local evaluation
>>> timeseries = ev.download.primary_timeseries(
...     primary_location_id=["usgs-01010000"],
...     configuration_name="usgs_observations",
...     start_date="1990-10-01",
...     end_date="1990-10-02",
...     load=True
... )
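start_date and end_date accept an ISO 8601 string, a datetime, or a pd.Timestamp. A rough sketch of that coercion (an assumption for illustration, not the library's code), using only the standard library:

```python
from datetime import datetime

def coerce_date(value):
    """Coerce an ISO 8601 string to a datetime; pass datetime-like
    values (datetime, pd.Timestamp) through unchanged.

    Hypothetical helper illustrating the accepted input types.
    """
    if isinstance(value, str):
        return datetime.fromisoformat(value)
    return value
```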
- secondary_timeseries(primary_location_id: str | List[str] | None = None, secondary_location_id: str | List[str] | None = None, configuration_name: str | List[str] | None = None, variable_name: str | None = None, start_date: str | datetime | Timestamp | None = None, end_date: str | datetime | Timestamp | None = None, load: bool = False, write_mode: str = 'append', page_size: int = 10000, timeout: int | None = None, **kwargs) → DataFrame | None[source]#
Fetch secondary timeseries from the warehouse API.
- Parameters:
primary_location_id (str or list of str, optional) – Filter by primary location ID(s). Either primary_location_id or secondary_location_id must be provided.
secondary_location_id (str or list of str, optional) – Filter by secondary location ID(s). Either primary_location_id or secondary_location_id must be provided.
configuration_name (str or list of str, optional) – Filter by configuration name(s)
variable_name (str, optional) – Filter by variable name
start_date (Union[str, datetime, pd.Timestamp], optional) – Start date for timeseries query. Accepts ISO 8601 string, datetime, or pd.Timestamp.
end_date (Union[str, datetime, pd.Timestamp], optional) – End date for timeseries query. Accepts ISO 8601 string, datetime, or pd.Timestamp. If None, only start_date is used.
load (bool, optional) – If True, load the downloaded data into the local evaluation “secondary_timeseries” table. Default: False
write_mode (str, optional) – Write mode when loading. Options: “append”, “upsert”, “create_or_replace”. Default: “append”
page_size (int, optional) – Number of series items to fetch per API request. Decrease if timeout errors are encountered. Default: 10000.
timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).
**kwargs – Additional query parameters to pass to the API
- Returns:
Union[pd.DataFrame, None] – DataFrame with TEEHR secondary timeseries schema, or None if load=True
- Raises:
ValueError – If neither primary_location_id nor secondary_location_id is provided
Examples
>>> # Fetch NWM retrospective for specific locations and date range
>>> timeseries = ev.download.secondary_timeseries(
...     primary_location_id=["usgs-01010000", "usgs-01010500"],
...     configuration_name="nwm30_retrospective",
...     start_date="1990-10-01",
...     end_date="1990-10-02"
... )
>>> # Fetch and load into local evaluation
>>> timeseries = ev.download.secondary_timeseries(
...     primary_location_id=["usgs-01010000"],
...     configuration_name="nwm30_retrospective",
...     start_date="1990-10-01",
...     end_date="1990-10-02",
...     load=True
... )
- units(name: str | None = None, page_size: int = 10000, load: bool = False, write_mode: str = 'append', timeout: int | None = None, **kwargs) → DataFrame | None[source]#
Fetch units from the warehouse API.
- Parameters:
name (str, optional) – Filter by unit name
page_size (int, optional) – Number of units to fetch per API request. Decrease if timeout errors are encountered. Default: 10000.
load (bool, optional) – If True, load the downloaded data into the local evaluation “units” table. Default: False
write_mode (str, optional) – Write mode when loading. Options: “append”, “upsert”, “create_or_replace”. Default: “append”
timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).
**kwargs – Additional query parameters to pass to the API
- Returns:
Union[pd.DataFrame, None] – DataFrame containing unit definitions, or None if load=True
Examples
>>> # Fetch all units
>>> units = ev.download.units()
>>> # Fetch and load into local evaluation
>>> units = ev.download.units(load=True)
- variables(name: str | None = None, page_size: int = 10000, load: bool = False, write_mode: str = 'append', timeout: int | None = None, **kwargs) → DataFrame | None[source]#
Fetch variables from the warehouse API.
- Parameters:
name (str, optional) – Filter by variable name
page_size (int, optional) – Number of variables to fetch per API request. Decrease if timeout errors are encountered. Default: 10000.
load (bool, optional) – If True, load the downloaded data into the local evaluation “variables” table. Default: False
write_mode (str, optional) – Write mode when loading. Options: “append”, “upsert”, “create_or_replace”. Default: “append”
timeout (int, optional) – Request timeout in seconds. If None, uses the instance default (set via configure() or __init__, default: 60).
**kwargs – Additional query parameters to pass to the API
- Returns:
Union[pd.DataFrame, None] – DataFrame containing variable definitions, or None if load=True
Examples
>>> # Fetch all variables
>>> variables = ev.download.variables()
>>> # Fetch and load into local evaluation
>>> variables = ev.download.variables(load=True)