teehr.LocationTable
- class teehr.LocationTable(ev)
Bases: BaseTable
Access methods to the locations table.
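Methods
add() – Add domain variables.
distinct_values(column) – Return distinct values for a column.
field_enum() – Get the location fields enum.
fields() – Return table columns as a list.
filter(filters) – Apply a filter.
load_spatial(in_path[, field_mapping, pattern]) – Import geometry data.
order_by(fields) – Apply an order_by.
query([filters, order_by]) – Run a query against the table with filters and order_by.
to_geopandas() – Return GeoPandas DataFrame.
to_pandas() – Return Pandas DataFrame for Location Table.
to_sdf() – Return PySpark DataFrame.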
- add()
Add domain variables.
- distinct_values(column: str) → List[str]
Return distinct values for a column.
- fields() → List[str]
Return table columns as a list.
- filter(filters: str | dict | FilterBaseModel | List[str | dict | FilterBaseModel])
Apply a filter.
- Parameters:
- filters (Union[str, dict, FilterBaseModel, List[Union[str, dict, FilterBaseModel]]]) – The filters to apply to the query. The filters can be a string, dictionary, FilterBaseModel or a list of any of these.
- Returns:
Examples
Filters as dictionary:
>>> ts_df = ev.primary_timeseries.filter(
...     filters=[
...         {
...             "column": "value_time",
...             "operator": ">",
...             "value": "2022-01-01",
...         },
...         {
...             "column": "value_time",
...             "operator": "<",
...             "value": "2022-01-02",
...         },
...         {
...             "column": "location_id",
...             "operator": "=",
...             "value": "gage-C",
...         },
...     ]
... ).to_pandas()
Filters as string:
>>> ts_df = ev.primary_timeseries.filter(
...     filters=[
...         "value_time > '2022-01-01'",
...         "value_time < '2022-01-02'",
...         "location_id = 'gage-C'"
...     ]
... ).to_pandas()
Filters as FilterBaseModel:
>>> from teehr.models.filters import TimeseriesFilter
>>> from teehr.models.filters import FilterOperators
>>>
>>> fields = ev.primary_timeseries.field_enum()
>>> ts_df = ev.primary_timeseries.filter(
...     filters=[
...         TimeseriesFilter(
...             column=fields.value_time,
...             operator=FilterOperators.gt,
...             value="2022-01-01",
...         ),
...         TimeseriesFilter(
...             column=fields.value_time,
...             operator=FilterOperators.lt,
...             value="2022-01-02",
...         ),
...         TimeseriesFilter(
...             column=fields.location_id,
...             operator=FilterOperators.eq,
...             value="gage-C",
...         ),
...     ]).to_pandas()
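The dictionary and string examples above express the same three conditions. As a rough illustration of how the two forms line up, the sketch below renders a dictionary filter as its string equivalent. The helper `filter_dict_to_string` is hypothetical and is not part of TEEHR; how TEEHR converts filters internally is not shown here.

```python
def filter_dict_to_string(f: dict) -> str:
    """Render a {"column", "operator", "value"} filter as a SQL-like string.

    Hypothetical helper for illustration only; not a TEEHR function.
    """
    value = f["value"]
    # Quote string values; leave other types (e.g. numbers) unquoted.
    rendered = f"'{value}'" if isinstance(value, str) else str(value)
    return f"{f['column']} {f['operator']} {rendered}"


filters = [
    {"column": "value_time", "operator": ">", "value": "2022-01-01"},
    {"column": "value_time", "operator": "<", "value": "2022-01-02"},
    {"column": "location_id", "operator": "=", "value": "gage-C"},
]

# Each dictionary maps onto one of the string filters shown above.
as_strings = [filter_dict_to_string(f) for f in filters]
for s in as_strings:
    print(s)
```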
- load_spatial(in_path: Path | str, field_mapping: dict | None = None, pattern: str = '**/*.parquet', **kwargs)
Import geometry data.
- Parameters:
- in_path (Union[Path, str]) – The input file or directory path. Any file format that can be read by GeoPandas.
- field_mapping (dict, optional) – A dictionary mapping input fields to output fields. Format: {input_field: output_field}
- pattern (str, optional (default: "**/*.parquet")) – The pattern to match files. Only used when in_path is a directory.
- **kwargs – Additional keyword arguments are passed to GeoPandas read_file().
The file is first read by GeoPandas, its fields are renamed according to field_mapping, and the result is then validated and inserted into the dataset.
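The pattern argument only matters when in_path is a directory. The sketch below shows how the default "**/*.parquet" behaves under standard recursive glob semantics, using only pathlib. It assumes TEEHR resolves the pattern this way; `find_matching_files` is a hypothetical helper, not a TEEHR function.

```python
import tempfile
from pathlib import Path


def find_matching_files(in_path, pattern="**/*.parquet"):
    """Return the files a directory-or-file path would resolve to.

    Hypothetical helper: assumes glob semantics for `pattern`.
    """
    in_path = Path(in_path)
    if in_path.is_dir():
        # "**" matches the directory itself and all nested directories.
        return sorted(in_path.glob(pattern))
    # A single file path is used as-is; the pattern is ignored.
    return [in_path]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nested").mkdir()
    for rel in ["a.parquet", "nested/b.parquet", "notes.txt"]:
        (root / rel).touch()

    matched = [p.relative_to(root).as_posix() for p in find_matching_files(root)]
    single_names = [p.name for p in find_matching_files(root / "notes.txt")]

print(matched)
print(single_names)
```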
Notes
The TEEHR Location table schema includes fields:
- id
- name
- geometry
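To make the {input_field: output_field} format concrete, the sketch below renames the keys of a plain dictionary record and checks the result against the three schema fields listed above. The helpers, the input field names (`site_no`, `station_nm`) and the sample values are hypothetical; TEEHR itself performs the renaming on data read by GeoPandas and its validation is not reproduced here.

```python
REQUIRED_FIELDS = {"id", "name", "geometry"}


def apply_field_mapping(record, field_mapping=None):
    """Rename record keys using {input_field: output_field}.

    Hypothetical helper; unmapped keys are kept unchanged.
    """
    mapping = field_mapping or {}
    return {mapping.get(key, key): value for key, value in record.items()}


def missing_fields(record):
    """Return the schema fields that the record does not provide."""
    return sorted(REQUIRED_FIELDS - set(record))


# Made-up input record whose field names differ from the schema.
raw = {
    "site_no": "gage-C",
    "station_nm": "Gage C",
    "geometry": "POINT (-80.0 35.0)",
}

renamed = apply_field_mapping(raw, {"site_no": "id", "station_nm": "name"})

print(missing_fields(raw))      # fields missing before renaming
print(missing_fields(renamed))  # fields missing after renaming
```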
- order_by(fields: str | StrEnum | List[str | StrEnum])
Apply an order_by.
- Parameters:
- fields (Union[str, StrEnum, List[Union[str, StrEnum]]]) – The fields to order the query by. The fields can be a string, StrEnum or a list of any of these. The fields will be ordered in the order they are provided.
- Returns:
Examples
Order by string:
>>> ts_df = ev.primary_timeseries.order_by("value_time").to_pandas()
Order by StrEnum:
>>> from teehr.querying.field_enums import TimeseriesFields
>>> ts_df = ev.primary_timeseries.order_by(
...     TimeseriesFields.value_time
... ).to_pandas()
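When several fields are given, they are applied in the order provided: the first field is the primary sort key and later fields break ties. The sketch below illustrates that behaviour on plain Python dictionaries. It is an analogy only; `order_rows` is a hypothetical helper and the rows are made-up data, not TEEHR output.

```python
def order_rows(rows, fields):
    """Sort rows by one field or by a list of fields, in the order given.

    Hypothetical helper illustrating multi-field ordering.
    """
    if isinstance(fields, str):
        # A single field name (str or StrEnum member) is treated as a one-item list.
        fields = [fields]
    return sorted(rows, key=lambda row: tuple(row[f] for f in fields))


rows = [
    {"location_id": "gage-B", "value_time": "2022-01-01 01:00"},
    {"location_id": "gage-A", "value_time": "2022-01-01 02:00"},
    {"location_id": "gage-A", "value_time": "2022-01-01 01:00"},
]

# location_id is the primary key; value_time breaks ties within a location.
ordered = order_rows(rows, ["location_id", "value_time"])
for row in ordered:
    print(row["location_id"], row["value_time"])
```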
- query(filters: str | dict | FilterBaseModel | List[str | dict | FilterBaseModel] | None = None, order_by: str | StrEnum | List[str | StrEnum] | None = None)
Run a query against the table with filters and order_by.
In general, a user will use either the query method or the filter and order_by methods. The query method is a convenience method that applies filters and order_by in a single call.
- Parameters:
- filters (Union[str, dict, FilterBaseModel, List[Union[str, dict, FilterBaseModel]]]) – The filters to apply to the query. The filters can be a string, dictionary, FilterBaseModel or a list of any of these.
- order_by (Union[str, List[str], StrEnum, List[StrEnum]]) – The fields to order the query by. The fields can be a string, StrEnum or a list of any of these. The fields will be ordered in the order they are provided.
- Returns:
Examples
Filters as dictionary:
>>> ts_df = ev.primary_timeseries.query(
...     filters=[
...         {
...             "column": "value_time",
...             "operator": ">",
...             "value": "2022-01-01",
...         },
...         {
...             "column": "value_time",
...             "operator": "<",
...             "value": "2022-01-02",
...         },
...         {
...             "column": "location_id",
...             "operator": "=",
...             "value": "gage-C",
...         },
...     ],
...     order_by=["location_id", "value_time"]
... ).to_pandas()
Filters as string:
>>> ts_df = ev.primary_timeseries.query(
...     filters=[
...         "value_time > '2022-01-01'",
...         "value_time < '2022-01-02'",
...         "location_id = 'gage-C'"
...     ],
...     order_by=["location_id", "value_time"]
... ).to_pandas()
Filters as FilterBaseModel:
>>> from teehr.models.filters import TimeseriesFilter
>>> from teehr.models.filters import FilterOperators
>>>
>>> fields = ev.primary_timeseries.field_enum()
>>> ts_df = ev.primary_timeseries.query(
...     filters=[
...         TimeseriesFilter(
...             column=fields.value_time,
...             operator=FilterOperators.gt,
...             value="2022-01-01",
...         ),
...         TimeseriesFilter(
...             column=fields.value_time,
...             operator=FilterOperators.lt,
...             value="2022-01-02",
...         ),
...         TimeseriesFilter(
...             column=fields.location_id,
...             operator=FilterOperators.eq,
...             value="gage-C",
...         ),
...     ]).to_pandas()
- to_sdf()
Return PySpark DataFrame.
The PySpark DataFrame can be further processed using PySpark. Note that PySpark DataFrames are lazy and are not evaluated until an action such as show(), collect() or toPandas() is called. This can be useful for further processing or analysis, for example:
>>> ts_sdf = ev.primary_timeseries.query(
...     filters=[
...         "value_time > '2022-01-01'",
...         "value_time < '2022-01-02'",
...         "location_id = 'gage-C'"
...     ]
... ).to_sdf()
>>> ts_df = (
...     ts_sdf.select("value_time", "location_id", "value")
...     .orderBy("value").toPandas()
... )
>>> ts_df.head()