teehr.Evaluation

class teehr.Evaluation(dir_path: str | Path | S3Path, create_dir: bool = False, spark: SparkSession | None = None)

Bases: object

The Evaluation class.

This is the main class for the TEEHR evaluation.
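A minimal construction sketch based only on the signature above. The helper name `make_evaluation` and its `evaluation_cls` parameter are illustrative and not part of TEEHR; the class is passed in so the snippet can be imported without TEEHR or Spark installed. It assumes `create_dir=True` creates the directory when it is missing, which the parameter name suggests but this page does not state, and it handles local paths only (not `S3Path`).

```python
from pathlib import Path


def make_evaluation(evaluation_cls, dir_path, spark=None):
    """Construct an evaluation rooted at a local dir_path.

    evaluation_cls is expected to be teehr.Evaluation; it is a parameter
    here (hypothetical design choice) so the helper has no hard dependency.
    """
    # The constructor accepts str or Path for local directories; normalise to Path.
    path = Path(dir_path).expanduser()
    # Assumption: create_dir=True creates the directory if it does not exist.
    return evaluation_cls(dir_path=path, create_dir=True, spark=spark)


# Usage, assuming TEEHR is installed:
#   import teehr
#   ev = make_evaluation(teehr.Evaluation, "~/teehr_demo")
```

Leaving `spark=None` matches the default in the signature; pass an existing `SparkSession` to reuse one.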

Methods
  • clean_cache – Clean temporary files.
  • clone_from_s3 – Fetch the study data from S3.
  • clone_template – Create a study from the standard template.
  • enable_logging – Enable logging.
  • list_s3_evaluations – List the evaluations available on S3.
  • sql – Run a SQL query on the Spark session against the TEEHR tables.

Attributes
  • attributes – Access the attributes table.
  • configurations – Access the configurations table.
  • fetch – The fetch component class for accessing external data.
  • joined_timeseries – Access the joined timeseries table.
  • location_attributes – Access the location attributes table.
  • location_crosswalks – Access the location crosswalks table.
  • locations – Access the locations table.
  • metrics – The metrics component class for calculating performance metrics.
  • primary_timeseries – Access the primary timeseries table.
  • secondary_timeseries – Access the secondary timeseries table.
  • units – Access the units table.
  • variables – Access the variables table.

property attributes: AttributeTable

Access the attributes table.

clean_cache()

Clean temporary files.

Includes removing temporary files.
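One way to make sure the cache is cleaned even when a step fails is to wrap the work in a context manager. This is a sketch, not a TEEHR feature: `with_cache_cleanup` is a hypothetical helper, and it relies only on the `clean_cache()` method documented above.

```python
from contextlib import contextmanager


@contextmanager
def with_cache_cleanup(ev):
    """Yield ev, then call ev.clean_cache() even if the body raises."""
    try:
        yield ev
    finally:
        # Runs on normal exit and on exceptions alike.
        ev.clean_cache()


# Usage, assuming `ev` is an existing teehr.Evaluation:
#   with with_cache_cleanup(ev):
#       ...  # work that may leave temporary files behind
```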

clone_from_s3(evaluation_name: str, primary_location_ids: List[str] | None = None, start_date: str | datetime | None = None, end_date: str | datetime | None = None)

Fetch the study data from S3.

Copies the study from S3 to the local directory, with the option to subset the dataset by primary location ID and by start and end dates.

Parameters:
  • evaluation_name (str) – The name of the evaluation to clone from S3. Use the list_s3_evaluations method to get the available evaluations.

  • primary_location_ids (List[str], optional) – The primary location IDs used to subset the data. The default is None.

  • start_date (Union[str, datetime], optional) – The start date to subset the data. The default is None.

  • end_date (Union[str, datetime], optional) – The end date to subset the data. The default is None.

Notes

Includes the following tables:
  • units

  • variables

  • attributes

  • configurations

  • locations

  • location_attributes

  • location_crosswalks

  • primary_timeseries

  • secondary_timeseries

  • joined_timeseries

Also includes the user_defined_fields.py script.
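The call can be sketched as below, using only the parameters listed above. `clone_subset` is a hypothetical wrapper, and the date-order check is an added convenience rather than something this page says TEEHR does itself. Any evaluation name or location ID in the usage comment is a placeholder.

```python
from datetime import datetime


def clone_subset(ev, evaluation_name, location_ids=None, start=None, end=None):
    """Clone an S3 evaluation into ev's directory, optionally subset."""
    # Only compare the dates when both are datetimes; strings pass through as-is.
    both_datetimes = isinstance(start, datetime) and isinstance(end, datetime)
    if both_datetimes and start > end:
        raise ValueError("start must not be after end")
    ev.clone_from_s3(
        evaluation_name=evaluation_name,
        # The signature expects List[str] or None.
        primary_location_ids=list(location_ids) if location_ids is not None else None,
        start_date=start,
        end_date=end,
    )


# Usage, assuming `ev` is an existing teehr.Evaluation and the name below
# is one returned by list_s3_evaluations (placeholder values):
#   clone_subset(ev, "<evaluation_name>", ["<location_id>"],
#                datetime(2020, 1, 1), datetime(2020, 12, 31))
```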

clone_template()

Create a study from the standard template.

This method mainly copies the template directory to the specified evaluation directory.
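A short sketch of how one might inspect what the template copied. `init_from_template` is a hypothetical helper; it assumes `dir_path` is the same local directory that was passed to the `Evaluation` constructor, and it makes no assumption about which files the template contains.

```python
from pathlib import Path


def init_from_template(ev, dir_path):
    """Populate the evaluation directory from the template and list the result."""
    ev.clone_template()
    # Return the top-level entry names so the caller can inspect the new layout.
    return sorted(p.name for p in Path(dir_path).iterdir())


# Usage, assuming `ev` was created with dir_path="my_eval":
#   print(init_from_template(ev, "my_eval"))
```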

property configurations: ConfigurationTable

Access the configurations table.

enable_logging()

Enable logging.

property fetch: Fetch

The fetch component class for accessing external data.

property joined_timeseries: JoinedTimeseriesTable

Access the joined timeseries table.

static list_s3_evaluations(format: Literal['pandas', 'list'] = 'pandas') → list | DataFrame

List the evaluations available on S3.

Parameters:
  • format (str, optional) – The format of the output. Either “pandas” or “list”. The default is “pandas”.
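Because the method is static, it can be called on the class without constructing an evaluation. The sketch below shows both output formats; `available_evaluations` is a hypothetical helper, and nothing is assumed about the columns or the element type of the result, since this page does not describe them.

```python
def available_evaluations(evaluation_cls, as_list=False):
    """Return the evaluations available on S3 in the requested format.

    evaluation_cls is expected to be teehr.Evaluation.
    """
    # Only the two documented values are ever passed through.
    fmt = "list" if as_list else "pandas"
    return evaluation_cls.list_s3_evaluations(format=fmt)


# Usage, assuming TEEHR is installed:
#   import teehr
#   df = available_evaluations(teehr.Evaluation)                   # DataFrame
#   items = available_evaluations(teehr.Evaluation, as_list=True)  # list
```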

property location_attributes: LocationAttributeTable

Access the location attributes table.

property location_crosswalks: LocationCrosswalkTable

Access the location crosswalks table.

property locations: LocationTable

Access the locations table.

property metrics: Metrics

The metrics component class for calculating performance metrics.

property primary_timeseries: PrimaryTimeseriesTable

Access the primary timeseries table.

property secondary_timeseries: SecondaryTimeseriesTable

Access the secondary timeseries table.

sql(query: str, create_temp_views: List[str] | None = None)

Run a SQL query on the Spark session against the TEEHR tables.

Parameters:
  • query (str) – The SQL query to run.

  • create_temp_views (List[str], optional) – A list of tables to create temporary views for. The default is None, which creates temporary views for all tables.

Returns:
  • pyspark.sql.DataFrame – The result of the SQL query. The result is lazily evaluated, so you need to call an action (e.g., sdf.show()) to get it.

Notes

By default, this method has access to the following tables, preloaded as temporary views:
  • units

  • variables

  • attributes

  • configurations

  • locations

  • location_attributes

  • location_crosswalks

  • primary_timeseries

  • secondary_timeseries

  • joined_timeseries
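The lazy evaluation described above can be sketched as follows: `sql` builds the query, and an action such as `collect()` runs it. `count_rows` and `TEEHR_TABLES` are hypothetical names; the table list is copied from this page, and limiting `create_temp_views` to the one table queried is an illustrative choice.

```python
# Table names as listed on this page.
TEEHR_TABLES = (
    "units", "variables", "attributes", "configurations", "locations",
    "location_attributes", "location_crosswalks", "primary_timeseries",
    "secondary_timeseries", "joined_timeseries",
)


def count_rows(ev, table):
    """Return the number of rows in one of the TEEHR tables."""
    # Reject anything that is not a known table before building the SQL string.
    if table not in TEEHR_TABLES:
        raise ValueError(f"unknown table: {table!r}")
    sdf = ev.sql(
        f"SELECT COUNT(*) AS n FROM {table}",
        create_temp_views=[table],
    )
    # Nothing has executed yet; collect() is the action that runs the query.
    return sdf.collect()[0]["n"]


# Usage, assuming `ev` is an existing teehr.Evaluation with data loaded:
#   print(count_rows(ev, "primary_timeseries"))
```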

property units: UnitTable

Access the units table.

property variables: VariableTable

Access the variables table.