teehr.Evaluation

class teehr.Evaluation(dir_path: str | Path, create_dir: bool = False, spark: SparkSession | None = None)

Bases: object

The Evaluation class.

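This is the main class for a TEEHR evaluation. It exposes the evaluation's tables, the fetch and metrics components, and methods for creating or cloning an evaluation.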

Methods

`clean_cache`	Clean temporary files.
`clone_from_s3`	Fetch the study data from S3.
`clone_template`	Create a study from the standard template.
`enable_logging`	Enable logging.
`list_s3_evaluations`	List the evaluations available on S3.
`sql`	Run a SQL query on the Spark session against the TEEHR tables.

Attributes

`attributes`	Access the attributes table.
`configurations`	Access the configurations table.
`fetch`	The fetch component class for accessing external data.
`joined_timeseries`	Access the joined timeseries table.
`location_attributes`	Access the location attributes table.
`location_crosswalks`	Access the location crosswalks table.
`locations`	Access the locations table.
`metrics`	The metrics component class for calculating performance metrics.
`primary_timeseries`	Access the primary timeseries table.
`secondary_timeseries`	Access the secondary timeseries table.
`units`	Access the units table.
`variables`	Access the variables table.

clean_cache()

Clean temporary files.

Includes removing temporary files.

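clone_from_s3(evaluation_name: str, primary_location_ids: List[str] = None, start_date: Union[str, datetime] = None, end_date: Union[str, datetime] = None)

Fetch the study data from S3.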

Copies the study from S3 to the local directory, optionally subsetting the data by primary location ID, start date, and end date.

Parameters:

evaluation_name (str) – The name of the evaluation to clone from S3. Use the list_s3_evaluations method to get the available evaluations.
primary_location_ids (List[str], optional) – The primary location IDs used to subset the data. The default is None.
start_date (Union[str, datetime], optional) – The start date to subset the data. The default is None.
end_date (Union[str, datetime], optional) – The end date to subset the data. The default is None.

Notes

Includes the following tables:

Also includes the user_defined_fields.py script.
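The subsetting options can be exercised with a small wrapper like the one below. This is a sketch, not part of TEEHR: `clone_subset` is a hypothetical helper, `ev` is assumed to be an existing `teehr.Evaluation` instance, and the date-order check is an added safeguard rather than documented behavior. Only the keyword arguments listed under Parameters are passed through.

```python
from datetime import datetime


def clone_subset(ev, evaluation_name, location_ids=None, start=None, end=None):
    """Clone an evaluation from S3, optionally subset by location and date.

    Hypothetical wrapper around Evaluation.clone_from_s3; `ev` is assumed
    to be an existing teehr.Evaluation instance.
    """
    # Added safeguard (an assumption, not TEEHR behavior): reject an
    # inverted date range before any data is copied.
    if isinstance(start, datetime) and isinstance(end, datetime) and start > end:
        raise ValueError("start must not be later than end")

    ev.clone_from_s3(
        evaluation_name=evaluation_name,
        # None is the documented default for each optional argument.
        primary_location_ids=list(location_ids) if location_ids is not None else None,
        start_date=start,
        end_date=end,
    )
```

A call might look like `clone_subset(ev, name, ["<location-id>"], datetime(2000, 1, 1), datetime(2000, 12, 31))`, where `name` is one of the values returned by `list_s3_evaluations` and the location ID is a placeholder.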

clone_template()

Create a study from the standard template.

This method mainly copies the template directory to the specified evaluation directory.
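A minimal sketch of setting up a new evaluation from the template, assuming `teehr` is installed and a Spark session can be started. `new_evaluation` is a hypothetical helper name; `teehr` is imported inside the function so that the module itself loads without it.

```python
from pathlib import Path


def new_evaluation(dir_path):
    """Create an Evaluation in dir_path and populate it from the template.

    Hypothetical helper, not part of TEEHR.
    """
    import teehr  # imported lazily; requires the teehr package

    # create_dir=True asks the constructor to create dir_path if it is missing.
    ev = teehr.Evaluation(dir_path=Path(dir_path), create_dir=True)
    # Copy the standard template into the evaluation directory.
    ev.clone_template()
    return ev
```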

property configurations: ConfigurationTable

Access the configurations table.

property fetch: Fetch

The fetch component class for accessing external data.

property joined_timeseries: JoinedTimeseriesTable

Access the joined timeseries table.

static list_s3_evaluations(format: Literal['pandas', 'list'] = 'pandas') → list | DataFrame

List the evaluations available on S3.

Parameters:

format (str, optional) – The format of the output. Either “pandas” or “list”. The default is “pandas”.
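Because `list_s3_evaluations` is a static method, it can be called on the class without creating an evaluation first. The sketch below assumes `teehr` is installed and that the call can reach S3; `available_evaluations` is a hypothetical helper name.

```python
def available_evaluations(as_list=False):
    """Return the evaluations available on S3.

    Hypothetical helper: returns a list when as_list is True, otherwise
    a pandas DataFrame (the documented default format).
    """
    import teehr  # imported lazily; requires the teehr package

    output_format = "list" if as_list else "pandas"
    # Static method: no Evaluation instance is needed.
    return teehr.Evaluation.list_s3_evaluations(format=output_format)
```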

property location_attributes: LocationAttributeTable

Access the location attributes table.

property location_crosswalks: LocationCrosswalkTable

Access the location crosswalks table.

property metrics: Metrics

The metrics component class for calculating performance metrics.

property primary_timeseries: PrimaryTimeseriesTable

Access the primary timeseries table.

property secondary_timeseries: SecondaryTimeseriesTable

Access the secondary timeseries table.

sql(query: str)

Run a SQL query on the Spark session against the TEEHR tables.

Parameters:

query (str) – The SQL query to run.

Returns:

pyspark.sql.DataFrame – The result of the SQL query. This is lazily evaluated, so you need to call an action (e.g., sdf.show()) to get the result.

Notes

This method has access to the following tables, preloaded as temporary views:
- units
- variables
- attributes
- configurations
- locations
- location_attributes
- location_crosswalks
- primary_timeseries
- secondary_timeseries
- joined_timeseries
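
As an illustration of the lazy evaluation described for the return value, the sketch below counts the rows in one of the preloaded views. `TEEHR_VIEWS`, `row_count_query`, and `count_rows` are hypothetical names, and `ev` is assumed to be an existing `teehr.Evaluation` instance. The query avoids column names because the table schemas are not listed here.

```python
# View names copied from the list of preloaded temporary views.
TEEHR_VIEWS = (
    "units", "variables", "attributes", "configurations", "locations",
    "location_attributes", "location_crosswalks", "primary_timeseries",
    "secondary_timeseries", "joined_timeseries",
)


def row_count_query(table):
    """Build a row-count query for one of the preloaded views."""
    if table not in TEEHR_VIEWS:
        raise ValueError(f"unknown table: {table!r}")
    return f"SELECT COUNT(*) AS row_count FROM {table}"


def count_rows(ev, table):
    """Run the count through ev.sql and return it as an int."""
    sdf = ev.sql(row_count_query(table))  # lazy: nothing has executed yet
    # collect() is a Spark action, so this is where the query actually runs.
    return int(sdf.collect()[0]["row_count"])
```

Validating the table name against a fixed tuple keeps the f-string from interpolating arbitrary text into the SQL.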