teehr.Evaluation
- class teehr.Evaluation(dir_path: str | Path | S3Path, create_dir: bool = False, spark: SparkSession | None = None)
Bases: object
The Evaluation class.
This is the main class for the TEEHR evaluation.
Methods
Clean temporary files.
clone_from_s3 – Fetch the study data from S3.
clone_template – Create a study from the standard template.
Enable logging.
list_s3_evaluations – List the evaluations available on S3.
sql – Run a SQL query on the Spark session against the TEEHR tables.
Attributes
attributes – Access the attributes table.
configurations – Access the configurations table.
fetch – The fetch component class for accessing external data.
joined_timeseries – Access the joined timeseries table.
location_attributes – Access the location attributes table.
location_crosswalks – Access the location crosswalks table.
locations – Access the locations table.
metrics – The metrics component class for calculating performance metrics.
primary_timeseries – Access the primary timeseries table.
secondary_timeseries – Access the secondary timeseries table.
units – Access the units table.
variables – Access the variables table.
- property attributes: AttributeTable
Access the attributes table.
- clone_from_s3(evaluation_name: str, primary_location_ids: List[str] | None = None, start_date: str | datetime | None = None, end_date: str | datetime | None = None)
Fetch the study data from S3.
Copies the study from S3 to the local directory, optionally subsetting the dataset by primary location ID and by start and end dates.
- Parameters:
evaluation_name (str) – The name of the evaluation to clone from S3. Use the list_s3_evaluations method to get the available evaluations.
primary_location_ids (List[str], optional) – The list of primary location IDs used to subset the data. The default is None.
start_date (Union[str, datetime], optional) – The start date used to subset the data. The default is None.
end_date (Union[str, datetime], optional) – The end date used to subset the data. The default is None.
Notes
- Includes the following tables:
units
variables
attributes
configurations
locations
location_attributes
location_crosswalks
primary_timeseries
secondary_timeseries
joined_timeseries
Also includes the user_defined_fields.py script.
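The subsetting options of clone_from_s3 can be pictured as row filters in which a None argument disables that filter. The sketch below is not TEEHR's implementation: TimeseriesRow and subset_rows are hypothetical names, the location IDs are made up, and treating both dates as inclusive bounds is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Union

# Mirrors the documented Union[str, datetime] type of start_date/end_date.
DateLike = Union[str, datetime]


@dataclass
class TimeseriesRow:
    # Hypothetical, minimal stand-in for one row of a timeseries table.
    location_id: str
    value_time: datetime
    value: float


def _to_datetime(value: Optional[DateLike]) -> Optional[datetime]:
    # Accept None, a datetime, or an ISO 8601 string such as "2020-01-01".
    if value is None or isinstance(value, datetime):
        return value
    return datetime.fromisoformat(value)


def subset_rows(
    rows: Iterable[TimeseriesRow],
    primary_location_ids: Optional[List[str]] = None,
    start_date: Optional[DateLike] = None,
    end_date: Optional[DateLike] = None,
) -> List[TimeseriesRow]:
    """Keep rows matching every filter that is not None (None = no filter).

    Assumption of this sketch: start_date and end_date are inclusive bounds.
    """
    start = _to_datetime(start_date)
    end = _to_datetime(end_date)
    ids = set(primary_location_ids) if primary_location_ids is not None else None
    kept = []
    for row in rows:
        if ids is not None and row.location_id not in ids:
            continue
        if start is not None and row.value_time < start:
            continue
        if end is not None and row.value_time > end:
            continue
        kept.append(row)
    return kept
```

With all three filters left at None, every row is kept, which corresponds to cloning the full evaluation without subsetting.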
- clone_template()
Create a study from the standard template.
This method mainly copies the template directory to the specified evaluation directory.
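A minimal sketch of the copy step, assuming the template is an ordinary local directory. copy_template and the paths involved are hypothetical, and any setup TEEHR performs beyond copying is not shown.

```python
import shutil
from pathlib import Path


def copy_template(template_dir: Path, evaluation_dir: Path) -> Path:
    """Copy every file and subdirectory of template_dir into evaluation_dir.

    Hypothetical helper for illustration; not part of the TEEHR API.
    """
    # dirs_exist_ok=True (Python 3.8+) allows the evaluation directory to
    # exist already, e.g. if it was created before the template is applied.
    shutil.copytree(template_dir, evaluation_dir, dirs_exist_ok=True)
    return evaluation_dir
```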
- property configurations: ConfigurationTable
Access the configurations table.
- property fetch: Fetch
The fetch component class for accessing external data.
- property joined_timeseries: JoinedTimeseriesTable
Access the joined timeseries table.
- static list_s3_evaluations(format: Literal['pandas', 'list'] = 'pandas') → list | DataFrame
List the evaluations available on S3.
- Parameters:
format (str, optional) – The format of the output, either “pandas” or “list”. The default is “pandas”.
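The format argument switches the return type between a list and a pandas DataFrame. The sketch below shows one way such a Literal-typed switch can be written; format_records and the record fields are hypothetical, and the exact contents list_s3_evaluations returns are not specified here.

```python
from typing import Any, Dict, List, Literal, get_args

# The two values accepted by the documented format parameter.
OutputFormat = Literal["pandas", "list"]


def format_records(records: List[Dict[str, Any]], format: OutputFormat = "pandas") -> Any:
    """Return records as a plain list or as a pandas DataFrame.

    Hypothetical helper; the parameter is named `format` (shadowing the
    builtin) only to mirror the documented signature.
    """
    allowed = get_args(OutputFormat)  # ("pandas", "list")
    if format not in allowed:
        raise ValueError(f"format must be one of {allowed}, got {format!r}")
    if format == "list":
        return list(records)
    # Imported lazily so the "list" path works even without pandas installed.
    import pandas as pd

    return pd.DataFrame(records)
```

Importing pandas only on the “pandas” path keeps the “list” path free of that dependency.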
- property location_attributes: LocationAttributeTable
Access the location attributes table.
- property location_crosswalks: LocationCrosswalkTable
Access the location crosswalks table.
- property locations: LocationTable
Access the locations table.
- property metrics: Metrics
The metrics component class for calculating performance metrics.
- property primary_timeseries: PrimaryTimeseriesTable
Access the primary timeseries table.
- property secondary_timeseries: SecondaryTimeseriesTable
Access the secondary timeseries table.
- sql(query: str, create_temp_views: List[str] | None = None)
Run a SQL query on the Spark session against the TEEHR tables.
- Parameters:
query (str) – The SQL query to run.
create_temp_views (List[str], optional) – A list of tables to create temporary views for. The default is None, which creates views for all tables.
- Returns:
pyspark.sql.DataFrame – The result of the SQL query. It is lazily evaluated, so you need to call an action (e.g., sdf.show()) to get the result.
By default, this method has access to the following tables, preloaded as temporary views:
units
variables
attributes
configurations
locations
location_attributes
location_crosswalks
primary_timeseries
secondary_timeseries
joined_timeseries
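The create_temp_views default (None means every table) can be sketched as a small resolution step that decides which table names to register before the query runs. resolve_temp_views is a hypothetical helper, and rejecting unknown table names is an assumption of this sketch rather than documented TEEHR behavior.

```python
from typing import List, Optional, Sequence

# Table names that the sql() documentation lists as preloaded temporary views.
TEEHR_TABLES = (
    "units",
    "variables",
    "attributes",
    "configurations",
    "locations",
    "location_attributes",
    "location_crosswalks",
    "primary_timeseries",
    "secondary_timeseries",
    "joined_timeseries",
)


def resolve_temp_views(create_temp_views: Optional[Sequence[str]] = None) -> List[str]:
    """Return the table names to expose as temporary views.

    None selects all tables, matching the documented default. Raising on
    unknown names is an assumption made for this sketch.
    """
    if create_temp_views is None:
        return list(TEEHR_TABLES)
    unknown = [name for name in create_temp_views if name not in TEEHR_TABLES]
    if unknown:
        raise ValueError(f"Unknown TEEHR table(s): {unknown}")
    return list(create_temp_views)
```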
- property variables: VariableTable
Access the variables table.