Getting started
Installation
TEEHR requires the following dependencies:
Python 3.10 or later
Java 8 or later, required by Spark (we use Java 17)
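Before installing, it can help to confirm that both prerequisites are present. The following is a minimal sketch using only the Python standard library; the helper names (`check_python`, `find_java`, `java_version_banner`) are illustrative and not part of TEEHR, and the script only reports what it finds rather than parsing the Java version number.

```python
import shutil
import subprocess
import sys


def check_python(min_version=(3, 10)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= min_version


def find_java():
    """Return the path to a `java` executable on PATH, or None."""
    return shutil.which("java")


def java_version_banner(java_path):
    """Return the first line of `java -version` output, or None on failure."""
    try:
        result = subprocess.run(
            [java_path, "-version"],
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Most JVMs print the version banner to stderr rather than stdout.
    lines = (result.stderr or result.stdout).strip().splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print("Python 3.10+:", check_python(), "(running " + sys.version.split()[0] + ")")
    java = find_java()
    if java is None:
        print("Java not found on PATH")
    else:
        print("Java found:", java_version_banner(java) or java)
```

If Java is reported as missing, install a JDK and make sure `java` is on your `PATH` before running Spark-backed code.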
The easiest way to install TEEHR is from PyPI using pip. We recommend installing it in a virtual environment; the commands below create a new virtual environment and install TEEHR in it.
# Create directory for your code and create a new virtual environment.
mkdir teehr_examples
cd teehr_examples
python3 -m venv .venv
source .venv/bin/activate
# Install using pip.
# Starting with version 0.4.1, TEEHR is available on PyPI.
pip install teehr
# Download the required JAR files for Spark to interact with AWS S3.
python -m teehr.utils.install_spark_jars
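To confirm that the installation succeeded, you can query the active environment for the installed packages. This is a small sketch using only the standard library; the helper names are illustrative, and checking for `pyspark` assumes it is installed alongside TEEHR as a dependency.

```python
from importlib import metadata, util


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_importable(module):
    """Return True if a top-level `module` can be located without importing it."""
    return util.find_spec(module) is not None


if __name__ == "__main__":
    # "pyspark" is an assumption: it is expected to be pulled in with TEEHR.
    for name in ("teehr", "pyspark"):
        print(
            name,
            "version:", installed_version(name),
            "importable:", is_importable(name),
        )
```

Run it with the virtual environment activated; a version of `None` means the package is not installed in that environment.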
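Or, if you do not want to install TEEHR in your own virtual environment, you can use Docker. The build command expects a Dockerfile in the current directory, so run it from a directory that contains one (presumably a checkout of the TEEHR source repository):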
docker build -t teehr:v0.4.7 .
docker run -it --rm --volume "$HOME:$HOME" -p 8888:8888 teehr:v0.4.7 jupyter lab --ip 0.0.0.0 "$HOME"
Project Objectives
Easy integration into research workflows
Use of modern and efficient data structures and computing platforms
Scalable for rapid execution of large-domain/large-sample evaluations
Simplified exploration of performance trends and potential drivers (e.g., climate, time-period, regulation, and basin characteristics)
Inclusion of common and emergent evaluation methods (e.g., error statistics, skill scores, categorical metrics, hydrologic signatures, uncertainty quantification, and graphical methods)
Open source and community-extensible development
Why TEEHR?
TEEHR is a Python package that provides a framework for evaluating hydrologic model performance. It is designed to enable iterative and exploratory analysis of hydrologic data, and facilitates this through:
Scalability - TEEHR’s computational engine is built on PySpark, allowing it to take advantage of your available compute resources.
Data Integrity - TEEHR’s internal data model (the TEEHR Framework) makes it easier to work with and validate the various datasets that make up your evaluation, such as model outputs, observations, location attributes, and more.
Flexibility - TEEHR is designed to be flexible and extensible, allowing you to easily customize metrics, add bootstrapping, and group and filter your data in a variety of ways.
TEEHR Evaluation Example
The following is an example of initializing a TEEHR Evaluation, cloning a dataset from the TEEHR S3 bucket, and calculating two versions of KGE (one with bootstrap uncertainty and one without).
import teehr
from pathlib import Path
# Initialize an Evaluation object
ev = teehr.Evaluation(
    dir_path=Path.home() / "temp" / "quick_start_example",
    create_dir=True,
)
# Clone the example data from S3
ev.clone_from_s3("e0_2_location_example")
# Define a bootstrapper with custom parameters.
boot = teehr.Bootstrappers.CircularBlock(
    seed=50,
    reps=500,
    block_size=10,
    quantiles=[0.05, 0.95],
)
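# KGE with bootstrap uncertainty; give it a distinct output field name
# so it can be told apart from the plain KGE.
kge = teehr.DeterministicMetrics.KlingGuptaEfficiency(bootstrap=boot)
kge.output_field_name = "BS_KGE"
# Calculate both the bootstrapped KGE and the default KGE.
include_metrics = [kge, teehr.DeterministicMetrics.KlingGuptaEfficiency()]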
# Get the currently available fields to use in the query.
flds = ev.joined_timeseries.field_enum()
metrics_df = ev.metrics.query(
    include_metrics=include_metrics,
    group_by=[flds.primary_location_id],
    order_by=[flds.primary_location_id],
).to_pandas()
metrics_df
For a full list of metrics currently available in TEEHR, see the Available Metrics documentation.
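The bootstrapper in the example takes a seed, a number of replicates, a block size, and a list of quantiles. As a rough illustration of what a circular block bootstrap does with such parameters, the sketch below resamples paired series in contiguous blocks that wrap around the end of the record, recomputes a metric for each replicate, and reports quantiles of the results. It is a simplified sketch, not TEEHR's implementation: all function names are hypothetical, the quantiles use a simple nearest-rank rule, and mean error stands in for the metric.

```python
import random
import statistics


def circular_block_indices(n, block_size, rng):
    """Draw n indices by concatenating blocks that wrap around the series end."""
    indices = []
    while len(indices) < n:
        start = rng.randrange(n)
        indices.extend((start + offset) % n for offset in range(block_size))
    return indices[:n]


def bootstrap_quantiles(observed, simulated, metric, reps, block_size,
                        quantiles, seed=None):
    """Return {quantile: metric value} across block-bootstrap replicates."""
    rng = random.Random(seed)
    n = len(observed)
    values = []
    for _ in range(reps):
        idx = circular_block_indices(n, block_size, rng)
        # Use the same indices for both series so the pairs stay aligned.
        values.append(
            metric([observed[i] for i in idx], [simulated[i] for i in idx])
        )
    values.sort()
    # Nearest-rank quantiles; a simplification chosen for brevity.
    return {q: values[min(reps - 1, int(q * reps))] for q in quantiles}


def mean_error(observed, simulated):
    """Average of (simulated - observed); a stand-in metric for the demo."""
    return statistics.fmean([s - o for o, s in zip(observed, simulated)])


if __name__ == "__main__":
    data_rng = random.Random(0)
    obs = [float(i % 12) for i in range(120)]
    sim = [x + data_rng.gauss(0.5, 1.0) for x in obs]
    print(bootstrap_quantiles(obs, sim, mean_error, reps=500, block_size=10,
                              quantiles=[0.05, 0.95], seed=50))
```

Resampling in blocks rather than single points keeps short-range autocorrelation within each block, which is the usual motivation for block methods on time series.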