Metrics#
TEEHR provides comprehensive metrics for evaluating hydrologic model performance.
The aggregate() method on tables and views computes metrics across grouped data,
with support for bootstrapping, transforms, and multiple metric categories.
Using the Aggregate Method#
The aggregate() method is available on all Table and View objects. It computes
specified metrics grouped by selected fields:
import teehr
from teehr.metrics import DeterministicMetrics
ev = teehr.LocalReadWriteEvaluation(dir_path="/path/to/evaluation")
# Basic metrics query
metrics_df = ev.table("joined_timeseries").aggregate(
    metrics=[
        DeterministicMetrics.KlingGuptaEfficiency(),
        DeterministicMetrics.NashSutcliffeEfficiency(),
    ],
    group_by=["primary_location_id"],
).to_pandas()
Aggregate Parameters#
| Parameter | Description |
|---|---|
| metrics | List of metric instances to compute |
| group_by | List of fields to group by before computing metrics |
Group By Fields#
The group_by parameter controls how metrics are aggregated. Common groupings:
import teehr.models.calculated_fields.row_level as rcf
# Start from the joined timeseries view
jt = ev.joined_timeseries_view()
# Group by location only
jt.aggregate(metrics=[...], group_by=["primary_location_id"])
# Group by location and configuration
jt.aggregate(metrics=[...], group_by=["primary_location_id", "configuration_name"])
# Group by calculated fields
jt = ev.joined_timeseries_view().add_calculated_fields([
    rcf.Month(),
    rcf.WaterYear(),
])
jt.aggregate(metrics=[...], group_by=["primary_location_id", "water_year", "month"])
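To make the effect of group_by concrete, here is a minimal sketch in plain Python. It is not TEEHR's implementation. The rows, the `grouped_rmse` helper, and the choice of RMSE as the metric are all hypothetical and only illustrate that one metric value is produced per unique combination of the grouping fields.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical joined rows; field names mirror those used in the examples above.
fields = ("primary_location_id", "month", "primary_value", "secondary_value")
records = [
    ("a", 1, 1.0, 1.0), ("a", 1, 2.0, 4.0), ("a", 2, 3.0, 3.0), ("a", 2, 4.0, 4.0),
    ("b", 1, 10.0, 13.0), ("b", 1, 20.0, 16.0), ("b", 2, 30.0, 30.0), ("b", 2, 40.0, 40.0),
]
rows = [dict(zip(fields, r)) for r in records]


def grouped_rmse(rows, group_by):
    """Return {group key tuple: RMSE} for each unique combination of group_by fields."""
    errors = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in group_by)
        errors[key].append(row["secondary_value"] - row["primary_value"])
    return {
        key: sqrt(sum(e * e for e in errs) / len(errs))
        for key, errs in errors.items()
    }


# One value per location
by_location = grouped_rmse(rows, ["primary_location_id"])
# One value per location and month
by_location_month = grouped_rmse(rows, ["primary_location_id", "month"])
```

Adding a field to group_by splits each group further, so the second call returns four values where the first returns two.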
Using Metrics#
Import metric classes and instantiate them:
from teehr.metrics import DeterministicMetrics, Signatures, ProbabilisticMetrics
# Deterministic metrics
kge = DeterministicMetrics.KlingGuptaEfficiency()
nse = DeterministicMetrics.NashSutcliffeEfficiency()
rmse = DeterministicMetrics.RootMeanSquareError()
# Signatures (single field statistics)
avg = Signatures.Average()
fdc = Signatures.FlowDurationCurveSlope()
# Probabilistic metrics (ensemble forecasts)
crps = ProbabilisticMetrics.CRPS()
Transforms#
Apply mathematical transformations before computing metrics:
from teehr.models.metrics.basemodels import TransformEnum
# Log-transformed RMSE
rmse = DeterministicMetrics.RootMeanSquareError()
rmse.transform = TransformEnum.log
rmse.add_epsilon = True  # Avoid log(0)
Available transforms: log, sqrt, square, cube, exp, inv, abs
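As an illustration of what a transform does, the sketch below applies a log transform to both series before computing RMSE in plain NumPy. It is not TEEHR's implementation: the `log_rmse` helper and the epsilon value of 1e-6 are assumptions made for this example, and the offset TEEHR actually adds may differ.

```python
import numpy as np


def rmse(primary, secondary):
    """Root mean square error between two equally long series."""
    primary = np.asarray(primary, dtype=float)
    secondary = np.asarray(secondary, dtype=float)
    return float(np.sqrt(np.mean((secondary - primary) ** 2)))


def log_rmse(primary, secondary, add_epsilon=True, epsilon=1e-6):
    """RMSE of log-transformed values; epsilon (assumed value) guards against log(0)."""
    offset = epsilon if add_epsilon else 0.0
    log_primary = np.log(np.asarray(primary, dtype=float) + offset)
    log_secondary = np.log(np.asarray(secondary, dtype=float) + offset)
    return rmse(log_primary, log_secondary)


primary = [1.0, 10.0, 100.0]
secondary = [2.0, 10.0, 50.0]

plain = rmse(primary, secondary)                           # dominated by the high-flow error
logged = log_rmse(primary, secondary, add_epsilon=False)   # low and high flows weigh equally
with_zeros = log_rmse([0.0, 1.0], [0.0, 1.0])              # stays finite thanks to epsilon
```

In this example the factor-of-two errors at low and high flow contribute equally after the log transform, whereas the untransformed RMSE is driven almost entirely by the high-flow value.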
Bootstrapping#
Compute confidence intervals using bootstrap resampling:
from teehr.models.metrics.bootstrap_models import Bootstrappers
# Configure bootstrap
boot = Bootstrappers.CircularBlock(
    reps=1000,
    block_size=365,
    seed=42,
    quantiles=[0.05, 0.5, 0.95],
)
# Apply to metric
kge = DeterministicMetrics.KlingGuptaEfficiency()
kge.bootstrap = boot
kge.unpack_results = True  # Separate columns for quantiles
metrics_df = jt.aggregate(
    metrics=[kge],
    group_by=["primary_location_id"],
).to_pandas()
# Results: kling_gupta_efficiency_0.05, _0.5, _0.95
See also: Bootstrappers
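For readers unfamiliar with the technique, the following is a rough NumPy sketch of a circular block bootstrap, written under stated assumptions and not taken from TEEHR. The helper names (`circular_block_indices`, `bootstrap_quantiles`, `mean_error`) and the synthetic data are hypothetical. The arguments reps, block_size, seed, and quantiles mirror the parameter names above only by analogy.

```python
import numpy as np


def circular_block_indices(n, block_size, rng):
    """Draw n indices from randomly placed blocks that wrap around the series end."""
    n_blocks = int(np.ceil(n / block_size))
    starts = rng.integers(0, n, size=n_blocks)
    offsets = np.arange(block_size)
    # Modulo n makes the series "circular": a block starting near the end wraps to the start.
    idx = (starts[:, None] + offsets[None, :]) % n
    return idx.ravel()[:n]


def bootstrap_quantiles(primary, secondary, metric, reps, block_size, seed, quantiles):
    """Recompute `metric` on resampled pairs and return the requested quantiles."""
    primary = np.asarray(primary, dtype=float)
    secondary = np.asarray(secondary, dtype=float)
    rng = np.random.default_rng(seed)
    values = np.empty(reps)
    for i in range(reps):
        idx = circular_block_indices(len(primary), block_size, rng)
        # The same indices are used for both series so pairs stay aligned.
        values[i] = metric(primary[idx], secondary[idx])
    return dict(zip(quantiles, np.quantile(values, quantiles)))


def mean_error(primary, secondary):
    """Simple stand-in metric for the demonstration."""
    return float(np.mean(secondary - primary))


# Synthetic data, for illustration only
data_rng = np.random.default_rng(0)
obs = data_rng.gamma(2.0, 5.0, size=200)
sim = obs + data_rng.normal(1.0, 2.0, size=200)

result = bootstrap_quantiles(
    obs, sim, mean_error,
    reps=200, block_size=20, seed=42, quantiles=[0.05, 0.5, 0.95],
)
```

Resampling whole blocks rather than individual values preserves short-range autocorrelation within each block, which is why a block size on the order of the dominant cycle (for example one year of daily data) is a common choice.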
Complete Example#
import teehr
from teehr.metrics import DeterministicMetrics, Signatures
import teehr.models.calculated_fields.row_level as rcf
ev = teehr.LocalReadWriteEvaluation(dir_path="/path/to/evaluation")
# Build view with calculated fields
metrics_df = (
    ev.joined_timeseries_view(add_attrs=True)
    .add_calculated_fields([rcf.WaterYear(), rcf.Seasons()])
    .filter("water_year >= 2015")
    .aggregate(
        metrics=[
            DeterministicMetrics.KlingGuptaEfficiency(),
            DeterministicMetrics.RelativeBias(),
            Signatures.Average(),
        ],
        group_by=["primary_location_id", "season"],
    )
    .order_by(["primary_location_id", "season"])
    .to_pandas()
)
print(metrics_df.head())
ev.spark.stop()
Available Metrics#
The metrics currently built into TEEHR are listed in the tables below. Please note that some are still in development and planned for inclusion in future versions.
Signatures#
Signatures operate on a single field to characterize timeseries properties.
| Available | Description | Short Name | Equation | API Reference |
|---|---|---|---|---|
| | Average | \(Average\) | \(\frac{\sum(prim)}{count}\) | |
| | Count | \(Count\) | \(count\) | |
| | Flow Duration Curve Slope | \(FDC\ Slope\) | \(\frac{q85-q25}{p85-p25}\) | |
| | Max Value Time | \(Max\ Value\ Time\) | \(peak\ time_{prim}\) | |
| | Maximum | \(Max\) | \(max(prim)\) | |
| | Minimum | \(Min\) | \(min(prim)\) | |
| | Sum | \(Sum\) | \(\sum(prim)\) | |
| | Variance | \(Variance\) | \(\sigma^2_{prim}\) | |
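The simpler signature equations above can be sketched directly in NumPy. This is an illustrative sketch, not TEEHR's code: the `signatures` helper and the sample data are hypothetical, and the use of population variance (ddof=0) is an assumption, since the table does not say which variance estimator is meant. The flow duration curve slope is left out because the table does not define its quantile terms.

```python
import numpy as np


def signatures(times, prim):
    """Compute single-field summary statistics for one timeseries."""
    prim = np.asarray(prim, dtype=float)
    return {
        "count": int(prim.size),
        "sum": float(prim.sum()),
        "average": float(prim.sum() / prim.size),
        "minimum": float(prim.min()),
        "maximum": float(prim.max()),
        # Assumption: population variance (ddof=0)
        "variance": float(prim.var()),
        # Time stamp at which the maximum value occurs
        "max_value_time": times[int(prim.argmax())],
    }


# Hypothetical daily values
times = ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"]
flows = [2.0, 4.0, 10.0, 4.0]

sig = signatures(times, flows)
```

Each signature uses only the primary values, which is what distinguishes this category from metrics that compare two series.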
Deterministic Metrics#
Deterministic metrics compare two timeseries, typically primary (“observed”) vs. secondary (“modeled”) values.
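As a point of reference, two of the deterministic metrics used in the examples on this page can be written in a few lines of NumPy. The sketch below uses the standard textbook definitions of Nash-Sutcliffe Efficiency and the original (2009) form of Kling-Gupta Efficiency. It is not TEEHR's code, the function names are hypothetical, and TEEHR's implementations may differ in detail.

```python
import numpy as np


def nash_sutcliffe_efficiency(primary, secondary):
    """NSE = 1 - sum((sec - prim)^2) / sum((prim - mean(prim))^2)."""
    primary = np.asarray(primary, dtype=float)
    secondary = np.asarray(secondary, dtype=float)
    numerator = np.sum((secondary - primary) ** 2)
    denominator = np.sum((primary - primary.mean()) ** 2)
    return float(1.0 - numerator / denominator)


def kling_gupta_efficiency(primary, secondary):
    """KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2)."""
    primary = np.asarray(primary, dtype=float)
    secondary = np.asarray(secondary, dtype=float)
    r = np.corrcoef(primary, secondary)[0, 1]    # linear correlation
    alpha = secondary.std() / primary.std()      # variability ratio
    beta = secondary.mean() / primary.mean()     # bias ratio
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))


observed = np.array([1.0, 2.0, 3.0, 4.0])
doubled = 2.0 * observed  # a simulation that is perfectly correlated but biased

nse_perfect = nash_sutcliffe_efficiency(observed, observed)
kge_perfect = kling_gupta_efficiency(observed, observed)
nse_doubled = nash_sutcliffe_efficiency(observed, doubled)
kge_doubled = kling_gupta_efficiency(observed, doubled)
```

Both scores equal 1 for a perfect match. The doubled series shows how they penalize bias differently even when correlation is perfect.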
Probabilistic Metrics#
Probabilistic metrics compare a value against a distribution of predicted values, such as ensemble forecasts.
| Available | Description | Short Name | Equation | API Reference |
|---|---|---|---|---|
| | Continuous Ranked Probability Score | \(CRPS\) | \(\int_{-\infty}^{\infty} (F(x) - \mathbf{1}_{x \geq y})^2 dx\) | |
| | Brier Score | \(BS\) | \(\frac{\sum(sec\ ensemble\ prob-prim\ outcome)^2}{n}\) | |
| | Brier Skill Score | \(BSS\) | \(1-\frac{BS}{BS_{ref}}\) | |
| | Continuous Ranked Probability Skill Score | \(CRPSS\) | \(1-\frac{CRPS}{CRPS_{ref}}\) | |
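The equations above can be illustrated with a short NumPy sketch. It is not TEEHR's implementation, and the function names and sample values are hypothetical. For CRPS it assumes that F is the empirical CDF of the ensemble members, in which case the integral reduces to the mean absolute error of the members minus half the mean absolute difference between all member pairs.

```python
import numpy as np


def crps_ensemble(members, observed):
    """CRPS for one observation, with F taken as the ensemble's empirical CDF."""
    x = np.asarray(members, dtype=float)
    mean_abs_error = np.mean(np.abs(x - observed))
    mean_abs_spread = np.mean(np.abs(x[:, None] - x[None, :]))
    return float(mean_abs_error - 0.5 * mean_abs_spread)


def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probability and the 0/1 outcome."""
    p = np.asarray(probabilities, dtype=float)
    o = np.asarray(outcomes, dtype=float)
    return float(np.mean((p - o) ** 2))


def skill_score(score, reference_score):
    """Generic skill score, 1 - score / reference, as used for BSS and CRPSS."""
    return 1.0 - score / reference_score


crps_two_members = crps_ensemble([1.0, 3.0], observed=2.0)
crps_single_member = crps_ensemble([3.0], observed=5.0)  # reduces to absolute error

bs = brier_score([0.9, 0.2], [1, 0])
bs_ref = brier_score([0.5, 0.5], [1, 0])  # hypothetical reference forecast
bss = skill_score(bs, bs_ref)
```

A skill score of 1 means a perfect forecast, 0 means no improvement over the reference, and negative values mean the forecast is worse than the reference.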