Metrics#

TEEHR provides metrics for evaluating hydrologic model performance. The aggregate() method on tables and views computes metrics over grouped data and supports bootstrapping, transforms, and three metric categories: deterministic metrics, signatures, and probabilistic metrics.

Using the Aggregate Method#

The aggregate() method is available on all Table and View objects. It computes specified metrics grouped by selected fields:

import teehr
from teehr.metrics import DeterministicMetrics

ev = teehr.LocalReadWriteEvaluation(dir_path="/path/to/evaluation")

# Basic metrics query
metrics_df = ev.table("joined_timeseries").aggregate(
    metrics=[
        DeterministicMetrics.KlingGuptaEfficiency(),
        DeterministicMetrics.NashSutcliffeEfficiency(),
    ],
    group_by=["primary_location_id"],
).to_pandas()

Aggregate Parameters#

| Parameter | Description |
| --- | --- |
| metrics | List of metric instances to compute |
| group_by | List of fields to group by before computing metrics |

Group By Fields#

The group_by parameter controls how metrics are aggregated. Common groupings:

import teehr.models.calculated_fields.row_level as rcf

# Start from the joined timeseries view of an open evaluation (ev)
jt = ev.joined_timeseries_view()

# Group by location only
jt.aggregate(metrics=[...], group_by=["primary_location_id"])

# Group by location and configuration
jt.aggregate(metrics=[...], group_by=["primary_location_id", "configuration_name"])

# Group by calculated fields
jt = jt.add_calculated_fields([
    rcf.Month(),
    rcf.WaterYear(),
])
jt.aggregate(metrics=[...], group_by=["primary_location_id", "water_year", "month"])
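The effect of group_by can be pictured in plain Python. The sketch below is conceptual and does not use TEEHR: the rows, the aggregate_by helper, and the choice of RMSE as the metric are hypothetical, chosen only to show how adding a field to the grouping splits the data into more groups, each with its own metric value.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical joined rows:
# (primary_location_id, configuration_name, primary_value, secondary_value)
rows = [
    ("gage-A", "model_1", 10.0, 12.0),
    ("gage-A", "model_1", 20.0, 18.0),
    ("gage-A", "model_2", 10.0, 10.0),
    ("gage-B", "model_1", 5.0, 8.0),
    ("gage-B", "model_1", 7.0, 3.0),
]


def rmse(pairs):
    """Root mean square error over (primary, secondary) pairs."""
    return sqrt(sum((s - p) ** 2 for p, s in pairs) / len(pairs))


def aggregate_by(rows, key_indexes):
    """Group rows by the chosen key columns, then compute RMSE per group."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        groups[key].append((row[2], row[3]))
    return {key: rmse(pairs) for key, pairs in groups.items()}


# One value per location
by_location = aggregate_by(rows, key_indexes=[0])

# One value per (location, configuration) combination
by_location_and_config = aggregate_by(rows, key_indexes=[0, 1])

for key, value in by_location_and_config.items():
    print(key, round(value, 3))
```

Each extra grouping field yields finer groups with fewer values per group, which is worth keeping in mind when a metric needs a reasonable sample size.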

Using Metrics#

Import metric classes and instantiate them:

from teehr.metrics import DeterministicMetrics, Signatures, ProbabilisticMetrics

# Deterministic metrics
kge = DeterministicMetrics.KlingGuptaEfficiency()
nse = DeterministicMetrics.NashSutcliffeEfficiency()
rmse = DeterministicMetrics.RootMeanSquareError()

# Signatures (single field statistics)
avg = Signatures.Average()
fdc = Signatures.FlowDurationCurveSlope()

# Probabilistic metrics (ensemble forecasts)
crps = ProbabilisticMetrics.CRPS()

Transforms#

Apply mathematical transformations before computing metrics:

from teehr.metrics import DeterministicMetrics
from teehr.models.metrics.basemodels import TransformEnum

# Log-transformed RMSE
rmse = DeterministicMetrics.RootMeanSquareError()
rmse.transform = TransformEnum.log
rmse.add_epsilon = True  # Avoid log(0)

Available transforms: log, sqrt, square, cube, exp, inv, abs
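To illustrate what a transform does to a metric, here is a minimal sketch in plain Python, assuming the transform is applied to both series before the metric is computed. The EPSILON value and the log_rmse helper are hypothetical and are not taken from TEEHR.

```python
from math import log, sqrt

EPSILON = 1e-6  # hypothetical offset; not necessarily the value TEEHR uses


def rmse(primary, secondary):
    """Root mean square error between paired values."""
    return sqrt(sum((s - p) ** 2 for p, s in zip(primary, secondary)) / len(primary))


def log_rmse(primary, secondary, add_epsilon=True):
    """RMSE of log-transformed values; the offset guards against log(0)."""
    offset = EPSILON if add_epsilon else 0.0
    log_primary = [log(p + offset) for p in primary]
    log_secondary = [log(s + offset) for s in secondary]
    return rmse(log_primary, log_secondary)


observed = [1.0, 10.0, 100.0]
simulated = [2.0, 10.0, 50.0]

plain = rmse(observed, simulated)             # dominated by the largest flow
transformed = log_rmse(observed, simulated)   # weights relative errors equally

print(round(plain, 3), round(transformed, 3))
```

In this sketch the untransformed RMSE is driven almost entirely by the error at the highest flow, while the log-transformed version treats a factor-of-two error at low and high flows the same. Without the offset, a zero value would raise a ValueError from math.log.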

Bootstrapping#

Compute confidence intervals using bootstrap resampling:

from teehr.metrics import DeterministicMetrics
from teehr.models.metrics.bootstrap_models import Bootstrappers

# Configure bootstrap
boot = Bootstrappers.CircularBlock(
    reps=1000,
    block_size=365,
    seed=42,
    quantiles=[0.05, 0.5, 0.95]
)

# Apply to metric
kge = DeterministicMetrics.KlingGuptaEfficiency()
kge.bootstrap = boot
kge.unpack_results = True  # Separate columns for quantiles

metrics_df = jt.aggregate(
    metrics=[kge],
    group_by=["primary_location_id"],
).to_pandas()

# Results: kling_gupta_efficiency_0.05, _0.5, _0.95

See also: Bootstrappers
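For readers unfamiliar with the technique, the following is a minimal sketch of a circular block bootstrap in plain Python. It is not TEEHR's implementation: the helper names, the synthetic series, and the linear-interpolation quantile are assumptions made for illustration. The parameters mirror the reps, block_size, seed, and quantiles arguments shown above.

```python
import random
from math import sqrt


def rmse(primary, secondary):
    """Root mean square error between paired values."""
    return sqrt(sum((s - p) ** 2 for p, s in zip(primary, secondary)) / len(primary))


def circular_block_indexes(n, block_size, rng):
    """Draw n indexes as fixed-length blocks that wrap past the series end."""
    indexes = []
    while len(indexes) < n:
        start = rng.randrange(n)
        indexes.extend((start + step) % n for step in range(block_size))
    return indexes[:n]


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def bootstrap_quantiles(metric, primary, secondary, reps, block_size, seed, quantiles):
    """Recompute the metric on resampled pairs and summarize by quantile."""
    rng = random.Random(seed)
    n = len(primary)
    estimates = []
    for _ in range(reps):
        idx = circular_block_indexes(n, block_size, rng)
        estimates.append(metric([primary[i] for i in idx], [secondary[i] for i in idx]))
    estimates.sort()
    return {q: quantile(estimates, q) for q in quantiles}


# Synthetic daily series with random model error (illustrative only)
noise = random.Random(0)
primary = [10.0 + (day % 30) for day in range(120)]
secondary = [p + noise.gauss(0.0, 2.0) for p in primary]

intervals = bootstrap_quantiles(
    rmse, primary, secondary,
    reps=200, block_size=10, seed=42, quantiles=[0.05, 0.5, 0.95],
)
print(intervals)
```

Resampling whole blocks rather than single values preserves short-range autocorrelation in the series, and wrapping around the end gives every value the same chance of being selected.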

Complete Example#

import teehr
from teehr.metrics import DeterministicMetrics, Signatures
import teehr.models.calculated_fields.row_level as rcf

ev = teehr.LocalReadWriteEvaluation(dir_path="/path/to/evaluation")

# Build view with calculated fields
metrics_df = (
    ev.joined_timeseries_view(add_attrs=True)
    .add_calculated_fields([rcf.WaterYear(), rcf.Seasons()])
    .filter("water_year >= 2015")
    .aggregate(
        metrics=[
            DeterministicMetrics.KlingGuptaEfficiency(),
            DeterministicMetrics.RelativeBias(),
            Signatures.Average(),
        ],
        group_by=["primary_location_id", "season"],
    )
    .order_by(["primary_location_id", "season"])
    .to_pandas()
)

print(metrics_df.head())
ev.spark.stop()

Available Metrics#

The metrics currently built into TEEHR are listed in the tables below. Please note that some are still in development and planned for inclusion in future versions.

Signatures#

Signatures operate on a single field to characterize timeseries properties.

Available

| Description | Short Name | Equation | API Reference |
| --- | --- | --- | --- |
| Average | \(Average\) | \(\frac{\sum(prim)}{count}\) | Average |
| Count | \(Count\) | \(count\) | Count |
| Flow Duration Curve Slope | \(FDC\ Slope\) | \(\frac{q85-q25}{p85-p25}\) | Flow Duration Curve Slope |
| Max Value Time | \(Max\ Value\ Time\) | \(peak\ time_{prim}\) | Max Value Time |
| Maximum | \(Max\) | \(max(prim)\) | Maximum |
| Minimum | \(Min\) | \(min(prim)\) | Minimum |
| Sum | \(Sum\) | \(\sum(prim)\) | Sum |
| Variance | \(Variance\) | \(\sigma^2_{prim}\) | Variance |
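Several of the signatures above reduce to one-line computations on a single series. The sketch below uses only the Python standard library and a hypothetical list of (value_time, primary_value) pairs. It assumes population variance, which may differ from the convention TEEHR uses.

```python
from statistics import fmean, pvariance

# Hypothetical (value_time, primary_value) pairs
series = [
    ("2020-01-01T00:00", 3.0),
    ("2020-01-01T01:00", 7.0),
    ("2020-01-01T02:00", 5.0),
    ("2020-01-01T03:00", 1.0),
]
values = [value for _, value in series]

signatures = {
    "count": len(values),
    "sum": sum(values),
    "average": fmean(values),
    "minimum": min(values),
    "maximum": max(values),
    # Population variance is an assumption made for this sketch
    "variance": pvariance(values),
    # Timestamp at which the largest value occurs
    "max_value_time": max(series, key=lambda pair: pair[1])[0],
}

for name, value in signatures.items():
    print(name, value)
```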

Deterministic Metrics#

Deterministic metrics compare two timeseries, typically primary (“observed”) vs. secondary (“modeled”) values.
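As a point of reference, the sketch below computes two common deterministic metrics from paired values using their widely used textbook definitions. It is written in plain Python with hypothetical function names, and TEEHR's own implementations may differ in details such as variance conventions or optional scaling factors.

```python
from math import sqrt
from statistics import fmean, pstdev


def nash_sutcliffe_efficiency(primary, secondary):
    """NSE = 1 - sum((sec - prim)^2) / sum((prim - mean(prim))^2)."""
    mean_primary = fmean(primary)
    error = sum((s - p) ** 2 for p, s in zip(primary, secondary))
    spread = sum((p - mean_primary) ** 2 for p in primary)
    return 1.0 - error / spread


def kling_gupta_efficiency(primary, secondary):
    """KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2)."""
    mean_p, mean_s = fmean(primary), fmean(secondary)
    std_p, std_s = pstdev(primary), pstdev(secondary)
    covariance = fmean([(p - mean_p) * (s - mean_s) for p, s in zip(primary, secondary)])
    r = covariance / (std_p * std_s)   # linear correlation
    alpha = std_s / std_p              # variability ratio
    beta = mean_s / mean_p             # bias ratio
    return 1.0 - sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


observed = [1.0, 2.0, 3.0, 4.0]
doubled = [2.0 * value for value in observed]

print(nash_sutcliffe_efficiency(observed, observed))  # perfect match
print(kling_gupta_efficiency(observed, observed))     # perfect match
print(nash_sutcliffe_efficiency(observed, doubled))
print(kling_gupta_efficiency(observed, doubled))
```

Both metrics equal 1 for a perfect match. For the doubled series the correlation is still perfect, so the KGE penalty comes entirely from the variability and bias terms.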

Probabilistic Metrics#

Probabilistic metrics compare a value against a distribution of predicted values, such as ensemble forecasts.

Available

| Description | Short Name | Equation | API Reference |
| --- | --- | --- | --- |
| Continuous Ranked Probability Score | \(CRPS\) | \(\int_{-\infty}^{\infty} (F(x) - \mathbf{1}_{x \geq y})^2 dx\) | Continuous Ranked Probability Score |
| Brier Score | \(BS\) | \(\frac{\sum(sec\ ensemble\ prob-prim\ outcome)^2}{n}\) | Brier Score |
| Brier Skill Score | \(BSS\) | \(1-\frac{BS}{BS_{ref}}\) | Brier Score |
| Continuous Ranked Probability Skill Score | \(CRPSS\) | \(1-\frac{CRPS}{CRPS_{ref}}\) | Continuous Ranked Probability Score |
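The equations above can be evaluated directly for a small ensemble. The sketch below is plain Python with hypothetical helper names. For the CRPS it uses the standard empirical form E|X - y| - 0.5 E|X - X'|, which equals the integral in the table when F is the ensemble's step-function CDF. TEEHR's implementation may use a different estimator.

```python
def crps_ensemble(members, observation):
    """Empirical CRPS for one observation: E|X - y| - 0.5 * E|X - X'|."""
    m = len(members)
    accuracy = sum(abs(x - observation) for x in members) / m
    spread = sum(abs(a - b) for a in members for b in members) / (m * m)
    return accuracy - 0.5 * spread


def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probability and 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)


def skill_score(score, reference_score):
    """Skill relative to a reference forecast: 1 - score / reference."""
    return 1.0 - score / reference_score


# CRPS for a three-member ensemble and an observation of 2.0
crps = crps_ensemble([1.0, 2.0, 3.0], observation=2.0)

# Brier score for three event forecasts against observed outcomes
outcomes = [1, 0, 0]
bs = brier_score([0.9, 0.2, 0.6], outcomes)

# Reference forecast: always predict the observed event frequency
base_rate = sum(outcomes) / len(outcomes)
bs_ref = brier_score([base_rate] * len(outcomes), outcomes)
bss = skill_score(bs, bs_ref)

print(round(crps, 4), round(bs, 4), round(bss, 4))
```

For a single-member ensemble the CRPS reduces to the absolute error, which is one way to sanity-check an implementation. A positive skill score means the forecast outperforms the reference.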