Metrics#

TEEHR provides comprehensive metrics for evaluating hydrologic model performance. The aggregate() method on tables and views computes metrics across grouped data, with support for bootstrapping, transforms, multiple metric categories, and a choice of aggregation engine (Spark-native or Python/pandas-UDF).

Using the Aggregate Method#

The aggregate() method is available on all Table and View objects. It computes specified metrics grouped by selected fields:

import teehr
from teehr.metrics import DeterministicMetrics

ev = teehr.LocalReadWriteEvaluation(dir_path="/path/to/evaluation")

# Basic metrics query
metrics_df = ev.table("joined_timeseries").aggregate(
    metrics=[
        DeterministicMetrics.KlingGuptaEfficiency(),
        DeterministicMetrics.NashSutcliffeEfficiency(),
    ],
    group_by=["primary_location_id"],
).to_pandas()

Aggregate Parameters#

| Parameter | Description |
| --- | --- |
| metrics | List of metric instances to compute |
| group_by | List of fields to group by before computing metrics |
| engine | Aggregation engine: "auto" (default), "spark", or "python" |

Choosing an Engine#

The engine parameter controls how metrics are computed under the hood.

| Engine | Behavior |
| --- | --- |
| "auto" (default) | Routes each metric to the fastest available path. Metrics that have a Spark-native implementation run without pandas UDFs; remaining metrics fall back to the Python/pandas-UDF path. Results are joined before being returned. |
| "spark" | Forces the Spark-native path for every metric. Raises ValueError if any requested metric is not supported natively (e.g. metrics with a transform or bootstrap configured). Produces a physical plan with no AggregateInPandas nodes, which can significantly improve performance on large datasets. |
| "python" | Forces the Python/pandas-UDF path for every metric. This matches how aggregate() behaved before the engine parameter was introduced. |

Metrics supported on the Spark-native path (no transform, no bootstrap):

Signature metrics: Count, Minimum, Maximum, Average, Sum, Variance, MaxValueTime

Deterministic metrics: MeanError, RelativeBias, MultiplicativeBias, MeanSquareError, RootMeanSquareError, MeanAbsoluteError, MeanAbsoluteRelativeError, PearsonCorrelation, SpearmanCorrelation, Rsquared, NashSutcliffeEfficiency, NormalizedNashSutcliffeEfficiency, VariabilityRatio, RootMeanStandardDeviationRatio, KlingGuptaEfficiency, KlingGuptaEfficiencyMod1, KlingGuptaEfficiencyMod2, RelativeMean, RelativeMedian, RelativeMinimum, RelativeMaximum, RelativeStandardDeviation

Note

Metrics that use a transform (e.g. transform="log") or a bootstrap configuration are always routed to the Python path, even in engine="auto" mode.

Note

Spark-native quantile-derived metrics may use Spark approximate quantile algorithms rather than exact order statistics. In the current Spark-native metric set, this mainly affects metrics that depend on medians, such as RelativeMedian. The approximation is computed from a distributed summary of the full group, not from a simple random sample, so it is usually a good tradeoff for large datasets. The main practical effect is that the reported quantile (for example, the median) can differ slightly from an exact pandas result when many values lie close to the quantile boundary. If exact quantile behavior is important for your analysis, use engine="python".

from teehr import DeterministicMetrics

# Explicitly use Spark-native path for a pure-native query
metrics_df = ev.table("joined_timeseries").aggregate(
    metrics=[
        DeterministicMetrics.KlingGuptaEfficiency(),
        DeterministicMetrics.NashSutcliffeEfficiency(),
        DeterministicMetrics.RelativeBias(),
    ],
    group_by=["primary_location_id"],
    engine="spark",
).to_pandas()

# Auto mode – mix native and python metrics transparently
metrics_df = ev.table("joined_timeseries").aggregate(
    metrics=[
        DeterministicMetrics.MeanError(),                  # spark-native
        DeterministicMetrics.MeanError(transform="log"),   # python path (transform)
    ],
    group_by=["primary_location_id"],
).to_pandas()
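
The routing rules described above can be sketched in plain Python. This is a hypothetical illustration of the decision logic only, not TEEHR's internal implementation: MetricSpec, SPARK_NATIVE, is_native, and route are made-up names, and the supported set is truncated for brevity.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, truncated set of natively supported metric names.
SPARK_NATIVE = {"MeanError", "RelativeBias", "KlingGuptaEfficiency"}


@dataclass
class MetricSpec:
    """Hypothetical stand-in for a configured metric instance."""
    name: str
    transform: Optional[str] = None
    bootstrap: Optional[object] = None


def is_native(metric: MetricSpec) -> bool:
    # Native only if supported AND no transform AND no bootstrap is configured.
    return (
        metric.name in SPARK_NATIVE
        and metric.transform is None
        and metric.bootstrap is None
    )


def route(metrics, engine="auto"):
    """Split metrics into (spark_native, python) lists for the given engine."""
    if engine == "python":
        return [], list(metrics)
    native = [m for m in metrics if is_native(m)]
    fallback = [m for m in metrics if not is_native(m)]
    if engine == "spark":
        if fallback:
            names = ", ".join(m.name for m in fallback)
            raise ValueError(f"Not supported on the Spark-native path: {names}")
        return native, []
    if engine == "auto":
        return native, fallback
    raise ValueError(f"Unknown engine: {engine!r}")


requested = [MetricSpec("MeanError"), MetricSpec("MeanError", transform="log")]
native, fallback = route(requested)
print([m.transform for m in native], [m.transform for m in fallback])
```

With engine="auto" the untransformed metric lands in the native list and the log-transformed one in the fallback list; the same request with engine="spark" raises ValueError.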

Group By Fields#

The group_by parameter controls how metrics are aggregated. Common groupings:

import teehr.calculated_fields.models.row_level as rcf

jt = ev.joined_timeseries_view()

# Group by location only
jt.aggregate(metrics=[...], group_by=["primary_location_id"])

# Group by location and configuration
jt.aggregate(metrics=[...], group_by=["primary_location_id", "configuration_name"])

# Group by calculated fields
jt = ev.joined_timeseries_view().add_calculated_fields([
    rcf.Month(),
    rcf.WaterYear(),
])
jt.aggregate(metrics=[...], group_by=["primary_location_id", "water_year", "month"])
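
To make the effect of group_by concrete, here is a minimal pure-Python sketch that computes Mean Error per group. It does not use TEEHR or Spark; the rows, the location and configuration values, and the aggregate_mean_error helper are all hypothetical, and the primary_value/secondary_value column names are assumed for illustration.

```python
from collections import defaultdict

# Hypothetical joined rows: one observed (primary) and one modeled (secondary) value each.
rows = [
    {"primary_location_id": "gage-A", "configuration_name": "model_1",
     "primary_value": 10.0, "secondary_value": 12.0},
    {"primary_location_id": "gage-A", "configuration_name": "model_1",
     "primary_value": 20.0, "secondary_value": 21.0},
    {"primary_location_id": "gage-A", "configuration_name": "model_2",
     "primary_value": 10.0, "secondary_value": 9.0},
    {"primary_location_id": "gage-B", "configuration_name": "model_1",
     "primary_value": 5.0, "secondary_value": 5.0},
]


def aggregate_mean_error(rows, group_by):
    """Return {group key tuple: mean(secondary - primary)} for each group."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in group_by)
        groups[key].append(row["secondary_value"] - row["primary_value"])
    return {key: sum(errs) / len(errs) for key, errs in groups.items()}


by_location = aggregate_mean_error(rows, ["primary_location_id"])
by_both = aggregate_mean_error(rows, ["primary_location_id", "configuration_name"])
print(by_location)
print(by_both)
```

Adding a field to group_by splits each existing group further, so the result has one row per unique combination of the listed fields.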

Using Metrics#

Import metric classes and instantiate them:

from teehr.metrics import DeterministicMetrics, Signatures, ProbabilisticMetrics

# Deterministic metrics
kge = DeterministicMetrics.KlingGuptaEfficiency()
nse = DeterministicMetrics.NashSutcliffeEfficiency()
rmse = DeterministicMetrics.RootMeanSquareError()

# Signatures (single field statistics)
avg = Signatures.Average()
fdc = Signatures.FlowDurationCurveSlope()

# Probabilistic metrics (ensemble forecasts)
crps = ProbabilisticMetrics.CRPS()

Transforms#

Apply mathematical transformations before computing metrics:

from teehr.metrics.models.base import TransformEnum

# Log-transformed RMSE
rmse = DeterministicMetrics.RootMeanSquareError()
rmse.transform = TransformEnum.log
rmse.add_epsilon = True  # Avoid log(0)

Available transforms: log, sqrt, square, cube, exp, inv, abs
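
As a rough illustration of what a log transform does to a metric, the sketch below computes RMSE on raw and on log-transformed values using only the standard library. It assumes the transform is applied to both the primary and secondary series before the metric is evaluated, and the EPSILON value is hypothetical; TEEHR's actual constant and implementation may differ.

```python
import math

EPSILON = 1e-6  # hypothetical value; the constant TEEHR adds may differ


def rmse(primary, secondary):
    """Root mean square error between two equal-length sequences."""
    n = len(primary)
    return math.sqrt(sum((s - p) ** 2 for p, s in zip(primary, secondary)) / n)


def log_rmse(primary, secondary, add_epsilon=True):
    """RMSE computed on log-transformed values (assumed order of operations)."""
    eps = EPSILON if add_epsilon else 0.0
    log_p = [math.log(p + eps) for p in primary]
    log_s = [math.log(s + eps) for s in secondary]
    return rmse(log_p, log_s)


observed = [0.0, 1.0, 10.0, 100.0]
modeled = [0.5, 2.0, 8.0, 120.0]

print(rmse(observed, modeled))      # dominated by the error at the largest flow
print(log_rmse(observed, modeled))  # the zero observation is handled via epsilon
```

Without the epsilon, the zero observation would make math.log raise ValueError, which is the situation add_epsilon is meant to avoid.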

Bootstrapping#

Compute confidence intervals using bootstrap resampling.

For CircularBlock and Stationary bootstrapping, block_size is optional. If it is omitted (or set to None), TEEHR uses arch.bootstrap.optimal_block_length to estimate an optimal block size from the primary metric input series:

from teehr.metrics.models.bootstrap import Bootstrappers

# Configure bootstrap
boot = Bootstrappers.CircularBlock(
    reps=1000,
    # block_size=None -> auto estimate using optimal_block_length (b_cb)
    block_size=None,
    seed=42,
    quantiles=[0.05, 0.5, 0.95]
)

# Optional: provide a fixed block size if desired
# boot = Bootstrappers.CircularBlock(reps=1000, block_size=365, seed=42)

# Apply to metric
kge = DeterministicMetrics.KlingGuptaEfficiency()
kge.bootstrap = boot
kge.unpack_results = True  # Separate columns for quantiles

metrics_df = jt.aggregate(
    metrics=[kge],
    group_by=["primary_location_id"],
).to_pandas()

# Results: kling_gupta_efficiency_0.05, _0.5, _0.95

See also: Bootstrappers
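
For intuition about what a circular block bootstrap does, the following standard-library sketch resamples paired series in wrapped blocks and reports quantiles of the resampled metric. It is an illustration under simplifying assumptions (fixed block size, a simple linear-interpolation quantile), not the arch or TEEHR implementation; all function names are hypothetical.

```python
import random
import statistics


def circular_block_resample(n, block_size, rng):
    """Return n indices built from blocks that wrap around the series end."""
    indices = []
    while len(indices) < n:
        start = rng.randrange(n)
        indices.extend((start + k) % n for k in range(block_size))
    return indices[:n]


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def bootstrap_metric(primary, secondary, metric, reps, block_size, seed, quantiles):
    """Quantiles of `metric` over `reps` circular block resamples."""
    rng = random.Random(seed)
    n = len(primary)
    estimates = []
    for _ in range(reps):
        idx = circular_block_resample(n, block_size, rng)
        # The same indices are used for both series so pairs stay aligned.
        estimates.append(metric([primary[i] for i in idx],
                                [secondary[i] for i in idx]))
    estimates.sort()
    return {q: quantile(estimates, q) for q in quantiles}


def mean_error(primary, secondary):
    return statistics.fmean(s - p for p, s in zip(primary, secondary))


observed = [float(v) for v in range(1, 61)]
modeled = [v + (1.0 if i % 2 else 3.0) for i, v in enumerate(observed)]

result = bootstrap_metric(observed, modeled, mean_error, reps=200,
                          block_size=5, seed=42, quantiles=[0.05, 0.5, 0.95])
print(result)
```

Resampling whole blocks rather than individual points preserves short-range autocorrelation within each block, which is why block size matters for timeseries.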

Complete Example#

import teehr
from teehr.metrics import DeterministicMetrics, Signatures
import teehr.calculated_fields.models.row_level as rcf

ev = teehr.LocalReadWriteEvaluation(dir_path="/path/to/evaluation")

# Build view with calculated fields
metrics_df = (
    ev.joined_timeseries_view(add_attrs=True)
    .add_calculated_fields([rcf.WaterYear(), rcf.Seasons()])
    .filter("water_year >= 2015")
    .aggregate(
        metrics=[
            DeterministicMetrics.KlingGuptaEfficiency(),
            DeterministicMetrics.RelativeBias(),
            Signatures.Average(),
        ],
        group_by=["primary_location_id", "season"],
    )
    .order_by(["primary_location_id", "season"])
    .to_pandas()
)

print(metrics_df.head())
ev.spark.stop()

Available Metrics#

The metrics currently built into TEEHR are listed in the tables below. Please note that some are still in development and planned for inclusion in future versions.

Signatures#

Signatures operate on a single field to characterize timeseries properties.

Available

Description

Short Name

Equation

API Reference

Average

\(Average\)

\(\frac{\sum(prim)}{count}\)

Average

Count

\(Count\)

\(count\)

Count

Flow Duration Curve Slope

\(FDC\ Slope\)

\(\frac{q85-q25}{p85-p25}\)

Flow Duration Curve Slope

Max Value Time

\(Max\ Value\ Time\)

\(peak\ time_{prim}\)

Max Value Time

Maximum

\(Max\)

\(max(prim)\)

Maximum

Minimum

\(Min\)

\(min(prim)\)

Minimum

Sum

\(Sum\)

\(\sum(prim)\)

Sum

Variance

\(Variance\)

\(\sigma^2_{prim}\)

Variance

Deterministic Metrics#

Deterministic metrics compare two timeseries, typically primary (“observed”) vs. secondary (“modeled”) values.

Available

Description

Short Name

Equation

API Reference

Mean Error

\(Mean\ Error\)

\(\frac{\sum(sec-prim)}{count}\)

Mean Error

Relative Bias

\(Relative\ Bias\)

\(\frac{\sum(sec-prim)}{\sum(prim)}\)

Relative Bias

Multiplicative Bias

\(Mult.\ Bias\)

\(\frac{\mu_{sec}}{\mu_{prim}}\)

Multiplicative Bias

Relative Mean

\(RelMean\)

\(\frac{mean(sec)}{mean(prim)}\)

Relative Mean

Relative Median

\(RelMedian\)

\(\frac{median(sec)}{median(prim)}\)

Relative Median

Relative Minimum

\(RelMin\)

\(\frac{min(sec)}{min(prim)}\)

Relative Minimum

Relative Maximum

\(RelMax\)

\(\frac{max(sec)}{max(prim)}\)

Relative Maximum

Relative Standard Deviation

\(RelStd\)

\(\frac{std(sec)}{std(prim)}\)

Relative Standard Deviation

Mean Square Error

\(MSE\)

\(\frac{\sum(sec-prim)^2}{count}\)

Mean Square Error

Root Mean Square Error

\(RMSE\)

\(\sqrt{\frac{\sum(sec-prim)^2}{count}}\)

Root Mean Square Error

Mean Absolute Error

\(MAE\)

\(\frac{\sum|sec-prim|}{count}\)

Mean Absolute Error

Mean Absolute Relative Error

\(Relative\ MAE\)

\(\frac{\sum|sec-prim|}{\sum(prim)}\)

Mean Absolute Relative Error

Pearson Correlation Coefficient

\(r\)

\(r(sec, prim)\)

Pearson Correlation Coefficient

Variability Ratio

\(VR\)

\(\frac{\sigma_{sec}}{\sigma_{prim}}\)

Variability Ratio

Coefficient of Determination

\(r^2\)

\(r(sec, prim)^2\)

Coefficient of Determination

Nash-Sutcliffe Efficiency

\(NSE\)

\(1-\frac{\sum(prim-sec)^2}{\sum(prim-\mu_{prim})^2}\)

Nash-Sutcliffe Efficiency

Normalized Nash-Sutcliffe Efficiency

\(NNSE\)

\(\frac{1}{(2-NSE)}\)

Normalized Nash-Sutcliffe Efficiency

Kling Gupta Efficiency - original

\(KGE\)

\(1-\sqrt{(r(sec, prim)-1)^2+(\frac{\sigma_{sec}}{\sigma_{prim}}-1)^2+(\frac{\mu_{sec}}{\mu_{prim}}-1)^2}\)

Kling Gupta Efficiency - original

Kling Gupta Efficiency - modified 1 (2012)

\(KGE'\)

\(1-\sqrt{(r(sec, prim)-1)^2+(\frac{\sigma_{sec}/\mu_{sec}}{\sigma_{prim}/\mu_{prim}}-1)^2+(\frac{\mu_{sec}}{\mu_{prim}}-1)^2}\)

Kling Gupta Efficiency - modified 1

Kling Gupta Efficiency - modified 2 (2021)

\(KGE''\)

\(1-\sqrt{(r(sec, prim)-1)^2+(\frac{\sigma_{sec}}{\sigma_{prim}}-1)^2+\frac{(\mu_{sec}-\mu_{prim})^2}{\sigma_{prim}^2}}\)

Kling Gupta Efficiency - modified 2

Annual Peak Relative Bias

\(Ann\ PF\ Bias\)

\(\frac{\sum(ann.\ peak_{sec}-ann.\ peak_{prim})}{\sum(ann.\ peak_{prim})}\)

Annual Peak Relative Bias

Spearman Rank Correlation Coefficient

\(r_s\)

\(1-\frac{6*\sum|rank_{prim}-rank_{sec}|^2}{count(count^2-1)}\)

Spearman Rank Correlation Coefficient

Max Value Delta

\(Max\ Val\ Delta\)

\(max(sec) - max(prim)\)

Max Value Delta

Root Mean Standard Deviation Ratio

\(RSR\)

\(\frac{RMSE}{\sigma_{prim}}\)

Root Mean Standard Deviation Ratio

Max Value Time Delta

\(Max\ Val\ Time\ Delta\)

\(time(max(sec)) - time(max(prim))\)

Max Value Time Delta

Coming Soon

Flow Duration Curve Slope Error

\(Slope\ FDC\ Error\)

\(\frac{q66_{sec}-q33_{sec}}{33}-\frac{q66_{prim}-q33_{prim}}{33}\)

N/A

Event Peak Flow Relative Bias

\(Peak\ Bias\)

\(\frac{\sum(peak_{sec}-peak_{prim})}{\sum(peak_{prim})}\)

N/A

Event Peak Flow Timing Error

\(Peak\ Time\ Error\)

\(\frac{\sum(peak\ time_{sec}-peak\ time_{prim})}{count}\)

N/A

Coming Soon

Baseflow Index Error

\(BFI\ Error\)

\(\frac{\frac{\mu(baseflow_{sec})}{\mu(sec)}-\frac{\mu(baseflow_{prim})}{\mu(prim)}}{\frac{\mu(baseflow_{prim})}{\mu(prim)}}\)

N/A

Coming Soon

Rising Limb Density Error

\(RLD\ Error\)

\(\frac{count(rising\ limb\ events_{sec})}{count(rising\ limb\ timesteps_{sec})}-\frac{count(rising\ limb\ events_{prim})}{count(rising\ limb\ timesteps_{prim})}\)

N/A

Coming Soon

Mean Square Error Skill Score (generalized reference)

\(MSESS\)

\(1-\frac{\sum(prim-sec)^2}{\sum(prim-reference)^2}\)

N/A

Coming Soon

Runoff Ratio Error

\(RR\ Error\)

\(\left|\frac{\mu(volume_{sec})}{\mu(precip\ volume)}-\frac{\mu(volume_{prim})}{\mu(precip\ volume)}\right|\)

N/A

Confusion Matrix

\(CM\)

\(TP,\ TN,\ FP,\ FN\)

Confusion Matrix

False Alarm Ratio

\(FAR\)

\(\frac{n_{FP}}{n_{TP}+n_{FP}}\)

False Alarm Ratio

Probability of Detection

\(POD\)

\(\frac{n_{TP}}{n_{TP}+n_{FN}}\)

Probability of Detection

Probability of False Detection

\(POFD\)

\(\frac{n_{FP}}{n_{TN}+n_{FP}}\)

Probability of False Detection

Critical Success Index (Threat Score)

\(CSI\)

\(\frac{n_{TP}}{n_{TP}+n_{FN}+n_{FP}}\)

Critical Success Index

Success Ratio

\(SR\)

\(\frac{n_{TP}+n_{TN}}{n_{TP}+n_{FP}+n_{FN}+n_{TN}}\)

Success Ratio

Frequency Bias Index

\(FBI\)

\(\frac{n_{TP}+n_{FP}}{n_{TP}+n_{FN}}\)

Frequency Bias Index
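
The NSE and original KGE equations in the table above can be checked by hand with a few lines of standard-library Python. This is a minimal sketch of the textbook formulas, not TEEHR's implementation; the function names and sample values are hypothetical.

```python
import math
import statistics


def nse(primary, secondary):
    """Nash-Sutcliffe Efficiency: 1 - sum((prim-sec)^2) / sum((prim-mean(prim))^2)."""
    mu_p = statistics.fmean(primary)
    num = sum((p - s) ** 2 for p, s in zip(primary, secondary))
    den = sum((p - mu_p) ** 2 for p in primary)
    return 1.0 - num / den


def pearson_r(primary, secondary):
    """Pearson correlation coefficient between the two series."""
    mu_p, mu_s = statistics.fmean(primary), statistics.fmean(secondary)
    cov = sum((p - mu_p) * (s - mu_s) for p, s in zip(primary, secondary))
    var_p = sum((p - mu_p) ** 2 for p in primary)
    var_s = sum((s - mu_s) ** 2 for s in secondary)
    return cov / math.sqrt(var_p * var_s)


def kge(primary, secondary):
    """Original Kling-Gupta Efficiency from correlation, variability and bias ratios."""
    r = pearson_r(primary, secondary)
    alpha = statistics.pstdev(secondary) / statistics.pstdev(primary)  # sigma ratio
    beta = statistics.fmean(secondary) / statistics.fmean(primary)     # mean ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


observed = [1.0, 2.0, 3.0, 4.0, 5.0]
doubled = [2.0 * v for v in observed]

print(nse(observed, observed), kge(observed, observed))  # perfect match
print(nse(observed, doubled), kge(observed, doubled))    # perfectly correlated but biased
```

The doubled series is perfectly correlated with the observations, so its KGE penalty comes entirely from the variability and bias terms, each of which equals 2 instead of 1.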

Probabilistic Metrics#

Probabilistic metrics compare a value against a distribution of predicted values, such as ensemble forecasts.

Available

Description

Short Name

Equation

API Reference

Continuous Ranked Probability Score

\(CRPS\)

\(\int_{-\infty}^{\infty} (F(x) - \mathbf{1}_{x \geq y})^2 dx\)

Continuous Ranked Probability Score

Brier Score

\(BS\)

\(\frac{\sum(sec\ ensemble\ prob-prim\ outcome)^2}{n}\)

Brier Score

Brier Skill Score

\(BSS\)

\(1-\frac{BS}{BS_{ref}}\)

Brier Score

Continuous Ranked Probability Skill Score

\(CRPSS\)

\(1-\frac{CRPS}{CRPS_{ref}}\)

Continuous Ranked Probability Score
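
For a finite ensemble, the CRPS integral above can be evaluated exactly when F is taken to be the empirical CDF of the members. The sketch below uses that closed form together with the Brier Score equation from the table. It is an illustration only: TEEHR may use a different CRPS estimator, and the function names and sample values are hypothetical.

```python
def crps_ensemble(members, observation):
    """CRPS of one observation against the empirical CDF of the ensemble members.

    Uses the identity CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j|.
    """
    m = len(members)
    term_obs = sum(abs(x - observation) for x in members) / m
    term_spread = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return term_obs - term_spread


def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probability and 0/1 outcome."""
    n = len(probabilities)
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / n


def skill_score(score, reference_score):
    """Generic skill score, 1 - score/score_ref, as used for BSS and CRPSS."""
    return 1.0 - score / reference_score


crps = crps_ensemble([1.0, 2.0, 3.0], 2.0)
bs = brier_score([0.9, 0.2, 0.6], [1, 0, 0])
print(crps, bs, skill_score(bs, 0.25))
```

For a single-member ensemble the spread term vanishes and the CRPS reduces to the absolute error, which is why CRPS is often described as a generalization of MAE to probabilistic forecasts.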