TEEHR Evaluation Example 2 - Continued#

Daily Data, NWM 3.0 Retrospective and MARRMoT_37 HBV, CAMELS Subset (542)#

Evaluate Model Output#

The prior notebooks were all about setting the stage for evaluation - preparing and joining the data to make the evaluation as easy and efficient as possible. The reality is that evaluation is not remotely easy, particularly when you are dealing with very large datasets, 1000s of locations, many different models to compare, different baselines, different types of variables, different decision objectives, etc. There are a nearly endless number of ways to slice up the data, calculate metrics and visualize comparisons. It is difficult (or impossible) to know in advance which approach will give us the most useful insights to answer a given question (e.g., which model is better for certain conditions and objectives? How does it compare to a particular baseline? Is there a relationship between performance and location characteristics (attributes)?).

In this notebook we will demonstrate how to use TEEHR to calculate metrics from the joined TEEHR database created in Notebook ex2-1, using a range of different options for grouping and filtering. We will then create some common graphics based on the results.

In this notebook we will perform the following steps:#

  1. Review the contents of our joined parquet file
  2. Calculate metrics with different group_by options
  3. Calculate metrics with different filter options
  4. Example visualizations of TEEHR results

First setup the TEEHR class and review the contents of the joined parquet file#

from teehr.classes.duckdb_joined_parquet import DuckDBJoinedParquet
from pathlib import Path

# Define the paths to the joined parquet file and the geometry files
TEEHR_BASE = Path(Path.home(), 'teehr/example-2')
JOINED_FILEPATH = f"{TEEHR_BASE}/joined/teehr_joined.parquet"
GEOMETRY_FILEPATH = f"{TEEHR_BASE}/geometry/**/*.parquet"

# Initialize a teehr joined parquet class with our parquet file and geometry
joined_data = DuckDBJoinedParquet(
    joined_parquet_filepath = JOINED_FILEPATH,
    geometry_filepath = GEOMETRY_FILEPATH
)

1. Review the contents of the joined parquet file#

In practice, you may want to review the fields of data in the parquet file to plan your evaluation strategy. If the dataset is large, reading it into a dataframe may be cumbersome or infeasible. TEEHR provides the get_joined_timeseries_schema method to quickly review the fields of the joined parquet file and the get_unique_field_values method to review the unique values contained in a specified field. The latter is particularly helpful for building dashboards for evaluation (e.g., to populate a drop-down menu of possible filter or group_by values).

joined_data.get_joined_timeseries_schema()
# Review what configuration datasets were included
joined_data.get_unique_field_values('configuration')
# ...number of locations
len(joined_data.get_unique_field_values('primary_location_id'))
# ...what variables were included
joined_data.get_unique_field_values('variable_name')
# ...and what unique attribute values are available for grouping
joined_data.get_unique_field_values('obs_flow_category_q_mean')
# ...and what unique river_forecast_center values are available for grouping
joined_data.get_unique_field_values('river_forecast_center')

2. Calculate metrics with different group_by options#

The get_metrics method for the joined parquet class works the same way as the ‘on-the-fly’ get_metrics function we ran in Notebook 2, except you do not need to specify any filepaths because those were already defined when creating the pre-joined parquet file and when initializing the joined parquet class above.

The arguments for the joined parquet class get_metrics are:

  • group_by -> list of the fields TEEHR should use to group (or 'pool') the data for metric calculations (most often primary_location_id and configuration)
  • order_by -> list of fields on which the resulting table (dataframe) of metrics should be sorted
  • include_metrics -> list of the available metrics to include in the results (see the TEEHR documentation for the full list of available metrics)
  • filters (optional) -> list of filters to extract a subset of the available data (e.g., locations, dates or value ranges)
  • include_geometry (optional) -> whether or not (True or False) the location geometry should be included in the results dataframe
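
Conceptually, the group_by argument pools rows so that each unique combination of the listed fields feeds one metric calculation - much like a pandas groupby. The following is a minimal standalone sketch of that idea with invented toy data (the column names mirror TEEHR's joined schema, but the values are made up):

```python
import pandas as pd

# Toy joined timeseries: invented values; column names mirror TEEHR's joined schema
df = pd.DataFrame({
    "primary_location_id": ["gage-A"] * 4 + ["gage-B"] * 4,
    "configuration": ["nwm30_retro", "marrmot_hbv"] * 4,
    "primary_value":   [10.0, 10.0, 12.0, 12.0, 5.0, 5.0, 8.0, 8.0],
    "secondary_value": [11.0,  9.0, 13.0, 10.0, 6.0, 4.0, 7.0, 9.0],
})

# group_by pools rows: each (location, configuration) group yields one metric value
grouped = df.groupby(["primary_location_id", "configuration"])
relative_bias = (
    (grouped["secondary_value"].sum() - grouped["primary_value"].sum())
    / grouped["primary_value"].sum()
).rename("relative_bias").reset_index()
print(relative_bias)
```

Adding more fields to group_by (e.g., a flow category) splits each location/configuration pool into smaller pools, exactly as in the examples below.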

First, get the metrics for each location and configuration, for all data points available (no filters)#

This is the same query we ran in Notebook 2 using the on-the-fly method. Note the run time difference (~10 s using on-the-fly vs. <1 s using pre-joined). This might not seem significant, but when you scale up (hourly data, many model scenarios, 1000s of locations, etc.) the difference grows substantially. The pre-joined method will allow you to actually explore the data without waiting minutes (or hours) to recalculate metrics in different ways.

%%time

gdf_all = joined_data.get_metrics(
    group_by=["primary_location_id", "configuration"],
    order_by=["primary_location_id", "configuration"],
    include_metrics=[
        'kling_gupta_efficiency_mod2',
        'relative_bias',
        'pearson_correlation',                  
        'nash_sutcliffe_efficiency_normalized',  
        'mean_absolute_relative_error',
        'primary_count' 
    ],
    include_geometry=True,
)
# view the dataframe
gdf_all
# use pandas magic to create a nice summary table of the metrics by model configuration across locations
gdf_all.groupby('configuration').describe(percentiles=[.5]).unstack(1).reset_index().rename(
    columns={'level_0':'metric','level_1':'summary'}).pivot(
    index=['metric','configuration'], values=0, columns='summary')
%%time

'''
Calculate metrics separately for low flows and high flows based on the 
calculated field "obs_flow_category_q_mean" -> add the field to the group_by list.  
'''

gdf_flowcat = joined_data.get_metrics(
    group_by=["primary_location_id", "configuration", "obs_flow_category_q_mean"],
    order_by=["primary_location_id", "configuration"],
    include_metrics=[
        'kling_gupta_efficiency_mod2',
        'pearson_correlation',                  
        'mean_absolute_relative_error',
        'primary_count' 
    ],
)
display(gdf_flowcat)
gdf_flowcat.groupby(['configuration','obs_flow_category_q_mean']).describe(percentiles=[.5]).unstack().unstack().reset_index().rename(
    columns={'level_0':'metric','level_1':'summary'}).pivot(
    index=['metric','obs_flow_category_q_mean','configuration'], values=0, columns='summary')
%%time
'''
Now add the location characteristics you want included in the metrics table
(for output tables and visualization)

To include location-specific attributes in the metrics table, those attributes 
must be added to the group_by list.  If grouping across locations (e.g., all locations 
within an RFC region), you should only add attributes that are already aggregated by that 
same region (TEEHR does not check for this). An example of including location characteristic 
attributes is included below.

Notice we set include_geometry to False in this example so geometry is not included in the
resulting dataframe.

'''
# list the attributes that are location characteristics that you want to include 
# in the metrics results tables
include_location_characteristics = [
    'aridity',
    'runoff_ratio',
    'baseflow_index',
    'stream_order',  
    'q_mean_cms',
    'slope_fdc',  
    'frac_urban',
    'frac_snow',
    'forest_frac',
    'ecoregion_L2',
    'river_forecast_center',
]
df_atts = joined_data.get_metrics(
    group_by=["primary_location_id", "configuration"] + include_location_characteristics,
    order_by=["primary_location_id", "configuration"],
    include_metrics=[
        'kling_gupta_efficiency_mod2',
        'pearson_correlation',                  
        'mean_absolute_relative_error',
        'relative_bias',
        'primary_count' 
    ],
    include_geometry=False,
)

# view the dataframe
display(df_atts)

# summarize just the median results across locations by attribute (river forecast center)
df_atts_summary = df_atts.groupby(['configuration','river_forecast_center'])\
    .describe(percentiles=[.5]).unstack().unstack().reset_index()\
    .rename(columns={'level_0':'metric','level_1':'summary'})
df_atts_summary[df_atts_summary['summary'].isin(['50%'])].pivot(
    index=['river_forecast_center','configuration'],values=0, columns=['metric','summary'])

3. Calculate metrics with different filter options#

The filters argument allows you to calculate metrics for desired subsets of the data. Filters can be combined, resulting in nearly endless possibilities, but some examples could include:

  • A date range to isolate a calibration period and validation period
  • Stream orders < 4 to isolate performance in small streams
  • RFC to get results only for a specific RFC region
  • Aridity greater [less] than a threshold to get results only for arid [humid] locations
  • Specified months to assess performance only for a specified season
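
The bullet examples above translate into filter dictionaries like the following sketch. The keys (column, operator, value) match the usage in the query below; the field names follow the joined schema reviewed earlier, but value_time and the specific thresholds and values here are illustrative assumptions, not recommendations:

```python
# Illustrative filter lists for the examples above
# (value_time and the thresholds/values are assumptions for illustration)
validation_period = [
    {"column": "value_time", "operator": ">=", "value": "1995-10-01"},
    {"column": "value_time", "operator": "<=", "value": "2005-09-30"},
]
small_streams = [{"column": "stream_order", "operator": "in", "value": ["1", "2", "3"]}]
one_rfc = [{"column": "river_forecast_center", "operator": "=", "value": "SERFC"}]
arid_locations = [{"column": "aridity", "operator": ">", "value": 1.5}]
summer = [{"column": "month", "operator": "in", "value": ["6", "7", "8"]}]

# Entries in a single filters list are combined (intersected), e.g. small arid streams:
filters = small_streams + arid_locations
print(filters)
```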

The example below gets metrics results for small streams in the summer in the southeast and creates a KGE’’ map of results. Try different filters of your choice.

%%time

import geoviews as gv
import holoviews as hv
import colorcet as cc
hv.extension('bokeh', logo=False)
gv.extension('bokeh', logo=False)
basemap = hv.element.tiles.CartoLight()

gdf_filters = joined_data.get_metrics(
    group_by=["primary_location_id", "configuration", "stream_order"],
    order_by=["primary_location_id", "configuration"],
    include_metrics=[
        'kling_gupta_efficiency_mod2',
        'relative_bias',
        'pearson_correlation',                  
        'nash_sutcliffe_efficiency_normalized',  
        'mean_absolute_relative_error',
        'primary_count' 
    ],
    filters = [
          {
              "column": "stream_order",
              "operator": "in",
              "value": ['1','2','3','4']
              #"value": ['5','6','7','8']
          },
         # {
         #     "column": "month",
         #     "operator": "in",
         #     "value": ['5','6','7','8','9']
         # },
         # {
         #     "column": "river_forecast_center",
         #     "operator": "=",
         #     "value": "SERFC"
         # },
    ],
    include_geometry=True,
)
#display(gdf_filters.head())

# make a quick map of locations - see how it changes as you make different filter selections
basemap * gv.Points(gdf_filters, vdims=['kling_gupta_efficiency_mod2','configuration']).select(
    configuration='nwm30_retro').opts(
    color='kling_gupta_efficiency_mod2', 
    height=400, width=600, size=7, 
    cmap=cc.rainbow[::-1], colorbar=True, clim=(0,1))

4. Example visualizations of TEEHR evaluation results#

Just as there are many ways to calculate metrics, there are even more ways to visualize the results. TEEHR does not currently include methods to generate plots, though that capability may be added if it becomes clear it would be useful. Instead of building visualization functionality directly into TEEHR, our focus has been on using one of the many available and powerful Python packages for plotting in conjunction with TEEHR. The above map and the examples below are all created using the HoloViz suite of visualization tools (holoviz.org) with the ‘bokeh’ backend, which includes some interactive functionality by default. These examples just barely scratch the surface of what’s feasible with these tools. Holoviews makes it easy to create interactive dashboards to explore the vast range of performance relationships, patterns, and other insights that can be revealed by TEEHR evaluation results like those above. That was too much to fit into the workshop, but we will share examples, and future workshops may explore more visualization methods. We would love feedback about which types of visualization approaches the community sees as top priorities to include in TEEHR directly.

# set up color and abbrevation settings to use across multiple plots

metric_abbrev=dict(
    kling_gupta_efficiency_mod2 = "KGE''",
    mean_absolute_relative_error = "MAE",
    pearson_correlation = "Correlation",
    relative_bias  = "Rel.Bias",
    nash_sutcliffe_efficiency_normalized = "NNSE",
)
cmap_lin = cc.rainbow[::-1]
cmap_div = cc.CET_D1A[::-1]
metric_colors=dict(
    kling_gupta_efficiency_mod2          = {'cmap': cmap_lin, 'clim': (0,1)},  
    relative_bias                        = {'cmap': cmap_div, 'clim': (-1,1)},   
    pearson_correlation                  = {'cmap': cmap_lin, 'clim': (0,1)},     
    nash_sutcliffe_efficiency_normalized = {'cmap': cmap_lin, 'clim': (0,1)}, 
    mean_absolute_relative_error         = {'cmap': cmap_lin, 'clim': (0,2)},
)
metrics = list(metric_colors.keys())
configs = ['nwm30_retro', 'marrmot_hbv']
config_colors = dict(marrmot_hbv='red', nwm30_retro='#377EB8')

4a. Metric maps#

First we will create side-by-side maps of the first query results above (all locations and configurations, no filters), showing metric values at each location, where dots are colored by metric value and sized by sample size. See how the comparison changes for each metric.

# map_metric = 'kling_gupta_efficiency_mod2'
# map_metric = 'pearson_correlation'                  
# map_metric = 'nash_sutcliffe_efficiency_normalized'
# map_metric = 'mean_absolute_relative_error' 
map_metric = 'relative_bias'

# factor to size dots based on sample size 
size_factor = 15 / max(gdf_all['primary_count'])

polys = gv.Points(
    gdf_all, 
    vdims = metrics + ['primary_location_id','configuration','primary_count'],
    label = 'metric value (color), sample size (size)',
).opts(
    height = 400,
    width = 600,
    line_color = 'gray',
    colorbar = True,
    size = hv.dim('primary_count') * size_factor,
    tools = ['hover'],
    xaxis = 'bare',
    yaxis = 'bare',
    show_legend = True
)
maps = []
for config in configs:
    maps.append(basemap * polys.select(configuration=config).opts(
            title=f"{config} | {metric_abbrev[map_metric]}",
            color = map_metric,
            clim = metric_colors[map_metric]['clim'],
            cmap = metric_colors[map_metric]['cmap']
        )
    )
maps[0] + maps[1]

4b. Dataframe table and bar chart side by side#

Next we will summarize results across locations by creating a summary table with pandas (as we did above) and juxtapose it with a bar chart using holoviews and panel.

# Display dataframes and simple plots side by side using Panel
import panel as pn

gdf_summary = gdf_all.groupby('configuration').describe(percentiles=[.5]).unstack(1).reset_index().rename(
    columns={'level_0':'metric','level_1':'summary'}).pivot(
    index=['metric','configuration'], values=0, columns='summary')

gdf_bars = gdf_summary.drop('primary_count', axis=0)['50%'].reset_index().replace({'metric':metric_abbrev})
bars = hv.Bars(gdf_bars, kdims=['metric', 'configuration']).opts(
    xrotation=90, height=400, width=400, ylabel='median',xlabel='')

pn.Row(pn.pane.DataFrame(gdf_summary, width=800), bars)

4c. Box-whisker plots of results by metric and model#

Next we’ll create box-whisker plots to see the distribution of metrics across locations for each metric and configuration.

# remove geometry so holoviews knows this is not a map.
df = gdf_all.drop('geometry', axis=1)

opts = dict(
    show_legend=False, 
    width=200, 
    cmap='Set1', 
    xrotation=45,
    labelled=[]
)
boxplots = []
for metric in metrics:
    boxplots.append(
        hv.BoxWhisker(df, 'configuration', metric, label=metric_abbrev[metric]).opts(
            **opts,
            box_fill_color=hv.dim('configuration')
        )
    )
hv.Layout(boxplots).cols(len(metrics))

4d. Histograms by metric and model#

Every good scientist loves a histogram. The below example creates a layout of histograms by configuration and metric, which gives us a more complete understanding of the metric distributions.

import hvplot.pandas
histograms =[]
for config in configs:
    for metric in metrics:
        histograms.append(
            df[df['configuration']==config].hvplot.hist(
                y=metric, 
                ylim=(0,200),
                color=config_colors[config],
                bin_range=metric_colors[metric]['clim'], 
                xlabel=metric_abbrev[metric],
            ).opts(height = 200, width=250, title = config)
        )
hv.Layout(histograms).cols(len(metrics))

4e. CDFs overlays by metric#

Every good scientist loves a CDF even more. The below example creates a layout of CDF overlays by configuration and metric, which gives us a more complete understanding of the metric distributions. We include only metrics here with (mostly) the same range (0,1) and ‘good’ value (1).
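
The CDF curves below are built by sorting each configuration's metric values and pairing them with evenly spaced cumulative probabilities (the empirical CDF). As a minimal standalone sketch of that construction:

```python
import numpy as np

def ecdf(values):
    """Return (x, y) for an empirical CDF: sorted values vs. cumulative probability."""
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    # y runs from 0 to 1 in n evenly spaced steps (same as arange(n) / (n - 1) below)
    y = np.arange(n) / (n - 1)
    return x, y

x, y = ecdf([0.7, 0.2, 0.9, 0.5])
# x -> [0.2, 0.5, 0.7, 0.9]; y -> [0.0, 1/3, 2/3, 1.0]
```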

import numpy as np

layout = []
for metric in [
    'kling_gupta_efficiency_mod2',
    'pearson_correlation',                  
    'nash_sutcliffe_efficiency_normalized',
]:
    xlim = metric_colors[metric]['clim']
    xlabel = metric_abbrev[metric]
    
    cdfs = hv.Curve([])
    for config in configs:
        data = df[df['configuration']==config].copy()
        data[xlabel] = np.sort(data[metric])
        n = len(data[xlabel])
        data['y'] = 1. * np.arange(n) / (n - 1)    
        cdfs = cdfs * hv.Curve(data, xlabel, 'y', label=config).opts(color=config_colors[config])
        
    layout.append(
        cdfs.opts(
            width = 300,
            legend_position='top_left',
            xlim=xlim, 
            xlabel=xlabel,
            title=metric_abbrev[metric],
            shared_axes=False,
        )
    )
    
hv.Layout(layout).cols(5)

4f. Bar charts by attribute#

In the third example query above, we demonstrated how to add attributes to the resulting dataframe for summary and visualization purposes. In that example we generated a summary table by RFC region. The below example uses those results to build bar charts of the median performance metrics across locations within each RFC region. Note that in the code below MAE is converted to 1-MAE so that values near 1 indicate good performance for every metric shown.

df_bars = df_atts_summary.set_index('metric').drop('primary_count', axis=0).reset_index().set_index('summary').loc['50%']
df_bars = df_bars.replace({'metric': metric_abbrev}) \
    .rename(columns={'river_forecast_center':'rfc',0:'median'}) \
    .reset_index().drop('summary', axis=1)
df_bars.loc[df_bars['metric'] == 'MAE', 'median'] = 1 - df_bars.loc[df_bars['metric'] == 'MAE', 'median']
df_bars = df_bars.replace('MAE','1-MAE')

bars = hv.Bars(df_bars, kdims=['metric', 'configuration','rfc'],vdims=['median']).opts(
    xrotation=90, height=300, width=300, ylabel='median',xlabel='')

layout = []
for rfc in df_bars['rfc'].unique():
    layout.append(bars.select(rfc=rfc).opts(title=rfc))
hv.Layout(layout)

4g. Scatter plots by attribute#

Scatter plots of location metric values and location characteristics can provide insight about the relationship between the two - i.e., does model performance have a clear relationship with any of the characteristics?
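
To put a number on such relationships, one option is a rank (Spearman) correlation between a metric and each attribute. A sketch using only NumPy follows; the arrays are invented toy values (not CAMELS data), and this simple double-argsort ranking does not average tied values:

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks (no tie averaging)."""
    ranks = lambda v: np.argsort(np.argsort(np.asarray(v))).astype(float)
    return np.corrcoef(ranks(a), ranks(b))[0, 1]

# Toy example: KGE decreases monotonically with aridity in this invented sample
aridity = np.array([0.3, 0.8, 1.2, 1.9, 2.5])
kge = np.array([0.85, 0.80, 0.65, 0.40, 0.30])
print(round(spearman(aridity, kge), 2))  # -1.0 (perfectly monotone decreasing)
```

Applying this to df_atts column pairs would complement the visual inspection of the scatter plots below.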

# As examples, let's create scatter plots of KGE with each of the numeric attributes
import pandas as pd

location_chars = [
    'aridity',
    'runoff_ratio',
    'baseflow_index',
    'stream_order',  
    'q_mean_cms',
    'slope_fdc',  
    'frac_urban',
    'frac_snow',
    'forest_frac'
]
df_atts[location_chars] = df_atts[location_chars].apply(pd.to_numeric)
df_atts['config_num'] = np.where(df_atts['configuration']=='nwm30_retro',1,2)

metrics = [
    'kling_gupta_efficiency_mod2',
    'pearson_correlation',                  
    'mean_absolute_relative_error',
    'relative_bias',
]
from bokeh.models import FixedTicker

scatter_layout = []
for char in location_chars:
    scatter_layout.append(
        hv.Scatter(
            df_atts, 
            kdims=[char],
            vdims=['kling_gupta_efficiency_mod2', 'relative_bias', 'primary_location_id','config_num']
        ).opts(
            width = 400, height = 300,
            #color = 'relative_bias',
            color = 'config_num',
            cmap = ['#377EB8', '#E41A1C'],
            colorbar = True,
            clim=(0.5,2.5),
            ylabel = "KGE''",
            tools=['hover'],
            ylim=(-1,1),
            size=4,
            alpha=0.8,
            colorbar_opts={
                'ticker': FixedTicker(ticks=[1,2]),
                'major_label_overrides': {
                    1: 'nwm', 
                    2: 'hbv', 
                },
                'major_label_text_align': 'left',
            },
        ))
hv.Layout(scatter_layout).cols(3)