{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Add Calculated Fields" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview\n", "In this lesson we will clone a small example TEEHR Evaluation from S3 and work through the different ways to add calculated fields to the joined_timeseries table, save them to the joined_timeseries table on disk to persist them, or add them temporarily before calculating metrics. The first few steps for creating and cloning the TEEHR Evaluation should look familiar if you worked through the previous examples." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a new Evaluation\n", "First we will import TEEHR along with some other required libraries for this example. Then we create a new instance of the Evaluation that points to a directory where the evaluation data will be stored." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ " \n", "
\n", " \n", " Loading BokehJS ...\n", "
\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": "'use strict';\n(function(root) {\n function now() {\n return new Date();\n }\n\n const force = true;\n\n if (typeof root._bokeh_onload_callbacks === \"undefined\" || force === true) {\n root._bokeh_onload_callbacks = [];\n root._bokeh_is_loading = undefined;\n }\n\nconst JS_MIME_TYPE = 'application/javascript';\n const HTML_MIME_TYPE = 'text/html';\n const EXEC_MIME_TYPE = 'application/vnd.bokehjs_exec.v0+json';\n const CLASS_NAME = 'output_bokeh rendered_html';\n\n /**\n * Render data to the DOM node\n */\n function render(props, node) {\n const script = document.createElement(\"script\");\n node.appendChild(script);\n }\n\n /**\n * Handle when an output is cleared or removed\n */\n function handleClearOutput(event, handle) {\n function drop(id) {\n const view = Bokeh.index.get_by_id(id)\n if (view != null) {\n view.model.document.clear()\n Bokeh.index.delete(view)\n }\n }\n\n const cell = handle.cell;\n\n const id = cell.output_area._bokeh_element_id;\n const server_id = cell.output_area._bokeh_server_id;\n\n // Clean up Bokeh references\n if (id != null) {\n drop(id)\n }\n\n if (server_id !== undefined) {\n // Clean up Bokeh references\n const cmd_clean = \"from bokeh.io.state import curstate; print(curstate().uuid_to_server['\" + server_id + \"'].get_sessions()[0].document.roots[0]._id)\";\n cell.notebook.kernel.execute(cmd_clean, {\n iopub: {\n output: function(msg) {\n const id = msg.content.text.trim()\n drop(id)\n }\n }\n });\n // Destroy server and session\n const cmd_destroy = \"import bokeh.io.notebook as ion; ion.destroy_server('\" + server_id + \"')\";\n cell.notebook.kernel.execute(cmd_destroy);\n }\n }\n\n /**\n * Handle when a new output is added\n */\n function handleAddOutput(event, handle) {\n const output_area = handle.output_area;\n const output = handle.output;\n\n // limit handleAddOutput to display_data with EXEC_MIME_TYPE content only\n if ((output.output_type != \"display_data\") || (!Object.prototype.hasOwnProperty.call(output.data, EXEC_MIME_TYPE))) {\n return\n }\n\n const toinsert = output_area.element.find(\".\" + CLASS_NAME.split(' ')[0]);\n\n if (output.metadata[EXEC_MIME_TYPE][\"id\"] !== undefined) {\n toinsert[toinsert.length - 1].firstChild.textContent = output.data[JS_MIME_TYPE];\n // store reference to embed id on output_area\n output_area._bokeh_element_id = output.metadata[EXEC_MIME_TYPE][\"id\"];\n }\n if (output.metadata[EXEC_MIME_TYPE][\"server_id\"] !== undefined) {\n const bk_div = document.createElement(\"div\");\n bk_div.innerHTML = output.data[HTML_MIME_TYPE];\n const script_attrs = bk_div.children[0].attributes;\n for (let i = 0; i < script_attrs.length; i++) {\n toinsert[toinsert.length - 1].firstChild.setAttribute(script_attrs[i].name, script_attrs[i].value);\n toinsert[toinsert.length - 1].firstChild.textContent = bk_div.children[0].textContent\n }\n // store reference to server id on output_area\n output_area._bokeh_server_id = output.metadata[EXEC_MIME_TYPE][\"server_id\"];\n }\n }\n\n function register_renderer(events, OutputArea) {\n\n function append_mime(data, metadata, element) {\n // create a DOM node to render to\n const toinsert = this.create_output_subarea(\n metadata,\n CLASS_NAME,\n EXEC_MIME_TYPE\n );\n this.keyboard_manager.register_events(toinsert);\n // Render to node\n const props = {data: data, metadata: metadata[EXEC_MIME_TYPE]};\n render(props, toinsert[toinsert.length - 1]);\n element.append(toinsert);\n return 
toinsert\n }\n\n /* Handle when an output is cleared or removed */\n events.on('clear_output.CodeCell', handleClearOutput);\n events.on('delete.Cell', handleClearOutput);\n\n /* Handle when a new output is added */\n events.on('output_added.OutputArea', handleAddOutput);\n\n /**\n * Register the mime type and append_mime function with output_area\n */\n OutputArea.prototype.register_mime_type(EXEC_MIME_TYPE, append_mime, {\n /* Is output safe? */\n safe: true,\n /* Index of renderer in `output_area.display_order` */\n index: 0\n });\n }\n\n // register the mime type if in Jupyter Notebook environment and previously unregistered\n if (root.Jupyter !== undefined) {\n const events = require('base/js/events');\n const OutputArea = require('notebook/js/outputarea').OutputArea;\n\n if (OutputArea.prototype.mime_types().indexOf(EXEC_MIME_TYPE) == -1) {\n register_renderer(events, OutputArea);\n }\n }\n if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n root._bokeh_timeout = Date.now() + 5000;\n root._bokeh_failed_load = false;\n }\n\n const NB_LOAD_WARNING = {'data': {'text/html':\n \"
\\n\"+\n \"

\\n\"+\n \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n \"

\\n\"+\n \"\\n\"+\n \"\\n\"+\n \"from bokeh.resources import INLINE\\n\"+\n \"output_notebook(resources=INLINE)\\n\"+\n \"\\n\"+\n \"
\"}};\n\n function display_loaded(error = null) {\n const el = document.getElementById(\"bc94028c-c42f-4b78-b7fe-1cda7b24e7ff\");\n if (el != null) {\n const html = (() => {\n if (typeof root.Bokeh === \"undefined\") {\n if (error == null) {\n return \"BokehJS is loading ...\";\n } else {\n return \"BokehJS failed to load.\";\n }\n } else {\n const prefix = `BokehJS ${root.Bokeh.version}`;\n if (error == null) {\n return `${prefix} successfully loaded.`;\n } else {\n return `${prefix} encountered errors while loading and may not function as expected.`;\n }\n }\n })();\n el.innerHTML = html;\n\n if (error != null) {\n const wrapper = document.createElement(\"div\");\n wrapper.style.overflow = \"auto\";\n wrapper.style.height = \"5em\";\n wrapper.style.resize = \"vertical\";\n const content = document.createElement(\"div\");\n content.style.fontFamily = \"monospace\";\n content.style.whiteSpace = \"pre-wrap\";\n content.style.backgroundColor = \"rgb(255, 221, 221)\";\n content.textContent = error.stack ?? error.toString();\n wrapper.append(content);\n el.append(wrapper);\n }\n } else if (Date.now() < root._bokeh_timeout) {\n setTimeout(() => display_loaded(error), 100);\n }\n }\n\n function run_callbacks() {\n try {\n root._bokeh_onload_callbacks.forEach(function(callback) {\n if (callback != null)\n callback();\n });\n } finally {\n delete root._bokeh_onload_callbacks\n }\n console.debug(\"Bokeh: all callbacks have finished\");\n }\n\n function load_libs(css_urls, js_urls, callback) {\n if (css_urls == null) css_urls = [];\n if (js_urls == null) js_urls = [];\n\n root._bokeh_onload_callbacks.push(callback);\n if (root._bokeh_is_loading > 0) {\n console.debug(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n return null;\n }\n if (js_urls == null || js_urls.length === 0) {\n run_callbacks();\n return null;\n }\n console.debug(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n root._bokeh_is_loading = css_urls.length + js_urls.length;\n\n function on_load() {\n root._bokeh_is_loading--;\n if (root._bokeh_is_loading === 0) {\n console.debug(\"Bokeh: all BokehJS libraries/stylesheets loaded\");\n run_callbacks()\n }\n }\n\n function on_error(url) {\n console.error(\"failed to load \" + url);\n }\n\n for (let i = 0; i < css_urls.length; i++) {\n const url = css_urls[i];\n const element = document.createElement(\"link\");\n element.onload = on_load;\n element.onerror = on_error.bind(null, url);\n element.rel = \"stylesheet\";\n element.type = \"text/css\";\n element.href = url;\n console.debug(\"Bokeh: injecting link tag for BokehJS stylesheet: \", url);\n document.body.appendChild(element);\n }\n\n for (let i = 0; i < js_urls.length; i++) {\n const url = js_urls[i];\n const element = document.createElement('script');\n element.onload = on_load;\n element.onerror = on_error.bind(null, url);\n element.async = false;\n element.src = url;\n console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n document.head.appendChild(element);\n }\n };\n\n function inject_raw_css(css) {\n const element = document.createElement(\"style\");\n element.appendChild(document.createTextNode(css));\n document.body.appendChild(element);\n }\n\n const js_urls = [\"https://cdn.bokeh.org/bokeh/release/bokeh-3.6.1.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-gl-3.6.1.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-widgets-3.6.1.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-tables-3.6.1.min.js\", 
\"https://cdn.bokeh.org/bokeh/release/bokeh-mathjax-3.6.1.min.js\"];\n const css_urls = [];\n\n const inline_js = [ function(Bokeh) {\n Bokeh.set_log_level(\"info\");\n },\nfunction(Bokeh) {\n }\n ];\n\n function run_inline_js() {\n if (root.Bokeh !== undefined || force === true) {\n try {\n for (let i = 0; i < inline_js.length; i++) {\n inline_js[i].call(root, root.Bokeh);\n }\n\n } catch (error) {display_loaded(error);throw error;\n }if (force === true) {\n display_loaded();\n }} else if (Date.now() < root._bokeh_timeout) {\n setTimeout(run_inline_js, 100);\n } else if (!root._bokeh_failed_load) {\n console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n root._bokeh_failed_load = true;\n } else if (force !== true) {\n const cell = $(document.getElementById(\"bc94028c-c42f-4b78-b7fe-1cda7b24e7ff\")).parents('.cell').data().cell;\n cell.output_area.append_execute_result(NB_LOAD_WARNING)\n }\n }\n\n if (root._bokeh_is_loading === 0) {\n console.debug(\"Bokeh: BokehJS loaded, going straight to plotting\");\n run_inline_js();\n } else {\n load_libs(css_urls, js_urls, function() {\n console.debug(\"Bokeh: BokehJS plotting callback run at\", now());\n run_inline_js();\n });\n }\n}(window));", "application/vnd.bokehjs_load.v0+json": "" }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import teehr\n", "from pathlib import Path\n", "import shutil\n", "\n", "# Tell Bokeh to output plots in the notebook\n", "from bokeh.io import output_notebook\n", "output_notebook()" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Setting default log level to \"WARN\".\n", "To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).\n", "25/01/06 16:53:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable\n" ] } ], "source": [ "# Define the directory where the Evaluation will be created\n", "test_eval_dir = Path(Path().home(), \"temp\", \"08_calculated_fields\")\n", "shutil.rmtree(test_eval_dir, ignore_errors=True)\n", "\n", "# Create an Evaluation object and create the directory\n", "ev = teehr.Evaluation(dir_path=test_eval_dir, create_dir=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Clone Evaluation Data form S3\n", "As mentioned above, for this exercise we will be cloning a complete Evaluation dataset from the TEEHR S3 bucket. First we will list the available Evaluations and then we will clone the `e0_2_location_example` evaluation which is a small example Evaluation that only contains 2 gages." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namedescriptionurl
0e0_2_location_exampleExample evaluation datsets with 2 USGS gagess3a://ciroh-rti-public-data/teehr-data-warehou...
1e1_camels_daily_streamflowDaily average streamflow at ther Camels basinss3a://ciroh-rti-public-data/teehr-data-warehou...
2e2_camels_hourly_streamflowHourly instantaneous streamflow at ther Camels...s3a://ciroh-rti-public-data/teehr-data-warehou...
3e3_usgs_hourly_streamflowHourly instantaneous streamflow at USGS CONUS ...s3a://ciroh-rti-public-data/teehr-data-warehou...
\n", "
" ], "text/plain": [ " name \\\n", "0 e0_2_location_example \n", "1 e1_camels_daily_streamflow \n", "2 e2_camels_hourly_streamflow \n", "3 e3_usgs_hourly_streamflow \n", "\n", " description \\\n", "0 Example evaluation datsets with 2 USGS gages \n", "1 Daily average streamflow at ther Camels basins \n", "2 Hourly instantaneous streamflow at ther Camels... \n", "3 Hourly instantaneous streamflow at USGS CONUS ... \n", "\n", " url \n", "0 s3a://ciroh-rti-public-data/teehr-data-warehou... \n", "1 s3a://ciroh-rti-public-data/teehr-data-warehou... \n", "2 s3a://ciroh-rti-public-data/teehr-data-warehou... \n", "3 s3a://ciroh-rti-public-data/teehr-data-warehou... " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# List the evaluations in the S3 bucket\n", "ev.list_s3_evaluations()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "25/01/06 16:53:21 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties\n", "25/01/06 16:53:35 WARN SparkStringUtils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.\n", " \r" ] } ], "source": [ "# Clone the e0_2_location_example evaluation from the S3 bucket\n", "ev.clone_from_s3(\"e0_2_location_example\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Add Calculated Fields to Joined Timeseries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we will get right to it. Lets start by re-creating the `joined_timeseries` table and setting the `add_attrs=False` and `execute_udf=False` so that we get just the basic `joined_timeseries` table that joins the primary and secondary timeseries but does not add attributes or user defined fields. This step is not strictly necessary but makes it a bit easier to follow what we are doing in the subsequent steps. Lets take a look at the `joined_timeseries` table and see what fields are included." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+--------------+-------------------+-------------------+---------------------+-------------+---------------+---------+------+-------------------+--------------------+\n", "|reference_time| value_time|primary_location_id|secondary_location_id|primary_value|secondary_value|unit_name|member| configuration_name| variable_name|\n", "+--------------+-------------------+-------------------+---------------------+-------------+---------------+---------+------+-------------------+--------------------+\n", "| NULL|2000-10-01 00:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 00:00:00| usgs-14138800| nwm30-23736071| 3.3413877| 0.06| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 01:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 01:00:00| usgs-14138800| nwm30-23736071| 3.9926753| 0.06| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 02:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 02:00:00| usgs-14138800| nwm30-23736071| 4.445745| 0.06| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 03:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 03:00:00| usgs-14138800| nwm30-23736071| 5.408518| 0.06| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 04:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 04:00:00| usgs-14138800| nwm30-23736071| 5.6067357| 0.06| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 05:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 05:00:00| usgs-14138800| nwm30-23736071| 5.153666| 0.06| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 06:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 06:00:00| usgs-14138800| nwm30-23736071| 4.5590124| 0.06| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 07:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 07:00:00| usgs-14138800| nwm30-23736071| 5.2952504| 0.06| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 08:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 08:00:00| usgs-14138800| nwm30-23736071| 7.730499| 0.06| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 09:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 09:00:00| usgs-14138800| nwm30-23736071| 9.825946| 0.07| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "+--------------+-------------------+-------------------+---------------------+-------------+---------------+---------+------+-------------------+--------------------+\n", "only showing top 20 rows\n", "\n" ] } ], 
"source": [ "ev.joined_timeseries.create(add_attrs=False, execute_scripts=False)\n", "ev.joined_timeseries.to_sdf().show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, so, as shown, the `joined_timeseries` table now only contains the basic fields required without any attributes or user defined fields. Now lets add some fields to the `joined_timeseries` table using the TEEHR calculated fields. Adding fields can be useful for grouping and filtering operations. The TEEHR calculated fields (CF) allow for row level and timeseries aware CFs:\n", "\n", "* Row level CFs are applied to each row in the table based on data that is in one or more existing fields. These are applied per row and are not aware of the data in any other row (e.g., are not aware of any other timeseries values in a \"timeseries\"). This can be used for adding fields such as a field based on the data/time (e.g., month, year, season, etc.) or based on the value field (e.g., normalized flow, log flow, etc.) and many other uses.\n", "* Timeseries aware CFs are aware of ordered groups of data (e.g., a timeseries). This is useful for things such as event detection, base flow separation, and other fields that need to be calculated based on a entire collection of timeseries values. The definition of what creates a unique set of timeseries (i.e., a timeseries) can be specified.\n", "\n", "There are two ways that these CFs can be used: \n", "\n", "* First, they can be used to add the CF to the `joined_timeseries` which can then be persisted by writing to disk. This is useful if the calculation is expected to be needed for multiple different metric calculations.\n", "* Second, they can be used as a pre-processing step in the calculation of metrics.\n", "\n", "These use cases will be demonstrated below. First we will import the CF classes. Normally this would be done at the top of the page, but is done here for demonstration purposes." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# first we need to import the UDF classes\n", "from teehr import RowLevelCalculatedFields as rcf\n", "from teehr import TimeseriesAwareCalculatedFields as tcf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Available Calculated Fields\n", "There are a number of calculated field classes that are included in the TEEHR package. They are:\n", "\n", "Row Level Calculated Fields:\n", "- Month\n", "- Year\n", "- WaterYear\n", "- NormalizedFlow\n", "- Seasons\n", "- ForecastLeadTime\n", "- ThresholdValueExceeded\n", "- DayOfYear\n", "\n", "Timeseries Aware Calculated Fields.\n", "- PercentileEventDetection\n", "\n", "There will be more added over time. If there is one you are particularly interested in, please reach out and let us know." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Add Calculated Field in Memory\n", "Now we will use the Row level CFs to add year, month, water year and season to the `joined_timeseries` table in memory, using the `add_udf_columns()` method on the `joined_timeseries` table, but will not save it to disk yet. Adding the UDFs and displaying the table shows that the new fields were added." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[Stage 73:> (0 + 1) / 1]\r" ] }, { "name": "stdout", "output_type": "stream", "text": [ "+--------------+-------------------+-------------------+---------------------+-------------+---------------+---------+------+-------------------+--------------------+-----+----+----------+------+\n", "|reference_time| value_time|primary_location_id|secondary_location_id|primary_value|secondary_value|unit_name|member| configuration_name| variable_name|month|year|water_year|season|\n", "+--------------+-------------------+-------------------+---------------------+-------------+---------------+---------+------+-------------------+--------------------+-----+----+----------+------+\n", "| NULL|2000-10-01 00:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...| 10|2000| 2001| fall|\n", "| NULL|2000-10-01 00:00:00| usgs-14138800| nwm30-23736071| 3.3413877| 0.06| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...| 10|2000| 2001| fall|\n", "| NULL|2000-10-01 01:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...| 10|2000| 2001| fall|\n", "| NULL|2000-10-01 01:00:00| usgs-14138800| nwm30-23736071| 3.9926753| 0.06| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...| 10|2000| 2001| fall|\n", "| NULL|2000-10-01 02:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...| 10|2000| 2001| fall|\n", "| NULL|2000-10-01 02:00:00| usgs-14138800| nwm30-23736071| 4.445745| 0.06| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...| 10|2000| 2001| fall|\n", "| NULL|2000-10-01 03:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...| 10|2000| 2001| fall|\n", "| NULL|2000-10-01 03:00:00| usgs-14138800| nwm30-23736071| 5.408518| 0.06| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...| 10|2000| 2001| fall|\n", "| NULL|2000-10-01 04:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...| 10|2000| 2001| fall|\n", "| NULL|2000-10-01 04:00:00| usgs-14138800| nwm30-23736071| 5.6067357| 0.06| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...| 10|2000| 2001| fall|\n", "| NULL|2000-10-01 05:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...| 10|2000| 2001| fall|\n", "| NULL|2000-10-01 05:00:00| usgs-14138800| nwm30-23736071| 5.153666| 0.06| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...| 10|2000| 2001| fall|\n", "| NULL|2000-10-01 06:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...| 10|2000| 2001| fall|\n", "| NULL|2000-10-01 06:00:00| usgs-14138800| nwm30-23736071| 4.5590124| 0.06| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...| 10|2000| 2001| fall|\n", "| NULL|2000-10-01 07:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...| 10|2000| 2001| fall|\n", "| NULL|2000-10-01 07:00:00| usgs-14138800| nwm30-23736071| 5.2952504| 0.06| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...| 10|2000| 2001| fall|\n", "| NULL|2000-10-01 08:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...| 10|2000| 2001| fall|\n", "| NULL|2000-10-01 08:00:00| usgs-14138800| nwm30-23736071| 7.730499| 0.06| m^3/s| 
NULL|nwm30_retrospective|streamflow_hourly...| 10|2000| 2001| fall|\n", "| NULL|2000-10-01 09:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...| 10|2000| 2001| fall|\n", "| NULL|2000-10-01 09:00:00| usgs-14138800| nwm30-23736071| 9.825946| 0.07| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...| 10|2000| 2001| fall|\n", "+--------------+-------------------+-------------------+---------------------+-------------+---------------+---------+------+-------------------+--------------------+-----+----+----------+------+\n", "only showing top 20 rows\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ " \r" ] } ], "source": [ "sdf = ev.joined_timeseries.add_calculated_fields([\n", " rcf.Month(),\n", " rcf.Year(),\n", " rcf.WaterYear(),\n", " rcf.Seasons()\n", "]).to_sdf()\n", "sdf.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, if we query the table again, we can see that the additional fields are not there. This is because we did not write the table with the additional fields to disk." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+--------------+-------------------+-------------------+---------------------+-------------+---------------+---------+------+-------------------+--------------------+\n", "|reference_time| value_time|primary_location_id|secondary_location_id|primary_value|secondary_value|unit_name|member| configuration_name| variable_name|\n", "+--------------+-------------------+-------------------+---------------------+-------------+---------------+---------+------+-------------------+--------------------+\n", "| NULL|2000-10-01 00:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 00:00:00| usgs-14138800| nwm30-23736071| 3.3413877| 0.06| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 01:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 01:00:00| usgs-14138800| nwm30-23736071| 3.9926753| 0.06| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 02:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 02:00:00| usgs-14138800| nwm30-23736071| 4.445745| 0.06| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 03:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 03:00:00| usgs-14138800| nwm30-23736071| 5.408518| 0.06| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 04:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 04:00:00| usgs-14138800| nwm30-23736071| 5.6067357| 0.06| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 05:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 05:00:00| usgs-14138800| nwm30-23736071| 5.153666| 0.06| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 06:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 06:00:00| usgs-14138800| nwm30-23736071| 
4.5590124|           0.06|    m^3/s|  NULL|nwm30_retrospective|streamflow_hourly...|\n", "|          NULL|2000-10-01 07:00:00|      usgs-14316700|       nwm30-23894572|    1.1326739|           0.38|    m^3/s|  NULL|nwm30_retrospective|streamflow_hourly...|\n", "|          NULL|2000-10-01 07:00:00|      usgs-14138800|       nwm30-23736071|    5.2952504|           0.06|    m^3/s|  NULL|nwm30_retrospective|streamflow_hourly...|\n", "|          NULL|2000-10-01 08:00:00|      usgs-14316700|       nwm30-23894572|    1.1326739|           0.38|    m^3/s|  NULL|nwm30_retrospective|streamflow_hourly...|\n", "|          NULL|2000-10-01 08:00:00|      usgs-14138800|       nwm30-23736071|     7.730499|           0.06|    m^3/s|  NULL|nwm30_retrospective|streamflow_hourly...|\n", "|          NULL|2000-10-01 09:00:00|      usgs-14316700|       nwm30-23894572|    1.1326739|           0.38|    m^3/s|  NULL|nwm30_retrospective|streamflow_hourly...|\n", "|          NULL|2000-10-01 09:00:00|      usgs-14138800|       nwm30-23736071|     9.825946|           0.07|    m^3/s|  NULL|nwm30_retrospective|streamflow_hourly...|\n", "+--------------+-------------------+-------------------+---------------------+-------------+---------------+---------+------+-------------------+--------------------+\n", "only showing top 20 rows\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/Users/mdenno/.pyenv/versions/3.10.15/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown\n", "  warnings.warn('resource_tracker: There appear to be %d '\n" ] } ], "source": [ "ev.joined_timeseries.to_sdf().show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Add Calculated Field and Persist\n", "Now we will add the CFs again, but this time we will write the table with the additional fields to disk so they are persisted." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ " \r" ] } ], "source": [ "ev.joined_timeseries.add_calculated_fields([\n", "    rcf.Month(),\n", "    rcf.Year(),\n", "    rcf.WaterYear(),\n", "    rcf.Seasons()\n", "]).write()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And query the table again. This time the new fields are there, because they were written to disk and are now part of the `joined_timeseries` table."
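] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(As an optional aside, the persisted columns can now be used like any other field. The next cell sketches a row count per season and water year, using standard PySpark grouping on the Spark DataFrame.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional sketch: persisted CF columns behave like any other field,\n", "# e.g., for grouping with standard PySpark operations.\n", "ev.joined_timeseries.to_sdf().groupBy(\"season\", \"water_year\").count().show()"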
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+--------------+-------------------+-------------------+---------------------+-------------+---------------+---------+------+-----+----+----------+------+-------------------+--------------------+\n", "|reference_time| value_time|primary_location_id|secondary_location_id|primary_value|secondary_value|unit_name|member|month|year|water_year|season| configuration_name| variable_name|\n", "+--------------+-------------------+-------------------+---------------------+-------------+---------------+---------+------+-----+----+----------+------+-------------------+--------------------+\n", "| NULL|2000-10-01 00:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL| 10|2000| 2001| fall|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 00:00:00| usgs-14138800| nwm30-23736071| 3.3413877| 0.06| m^3/s| NULL| 10|2000| 2001| fall|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 01:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL| 10|2000| 2001| fall|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 01:00:00| usgs-14138800| nwm30-23736071| 3.9926753| 0.06| m^3/s| NULL| 10|2000| 2001| fall|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 02:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL| 10|2000| 2001| fall|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 02:00:00| usgs-14138800| nwm30-23736071| 4.445745| 0.06| m^3/s| NULL| 10|2000| 2001| fall|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 03:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL| 10|2000| 2001| fall|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 03:00:00| usgs-14138800| nwm30-23736071| 5.408518| 0.06| m^3/s| NULL| 10|2000| 2001| fall|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 04:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL| 10|2000| 2001| fall|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 04:00:00| usgs-14138800| nwm30-23736071| 5.6067357| 0.06| m^3/s| NULL| 10|2000| 2001| fall|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 05:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL| 10|2000| 2001| fall|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 05:00:00| usgs-14138800| nwm30-23736071| 5.153666| 0.06| m^3/s| NULL| 10|2000| 2001| fall|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 06:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL| 10|2000| 2001| fall|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 06:00:00| usgs-14138800| nwm30-23736071| 4.5590124| 0.06| m^3/s| NULL| 10|2000| 2001| fall|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 07:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL| 10|2000| 2001| fall|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 07:00:00| usgs-14138800| nwm30-23736071| 5.2952504| 0.06| m^3/s| NULL| 10|2000| 2001| fall|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 08:00:00| usgs-14316700| nwm30-23894572| 1.1326739| 0.38| m^3/s| NULL| 10|2000| 2001| fall|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 08:00:00| usgs-14138800| nwm30-23736071| 7.730499| 0.06| m^3/s| NULL| 10|2000| 2001| fall|nwm30_retrospective|streamflow_hourly...|\n", "| NULL|2000-10-01 
09:00:00|      usgs-14316700|       nwm30-23894572|    1.1326739|           0.38|    m^3/s|  NULL|   10|2000|      2001|  fall|nwm30_retrospective|streamflow_hourly...|\n", "|          NULL|2000-10-01 09:00:00|      usgs-14138800|       nwm30-23736071|     9.825946|           0.07|    m^3/s|  NULL|   10|2000|      2001|  fall|nwm30_retrospective|streamflow_hourly...|\n", "+--------------+-------------------+-------------------+---------------------+-------------+---------------+---------+------+-----+----+----------+------+-------------------+--------------------+\n", "only showing top 20 rows\n", "\n" ] } ], "source": [ "ev.joined_timeseries.to_sdf().show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Timeseries Aware Calculated Fields\n", "The timeseries-aware calculated fields behave the same way from the user's perspective, but behind the scenes they perform some extra grouping and sorting to ensure that the field is calculated from an ordered group of timeseries values (i.e., a \"timeseries\"). This is necessary for things like event detection, but it comes at a computational cost, so use them with care, especially on large datasets. Let's try it. This time we will jump right to writing the resulting data frame back to disk to persist it, but you could add the field and display the results without persisting, as we did above." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ " \r" ] } ], "source": [ "ev.joined_timeseries.add_calculated_fields([\n", "    tcf.PercentileEventDetection()\n", "]).write()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And query the `joined_timeseries` table to see the new `event` and `event_id` fields." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+--------------+-------------------+-------------------+---------------------+-------------+---------------+---------+------+-----+----+----------+------+-----+--------+-------------------+--------------------+\n", "|reference_time|         value_time|primary_location_id|secondary_location_id|primary_value|secondary_value|unit_name|member|month|year|water_year|season|event|event_id| configuration_name|       variable_name|\n", "+--------------+-------------------+-------------------+---------------------+-------------+---------------+---------+------+-----+----+----------+------+-----+--------+-------------------+--------------------+\n", "|          NULL|2000-10-01 00:00:00|      usgs-14316700|       nwm30-23894572|    1.1326739|           0.38|    m^3/s|  NULL|   10|2000|      2001|  fall|false|    NULL|nwm30_retrospective|streamflow_hourly...|\n", "|          NULL|2000-10-01 01:00:00|      usgs-14316700|       nwm30-23894572|    1.1326739|           0.38|    m^3/s|  NULL|   10|2000|      2001|  fall|false|    NULL|nwm30_retrospective|streamflow_hourly...|\n", "|          NULL|2000-10-01 02:00:00|      usgs-14316700|       nwm30-23894572|    1.1326739|           0.38|    m^3/s|  NULL|   10|2000|      2001|  fall|false|    NULL|nwm30_retrospective|streamflow_hourly...|\n", "|          NULL|2000-10-01 03:00:00|      usgs-14316700|       nwm30-23894572|    1.1326739|           0.38|    m^3/s|  NULL|   10|2000|      2001|  fall|false|    NULL|nwm30_retrospective|streamflow_hourly...|\n", "|          NULL|2000-10-01 04:00:00|      usgs-14316700|       nwm30-23894572|    1.1326739|           0.38|    m^3/s|  NULL|   10|2000|      2001|  fall|false|    NULL|nwm30_retrospective|streamflow_hourly...|\n", "|          NULL|2000-10-01 05:00:00|      usgs-14316700|       nwm30-23894572|    1.1326739|           0.38|    m^3/s|  NULL|   10|2000|      2001|  fall|false|    NULL|nwm30_retrospective|streamflow_hourly...|\n", "|          NULL|2000-10-01 06:00:00|      usgs-14316700|       nwm30-23894572|    1.1326739|           0.38|    m^3/s|  NULL|   10|2000|      2001|  fall|false|    NULL|nwm30_retrospective|streamflow_hourly...|\n", "|          NULL|2000-10-01 07:00:00|      usgs-14316700|       nwm30-23894572|    1.1326739|           0.38|    m^3/s|  NULL|   10|2000|      2001|  fall|false|    NULL|nwm30_retrospective|streamflow_hourly...|\n", "|          NULL|2000-10-01 08:00:00|      usgs-14316700|       nwm30-23894572|    1.1326739|           0.38|    m^3/s|  NULL|   10|2000|      2001|  fall|false|    NULL|nwm30_retrospective|streamflow_hourly...|\n", "|          NULL|2000-10-01 09:00:00|      usgs-14316700|       nwm30-23894572|    1.1326739|           0.38|    m^3/s|  NULL|   10|2000|      2001|  fall|false|    NULL|nwm30_retrospective|streamflow_hourly...|\n", "|          NULL|2000-10-01 10:00:00|      usgs-14316700|       nwm30-23894572|    1.1326739|           0.38|    m^3/s|  NULL|   10|2000|      2001|  fall|false|    NULL|nwm30_retrospective|streamflow_hourly...|\n", "|          NULL|2000-10-01 11:00:00|      usgs-14316700|       nwm30-23894572|    1.1326739|           0.38|    m^3/s|  NULL|   10|2000|      2001|  fall|false|    NULL|nwm30_retrospective|streamflow_hourly...|\n", "|          NULL|2000-10-01 12:00:00|      usgs-14316700|       nwm30-23894572|    1.1326739|           0.38|    m^3/s|  NULL|   10|2000|      2001|  fall|false|    NULL|nwm30_retrospective|streamflow_hourly...|\n", "|          NULL|2000-10-01 13:00:00|      usgs-14316700|       nwm30-23894572|    1.1326739|           0.38|    m^3/s|  NULL|   10|2000|      2001|  fall|false|    NULL|nwm30_retrospective|streamflow_hourly...|\n", "|          NULL|2000-10-01 14:00:00|      usgs-14316700|       nwm30-23894572|    1.1326739|           0.38|    m^3/s|  NULL|   10|2000|      2001|  fall|false|    NULL|nwm30_retrospective|streamflow_hourly...|\n", "|          NULL|2000-10-01 15:00:00|      usgs-14316700|       nwm30-23894572|    1.1326739|           0.38|    m^3/s|  NULL|   10|2000|      2001|  fall|false|    NULL|nwm30_retrospective|streamflow_hourly...|\n", "|          NULL|2000-10-01 16:00:00|      usgs-14316700|       nwm30-23894572|    1.1326739|           0.38|    m^3/s|  NULL|   10|2000|      2001|  fall|false|    NULL|nwm30_retrospective|streamflow_hourly...|\n", "|          NULL|2000-10-01 17:00:00|      usgs-14316700|       nwm30-23894572|    1.1326739|           0.38|    m^3/s|  NULL|   10|2000|      2001|  fall|false|    NULL|nwm30_retrospective|streamflow_hourly...|\n", "|          NULL|2000-10-01 18:00:00|      usgs-14316700|       nwm30-23894572|    1.1326739|           0.38|    m^3/s|  NULL|   10|2000|      2001|  fall|false|    NULL|nwm30_retrospective|streamflow_hourly...|\n", "|          NULL|2000-10-01 19:00:00|      usgs-14316700|       nwm30-23894572|    1.1326739|           0.38|    m^3/s|  NULL|   10|2000|      2001|  fall|false|    NULL|nwm30_retrospective|streamflow_hourly...|\n", "+--------------+-------------------+-------------------+---------------------+-------------+---------------+---------+------+-----+----+----------+------+-----+--------+-------------------+--------------------+\n", "only showing top 20 rows\n", "\n" ] } ], "source": [ "ev.joined_timeseries.to_sdf().show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the table above you can see the new `event` and `event_id` fields, but it is a bit difficult to see what was done from the table alone. Let's create a plot to see what the new fields mean. In the next few cells we will query the `joined_timeseries` table, filter the data to a single location ('usgs-14138800') and to only the values identified as `event=True`, and then create a plot where we color the points by the new `event_id` field."
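] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(Brief aside: the new fields also work in ordinary Spark queries. For example, the next cell sketches counting the distinct events detected at each location, using standard PySpark operations.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional sketch: count distinct detected events per location using\n", "# standard PySpark operations on the joined_timeseries table.\n", "ev.joined_timeseries.to_sdf().filter(\"event = true\") \\\n", "    .select(\"primary_location_id\", \"event_id\").distinct() \\\n", "    .groupBy(\"primary_location_id\").count().show()"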
] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": "(function(root) {\n function now() {\n return new Date();\n }\n\n const force = true;\n const py_version = '3.6.1'.replace('rc', '-rc.').replace('.dev', '-dev.');\n const reloading = false;\n const Bokeh = root.Bokeh;\n\n // Set a timeout for this load but only if we are not already initializing\n if (typeof (root._bokeh_timeout) === \"undefined\" || (force || !root._bokeh_is_initializing)) {\n root._bokeh_timeout = Date.now() + 5000;\n root._bokeh_failed_load = false;\n }\n\n function run_callbacks() {\n try {\n root._bokeh_onload_callbacks.forEach(function(callback) {\n if (callback != null)\n callback();\n });\n } finally {\n delete root._bokeh_onload_callbacks;\n }\n console.debug(\"Bokeh: all callbacks have finished\");\n }\n\n function load_libs(css_urls, js_urls, js_modules, js_exports, callback) {\n if (css_urls == null) css_urls = [];\n if (js_urls == null) js_urls = [];\n if (js_modules == null) js_modules = [];\n if (js_exports == null) js_exports = {};\n\n root._bokeh_onload_callbacks.push(callback);\n\n if (root._bokeh_is_loading > 0) {\n // Don't load bokeh if it is still initializing\n console.debug(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n return null;\n } else if (js_urls.length === 0 && js_modules.length === 0 && Object.keys(js_exports).length === 0) {\n // There is nothing to load\n run_callbacks();\n return null;\n }\n\n function on_load() {\n root._bokeh_is_loading--;\n if (root._bokeh_is_loading === 0) {\n console.debug(\"Bokeh: all BokehJS libraries/stylesheets loaded\");\n run_callbacks()\n }\n }\n window._bokeh_on_load = on_load\n\n function on_error(e) {\n const src_el = e.srcElement\n console.error(\"failed to load \" + (src_el.href || src_el.src));\n }\n\n const skip = [];\n if (window.requirejs) {\n window.requirejs.config({'packages': {}, 'paths': {}, 'shim': {}});\n root._bokeh_is_loading = css_urls.length + 0;\n } else {\n root._bokeh_is_loading = css_urls.length + js_urls.length + js_modules.length + Object.keys(js_exports).length;\n }\n\n const existing_stylesheets = []\n const links = document.getElementsByTagName('link')\n for (let i = 0; i < links.length; i++) {\n const link = links[i]\n if (link.href != null) {\n existing_stylesheets.push(link.href)\n }\n }\n for (let i = 0; i < css_urls.length; i++) {\n const url = css_urls[i];\n const escaped = encodeURI(url)\n if (existing_stylesheets.indexOf(escaped) !== -1) {\n on_load()\n continue;\n }\n const element = document.createElement(\"link\");\n element.onload = on_load;\n element.onerror = on_error;\n element.rel = \"stylesheet\";\n element.type = \"text/css\";\n element.href = url;\n console.debug(\"Bokeh: injecting link tag for BokehJS stylesheet: \", url);\n document.body.appendChild(element);\n } var existing_scripts = []\n const scripts = document.getElementsByTagName('script')\n for (let i = 0; i < scripts.length; i++) {\n var script = scripts[i]\n if (script.src != null) {\n existing_scripts.push(script.src)\n }\n }\n for (let i = 0; i < js_urls.length; i++) {\n const url = js_urls[i];\n const escaped = encodeURI(url)\n if (skip.indexOf(escaped) !== -1 || existing_scripts.indexOf(escaped) !== -1) {\n if (!window.requirejs) {\n on_load();\n }\n continue;\n }\n const element = document.createElement('script');\n element.onload = on_load;\n element.onerror = on_error;\n 
element.async = false;\n element.src = url;\n console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n document.head.appendChild(element);\n }\n for (let i = 0; i < js_modules.length; i++) {\n const url = js_modules[i];\n const escaped = encodeURI(url)\n if (skip.indexOf(escaped) !== -1 || existing_scripts.indexOf(escaped) !== -1) {\n if (!window.requirejs) {\n on_load();\n }\n continue;\n }\n var element = document.createElement('script');\n element.onload = on_load;\n element.onerror = on_error;\n element.async = false;\n element.src = url;\n element.type = \"module\";\n console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n document.head.appendChild(element);\n }\n for (const name in js_exports) {\n const url = js_exports[name];\n const escaped = encodeURI(url)\n if (skip.indexOf(escaped) >= 0 || root[name] != null) {\n if (!window.requirejs) {\n on_load();\n }\n continue;\n }\n var element = document.createElement('script');\n element.onerror = on_error;\n element.async = false;\n element.type = \"module\";\n console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n element.textContent = `\n import ${name} from \"${url}\"\n window.${name} = ${name}\n window._bokeh_on_load()\n `\n document.head.appendChild(element);\n }\n if (!js_urls.length && !js_modules.length) {\n on_load()\n }\n };\n\n function inject_raw_css(css) {\n const element = document.createElement(\"style\");\n element.appendChild(document.createTextNode(css));\n document.body.appendChild(element);\n }\n\n const js_urls = [\"https://cdn.holoviz.org/panel/1.5.4/dist/bundled/reactiveesm/es-module-shims@^1.10.0/dist/es-module-shims.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-3.6.1.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-gl-3.6.1.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-widgets-3.6.1.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-tables-3.6.1.min.js\", \"https://cdn.holoviz.org/panel/1.5.4/dist/panel.min.js\"];\n const js_modules = [];\n const js_exports = {};\n const css_urls = [];\n const inline_js = [ function(Bokeh) {\n Bokeh.set_log_level(\"info\");\n },\nfunction(Bokeh) {} // ensure no trailing comma for IE\n ];\n\n function run_inline_js() {\n if ((root.Bokeh !== undefined) || (force === true)) {\n for (let i = 0; i < inline_js.length; i++) {\n try {\n inline_js[i].call(root, root.Bokeh);\n } catch(e) {\n if (!reloading) {\n throw e;\n }\n }\n }\n // Cache old bokeh versions\n if (Bokeh != undefined && !reloading) {\n var NewBokeh = root.Bokeh;\n if (Bokeh.versions === undefined) {\n Bokeh.versions = new Map();\n }\n if (NewBokeh.version !== Bokeh.version) {\n Bokeh.versions.set(NewBokeh.version, NewBokeh)\n }\n root.Bokeh = Bokeh;\n }\n } else if (Date.now() < root._bokeh_timeout) {\n setTimeout(run_inline_js, 100);\n } else if (!root._bokeh_failed_load) {\n console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n root._bokeh_failed_load = true;\n }\n root._bokeh_is_initializing = false\n }\n\n function load_or_wait() {\n // Implement a backoff loop that tries to ensure we do not load multiple\n // versions of Bokeh and its dependencies at the same time.\n // In recent versions we use the root._bokeh_is_initializing flag\n // to determine whether there is an ongoing attempt to initialize\n // bokeh, however for backward compatibility we also try to ensure\n // that we do not start loading a newer (Panel>=1.0 and Bokeh>3) version\n // before older versions are fully initialized.\n if 
(root._bokeh_is_initializing && Date.now() > root._bokeh_timeout) {\n // If the timeout and bokeh was not successfully loaded we reset\n // everything and try loading again\n root._bokeh_timeout = Date.now() + 5000;\n root._bokeh_is_initializing = false;\n root._bokeh_onload_callbacks = undefined;\n root._bokeh_is_loading = 0\n console.log(\"Bokeh: BokehJS was loaded multiple times but one version failed to initialize.\");\n load_or_wait();\n } else if (root._bokeh_is_initializing || (typeof root._bokeh_is_initializing === \"undefined\" && root._bokeh_onload_callbacks !== undefined)) {\n setTimeout(load_or_wait, 100);\n } else {\n root._bokeh_is_initializing = true\n root._bokeh_onload_callbacks = []\n const bokeh_loaded = root.Bokeh != null && (root.Bokeh.version === py_version || (root.Bokeh.versions !== undefined && root.Bokeh.versions.has(py_version)));\n if (!reloading && !bokeh_loaded) {\n if (root.Bokeh) {\n root.Bokeh = undefined;\n }\n console.debug(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n }\n load_libs(css_urls, js_urls, js_modules, js_exports, function() {\n console.debug(\"Bokeh: BokehJS plotting callback run at\", now());\n run_inline_js();\n });\n }\n }\n // Give older versions of the autoload script a head-start to ensure\n // they initialize before we start loading newer version.\n setTimeout(load_or_wait, 100)\n}(window));", "application/vnd.holoviews_load.v0+json": "" }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": "\nif ((window.PyViz === undefined) || (window.PyViz instanceof HTMLElement)) {\n window.PyViz = {comms: {}, comm_status:{}, kernels:{}, receivers: {}, plot_index: []}\n}\n\n\n function JupyterCommManager() {\n }\n\n JupyterCommManager.prototype.register_target = function(plot_id, comm_id, msg_handler) {\n if (window.comm_manager || ((window.Jupyter !== undefined) && (Jupyter.notebook.kernel != null))) {\n var comm_manager = window.comm_manager || Jupyter.notebook.kernel.comm_manager;\n comm_manager.register_target(comm_id, function(comm) {\n comm.on_msg(msg_handler);\n });\n } else if ((plot_id in window.PyViz.kernels) && (window.PyViz.kernels[plot_id])) {\n window.PyViz.kernels[plot_id].registerCommTarget(comm_id, function(comm) {\n comm.onMsg = msg_handler;\n });\n } else if (typeof google != 'undefined' && google.colab.kernel != null) {\n google.colab.kernel.comms.registerTarget(comm_id, (comm) => {\n var messages = comm.messages[Symbol.asyncIterator]();\n function processIteratorResult(result) {\n var message = result.value;\n console.log(message)\n var content = {data: message.data, comm_id};\n var buffers = []\n for (var buffer of message.buffers || []) {\n buffers.push(new DataView(buffer))\n }\n var metadata = message.metadata || {};\n var msg = {content, buffers, metadata}\n msg_handler(msg);\n return messages.next().then(processIteratorResult);\n }\n return messages.next().then(processIteratorResult);\n })\n }\n }\n\n JupyterCommManager.prototype.get_client_comm = function(plot_id, comm_id, msg_handler) {\n if (comm_id in window.PyViz.comms) {\n return window.PyViz.comms[comm_id];\n } else if (window.comm_manager || ((window.Jupyter !== undefined) && (Jupyter.notebook.kernel != null))) {\n var comm_manager = window.comm_manager || Jupyter.notebook.kernel.comm_manager;\n var comm = comm_manager.new_comm(comm_id, {}, {}, {}, comm_id);\n if (msg_handler) {\n comm.on_msg(msg_handler);\n }\n } else if ((plot_id in window.PyViz.kernels) && (window.PyViz.kernels[plot_id])) {\n var 
comm = window.PyViz.kernels[plot_id].connectToComm(comm_id);\n comm.open();\n if (msg_handler) {\n comm.onMsg = msg_handler;\n }\n } else if (typeof google != 'undefined' && google.colab.kernel != null) {\n var comm_promise = google.colab.kernel.comms.open(comm_id)\n comm_promise.then((comm) => {\n window.PyViz.comms[comm_id] = comm;\n if (msg_handler) {\n var messages = comm.messages[Symbol.asyncIterator]();\n function processIteratorResult(result) {\n var message = result.value;\n var content = {data: message.data};\n var metadata = message.metadata || {comm_id};\n var msg = {content, metadata}\n msg_handler(msg);\n return messages.next().then(processIteratorResult);\n }\n return messages.next().then(processIteratorResult);\n }\n }) \n var sendClosure = (data, metadata, buffers, disposeOnDone) => {\n return comm_promise.then((comm) => {\n comm.send(data, metadata, buffers, disposeOnDone);\n });\n };\n var comm = {\n send: sendClosure\n };\n }\n window.PyViz.comms[comm_id] = comm;\n return comm;\n }\n window.PyViz.comm_manager = new JupyterCommManager();\n \n\n\nvar JS_MIME_TYPE = 'application/javascript';\nvar HTML_MIME_TYPE = 'text/html';\nvar EXEC_MIME_TYPE = 'application/vnd.holoviews_exec.v0+json';\nvar CLASS_NAME = 'output';\n\n/**\n * Render data to the DOM node\n */\nfunction render(props, node) {\n var div = document.createElement(\"div\");\n var script = document.createElement(\"script\");\n node.appendChild(div);\n node.appendChild(script);\n}\n\n/**\n * Handle when a new output is added\n */\nfunction handle_add_output(event, handle) {\n var output_area = handle.output_area;\n var output = handle.output;\n if ((output.data == undefined) || (!output.data.hasOwnProperty(EXEC_MIME_TYPE))) {\n return\n }\n var id = output.metadata[EXEC_MIME_TYPE][\"id\"];\n var toinsert = output_area.element.find(\".\" + CLASS_NAME.split(' ')[0]);\n if (id !== undefined) {\n var nchildren = toinsert.length;\n var html_node = toinsert[nchildren-1].children[0];\n html_node.innerHTML = output.data[HTML_MIME_TYPE];\n var scripts = [];\n var nodelist = html_node.querySelectorAll(\"script\");\n for (var i in nodelist) {\n if (nodelist.hasOwnProperty(i)) {\n scripts.push(nodelist[i])\n }\n }\n\n scripts.forEach( function (oldScript) {\n var newScript = document.createElement(\"script\");\n var attrs = [];\n var nodemap = oldScript.attributes;\n for (var j in nodemap) {\n if (nodemap.hasOwnProperty(j)) {\n attrs.push(nodemap[j])\n }\n }\n attrs.forEach(function(attr) { newScript.setAttribute(attr.name, attr.value) });\n newScript.appendChild(document.createTextNode(oldScript.innerHTML));\n oldScript.parentNode.replaceChild(newScript, oldScript);\n });\n if (JS_MIME_TYPE in output.data) {\n toinsert[nchildren-1].children[1].textContent = output.data[JS_MIME_TYPE];\n }\n output_area._hv_plot_id = id;\n if ((window.Bokeh !== undefined) && (id in Bokeh.index)) {\n window.PyViz.plot_index[id] = Bokeh.index[id];\n } else {\n window.PyViz.plot_index[id] = null;\n }\n } else if (output.metadata[EXEC_MIME_TYPE][\"server_id\"] !== undefined) {\n var bk_div = document.createElement(\"div\");\n bk_div.innerHTML = output.data[HTML_MIME_TYPE];\n var script_attrs = bk_div.children[0].attributes;\n for (var i = 0; i < script_attrs.length; i++) {\n toinsert[toinsert.length - 1].childNodes[1].setAttribute(script_attrs[i].name, script_attrs[i].value);\n }\n // store reference to server id on output_area\n output_area._bokeh_server_id = output.metadata[EXEC_MIME_TYPE][\"server_id\"];\n }\n}\n\n/**\n * Handle when an output 
is cleared or removed\n */\nfunction handle_clear_output(event, handle) {\n var id = handle.cell.output_area._hv_plot_id;\n var server_id = handle.cell.output_area._bokeh_server_id;\n if (((id === undefined) || !(id in PyViz.plot_index)) && (server_id !== undefined)) { return; }\n var comm = window.PyViz.comm_manager.get_client_comm(\"hv-extension-comm\", \"hv-extension-comm\", function () {});\n if (server_id !== null) {\n comm.send({event_type: 'server_delete', 'id': server_id});\n return;\n } else if (comm !== null) {\n comm.send({event_type: 'delete', 'id': id});\n }\n delete PyViz.plot_index[id];\n if ((window.Bokeh !== undefined) & (id in window.Bokeh.index)) {\n var doc = window.Bokeh.index[id].model.document\n doc.clear();\n const i = window.Bokeh.documents.indexOf(doc);\n if (i > -1) {\n window.Bokeh.documents.splice(i, 1);\n }\n }\n}\n\n/**\n * Handle kernel restart event\n */\nfunction handle_kernel_cleanup(event, handle) {\n delete PyViz.comms[\"hv-extension-comm\"];\n window.PyViz.plot_index = {}\n}\n\n/**\n * Handle update_display_data messages\n */\nfunction handle_update_output(event, handle) {\n handle_clear_output(event, {cell: {output_area: handle.output_area}})\n handle_add_output(event, handle)\n}\n\nfunction register_renderer(events, OutputArea) {\n function append_mime(data, metadata, element) {\n // create a DOM node to render to\n var toinsert = this.create_output_subarea(\n metadata,\n CLASS_NAME,\n EXEC_MIME_TYPE\n );\n this.keyboard_manager.register_events(toinsert);\n // Render to node\n var props = {data: data, metadata: metadata[EXEC_MIME_TYPE]};\n render(props, toinsert[0]);\n element.append(toinsert);\n return toinsert\n }\n\n events.on('output_added.OutputArea', handle_add_output);\n events.on('output_updated.OutputArea', handle_update_output);\n events.on('clear_output.CodeCell', handle_clear_output);\n events.on('delete.Cell', handle_clear_output);\n events.on('kernel_ready.Kernel', handle_kernel_cleanup);\n\n OutputArea.prototype.register_mime_type(EXEC_MIME_TYPE, append_mime, {\n safe: true,\n index: 0\n });\n}\n\nif (window.Jupyter !== undefined) {\n try {\n var events = require('base/js/events');\n var OutputArea = require('notebook/js/outputarea').OutputArea;\n if (OutputArea.prototype.mime_types().indexOf(EXEC_MIME_TYPE) == -1) {\n register_renderer(events, OutputArea);\n }\n } catch(err) {\n }\n}\n", "application/vnd.holoviews_load.v0+json": "" }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.holoviews_exec.v0+json": "", "text/html": [ "
\n", "
\n", "
\n", "" ] }, "metadata": { "application/vnd.holoviews_exec.v0+json": { "id": "fcbbcab4-04b6-4340-b105-75ba82ce3266" } }, "output_type": "display_data" } ], "source": [ "import hvplot.pandas # noqa" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "pdf = ev.joined_timeseries.filter([\n", " \"primary_location_id = 'usgs-14138800'\",\n", " \"event = true\",\n", "]).to_pandas()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "primary_plot = pdf.hvplot.points(x=\"value_time\", y=\"primary_value\", color=\"event_id\") #.opts(width=1200, height=400)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": {}, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.holoviews_exec.v0+json": "", "text/html": [ "
\n", "
\n", "
\n", "" ], "text/plain": [ ":Points [value_time,primary_value] (event_id)" ] }, "execution_count": 16, "metadata": { "application/vnd.holoviews_exec.v0+json": { "id": "19daf080-733d-4519-82be-e92fcdc694f8" } }, "output_type": "execute_result" } ], "source": [ "primary_plot.opts(width=1200, height=400)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you zoom in on the plot above, you can see that each unique `event` has been given a unique `event_id` that was used to color the individual data points. This was done by first identifying the 85 percentile value at each location and identifying each value as either above or below that value for its location. Then the values were grouped by continuous sets of values that were identified as `event=True` and each continuous group was given an `event_id`. This is just one example of what can be done using `TimeseriesAwareCalculatedFields`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use Calculated Fields in Metrics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Beyond just identifying high flow events (in this example), the `event` and `event_id` fields can be used in subsequent metrics calculations. In the next few cells we will demonstrate one way that the `event` and `event_id` fields can be used by working through the steps to calculate the `event_max_relative_bias`. The `event_max_relative_bias` is the relative bias between the primary and secondary timeseries maximum values within each event. We calculate this in twos steps using chained queries. We will do it in two steps to demonstrate what we are doing. Note, if you have not worked though the grouping and filter notebooks you may want to go back and do that first as it is an important concept to understanding what is being done here. First we run a metrics query where we filter to a single location and only values that were identified as being `event=True`, group by `configuration_name`, `primary_location_id` and `event_id`, and calculate the maximum primary and secondary values which we call `max_primary_value` and `max_secondary_value` but could give them any name we wanted. 
" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[Stage 104:> (0 + 1) / 1]\r" ] }, { "name": "stdout", "output_type": "stream", "text": [ "+-------------------+-------------------+--------------------+-----------------+-------------------+\n", "| configuration_name|primary_location_id| event_id|max_primary_value|max_secondary_value|\n", "+-------------------+-------------------+--------------------+-----------------+-------------------+\n", "|nwm30_retrospective| usgs-14138800|2000-10-01 00:00:...| 11.015253| 0.07|\n", "|nwm30_retrospective| usgs-14138800|2000-10-21 03:00:...| 5.153666| 0.96|\n", "|nwm30_retrospective| usgs-14138800|2000-11-08 15:00:...| 3.68119| 1.43|\n", "|nwm30_retrospective| usgs-14138800|2000-11-27 09:00:...| 7.5039644| 1.24|\n", "|nwm30_retrospective| usgs-14138800|2000-12-15 03:00:...| 2.8600016| 0.64|\n", "|nwm30_retrospective| usgs-14138800|2000-12-15 07:00:...| 3.68119| 0.64|\n", "|nwm30_retrospective| usgs-14138800|2000-12-17 04:00:...| 4.049309| 1.24|\n", "|nwm30_retrospective| usgs-14138800|2000-12-22 23:00:...| 5.5501018| 1.1|\n", "|nwm30_retrospective| usgs-14138800|2001-01-05 15:00:...| 3.3413877| 2.47|\n", "|nwm30_retrospective| usgs-14138800|2001-02-05 01:00:...| 7.2207956| 8.19|\n", "|nwm30_retrospective| usgs-14138800|2001-03-18 14:00:...| 12.884165| 11.99|\n", "|nwm30_retrospective| usgs-14138800|2001-03-25 12:00:...| 3.3413877| 4.33|\n", "|nwm30_retrospective| usgs-14138800|2001-03-25 22:00:...| 3.1148531| 5.66|\n", "|nwm30_retrospective| usgs-14138800|2001-03-26 22:00:...| 2.8600016| 4.0|\n", "|nwm30_retrospective| usgs-14138800|2001-03-28 01:00:...| 7.305746| 7.3399997|\n", "|nwm30_retrospective| usgs-14138800|2001-03-31 17:00:...| 8.070301| 4.89|\n", "|nwm30_retrospective| usgs-14138800|2001-04-25 23:00:...| 3.879408| 1.4699999|\n", "|nwm30_retrospective| usgs-14138800|2001-04-30 12:00:...| 14.016839| 8.44|\n", "|nwm30_retrospective| usgs-14138800|2001-05-14 22:00:...| 9.42951| 1.4399999|\n", "|nwm30_retrospective| usgs-14138800|2001-06-03 15:00:...| 2.8600016| 0.71999997|\n", "+-------------------+-------------------+--------------------+-----------------+-------------------+\n", "only showing top 20 rows\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ " \r" ] } ], "source": [ "(\n", " ev.metrics\n", " .query(\n", " group_by=[\"configuration_name\", \"primary_location_id\", \"event_id\"],\n", " filters=[\n", " \"primary_location_id = 'usgs-14138800'\",\n", " \"event = true\",\n", " ],\n", " include_metrics=[\n", " teehr.SignatureMetrics.Maximum(\n", " input_field_names=[\"primary_value\"],\n", " output_field_name=\"max_primary_value\"\n", " ),\n", " teehr.SignatureMetrics.Maximum(\n", " input_field_names=[\"secondary_value\"],\n", " output_field_name=\"max_secondary_value\"\n", " )\n", " ]\n", " )\n", " .to_sdf().show()\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can see that this gives us the `max_primary_value` and `max_secondary_value` for each unique group of `configuration_name`, `primary_location_id` and `event_id`. But, that is not what we are actually trying to calculate. We are really trying to calculate the `event_max_relative_bias`. To do that we have to add one more step by chaining together queries. 
In the following query we add a second, chained query that groups only by `configuration_name` and `primary_location_id`, which aggregates the per-event maximum values across events; as the aggregation method (`include_metrics`) we choose relative bias." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/mdenno/.pyenv/versions/3.10.15/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown\n", "  warnings.warn('resource_tracker: There appear to be %d '\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "+-------------------+-------------------+-----------------------+\n", "| configuration_name|primary_location_id|event_max_relative_bias|\n", "+-------------------+-------------------+-----------------------+\n", "|nwm30_retrospective|      usgs-14138800|            -0.08458245|\n", "+-------------------+-------------------+-----------------------+\n", "\n" ] } ], "source": [ "(\n", "    ev.metrics\n", "    .query(\n", "        group_by=[\"configuration_name\", \"primary_location_id\", \"event_id\"],\n", "        filters=[\n", "            \"primary_location_id = 'usgs-14138800'\",\n", "            \"event = true\",\n", "        ],\n", "        include_metrics=[\n", "            teehr.SignatureMetrics.Maximum(\n", "                input_field_names=[\"primary_value\"],\n", "                output_field_name=\"max_primary_value\"\n", "            ),\n", "            teehr.SignatureMetrics.Maximum(\n", "                input_field_names=[\"secondary_value\"],\n", "                output_field_name=\"max_secondary_value\"\n", "            )\n", "        ]\n", "    )\n", "    .query(\n", "        group_by=[\"configuration_name\", \"primary_location_id\"],\n", "        include_metrics=[\n", "            teehr.DeterministicMetrics.RelativeBias(\n", "                input_field_names=[\"max_primary_value\", \"max_secondary_value\"],\n", "                output_field_name=\"event_max_relative_bias\"\n", "            )\n", "        ]\n", "    )\n", "    .to_sdf().show()\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One last thing to cover here. So far we have added the calculated fields to the `joined_timeseries` table, written them to disk, and then queried the `joined_timeseries` table to calculate metrics. This works well and allows the calculated fields to be computed once and used in many subsequent metrics, plots, etc. However, you may instead wish to add temporary fields as part of the metrics calculation itself. This can be done too. Building on the previous example, where we calculated the \"event_max_relative_bias\", let's now assume we want to calculate the same metric but for the 90th percentile instead of the default 85th percentile that we used when we added the `event` and `event_id` fields to the `joined_timeseries` table. We could add the new \"90th percentile event\" fields to the `joined_timeseries` table, save them to disk, and proceed as we did before, or we can add new `event90` and `event90_id` fields to the data frame temporarily before calculating the maximum event values and, ultimately, the \"event_90th_max_relative_bias\". A sketch of the persistent alternative is shown below, followed by the temporary approach." 
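] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next cell sketches the persistent alternative. It is left commented out so it is not executed here, and it assumes the same `add_calculated_fields` and `write` pattern used earlier in this notebook to persist fields to the `joined_timeseries` table, with `tcf` being the `TimeseriesAwareCalculatedFields` alias imported earlier." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch (assumption): persist the 90th-percentile event fields to disk\n", "# instead of adding them temporarily, following the pattern used earlier.\n", "# ev.joined_timeseries.add_calculated_fields([\n", "#     tcf.PercentileEventDetection(\n", "#         quantile=0.90,\n", "#         output_event_field_name=\"event90\",\n", "#         output_event_id_field_name=\"event90_id\"\n", "#     )\n", "# ])\n", "# ev.joined_timeseries.write()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we take the temporary approach instead, chaining everything into a single metrics query."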
] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[Stage 129:>                                                        (0 + 1) / 1]\r" ] }, { "name": "stdout", "output_type": "stream", "text": [ "+-------------------+-------------------+----------------------------+\n", "| configuration_name|primary_location_id|event_90th_max_relative_bias|\n", "+-------------------+-------------------+----------------------------+\n", "|nwm30_retrospective|      usgs-14138800|                 -0.08735727|\n", "+-------------------+-------------------+----------------------------+\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "                                                                                \r" ] } ], "source": [ "(\n", "    ev.metrics\n", "    # Add the PercentileEventDetection calculated field to identify events greater than the 90th percentile.\n", "    # Note the output_event_field_name and output_event_id_field_name are set to \"event90\" and \"event90_id\" respectively.\n", "    .add_calculated_fields([\n", "        tcf.PercentileEventDetection(\n", "            quantile=0.90,\n", "            output_event_field_name=\"event90\",\n", "            output_event_id_field_name=\"event90_id\"\n", "        )\n", "    ])\n", "    # First query to calculate the maximum primary and secondary values for each event.\n", "    # Note the filters are set to only include events where event90 is true and the group_by includes event90_id.\n", "    .query(\n", "        group_by=[\"configuration_name\", \"primary_location_id\", \"event90_id\"],\n", "        filters=[\n", "            \"primary_location_id = 'usgs-14138800'\",\n", "            \"event90 = true\",\n", "        ],\n", "        include_metrics=[\n", "            teehr.SignatureMetrics.Maximum(\n", "                input_field_names=[\"primary_value\"],\n", "                output_field_name=\"max_primary_value\"\n", "            ),\n", "            teehr.SignatureMetrics.Maximum(\n", "                input_field_names=[\"secondary_value\"],\n", "                output_field_name=\"max_secondary_value\"\n", "            )\n", "        ]\n", "    )\n", "    # Second query to calculate the relative bias between the maximum primary and secondary values.\n", "    .query(\n", "        group_by=[\"configuration_name\", \"primary_location_id\"],\n", "        include_metrics=[\n", "            teehr.DeterministicMetrics.RelativeBias(\n", "                input_field_names=[\"max_primary_value\", \"max_secondary_value\"],\n", "                output_field_name=\"event_90th_max_relative_bias\"\n", "            )\n", "        ]\n", "    )\n", "    # Show the metrics as a Spark DataFrame.\n", "    .to_sdf().show()\n", ")" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "ev.spark.stop()" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.15" } }, "nbformat": 4, "nbformat_minor": 2 }