# Fetching USGS and NWM Streamflow Data

## Overview
In this guide we'll demonstrate fetching USGS streamflow observations from the National Water Information System (NWIS) and National Water Model (NWM) streamflow forecasts from Google Cloud Storage (GCS). This example uses a pre-generated Evaluation dataset stored in TEEHR's examples data module. It contains a single USGS gage location and the corresponding NWM location ID.

**Note**: For demonstration purposes, several cells below are shown in markdown form. To run them yourself, download this notebook and convert them to code cells.

For a refresher on loading location and location crosswalk data into a new Evaluation, refer back to the [](/user_guide/notebooks/02_loading_local_data.ipynb) and [](/user_guide/notebooks/04_setup_simple_example.ipynb) user guide pages.

### Set up the example Evaluation

In [None]:
from datetime import datetime
from pathlib import Path
import shutil

import teehr
from teehr.examples.setup_nwm_streamflow_example import setup_nwm_example

# Tell Bokeh to output plots in the notebook
from bokeh.io import output_notebook
output_notebook()

In [None]:
# Define the directory where the Evaluation will be created.
test_eval_dir = Path(Path().home(), "temp", "10_fetch_nwm_data")
shutil.rmtree(test_eval_dir, ignore_errors=True)

# Setup the example evaluation using data from the TEEHR repository.
setup_nwm_example(tmpdir=test_eval_dir)

# Initialize the evaluation.
ev = teehr.Evaluation(dir_path=test_eval_dir)

This example Evaluation only contains a single location, the USGS gage on the New River at Radford, VA.

In [None]:
locations_gdf = ev.locations.to_geopandas()
locations_gdf

The ID of the National Water Model reach corresponding to the USGS gage is in the `location_crosswalks` table.

In [None]:
location_crosswalks_df = ev.location_crosswalks.to_pandas()
location_crosswalks_df
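
To make the role of the crosswalk concrete, here is a minimal sketch, independent of TEEHR, of how a crosswalk pairs a primary (USGS) ID with a secondary (NWM) ID. The column names `primary_location_id` and `secondary_location_id`, the helper `secondary_ids_for`, and all ID values are assumptions for illustration; compare them against the table displayed above.

```python
# Illustrative crosswalk rows. The IDs are hypothetical placeholders,
# not the values stored in the example Evaluation.
crosswalk_rows = [
    {"primary_location_id": "usgs-00000001", "secondary_location_id": "nwm30-1111"},
    {"primary_location_id": "usgs-00000002", "secondary_location_id": "nwm30-2222"},
]


def secondary_ids_for(primary_id, rows):
    """Return every secondary ID that the crosswalk pairs with primary_id."""
    return [
        row["secondary_location_id"]
        for row in rows
        if row["primary_location_id"] == primary_id
    ]


matches = secondary_ids_for("usgs-00000001", crosswalk_rows)
print(matches)
```

A lookup of this kind is what lets observations at a gage be compared with simulated values from the matching model reach.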

### Fetching USGS streamgage data

Since the example Evaluation already contains the location and location crosswalk IDs for USGS and NWM locations, we can make use of TEEHR's built-in tools to fetch USGS and NWM streamflow data.

First we'll fetch USGS data from the National Water Information System (NWIS). TEEHR makes use of the USGS [dataretrieval](https://github.com/DOI-USGS/dataretrieval-python) Python package under the hood.

Note that the USGS and NWM streamflow timeseries data have been pre-loaded into this example Evaluation. However, you can still download this notebook and execute the methods yourself.

`ev.fetch.usgs_streamflow()` is a method that fetches the USGS data for the primary locations in the evaluation. It requires users to define the start and end times of data to fetch and has several optional arguments. For more details on the method see: 

We'll fetch streamflow data at this gage during the 2024 Hurricane Helene event. Note that we set `add_configuration_name` to `False` since the configuration name has already been added to this example Evaluation.

```python
# Convert this to a code cell to run locally
ev.fetch.usgs_streamflow(
    start_date=datetime(2024, 9, 26),
    end_date=datetime(2024, 10, 1),
    add_configuration_name=False
)
```

TEEHR automatically loads the data into the Evaluation. USGS data is loaded as a `primary` timeseries by default.

In [None]:
df = ev.primary_timeseries.to_pandas()
df.teehr.timeseries_plot()

### Fetching NWM streamflow data

`ev.fetch.nwm_operational_points()` is a method that fetches near real-time NWM point data (e.g., streamflow) from Google Cloud Storage. This method fetches data for the secondary location IDs listed in the `location_crosswalks` table, and automatically loads the time series into the `secondary_timeseries` Evaluation table.

There are several required arguments to define when using the method, including the NWM configuration, NWM variable name, start date, number of ingest days, and others. Several optional arguments are also available.

We'll now fetch streamflow forecasts for the NWM location corresponding to the USGS gage.

:::{note}
The tools for fetching NWM data in TEEHR can take advantage of `Dask`. Start a Dask cluster for improved performance when fetching NWM data if you have [Dask.Distributed installed](https://distributed.dask.org/en/latest/install.html)!
:::

```python
# Convert this to a code cell to run locally
from dask.distributed import Client
client = Client()
```

```python
# Convert this to a code cell to run locally
ev.fetch.nwm_operational_points(
    nwm_configuration="medium_range_mem1",
    output_type="channel_rt_1",
    variable_name="streamflow",
    start_date=datetime(2024, 9, 26),
    ingest_days=1,
    nwm_version="nwm30",
    add_configuration_name=False
)
```

Here we are fetching NWM version 3.0 Medium Range streamflow forecasts, ensemble member 1, beginning on the same start date as the USGS data.
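
To give a feel for what a request like this covers, the sketch below lists the kind of object paths that one day of Medium Range member 1 output might span. It does not use TEEHR, and everything in it is an assumption made for illustration: the bucket name, the file-name template, four forecast cycles per day, and hourly lead times out to 240 hours. The helper `medium_range_mem1_paths` is hypothetical, and the files TEEHR actually reads may differ.

```python
from datetime import datetime, timedelta

# Assumptions (not taken from TEEHR): bucket name, file-name template,
# four cycles per day, and hourly lead times out to 240 hours.
BUCKET = "national-water-model"
CYCLE_HOURS = (0, 6, 12, 18)
MAX_LEAD_HOURS = 240


def medium_range_mem1_paths(start_date, ingest_days, output_type="channel_rt_1"):
    """List candidate object paths for each day, cycle, and lead time."""
    paths = []
    for day_offset in range(ingest_days):
        day = start_date + timedelta(days=day_offset)
        for cycle in CYCLE_HOURS:
            for lead in range(1, MAX_LEAD_HOURS + 1):
                paths.append(
                    f"gs://{BUCKET}/nwm.{day:%Y%m%d}/medium_range_mem1/"
                    f"nwm.t{cycle:02d}z.medium_range.{output_type}"
                    f".f{lead:03d}.conus.nc"
                )
    return paths


paths = medium_range_mem1_paths(datetime(2024, 9, 26), ingest_days=1)
print(len(paths))
print(paths[0])
```

Under these assumptions a single ingest day already spans several hundred files, which is why running the fetch on a Dask cluster can help.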

A list of available NWM configurations for point data is shown below. Appropriate values for the `output_type` and `variable_name` arguments depend on the specified `nwm_configuration` value.

More information on NWM configurations can be found here: [https://water.noaa.gov/about/nwm](https://water.noaa.gov/about/nwm)

In [None]:
from teehr.models.fetching.nwm30_point import ConfigurationsEnum
list(ConfigurationsEnum.__members__)

In [None]:
df = ev.secondary_timeseries.to_pandas()
df.teehr.timeseries_plot()

Now you can create the `joined_timeseries` table and calculate metrics comparing the NWM forecasts to the USGS observations.

In [None]:
ev.joined_timeseries.create()
df = ev.joined_timeseries.to_pandas()
df.head()
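
As a rough illustration of what "calculating metrics" means for paired values, the sketch below computes mean error and relative bias in plain Python. It does not use TEEHR's own metric tools. The column names `primary_value` (observed) and `secondary_value` (forecast) are assumptions, and the numbers are made up rather than taken from the example Evaluation.

```python
# Hypothetical paired values; in a joined table these would come from
# columns such as primary_value (observed) and secondary_value (forecast).
pairs = [
    {"primary_value": 10.0, "secondary_value": 12.0},
    {"primary_value": 20.0, "secondary_value": 18.0},
    {"primary_value": 30.0, "secondary_value": 33.0},
]


def mean_error(rows):
    """Average of (forecast - observed) over all pairs."""
    return sum(r["secondary_value"] - r["primary_value"] for r in rows) / len(rows)


def relative_bias(rows):
    """Sum of (forecast - observed) divided by the sum of observed values."""
    total_diff = sum(r["secondary_value"] - r["primary_value"] for r in rows)
    total_obs = sum(r["primary_value"] for r in rows)
    return total_diff / total_obs


me = mean_error(pairs)
rb = relative_bias(pairs)
print(me, rb)
```

A positive value for either metric indicates that the forecasts are, on balance, higher than the observations.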

In [19]:
ev.spark.stop()

### Additional NWM fetching methods in TEEHR

#### NWM Retrospective Point Data

NWM retrospective point data for versions 2.0, 2.1, and 3.0:

```{eval-rst}
.. currentmodule:: teehr
.. autosummary::
   :recursive:
   :nosignatures:

    teehr.Fetch.nwm_retrospective_points
```

#### NWM Retrospective and Forecast Gridded Data

NWM gridded data can also be fetched in TEEHR. Gridded data is summarized to zones (polygons) as the zonal mean. 

```{eval-rst}
.. currentmodule:: teehr
.. autosummary::
   :recursive:
   :nosignatures:

    teehr.Fetch.nwm_operational_grids
    teehr.Fetch.nwm_retrospective_grids
```

Before gridded data can be fetched and summarized, the pixel weights (fractional area intersecting the zones) must be calculated.

```{eval-rst}
.. currentmodule:: teehr
.. autosummary::
   :toctree: generated
   :nosignatures:

   teehr.utilities.generate_weights.generate_weights_file
```
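
The idea behind pixel weights can be sketched in a few lines: each grid cell overlapping a zone contributes its value in proportion to the fraction of the cell that falls inside the zone. The example below is a plain-Python illustration with made-up numbers. The helper `weighted_zonal_mean` is hypothetical, and whether TEEHR normalizes the weights in exactly this way is an assumption.

```python
# Hypothetical grid cells overlapping one zone. Each has a value and the
# fraction of the cell's area that falls inside the zone (the weight).
cells = [
    {"value": 2.0, "weight": 1.0},   # fully inside the zone
    {"value": 4.0, "weight": 0.5},   # half inside
    {"value": 8.0, "weight": 0.25},  # a quarter inside
]


def weighted_zonal_mean(cells):
    """Area-weighted mean of cell values for a single zone."""
    total_weight = sum(c["weight"] for c in cells)
    if total_weight == 0:
        raise ValueError("zone does not overlap any grid cell")
    return sum(c["value"] * c["weight"] for c in cells) / total_weight


zonal_mean = weighted_zonal_mean(cells)
print(zonal_mean)
```

Because the weights depend only on the grid and zone geometry, they can be computed once and reused for every time step that is fetched.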