Release Notes#
0.6.2 - 2026-04-17#
Breaking Changes#
None
Added#
Calculated Fields
Added `AboveThresholdEventDetection` and `BelowThresholdEventDetection` timeseries-aware calculated fields. These detect events based on a threshold read from a DataFrame column (e.g., an attribute field), supporting both numeric and string-typed fields.
Write optimizations
Added partition filters to the `upsert` and `append` methods in the `Write` class to enable partition pruning when writing large datasets. The filters are applied by default.
Added `uniqueness_fields`, `partition_by`, and `write_ordered_by` as arguments to write methods for non-core tables.
Changed#
Sets the default `spark.sql.shuffle.partitions` to 2 * num_cores in the Spark configuration.
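The new default described above can be sketched as follows; `build_shuffle_conf` is a hypothetical helper for illustration, not part of the TEEHR API.

```python
import os

def build_shuffle_conf(num_cores=None):
    """Derive the shuffle-partition default from the core count
    (2 * num_cores), as described in the change above."""
    cores = num_cores or os.cpu_count() or 1
    return {"spark.sql.shuffle.partitions": str(2 * cores)}

conf = build_shuffle_conf(num_cores=8)
# conf["spark.sql.shuffle.partitions"] == "16"
```

In a real session this value would be passed to the Spark session builder's `config()` call.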
Fixed#
Fixed a bug when fetching NWM analysis data for Hawaii caused by its 15-minute time step
Fixed a file-naming bug when fetching NWM data using CIROH-generated kerchunk JSON files
Dependencies#
None
Deprecated#
None
0.6.1 - 2026-04-01#
This release focuses on (1) making the public Evaluation API safer/clearer by moving lower-level components behind “private” attributes, (2) improving table loading/writing ergonomics (including audit timestamps and safer writes), and (3) updating fetching capability (notably USGS via dataretrieval.waterdata and NWM temperature handling).
Upgrade notes#
Replace any `Configuration(..., type="primary")` with `Configuration(..., timeseries_type="primary")`.
Prefer:
`ev.primary_timeseries.load_parquet(...)` over `ev.load.primary_timeseries.from_parquet(...)`
`df_accessor.write_to("my_results")` over `df_accessor.write("my_results")`
If you were calling `ev.write.*` / `ev.load.*` directly, expect those to be internal/private (`ev._write`, `ev._load`) and migrate to table methods where possible.
Breaking / behavior changes#
Renamed `Configuration.type` → `Configuration.timeseries_type` (and corresponding schema/docs updates).
Internal Evaluation components are now treated as private (`ev._write`, `ev._load`, `ev._validate`, `ev._read`, `ev._extract`) and many call sites were updated accordingly.
USGS site IDs are now expected to be prefixed (e.g., `"USGS-02449838"`) when calling the fetch methods. This happens automatically when using the `fetch.usgs_streamflow()` method, but users calling the underlying fetch functions directly will need to update their site ID formats.
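The automatic prefixing behavior described above can be mimicked with a one-line helper; `ensure_usgs_prefix` is a hypothetical function shown only to illustrate the expected ID format, not part of the TEEHR API.

```python
def ensure_usgs_prefix(site_id: str) -> str:
    """Prefix bare USGS site IDs with 'USGS-', leaving already-prefixed
    IDs unchanged (mirrors what fetch.usgs_streamflow() now does)."""
    return site_id if site_id.startswith("USGS-") else f"USGS-{site_id}"

ensure_usgs_prefix("02449838")       # 'USGS-02449838'
ensure_usgs_prefix("USGS-02449838")  # unchanged
```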
Added#
Audit + metadata fields
Added `created_at` and `updated_at` columns (via migrations) across core tables; warehouse writes now manage these timestamps automatically.
Added `properties` map columns (via migrations) to `locations`, `location_crosswalks`, `location_attributes`, and `configurations`.
Safer / clearer write APIs
Added `write_to()` on DataFrame accessors for writing results to Iceberg tables.
Added protections preventing accidental writes to core tables via accessor writes; use table loading methods instead.
Table-centric loading ergonomics
Added `BaseTable.load_dataframe()` so data can be loaded to all tables via `ev.<table>.load_dataframe(...)`.
Added domain-table file loading helpers (e.g., `load_csv`, `load_parquet`) using shared single-file extraction utilities.
Fetching enhancements
NWM: expanded operational configuration descriptions and added support for `T2D` with optional Kelvin→Celsius conversion (`convert_k_to_c`).
USGS: migrated streamflow fetching to `dataretrieval.waterdata` to support the latest USGS API changes and added description-based time series selection via metadata lookup.
Migrations
Added migrations to (1) update units/variables for temperature support, (2) add audit timestamps, (3) add properties map fields, and (4) rename configuration `type` → `timeseries_type`.
Changed#
Refactored validation into `Validate.dataframe(...)` with improved handling for:
adding missing nullable columns
strict column enforcement
duplicate dropping
clearer foreign key constraint enforcement
Improved GeoDataFrame caching/parquet writing by converting geometry to WKB and writing GeoParquet metadata.
`LocalReadWriteEvaluation` / `LocalReadEvaluation` now accept `namespace_name` and use constants for local/remote namespaces rather than Spark conf.
Updated docs/notebooks to reflect:
the `timeseries_type` rename
table-first load/write patterns (e.g., `ev.primary_timeseries.load_parquet(...)`)
`write_to()` usage
the updated USGS ID format
Fixed#
Improved `import_evaluation.update_metadata_paths(...)` robustness when re-registering imported Iceberg tables and applying migrations.
Updated/expanded test fixtures and added new tests covering:
`created_at` / `updated_at` behavior
`properties` fields
domain table CSV/Parquet loading
validation edge cases
Dependencies#
Bumped `dataretrieval` to `>=1.1.2,<2` (and updated lockfile).
Deprecated#
`TeehrDataFrameBase.write()` is deprecated in favor of `write_to()` (will be removed in a future release).
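The deprecation above follows the standard warn-and-delegate pattern; a minimal sketch (the class body and return value are invented for illustration, not the real TEEHR implementation):

```python
import warnings

class TeehrDataFrameBase:
    """Sketch of the warn-and-delegate deprecation pattern; the real
    TEEHR class has a different body and signatures."""

    def write_to(self, table_name: str) -> str:
        # The supported API going forward.
        return f"wrote to {table_name}"

    def write(self, table_name: str) -> str:
        # Deprecated alias: emit a DeprecationWarning, then delegate.
        warnings.warn(
            "write() is deprecated; use write_to() instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.write_to(table_name)
```

Callers of the old name keep working for now, but get a warning pointing at the replacement.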
0.6.0 - 2026-03-19#
This release introduces major architectural changes and new features. Several breaking changes require updates to existing code; see the Breaking Changes section below.
A core change in this release is the integration of Apache Iceberg
as the underlying table format for TEEHR evaluations. Iceberg brings a rich set of capabilities
to TEEHR, including ACID transactions, time travel, schema evolution, hidden
partitioning, a full suite of write methods (append, upsert, insert, delete, etc.), and more.
In addition to local warehouse support, users now have access to the
TEEHR-Cloud data warehouse hosted on AWS S3 via the download methods, which provides CONUS-scale historical and
real-time NWM simulations and forecasts ready for evaluation.
Breaking Changes#
query() method renamed to aggregate(): The `query()` method on metrics and joined timeseries has been renamed to `aggregate()` to better reflect its purpose. All user code and notebooks referencing `.query()` must be updated to `.aggregate()`.
Pandas DataFrame accessor removed: The `@pd.api.extensions.register_dataframe_accessor` TEEHR accessor has been removed. Visualization and utility methods previously accessed via the accessor must now be called directly on the evaluation or table objects.
SignatureMetrics renamed to Signatures: Importing `SignatureMetrics` will no longer work; use `Signatures` instead.
clone_from_s3(), clone_template(), and list_s3_evaluations() removed: These methods have been removed from the `Evaluation` class. Data from the TEEHR-Cloud data warehouse should now be accessed via the `ev.download.*` methods.
Added#
Views for computed DataFrames: New `View` classes (`JoinedTimeseriesView`, `PrimaryTimeseriesView`, `SecondaryTimeseriesView`, `LocationAttributesView`) provide lazy, on-the-fly computed DataFrames that can be filtered, chained with `add_calculated_fields()` and `aggregate()`, and materialized to an Iceberg table via `write()`. Views are accessed via methods on the evaluation object (e.g., `ev.joined_timeseries_view()`) and complement the existing persisted `Table` objects.
TEEHR API support: New download methods backed by a TEEHR API, with support for fetching locations, timeseries, and other data from the TEEHR-Cloud data warehouse via `ev.download.*` methods.
Automatic pagination for all download methods: Download methods now automatically handle pagination under the hood. A `page_size` argument is available for tuning. A configurable timeout (default 60 s) has been added to all download methods.
RemoteReadOnlyEvaluation and RemoteReadWriteEvaluation classes: New evaluation classes for connecting to a remote TEEHR deployment without requiring a local evaluation directory.
INSERT INTO and DELETE FROM write methods: The `Write` class now supports `insert_into()` and `delete_from()` operations, and tables expose a `delete()` method for row-level deletes.
User-defined custom tables: Users can now create and manage their own Iceberg tables alongside the core TEEHR tables (e.g., `primary_timeseries`, `locations`, etc.) using `ev.table("my_table")`. Custom tables are backed by the same `BaseTable` interface, supporting the full read, filter, aggregate, add calculated fields, load, and write workflow. Core TEEHR tables continue to return their specialized table class; any other name returns a generic `BaseTable` instance, giving users a flexible way to persist and query intermediate or derived datasets within the same Iceberg warehouse.
Drop table method for user-created tables: Tables now have a `drop()` / `drop_table()` method so users can remove custom tables programmatically.
add_attributes() on DataFrame results: A generic `add_attributes()` method has been added to `TeehrDataFrameBase`, allowing attribute columns to be joined onto any query result DataFrame.
GenericSQL row-level calculated field: Users can now supply arbitrary SQL expressions as row-level calculated fields via the `GenericSQL` class.
name field in geometry joins: The `name` field is now included when joining geometry to timeseries tables.
primary_location_id_prefix and secondary_location_id_prefix for crosswalk downloads: These new parameters allow location ID prefix filtering when downloading location crosswalk data.
ID list filtering for location downloads: `download.locations()` now accepts a list of location IDs for targeted data retrieval.
Ability to generate and write partial joined timeseries: Filters can be applied during the `create_joined_timeseries()` step so only a subset of the joined data is written.
Load functions added to BaseTable: `load_dataframe()`, `load_csv()`, `load_parquet()`, and `load_spatial()` are now available on the base table class, enabling consistent data ingestion across all table types.
AWS profile support: `create_spark_session()` now accepts an AWS profile name and resolves credentials using the standard AWS credential priority chain.
Spark decommissioning support: The Spark session helper now gracefully decommissions executor nodes when a session is closed.
Additional deterministic and probabilistic metrics: New categorical metrics including `SuccessRatio`, `FrequencyBiasIndex`, `ProbabilityOfDetection`, `FalseAlarmRatio`, `CriticalSuccessIndex`, and related metrics have been added.
Brier Score and Brier Skill Score: Probabilistic metrics for ensemble evaluation.
ForecastLeadTimeBins row-level calculated field for grouping forecasts by lead time.
Improved Sphinx API documentation: API docs have been reorganized and expanded.
Updated Getting Started documentation: The getting-started guide has been refreshed.
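The lazy, chainable View pattern described above (filter → add calculated fields → aggregate, with nothing computed until the terminal call) can be illustrated with a toy class. The real TEEHR View classes are Spark/Iceberg-backed; this sketch uses plain Python lists, and everything except the method names quoted above is invented.

```python
class ToyView:
    """Toy lazy view: operations are queued and only run at aggregate()."""

    def __init__(self, rows):
        self._rows = rows
        self._ops = []

    def filter(self, pred):
        self._ops.append(lambda rows: [r for r in rows if pred(r)])
        return self

    def add_calculated_fields(self, name, fn):
        def op(rows):
            return [{**r, name: fn(r)} for r in rows]
        self._ops.append(op)
        return self

    def aggregate(self, fn):
        rows = self._rows
        for op in self._ops:  # the queued operations run only now
            rows = op(rows)
        return fn(rows)

view = ToyView([{"value": 2.0}, {"value": 3.0}, {"value": 10.0}])
total = (
    view.filter(lambda r: r["value"] < 5)
        .add_calculated_fields("double", lambda r: r["value"] * 2)
        .aggregate(lambda rows: sum(r["double"] for r in rows))
)
# total == 10.0
```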
Changed#
Evaluation initialization revisited: Initialization logic has been cleaned up; the template-based initialization approach has been replaced with a simpler, more explicit flow.
Spark session configuration updated: Improved handling of AWS credentials, Iceberg catalog configuration, spot-instance executor support, and Jupyter pod prefix support.
Data validation fixed when adding domain data: Validation now correctly catches invalid values when adding domain data to an evaluation.
Table filter bug fixed: Applying filters to table queries now works correctly.
Foreign key enforcement bug fixed: Fixes an edge case in `load_dataframe()` where foreign key checks were not enforced properly.
Pearson/R-squared updated to use epsilon for near-zero denominators.
Metrics refactored: Reduces duplicate code across deterministic, probabilistic, and signature metric modules; bootstrap and probabilistic models cleaned up.
Sort order applied to schema migrations: Migrations are now applied in deterministic sort order, preventing out-of-order migration application.
Unused S3Path class removed.
netcdf4 and PySpark pinned to compatible versions for stability.
Conversion workflow updated for migrating v0.4/v0.5 evaluations to v0.6 format.
0.5.3 - 2026-01-07#
Changed#
Pins PySpark to 4.0.1 in pyproject.toml
Fixes a bug using `drop_overlapping_assimilation_values` in NWM operational fetching methods.
Fixes a bug in `load_dataframe()`
Fixes a bug in unpacking metric results
Updates the Getting Started sphinx documentation
Added#
Check for missing location IDs when cloning from s3
Row level calculated fields for forecast lead time bins
Brier Score and Brier Skill Score metrics
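For reference, the Brier score added above is the mean squared difference between forecast probabilities and observed binary outcomes; a plain-Python sketch of the standard formula (not the TEEHR implementation):

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probabilities (0-1) and
    observed binary outcomes (0 or 1). Lower is better; a perfect
    forecast scores 0."""
    if len(probabilities) != len(outcomes):
        raise ValueError("inputs must be the same length")
    return sum(
        (p - o) ** 2 for p, o in zip(probabilities, outcomes)
    ) / len(probabilities)

brier_score([0.9, 0.1, 0.8], [1, 0, 1])  # ≈ 0.02
```

The Brier Skill Score then compares this value against a reference (e.g., climatological) forecast.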
0.5.2 - 2025-11-11#
Changed#
Updates version number in pyproject.toml and init files
No changes in functionality from `0.5.1dev10`
0.5.1dev10 - 2025-11-10#
Changed#
Rename SignatureMetrics to Signatures
Updates Spearman correlation to handle repeating values
Added#
Flow Duration Curve Slope
VariabilityRatio
Epsilon to handle division by zero
BelowPercentileEventDetection metric
Bootstrapping for signatures
0.5.0 - 2025-08-27#
Changed#
Many updates and new features are introduced in this release, including but not limited to:
Baseflow separation methods
Better handling of warnings
Ability to generate benchmark timeseries and forecasts
Upgrading to pyspark 4.0
Documentation updates
For a full list see: https://github.com/RTIInternational/teehr/milestone/4?closed=1
0.4.13 - 2025-06-09#
Changed#
Updates logic around reading empty tables to allow for cloning empty timeseries tables from s3. Overrides the read method in the `joined_timeseries` table class.
Removes the `pa.Check.isin()` pandera validation checks and replaces them with the manual “foreign key” enforcement method, `_enforce_foreign_keys()`, to speed up validation.
Removes redundant dataframe validation during the write methods.
Sets `.set("spark.sql.parquet.enableVectorizedReader", "false")` in the pyspark config to fix a type error when reading null parquet fields.
Fixes the path conversion error when visualizing locations in the accessor.
Sorts timeseries before plotting to fix the visualization error in the accessor.
Removes `add_configuration_name` from the user guide doc (#450)
Adds `tomli` to pyproject.toml `dev` group to support python 3.12 when building sphinx docs.
Drops `location_id` after joining attributes to the `joined_timeseries` table.
Fixes joining geometry to the secondary timeseries table.
Sets `timeseries_type` to `secondary` when fetching operational NWM gridded data, unless the configuration_name contains “forcing_analysis_assim”, in which case `timeseries_type` is set to `primary`.
0.4.12 - 2025-05-22#
Changed#
Adds `git` and `vim` to the docker image for TEEHR-HUB
Moves `scoringrules` and `arch` imports into the function to speed up import time
Removes the repartitioning by `self.partition_by` of the dataframe in the `BaseTable` class when writing to parquet
0.4.11 - 2025-05-19#
Changed#
Fixes a bug in the `_write_spark_df()` method in the `BaseTable` class that caused writing larger dataframes to fail.
Parallelizes `convert_single_timeseries()` when a directory is passed to the `in_path` argument.
Fixes doc string in `generate_weights_file()`
Switched to the built-in `dropDuplicates()` method in the `BaseTable` class to drop duplicates instead of using a custom implementation.
Added option to specify the number of partitions when writing dataframes in the `BaseTable` class.
Added the option to skip the `dropDuplicates()` method when writing dataframes in the `BaseTable` class.
0.4.10 - 2025-04-14#
Added#
Adds append and upsert functionality without duplicates to loading methods on tables:
locations
location crosswalk
location attributes
primary and secondary timeseries
Adds upsert argument to fetching methods (append is default).
Clears fetching cache before each call.
Adds ability to add or update the location id prefix during loading in the above tables
Adds `reference_time` as a default partition for secondary and joined timeseries
Adds script to re-write timeseries tables partitioned on `reference_time`
Adds function to drop potential duplicates before writing tables (`_drop_duplicates()`)
Combines the script to calculate pixel weights per polygon (`generate_weights.py`) with the NWM gridded Evaluation fetching methods (retro and operational). This allows users to optionally generate the weights from within the fetching methods or to use a pre-created weights file.
When run from the Evaluation, the weights file is saved to the evaluation cache and corresponds to ids in the `locations` table.
Adds User Guide notebook for NWM gridded fetching
Adds transform functions to metric calculations
Adds `geoviews` dependency to poetry evaluation
Adds aws cli and `datashader` to the TEEHR-HUB docker image
Removes `duckdb` from the teehr env
0.4.9 - 2025-03-26#
Added#
Adds pandera schema for the weights file and validates weights dataframe on read and write, coercing values into schema data types
Adds `starting_z_hour` and `ending_z_hour` arguments to operational NWM fetching methods (point, gridded)
Adds function to drop NaN values (from value field) when fetching NWM and USGS data
Adds a check so that if schema validation fails, the current file is skipped and fetching continues
Adds versions 1.2 and 2.0 to operational NWM fetching (version 2.2 (nwm22) is allowed to be used with a note that it is no different from 2.1)
Adds a test notebook for testing on remote teehr-hub kernel
Adds wrapper functions for deterministic and signature metrics
Changed#
Fixes doc strings for fetch.nwm_retrospective_grids()
Removes `add_configuration_name` in fetching and automatically adds it if it doesn’t exist
Updates dask version
Fixes a bug in parsing the z_hour and day from the remote json paths when an ensemble configuration is selected
Removes the imports in `__init__.py` that were for documentation purposes
Removes hydrotools as a dependency
Updates API documentation, adding evaluation.metrics.Metrics methods
Changes base docker image to `base-notebook:2025.01.24`
0.4.8 - 2025-02-17#
Added#
Adds box zoom to location plots.
Adds User Guide page for fetching NWM point data.
Adds new row level calculated fields, DayOfYear, ThresholdValueExceeded, ForecastLeadTime.
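Of the new row-level calculated fields above, ForecastLeadTime has a simple definition: the offset of a forecast value's valid time from its reference (issue) time. A sketch with a hypothetical helper name, not the TEEHR class itself:

```python
from datetime import datetime, timedelta

def forecast_lead_time(reference_time, value_time):
    """Lead time of a forecast value: how far ahead of the issue
    (reference) time the value is valid. Illustrative helper only."""
    return value_time - reference_time

forecast_lead_time(datetime(2025, 1, 1, 0), datetime(2025, 1, 1, 6))
# timedelta(hours=6)
```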
Changed#
Changes NWM fetching methods from `nwm_forecast_<xxxx>` to `nwm_operational_<xxxx>`.
Set `use_table_schema` to False when cloning the `joined_timeseries` table from s3, so that extra fields will not be dropped. Note, this will raise an error if the table is empty or does not exist.
Made auto-adding of configuration_name in NWM and USGS fetching optional.
Removed 2 evaluations from s3 (HEFS, NWM fetching), using TEEHR data module instead.
0.4.7 - 2025-01-08#
Added#
Adds RowLevelCalculatedFields and TimeseriesAwareCalculatedFields which are hopefully descriptive enough names.
Adds a User Guide page to describe what they are and how to use them.
Adds hvplot dependency to poetry
Adds add_calculated_fields() methods to joined_timeseries and metrics “tables”
Adds the Continuous Ranked Probability Score (CRPS) ensemble metric using the scoringrules package
Adds a script to create an example ensemble evaluation using data in the test directory
Adds an example notebook to demo CRPS metric query
Adds user guide notebook page for ensembles, reading a test ensemble evaluation from S3
Adds ability to unpack metric dictionary results into separate columns (i.e., bootstrap quantiles)
Changed#
Splits metric models and functions into three categories: Deterministic, Probabilistic, Signature. This is a breaking change requiring import of specific metric classes (Deterministic, Probabilistic, Signature) rather than just `Metrics`.
Functions are moved to separate modules
Models are moved to separate classes
Basemodels and metric enums are moved to a separate basemodel module
Updates API docs, removes unused files and the autoapi directory.
0.4.6 - 2024-12-17#
Added#
Adds `add_missing_columns` to the `_validate` method in the `BaseTable` class to allow for adding missing columns to the schema.
When upgrading from 0.4.4 or earlier, you may need to run the following to add the missing columns to the secondary_timeseries if you have existing datasets:

```python
sdf = ev.secondary_timeseries.to_sdf()
validated_sdf = ev.secondary_timeseries._validate(sdf, add_missing_columns=True)
ev.secondary_timeseries._write_spark_df(validated_sdf)
```
Changed#
None
0.4.5 - 2024-12-09#
Added#
Fixes issues with sphinx docs and runs the `install_spark_jars.py` script in the build container.
Adds location plotting to accessor.
Adds loading from FEWS XML files.
Adds `member` to secondary timeseries schema for ensembles.
Changed#
Fixes issues with sphinx docs and runs the `install_spark_jars.py` script in the build container.
0.4.4 - 2024-12-02#
Added#
Added ability to read an Evaluation dataset directly from an S3 bucket.
When path to an Evaluation dataset is an S3 bucket, the Evaluation is read-only.
Changed#
Pretty significant refactor of the Table classes to make them more flexible and easier to use.
Added more robust Pandera validation to the Table classes.
Updated docs to reflect changes and added a `read_from_s3` example.
0.4.3 - 2024-10-19#
Added#
None
Changed#
Changed paths to the S3 bucket evaluations to reference “e*…” instead of “p*…” naming convention.
0.4.2 - 2024-10-18#
Added#
A test-build-publish workflow to push to PyPI
Changed#
None
0.4.1 - 2024-10-15#
Added#
Updated docs to include pages for `grouping`, `filtering` and `joining` in the User Guide.
Changed#
Fixed some broken data download links in the User Guide.
Fixed the post-install script to install the AWS Spark Jars.
Fixed the API doc build.
0.4.0 - 2024-10-13#
Added#
This is a major (although still less than version 1) release that includes a number of new features and changes.
Some of the more significant changes:
Added a new Evaluation class that is the primary interface for working with TEEHR data.
Switched from DuckDB to PySpark to enable horizontal scaling for the computational workloads.
Formalized the structure of the TEEHR dataset.
Added data validation of values referenced from domain and location tables to the timeseries tables.
Updated docs to include new features and changes.
Changed#
Many changes have been made between v0.3.28 and v0.4.0.
0.3.28 - 2024-07-10#
Added#
pandas DataFrame accessor classes for metrics and timeseries queries, including some simple methods for plotting and summarizing data.
Added Bokeh as a dependency for visualization.
Changed#
None
0.3.27 - 2024-07-08#
Added#
Documentation updates primarily to Getting Started and User Guide sections.
Changed#
None
0.3.26 - 2024-06-27#
Added#
Dark theme logo for sphinx documentation.
Added the `pickleshare` package to the dev dependency group to fix the `ipython` directive in sphinx documentation.
Changed#
Pinned `sphinx-autodoc` to v3.0.0 and `numpy` to v1.26.4 in `documentation-publish.yml` to fix the API documentation build.
Removed unused documentation dependencies from dev group.
0.3.25 - 2024-06-06#
Added#
Added PySpark to TEEHR-HUB (including openjdk-17-jdk and jar files)
Changed#
None
0.3.24 - 2024-05-29#
Added#
Added metrics documentation to the Sphinx documentation.
Changed#
None
0.3.23 - 2024-05-28#
Added#
None
Changed#
Docstring updates in duckdb_database.py.
Changelog update for 0.3.22.
Updates `insert_attributes()` in `duckdb_database.py` to better handle None/Null attribute units.
Test updates in `convert.py`.
0.3.22 - 2024-05-22#
Added#
None
Changed#
Cleaned up the `DuckDB*` classes. Don’t think any public interfaces changed.
Import of `DuckDBDatabase`, `DuckDBDatabaseAPI`, and `DuckDBJoinedParquet` now uses `from teehr.classes import DuckDBDatabase, DuckDBDatabaseAPI, DuckDBJoinedParquet`
The `calculate_field` method was renamed to `insert_calculated_field`
0.3.21 - 2024-05-21#
Added#
Added the `DuckDBJoinedParquet` class for metric queries on pre-joined parquet files.
Added the `DuckDBBase` class for common methods between the `DuckDBDatabase`, `DuckDBAPI`, and `DuckDBJoinedParquet` classes.
Changed#
Renamed the `database` directory to `classes`.
Renamed `teehr_dataset.py` to `teehr_duckdb.py`.
Renamed the `TEEHRDatasetDB` and `TEEHRDatasetAPI` classes to `DuckDBDatabase` and `DuckDBAPI` respectively.
Removed `lead_time` and `absolute_value` from the joined table
0.3.20 - 2024-05-18#
Added#
None
Changed#
Update queries to accept a list of paths, for example, `primary_filepath` and `secondary_filepath`. Includes `get_metrics()`, `get_joined_timeseries()`, `get_timeseries()`, and `get_timeseries_chars()`.
0.3.19 - 2024-05-18#
Added#
None
Changed#
Update SQL queries to allow `reference_time` to be NULL.
Updated tests for NULL `reference_time`
0.3.18 - 2024-05-10#
Added#
Added documentation regarding best practices for specifying the `chunk_by` parameter when fetching NWM retrospective and USGS data.
Changed#
Fixed a bug in the NWM retrospective grid loading weighted average calculation.
Changed the method of fetching NWM gridded data to read only a subset of the grid (given by the row/col bounds from the weights file) into memory rather than the entire grid.
Removed the ‘day’ and ‘location_id’ `chunk_by` options to reduce redundant data transfer costs.
0.3.17 - 2024-04-22#
Added#
None
Changed#
Dropped “Z” from the file name in the NWM loading functions, adding a note in the docstrings that all times are in UTC.
Changed data type of `zonal_weights_filepath` to `Union[str, Path]` in `nwm_grids.py`.
Fixed `SettingWithCopyWarning` in NWM grid loading.
Fixed the `end_date` in NWM retrospective loading to include the entirety of the last day and not fail when the last available day is specified.
Removed “elevation”, “gage_id”, “order” from NWM v3.0 retrospective point loading.
0.3.16 - 2024-04-11#
Added#
Adds a few new metrics to the queries:
annual_peak_relative_bias
spearman_correlation
kling_gupta_efficiency_mod1
kling_gupta_efficiency_mod2
Changed#
None
0.3.15 - 2024-04-08#
Added#
`location_id_prefix` as an optional argument to `generate_weights_file()` to allow for the prefixing of the location ID with a string.
Changed#
Updated the NWM operational and retrospective grid loading functions so that the location ID as defined in the zonal weights file is used as the location ID in the output parquet files.
0.3.14 - 2024-03-29#
Added#
relative_bias
multiplicative_bias
mean_squared_error
mean_absolute_relative_error
pearson_correlation
r_squared
nash_sutcliffe_efficiency_normalized
Changed#
mean_error (rename current bias to mean_error)
mean_absolute_error (rename current mean_error to mean_absolute_error)
0.3.13 - 2024-03-22#
Added#
None
Changed#
Updated from Enum to StrEnum and added a fix for backwards incompatibility described here: https://tomwojcik.com/posts/2023-01-02/python-311-str-enum-breaking-change. This is required to support both python 3.10 and python 3.11.
Updated TEEHR-HUB to Python 3.11 and `pangeo/pangeo-notebook:2024.03.13`
Made all packages that use YYYY.MM.DD versioning `>=` instead of `^` in `pyproject.toml`
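The StrEnum compatibility fix mentioned above can be handled with a small conditional backport; a sketch following the approach in the linked post (the enum class and member names here are illustrative, not TEEHR's):

```python
import sys
from enum import Enum

if sys.version_info >= (3, 11):
    from enum import StrEnum
else:
    class StrEnum(str, Enum):
        # Backport for Python 3.10: mixing in `str` and overriding
        # __str__ restores the 3.11 behavior where str(member)
        # yields the plain string value.
        def __str__(self) -> str:
            return self.value

class TimeseriesType(StrEnum):  # hypothetical enum for illustration
    PRIMARY = "primary"
    SECONDARY = "secondary"

str(TimeseriesType.PRIMARY)          # 'primary' on both 3.10 and 3.11+
TimeseriesType.PRIMARY == "primary"  # True, thanks to the str mixin
```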
0.3.12 - 2024-03-22#
Added#
None
Changed#
Changed the chunking method for USGS and NWM retrospective data loading to iterate over pandas `period_range` rather than using `groupby` or `date_range` to fix a bug when fetching data over multiple years.
0.3.11 - 2024-03-19#
Added#
None
Changed#
Downgraded required Dask version to `dask = "^2023.8.1"` to match `pangeo/pangeo-notebook:2023.09.11`
0.3.10 - 2024-03-07#
Added#
Added `test_zonal_mean_results.py`
Changed#
Fixed the calculation of the zonal mean of pixel values in `compute_zonal_mean()` so it calculates the weighted average (divides by the sum of weight values).
Updated grid loading tests and data to reflect the fixed method.
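The corrected calculation above is a standard weighted average: sum of value × weight divided by the sum of weights. A minimal sketch with a hypothetical signature (the real function operates on grid subsets):

```python
def weighted_zonal_mean(pixel_values, weights):
    """Weighted average of pixel values: sum(v * w) / sum(w).
    Dividing by the weight sum is the step the fix restored."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(pixel_values, weights)) / total_weight

weighted_zonal_mean([2.0, 4.0], [1.0, 3.0])  # (2*1 + 4*3) / 4 = 3.5
```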
0.3.9 - 2024-02-15#
Added#
Adds sphinx documentation framework and initial docs.
The `documentation-publish.yml` workflow is set to build the docs and push to github pages on every tag.
The `pre-commit-config.yml` github hook runs on each commit and checks docstring formatting, trailing whitespaces, and the presence of large files.
Added documentation-related python dependencies to `[tool.poetry.group.dev.dependencies]`
Changed#
Example notebooks have been moved to `docs/sphinx/user_guide/notebooks`.
The CHANGELOG.md is now the `index.rst` file in `docs/sphinx/changelog`.
The CONTRIBUTE.md and release_process.md files are now part of the `index.rst` file in `docs/sphinx/development`.
The data_models.md and queries.md are now the `data_models.rst` and `queries.rst` files in `docs/sphinx/getting_started`.
0.3.8 - 2024-02-14#
Added#
Adds logging with a `NullHandler()` that can be implemented by the parent app using teehr.
0.3.7 - 2024-02-09#
Changed#
Upgraded pandas to ^2.2.0
Changed unit=”H” in pandas.time_delta to unit=”h”
Updated assert statements in `test_weight_generation.py`
0.3.6 - 2024-02-07#
Added#
Adds an exception to catch an error when a corrupted file is encountered while building the Kerchunk reference file using `SingleHdf5ToZarr`. The behavior determining whether to raise an exception is controlled by the `ignore_missing_file` flag.
0.3.5 - 2023-12-18#
Added#
Adds additional chunking methods for USGS and NWM retrospective loading to allow week, month and year chunking.
Adds mean areal summaries for NWM retrospective gridded forcing variables
Adds NWM v3.0 to retrospective loading
Changed#
Fixes USGS loading to include last date of range
Removes extra fields from v2.1 retro output
0.3.4 - 2023-12-18#
Added#
Adds the `read_only` argument to the `query` method in the TEEHRDatasetDB class with default values specified in the query methods.
Changed#
Establishes a read-only database connection as a class variable to the TEEHRDatasetAPI class so it can be re-used for each class instance.
0.3.3 - 2023-12-13#
Added#
Adds `get_joined_timeseries` method to TEEHR Dataset classes.
Changed#
Updated validation fields in the `TimeSeriesQuery` pydantic model to accept only selected fields rather than existing database fields.
Updated function argument typing in `queries/utils.py` to be more explicit
0.3.2 - 2023-12-12#
Added#
None
Changed#
Fixed the `bias` metric so that it is `sum(secondary_value - primary_value)/count(*)` instead of `sum(primary_value - secondary_value)/count(*)`, which resulted in the wrong sign.
Changed `primary_max_value_time`, `secondary_max_value_time` and `max_value_timedelta` queries to use built-in functions instead of CTEs. This improves speed significantly.
Fixed bug in queries when filtering by `configuration`, `measurement_unit` and `variable`.
Refactored `join_attributes` in `TEEHRDatasetDB` to better handle attributes with no units.
Refactored `create_join_and_save_timeseries_query` queries so that the de-duplication CTE is after the initial join CTE for improved performance.
Changes default list of `order_by` variables in `insert_joined_timeseries` to improve query performance
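The sign convention the bias fix establishes (secondary minus primary, so that over-prediction yields a positive bias) can be written out in plain Python; this is a sketch of the formula, not the actual SQL:

```python
def bias(primary_values, secondary_values):
    """Mean error with the corrected sign: sum(secondary - primary) / n.
    Positive when the secondary (e.g., modeled) values over-predict."""
    n = len(primary_values)
    return sum(s - p for p, s in zip(primary_values, secondary_values)) / n

bias([1.0, 2.0], [2.0, 4.0])  # (1 + 2) / 2 = 1.5, i.e. over-prediction
```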
0.3.1 - 2023-12-08#
Added#
Adds a boolean flag to parquet-based metric queries to control whether or not to de-duplicate.
Adds a test primary timeseries file including duplicate values for testing.
Changed#
Refactored parquet-based `get_metrics` and `get_joined_timeseries` queries so that the de-duplication CTE is after the initial join CTE for improved performance.
0.3.0 - 2023-12-08#
Added#
Adds a dataclass and database that allows preprocessing of joined timeseries and attributes as well as the addition of user defined functions.
Adds an initial web service API that serves out `timeseries` and `metrics` along with some other supporting data.
Adds an initial interactive web application using the web service API.
Changed#
Switches to poetry to manage Python venv
Upgrades to Pydantic 2+
Upgrades to Pangeo image `pangeo/pangeo-notebook:2023.09.11`
0.2.9 - 2023-12-08#
Added#
Three options related to kerchunk jsons:
`local` - (default) previous behavior, manually creates the jsons based on GCS netcdf files using Kerchunk’s `SingleHdf5ToZarr`. Any locally existing files will be used before creating new jsons from the remote store.
`remote` - use pre-created jsons, skipping any that do not exist within the specified time frame. Jsons are read directly from s3 using fsspec
`auto` - use pre-created jsons, creating any that do not exist within the specified time frame
Adds `nwm_version` (nwm22 or nwm30) and `data_source` (GCS, NOMADS, DSTOR - currently only GCS implemented) as loading arguments
Changed#
Combines loading modules into one directory: `loading/nwm`
Updates to loading example notebooks
Updates to loading tests
0.2.8 - 2023-11-14#
Added#
NWM v3.0 data loading and configuration models
Added check for duplicate rows in `get_metrics` and `get_joined_timeseries` queries (#69)
Added control for overwrite file behavior in loading (#77)
Significant refactor of the loading libraries
Added ability to select which retrospective version to download (v2.0 or v2.1) (#80)
Changed#
Fixed NWM pydantic configurations models for v2.2
Refactored the `models/loading` directory
0.2.7 - 2023-09-14#
Added#
More testing to NWM point and grid loading functions
0.2.6 - 2023-09-14#
Changed#
Fixed some sloppy bugs in `nwm_grid_data.py`
Added#
`ValueError` handling when encountering a corrupt zarr json file
0.2.5 - 2023-09-11#
Changed#
None
Added#
Added ability to use holoviz export to TEEHR-HUB:
Installed firefox (and a bunch of dependencies) to the Docker container (using apt)
Installed selenium and the geckodriver using conda
0.2.4 - 2023-08-30#
Changed#
Behavior of loading when encountering missing files
Renamed field `zone` to `location_id` in `nwm_grid_data.py` and `generate_weights.py`
Added#
The boolean flag `ignore_missing_files` to point and grid loading to determine whether to fail or continue on missing NWM files
Added a check to skip locally existing zarr json files when loading NWM data
0.2.3 - 2023-08-23#
Changed#
Removed pyarrow from time calculations in `nwm_point_data.py` loading due to windows bug
Updated output file name in `nwm_point_data.py` to include forecast hour if `process_by_z_hour=False`
0.2.2 - 2023-08-23#
Added#
nodejs to the jupyterhub build so the extensions will load (not 100% sure this was needed)
Changed#
Updated TEEHR to v0.2.2, including TEEHR-HUB
Updated the TEEHR-HUB base image to `pangeo/pangeo-notebook:2023.07.05`
0.2.1 - 2023-08-21#
Added#
Nothing
Changed#
Updated TEEHR version in TEEHR-HUB to v0.2.1
Converts NWM feature IDs to a numpy array in loading
0.2.0 - 2023-08-17#
Added#
This changelog
Changed#
Loading directory refactor changed import paths to loading modules
Changed directory of `generate_weights.py` utility
Replaced NWM config parameter dictionary with pydantic models
NWM reference time used by TEEHR is now taken directly from the file name rather than the “reference time” embedded in the file
Use of the term `run` updated to `configuration` for NWM
0.1.3 - 2023-06-17#
Added#
Initial release