teehr.RowLevelCalculatedFields#

class teehr.RowLevelCalculatedFields[source]#

Bases: object

Row-level Calculated Fields.

Notes

Row-level CFs are applied to each row in the table based on data in one or more existing fields. They are computed per row and are not aware of data in any other row (e.g., they cannot see other timeseries values in a “timeseries”). This is useful for adding fields derived from the date/time (e.g., month, year, season) or from the value field (e.g., normalized flow, log flow), among many other uses.
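The row-level idea can be illustrated with a plain-Python sketch (not the TEEHR implementation, which operates on Spark DataFrames): each output value is computed only from fields of the same row.

```python
from datetime import datetime

# Hypothetical rows with the documented default field names.
rows = [
    {"value_time": datetime(2023, 4, 15), "primary_value": 12.0},
    {"value_time": datetime(2023, 11, 2), "primary_value": 3.5},
]

for row in rows:
    # A Month-style calculated field: derived from this row alone,
    # with no knowledge of any other row.
    row["month"] = row["value_time"].month
```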

Available Calculated Fields:

  • Month

  • Year

  • WaterYear

  • NormalizedFlow

  • Seasons

  • ForecastLeadTime

  • ThresholdValueExceeded

  • DayOfYear

Methods

class DayOfYear(*, input_field_name: str = 'value_time', output_field_name: str = 'day_of_year')#

Bases: CalculatedFieldABC, CalculatedFieldBaseModel

Adds the day of the year from a timestamp column.

Properties#

  • input_field_name:

    The name of the column containing the timestamp. Default: “value_time”

  • output_field_name:

    The name of the column to store the day of the year. Default: “day_of_year”

Notes

  • February 29th in leap years is set to None.

  • All days after February 29th are adjusted to correspond to the same day of the year as in a non-leap year.
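The leap-day handling described in the notes can be sketched in plain Python (a sketch of the documented behavior, not TEEHR's Spark implementation):

```python
from datetime import date
from typing import Optional

def day_of_year(d: date) -> Optional[int]:
    """Day of year with the documented leap-year handling:
    Feb 29 -> None; days after Feb 29 are shifted back by one
    so they match their day of year in a non-leap year."""
    doy = d.timetuple().tm_yday
    leap = d.year % 4 == 0 and (d.year % 100 != 0 or d.year % 400 == 0)
    if not leap:
        return doy
    if d.month == 2 and d.day == 29:
        return None  # leap day has no non-leap-year counterpart
    if doy > 60:  # Feb 29 is day 60 in a leap year
        return doy - 1
    return doy
```

With this adjustment, March 1st is always day 60 regardless of leap year.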

apply_to(sdf: DataFrame) DataFrame#

Apply the calculated field to the Spark DataFrame.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'validate_assignment': True}#

Configuration for the model; should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class ForecastLeadTime(*, value_time_field_name: str = 'value_time', reference_time_field_name: str = 'reference_time', output_field_name: str = 'forecast_lead_time')#

Bases: CalculatedFieldABC, CalculatedFieldBaseModel

Adds the forecast lead time computed from the value time and reference time columns.

Properties#

  • value_time_field_name:

    The name of the column containing the timestamp. Default: “value_time”

  • reference_time_field_name:

The name of the column containing the forecast reference (issue) time. Default: “reference_time”

  • output_field_name:

    The name of the column to store the forecast lead time. Default: “forecast_lead_time”

apply_to(sdf: DataFrame) DataFrame#

Apply the calculated field to the Spark DataFrame.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'validate_assignment': True}#

Configuration for the model; should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
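The lead-time calculation is conventionally the value time minus the reference time; a plain-Python sketch under that assumption (not TEEHR's Spark implementation):

```python
from datetime import datetime, timedelta

def forecast_lead_time(value_time: datetime, reference_time: datetime) -> timedelta:
    # How far the valid time lies ahead of the forecast issue time.
    return value_time - reference_time
```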

class Month(*, input_field_name: str = 'value_time', output_field_name: str = 'month')#

Bases: CalculatedFieldABC, CalculatedFieldBaseModel

Adds the month from a timestamp column.

Properties#

  • input_field_name:

    The name of the column containing the timestamp. Default: “value_time”

  • output_field_name:

    The name of the column to store the month. Default: “month”

apply_to(sdf: DataFrame) DataFrame#

Apply the calculated field to the Spark DataFrame.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'validate_assignment': True}#

Configuration for the model; should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class NormalizedFlow(*, primary_value_field_name: str = 'primary_value', drainage_area_field_name: str = 'drainage_area', output_field_name: str = 'normalized_flow')#

Bases: CalculatedFieldABC, CalculatedFieldBaseModel

Normalize flow values by drainage area.

Properties#

  • primary_value_field_name:

    The name of the column containing the flow values. Default: “primary_value”

  • drainage_area_field_name:

    The name of the column containing the drainage area. Default: “drainage_area”

  • output_field_name:

    The name of the column to store the normalized flow values. Default: “normalized_flow”

apply_to(sdf: DataFrame) DataFrame#

Apply the calculated field to the Spark DataFrame.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'validate_assignment': True}#

Configuration for the model; should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
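Normalizing by drainage area is conventionally a per-row division; a plain-Python sketch under that assumption (not TEEHR's Spark implementation):

```python
def normalized_flow(primary_value: float, drainage_area: float) -> float:
    # Flow per unit drainage area; drainage_area must be nonzero.
    return primary_value / drainage_area
```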

class Seasons(*, value_time_field_name: str = 'value_time', season_months: dict = {'fall': [9, 10, 11], 'spring': [3, 4, 5], 'summer': [6, 7, 8], 'winter': [12, 1, 2]}, output_field_name: str = 'season')#

Bases: CalculatedFieldABC, CalculatedFieldBaseModel

Adds the season from a timestamp column.

Properties#

  • value_time_field_name:

    The name of the column containing the timestamp. Default: “value_time”

  • season_months:

    A dictionary mapping season names to the months that define them.

    Default: {
        "winter": [12, 1, 2],
        "spring": [3, 4, 5],
        "summer": [6, 7, 8],
        "fall": [9, 10, 11]
    }
    
  • output_field_name:

    The name of the column to store the season. Default: “season”

apply_to(sdf: DataFrame) DataFrame#

Apply the calculated field to the Spark DataFrame.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'validate_assignment': True}#

Configuration for the model; should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
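The season lookup is a per-row mapping from month to season name; a plain-Python sketch of the documented default mapping (not TEEHR's Spark implementation):

```python
DEFAULT_SEASON_MONTHS = {
    "winter": [12, 1, 2],
    "spring": [3, 4, 5],
    "summer": [6, 7, 8],
    "fall": [9, 10, 11],
}

def season_of(month: int, season_months: dict = DEFAULT_SEASON_MONTHS) -> str:
    # Return the first season whose month list contains this month.
    for name, months in season_months.items():
        if month in months:
            return name
    raise ValueError(f"month {month} not covered by season_months")
```

Passing a custom `season_months` dictionary (e.g., wet/dry seasons) redefines the mapping without any other change.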

class ThresholdValueExceeded(*, input_field_name: str = 'primary_value', threshold_field_name: str = 'secondary_value', output_field_name: str = 'threshold_value_exceeded')#

Bases: CalculatedFieldABC, CalculatedFieldBaseModel

Adds a boolean column indicating whether the input value exceeds a threshold.

Properties#

  • input_field_name:

    The name of the column containing the primary value. Default: “primary_value”

  • threshold_field_name:

The name of the column containing the threshold value. Default: “secondary_value”

  • output_field_name:

    The name of the column to store the boolean value. Default: “threshold_value_exceeded”

apply_to(sdf: DataFrame) DataFrame#

Apply the calculated field to the Spark DataFrame.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'validate_assignment': True}#

Configuration for the model; should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
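A plain-Python sketch of the per-row comparison (a strict greater-than is assumed here; the documentation does not state whether equal values count as exceeded):

```python
def threshold_value_exceeded(value: float, threshold: float) -> bool:
    # True when the input value strictly exceeds the threshold (assumption).
    return value > threshold
```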

class WaterYear(*, input_field_name: str = 'value_time', output_field_name: str = 'water_year')#

Bases: CalculatedFieldABC, CalculatedFieldBaseModel

Adds the water year from a timestamp column.

Properties#

  • input_field_name:

    The name of the column containing the timestamp. Default: “value_time”

  • output_field_name:

    The name of the column to store the water year. Default: “water_year”

Water year is defined as the year of the date plus one if the month is October or later.
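The definition above can be sketched in plain Python (a sketch of the documented rule, not TEEHR's Spark implementation):

```python
from datetime import date

def water_year(d: date) -> int:
    # Calendar year, plus one for October through December.
    return d.year + 1 if d.month >= 10 else d.year
```

So water year 2024 runs from October 1, 2023 through September 30, 2024.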

apply_to(sdf: DataFrame) DataFrame#

Apply the calculated field to the Spark DataFrame.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'validate_assignment': True}#

Configuration for the model; should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class Year(*, input_field_name: str = 'value_time', output_field_name: str = 'year')#

Bases: CalculatedFieldABC, CalculatedFieldBaseModel

Adds the year from a timestamp column.

Properties#

  • input_field_name:

    The name of the column containing the timestamp. Default: “value_time”

  • output_field_name:

    The name of the column to store the year. Default: “year”

apply_to(sdf: DataFrame) DataFrame#

Apply the calculated field to the Spark DataFrame.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'validate_assignment': True}#

Configuration for the model; should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].