Utils

Note

This documentation is still under development. If you find a bug or have a suggestion regarding utils, please open an issue in the GitHub repository.

Moving averages

mango_time_series.utils.moving_averages.create_season_unit_previous_date(df: polars.DataFrame, time_unit: str) → polars.DataFrame

Create previous date mapping for seasonal unit analysis.

Calculates the previous occurrence date for each seasonal unit (day of week) relative to the forecast origin. This is used for creating seasonal variables that reference the same day of week from previous periods.

Parameters:
  • df (polars.DataFrame) – DataFrame containing forecast_origin and season_unit columns

  • time_unit (str) – Time unit string (e.g., ‘d’ for days, ‘w’ for weeks)

Returns:

DataFrame with previous_dow_date column added

Return type:

polars.DataFrame

Note:
  • Calculates weekday difference between season_unit and forecast_origin

  • Adjusts for week boundaries (handles day-of-week wrapping)

  • Creates offset dates for seasonal variable creation
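The weekday arithmetic behind this mapping can be sketched in plain Python. This is a minimal sketch, not the library's code: it assumes season_unit uses ISO weekday numbering (Monday=1 ... Sunday=7, the convention of Polars `dt.weekday()`), and it assumes the forecast origin itself counts as the previous occurrence when the weekdays match. The helper name `previous_dow_date` is hypothetical.

```python
from datetime import date, timedelta


def previous_dow_date(forecast_origin: date, season_unit: int) -> date:
    """Hypothetical helper: most recent date on or before forecast_origin
    whose ISO weekday (Monday=1 ... Sunday=7) equals season_unit."""
    # The modulo wraps negative differences around the week boundary,
    # so the result is never later than the forecast origin.
    diff = (forecast_origin.isoweekday() - season_unit) % 7
    return forecast_origin - timedelta(days=diff)


origin = date(2024, 1, 10)  # a Wednesday
print(previous_dow_date(origin, 1))  # Monday 2024-01-08
print(previous_dow_date(origin, 5))  # Friday 2024-01-05, wrapping into the previous week
```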

mango_time_series.utils.moving_averages.create_recent_variables(df: polars.LazyFrame, SERIES_CONF: dict, window: list, lags: list, gap: int = 0, freq: int = 1, colname: str = '', season_unit: bool = False) → polars.LazyFrame

Create rolling averages and lag variables for time series forecasting.

Generates rolling average and lag features for time series data, with optional seasonal unit grouping. Creates multiple columns for different window sizes and lag periods, grouped by the key columns specified in SERIES_CONF.

Parameters:
  • df (polars.LazyFrame) – Input LazyFrame containing time series data

  • SERIES_CONF (dict) – Configuration dictionary containing KEY_COLS and TIME_PERIOD

  • window (list) – List of window sizes for rolling averages

  • lags (list) – List of lag periods to create

  • gap (int) – Number of periods to skip before starting window (default: 0)

  • freq (int) – Frequency multiplier for lag calculations (default: 1)

  • colname (str) – Prefix for new column names (default: “”)

  • season_unit (bool) – Whether to group by seasonal unit (default: False)

Returns:

LazyFrame with new rolling average and lag columns

Return type:

polars.LazyFrame

Note:
  • Rolling averages exclude current row (shifted by gap)

  • Lag variables are shifted by (gap - 1 + lag * freq)

  • Seasonal unit mode uses previous day-of-week dates for alignment

  • Column naming: y_{colname}roll_{window} and y_{colname}lag_{lag*freq}
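The column naming and offsets listed in the note can be illustrated with pandas. Treat this as a hedged sketch rather than the library's implementation: the real function works on a Polars LazyFrame and reads the key columns from SERIES_CONF, whereas here `key_cols` is passed directly, the data is assumed to have `datetime` and `y` columns, and season_unit mode is not covered. The lag offset follows the formula stated in the note; with gap=1 both feature families use only past values. `recent_variables_sketch` is a hypothetical name.

```python
import pandas as pd


def recent_variables_sketch(df, key_cols, window, lags, gap=0, freq=1, colname=""):
    """Hypothetical pandas version of the rolling/lag features described above."""
    out = df.sort_values([*key_cols, "datetime"]).copy()
    y = out.groupby(key_cols)["y"]
    for w in window:
        # Rolling mean over values shifted by `gap`, computed per series
        out[f"y_{colname}roll_{w}"] = y.transform(
            lambda s, w=w: s.shift(gap).rolling(w).mean()
        )
    for lag in lags:
        # Lag offset as stated in the note: gap - 1 + lag * freq
        out[f"y_{colname}lag_{lag * freq}"] = y.shift(gap - 1 + lag * freq)
    return out


df = pd.DataFrame({
    "id": ["a"] * 5,
    "datetime": pd.date_range("2024-01-01", periods=5, freq="D"),
    "y": [1.0, 2.0, 3.0, 4.0, 5.0],
})
print(recent_variables_sketch(df, ["id"], window=[2], lags=[1], gap=1))
```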

mango_time_series.utils.moving_averages.create_seasonal_variables(df: polars.LazyFrame, SERIES_CONF: dict, window: list, lags: list, season_unit: str, freq: int, gap: int = 0) → polars.LazyFrame

Create seasonal rolling averages and lag variables.

Generates seasonal features by creating rolling averages and lag variables grouped by seasonal units (day of week, week, or month). Automatically extracts the appropriate seasonal unit from datetime and creates variables that capture seasonal patterns in the time series.

Parameters:
  • df (polars.LazyFrame) – Input LazyFrame containing time series data with datetime column

  • SERIES_CONF (dict) – Configuration dictionary containing KEY_COLS and TIME_PERIOD

  • window (list) – List of window sizes for rolling averages

  • lags (list) – List of lag periods to create

  • season_unit (str) – Seasonal unit type (‘day’, ‘week’, or ‘month’)

  • freq (int) – Frequency multiplier for lag calculations

  • gap (int) – Number of periods to skip before starting window (default: 0)

Returns:

LazyFrame with seasonal rolling average and lag columns

Return type:

polars.LazyFrame

Note:
  • Extracts seasonal unit from datetime column based on season_unit parameter

  • Uses sea_ prefix for seasonal variable column names

  • Removes season_unit column after processing

  • Supports day (weekday), week, and month seasonal units
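A pandas sketch of the seasonal grouping follows. It is an approximation under stated assumptions: the `y_sea_roll_{w}` column name combines the `sea_` prefix from this note with the naming pattern documented for create_recent_variables, pandas numbers weekdays from Monday=0 (Polars starts at Monday=1), and lags, `freq` and `gap` are left out for brevity. `seasonal_roll_sketch` is a hypothetical name.

```python
import pandas as pd


def seasonal_roll_sketch(df, key_cols, season_unit, window):
    """Hypothetical sketch: rolling means within each (series, seasonal unit) group."""
    out = df.sort_values([*key_cols, "datetime"]).copy()
    dt = out["datetime"].dt
    if season_unit == "day":
        out["season_unit"] = dt.weekday  # pandas convention: Monday=0 ... Sunday=6
    elif season_unit == "week":
        out["season_unit"] = dt.isocalendar().week.astype(int)
    elif season_unit == "month":
        out["season_unit"] = dt.month
    else:
        raise ValueError(f"unsupported season_unit: {season_unit!r}")
    grouped = out.groupby([*key_cols, "season_unit"])["y"]
    for w in window:
        # shift(1) excludes the current row; the mean covers the previous w
        # occurrences of the same weekday, week number or month
        out[f"y_sea_roll_{w}"] = grouped.transform(
            lambda s, w=w: s.shift(1).rolling(w).mean()
        )
    return out.drop(columns="season_unit")


df = pd.DataFrame({
    "id": ["a"] * 14,
    "datetime": pd.date_range("2024-01-01", periods=14, freq="D"),
    "y": [float(v) for v in range(14)],
})
print(seasonal_roll_sketch(df, ["id"], "day", [1]))
```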

Processing time series

mango_time_series.utils.processing_time_series.aggregate_to_input(df: pandas.DataFrame, freq: str, series_conf: dict) → pandas.DataFrame

Aggregate time series data to specified frequency using pandas.

Groups the data by key columns and time frequency, then applies aggregation operations defined in the series configuration.

Parameters:
  • df (pandas.DataFrame) – DataFrame containing time series data

  • freq (str) – Frequency string for aggregation (e.g., ‘D’, ‘W’, ‘M’)

  • series_conf (dict) – Configuration dictionary containing KEY_COLS and AGG_OPERATIONS

Returns:

Aggregated DataFrame with specified frequency

Return type:

pandas.DataFrame

Note:
  • Uses pandas Grouper for time-based grouping

  • Applies aggregation operations from series_conf[“AGG_OPERATIONS”]

  • Groups by both key columns and time frequency
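Grouping by key columns plus a pandas Grouper on the time column can be sketched as follows. This assumes, without confirmation from the source, that AGG_OPERATIONS maps column names to pandas aggregation names (for example `{"y": "sum"}`) and that the time column is called `datetime`; key columns and operations are passed directly instead of through series_conf, and `aggregate_sketch` is a hypothetical name.

```python
import pandas as pd


def aggregate_sketch(df, freq, key_cols, agg_operations):
    """Hypothetical stand-in for aggregate_to_input with explicit arguments."""
    return (
        df.groupby([*key_cols, pd.Grouper(key="datetime", freq=freq)])
        .agg(agg_operations)
        .reset_index()
    )


daily = pd.DataFrame({
    "id": ["a"] * 10,
    "datetime": pd.date_range("2024-01-01", periods=10, freq="D"),
    "y": [1] * 10,
})
weekly = aggregate_sketch(daily, "W", ["id"], {"y": "sum"})
print(weekly)  # two weekly bins labelled by their closing Sunday (2024-01-07, 2024-01-14): y = 7 and 3
```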

mango_time_series.utils.processing_time_series.aggregate_to_input_pl(df: pandas.DataFrame, freq: str, series_conf: dict) → pandas.DataFrame

Aggregate time series data to specified frequency using Polars.

Converts pandas DataFrame to Polars, groups by key columns and time frequency, applies aggregation operations, then converts back to pandas.

Parameters:
  • df (pandas.DataFrame) – DataFrame containing time series data

  • freq (str) – Frequency string for aggregation (e.g., ‘D’, ‘W’, ‘M’)

  • series_conf (dict) – Configuration dictionary containing KEY_COLS and AGG_OPERATIONS

Returns:

Aggregated DataFrame with specified frequency

Return type:

pandas.DataFrame

Note:
  • Converts pandas to Polars for processing, then back to pandas

  • Handles month frequency conversion (‘m’, ‘MS’, ‘ME’ -> ‘mo’)

  • Uses Polars datetime truncate for time-based grouping

  • Supports sum, mean, min, max, median aggregation operations

mango_time_series.utils.processing_time_series.aggregate_to_input_pllazy(df: polars.LazyFrame, freq: str, series_conf: dict) → polars.LazyFrame

Aggregate time series data to specified frequency using Polars LazyFrame.

Groups LazyFrame by key columns and time frequency, applies aggregation operations, and returns sorted result as LazyFrame.

Parameters:
  • df (polars.LazyFrame) – LazyFrame containing time series data

  • freq (str) – Frequency string for aggregation (e.g., ‘D’, ‘W’, ‘M’)

  • series_conf (dict) – Configuration dictionary containing KEY_COLS and AGG_OPERATIONS

Returns:

Aggregated LazyFrame with specified frequency

Return type:

polars.LazyFrame

Note:
  • Works with Polars LazyFrame for efficient processing

  • Uses Polars datetime truncate for time-based grouping

  • Sorts result by key columns and datetime

  • Supports sum, mean, min, max, median aggregation operations

mango_time_series.utils.processing_time_series.rename_to_common_ts_names(df: pandas.DataFrame, time_col: str, value_col: str) → pandas.DataFrame

Rename columns to standard time series naming convention.

Standardizes column names by renaming time and value columns to ‘datetime’ and ‘y’ respectively, and converts all column names to lowercase.

Parameters:
  • df (pandas.DataFrame) – DataFrame to rename columns in

  • time_col (str) – Name of the time/datetime column

  • value_col (str) – Name of the value/target column

Returns:

DataFrame with standardized column names

Return type:

pandas.DataFrame

Note:
  • Renames time_col to ‘datetime’ and value_col to ‘y’

  • Converts all column names to lowercase

  • Standardizes naming for time series processing pipeline
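The two renaming steps can be sketched in pandas. In this hypothetical `rename_sketch`, the standard names are applied before lower-casing, so time_col and value_col must be given with their original capitalization; whether the library applies the steps in the same order is an assumption.

```python
import pandas as pd


def rename_sketch(df, time_col, value_col):
    """Hypothetical sketch: standard names first, then lower-case every column."""
    out = df.rename(columns={time_col: "datetime", value_col: "y"})
    out.columns = [str(c).lower() for c in out.columns]
    return out


raw = pd.DataFrame({"Fecha": ["2024-01-01"], "Sales": [10], "STORE": ["s1"]})
print(rename_sketch(raw, "Fecha", "Sales").columns.tolist())  # ['datetime', 'y', 'store']
```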

mango_time_series.utils.processing_time_series.rename_to_common_ts_names_pl(df: polars.LazyFrame, time_col: str, value_col: str) → polars.LazyFrame

Rename columns to standard time series naming convention using Polars.

Standardizes column names by renaming time and value columns to ‘datetime’ and ‘y’ respectively, and converts all column names to lowercase.

Parameters:
  • df (polars.LazyFrame) – LazyFrame to rename columns in

  • time_col (str) – Name of the time/datetime column

  • value_col (str) – Name of the value/target column

Returns:

LazyFrame with standardized column names

Return type:

polars.LazyFrame

Note:
  • Renames time_col to ‘datetime’ and value_col to ‘y’

  • Converts all column names to lowercase using with_columns

  • Standardizes naming for time series processing pipeline

mango_time_series.utils.processing_time_series.drop_negative_output(df: pandas.DataFrame) → pandas.DataFrame

Remove rows with negative values from time series data.

Filters out rows where the target variable (y) has negative values, which are typically not meaningful for sales or demand forecasting.

Parameters:

df (pandas.DataFrame) – DataFrame containing time series data with ‘y’ column

Returns:

DataFrame with negative value rows removed

Return type:

pandas.DataFrame

Note:
  • Removes rows where y < 0

  • Logs the number of rows being dropped

  • Preserves all other data and columns
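A minimal pandas sketch of the filter and the log message follows. The helper name and log wording are hypothetical. Note that in this sketch rows with a missing y are kept, because NaN < 0 evaluates to False; the documentation does not state how the library treats missing values.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def drop_negative_sketch(df):
    """Hypothetical sketch: drop rows with y < 0 and log how many were removed."""
    negative = df["y"] < 0  # NaN compares as False, so missing values are kept
    logger.info("Dropping %d rows with negative y", int(negative.sum()))
    return df.loc[~negative]


data = pd.DataFrame({"y": [3.0, -1.0, float("nan"), 0.0]})
print(drop_negative_sketch(data))  # keeps the 3.0, NaN and 0.0 rows
```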

mango_time_series.utils.processing_time_series.drop_negative_output_pl(df: polars.LazyFrame) → polars.LazyFrame

Remove rows with negative values from time series data using Polars.

Filters out rows where the target variable (y) has negative values, which are typically not meaningful for sales or demand forecasting. Uses lazy evaluation for efficient processing.

Parameters:

df (polars.LazyFrame) – LazyFrame containing time series data with ‘y’ column

Returns:

LazyFrame with negative value rows removed

Return type:

polars.LazyFrame

Note:
  • Removes rows where y < 0 using filter operation

  • Logs the number of rows being dropped

  • Uses lazy evaluation for memory efficiency

  • Preserves all other data and columns

mango_time_series.utils.processing_time_series.add_covid_mark(df: polars.LazyFrame) → polars.LazyFrame

Add COVID period indicator column to time series data.

Creates a boolean column ‘covid’ that marks the COVID-19 pandemic period from March 2020 to March 2021, which can be used for analysis or modeling.

Parameters:

df (polars.LazyFrame) – LazyFrame containing time series data with datetime column

Returns:

LazyFrame with added ‘covid’ boolean column

Return type:

polars.LazyFrame

Note:
  • COVID period: 2020-03-01 to 2021-03-01

  • Creates boolean column where True indicates COVID period

  • Useful for identifying pandemic impact on time series patterns
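The flag amounts to a date-range test on the datetime column. The pandas sketch below treats both endpoints of the documented period as inclusive, which is an assumption; the helper name is hypothetical and the library itself works on a Polars LazyFrame.

```python
import pandas as pd


def add_covid_mark_sketch(df):
    """Hypothetical sketch of the COVID flag; both endpoints treated as inclusive."""
    out = df.copy()
    out["covid"] = out["datetime"].between(
        pd.Timestamp("2020-03-01"), pd.Timestamp("2021-03-01")
    )
    return out


dates = pd.DataFrame({"datetime": pd.to_datetime(["2020-02-29", "2020-06-01", "2021-03-02"])})
print(add_covid_mark_sketch(dates)["covid"].tolist())  # [False, True, False]
```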

mango_time_series.utils.processing_time_series.create_lags_col(df: pandas.DataFrame, col: str, lags: List[int], key_cols: List[str] = None, check_col: List[str] = None) → pandas.DataFrame

Create lagged columns for time series feature engineering.

Generates lagged versions of a specified column for time series analysis. Supports both positive lags (past values) and negative lags (future values/leads). Can handle multiple time series by grouping on key columns and optionally handle discontinuities by checking for changes in specified columns.

Parameters:
  • df (pandas.DataFrame) – DataFrame to add lagged columns to

  • col (str) – Name of the column to create lags for

  • lags (list[int]) – List of lag values (positive for past, negative for future)

  • key_cols (list[str], optional) – Columns that define each time series (for grouping)

  • check_col (list[str], optional) – Columns to check for discontinuities in lagged series

Returns:

DataFrame with lagged columns added

Return type:

pandas.DataFrame

Note:
  • Positive lags create columns named ‘{col}_lag{lag}’

  • Negative lags create columns named ‘{col}_lead{abs(lag)}’

  • Groups by key_cols if provided, otherwise treats as single series

  • Sets lag values to NaN when discontinuities detected in check_col

  • Requires pandas and numpy to be installed
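The naming rules and the discontinuity masking in the note can be sketched with pandas. Treat this as an illustration under assumptions: the discontinuity test below (any change in a check_col value between a row and the row its lagged value comes from) is this sketch's interpretation, not something the documentation spells out, and `create_lags_sketch` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def create_lags_sketch(df, col, lags, key_cols=None, check_col=None):
    """Hypothetical sketch of lag/lead columns with optional discontinuity masking."""
    out = df.copy()

    def shifted(series_name, k):
        # Shift within each series when key_cols is given, otherwise globally
        if key_cols:
            return out.groupby(key_cols)[series_name].shift(k)
        return out[series_name].shift(k)

    for lag in lags:
        name = f"{col}_lag{lag}" if lag > 0 else f"{col}_lead{abs(lag)}"
        out[name] = shifted(col, lag)
        for c in check_col or []:
            # Assumption: a discontinuity is any change in c between the row
            # and the row the lagged (or lead) value was taken from
            changed = out[c] != shifted(c, lag)
            out.loc[changed, name] = np.nan
    return out


df = pd.DataFrame({
    "id": ["a", "a", "a", "b", "b"],
    "y": [1.0, 2.0, 3.0, 10.0, 20.0],
    "promo": [0, 0, 1, 0, 0],
})
print(create_lags_sketch(df, "y", [1, -1], key_cols=["id"], check_col=["promo"]))
```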

mango_time_series.utils.processing_time_series.series_as_columns(df: pandas.DataFrame, series_conf: dict) → pandas.DataFrame

Pivot time series data to wide format with series as columns.

Transforms long-format time series data into wide format where each unique combination of key columns becomes a separate column. Uses sum aggregation for overlapping values.

Parameters:
  • df (pandas.DataFrame) – DataFrame in long format with time series data

  • series_conf (dict) – Configuration dictionary containing KEY_COLS and VALUE_COL

Returns:

DataFrame in wide format with series as columns

Return type:

pandas.DataFrame

Note:
  • Pivots using datetime as index and key_cols as columns

  • Uses sum aggregation for overlapping values

  • Flattens multi-level column names with underscore separator

  • Renames first column back to ‘datetime’
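The pivot and the underscore flattening can be sketched with `pandas.pivot_table`. Key columns and the value column are passed directly here instead of through series_conf, and `series_as_columns_sketch` is a hypothetical name. One consequence of the underscore join worth noting: key values that themselves contain underscores cannot be split back unambiguously later.

```python
import pandas as pd


def series_as_columns_sketch(df, key_cols, value_col):
    """Hypothetical sketch: long to wide, one column per key combination."""
    wide = df.pivot_table(index="datetime", columns=key_cols, values=value_col, aggfunc="sum")
    # Flatten (store, item) tuples into "store_item" names
    if isinstance(wide.columns, pd.MultiIndex):
        wide.columns = ["_".join(map(str, c)) for c in wide.columns]
    else:
        wide.columns = [str(c) for c in wide.columns]
    wide = wide.reset_index()
    return wide.rename(columns={wide.columns[0]: "datetime"})


long_df = pd.DataFrame({
    "datetime": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-01", "2024-01-02"]),
    "store": ["s1", "s1", "s2", "s1"],
    "item": ["x", "x", "x", "x"],
    "y": [1, 2, 5, 4],
})
print(series_as_columns_sketch(long_df, ["store", "item"], "y"))
```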

mango_time_series.utils.processing_time_series.series_as_rows(df: pandas.DataFrame, series_conf: dict) → pandas.DataFrame

Pivot time series data to long format with series as rows.

Transforms wide-format time series data into long format, where each (datetime, series) value becomes a separate row. Splits column names to reconstruct the original key columns.

Parameters:
  • df (pandas.DataFrame) – DataFrame in wide format with series as columns

  • series_conf (dict) – Configuration dictionary containing KEY_COLS and VALUE_COL

Returns:

DataFrame in long format with series as rows

Return type:

pandas.DataFrame

Note:
  • Uses melt to transform wide to long format

  • Splits column names by underscore to reconstruct key columns

  • Arranges columns in order: datetime, key_cols, value_col

  • Removes temporary ‘id’ column after processing
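The reverse transformation can be sketched with `melt` followed by a string split. This hypothetical `series_as_rows_sketch` limits the number of splits so that any extra underscores stay in the last key column; whether the library does the same is not stated.

```python
import pandas as pd


def series_as_rows_sketch(wide, key_cols, value_col):
    """Hypothetical sketch: wide to long, splitting "store_item" names back into keys."""
    long_df = wide.melt(id_vars="datetime", var_name="id", value_name=value_col)
    if len(key_cols) == 1:
        long_df[key_cols[0]] = long_df["id"]
    else:
        # n limits the number of splits so the last key keeps any extra underscores
        parts = long_df["id"].str.split("_", n=len(key_cols) - 1, expand=True)
        for i, key in enumerate(key_cols):
            long_df[key] = parts[i]
    # Column selection drops the temporary "id" column
    return long_df[["datetime", *key_cols, value_col]]


wide = pd.DataFrame({
    "datetime": pd.to_datetime(["2024-01-01", "2024-01-02"]),
    "s1_x": [1, 2],
    "s2_x": [3, 4],
})
print(series_as_rows_sketch(wide, ["store", "item"], "y"))
```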

mango_time_series.utils.processing_time_series.process_time_series(df: pandas.DataFrame | polars.DataFrame | polars.LazyFrame, series_conf: Dict) → polars.LazyFrame

Process time series data through complete preprocessing pipeline.

Applies a comprehensive preprocessing pipeline including column standardization, negative value removal, aggregation, dense data creation, and COVID marking. Converts input to Polars LazyFrame for efficient processing.

Parameters:
  • df (Union[pandas.DataFrame, polars.DataFrame, polars.LazyFrame]) – Input DataFrame (pandas, Polars, or LazyFrame)

  • series_conf (dict) – Configuration dictionary containing processing parameters

Returns:

Processed LazyFrame with standardized time series data

Return type:

polars.LazyFrame

Note:
  • Converts input to LazyFrame for processing

  • Standardizes column names (TIME_COL -> datetime, VALUE_COL -> y)

  • Removes negative values from target variable

  • Aggregates to daily frequency, then to specified frequency

  • Creates dense data with complete date ranges

  • Adds COVID period indicator column

mango_time_series.utils.processing_time_series.create_dense_data(df: pandas.DataFrame, id_cols: List[str], freq: str, min_max_by_id: bool = None, date_init: str = None, date_end: str = None, time_col: str = 'timeslot') → pandas.DataFrame

Create dense time series data with complete date ranges.

Expands sparse time series data so that every date at the specified frequency within the date range is present, filling missing dates with NaN values. Can use a global date range or an individual date range per ID group.

Parameters:
  • df (pandas.DataFrame) – DataFrame to expand with complete date ranges

  • id_cols (list[str]) – List of columns that identify unique time series

  • freq (str) – Frequency string for date range generation (e.g., ‘D’, ‘W’, ‘M’)

  • min_max_by_id (bool, optional) – Whether to use individual min/max dates per ID (default: None)

  • date_init (str, optional) – Override start date for all series (default: None)

  • date_end (str, optional) – Override end date for all series (default: None)

  • time_col (str) – Name of the time column (default: “timeslot”)

Returns:

DataFrame with complete date ranges and original data merged

Return type:

pandas.DataFrame

Note:
  • Creates Cartesian product of all dates and ID combinations

  • Fills missing dates with NaN values

  • Requires pandas to be installed

  • Uses cross join to create complete date grid
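The cross-join approach for the global date range can be sketched in pandas. This covers only the case where all series share one range (min_max_by_id is not handled), and `dense_sketch` is a hypothetical name whose parameters mirror the signature above.

```python
import pandas as pd


def dense_sketch(df, id_cols, freq, date_init=None, date_end=None, time_col="timeslot"):
    """Hypothetical sketch: global date range x every ID, then left-join the data."""
    start = pd.Timestamp(date_init) if date_init is not None else df[time_col].min()
    end = pd.Timestamp(date_end) if date_end is not None else df[time_col].max()
    dates = pd.DataFrame({time_col: pd.date_range(start, end, freq=freq)})
    ids = df[id_cols].drop_duplicates()
    grid = ids.merge(dates, how="cross")  # Cartesian product of IDs and dates
    return grid.merge(df, on=[*id_cols, time_col], how="left")  # gaps become NaN


sparse = pd.DataFrame({
    "id": ["a", "a", "b"],
    "timeslot": pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-02"]),
    "y": [1.0, 3.0, 2.0],
})
print(dense_sketch(sparse, ["id"], "D"))  # 6 rows: 2 IDs x 3 days, 3 of them with NaN y
```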

mango_time_series.utils.processing_time_series.create_dense_data_pl(df: polars.LazyFrame, id_cols: List[str], freq: str, min_max_by_id: bool = None, date_init: str = None, date_end: str = None, time_col: str = 'timeslot') → polars.LazyFrame

Create dense time series data with complete date ranges using Polars.

Expands sparse time series data so that every date at the specified frequency within the date range is present, filling missing dates with null values. Uses Polars for efficient processing with lazy evaluation.

Parameters:
  • df (polars.LazyFrame) – LazyFrame to expand with complete date ranges

  • id_cols (list[str]) – List of columns that identify unique time series

  • freq (str) – Frequency string for date range generation (e.g., ‘D’, ‘W’, ‘M’)

  • min_max_by_id (bool, optional) – Whether to use individual min/max dates per ID (default: None)

  • date_init (str, optional) – Override start date for all series (default: None)

  • date_end (str, optional) – Override end date for all series (default: None)

  • time_col (str) – Name of the time column (default: “timeslot”)

Returns:

LazyFrame with complete date ranges and original data merged

Return type:

polars.LazyFrame

Note:
  • Collects LazyFrame for date range calculations

  • Creates cross join of all dates and ID combinations

  • Converts time column to datetime format

  • Returns LazyFrame for efficient processing

mango_time_series.utils.processing_time_series.create_dense_data_pllazy(df: polars.LazyFrame, id_cols, freq: str, min_max_by_id: bool = None, date_init=None, date_end=None, time_col: str = 'timeslot') → polars.LazyFrame

Create a dense LazyFrame at frequency freq, using id_cols as keys. The date range is either given explicitly (date_init, date_end) or inherited from the data.

Parameters:
  • df (polars.LazyFrame) – LazyFrame to be expanded

  • id_cols – List of columns used as keys

  • freq (str) – Frequency of the expanded data

  • min_max_by_id (bool, optional) – If True, each ID uses its own minimum and maximum dates as its range instead of the global range (default: None)

  • date_init – If set, all series start at this date (default: None)

  • date_end – If set, all series end at this date (default: None)

  • time_col (str) – Name of the time column (default: “timeslot”)

Returns:

LazyFrame containing every date at frequency freq for each key combination

Return type:

polars.LazyFrame
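The per-ID range behind min_max_by_id=True can be sketched in pandas: each ID is expanded only between its own first and last dates, so a series is not padded before it starts or after it ends. This is a hypothetical illustration (`dense_by_id_sketch` is not a library function); create_dense_data_pllazy itself operates on a Polars LazyFrame.

```python
import pandas as pd


def dense_by_id_sketch(df, id_cols, freq, time_col="timeslot"):
    """Hypothetical sketch of min_max_by_id=True: each ID spans only its own date range."""
    frames = []
    for keys, group in df.groupby(id_cols):
        if not isinstance(keys, tuple):  # older pandas yields scalars for a single key
            keys = (keys,)
        frame = pd.DataFrame(
            {time_col: pd.date_range(group[time_col].min(), group[time_col].max(), freq=freq)}
        )
        for col, value in zip(id_cols, keys):
            frame[col] = value
        frames.append(frame)
    grid = pd.concat(frames, ignore_index=True)
    return grid.merge(df, on=[*id_cols, time_col], how="left")


sparse = pd.DataFrame({
    "id": ["a", "a", "b"],
    "timeslot": pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-02"]),
    "y": [1.0, 3.0, 2.0],
})
print(dense_by_id_sketch(sparse, ["id"], "D"))  # a spans 3 days, b only 1: 4 rows, 1 NaN
```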

mango_time_series.utils.processing_time_series.create_recurrent_dataset(data: numpy.ndarray, look_back: int, include_output_lags: bool = False, lags: List[int] = None, output_last: bool = True) → tuple[numpy.ndarray, numpy.ndarray]

Create dataset for recurrent neural networks with time series sequences.

Transforms 2D time series data into 3D sequences suitable for RNN training. Creates sliding windows of historical data as input and corresponding future values as targets. Supports optional output lag features and flexible target column positioning.

Parameters:
  • data (numpy.ndarray) – 2D array with shape (num_samples, num_features)

  • look_back (int) – Number of previous time steps to use as input

  • include_output_lags (bool) – Whether to include output lag features (default: False)

  • lags (list[int], optional) – List of lag periods to include as features (default: None)

  • output_last (bool) – Whether the target is the last column (True) or the first column (False) (default: True)

Returns:

Tuple containing (input_sequences, target_values)

Return type:

tuple[numpy.ndarray, numpy.ndarray]

Note:
  • Input shape: (num_samples, look_back, num_features)

  • Output shape: (num_samples,)

  • Supports both positive and negative lags

  • Handles output column positioning (first vs last)

  • Creates sequences with proper temporal alignment
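The sliding-window construction can be sketched with NumPy. This minimal version omits include_output_lags and lags, and assumes that the target for each window is the target column at the time step immediately after the window; `recurrent_sketch` is a hypothetical name.

```python
import numpy as np


def recurrent_sketch(data, look_back, output_last=True):
    """Hypothetical sketch: sliding windows of look_back rows -> next-step target."""
    target_col = -1 if output_last else 0
    inputs, targets = [], []
    for start in range(len(data) - look_back):
        inputs.append(data[start:start + look_back, :])  # shape (look_back, num_features)
        targets.append(data[start + look_back, target_col])  # value right after the window
    return np.array(inputs), np.array(targets)


series = np.arange(20, dtype=float).reshape(10, 2)  # 10 time steps, 2 features
X, y = recurrent_sketch(series, look_back=3)
print(X.shape, y.shape)  # (7, 3, 2) (7,)
```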

mango_time_series.utils.processing_time_series.get_corr_matrix(df: pandas.DataFrame, n_top: int = None, threshold: float = None, date_col: str = None, years_corr: List[int] = None, subset: List[str] = None) → Dict[str, Dict[str, float]]

Calculate correlation matrix for time series data with filtering options.

Computes Pearson correlation matrix for time series data with various filtering and selection options. Automatically detects date columns and validates data format. Returns top correlations or correlations above threshold for each time series.

Parameters:
  • df (pandas.DataFrame) – DataFrame containing time series data

  • n_top (int, optional) – Number of top correlations to return per series (default: None)

  • threshold (float, optional) – Minimum correlation threshold to filter results (default: None)

  • date_col (str, optional) – Name of the date column (auto-detected if None)

  • years_corr (list[int], optional) – List of years to filter data for correlation calculation (default: None)

  • subset (list[str], optional) – List of series names to focus correlation analysis on (default: None)

Returns:

Dictionary with series names as keys and correlation dictionaries as values

Return type:

dict[str, dict[str, float]]

Note:
  • Automatically detects datetime columns or uses index

  • Validates data format and raises errors for inconsistencies

  • Sets diagonal correlations to -100 so that a series is never selected as its own top correlation

  • Returns all correlations if both n_top and threshold are None

mango_time_series.utils.processing_time_series.get_date_col_candidate(df: pandas.DataFrame) → tuple[List[str] | None, bool]

Identify datetime columns in DataFrame for time series analysis.

Searches for datetime columns in the DataFrame and determines whether the index contains datetime information. Returns the datetime column names and a boolean indicating if the index is datetime-based.

Parameters:

df (pandas.DataFrame) – DataFrame to analyze for datetime columns

Returns:

Tuple containing (datetime_columns, index_is_datetime)

Return type:

tuple[list[str] or None, bool]

Note:
  • Returns None for datetime_columns if no datetime columns found

  • Returns True for index_is_datetime if index is DatetimeIndex

  • Searches all columns for datetime64 dtypes

  • Used by correlation functions for automatic date detection
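The detection logic described in the note can be sketched with `select_dtypes` and an index type check. This is a hypothetical illustration of the documented behavior (`date_col_candidate_sketch` is not the library function); it includes timezone-aware datetime columns, which the documentation does not mention explicitly.

```python
import pandas as pd


def date_col_candidate_sketch(df):
    """Hypothetical sketch: (datetime column names or None, index is DatetimeIndex)."""
    cols = df.select_dtypes(include=["datetime64", "datetimetz"]).columns.tolist()
    return (cols or None), isinstance(df.index, pd.DatetimeIndex)


frame = pd.DataFrame({"date": pd.to_datetime(["2024-01-01", "2024-01-02"]), "s1": [1.0, 2.0]})
print(date_col_candidate_sketch(frame))                    # (['date'], False)
print(date_col_candidate_sketch(frame.set_index("date")))  # (None, True)
```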

mango_time_series.utils.processing_time_series.raise_if_inconsistency(df: pandas.DataFrame, date_col: str, as_index: bool) → None

Validate DataFrame format for time series correlation analysis.

Performs comprehensive validation of DataFrame structure to ensure it’s suitable for correlation analysis. Checks for datetime columns, duplicate indices, numeric columns, and proper pivoted format.

Parameters:
  • df (pandas.DataFrame) – DataFrame to validate

  • date_col (str) – Name of the date column (or None if using index)

  • as_index (bool) – Whether the DataFrame uses datetime index

Returns:

None (raises ValueError if validation fails)

Return type:

None

Raises:

ValueError: If DataFrame format is inconsistent or invalid

Note:
  • Validates presence of exactly one datetime column

  • Checks for duplicate indices in time series data

  • Ensures all non-datetime columns are numeric

  • Provides example format in error messages

  • Used by correlation functions for data validation
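Two of the listed checks (duplicate dates and numeric series columns) can be sketched as follows. This hypothetical `validate_sketch` covers only a subset of the validation, and its error messages are illustrative rather than the library's.

```python
import pandas as pd


def validate_sketch(df, date_col, as_index):
    """Hypothetical subset of the checks listed above; raises ValueError on failure."""
    dates = df.index if as_index else df[date_col]
    if dates.duplicated().any():
        raise ValueError("Duplicate dates found: pivot the data to one row per date first")
    values = df if as_index else df.drop(columns=date_col)
    non_numeric = [c for c in values.columns if not pd.api.types.is_numeric_dtype(values[c])]
    if non_numeric:
        raise ValueError(f"Non-numeric series columns: {non_numeric}")


ok = pd.DataFrame({"date": pd.to_datetime(["2024-01-01", "2024-01-02"]), "s1": [1.0, 2.0]})
validate_sketch(ok, "date", as_index=False)  # passes silently
```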

mango_time_series.utils.processing_time_series.get_corr_matrix_aux(df: pandas.DataFrame, years_corr: List[int] = None, n_top: int = None, threshold: float = None, subset: List[str] = None) → Dict[str, Dict[str, float]]

Compute correlation matrix with filtering and selection options.

Calculates Pearson correlation matrix for time series data and returns filtered results based on various criteria. Supports year filtering, top N correlations, threshold filtering, and subset analysis.

Parameters:
  • df (pandas.DataFrame) – DataFrame with datetime index and numeric columns

  • years_corr (list[int], optional) – List of years to filter data for correlation calculation (default: None)

  • n_top (int, optional) – Number of top correlations to return per series (default: None)

  • threshold (float, optional) – Minimum correlation threshold to filter results (default: None)

  • subset (list[str], optional) – List of series names to focus correlation analysis on (default: None)

Returns:

Dictionary with series names as keys and correlation dictionaries as values

Return type:

dict[str, dict[str, float]]

Note:
  • Filters data by specified years before correlation calculation

  • Sets diagonal correlations to -100 so that a series is never selected as its own top correlation

  • Supports subset analysis for focused correlation studies

  • Returns all correlations if both n_top and threshold are None

  • Issues warnings for edge cases (no threshold matches, etc.)
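The diagonal masking and top-N selection can be sketched with pandas and NumPy. This hypothetical `top_correlations_sketch` shows only the -100 trick and n_top; year filtering, threshold, subset, and how the library combines n_top with threshold are not reproduced.

```python
import numpy as np
import pandas as pd


def top_correlations_sketch(wide, n_top):
    """Hypothetical sketch: Pearson matrix, diagonal masked with -100, top n per series."""
    corr = wide.corr(method="pearson")
    values = corr.to_numpy(copy=True)
    np.fill_diagonal(values, -100)  # a series can no longer rank as its own best match
    corr = pd.DataFrame(values, index=corr.index, columns=corr.columns)
    return {name: corr[name].nlargest(n_top).to_dict() for name in corr.columns}


wide = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with a
    "c": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with a
})
print(top_correlations_sketch(wide, n_top=1)["a"])  # best match for a is b, with correlation ~1.0
```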

Tabular structure

mango_time_series.utils.tabular_structure.create_tabular_structure(df: pandas.DataFrame, horizon: int, SERIES_CONF: dict) → pandas.DataFrame

Create tabular structure for time series forecasting with multiple horizons.

Transforms time series data into a tabular format suitable for machine learning by creating all possible combinations of forecast origins and horizons. Each row represents a forecast point with its corresponding horizon.

Parameters:
  • df (pandas.DataFrame) – DataFrame containing time series data with datetime column

  • horizon (int) – Maximum forecast horizon to create (e.g., 7 for 7-day ahead forecasts)

  • SERIES_CONF (dict) – Configuration dictionary containing KEY_COLS and TIME_PERIOD settings

Returns:

DataFrame with tabular structure including horizon and forecast_origin columns

Return type:

pandas.DataFrame

Note:
  • Creates Cartesian product of original data with horizon range (1 to horizon)

  • Calculates forecast_origin by subtracting horizon from datetime

  • Handles different time periods (months vs other units)

  • Sorts result by key columns and datetime

  • Each row represents one forecast point for one horizon
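The cross join with the horizon range and the origin calculation can be sketched in pandas. Assumptions: a `monthly` flag stands in for the TIME_PERIOD lookup, non-monthly data is treated as daily, and `tabular_sketch` is a hypothetical name. Monthly origins use calendar-aware offsets, so month-end dates are clipped (2024-03-31 minus one month is 2024-02-29).

```python
import pandas as pd


def tabular_sketch(df, horizon, key_cols, monthly=False):
    """Hypothetical sketch: one row per (observation, horizon) with its forecast origin."""
    horizons = pd.DataFrame({"horizon": range(1, horizon + 1)})
    out = df.merge(horizons, how="cross")  # Cartesian product with 1..horizon
    if monthly:
        # Calendar-aware month arithmetic instead of a fixed number of days
        out["forecast_origin"] = [
            d - pd.DateOffset(months=int(h)) for d, h in zip(out["datetime"], out["horizon"])
        ]
    else:
        out["forecast_origin"] = out["datetime"] - pd.to_timedelta(out["horizon"], unit="D")
    return out.sort_values([*key_cols, "datetime", "horizon"]).reset_index(drop=True)


obs = pd.DataFrame({"id": ["a"], "datetime": pd.to_datetime(["2024-03-31"]), "y": [5.0]})
print(tabular_sketch(obs, 2, ["id"]))
print(tabular_sketch(obs, 2, ["id"], monthly=True))
```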

mango_time_series.utils.tabular_structure.create_tabular_structure_pl(df: polars.LazyFrame, horizon: int, SERIES_CONF: dict) → polars.LazyFrame

Create tabular structure for time series forecasting using Polars.

Transforms time series data into a tabular format suitable for machine learning using Polars LazyFrame for efficient processing. Creates all possible combinations of forecast origins and horizons with optimized operations.

Parameters:
  • df (polars.LazyFrame) – LazyFrame containing time series data with datetime column

  • horizon (int) – Maximum forecast horizon to create (e.g., 7 for 7-day ahead forecasts)

  • SERIES_CONF (dict) – Configuration dictionary containing TIME_PERIOD settings

Returns:

LazyFrame with tabular structure including horizon and forecast_origin columns

Return type:

polars.LazyFrame

Note:
  • Uses Polars cross join for efficient Cartesian product creation

  • Extracts time unit from TIME_PERIOD configuration

  • Calculates forecast_origin using Polars datetime offset operations

  • Maintains lazy evaluation for memory efficiency

  • Each row represents one forecast point for one horizon

Time features

mango_time_series.utils.time_features.create_time_features(df: polars.LazyFrame, SERIES_CONF: dict) → polars.LazyFrame

Create time-based features from datetime column based on time period description.

Extracts relevant time features from the datetime column depending on the time period description in the series configuration. Creates different sets of features for monthly, daily, and weekly time series.

Parameters:
  • df (polars.LazyFrame) – LazyFrame containing time series data with datetime column

  • SERIES_CONF (dict) – Configuration dictionary containing TIME_PERIOD_DESCR

Returns:

LazyFrame with additional time feature columns

Return type:

polars.LazyFrame

Note:
  • For monthly data: adds ‘month’ and ‘year’ columns

  • For daily data: adds ‘day’, ‘month’, ‘year’, and ‘weekday’ columns

  • For weekly data: adds ‘week’ and ‘year’ columns

  • Features are extracted using Polars datetime methods
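The per-period feature sets can be sketched with pandas datetime accessors. The TIME_PERIOD_DESCR values "month", "day" and "week" are assumptions, as is the helper name. Two conventions differ from Polars: pandas numbers weekdays from Monday=0, and the plain calendar year can disagree with the ISO week's year around New Year (2024-12-30 is in ISO week 1 of 2025).

```python
import pandas as pd


def time_features_sketch(df, period_descr):
    """Hypothetical sketch; period_descr values "month", "day", "week" are assumptions."""
    out = df.copy()
    dt = out["datetime"].dt
    if period_descr == "month":
        out["month"] = dt.month
        out["year"] = dt.year
    elif period_descr == "day":
        out["day"] = dt.day
        out["month"] = dt.month
        out["year"] = dt.year
        out["weekday"] = dt.weekday  # pandas: Monday=0 (Polars dt.weekday(): Monday=1)
    elif period_descr == "week":
        out["week"] = dt.isocalendar().week.astype(int)
        out["year"] = dt.year
    return out


frame = pd.DataFrame({"datetime": pd.to_datetime(["2024-01-01"])})
print(time_features_sketch(frame, "day"))
```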

mango_time_series.utils.time_features.month_as_bspline(df: polars.DataFrame) → pandas.DataFrame

Transform monthly seasonality into B-spline features for machine learning.

Converts monthly seasonal patterns into smooth B-spline features that can be used effectively in machine learning models. Creates day-of-year features and applies B-spline transformation with periodic extrapolation.

Parameters:

df (polars.DataFrame) – DataFrame containing time series data with datetime column

Returns:

DataFrame with original data plus B-spline features

Return type:

pandas.DataFrame

Note:
  • Converts Polars DataFrame to pandas for sklearn compatibility

  • Creates day_of_year feature from datetime column

  • Applies B-spline transformation with 12 knots (monthly period)

  • Uses periodic extrapolation for smooth seasonal transitions

  • Removes intermediate day_of_year column after transformation

mango_time_series.utils.time_features.custom_weights(index: pandas.DatetimeIndex) → numpy.ndarray

Create custom weights for time series data with specific period exclusion.

Generates a weight array where values are set to 0 for a specific date range and 1 for all other dates. This is useful for excluding certain periods from model training or evaluation.

Parameters:

index (pandas.DatetimeIndex) – DatetimeIndex containing the dates to weight

Returns:

Array of weights (0 or 1) corresponding to each date

Return type:

numpy.ndarray

Note:
  • Sets weight to 0 for dates between 2012-06-01 and 2012-10-21

  • Sets weight to 1 for all other dates

  • Useful for excluding specific periods from analysis

  • Returns numpy array for efficient computation
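The weighting rule can be sketched with a boolean mask and `np.where`. Both endpoints of the excluded window are treated as inclusive here, which is an assumption, and `custom_weights_sketch` is a hypothetical name. An array like this can be passed as `sample_weight` to estimators whose fit method accepts one.

```python
import numpy as np
import pandas as pd


def custom_weights_sketch(index):
    """Hypothetical sketch: weight 0 inside the excluded window, 1 elsewhere."""
    excluded = (index >= pd.Timestamp("2012-06-01")) & (index <= pd.Timestamp("2012-10-21"))
    return np.where(excluded, 0.0, 1.0)


idx = pd.DatetimeIndex(["2012-05-31", "2012-06-01", "2012-10-21", "2012-10-22"])
print(custom_weights_sketch(idx))  # [1. 0. 0. 1.]
```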