Utils
Note
This documentation is still under development. If you find a bug or have a suggestion for utils, please open an issue in the GitHub repository.
Moving averages
- mango_time_series.utils.moving_averages.create_season_unit_previous_date(df: polars.DataFrame, time_unit: str) → polars.DataFrame
Create previous date mapping for seasonal unit analysis.
Calculates the previous occurrence date for each seasonal unit (day of week) relative to the forecast origin. This is used for creating seasonal variables that reference the same day of week from previous periods.
- Parameters:
df (polars.DataFrame) – DataFrame containing forecast_origin and season_unit columns
time_unit (str) – Time unit string (e.g., ‘d’ for days, ‘w’ for weeks)
- Returns:
DataFrame with previous_dow_date column added
- Return type:
polars.DataFrame
- Note:
Calculates weekday difference between season_unit and forecast_origin
Adjusts for week boundaries (handles day-of-week wrapping)
Creates offset dates for seasonal variable creation
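The weekday wrapping described in the note can be illustrated with a minimal standard-library sketch. The helper name `previous_weekday_date` is hypothetical, weekdays are numbered as in Python's `date.weekday()` (Monday = 0), and the choice to return the forecast origin itself when the weekdays match is an assumption; the library function may number weekdays or treat that case differently.

```python
from datetime import date, timedelta


def previous_weekday_date(forecast_origin: date, season_unit: int) -> date:
    """Hypothetical helper: latest date on or before forecast_origin
    that falls on weekday season_unit (Monday = 0, Sunday = 6)."""
    # The modulo handles wrapping across the week boundary, e.g. asking
    # for Friday (4) from a Wednesday (2) steps back 5 days, not -2.
    days_back = (forecast_origin.weekday() - season_unit) % 7
    return forecast_origin - timedelta(days=days_back)


origin = date(2024, 1, 10)  # a Wednesday
previous_friday = previous_weekday_date(origin, 4)  # date(2024, 1, 5)
```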
- mango_time_series.utils.moving_averages.create_recent_variables(df: polars.LazyFrame, SERIES_CONF: dict, window: list, lags: list, gap: int = 0, freq: int = 1, colname: str = '', season_unit: bool = False) → polars.LazyFrame
Create rolling averages and lag variables for time series forecasting.
Generates rolling average and lag features for time series data, with optional seasonal unit grouping. Creates multiple columns for different window sizes and lag periods, grouped by the key columns specified in SERIES_CONF.
- Parameters:
df (polars.LazyFrame) – Input LazyFrame containing time series data
SERIES_CONF (dict) – Configuration dictionary containing KEY_COLS and TIME_PERIOD
window (list) – List of window sizes for rolling averages
lags (list) – List of lag periods to create
gap (int) – Number of periods to skip before starting window (default: 0)
freq (int) – Frequency multiplier for lag calculations (default: 1)
colname (str) – Prefix for new column names (default: “”)
season_unit (bool) – Whether to group by seasonal unit (default: False)
- Returns:
LazyFrame with new rolling average and lag columns
- Return type:
polars.LazyFrame
- Note:
Rolling averages exclude current row (shifted by gap)
Lag variables are shifted by (gap - 1 + lag * freq)
Seasonal unit mode uses previous day-of-week dates for alignment
Column naming: y_{colname}roll_{window} and y_{colname}lag_{lag*freq}
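For readers who want to see the general technique, here is a pandas analogue of per-series rolling averages and lags. It is a sketch, not the library's Polars implementation: the helper name `add_recent_features` is hypothetical, and the offsets are an assumption of this sketch (rolling windows end `gap + 1` rows before the current row, lags use `shift(gap + lag * freq)`). The library's own offsets, stated in the note above, are defined relative to its forecast-origin layout and may differ. Only the column naming pattern is taken from the note.

```python
import pandas as pd


def add_recent_features(df, key_cols, windows, lags, gap=0, freq=1, colname=""):
    """Hypothetical pandas sketch of per-series rolling means and lags."""
    base = df.sort_values(key_cols + ["datetime"])
    y_by_series = base.groupby(key_cols)["y"]
    out = base.copy()
    for w in windows:
        # Shift first so the current row never enters its own window.
        out[f"y_{colname}roll_{w}"] = y_by_series.transform(
            lambda s, w=w: s.shift(gap + 1).rolling(w).mean()
        )
    for lag in lags:
        out[f"y_{colname}lag_{lag * freq}"] = y_by_series.shift(gap + lag * freq)
    return out
```

With `windows=[2]` and `lags=[1]`, each series gains a `y_roll_2` and a `y_lag_1` column; the first rows of every series are missing because no history is available yet.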
- mango_time_series.utils.moving_averages.create_seasonal_variables(df: polars.LazyFrame, SERIES_CONF: dict, window: list, lags: list, season_unit: str, freq: int, gap: int = 0) → polars.LazyFrame
Create seasonal rolling averages and lag variables.
Generates seasonal features by creating rolling averages and lag variables grouped by seasonal units (day of week, week, or month). Automatically extracts the appropriate seasonal unit from datetime and creates variables that capture seasonal patterns in the time series.
- Parameters:
df (polars.LazyFrame) – Input LazyFrame containing time series data with datetime column
SERIES_CONF (dict) – Configuration dictionary containing KEY_COLS and TIME_PERIOD
window (list) – List of window sizes for rolling averages
lags (list) – List of lag periods to create
season_unit (str) – Seasonal unit type (‘day’, ‘week’, or ‘month’)
freq (int) – Frequency multiplier for lag calculations
gap (int) – Number of periods to skip before starting window (default: 0)
- Returns:
LazyFrame with seasonal rolling average and lag columns
- Return type:
polars.LazyFrame
- Note:
Extracts seasonal unit from datetime column based on season_unit parameter
Uses sea_ prefix for seasonal variable column names
Removes season_unit column after processing
Supports day (weekday), week, and month seasonal units
Processing time series
- mango_time_series.utils.processing_time_series.aggregate_to_input(df: pandas.DataFrame, freq: str, series_conf: dict) → pandas.DataFrame
Aggregate time series data to specified frequency using pandas.
Groups the data by key columns and time frequency, then applies aggregation operations defined in the series configuration.
- Parameters:
df (pandas.DataFrame) – DataFrame containing time series data
freq (str) – Frequency string for aggregation (e.g., ‘D’, ‘W’, ‘M’)
series_conf (dict) – Configuration dictionary containing KEY_COLS and AGG_OPERATIONS
- Returns:
Aggregated DataFrame with specified frequency
- Return type:
pandas.DataFrame
- Note:
Uses pandas Grouper for time-based grouping
Applies aggregation operations from series_conf[“AGG_OPERATIONS”]
Groups by both key columns and time frequency
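The grouping described above can be sketched directly with `pandas.Grouper`. The data and the shape of `AGG_OPERATIONS` (a column-to-aggregation mapping) are assumptions made for illustration; the sketch does not call the library function itself.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "store": ["a", "a", "a", "b"],
        "datetime": pd.to_datetime(
            ["2024-01-01 08:00", "2024-01-01 17:00", "2024-01-02 09:00", "2024-01-01 10:00"]
        ),
        "y": [1.0, 2.0, 3.0, 4.0],
    }
)
# Assumed configuration layout: aggregation per column name.
series_conf = {"KEY_COLS": ["store"], "AGG_OPERATIONS": {"y": "sum"}}

# Group by the key columns plus a daily time bucket, then aggregate.
daily = (
    df.groupby(series_conf["KEY_COLS"] + [pd.Grouper(key="datetime", freq="D")])
    .agg(series_conf["AGG_OPERATIONS"])
    .reset_index()
)
```

Here the two intraday rows of store `a` on 2024-01-01 collapse into one daily row with `y` equal to 3.0.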
- mango_time_series.utils.processing_time_series.aggregate_to_input_pl(df: pandas.DataFrame, freq: str, series_conf: dict) → pandas.DataFrame
Aggregate time series data to specified frequency using Polars.
Converts pandas DataFrame to Polars, groups by key columns and time frequency, applies aggregation operations, then converts back to pandas.
- Parameters:
df (pandas.DataFrame) – DataFrame containing time series data
freq (str) – Frequency string for aggregation (e.g., ‘D’, ‘W’, ‘M’)
series_conf (dict) – Configuration dictionary containing KEY_COLS and AGG_OPERATIONS
- Returns:
Aggregated DataFrame with specified frequency
- Return type:
pandas.DataFrame
- Note:
Converts pandas to Polars for processing, then back to pandas
Handles month frequency conversion (‘m’, ‘MS’, ‘ME’ -> ‘mo’)
Uses Polars datetime truncate for time-based grouping
Supports sum, mean, min, max, median aggregation operations
- mango_time_series.utils.processing_time_series.aggregate_to_input_pllazy(df: polars.LazyFrame, freq: str, series_conf: dict) → polars.LazyFrame
Aggregate time series data to specified frequency using Polars LazyFrame.
Groups LazyFrame by key columns and time frequency, applies aggregation operations, and returns sorted result as LazyFrame.
- Parameters:
df (polars.LazyFrame) – LazyFrame containing time series data
freq (str) – Frequency string for aggregation (e.g., ‘D’, ‘W’, ‘M’)
series_conf (dict) – Configuration dictionary containing KEY_COLS and AGG_OPERATIONS
- Returns:
Aggregated LazyFrame with specified frequency
- Return type:
polars.LazyFrame
- Note:
Works with Polars LazyFrame for efficient processing
Uses Polars datetime truncate for time-based grouping
Sorts result by key columns and datetime
Supports sum, mean, min, max, median aggregation operations
- mango_time_series.utils.processing_time_series.rename_to_common_ts_names(df: pandas.DataFrame, time_col: str, value_col: str) → pandas.DataFrame
Rename columns to standard time series naming convention.
Standardizes column names by renaming time and value columns to ‘datetime’ and ‘y’ respectively, and converts all column names to lowercase.
- Parameters:
df (pandas.DataFrame) – DataFrame to rename columns in
time_col (str) – Name of the time/datetime column
value_col (str) – Name of the value/target column
- Returns:
DataFrame with standardized column names
- Return type:
pandas.DataFrame
- Note:
Renames time_col to ‘datetime’ and value_col to ‘y’
Converts all column names to lowercase
Standardizes naming for time series processing pipeline
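A minimal pandas sketch of the renaming convention follows. The helper name `rename_sketch` is hypothetical and the order of the two steps (rename first, lowercase second) is an assumption.

```python
import pandas as pd


def rename_sketch(df: pd.DataFrame, time_col: str, value_col: str) -> pd.DataFrame:
    """Hypothetical helper: rename to 'datetime' / 'y', then lowercase all names."""
    renamed = df.rename(columns={time_col: "datetime", value_col: "y"})
    renamed.columns = [str(c).lower() for c in renamed.columns]
    return renamed


raw = pd.DataFrame({"Date": ["2024-01-01"], "Sales": [10.0], "Store": ["a"]})
standard = rename_sketch(raw, time_col="Date", value_col="Sales")
# standard has the columns: datetime, y, store
```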
- mango_time_series.utils.processing_time_series.rename_to_common_ts_names_pl(df: polars.LazyFrame, time_col: str, value_col: str) → polars.LazyFrame
Rename columns to standard time series naming convention using Polars.
Standardizes column names by renaming time and value columns to ‘datetime’ and ‘y’ respectively, and converts all column names to lowercase.
- Parameters:
df (polars.LazyFrame) – LazyFrame to rename columns in
time_col (str) – Name of the time/datetime column
value_col (str) – Name of the value/target column
- Returns:
LazyFrame with standardized column names
- Return type:
polars.LazyFrame
- Note:
Renames time_col to ‘datetime’ and value_col to ‘y’
Converts all column names to lowercase using with_columns
Standardizes naming for time series processing pipeline
- mango_time_series.utils.processing_time_series.drop_negative_output(df: pandas.DataFrame) → pandas.DataFrame
Remove rows with negative values from time series data.
Filters out rows where the target variable (y) has negative values, which are typically not meaningful for sales or demand forecasting.
- Parameters:
df (pandas.DataFrame) – DataFrame containing time series data with ‘y’ column
- Returns:
DataFrame with negative value rows removed
- Return type:
pandas.DataFrame
- Note:
Removes rows where y < 0
Logs the number of rows being dropped
Preserves all other data and columns
- mango_time_series.utils.processing_time_series.drop_negative_output_pl(df: polars.LazyFrame) → polars.LazyFrame
Remove rows with negative values from time series data using Polars.
Filters out rows where the target variable (y) has negative values, which are typically not meaningful for sales or demand forecasting. Uses lazy evaluation for efficient processing.
- Parameters:
df (polars.LazyFrame) – LazyFrame containing time series data with ‘y’ column
- Returns:
LazyFrame with negative value rows removed
- Return type:
polars.LazyFrame
- Note:
Removes rows where y < 0 using filter operation
Logs the number of rows being dropped
Uses lazy evaluation for memory efficiency
Preserves all other data and columns
- mango_time_series.utils.processing_time_series.add_covid_mark(df: polars.LazyFrame) → polars.LazyFrame
Add COVID period indicator column to time series data.
Creates a boolean column ‘covid’ that marks the COVID-19 pandemic period from March 2020 to March 2021, which can be used for analysis or modeling.
- Parameters:
df (polars.LazyFrame) – LazyFrame containing time series data with datetime column
- Returns:
LazyFrame with added ‘covid’ boolean column
- Return type:
polars.LazyFrame
- Note:
COVID period: 2020-03-01 to 2021-03-01
Creates boolean column where True indicates COVID period
Useful for identifying pandemic impact on time series patterns
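The indicator can be pictured with a pandas analogue of the Polars function. The helper name `add_covid_mark_sketch` is hypothetical, and treating both boundary dates as part of the period is an assumption; the documentation above does not say whether the endpoints are inclusive.

```python
import pandas as pd

COVID_START = pd.Timestamp("2020-03-01")
COVID_END = pd.Timestamp("2021-03-01")


def add_covid_mark_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical pandas analogue: flag rows inside the COVID period."""
    out = df.copy()
    # Series.between is inclusive on both ends by default (assumed here).
    out["covid"] = out["datetime"].between(COVID_START, COVID_END)
    return out


sample = pd.DataFrame(
    {"datetime": pd.to_datetime(["2020-02-29", "2020-06-15", "2021-03-02"])}
)
marked = add_covid_mark_sketch(sample)
```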
- mango_time_series.utils.processing_time_series.create_lags_col(df: pandas.DataFrame, col: str, lags: List[int], key_cols: List[str] = None, check_col: List[str] = None) → pandas.DataFrame
Create lagged columns for time series feature engineering.
Generates lagged versions of a specified column for time series analysis. Supports both positive lags (past values) and negative lags (future values/leads). Can handle multiple time series by grouping on key columns and optionally handle discontinuities by checking for changes in specified columns.
- Parameters:
df (pandas.DataFrame) – DataFrame to add lagged columns to
col (str) – Name of the column to create lags for
lags (list[int]) – List of lag values (positive for past, negative for future)
key_cols (list[str], optional) – Columns that define each time series (for grouping)
check_col (list[str], optional) – Columns to check for discontinuities in lagged series
- Returns:
DataFrame with lagged columns added
- Return type:
pandas.DataFrame
- Note:
Positive lags create columns named ‘{col}_lag{lag}’
Negative lags create columns named ‘{col}_lead{abs(lag)}’
Groups by key_cols if provided, otherwise treats as single series
Sets lag values to NaN when discontinuities detected in check_col
Requires pandas and numpy to be installed
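The lag and lead naming can be illustrated with a short pandas sketch. The helper name `lags_sketch` is hypothetical, rows are assumed to be sorted by time within each series already, and the `check_col` discontinuity handling is deliberately left out.

```python
import pandas as pd


def lags_sketch(df, col, lags, key_cols=None):
    """Hypothetical helper: add '{col}_lag{n}' / '{col}_lead{n}' columns."""
    out = df.copy()
    # Shift within each series when key columns are given.
    source = df.groupby(key_cols)[col] if key_cols else df[col]
    for lag in lags:
        name = f"{col}_lag{lag}" if lag > 0 else f"{col}_lead{abs(lag)}"
        # Positive shift looks into the past, negative into the future.
        out[name] = source.shift(lag)
    return out


sales = pd.DataFrame(
    {"store": ["a", "a", "a", "b", "b"], "y": [1.0, 2.0, 3.0, 10.0, 20.0]}
)
with_lags = lags_sketch(sales, "y", lags=[1, -1], key_cols=["store"])
```

Because the shift is applied per group, the first row of store `b` gets a missing `y_lag1` rather than the last value of store `a`.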
- mango_time_series.utils.processing_time_series.series_as_columns(df: pandas.DataFrame, series_conf: dict) → pandas.DataFrame
Pivot time series data to wide format with series as columns.
Transforms long-format time series data into wide format where each unique combination of key columns becomes a separate column. Uses sum aggregation for overlapping values.
- Parameters:
df (pandas.DataFrame) – DataFrame in long format with time series data
series_conf (dict) – Configuration dictionary containing KEY_COLS and VALUE_COL
- Returns:
DataFrame in wide format with series as columns
- Return type:
pandas.DataFrame
- Note:
Pivots using datetime as index and key_cols as columns
Uses sum aggregation for overlapping values
Flattens multi-level column names with underscore separator
Renames first column back to ‘datetime’
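The long-to-wide pivot can be sketched with `pivot_table`. The column names (`store`, `product`) and data are illustrative assumptions, and the sketch shows the general technique rather than the library's exact code.

```python
import pandas as pd

long_df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2024-01-01", "2024-01-01", "2024-01-01", "2024-01-02"]
        ),
        "store": ["a", "a", "b", "a"],
        "product": ["x", "x", "x", "x"],
        "y": [1.0, 2.0, 5.0, 4.0],
    }
)
key_cols = ["store", "product"]  # assumed KEY_COLS

# Duplicate (datetime, store, product) rows are summed.
wide = long_df.pivot_table(index="datetime", columns=key_cols, values="y", aggfunc="sum")
# Flatten the two-level column index into names such as 'a_x'.
wide.columns = ["_".join(map(str, col)) for col in wide.columns]
wide = wide.reset_index()
```

The result has the columns `datetime`, `a_x` and `b_x`; series `b_x` has no observation on 2024-01-02, so that cell is missing.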
- mango_time_series.utils.processing_time_series.series_as_rows(df: pandas.DataFrame, series_conf: dict) → pandas.DataFrame
Pivot time series data to long format with series as rows.
Transforms wide-format time series data into long format where each time series becomes a separate row. Splits column names to reconstruct the original key columns.
- Parameters:
df (pandas.DataFrame) – DataFrame in wide format with series as columns
series_conf (dict) – Configuration dictionary containing KEY_COLS and VALUE_COL
- Returns:
DataFrame in long format with series as rows
- Return type:
pandas.DataFrame
- Note:
Uses melt to transform wide to long format
Splits column names by underscore to reconstruct key columns
Arranges columns in order: datetime, key_cols, value_col
Removes temporary ‘id’ column after processing
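The reverse transformation can be sketched with `melt`. Column names and data are illustrative assumptions. Note that splitting on underscores only works when the key values themselves contain no underscore.

```python
import pandas as pd

wide = pd.DataFrame(
    {
        "datetime": pd.to_datetime(["2024-01-01", "2024-01-02"]),
        "a_x": [3.0, 4.0],
        "b_x": [5.0, 6.0],
    }
)
key_cols = ["store", "product"]  # assumed KEY_COLS

# One row per (datetime, series); the series name lands in a temporary 'id' column.
long_df = wide.melt(id_vars="datetime", var_name="id", value_name="y")
# Rebuild the key columns from the flattened name, e.g. 'a_x' -> ('a', 'x').
long_df[key_cols] = long_df["id"].str.split("_", expand=True)
long_df = long_df.drop(columns="id")[["datetime"] + key_cols + ["y"]]
```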
- mango_time_series.utils.processing_time_series.process_time_series(df: pandas.DataFrame | polars.DataFrame | polars.LazyFrame, series_conf: Dict) → polars.LazyFrame
Process time series data through complete preprocessing pipeline.
Applies a comprehensive preprocessing pipeline including column standardization, negative value removal, aggregation, dense data creation, and COVID marking. Converts input to Polars LazyFrame for efficient processing.
- Parameters:
df (Union[pandas.DataFrame, polars.DataFrame, polars.LazyFrame]) – Input DataFrame (pandas, Polars, or LazyFrame)
series_conf (dict) – Configuration dictionary containing processing parameters
- Returns:
Processed LazyFrame with standardized time series data
- Return type:
polars.LazyFrame
- Note:
Converts input to LazyFrame for processing
Standardizes column names (TIME_COL -> datetime, VALUE_COL -> y)
Removes negative values from target variable
Aggregates to daily frequency, then to specified frequency
Creates dense data with complete date ranges
Adds COVID period indicator column
- mango_time_series.utils.processing_time_series.create_dense_data(df: pandas.DataFrame, id_cols: List[str], freq: str, min_max_by_id: bool = None, date_init: str = None, date_end: str = None, time_col: str = 'timeslot') → pandas.DataFrame
Create dense time series data with complete date ranges.
Expands sparse time series data to include all dates within the specified frequency, filling missing dates with NaN values. Can use global date range or individual date ranges per ID group.
- Parameters:
df (pandas.DataFrame) – DataFrame to expand with complete date ranges
id_cols (list[str]) – List of columns that identify unique time series
freq (str) – Frequency string for date range generation (e.g., ‘D’, ‘W’, ‘M’)
min_max_by_id (bool, optional) – Whether to use individual min/max dates per ID (default: None)
date_init (str, optional) – Override start date for all series (default: None)
date_end (str, optional) – Override end date for all series (default: None)
time_col (str) – Name of the time column (default: “timeslot”)
- Returns:
DataFrame with complete date ranges and original data merged
- Return type:
pandas.DataFrame
- Note:
Creates Cartesian product of all dates and ID combinations
Fills missing dates with NaN values
Requires pandas to be installed
Uses cross join to create complete date grid
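The cross-join approach can be sketched in a few lines of pandas. The helper name `dense_sketch` is hypothetical and only the global date range case is covered; the `min_max_by_id`, `date_init` and `date_end` options are not reproduced here.

```python
import pandas as pd


def dense_sketch(df, id_cols, freq, time_col="timeslot"):
    """Hypothetical helper: one row per (id, date) over the global date range."""
    dates = pd.DataFrame(
        {time_col: pd.date_range(df[time_col].min(), df[time_col].max(), freq=freq)}
    )
    ids = df[id_cols].drop_duplicates()
    # Cartesian product of every id combination with every date.
    grid = ids.merge(dates, how="cross")
    # Left join keeps the full grid; dates without data get NaN.
    return grid.merge(df, on=id_cols + [time_col], how="left")


sparse = pd.DataFrame(
    {
        "store": ["a", "a", "b"],
        "timeslot": pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-02"]),
        "y": [1.0, 2.0, 3.0],
    }
)
dense = dense_sketch(sparse, id_cols=["store"], freq="D")
```

Two stores over three days give six rows, three of which have a missing `y`.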
- mango_time_series.utils.processing_time_series.create_dense_data_pl(df: polars.LazyFrame, id_cols: List[str], freq: str, min_max_by_id: bool = None, date_init: str = None, date_end: str = None, time_col: str = 'timeslot') → polars.LazyFrame
Create dense time series data with complete date ranges using Polars.
Expands sparse time series data to include all dates within the specified frequency, filling missing dates with null values. Uses Polars for efficient processing with lazy evaluation.
- Parameters:
df (polars.LazyFrame) – LazyFrame to expand with complete date ranges
id_cols (list[str]) – List of columns that identify unique time series
freq (str) – Frequency string for date range generation (e.g., ‘D’, ‘W’, ‘M’)
min_max_by_id (bool, optional) – Whether to use individual min/max dates per ID (default: None)
date_init (str, optional) – Override start date for all series (default: None)
date_end (str, optional) – Override end date for all series (default: None)
time_col (str) – Name of the time column (default: “timeslot”)
- Returns:
LazyFrame with complete date ranges and original data merged
- Return type:
polars.LazyFrame
- Note:
Collects LazyFrame for date range calculations
Creates cross join of all dates and ID combinations
Converts time column to datetime format
Returns LazyFrame for efficient processing
- mango_time_series.utils.processing_time_series.create_dense_data_pllazy(df: polars.LazyFrame, id_cols, freq: str, min_max_by_id: bool = None, date_init=None, date_end=None, time_col: str = 'timeslot') → polars.LazyFrame
Create a dense dataframe with frequency freq, over a given date range or one inherited from the dataframe, using id_cols as keys.
- Parameters:
df – dataframe to be expanded
id_cols – list of columns to be used as keys
freq – frequency of the new dataframe
min_max_by_id – boolean to indicate if the range of dates is the min and max of the dataframe by id
date_init – if it has a value, all initial dates will be set to this value
date_end – if it has a value, all final dates will be set to this value
time_col – string with name of the column with the time information
- Returns:
dataframe with all the dates using the frequency freq
- mango_time_series.utils.processing_time_series.create_recurrent_dataset(data: numpy.ndarray, look_back: int, include_output_lags: bool = False, lags: List[int] = None, output_last: bool = True) → tuple[numpy.ndarray, numpy.ndarray]
Create dataset for recurrent neural networks with time series sequences.
Transforms 2D time series data into 3D sequences suitable for RNN training. Creates sliding windows of historical data as input and corresponding future values as targets. Supports optional output lag features and flexible target column positioning.
- Parameters:
data (numpy.ndarray) – 2D array with shape (num_samples, num_features)
look_back (int) – Number of previous time steps to use as input
include_output_lags (bool) – Whether to include output lag features (default: False)
lags (list[int], optional) – List of lag periods to include as features (default: None)
output_last (bool) – Whether target is last column (True) or first column (False)
- Returns:
Tuple containing (input_sequences, target_values)
- Return type:
tuple[numpy.ndarray, numpy.ndarray]
- Note:
Input shape: (num_samples, look_back, num_features)
Output shape: (num_samples,)
Supports both positive and negative lags
Handles output column positioning (first vs last)
Creates sequences with proper temporal alignment
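The sliding-window idea can be illustrated with a small NumPy sketch. The helper name `windows_sketch` is hypothetical, and two choices are assumptions of this sketch rather than documented behaviour: each window holds only the non-target columns, and the target for a window is the value on the step immediately after it. Output lag features are not covered.

```python
import numpy as np


def windows_sketch(data: np.ndarray, look_back: int, output_last: bool = True):
    """Hypothetical helper: build (windows, targets) from a 2D array."""
    target_idx = -1 if output_last else 0
    # Assumption: the target column is excluded from the input features.
    features = np.delete(data, target_idx, axis=1)
    n_windows = data.shape[0] - look_back
    x = np.stack([features[i : i + look_back] for i in range(n_windows)])
    # Assumption: each target is the step right after its window.
    y = data[look_back:, target_idx]
    return x, y


series = np.arange(12).reshape(6, 2)  # 6 time steps, 1 feature + 1 target
x, y = windows_sketch(series, look_back=3)
# x.shape == (3, 3, 1) and y.shape == (3,)
```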
- mango_time_series.utils.processing_time_series.get_corr_matrix(df: pandas.DataFrame, n_top: int = None, threshold: float = None, date_col: str = None, years_corr: List[int] = None, subset: List[str] = None) → Dict[str, Dict[str, float]]
Calculate correlation matrix for time series data with filtering options.
Computes Pearson correlation matrix for time series data with various filtering and selection options. Automatically detects date columns and validates data format. Returns top correlations or correlations above threshold for each time series.
- Parameters:
df (pandas.DataFrame) – DataFrame containing time series data
n_top (int, optional) – Number of top correlations to return per series (default: None)
threshold (float, optional) – Minimum correlation threshold to filter results (default: None)
date_col (str, optional) – Name of the date column (auto-detected if None)
years_corr (list[int], optional) – List of years to filter data for correlation calculation (default: None)
subset (list[str], optional) – List of series names to focus correlation analysis on (default: None)
- Returns:
Dictionary with series names as keys and correlation dictionaries as values
- Return type:
dict[str, dict[str, float]]
- Note:
Automatically detects datetime columns or uses index
Validates data format and raises errors for inconsistencies
Sets diagonal correlations to -100 to avoid self-correlation
Returns all correlations if both n_top and threshold are None
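The filtering logic can be sketched as follows. The helper name `top_correlations` is hypothetical, and the input is assumed to be a pivoted frame with one numeric column per series. The way `n_top` and `threshold` combine when both are given is also an assumption. Date detection, year filtering and `subset` are left out.

```python
import numpy as np
import pandas as pd


def top_correlations(df, n_top=None, threshold=None):
    """Hypothetical helper: per-series dict of the strongest Pearson correlations."""
    corr = df.corr(method="pearson")
    values = corr.to_numpy(copy=True)
    # Sentinel on the diagonal so a series never ranks as its own best match.
    np.fill_diagonal(values, -100.0)
    corr = pd.DataFrame(values, index=corr.index, columns=corr.columns)
    result = {}
    for name in corr.columns:
        ranked = corr[name].drop(name).sort_values(ascending=False)
        if threshold is not None:
            ranked = ranked[ranked >= threshold]
        if n_top is not None:
            ranked = ranked.head(n_top)
        result[name] = ranked.to_dict()
    return result


pivoted = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0, 4.0], "b": [2.0, 4.0, 6.0, 8.0], "c": [4.0, 3.0, 2.0, 1.0]}
)
best = top_correlations(pivoted, n_top=1)
```

In this toy data `b` is a scaled copy of `a`, so `best["a"]` contains only `b`.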
- mango_time_series.utils.processing_time_series.get_date_col_candidate(df: pandas.DataFrame) → tuple[List[str] | None, bool]
Identify datetime columns in DataFrame for time series analysis.
Searches for datetime columns in the DataFrame and determines whether the index contains datetime information. Returns the datetime column names and a boolean indicating if the index is datetime-based.
- Parameters:
df (pandas.DataFrame) – DataFrame to analyze for datetime columns
- Returns:
Tuple containing (datetime_columns, index_is_datetime)
- Return type:
tuple[list[str] or None, bool]
- Note:
Returns None for datetime_columns if no datetime columns found
Returns True for index_is_datetime if index is DatetimeIndex
Searches all columns for datetime64 dtypes
Used by correlation functions for automatic date detection
- mango_time_series.utils.processing_time_series.raise_if_inconsistency(df: pandas.DataFrame, date_col: str, as_index: bool) → None
Validate DataFrame format for time series correlation analysis.
Performs comprehensive validation of DataFrame structure to ensure it’s suitable for correlation analysis. Checks for datetime columns, duplicate indices, numeric columns, and proper pivoted format.
- Parameters:
df (pandas.DataFrame) – DataFrame to validate
date_col (str) – Name of the date column (or None if using index)
as_index (bool) – Whether the DataFrame uses datetime index
- Returns:
None (raises ValueError if validation fails)
- Return type:
None
- Raises:
ValueError: If DataFrame format is inconsistent or invalid
- Note:
Validates presence of exactly one datetime column
Checks for duplicate indices in time series data
Ensures all non-datetime columns are numeric
Provides example format in error messages
Used by correlation functions for data validation
- mango_time_series.utils.processing_time_series.get_corr_matrix_aux(df: pandas.DataFrame, years_corr: List[int] = None, n_top: int = None, threshold: float = None, subset: List[str] = None) → Dict[str, Dict[str, float]]
Compute correlation matrix with filtering and selection options.
Calculates Pearson correlation matrix for time series data and returns filtered results based on various criteria. Supports year filtering, top N correlations, threshold filtering, and subset analysis.
- Parameters:
df (pandas.DataFrame) – DataFrame with datetime index and numeric columns
years_corr (list[int], optional) – List of years to filter data for correlation calculation (default: None)
n_top (int, optional) – Number of top correlations to return per series (default: None)
threshold (float, optional) – Minimum correlation threshold to filter results (default: None)
subset (list[str], optional) – List of series names to focus correlation analysis on (default: None)
- Returns:
Dictionary with series names as keys and correlation dictionaries as values
- Return type:
dict[str, dict[str, float]]
- Note:
Filters data by specified years before correlation calculation
Sets diagonal correlations to -100 to avoid self-correlation
Supports subset analysis for focused correlation studies
Returns all correlations if both n_top and threshold are None
Issues warnings for edge cases (no threshold matches, etc.)
Tabular structure
- mango_time_series.utils.tabular_structure.create_tabular_structure(df: pandas.DataFrame, horizon: int, SERIES_CONF: dict) → pandas.DataFrame
Create tabular structure for time series forecasting with multiple horizons.
Transforms time series data into a tabular format suitable for machine learning by creating all possible combinations of forecast origins and horizons. Each row represents a forecast point with its corresponding horizon.
- Parameters:
df (pandas.DataFrame) – DataFrame containing time series data with datetime column
horizon (int) – Maximum forecast horizon to create (e.g., 7 for 7-day ahead forecasts)
SERIES_CONF (dict) – Configuration dictionary containing KEY_COLS and TIME_PERIOD settings
- Returns:
DataFrame with tabular structure including horizon and forecast_origin columns
- Return type:
pandas.DataFrame
- Note:
Creates Cartesian product of original data with horizon range (1 to horizon)
Calculates forecast_origin by subtracting horizon from datetime
Handles different time periods (months vs other units)
Sorts result by key columns and datetime
Each row represents one forecast point for one horizon
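The horizon expansion can be sketched with a pandas cross join. The helper name `tabular_sketch` is hypothetical and the sketch assumes daily data, so the forecast origin is simply `horizon` days before `datetime`. Monthly periods would need calendar-aware offsets, which are not shown.

```python
import pandas as pd


def tabular_sketch(df: pd.DataFrame, horizon: int) -> pd.DataFrame:
    """Hypothetical helper: one row per (observation, horizon) for daily data."""
    horizons = pd.DataFrame({"horizon": range(1, horizon + 1)})
    # Cartesian product: every observation is paired with every horizon.
    out = df.merge(horizons, how="cross")
    # Daily assumption: the forecast was issued `horizon` days earlier.
    out["forecast_origin"] = out["datetime"] - pd.to_timedelta(out["horizon"], unit="D")
    return out


obs = pd.DataFrame(
    {"datetime": pd.to_datetime(["2024-01-10", "2024-01-11"]), "y": [1.0, 2.0]}
)
tabular = tabular_sketch(obs, horizon=3)
```

Two observations and a horizon of 3 produce six rows; the observation on 2024-01-10 appears with forecast origins 2024-01-09, 2024-01-08 and 2024-01-07.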
- mango_time_series.utils.tabular_structure.create_tabular_structure_pl(df: polars.LazyFrame, horizon: int, SERIES_CONF: dict) → polars.LazyFrame
Create tabular structure for time series forecasting using Polars.
Transforms time series data into a tabular format suitable for machine learning using Polars LazyFrame for efficient processing. Creates all possible combinations of forecast origins and horizons with optimized operations.
- Parameters:
df (polars.LazyFrame) – LazyFrame containing time series data with datetime column
horizon (int) – Maximum forecast horizon to create (e.g., 7 for 7-day ahead forecasts)
SERIES_CONF (dict) – Configuration dictionary containing TIME_PERIOD settings
- Returns:
LazyFrame with tabular structure including horizon and forecast_origin columns
- Return type:
polars.LazyFrame
- Note:
Uses Polars cross join for efficient Cartesian product creation
Extracts time unit from TIME_PERIOD configuration
Calculates forecast_origin using Polars datetime offset operations
Maintains lazy evaluation for memory efficiency
Each row represents one forecast point for one horizon
Time features
- mango_time_series.utils.time_features.create_time_features(df: polars.LazyFrame, SERIES_CONF: dict) → polars.LazyFrame
Create time-based features from datetime column based on time period description.
Extracts relevant time features from the datetime column depending on the time period description in the series configuration. Creates different sets of features for monthly, daily, and weekly time series.
- Parameters:
df (polars.LazyFrame) – LazyFrame containing time series data with datetime column
SERIES_CONF (dict) – Configuration dictionary containing TIME_PERIOD_DESCR
- Returns:
LazyFrame with additional time feature columns
- Return type:
polars.LazyFrame
- Note:
For monthly data: adds ‘month’ and ‘year’ columns
For daily data: adds ‘day’, ‘month’, ‘year’, and ‘weekday’ columns
For weekly data: adds ‘week’ and ‘year’ columns
Features are extracted using Polars datetime methods
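The feature sets listed above can be pictured with a pandas analogue of the Polars function. The helper name `time_features_sketch` and the period labels `"month"`, `"day"` and `"week"` are hypothetical, since the actual values accepted in `TIME_PERIOD_DESCR` are not documented here. Weekday numbering also differs between libraries: pandas counts Monday as 0, while Polars counts Monday as 1.

```python
import pandas as pd


def time_features_sketch(df: pd.DataFrame, period_descr: str) -> pd.DataFrame:
    """Hypothetical pandas analogue: add calendar features for a given period."""
    out = df.copy()
    dt = out["datetime"].dt
    if period_descr == "month":
        out["month"], out["year"] = dt.month, dt.year
    elif period_descr == "day":
        out["day"], out["month"], out["year"] = dt.day, dt.month, dt.year
        out["weekday"] = dt.weekday  # pandas convention: Monday = 0
    elif period_descr == "week":
        out["week"] = dt.isocalendar().week.astype("int64")  # ISO week number
        out["year"] = dt.year
    return out


frame = pd.DataFrame({"datetime": pd.to_datetime(["2024-03-15"])})
daily_features = time_features_sketch(frame, "day")
weekly_features = time_features_sketch(frame, "week")
```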
- mango_time_series.utils.time_features.month_as_bspline(df: polars.DataFrame) → pandas.DataFrame
Transform monthly seasonality into B-spline features for machine learning.
Converts monthly seasonal patterns into smooth B-spline features that can be used effectively in machine learning models. Creates day-of-year features and applies B-spline transformation with periodic extrapolation.
- Parameters:
df (polars.DataFrame) – DataFrame containing time series data with datetime column
- Returns:
DataFrame with original data plus B-spline features
- Return type:
pandas.DataFrame
- Note:
Converts Polars DataFrame to pandas for sklearn compatibility
Creates day_of_year feature from datetime column
Applies B-spline transformation with 12 knots (monthly period)
Uses periodic extrapolation for smooth seasonal transitions
Removes intermediate day_of_year column after transformation
- mango_time_series.utils.time_features.custom_weights(index: pandas.DatetimeIndex) → numpy.ndarray
Create custom weights for time series data with specific period exclusion.
Generates a weight array where values are set to 0 for a specific date range and 1 for all other dates. This is useful for excluding certain periods from model training or evaluation.
- Parameters:
index (pandas.DatetimeIndex) – DatetimeIndex containing the dates to weight
- Returns:
Array of weights (0 or 1) corresponding to each date
- Return type:
numpy.ndarray
- Note:
Sets weight to 0 for dates between 2012-06-01 and 2012-10-21
Sets weight to 1 for all other dates
Useful for excluding specific periods from analysis
Returns numpy array for efficient computation
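The weighting rule can be sketched as follows. The helper name `custom_weights_sketch` is hypothetical, and treating both boundary dates as part of the excluded range is an assumption.

```python
import numpy as np
import pandas as pd


def custom_weights_sketch(index: pd.DatetimeIndex) -> np.ndarray:
    """Hypothetical helper: weight 0 inside the excluded range, 1 elsewhere."""
    start, end = pd.Timestamp("2012-06-01"), pd.Timestamp("2012-10-21")
    excluded = (index >= start) & (index <= end)  # inclusive bounds assumed
    return np.where(excluded, 0, 1)


dates = pd.to_datetime(["2012-05-31", "2012-06-01", "2012-10-21", "2012-10-22"])
weights = custom_weights_sketch(dates)
```

Such an array can be passed as sample weights to estimators that accept them, so the excluded period does not influence the fit.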