Time series

Note

This documentation is still under development. If you find a bug or have a suggestion for the time series module, please open an issue in the GitHub repository.

Decomposition

class mango_time_series.exploratory_analysis.decomposition.SeasonalityDecompose(fs_threshold: float = 0.64)

Class for time series decomposition and seasonality analysis.

Provides methods for decomposing time series into trend, seasonal, and residual components using STL (Seasonal-Trend decomposition using LOESS) and MSTL (Multiple Seasonal-Trend decomposition using LOESS) methods. Also includes functionality for detecting heteroscedasticity and measuring seasonal strength.

static decompose_stl(series: pandas.Series, period: int)

Decompose time series using STL (Seasonal-Trend decomposition using LOESS).

Performs seasonal-trend decomposition using LOESS smoothing. Automatically detects heteroscedasticity and applies a Box-Cox transformation if needed. Uses multiplicative decomposition for heteroscedastic series and additive decomposition otherwise.

Parameters:
  • series (pandas.Series) – Time series data to decompose

  • period (int) – Seasonal period (e.g., 12 for monthly data with yearly seasonality)

Returns:

Tuple containing (trend, seasonal, residual) components

Return type:

tuple[pandas.Series, pandas.Series, pandas.Series]

static decompose_mstl(series: pandas.Series, periods: list) → Tuple[pandas.Series, pandas.Series, pandas.Series]

Decompose time series using MSTL (Multiple Seasonal-Trend decomposition using LOESS).

Performs decomposition with multiple seasonal components simultaneously. Automatically handles Box-Cox transformation for series with positive values and uses standard decomposition for series with non-positive values.

Parameters:
  • series (pandas.Series) – Time series data to decompose

  • periods (list[int]) – List of seasonal periods to decompose (e.g., [24, 168] for hourly data with daily and weekly seasonality)

Returns:

Tuple containing (trend, seasonal, residual) components

Return type:

tuple[pandas.Series, pandas.Series, pandas.Series]

static calculate_seasonal_strength(seasonal: numpy.ndarray, resid: numpy.ndarray) → float

Calculate the seasonal strength (Fs) based on decomposition components.

Measures the strength of seasonality in the time series using the formula: Fs = max(0, 1 - Var(Rt) / Var(St + Rt))

where:
  • Rt is the residual component

  • St is the seasonal component

  • Var() represents variance

Values closer to 1 indicate stronger seasonality, while values closer to 0 indicate weaker or no seasonality.

Parameters:
  • seasonal (numpy.ndarray) – Seasonal component from time series decomposition

  • resid (numpy.ndarray) – Residual component from time series decomposition

Returns:

Seasonal strength value between 0 and 1

Return type:

float
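The formula above can be sketched in a few lines. This is an illustration using only the standard library, not the library's implementation; the function name seasonal_strength, the sample data, and the zero-variance guard are assumptions made for the example.

```python
import math
from statistics import pvariance


def seasonal_strength(seasonal, resid):
    """Fs = max(0, 1 - Var(Rt) / Var(St + Rt))."""
    combined = [s + r for s, r in zip(seasonal, resid)]
    var_combined = pvariance(combined)
    if var_combined == 0:
        # Assumption: a constant detrended series is treated as non-seasonal.
        return 0.0
    return max(0.0, 1.0 - pvariance(resid) / var_combined)


# Strong yearly pattern in monthly data with a small alternating residual.
seasonal = [10 * math.sin(2 * math.pi * t / 12) for t in range(48)]
resid = [0.5 if t % 2 == 0 else -0.5 for t in range(48)]
strong = seasonal_strength(seasonal, resid)

# No seasonal component at all: the strength collapses to 0.
weak = seasonal_strength([0.0] * 48, resid)
```

With the default fs_threshold of 0.64, the first series would count as seasonal and the second would not.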

detect_seasonality(series: pandas.Series, period: int) → bool

Detect if the time series has significant seasonality.

Performs STL decomposition and calculates seasonal strength to determine if the series exhibits significant seasonality based on the configured threshold (fs_threshold).

Parameters:
  • series (pandas.Series) – Time series data to analyze for seasonality

  • period (int) – Seasonal period to test (e.g., 12 for monthly data with yearly seasonality)

Returns:

True if seasonal strength exceeds the threshold, False otherwise

Return type:

bool

Example:
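>>> from mango_time_series.exploratory_analysis.decomposition import SeasonalityDecompose
>>> # monthly_data: a pandas.Series of monthly observations
>>> decomposer = SeasonalityDecompose(fs_threshold=0.5)
>>> has_seasonality = decomposer.detect_seasonality(monthly_data, period=12)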

Differentiation

mango_time_series.exploratory_analysis.differentiation.differentiate_target(df, group_cols, lag) → polars.DataFrame

Differentiate the target variable by applying lag-based differencing.

Differences the series by subtracting the value at the specified lag from the current value. This helps make a non-stationary time series stationary by removing trend and seasonality. The original target values are preserved in the ‘y_orig’ and ‘y_orig_lagged’ columns for reference.

Parameters:
  • df (polars.DataFrame) – Input DataFrame containing time series data

  • group_cols (list[str]) – List of column names to group by for differentiation

  • lag (int) – Number of periods to lag for differentiation

Returns:

DataFrame with differentiated target variable and original values preserved

Return type:

polars.DataFrame

Note:
  • The DataFrame is sorted by ‘datetime’ column before processing

  • Rows with null values in the differentiated target are removed

  • Original target values are preserved in ‘y_orig’ and ‘y_orig_lagged’ columns

Example:
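>>> import polars as pl
>>> from mango_time_series.exploratory_analysis.differentiation import differentiate_target
>>> df = pl.DataFrame({
...     "datetime": ["2023-01-01", "2023-01-02", "2023-01-03"],
...     "y": [100, 110, 120],
...     "group": ["A", "A", "A"]
... })
>>> result = differentiate_target(df, ["group"], lag=1)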
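To make the arithmetic concrete, the sketch below reproduces lag-based differencing on plain Python dictionaries instead of a Polars DataFrame. The helper difference_by_group is hypothetical, and the assumption that the differenced value replaces ‘y’ is inferred from the description above rather than taken from the library source.

```python
from collections import defaultdict


def difference_by_group(rows, lag):
    """Lag-difference 'y' within each group, keeping the original values.

    rows: list of dicts with 'datetime', 'group' and 'y' keys.
    """
    by_group = defaultdict(list)
    for row in sorted(rows, key=lambda r: r["datetime"]):
        by_group[row["group"]].append(row)

    result = []
    for group_rows in by_group.values():
        # The first `lag` rows of a group have no lagged value, so they are dropped.
        for prev, curr in zip(group_rows, group_rows[lag:]):
            result.append({
                "datetime": curr["datetime"],
                "group": curr["group"],
                "y": curr["y"] - prev["y"],  # assumption: 'y' holds the difference
                "y_orig": curr["y"],
                "y_orig_lagged": prev["y"],
            })
    return result


rows = [
    {"datetime": "2023-01-01", "group": "A", "y": 100},
    {"datetime": "2023-01-02", "group": "A", "y": 110},
    {"datetime": "2023-01-03", "group": "A", "y": 120},
]
diffed = difference_by_group(rows, lag=1)
```

For the three-row input, two rows remain: each has a difference of 10, with ‘y_orig’ holding 110 and 120 and ‘y_orig_lagged’ holding 100 and 110.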

Heteroscedasticity

mango_time_series.exploratory_analysis.heteroscedasticity.get_optimal_lambda(series: numpy.ndarray) → float

Calculate the optimal Box-Cox lambda parameter for transformation.

Uses the boxcox_normmax function (available in scipy.stats) to find the lambda value that maximizes the normality of the transformed data. Automatically handles negative values by shifting the series to ensure all values are positive before transformation.

Parameters:

series (numpy.ndarray) – Time series data to find optimal lambda for

Returns:

Optimal lambda value for Box-Cox transformation

Return type:

float

Note:

If the series contains negative values, it is automatically shifted to ensure all values are positive before calculating lambda.

mango_time_series.exploratory_analysis.heteroscedasticity.apply_boxcox_with_lambda(series: numpy.ndarray, lambda_value: float) → numpy.ndarray

Apply Box-Cox transformation using a specified lambda value.

Transforms the time series data using the Box-Cox power transformation with the provided lambda parameter. Automatically handles negative values by shifting the series to ensure all values are positive before transformation.

Parameters:
  • series (numpy.ndarray) – Time series data to transform

  • lambda_value (float) – Lambda parameter for Box-Cox transformation

Returns:

Transformed time series data

Return type:

numpy.ndarray

Note:

If the series contains negative values, it is automatically shifted to ensure all values are positive before applying the transformation.
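The transformation itself is short enough to write out. The sketch below applies the standard Box-Cox formula, (x^lambda - 1) / lambda, falling back to log(x) when lambda is 0. The function name boxcox_shifted and the size of the shift (moving the minimum to 1) are assumptions; the library may use a different offset.

```python
import math


def boxcox_shifted(values, lambda_value):
    """Box-Cox transform, shifting first when the data is not strictly positive."""
    minimum = min(values)
    # Assumption: shift so the smallest value becomes 1.
    shift = 1.0 - minimum if minimum <= 0 else 0.0
    shifted = [v + shift for v in values]
    if lambda_value == 0:
        return [math.log(v) for v in shifted]
    return [(v ** lambda_value - 1.0) / lambda_value for v in shifted]


positive = boxcox_shifted([1.0, 4.0, 9.0], 0.5)         # [0.0, 2.0, 4.0]
with_negatives = boxcox_shifted([-3.0, 0.0, 5.0], 1.0)  # shifted to [1, 4, 9] -> [0.0, 3.0, 8.0]
log_case = boxcox_shifted([1.0, math.e], 0)             # lambda = 0 is the natural log
```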

mango_time_series.exploratory_analysis.heteroscedasticity.detect_and_transform_heteroscedasticity(series: numpy.ndarray) → Tuple[numpy.ndarray, float]

Detect heteroscedasticity and apply Box-Cox transformation if needed.

Performs the Breusch-Pagan test to detect heteroscedasticity (non-constant variance) in the time series. If heteroscedasticity is detected (p-value < 0.05), applies Box-Cox transformation to stabilize the variance. Returns the original series if no transformation is needed or if the series contains non-positive values.

Parameters:

series (numpy.ndarray) – Time series data to analyze and potentially transform

Returns:

Tuple containing (transformed_series, lambda_value):
  • transformed_series: Original or transformed time series

  • lambda_value: Lambda used for transformation, or None if no transformation applied

Return type:

tuple[numpy.ndarray, float or None]

Raises:

ValueError: If the time series contains only one data point

Note:
  • Series with zeros or negative values are not transformed

  • Uses Breusch-Pagan test with significance level of 0.05

  • Logs the test results and transformation decisions
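For readers unfamiliar with the Breusch-Pagan test, the following sketch shows one way to compute it for a single series. It assumes the series is regressed on a linear time index and uses Koenker's studentized form (LM = n · R²); how the library builds its regression is not stated here, so treat the setup, the function names, and the synthetic data as illustrative only.

```python
import math
import random


def _r_squared(x, y):
    """R^2 of the simple regression of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return 0.0 if sxx == 0 or syy == 0 else sxy ** 2 / (sxx * syy)


def breusch_pagan_pvalue(series):
    """Studentized Breusch-Pagan p-value against a linear time index."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two data points")
    time = list(range(n))
    # Step 1: residuals from regressing the series on time.
    mean_t, mean_y = sum(time) / n, sum(series) / n
    sxx = sum((t - mean_t) ** 2 for t in time)
    slope = sum((t - mean_t) * (y - mean_y) for t, y in zip(time, series)) / sxx
    intercept = mean_y - slope * mean_t
    squared_resid = [(y - intercept - slope * t) ** 2 for t, y in zip(time, series)]
    # Step 2: LM = n * R^2 from regressing the squared residuals on time.
    lm = n * _r_squared(time, squared_resid)
    # Chi-square survival function with 1 degree of freedom.
    return math.erfc(math.sqrt(lm / 2.0))


rng = random.Random(0)
# Noise whose standard deviation grows with time (heteroscedastic).
growing = [rng.gauss(0, 1 + 0.05 * t) for t in range(400)]
# Noise with constant standard deviation (homoscedastic).
steady = [rng.gauss(0, 1) for _ in range(400)]

p_growing = breusch_pagan_pvalue(growing)
p_steady = breusch_pagan_pvalue(steady)
```

A p-value below 0.05, as expected for the series with growing variance, is the condition under which the function described above would apply the Box-Cox transformation.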

Seasonal

class mango_time_series.exploratory_analysis.seasonal.SeasonalityDetector(threshold_acf: float = 0.1, percentile_periodogram: float = 99)

Detector for identifying seasonal patterns in time series data.

Combines autocorrelation function (ACF) analysis and periodogram analysis to detect and validate seasonal patterns in time series. Uses configurable thresholds to determine significance of detected patterns.

static detect_significant_seasonality_acf(ts: numpy.ndarray, max_lag: int = 366, acf_threshold: float = 0.2, min_repetitions: int = 2) → int

Detect significant seasonality using autocorrelation function analysis.

Analyzes the time series using ACF to identify seasonal patterns by finding local maxima in autocorrelation values. Validates detected periods by ensuring sufficient repetitions at period multiples, indicating true seasonality.

Parameters:
  • ts (numpy.ndarray) – Time series data to analyze

  • max_lag (int) – Maximum lag for ACF analysis (default: 366 for yearly seasonality)

  • acf_threshold (float) – ACF threshold for significant peaks (default: 0.2)

  • min_repetitions (int) – Minimum significant multiples for valid seasonality (default: 2)

Returns:

Most significant seasonal period, or 0 if none detected

Return type:

int

Note:
  • Identifies local maxima in ACF values as potential seasonal periods

  • Filters peaks above the ACF threshold and confidence intervals

  • Validates periods by checking for significant ACF values at multiples

  • Returns 0 if no valid seasonality pattern is found
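The steps listed in the note can be sketched with NumPy as follows. This is a simplified reading of the description, not the library's code: the function names, the cap of max_lag at half the series length, and the use of a 1.96/sqrt(n) confidence band are all assumptions.

```python
import numpy as np


def acf(ts, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    centered = np.asarray(ts, dtype=float) - np.mean(ts)
    denom = np.sum(centered ** 2)
    if denom == 0:
        return np.zeros(max_lag + 1)
    n = len(centered)
    return np.array([
        np.sum(centered[: n - k] * centered[k:]) / denom
        for k in range(max_lag + 1)
    ])


def detect_period_acf(ts, max_lag=366, acf_threshold=0.2, min_repetitions=2):
    """Return the lag of the strongest validated ACF peak, or 0 if none."""
    max_lag = min(max_lag, len(ts) // 2)  # assumption: cap at half the length
    values = acf(ts, max_lag)
    conf = 1.96 / np.sqrt(len(ts))  # approximate 95% band for white noise
    cutoff = max(acf_threshold, conf)
    best_period, best_value = 0, 0.0
    for lag in range(2, max_lag):
        is_peak = values[lag] > values[lag - 1] and values[lag] > values[lag + 1]
        if not is_peak or values[lag] <= cutoff:
            continue
        # A true seasonal period should also show up at its multiples.
        multiples = range(2 * lag, max_lag + 1, lag)
        repetitions = sum(values[m] > cutoff for m in multiples)
        if repetitions >= min_repetitions and values[lag] > best_value:
            best_period, best_value = lag, float(values[lag])
    return best_period


t = np.arange(240)
monthly_like = np.sin(2 * np.pi * t / 12)
period = detect_period_acf(monthly_like)

# A pure linear trend has a steadily decaying ACF with no local maxima.
no_period = detect_period_acf(np.arange(240.0))
```

For the sine wave the strongest validated peak is at lag 12; the linear trend yields 0.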

static detect_seasonality_periodogram(ts: numpy.ndarray, min_period: int = 2, max_period: int = 365) → tuple[list, numpy.ndarray, numpy.ndarray]

Detect seasonality using periodogram analysis.

Analyzes the power spectral density of the time series to identify significant periodic components. Filters periods within specified range, applies strict percentile thresholds, and refines peaks to avoid redundant multiples while keeping only near-integer periods.

Parameters:
  • ts (numpy.ndarray) – Time series data to analyze

  • min_period (int) – Minimum period to consider (default: 2)

  • max_period (int) – Maximum period to consider (default: 365)

Returns:

Tuple containing:
  • List of detected seasonal periods

  • Array of filtered periods

  • Array of filtered power spectrum values

Return type:

tuple[list, numpy.ndarray, numpy.ndarray]

Note:
  • Uses 99th percentile threshold for peak detection

  • Refines peaks by ensuring sufficient power difference (1.5x)

  • Removes redundant multiples of detected periods

  • Keeps only periods close to integers (tolerance: 0.05)
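A reduced version of this procedure, written with NumPy's FFT, is shown below. It covers the range filter, the percentile threshold, and the near-integer check, but deliberately leaves out the 1.5x power refinement and the removal of redundant multiples, whose exact rules are not given here. The function name and the percentile and tolerance parameters are assumptions for the example.

```python
import numpy as np


def detect_periods_periodogram(ts, min_period=2, max_period=365,
                               percentile=99.0, tolerance=0.05):
    """Return (detected integer periods, filtered periods, filtered power)."""
    ts = np.asarray(ts, dtype=float)
    n = len(ts)
    # Raw periodogram: squared FFT magnitude of the mean-centered series.
    power = np.abs(np.fft.rfft(ts - ts.mean())) ** 2 / n
    freqs = np.fft.rfftfreq(n)
    # Drop the zero frequency, then convert frequencies to periods.
    periods = 1.0 / freqs[1:]
    power = power[1:]
    in_range = (periods >= min_period) & (periods <= max_period)
    periods, power = periods[in_range], power[in_range]
    if power.size == 0:
        return [], periods, power
    # Keep only bins whose power exceeds the percentile threshold.
    threshold = np.percentile(power, percentile)
    candidates = periods[power > threshold]
    # Keep only periods that are close to an integer.
    near_integer = np.abs(candidates - np.round(candidates)) <= tolerance
    detected = sorted({int(round(p)) for p in candidates[near_integer]})
    return detected, periods, power


rng = np.random.default_rng(0)
t = np.arange(240)
noisy = np.sin(2 * np.pi * t / 12) + 0.3 * rng.normal(size=t.size)
detected, filtered_periods, filtered_power = detect_periods_periodogram(noisy)
```

Because a percentile threshold always lets a few bins through, this simplified version can report extra periods alongside the true one; the refinement steps in the note above exist to prune those.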

detect_seasonality(ts: numpy.ndarray, max_lag: int = 366) → list

Detect seasonality using combined ACF and periodogram analysis.

Implements a two-step approach for robust seasonality detection:
  1. Uses ACF analysis to confirm the presence of seasonality

  2. Uses periodogram analysis to identify specific seasonal periods

This combination prevents false positives from periodogram analysis while ensuring accurate identification of true seasonal patterns.

Parameters:
  • ts (numpy.ndarray) – Time series data to analyze

  • max_lag (int) – Maximum lag for ACF analysis (default: 366 for yearly seasonality)

Returns:

Sorted list of detected seasonal periods, empty if none found

Return type:

list

Note:
  • ACF analysis validates the presence of seasonality

  • Periodogram analysis identifies specific periods

  • Automatically adjusts max_lag based on time series length

  • Combines results from both methods, removing duplicates

Stationary

class mango_time_series.exploratory_analysis.stationary.StationaryTester(threshold: float = 0.05, fs_threshold: float = 0.64)

Tester for making time series stationary through differencing.

Implements a comprehensive approach to stationarity testing and transformation using ADF and KPSS tests combined with seasonal strength analysis. Applies regular and seasonal differencing iteratively until the series becomes stationary in both trend and seasonal components.

static test_adf(series: pandas.Series) → float

Test stationarity using the Augmented Dickey-Fuller test.

Performs the ADF test to determine if the time series is stationary. The null hypothesis is that the series has a unit root (non-stationary).

Parameters:

series (pandas.Series) – Time series data to test

Returns:

P-value of the ADF test

Return type:

float

Note:
  • p-value < 0.05 typically indicates stationarity

  • Lower p-values suggest stronger evidence against unit root

static test_kpss(series: pandas.Series) → float | None

Test stationarity using the KPSS test.

Performs the KPSS test to determine if the time series is stationary. The null hypothesis is that the series is stationary around a constant.

Parameters:

series (pandas.Series) – Time series data to test

Returns:

P-value of the KPSS test, or None if test fails

Return type:

float or None

Note:
  • p-value < 0.05 typically indicates non-stationarity

  • Handles InterpolationWarning and returns None on errors

make_stationary(df: polars.DataFrame, target_column: str, date_column: str) → Tuple[pandas.DataFrame, int, int]

Transform time series to make it stationary in trend and seasonal components.

Main function that implements a comprehensive approach to making time series stationary through iterative application of regular and seasonal differencing. Uses ADF and KPSS tests combined with seasonal strength analysis to determine the appropriate transformations.

Parameters:
  • df (polars.DataFrame) – Input Polars DataFrame containing time series data

  • target_column (str) – Name of the column containing time series values

  • date_column (str) – Name of the column containing dates

Returns:

Tuple containing (transformed_DataFrame, regular_differencing_steps, seasonal_differencing_steps)

Return type:

tuple[pandas.DataFrame, int, int]

Note:
  • Returns original data if already stationary

  • Applies regular differencing first, then seasonal differencing

  • Logs final differencing parameters for reference

  • Converts Polars DataFrame to Pandas for processing