Time series
Note
This documentation is still under development. If you find a bug or have a suggestion concerning the time series module, please open an issue in the GitHub repository.
Decomposition
- class mango_time_series.exploratory_analysis.decomposition.SeasonalityDecompose(fs_threshold: float = 0.64)
Class for time series decomposition and seasonality analysis.
Provides methods for decomposing time series into trend, seasonal, and residual components using STL (Seasonal-Trend decomposition using LOESS) and MSTL (Multiple Seasonal-Trend decomposition using LOESS) methods. Also includes functionality for detecting heteroscedasticity and measuring seasonal strength.
- static decompose_stl(series: pandas.Series, period: int)
Decompose time series using STL (Seasonal-Trend decomposition using LOESS).
Performs seasonal-trend decomposition using LOESS smoothing. Automatically detects heteroscedasticity and applies appropriate transformation (Box-Cox) if needed. Uses multiplicative decomposition for heteroscedastic series and additive decomposition otherwise.
- Parameters:
series (pandas.Series) – Time series data to decompose
period (int) – Seasonal period (e.g., 12 for monthly data with yearly seasonality)
- Returns:
Tuple containing (trend, seasonal, residual) components
- Return type:
tuple[pandas.Series, pandas.Series, pandas.Series]
- static decompose_mstl(series: pandas.Series, periods: list) → Tuple[pandas.Series, pandas.Series, pandas.Series]
Decompose time series using MSTL (Multiple Seasonal-Trend decomposition using LOESS).
Performs decomposition with multiple seasonal components simultaneously. Applies a Box-Cox transformation when all values in the series are positive; series containing zeros or negative values are decomposed without transformation.
- Parameters:
series (pandas.Series) – Time series data to decompose
periods (list[int]) – List of seasonal periods to decompose (e.g., [24, 168] for hourly data with daily and weekly seasonality)
- Returns:
Tuple containing (trend, seasonal, residual) components
- Return type:
tuple[pandas.Series, pandas.Series, pandas.Series]
- static calculate_seasonal_strength(seasonal: numpy.ndarray, resid: numpy.ndarray) → float
Calculate the seasonal strength (Fs) based on decomposition components.
Measures the strength of seasonality in the time series using the formula:
Fs = max(0, 1 - Var(Rt) / Var(St + Rt))
where:
- Rt is the residual component
- St is the seasonal component
- Var() represents variance
Values closer to 1 indicate stronger seasonality, while values closer to 0 indicate weaker or no seasonality.
- Parameters:
seasonal (numpy.ndarray) – Seasonal component from time series decomposition
resid (numpy.ndarray) – Residual component from time series decomposition
- Returns:
Seasonal strength value between 0 and 1
- Return type:
float
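The formula above can be sketched in a few lines of NumPy. This is an illustrative stand-in, not the library's code: the function name `seasonal_strength`, the zero-variance guard, and the synthetic components are assumptions made for the example.

```python
import numpy as np


def seasonal_strength(seasonal, resid):
    """Fs = max(0, 1 - Var(Rt) / Var(St + Rt)); hypothetical helper."""
    seasonal = np.asarray(seasonal, dtype=float)
    resid = np.asarray(resid, dtype=float)
    denom = np.var(seasonal + resid)
    if denom == 0:
        # Assumption: a flat detrended series is treated as non-seasonal.
        return 0.0
    return float(max(0.0, 1.0 - np.var(resid) / denom))


# Synthetic components: a strong 12-step cycle plus a small alternating residual.
t = np.arange(48)
seasonal = 10 * np.sin(2 * np.pi * t / 12)
resid = np.where(t % 2 == 0, 0.5, -0.5)

fs = seasonal_strength(seasonal, resid)
is_seasonal = fs > 0.64  # 0.64 is the documented default fs_threshold
```

With a residual this small relative to the seasonal swing, Fs lands close to 1, so the series would count as seasonal under the default threshold.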
- detect_seasonality(series: pandas.Series, period: int) → bool
Detect if the time series has significant seasonality.
Performs STL decomposition and calculates seasonal strength to determine if the series exhibits significant seasonality based on the configured threshold (fs_threshold).
- Parameters:
series (pandas.Series) – Time series data to analyze for seasonality
period (int) – Seasonal period to test (e.g., 12 for monthly data with yearly seasonality)
- Returns:
True if seasonal strength exceeds the threshold, False otherwise
- Return type:
bool
- Example:
>>> decomposer = SeasonalityDecompose(fs_threshold=0.5)
>>> has_seasonality = decomposer.detect_seasonality(monthly_data, period=12)
Differentiation
- mango_time_series.exploratory_analysis.differentiation.differentiate_target(df, group_cols, lag) → polars.DataFrame
Differentiate the target variable by applying lag-based differencing.
Performs time series differentiation by calculating the difference between the current value and the value at the specified lag. This is useful for making non-stationary time series stationary by removing trends and seasonality. The original target values are preserved as ‘y_orig’ and ‘y_orig_lagged’ columns for reference.
- Parameters:
df (polars.DataFrame) – Input DataFrame containing time series data
group_cols (list[str]) – List of column names to group by for differentiation
lag (int) – Number of periods to lag for differentiation
- Returns:
DataFrame with differentiated target variable and original values preserved
- Return type:
polars.DataFrame
- Note:
The DataFrame is sorted by ‘datetime’ column before processing
Rows with null values in the differentiated target are removed
Original target values are preserved in ‘y_orig’ and ‘y_orig_lagged’ columns
- Example:
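>>> import polars as pl
>>> from mango_time_series.exploratory_analysis.differentiation import differentiate_target
>>> df = pl.DataFrame({
...     "datetime": ["2023-01-01", "2023-01-02", "2023-01-03"],
...     "y": [100, 110, 120],
...     "group": ["A", "A", "A"]
... })
>>> result = differentiate_target(df, ["group"], lag=1)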
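To make the arithmetic concrete, here is a library-free sketch of lag differencing per group, using plain Python dictionaries instead of a Polars DataFrame. The helper `lag_difference` is hypothetical, and it assumes the differenced value overwrites 'y' while the originals go into 'y_orig' and 'y_orig_lagged'; the actual implementation may differ in detail.

```python
from collections import defaultdict


def lag_difference(rows, group_cols, lag):
    """Difference 'y' against its value `lag` rows earlier within each group.

    rows: list of dicts with 'datetime', 'y' and the grouping columns.
    Hypothetical helper that mirrors the documented behaviour.
    """
    groups = defaultdict(list)
    # Sort by 'datetime' first, as the Note above describes.
    for row in sorted(rows, key=lambda r: r["datetime"]):
        groups[tuple(row[c] for c in group_cols)].append(row)

    out = []
    for members in groups.values():
        # The first `lag` rows of each group have no lagged value and are dropped.
        for i in range(lag, len(members)):
            current, previous = members[i], members[i - lag]
            out.append({
                **current,
                "y_orig": current["y"],
                "y_orig_lagged": previous["y"],
                "y": current["y"] - previous["y"],
            })
    return out


rows = [
    {"datetime": "2023-01-01", "y": 100, "group": "A"},
    {"datetime": "2023-01-02", "y": 110, "group": "A"},
    {"datetime": "2023-01-03", "y": 120, "group": "A"},
]
result = lag_difference(rows, ["group"], lag=1)
```

For the three-row input, the first row has no predecessor and is dropped, leaving two rows whose differenced value is 10.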
Heteroscedasticity
- mango_time_series.exploratory_analysis.heteroscedasticity.get_optimal_lambda(series: numpy.ndarray) → float
Calculate the optimal Box-Cox lambda parameter for transformation.
Uses the boxcox_normmax function to find the lambda value that maximizes the normality of the transformed data. Automatically handles negative values by shifting the series to ensure all values are positive before transformation.
- Parameters:
series (numpy.ndarray) – Time series data to find optimal lambda for
- Returns:
Optimal lambda value for Box-Cox transformation
- Return type:
float
- Note:
If the series contains negative values, it is automatically shifted to ensure all values are positive before calculating lambda.
- mango_time_series.exploratory_analysis.heteroscedasticity.apply_boxcox_with_lambda(series: numpy.ndarray, lambda_value: float) → numpy.ndarray
Apply Box-Cox transformation using a specified lambda value.
Transforms the time series data using the Box-Cox power transformation with the provided lambda parameter. Automatically handles negative values by shifting the series to ensure all values are positive before transformation.
- Parameters:
series (numpy.ndarray) – Time series data to transform
lambda_value (float) – Lambda parameter for Box-Cox transformation
- Returns:
Transformed time series data
- Return type:
numpy.ndarray
- Note:
If the series contains negative values, it is automatically shifted to ensure all values are positive before applying the transformation.
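The Box-Cox power transformation with a fixed lambda can be written out as follows. This is a minimal sketch: the name `boxcox_with_shift` is hypothetical, and the size of the shift (moving the minimum to 1) is an assumption, since the documentation only states that the series is shifted until all values are positive.

```python
import numpy as np


def boxcox_with_shift(series, lambda_value):
    """Box-Cox transform with a positivity shift; hypothetical helper."""
    x = np.asarray(series, dtype=float)
    if x.min() <= 0:
        # Assumption: shift so that the smallest value becomes exactly 1.
        x = x + (1.0 - x.min())
    if lambda_value == 0:
        # Limit of (x**lam - 1) / lam as lam approaches 0.
        return np.log(x)
    return (x ** lambda_value - 1.0) / lambda_value


log_case = boxcox_with_shift([1.0, np.e], 0.0)           # natural log
linear_case = boxcox_with_shift([1.0, 2.0, 3.0], 1.0)    # x - 1
shifted_case = boxcox_with_shift([-2.0, 0.0, 2.0], 1.0)  # shifted to [1, 3, 5] first
```

Lambda 0 reduces to the natural logarithm and lambda 1 to a plain offset, which is why lambda values between 0 and 1 are commonly read as progressively milder variance compression.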
- mango_time_series.exploratory_analysis.heteroscedasticity.detect_and_transform_heteroscedasticity(series: numpy.ndarray) → Tuple[numpy.ndarray, float]
Detect heteroscedasticity and apply Box-Cox transformation if needed.
Performs the Breusch-Pagan test to detect heteroscedasticity (non-constant variance) in the time series. If heteroscedasticity is detected (p-value < 0.05), applies Box-Cox transformation to stabilize the variance. Returns the original series if no transformation is needed or if the series contains non-positive values.
- Parameters:
series (numpy.ndarray) – Time series data to analyze and potentially transform
- Returns:
Tuple containing (transformed_series, lambda_value):
- transformed_series: Original or transformed time series
- lambda_value: Lambda used for transformation, or None if no transformation applied
- Return type:
tuple[numpy.ndarray, float or None]
- Raises:
ValueError: If the time series contains only one data point
- Note:
Series with zeros or negative values are not transformed
Uses Breusch-Pagan test with significance level of 0.05
Logs the test results and transformation decisions
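For readers who want to see what the test computes, the sketch below implements the studentized (Koenker) form of the Breusch-Pagan statistic, LM = n·R², with NumPy only. Two things are assumptions here: the helper name `breusch_pagan_pvalue`, and the use of the time index as the single explanatory variable, which the documentation does not specify. With one regressor the statistic is compared against a chi-square distribution with one degree of freedom, whose survival function is erfc(sqrt(x / 2)).

```python
import math

import numpy as np


def breusch_pagan_pvalue(series):
    """P-value of a studentized Breusch-Pagan test against the time index."""
    y = np.asarray(series, dtype=float)
    n = y.size
    if n < 3:
        raise ValueError("need at least three observations")
    # Assumption: the regressors are a constant and the time index.
    X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])

    # Step 1: fit the mean model and square its residuals.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid_sq = (y - X @ beta) ** 2

    # Step 2: auxiliary regression of the squared residuals on the regressors.
    gamma, *_ = np.linalg.lstsq(X, resid_sq, rcond=None)
    ss_tot = np.sum((resid_sq - resid_sq.mean()) ** 2)
    if ss_tot == 0:
        return 1.0  # perfectly constant squared residuals
    r2 = 1.0 - np.sum((resid_sq - X @ gamma) ** 2) / ss_tot

    # Step 3: LM = n * R^2, chi-square with 1 degree of freedom.
    lm = n * max(r2, 0.0)
    return math.erfc(math.sqrt(lm / 2.0))


t = np.arange(200)
sign = np.where(t % 2 == 0, 1.0, -1.0)
p_growing = breusch_pagan_pvalue(t * sign)  # amplitude grows with time
p_constant = breusch_pagan_pvalue(sign)     # amplitude stays constant
```

The series whose amplitude grows with time falls below the 0.05 level, which per the description above would trigger the Box-Cox step when all values are positive; the constant-amplitude series does not.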
Seasonal
- class mango_time_series.exploratory_analysis.seasonal.SeasonalityDetector(threshold_acf: float = 0.1, percentile_periodogram: float = 99)
Detector for identifying seasonal patterns in time series data.
Combines autocorrelation function (ACF) analysis and periodogram analysis to detect and validate seasonal patterns in time series. Uses configurable thresholds to determine significance of detected patterns.
- static detect_significant_seasonality_acf(ts: numpy.ndarray, max_lag: int = 366, acf_threshold: float = 0.2, min_repetitions: int = 2) → int
Detect significant seasonality using autocorrelation function analysis.
Analyzes the time series using ACF to identify seasonal patterns by finding local maxima in autocorrelation values. Validates detected periods by ensuring sufficient repetitions at period multiples, indicating true seasonality.
- Parameters:
ts (numpy.ndarray) – Time series data to analyze
max_lag (int) – Maximum lag for ACF analysis (default: 366 for yearly seasonality)
acf_threshold (float) – ACF threshold for significant peaks (default: 0.2)
min_repetitions (int) – Minimum significant multiples for valid seasonality (default: 2)
- Returns:
Most significant seasonal period, or 0 if none detected
- Return type:
int
- Note:
Identifies local maxima in ACF values as potential seasonal periods
Filters peaks above the ACF threshold and confidence intervals
Validates periods by checking for significant ACF values at multiples
Returns 0 if no valid seasonality pattern is found
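The steps in the Note can be sketched with NumPy alone. This is a simplified illustration rather than the library's algorithm: the function names, the 1.96/sqrt(n) approximation of the confidence band, the cap of max_lag at half the series length, and the choice of the highest validated peak as "most significant" are all assumptions.

```python
import numpy as np


def autocorrelation(x, max_lag):
    """Sample ACF for lags 0..max_lag (biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    if denom == 0:
        return np.zeros(max_lag + 1)  # constant series: no correlation structure
    n = x.size
    return np.array([np.dot(x[: n - k], x[k:]) / denom for k in range(max_lag + 1)])


def detect_period_acf(ts, max_lag=366, acf_threshold=0.2, min_repetitions=2):
    """Return the lag of the strongest validated ACF peak, or 0."""
    ts = np.asarray(ts, dtype=float)
    max_lag = min(max_lag, ts.size // 2)  # assumption: cap at half the length
    r = autocorrelation(ts, max_lag)
    # Assumption: approximate 95% confidence band for white noise.
    bound = max(acf_threshold, 1.96 / np.sqrt(ts.size))

    candidates = []
    for k in range(2, max_lag):
        is_peak = r[k] > r[k - 1] and r[k] > r[k + 1]
        if not (is_peak and r[k] > bound):
            continue
        # Validate: enough multiples of the lag must also be significant.
        repeats = sum(1 for m in range(2 * k, max_lag + 1, k) if r[m] > bound)
        if repeats >= min_repetitions:
            candidates.append((r[k], k))
    if not candidates:
        return 0
    return int(max(candidates)[1])


t = np.arange(240)
period = detect_period_acf(np.sin(2 * np.pi * t / 12))
no_period = detect_period_acf(np.arange(100, dtype=float))
```

A clean 12-step sine produces ACF peaks at 12, 24, 36 and so on, and the first one is the highest. A pure linear trend has a monotonically decaying ACF over this lag range, so no peak is found and the sketch returns 0.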
- static detect_seasonality_periodogram(ts: numpy.ndarray, min_period: int = 2, max_period: int = 365) → tuple[list, numpy.ndarray, numpy.ndarray]
Detect seasonality using periodogram analysis.
Analyzes the power spectral density of the time series to identify significant periodic components. Filters periods within specified range, applies strict percentile thresholds, and refines peaks to avoid redundant multiples while keeping only near-integer periods.
- Parameters:
ts (numpy.ndarray) – Time series data to analyze
min_period (int) – Minimum period to consider (default: 2)
max_period (int) – Maximum period to consider (default: 365)
- Returns:
Tuple containing:
- List of detected seasonal periods
- Array of filtered periods
- Array of filtered power spectrum values
- Return type:
tuple[list, numpy.ndarray, numpy.ndarray]
- Note:
Uses 99th percentile threshold for peak detection
Refines peaks by ensuring sufficient power difference (1.5x)
Removes redundant multiples of detected periods
Keeps only periods close to integers (tolerance: 0.05)
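A reduced version of this procedure is sketched below using NumPy's FFT. It covers the period filter, the percentile threshold, the near-integer check, and a simple removal of multiples; the 1.5x power-difference refinement is deliberately left out. The function name and the exact handling of multiples are assumptions, and the sketch returns only the list of periods rather than the documented three-element tuple.

```python
import numpy as np


def detect_periods_periodogram(ts, min_period=2, max_period=365,
                               percentile=99.0, tol=0.05):
    """Return near-integer periods whose spectral power exceeds a percentile."""
    x = np.asarray(ts, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size)

    # Drop the zero frequency, then convert frequencies to periods.
    periods = 1.0 / freqs[1:]
    power = power[1:]
    mask = (periods >= min_period) & (periods <= max_period)
    periods, power = periods[mask], power[mask]
    if power.size == 0:
        return []

    threshold = np.percentile(power, percentile)
    detected = []
    for i in np.argsort(power)[::-1]:  # strongest peaks first
        if power[i] <= threshold:
            break
        nearest = int(round(periods[i]))
        if abs(periods[i] - nearest) > tol:
            continue  # keep only periods close to an integer
        # Assumption: skip periods that are multiples of a stronger one.
        if any(nearest % d == 0 for d in detected):
            continue
        detected.append(nearest)
    return sorted(detected)


t = np.arange(120)
found = detect_periods_periodogram(np.sin(2 * np.pi * t / 12))
```

One caveat of a pure percentile rule: on long series more than one bin always exceeds the 99th percentile, so numerically negligible bins can pass. That is presumably what the additional power-difference refinement in the Note guards against.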
- detect_seasonality(ts: numpy.ndarray, max_lag: int = 366) → list
Detect seasonality using combined ACF and periodogram analysis.
Implements a two-step approach for robust seasonality detection:
1. Uses ACF analysis to confirm the presence of seasonality
2. Uses periodogram analysis to identify specific seasonal periods
This combination prevents false positives from periodogram analysis while ensuring accurate identification of true seasonal patterns.
- Parameters:
ts (numpy.ndarray) – Time series data to analyze
max_lag (int) – Maximum lag for ACF analysis (default: 366 for yearly seasonality)
- Returns:
Sorted list of detected seasonal periods, empty if none found
- Return type:
list
- Note:
ACF analysis validates the presence of seasonality
Periodogram analysis identifies specific periods
Automatically adjusts max_lag based on time series length
Combines results from both methods, removing duplicates
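One plausible reading of how the two results are combined is sketched below. The function `combine_detections` is hypothetical, and both the gating rule and the union of periods are assumptions inferred from the Note, not confirmed behaviour.

```python
def combine_detections(acf_period, periodogram_periods):
    """Merge an ACF period with periodogram periods; hypothetical helper.

    acf_period: int, 0 meaning the ACF step found no seasonality.
    periodogram_periods: iterable of int periods from the periodogram step.
    """
    if acf_period == 0:
        # Assumption: without ACF confirmation, periodogram peaks are ignored.
        return []
    # Assumption: results are merged as a set union, then sorted.
    return sorted(set(periodogram_periods) | {acf_period})


rejected = combine_detections(0, [7, 12])
merged = combine_detections(12, [12, 7])
```

Gating on the ACF result is what keeps isolated periodogram peaks from being reported as seasonality on their own.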
Stationary
- class mango_time_series.exploratory_analysis.stationary.StationaryTester(threshold: float = 0.05, fs_threshold: float = 0.64)
Tester for making time series stationary through differencing.
Implements a comprehensive approach to stationarity testing and transformation using ADF and KPSS tests combined with seasonal strength analysis. Applies regular and seasonal differencing iteratively until the series becomes stationary in both trend and seasonal components.
- static test_adf(series: pandas.Series) → float
Test stationarity using the Augmented Dickey-Fuller test.
Performs the ADF test to determine if the time series is stationary. The null hypothesis is that the series has a unit root (non-stationary).
- Parameters:
series (pandas.Series) – Time series data to test
- Returns:
P-value of the ADF test
- Return type:
float
- Note:
p-value < 0.05 typically indicates stationarity
Lower p-values suggest stronger evidence against unit root
- static test_kpss(series: pandas.Series) → float | None
Test stationarity using the KPSS test.
Performs the KPSS test to determine if the time series is stationary. The null hypothesis is that the series is stationary around a constant.
- Parameters:
series (pandas.Series) – Time series data to test
- Returns:
P-value of the KPSS test, or None if test fails
- Return type:
float or None
- Note:
p-value < 0.05 typically indicates non-stationarity
Handles InterpolationWarning and returns None on errors
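Because the two tests have opposite null hypotheses, their p-values are read in opposite directions. The sketch below shows one common way to combine them; the helper `is_stationary` and the fallback used when KPSS returns None are assumptions, not the documented behaviour of StationaryTester.

```python
def is_stationary(adf_pvalue, kpss_pvalue, threshold=0.05):
    """Combine ADF and KPSS p-values into one decision; hypothetical helper."""
    # ADF null: unit root. A small p-value rejects non-stationarity.
    adf_rejects_unit_root = adf_pvalue < threshold
    if kpss_pvalue is None:
        # Assumption: fall back to ADF alone when KPSS could not be computed.
        return adf_rejects_unit_root
    # KPSS null: stationary. A small p-value rejects stationarity.
    kpss_keeps_stationarity = kpss_pvalue >= threshold
    return adf_rejects_unit_root and kpss_keeps_stationarity


both_agree = is_stationary(adf_pvalue=0.01, kpss_pvalue=0.10)
adf_fails = is_stationary(adf_pvalue=0.20, kpss_pvalue=0.10)
kpss_fails = is_stationary(adf_pvalue=0.01, kpss_pvalue=0.01)
```

Requiring agreement from both tests is stricter than using either alone, which fits the stated goal of differencing until the series is stationary.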
- make_stationary(df: polars.DataFrame, target_column: str, date_column: str) → Tuple[pandas.DataFrame, int, int]
Transform time series to make it stationary in trend and seasonal components.
Main function that implements a comprehensive approach to making time series stationary through iterative application of regular and seasonal differencing. Uses ADF and KPSS tests combined with seasonal strength analysis to determine the appropriate transformations.
- Parameters:
df (polars.DataFrame) – Input Polars DataFrame containing time series data
target_column (str) – Name of the column containing time series values
date_column (str) – Name of the column containing dates
- Returns:
Tuple containing (transformed_DataFrame, regular_differencing_steps, seasonal_differencing_steps)
- Return type:
tuple[pandas.DataFrame, int, int]
- Note:
Returns original data if already stationary
Applies regular differencing first, then seasonal differencing
Logs final differencing parameters for reference
Converts Polars DataFrame to Pandas for processing
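The two differencing operations can be illustrated on a synthetic series. This sketch shows only the arithmetic, in the documented order (regular first, then seasonal); it omits the ADF, KPSS, and seasonal-strength checks that decide how many steps to apply. The helper `difference` and the example data are assumptions.

```python
import numpy as np


def difference(values, lag=1):
    """x[t] - x[t - lag]; the result is `lag` elements shorter."""
    x = np.asarray(values, dtype=float)
    return x[lag:] - x[:-lag]


# Synthetic series: a linear trend plus a repeating pattern of period 4.
t = np.arange(24, dtype=float)
trend = 3.0 * t + 5.0
pattern = np.tile([0.0, 2.0, -1.0, 4.0], 6)
series = trend + pattern

step1 = difference(series, lag=1)  # regular differencing removes the trend
step2 = difference(step1, lag=4)   # seasonal differencing removes the cycle
```

After the regular difference the trend collapses to a constant slope, but the period-4 pattern is still present; the seasonal difference at lag 4 then cancels it, leaving a series of zeros. Each step shortens the series by its lag, which is why the returned step counts matter when inverting the transformation.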