AutoEncoder Modules

The modules package contains the core building blocks for the AutoEncoder architecture.

Encoder Module

The encoder module provides functions for creating different types of encoder architectures.

mango_autoencoder.modules.encoder.encoder(form: str, **kwargs)

Create an encoder model for different neural network architectures.

Factory function that creates encoder models of various types including Dense, RNN, GRU and LSTM encoders. The specific parameters depend on the encoder type selected.

Parameters:

form (str) – Type of encoder architecture, such as "dense" or "lstm"
kwargs (dict) – Keyword arguments specific to the encoder type

Returns:

Configured encoder model

Return type:

keras.Model

Raises:

ValueError – If an invalid encoder form is specified

Example:

>>> from mango_autoencoder.modules.encoder import encoder
>>> # Create LSTM encoder
>>> lstm_enc = encoder(
...     form="lstm",
...     context_window=10,
...     features=5,
...     hidden_dim=64,
...     num_layers=2
... )
>>> # Create Dense encoder
>>> dense_enc = encoder(
...     form="dense",
...     features=10,
...     hidden_dim=[128, 64],
...     num_layers=2
... )
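
The factory behaviour described above (dispatch on `form`, forward the remaining keyword arguments, raise `ValueError` for an unknown form) can be sketched in plain Python. This is an illustration of the pattern only, not the library's implementation: `build_model`, `_BUILDERS`, and the two builder functions are hypothetical names, and the builders return plain dicts instead of `keras.Model` instances so the sketch runs without Keras.

```python
from typing import Any, Callable, Dict


def _dense_builder(**kwargs: Any) -> Dict[str, Any]:
    # Hypothetical stand-in: a real builder would return a keras.Model.
    return {"form": "dense", **kwargs}


def _lstm_builder(**kwargs: Any) -> Dict[str, Any]:
    # Hypothetical stand-in for a recurrent builder.
    return {"form": "lstm", **kwargs}


# Hypothetical registry mapping each supported form to its builder.
_BUILDERS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "dense": _dense_builder,
    "lstm": _lstm_builder,
}


def build_model(form: str, **kwargs: Any) -> Dict[str, Any]:
    """Dispatch on `form` and forward the remaining keyword arguments."""
    try:
        builder = _BUILDERS[form]
    except KeyError:
        raise ValueError(
            f"Invalid form {form!r}; expected one of {sorted(_BUILDERS)}"
        ) from None
    return builder(**kwargs)


config = build_model(
    "lstm", context_window=10, features=5, hidden_dim=64, num_layers=2
)
```

A registry keeps the list of valid forms in one place, so the error message for an invalid form can name the accepted values.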

Decoder Module

The decoder module provides functions for creating different types of decoder architectures.

mango_autoencoder.modules.decoder.decoder(form: str, **kwargs)

Create a decoder model for different neural network architectures.

Factory function that creates decoder models of various types including Dense, RNN, GRU and LSTM decoders. The decoder reconstructs the original input from the encoded representation.

Parameters:

form (str) – Type of decoder architecture, such as "dense" or "lstm"
kwargs (dict) – Keyword arguments specific to the decoder type

Returns:

Configured decoder model

Return type:

keras.Model

Raises:

ValueError – If an invalid decoder form is specified

Example:

>>> from mango_autoencoder.modules.decoder import decoder
>>> # Create LSTM decoder
>>> lstm_dec = decoder(
...     form="lstm",
...     context_window=10,
...     features=5,
...     hidden_dim=64,
...     num_layers=2
... )
>>> # Create Dense decoder
>>> dense_dec = decoder(
...     form="dense",
...     features=10,
...     hidden_dim=[64, 128],
...     num_layers=2
... )

Anomaly Detector Module

The anomaly detector module provides functions for analyzing reconstruction errors and detecting anomalies.

mango_autoencoder.modules.anomaly_detector.calculate_reconstruction_error(x_converted: numpy.ndarray, x_hat: numpy.ndarray) → numpy.ndarray

Calculate the reconstruction error matrix between original and reconstructed data.

Computes the element-wise absolute difference between the original data and the autoencoder reconstruction. Larger values mark the samples and features that the autoencoder reconstructs poorly.

Parameters:

x_converted (np.ndarray) – Original input data array
x_hat (np.ndarray) – Reconstructed data array from autoencoder

Returns:

Matrix of absolute differences between original and reconstructed data

Return type:

np.ndarray

Raises:

ValueError – If input arrays have different shapes

Example:

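>>> import numpy as np
>>> from mango_autoencoder.modules.anomaly_detector import calculate_reconstruction_error
>>> original = np.array([[1.0, 2.0], [3.0, 4.0]])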
>>> reconstructed = np.array([[1.1, 1.9], [2.8, 4.2]])
>>> error = calculate_reconstruction_error(original, reconstructed)
>>> print(error)
[[0.1 0.1]
 [0.2 0.2]]
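
The computation described above amounts to a shape check followed by an element-wise absolute difference. A minimal NumPy sketch, assuming nothing beyond that description; `abs_reconstruction_error` is a hypothetical name and is not part of the package:

```python
import numpy as np


def abs_reconstruction_error(x: np.ndarray, x_hat: np.ndarray) -> np.ndarray:
    """Element-wise absolute error, rejecting mismatched shapes."""
    if x.shape != x_hat.shape:
        # Fail loudly instead of letting NumPy broadcast the arrays.
        raise ValueError(f"Shape mismatch: {x.shape} vs {x_hat.shape}")
    return np.abs(x - x_hat)


original = np.array([[1.0, 2.0], [3.0, 4.0]])
reconstructed = np.array([[1.1, 1.9], [2.8, 4.2]])
error = abs_reconstruction_error(original, reconstructed)

# Averaging over rows gives one mean absolute error per feature.
per_feature_mae = error.mean(axis=0)
```

The explicit shape check matters because NumPy would otherwise broadcast compatible but unequal shapes and return a misleading error matrix.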

mango_autoencoder.modules.anomaly_detector.analyze_error_by_columns(error_matrix: numpy.ndarray, column_names: List[str] | None = None, save_path: str | None = None, show: bool = False) → pandas.DataFrame

Analyze reconstruction error distribution by columns with interactive Plotly visualizations.

Builds an error analysis dashboard for each feature column, with interactive Plotly charts of the error distribution and its summary statistics.

Parameters:

error_matrix (np.ndarray) – Matrix of reconstruction errors (samples x features)
column_names (Optional[List[str]]) – List of feature names for columns
save_path (Optional[str]) – Directory path to save generated plots
show (bool) – Whether to display the plots in the browser

Returns:

DataFrame with error data and column names

Return type:

pd.DataFrame

Raises:

ValueError – If error_matrix is empty or if column_names length doesn’t match error_matrix columns

Example:

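>>> import numpy as np
>>> from mango_autoencoder.modules.anomaly_detector import analyze_error_by_columns
>>> error_matrix = np.array([[0.1, 0.2], [0.3, 0.1], [0.2, 0.4]])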
>>> column_names = ["feature_1", "feature_2"]
>>> error_df = analyze_error_by_columns(
...     error_matrix,
...     column_names,
...     save_path="./plots"
... )
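
The input validation and the returned DataFrame described above can be sketched without the plotting step. This is a sketch under stated assumptions: `error_frame` is a hypothetical helper, the `feature_<i>` fallback names are invented for illustration, and the library may validate or name columns differently.

```python
import numpy as np
import pandas as pd


def error_frame(error_matrix, column_names=None) -> pd.DataFrame:
    """Validate a (samples x features) error matrix and label its columns."""
    error_matrix = np.asarray(error_matrix, dtype=float)
    if error_matrix.size == 0 or error_matrix.ndim != 2:
        raise ValueError("error_matrix must be a non-empty 2-D array")
    n_features = error_matrix.shape[1]
    if column_names is None:
        # Hypothetical fallback naming, not taken from the library.
        column_names = [f"feature_{i}" for i in range(n_features)]
    if len(column_names) != n_features:
        raise ValueError(
            f"Got {len(column_names)} names for {n_features} columns"
        )
    return pd.DataFrame(error_matrix, columns=list(column_names))


error_df = error_frame(
    [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4]], ["feature_1", "feature_2"]
)
# Per-column statistics of the kind a dashboard would plot.
stats = error_df.describe().loc[["mean", "std", "max"]]
```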

mango_autoencoder.modules.anomaly_detector.reconstruction_error(actual_data_df: pandas.DataFrame, autoencoder_output_df: pandas.DataFrame, threshold_factor: int = 3, split_column: str | None = 'data_split', save_path: str | None = None, filename: str = 'reconstruction_error.csv') → pandas.DataFrame

Calculate and optionally save reconstruction error between actual data and autoencoder output.

Computes the difference between original data and autoencoder reconstructed data, with optional data split handling and high-error feature detection. Provides warnings for features with unusually high reconstruction errors.

Parameters:

actual_data_df (pd.DataFrame) – Original input data DataFrame
autoencoder_output_df (pd.DataFrame) – Reconstructed data DataFrame from autoencoder
threshold_factor (int) – Multiplier applied to the median reconstruction error; features whose error exceeds the result are flagged as high-error
split_column (Optional[str]) – Optional name of column that defines data split (train/val/test)
save_path (Optional[str]) – Optional directory path to save the output CSV file
filename (str) – Filename for the saved CSV file

Returns:

DataFrame with reconstruction error values

Return type:

pd.DataFrame

Raises:

ValueError – If the input DataFrames have different indexes or columns

Example:

>>> import pandas as pd
>>> from mango_autoencoder.modules.anomaly_detector import reconstruction_error
>>> actual = pd.DataFrame({"feature1": [1, 2, 3], "feature2": [4, 5, 6]})
>>> reconstructed = pd.DataFrame({"feature1": [1.1, 1.9, 3.1], "feature2": [3.9, 5.2, 5.8]})
>>> error_df = reconstruction_error(actual, reconstructed, save_path="./results")
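
The high-error feature check can be sketched with pandas. The exact rule is an assumption here: the sketch summarizes each feature by its mean absolute error and flags a feature when that value exceeds `threshold_factor` times the median across features. The sign convention (`actual - reconstructed`) is likewise assumed, and the library may aggregate differently.

```python
import pandas as pd

actual = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0], "c": [7.0, 8.0, 9.0]}
)
reconstructed = pd.DataFrame(
    {"a": [1.1, 1.9, 3.1], "b": [3.9, 5.1, 5.9], "c": [9.0, 6.0, 11.0]}
)

# Signed error per cell (assumed convention: actual minus reconstruction).
error_df = actual - reconstructed

# One summary value per feature: the mean absolute error.
feature_error = error_df.abs().mean()

# Flag features whose error is far above the typical (median) feature.
threshold_factor = 3
limit = threshold_factor * feature_error.median()
high_error_features = feature_error[feature_error > limit].index.tolist()
```

Using the median as the reference keeps a single badly reconstructed feature from inflating the limit it is compared against.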

mango_autoencoder.modules.anomaly_detector.anova_reconstruction_error(reconstruction_error_df: pandas.DataFrame, p_value_threshold: float | None = 0.05, F_stat_threshold: float | None = None, split_column: str = 'data_split') → pandas.DataFrame

Perform one-way ANOVA to test if reconstruction errors vary across split_column for each feature.

Tests, for each feature, whether the mean reconstruction error differs significantly between data splits (train/validation/test). The F-test compares the variation between split means with the variation within splits, and the function logs a warning for each feature with a significant difference.

Parameters:

reconstruction_error_df (pd.DataFrame) – DataFrame with reconstruction error and split_column
p_value_threshold (Optional[float]) – Maximum p-value to trigger logger warning (significance level)
F_stat_threshold (Optional[float]) – Minimum F-statistic to trigger logger warning
split_column (str) – Name of column that defines data split categories

Returns:

DataFrame with F-statistics and p-values per feature

Return type:

pd.DataFrame

Raises:

ValueError – If split_column is not found or has insufficient categories

Example:

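>>> import pandas as pd
>>> from mango_autoencoder.modules.anomaly_detector import anova_reconstruction_error
>>> error_df = pd.DataFrame({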
...     "feature1": [0.1, 0.2, 0.3, 0.4, 0.5],
...     "feature2": [0.2, 0.3, 0.4, 0.5, 0.6],
...     "data_split": ["train", "train", "val", "val", "test"]
... })
>>> anova_results = anova_reconstruction_error(error_df, p_value_threshold=0.01)
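
To make the F-statistic concrete, the sketch below computes it by hand for one feature: the between-split mean square divided by the within-split mean square. `one_way_f_statistic` is a hypothetical helper written for illustration. The source does not say how the library computes the test, and a p-value would additionally need the F distribution with (k - 1, N - k) degrees of freedom, which `scipy.stats.f_oneway` provides.

```python
import numpy as np


def one_way_f_statistic(groups) -> float:
    """F-statistic of a one-way ANOVA over a list of 1-D samples."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("Need at least two groups and more samples than groups")
    grand_mean = np.concatenate(groups).mean()
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Variation of the samples around their own group mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return float((ss_between / (k - 1)) / (ss_within / (n_total - k)))


# Reconstruction errors of one feature, grouped by data split.
train = [0.1, 0.2, 0.15]
validation = [0.3, 0.4, 0.35]
test = [0.5, 0.6, 0.55]
f_stat = one_way_f_statistic([train, validation, test])
```

A large F-statistic means the split means are far apart relative to the spread inside each split, which is the situation the logger warning is meant to surface.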

mango_autoencoder.modules.anomaly_detector.reconstruction_error_summary(reconstruction_error_df: pandas.DataFrame, split_column: str | None = 'data_split', split_order: List[str] | None = ['train', 'validation', 'test'], save_path: str | None = None, filename: str = 'reconstruction_error_summary.csv') → pandas.DataFrame

Generate and optionally save summary statistics for reconstruction error grouped by split_column (if provided and present in the DataFrame).

Creates comprehensive summary statistics including absolute mean and standard deviation for reconstruction errors, optionally grouped by data splits. Provides organized output with consistent column ordering.

Parameters:

reconstruction_error_df (pd.DataFrame) – DataFrame with reconstruction error data
split_column (Optional[str]) – Optional name of column that defines data split categories
split_order (Optional[List[str]]) – Optional order of split categories for consistent output
save_path (Optional[str]) – Optional directory path to save the summary CSV file
filename (str) – Filename to use for the saved CSV file

Returns:

Summary statistics DataFrame with mean and std error metrics

Return type:

pd.DataFrame

Raises:

ValueError – If data is empty or split_order doesn’t match actual splits

Example:

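>>> import pandas as pd
>>> from mango_autoencoder.modules.anomaly_detector import reconstruction_error_summary
>>> error_df = pd.DataFrame({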
...     "feature1": [0.1, 0.2, 0.3, 0.4],
...     "feature2": [0.2, 0.3, 0.4, 0.5],
...     "data_split": ["train", "train", "validation", "validation"]
... })
>>> summary = reconstruction_error_summary(error_df, split_order=["train", "validation"], save_path="./results")
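
A grouped summary of this kind can be sketched with a pandas `groupby`. It assumes that "absolute mean" means the mean of the absolute errors and that `split_order` only fixes the row order. The layout of the library's output may differ.

```python
import pandas as pd

error_df = pd.DataFrame({
    "feature1": [0.1, -0.2, 0.3, -0.4],
    "feature2": [0.2, 0.3, -0.4, 0.5],
    "data_split": ["train", "train", "validation", "validation"],
})
split_order = ["train", "validation"]

# Work on the feature columns only; the split column is the grouping key.
features = error_df.columns.drop("data_split")
grouped = error_df[features].abs().groupby(error_df["data_split"])

# Mean and standard deviation of the absolute error per split,
# with rows arranged in the requested split order.
summary = grouped.agg(["mean", "std"]).reindex(split_order)
```

The result has one row per split and a two-level column index of (feature, statistic), for example `summary.loc["train", ("feature1", "mean")]`.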

mango_autoencoder.modules.anomaly_detector.std_error_threshold(reconstruction_error_df: pandas.DataFrame, std_threshold: float = 3.0, save_path: str | None = None, anomaly_mask_filename: str = 'std_anomaly_mask.csv', anomaly_proportions_filename: str = 'std_anomaly_proportions.csv') → pandas.DataFrame

Identify anomalies using a standard deviation threshold over reconstruction error. Considers all time series data for a given feature.

Detects anomalous data points by comparing reconstruction errors to statistical thresholds based on standard deviations from the mean. Creates boolean masks indicating anomaly locations and provides summary statistics of anomaly rates.

Parameters:

reconstruction_error_df (pd.DataFrame) – DataFrame with autoencoder reconstruction errors and ‘data_split’ column
std_threshold (float) – Threshold in terms of standard deviations from the mean error
save_path (Optional[str]) – Optional directory path to save output files
anomaly_mask_filename (str) – CSV filename for the boolean anomaly mask
anomaly_proportions_filename (str) – CSV filename for anomaly rate summary

Returns:

Boolean DataFrame mask (True = anomaly, False = normal)

Return type:

pd.DataFrame

Raises:

ValueError – If required columns are missing, if data is empty, or if std_threshold is negative

Example:

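>>> import pandas as pd
>>> from mango_autoencoder.modules.anomaly_detector import std_error_threshold
>>> error_df = pd.DataFrame({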
...     "feature1": [0.1, 0.2, 0.3, 0.4, 0.5],
...     "feature2": [0.2, 0.3, 0.4, 0.5, 0.6],
...     "data_split": ["train", "train", "val", "val", "test"]
... })
>>> anomaly_mask = std_error_threshold(error_df, std_threshold=2.0, save_path="./results")
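
The thresholding rule can be sketched as follows. The exact rule is an assumption: a value is flagged when its error lies more than `std_threshold` standard deviations from that feature's mean error, in either direction, with statistics computed over all rows regardless of split. The library may use a one-sided rule or a different standard deviation estimator.

```python
import pandas as pd

error_df = pd.DataFrame({
    "feature1": [0.1] * 9 + [2.0],   # one clear outlier in the last row
    "feature2": [0.2, 0.3] * 5,      # no outliers
    "data_split": ["train"] * 6 + ["validation"] * 2 + ["test"] * 2,
})
std_threshold = 2.0

features = error_df.drop(columns="data_split")

# Distance of each error from its feature mean.
deviation = (features - features.mean()).abs()

# pandas computes the sample standard deviation (ddof=1) by default.
anomaly_mask = deviation > std_threshold * features.std()

# Share of flagged values per feature.
anomaly_proportions = anomaly_mask.mean()
```

With few rows, a single outlier inflates the standard deviation it is measured against, so a very high threshold can become unreachable on short series.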

mango_autoencoder.modules.anomaly_detector.corrected_data(actual_data_df: pandas.DataFrame, autoencoder_output_df: pandas.DataFrame, anomaly_mask: pandas.DataFrame, save_path: str | None = None, filename: str = 'corrected_data.csv') → pandas.DataFrame

Replace anomalous values in the original data with autoencoder reconstructed values.

Creates a corrected dataset by replacing identified anomalous values with their autoencoder reconstructions. Uses a boolean mask to determine which values should be replaced and provides logging of correction counts.

Parameters:

actual_data_df (pd.DataFrame) – Original input data DataFrame
autoencoder_output_df (pd.DataFrame) – Reconstructed data DataFrame from autoencoder
anomaly_mask (pd.DataFrame) – Boolean DataFrame indicating where to apply corrections (True = replace)
save_path (Optional[str]) – Optional directory path to save the corrected data as CSV
filename (str) – Filename to use if saving the corrected data

Returns:

DataFrame with corrected data (anomalies replaced with reconstructions)

Return type:

pd.DataFrame

Raises:

ValueError – If input DataFrames have different lengths or columns

Example:

>>> import pandas as pd
>>> from mango_autoencoder.modules.anomaly_detector import corrected_data
>>> actual = pd.DataFrame({"feature1": [1, 2, 3], "feature2": [4, 5, 6]})
>>> reconstructed = pd.DataFrame({"feature1": [1.1, 1.9, 3.1], "feature2": [3.9, 5.2, 5.8]})
>>> mask = pd.DataFrame({"feature1": [False, True, False], "feature2": [True, False, False]})
>>> corrected = corrected_data(actual, reconstructed, mask, save_path="./results")
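
The replacement step maps directly onto `DataFrame.mask`, which swaps in values from another frame wherever a boolean condition is True. A minimal sketch of that idea, assuming the three frames share the same index and columns. Whether the library uses this call internally is not stated in the source.

```python
import pandas as pd

actual = pd.DataFrame(
    {"feature1": [1.0, 2.0, 3.0], "feature2": [4.0, 5.0, 6.0]}
)
reconstructed = pd.DataFrame(
    {"feature1": [1.1, 1.9, 3.1], "feature2": [3.9, 5.2, 5.8]}
)
mask = pd.DataFrame(
    {"feature1": [False, True, False], "feature2": [True, False, False]}
)

# Where mask is True, take the reconstruction; elsewhere keep the original.
corrected = actual.mask(mask, reconstructed)

# Number of replaced cells, the kind of count a log message would report.
n_corrected = int(mask.to_numpy().sum())
```

`mask` returns a new frame and leaves `actual` unchanged, so the original data stays available for comparison.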