AutoEncoder Modules

The modules package contains the core building blocks for the AutoEncoder architecture.

Encoder Module

The encoder module provides functions for creating different types of encoder architectures.

mango_autoencoder.modules.encoder.encoder(form: str, **kwargs)

Create an encoder model for different neural network architectures.

Factory function that creates encoder models of various types including Dense, RNN, GRU and LSTM encoders. The specific parameters depend on the encoder type selected.

Parameters:
  • form (str) – Type of encoder architecture: Dense, RNN, GRU, or LSTM (the examples below pass lowercase strings such as "lstm" and "dense")

  • kwargs (dict) – Keyword arguments specific to the encoder type

Returns:

Configured encoder model

Return type:

keras.Model

Raises:

ValueError – If an invalid encoder form is specified

Example:
>>> # Create LSTM encoder
>>> lstm_enc = encoder(
...     form="lstm",
...     context_window=10,
...     features=5,
...     hidden_dim=64,
...     num_layers=2
... )
>>> # Create Dense encoder
>>> dense_enc = encoder(
...     form="dense",
...     features=10,
...     hidden_dim=[128, 64],
...     num_layers=2
... )
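
The factory behavior described above (select a builder from form, forward the remaining keyword arguments, raise ValueError for an unknown form) can be sketched as follows. This is a minimal illustration of the dispatch pattern only, not the library's implementation: make_encoder, _build_dense, and _build_lstm are hypothetical names, and the builders return plain dictionaries rather than keras.Model objects.

```python
from typing import Any, Callable, Dict


# Hypothetical stand-in builders: each returns a plain dict describing the
# requested architecture instead of a real keras.Model.
def _build_dense(**kwargs: Any) -> Dict[str, Any]:
    return {"type": "dense", **kwargs}


def _build_lstm(**kwargs: Any) -> Dict[str, Any]:
    return {"type": "lstm", **kwargs}


# Registry mapping each supported form to its builder (assumed names).
_BUILDERS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "dense": _build_dense,
    "lstm": _build_lstm,
}


def make_encoder(form: str, **kwargs: Any) -> Dict[str, Any]:
    """Dispatch on `form` and forward the remaining keyword arguments."""
    try:
        builder = _BUILDERS[form]
    except KeyError:
        raise ValueError(
            f"Invalid encoder form {form!r}; expected one of {sorted(_BUILDERS)}"
        ) from None
    return builder(**kwargs)


spec = make_encoder("lstm", context_window=10, features=5, hidden_dim=64, num_layers=2)
```

Keeping the builders in a registry means that supporting another form only requires a new entry, and the error message can list the valid options.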

Decoder Module

The decoder module provides functions for creating different types of decoder architectures.

mango_autoencoder.modules.decoder.decoder(form: str, **kwargs)

Create a decoder model for different neural network architectures.

Factory function that creates decoder models of various types including Dense, RNN, GRU and LSTM decoders. The decoder reconstructs the original input from the encoded representation.

Parameters:
  • form (str) – Type of decoder architecture: Dense, RNN, GRU, or LSTM (the examples below pass lowercase strings such as "lstm" and "dense")

  • kwargs (dict) – Keyword arguments specific to the decoder type

Returns:

Configured decoder model

Return type:

keras.Model

Raises:

ValueError – If an invalid decoder form is specified

Example:
>>> # Create LSTM decoder
>>> lstm_dec = decoder(
...     form="lstm",
...     context_window=10,
...     features=5,
...     hidden_dim=64,
...     num_layers=2
... )
>>> # Create Dense decoder
>>> dense_dec = decoder(
...     form="dense",
...     features=10,
...     hidden_dim=[64, 128],
...     num_layers=2
... )

Anomaly Detector Module

The anomaly detector module provides functions for analyzing reconstruction errors and detecting anomalies.

mango_autoencoder.modules.anomaly_detector.calculate_reconstruction_error(x_converted: numpy.ndarray, x_hat: numpy.ndarray) -> numpy.ndarray

Calculate the reconstruction error matrix between original and reconstructed data.

Computes the element-wise absolute difference between the original data and the autoencoder's reconstruction. The resulting matrix is the basis for evaluating reconstruction quality.

Parameters:
  • x_converted (np.ndarray) – Original input data array

  • x_hat (np.ndarray) – Reconstructed data array from autoencoder

Returns:

Matrix of absolute differences between original and reconstructed data

Return type:

np.ndarray

Raises:

ValueError – If input arrays have different shapes

Example:
>>> import numpy as np
>>> original = np.array([[1.0, 2.0], [3.0, 4.0]])
>>> reconstructed = np.array([[1.1, 1.9], [2.8, 4.2]])
>>> error = calculate_reconstruction_error(original, reconstructed)
>>> print(error)
[[0.1 0.1]
 [0.2 0.2]]
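
The computation described above can be reproduced with plain NumPy. The sketch below is an assumption about the behavior based on the description (element-wise absolute difference plus a shape check), not the library's source; abs_reconstruction_error is a hypothetical name.

```python
import numpy as np


def abs_reconstruction_error(x: np.ndarray, x_hat: np.ndarray) -> np.ndarray:
    """Element-wise |x - x_hat|, rejecting mismatched shapes."""
    if x.shape != x_hat.shape:
        raise ValueError(f"Shape mismatch: {x.shape} vs {x_hat.shape}")
    return np.abs(x - x_hat)


original = np.array([[1.0, 2.0], [3.0, 4.0]])
reconstructed = np.array([[1.1, 1.9], [2.8, 4.2]])

error = abs_reconstruction_error(original, reconstructed)

# Averaging over rows gives one mean absolute error per feature column.
per_feature_mae = error.mean(axis=0)
```

Reducing the matrix along axis 0 is a common next step, since it yields a single score per feature that can be compared across columns.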

mango_autoencoder.modules.anomaly_detector.analyze_error_by_columns(error_matrix: numpy.ndarray, column_names: List[str] | None = None, save_path: str | None = None, show: bool = False) -> pandas.DataFrame

Analyze reconstruction error distribution by columns with interactive Plotly visualizations.

Builds an error analysis dashboard showing the error distribution and summary statistics for each feature column, rendered as interactive Plotly charts.

Parameters:
  • error_matrix (np.ndarray) – Matrix of reconstruction errors (samples x features)

  • column_names (Optional[List[str]]) – List of feature names for columns

  • save_path (Optional[str]) – Directory path to save generated plots

  • show (bool) – Whether to display plots in browser

Returns:

DataFrame with error data and column names

Return type:

pd.DataFrame

Raises:

ValueError – If error_matrix is empty or if column_names length doesn’t match error_matrix columns

Example:
>>> import numpy as np
>>> error_matrix = np.array([[0.1, 0.2], [0.3, 0.1], [0.2, 0.4]])
>>> column_names = ["feature_1", "feature_2"]
>>> error_df = analyze_error_by_columns(
...     error_matrix,
...     column_names,
...     save_path="./plots"
... )
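
The tabular part of this analysis can be approximated with pandas alone. The sketch below mirrors the documented validation rules and computes per-column statistics; it omits the Plotly charts entirely, and the choice of statistics (mean, std, max) is an assumption for illustration.

```python
import numpy as np
import pandas as pd

error_matrix = np.array([[0.1, 0.2], [0.3, 0.1], [0.2, 0.4]])
column_names = ["feature_1", "feature_2"]

# Validation mirroring the documented ValueError conditions.
if error_matrix.size == 0:
    raise ValueError("error_matrix is empty")
if len(column_names) != error_matrix.shape[1]:
    raise ValueError("column_names length must match the number of columns")

# Label the columns, then summarize each feature's error distribution.
error_df = pd.DataFrame(error_matrix, columns=column_names)
stats = error_df.agg(["mean", "std", "max"])
```

The stats frame has one row per statistic and one column per feature, which makes it easy to spot a feature whose errors are consistently larger than the rest.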

mango_autoencoder.modules.anomaly_detector.reconstruction_error(actual_data_df: pandas.DataFrame, autoencoder_output_df: pandas.DataFrame, threshold_factor: int = 3, split_column: str | None = 'data_split', save_path: str | None = None, filename: str = 'reconstruction_error.csv') -> pandas.DataFrame

Calculate and optionally save reconstruction error between actual data and autoencoder output.

Computes the difference between the original data and the autoencoder's reconstruction, with optional data split handling and high-error feature detection. Warns about features whose reconstruction error exceeds threshold_factor times the median reconstruction error.

Parameters:
  • actual_data_df (pd.DataFrame) – Original input data DataFrame

  • autoencoder_output_df (pd.DataFrame) – Reconstructed data DataFrame from autoencoder

  • threshold_factor (int) – Multiplier for median reconstruction error to flag high-error features

  • split_column (Optional[str]) – Optional name of column that defines data split (train/val/test)

  • save_path (Optional[str]) – Optional directory path to save the output CSV file

  • filename (str) – Filename for the saved CSV file

Returns:

DataFrame with reconstruction error values

Return type:

pd.DataFrame

Raises:

ValueError – If input DataFrames have different index or columns

Example:
>>> import pandas as pd
>>> actual = pd.DataFrame({"feature1": [1, 2, 3], "feature2": [4, 5, 6]})
>>> reconstructed = pd.DataFrame({"feature1": [1.1, 1.9, 3.1], "feature2": [3.9, 5.2, 5.8]})
>>> error_df = reconstruction_error(actual, reconstructed, save_path="./results")
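
One plausible reading of threshold_factor is sketched below: compute a mean absolute error per feature, take the median across features, and flag any feature above threshold_factor times that median. Both the aggregation (mean absolute error) and the sign convention (actual minus reconstructed) are assumptions, since the documentation does not spell them out.

```python
import pandas as pd

actual = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0], "c": [7.0, 8.0, 9.0]}
)
reconstructed = pd.DataFrame(
    {"a": [1.1, 1.9, 3.1], "b": [3.9, 5.1, 5.9], "c": [9.0, 6.0, 11.0]}
)
threshold_factor = 3

# Signed error per cell (assumed convention: actual minus reconstructed).
error_df = actual - reconstructed

# One score per feature, then a limit relative to the median score.
mean_abs_error = error_df.abs().mean()
limit = threshold_factor * mean_abs_error.median()

high_error_features = mean_abs_error[mean_abs_error > limit].index.tolist()
```

Here features a and b each have a mean absolute error near 0.1 while c sits at 2.0, so only c exceeds the limit of roughly 0.3.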

mango_autoencoder.modules.anomaly_detector.anova_reconstruction_error(reconstruction_error_df: pandas.DataFrame, p_value_threshold: float | None = 0.05, F_stat_threshold: float | None = None, split_column: str = 'data_split') -> pandas.DataFrame

Perform one-way ANOVA to test if reconstruction errors vary across split_column for each feature.

Tests, for each feature, whether reconstruction errors differ significantly between data splits (train/validation/test). Uses the F-test to compare mean errors across splits and logs a warning when the difference is significant.

Parameters:
  • reconstruction_error_df (pd.DataFrame) – DataFrame with reconstruction error and split_column

  • p_value_threshold (Optional[float]) – Maximum p-value to trigger logger warning (significance level)

  • F_stat_threshold (Optional[float]) – Minimum F-statistic to trigger logger warning

  • split_column (str) – Name of column that defines data split categories

Returns:

DataFrame with F-statistics and p-values per feature

Return type:

pd.DataFrame

Raises:

ValueError – If split_column not found or has insufficient categories

Example:
>>> import pandas as pd
>>> error_df = pd.DataFrame({
...     "feature1": [0.1, 0.2, 0.3, 0.4, 0.5],
...     "feature2": [0.2, 0.3, 0.4, 0.5, 0.6],
...     "data_split": ["train", "train", "val", "val", "test"]
... })
>>> anova_results = anova_reconstruction_error(error_df, p_value_threshold=0.01)
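
To make the F-statistic concrete, the sketch below computes a one-way ANOVA by hand for a single feature, using the standard ratio of between-group to within-group mean squares. It is a worked illustration with made-up data, not the library's code, and it stops at the F-statistic rather than deriving a p-value.

```python
import pandas as pd

error_df = pd.DataFrame({
    "feature1": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    "data_split": ["train", "train", "val", "val", "test", "test"],
})

groups = error_df.groupby("data_split")["feature1"]
grand_mean = error_df["feature1"].mean()
k = groups.ngroups   # number of splits
n = len(error_df)    # total number of observations

# Between-group sum of squares: group size times squared distance of the
# group mean from the grand mean.
ss_between = (groups.size() * (groups.mean() - grand_mean) ** 2).sum()

# Within-group sum of squares: squared distance of each value from its
# own group mean.
ss_within = ((error_df["feature1"] - groups.transform("mean")) ** 2).sum()

f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
```

With these values the between-group sum of squares is 0.16 and the within-group sum is 0.015, giving F = (0.16 / 2) / (0.015 / 3) = 16. In practice, scipy.stats.f_oneway returns both the statistic and the p-value for the same test.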

mango_autoencoder.modules.anomaly_detector.reconstruction_error_summary(reconstruction_error_df: pandas.DataFrame, split_column: str | None = 'data_split', split_order: List[str] | None = ['train', 'validation', 'test'], save_path: str | None = None, filename: str = 'reconstruction_error_summary.csv') -> pandas.DataFrame

Generate and optionally save summary statistics for reconstruction error grouped by split_column (if provided and present in the DataFrame).

Creates comprehensive summary statistics including absolute mean and standard deviation for reconstruction errors, optionally grouped by data splits. Provides organized output with consistent column ordering.

Parameters:
  • reconstruction_error_df (pd.DataFrame) – DataFrame with reconstruction error data

  • split_column (Optional[str]) – Optional name of column that defines data split categories

  • split_order (Optional[List[str]]) – Optional order of split categories for consistent output

  • save_path (Optional[str]) – Optional directory path to save the summary CSV file

  • filename (str) – Filename to use for the saved CSV file

Returns:

Summary statistics DataFrame with mean and std error metrics

Return type:

pd.DataFrame

Raises:

ValueError – If data is empty or split_order doesn’t match actual splits

Example:
>>> import pandas as pd
>>> error_df = pd.DataFrame({
...     "feature1": [0.1, 0.2, 0.3, 0.4],
...     "feature2": [0.2, 0.3, 0.4, 0.5],
...     "data_split": ["train", "train", "validation", "validation"]
... })
>>> summary = reconstruction_error_summary(error_df, save_path="./results")
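
A grouped summary of this kind can be sketched with a pandas groupby. The example below takes the absolute error, aggregates mean and standard deviation per split, and orders the rows by split_order. Treating "absolute mean" as the mean of absolute errors, and silently skipping splits that are absent from the data, are both assumptions.

```python
import pandas as pd

error_df = pd.DataFrame({
    "feature1": [0.1, -0.3, 0.2, -0.4],
    "data_split": ["train", "train", "validation", "validation"],
})
split_order = ["train", "validation", "test"]

# Mean and standard deviation of the absolute error within each split.
summary = (
    error_df.assign(feature1=error_df["feature1"].abs())
    .groupby("data_split")["feature1"]
    .agg(["mean", "std"])
)

# Keep the requested order, restricted to splits that actually occur.
present = [s for s in split_order if s in summary.index]
summary = summary.reindex(present)
```

Reindexing against an explicit order keeps the rows in train, validation, test sequence regardless of how groupby sorts the labels.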

mango_autoencoder.modules.anomaly_detector.std_error_threshold(reconstruction_error_df: pandas.DataFrame, std_threshold: float = 3.0, save_path: str | None = None, anomaly_mask_filename: str = 'std_anomaly_mask.csv', anomaly_proportions_filename: str = 'std_anomaly_proportions.csv') -> pandas.DataFrame

Identify anomalies using a standard deviation threshold over reconstruction error. Considers all time series data for a given feature.

Detects anomalous data points by comparing reconstruction errors to statistical thresholds based on standard deviations from the mean. Creates boolean masks indicating anomaly locations and provides summary statistics of anomaly rates.

Parameters:
  • reconstruction_error_df (pd.DataFrame) – DataFrame with autoencoder reconstruction errors and ‘data_split’ column

  • std_threshold (float) – Threshold in terms of standard deviations from the mean error

  • save_path (Optional[str]) – Optional directory path to save output files

  • anomaly_mask_filename (str) – CSV filename for the boolean anomaly mask

  • anomaly_proportions_filename (str) – CSV filename for anomaly rate summary

Returns:

Boolean DataFrame mask (True = anomaly, False = normal)

Return type:

pd.DataFrame

Raises:

ValueError – If required columns are missing, if data is empty, or if std_threshold is negative

Example:
>>> import pandas as pd
>>> error_df = pd.DataFrame({
...     "feature1": [0.1, 0.2, 0.3, 0.4, 0.5],
...     "feature2": [0.2, 0.3, 0.4, 0.5, 0.6],
...     "data_split": ["train", "train", "val", "val", "test"]
... })
>>> anomaly_mask = std_error_threshold(error_df, std_threshold=2.0, save_path="./results")
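
The thresholding rule can be illustrated as a z-score test per feature: a value is anomalous when it lies more than std_threshold standard deviations from that feature's mean error, computed over all rows regardless of split. The sketch below assumes pandas' default sample standard deviation (ddof=1); the library may use a different estimator.

```python
import pandas as pd

error_df = pd.DataFrame({
    "feature1": [0.1, 0.1, 0.1, 0.1, 0.1, 5.0],
    "data_split": ["train", "train", "train", "val", "val", "test"],
})
std_threshold = 2.0

# Statistics are computed per feature over the whole series.
features = error_df.drop(columns="data_split")
z = (features - features.mean()) / features.std()

# True marks an anomaly; the column mean of the mask is the anomaly rate.
anomaly_mask = z.abs() > std_threshold
anomaly_proportions = anomaly_mask.mean()
```

Only the last row is flagged: its z-score is 5 / sqrt(6), about 2.04, while the other rows sit at about -0.41. Note that a single large outlier inflates the standard deviation it is measured against, so very small samples can hide anomalies at higher thresholds.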

mango_autoencoder.modules.anomaly_detector.corrected_data(actual_data_df: pandas.DataFrame, autoencoder_output_df: pandas.DataFrame, anomaly_mask: pandas.DataFrame, save_path: str | None = None, filename: str = 'corrected_data.csv') -> pandas.DataFrame

Replace anomalous values in the original data with autoencoder reconstructed values.

Creates a corrected dataset by replacing identified anomalous values with their autoencoder reconstructions. A boolean mask determines which values are replaced, and the number of corrected values is logged.

Parameters:
  • actual_data_df (pd.DataFrame) – Original input data DataFrame

  • autoencoder_output_df (pd.DataFrame) – Reconstructed data DataFrame from autoencoder

  • anomaly_mask (pd.DataFrame) – Boolean DataFrame indicating where to apply corrections (True = replace)

  • save_path (Optional[str]) – Optional directory path to save the corrected data as CSV

  • filename (str) – Filename to use if saving the corrected data

Returns:

DataFrame with corrected data (anomalies replaced with reconstructions)

Return type:

pd.DataFrame

Raises:

ValueError – If input DataFrames have different lengths or columns

Example:
>>> import pandas as pd
>>> actual = pd.DataFrame({"feature1": [1, 2, 3], "feature2": [4, 5, 6]})
>>> reconstructed = pd.DataFrame({"feature1": [1.1, 1.9, 3.1], "feature2": [3.9, 5.2, 5.8]})
>>> mask = pd.DataFrame({"feature1": [False, True, False], "feature2": [True, False, False]})
>>> corrected = corrected_data(actual, reconstructed, mask, save_path="./results")
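
The replacement step maps directly onto pandas' DataFrame.mask, which substitutes values wherever a boolean condition is True. The sketch below shows that equivalent operation on the example data; it is an illustration of the technique, not the library's source, and it skips saving and logging.

```python
import pandas as pd

actual = pd.DataFrame({"feature1": [1.0, 2.0, 3.0], "feature2": [4.0, 5.0, 6.0]})
reconstructed = pd.DataFrame(
    {"feature1": [1.1, 1.9, 3.1], "feature2": [3.9, 5.2, 5.8]}
)
mask = pd.DataFrame(
    {"feature1": [False, True, False], "feature2": [True, False, False]}
)

# Where mask is True, take the reconstructed value; elsewhere keep the original.
corrected = actual.mask(mask, reconstructed)

# Number of cells that were replaced.
n_corrected = int(mask.to_numpy().sum())
```

The three frames must share the same index and columns, because pandas aligns them by label before applying the mask.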