Table¶
The Table class is used to represent and work on a table or dataframe as a list of dict. It derives from the pytups.Tuplist class and provides powerful data manipulation capabilities similar to pandas DataFrames but with a more lightweight and flexible approach.
Key Features¶
Lightweight data structure based on Python dictionaries
Powerful data manipulation methods inspired by R’s dplyr
Easy conversion to/from JSON format
Integration with pandas DataFrames
Support for various join operations
Flexible data reshaping capabilities
Basic Example¶
For example the following table:
a |
b |
---|---|
1 |
False |
2 |
True |
3 |
False |
will be represented as:
[{"a": 1, "b": False}, {"a": 2, "b": True}, {"a": 3, "b": False}]
Common Operations¶
Data Manipulation¶
The Table class provides various methods for data manipulation:
mutate(): Add or modify columns
select(): Choose specific columns
drop(): Remove columns
filter(): Filter rows based on conditions
group_by(): Group data by columns
summarise(): Aggregate data after grouping
# Example of data manipulation
table = Table([{"a": 1, "b": 2}, {"a": 3, "b": 4}])
# Add a new column 'c' as sum of 'a' and 'b'
result = table.mutate(c=lambda v: v["a"] + v["b"])
# Result: [{"a": 1, "b": 2, "c": 3}, {"a": 3, "b": 4, "c": 7}]
Join Operations¶
The class supports various types of joins similar to SQL:
left_join(): Keep all rows from the left table
right_join(): Keep all rows from the right table
inner_join(): Keep only matching rows
full_join(): Keep all rows from both tables
auto_join(): Join a table with itself
# Example of join operation
table1 = Table([{"id": 1, "value": "A"}, {"id": 2, "value": "B"}])
table2 = Table([{"id": 1, "data": 100}, {"id": 3, "data": 300}])
result = table1.left_join(table2, by="id")
# Result: [{"id": 1, "value": "A", "data": 100}, {"id": 2, "value": "B", "data": None}]
Data Reshaping¶
Support for reshaping data between wide and long formats:
pivot_longer(): Convert wide format to long format
pivot_wider(): Convert long format to wide format
# Example of pivot operations
table = Table([{"id": 1, "x": 10, "y": 20}, {"id": 2, "x": 30, "y": 40}])
long = table.pivot_longer(["x", "y"], names_to="variable", value_to="value")
# Result: [{"id": 1, "variable": "x", "value": 10}, {"id": 1, "variable": "y", "value": 20},
# {"id": 2, "variable": "x", "value": 30}, {"id": 2, "variable": "y", "value": 40}]
Import/Export¶
The Table class supports various formats:
JSON files (to_json/from_json)
Excel files (to_excel/from_excel)
CSV files (to_csv/from_csv)
Pandas DataFrames (to_pandas/from_pandas)
For more information on the underlying Tuplist class, see the pytups documentation.
Table class API¶
- class mango.table.Table(*args: Any, **kwargs: Any)¶
Bases:
TupList
Enhanced table class extending TupList with data manipulation capabilities.
A Table is a list of dictionaries that provides a rich set of methods for data manipulation, filtering, joining, and transformation operations. Inspired by R’s dplyr and data.table packages.
- Parameters:
data (list[dict], optional) – Initial data as a list of dictionaries or None for empty table
check (bool) – Whether to validate that all rows are dictionaries
- Example:
>>> # Create empty table >>> table = Table() >>> >>> # Create table from list of dicts >>> data = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}] >>> table = Table(data) >>> print(table) Table (2 rows, 2 columns): 0 {'name': 'Alice', 'age': 30} 1 {'name': 'Bob', 'age': 25}
- vapply(func: Callable, *args, **kwargs) Table ¶
Apply a function to each element (row) of the table.
Maps the given function to each row in the table, returning a new table with the transformed values.
- Parameters:
func (Callable) – Function to apply to each row
args – Additional positional arguments for the function
kwargs – Additional keyword arguments for the function
- Returns:
New table with transformed values
- Return type:
- Example:
>>> table = Table([{"x": 1, "y": 2}, {"x": 3, "y": 4}]) >>> result = table.vapply(lambda row: {"sum": row["x"] + row["y"]}) >>> print(result) [{'sum': 3}, {'sum': 7}]
- kapply(func, *args, **kwargs) Table ¶
Apply a function to each key (column name) of the table.
Maps the given function to each column name in the table, returning a new table with transformed column names.
- Parameters:
func (Callable) – Function to apply to each column name
args – Additional positional arguments for the function
kwargs – Additional keyword arguments for the function
- Returns:
New table with transformed column names
- Return type:
- Example:
>>> table = Table([{"name": "Alice", "age": 30}]) >>> result = table.kapply(lambda key: key.upper()) >>> print(result) [{'NAME': 'Alice', 'AGE': 30}]
- kvapply(func, *args, **kwargs) Table ¶
Apply a function to each element with its index.
Maps the given function to each row in the table along with its index, returning a new table with transformed values.
- Parameters:
func (Callable) – Function to apply to each (index, row) pair
args – Additional positional arguments for the function
kwargs – Additional keyword arguments for the function
- Returns:
New table with transformed values
- Return type:
- Example:
>>> table = Table([{"x": 1}, {"x": 2}]) >>> result = table.kvapply(lambda idx, row: {"index": idx, "value": row["x"]}) >>> print(result) [{'index': 0, 'value': 1}, {'index': 1, 'value': 2}]
- copy_shallow() Table ¶
Create a shallow copy of the table.
Creates a new table with the same structure but shares references to the same row objects. Changes to row contents will affect both tables.
- Returns:
New table with shallow copy of the data
- Return type:
- Example:
>>> table = Table([{"name": "Alice", "age": 30}]) >>> copy = table.copy_shallow() >>> copy[0]["age"] = 31 >>> print(table[0]["age"]) # Also changed to 31 31
- copy_deep() Table ¶
Create a deep copy of the table.
Creates a new table with completely independent copies of all data. Changes to the original table will not affect the copy.
- Returns:
New table with deep copy of the data
- Return type:
- Example:
>>> table = Table([{"name": "Alice", "age": 30}]) >>> copy = table.copy_deep() >>> copy[0]["age"] = 31 >>> print(table[0]["age"]) # Still 30 30
- vfilter(function: Callable) Table ¶
Filter rows based on a condition (internal method).
Keeps only the rows for which the provided function returns True. This is the internal implementation used by the public filter method.
- Parameters:
function (Callable[[dict], bool]) – Function that takes a row (dict) and returns a boolean
- Returns:
New table containing only rows that satisfy the condition
- Return type:
- unique() Table ¶
Remove duplicate rows from the table.
Eliminates rows with identical values across all columns. When duplicates are found, the first occurrence is kept.
- Returns:
New table with duplicate rows removed
- Return type:
- Raises:
NotImplementedError – If table structure is not compatible
- Example:
>>> table = Table([ ... {"name": "Alice", "age": 30}, ... {"name": "Bob", "age": 25}, ... {"name": "Alice", "age": 30} # Duplicate ... ]) >>> result = table.unique() >>> print(result) [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]
- unique2() Table ¶
Remove duplicate rows from the table (alternative implementation).
Alternative implementation of unique() that eliminates rows with identical values across all columns. When duplicates are found, the first occurrence is kept.
- Returns:
New table with duplicate rows removed
- Return type:
- Raises:
NotImplementedError – If table structure is not compatible
- Example:
>>> table = Table([ ... {"name": "Alice", "age": 30}, ... {"name": "Bob", "age": 25}, ... {"name": "Alice", "age": 30} # Duplicate ... ]) >>> result = table.unique2() >>> print(result) [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]
- sorted(**kwargs) Table ¶
Sort the table according to a custom key function.
Sorts the table using a custom key function. This method is limited to simple data types and may not work with complex table structures.
- Parameters:
kwargs (dict) – Arguments for the sorted function
- Returns:
New table with sorted rows
- Return type:
- Raises:
NotImplementedError – If table structure is not compatible
- Example:
>>> table = Table([{"value": 3}, {"value": 1}, {"value": 2}]) >>> result = table.sorted(key=lambda x: x["value"]) >>> print(result) [{'value': 1}, {'value': 2}, {'value': 3}]
- take(*args, use_numpy=False) pytups.TupList ¶
Extract values from specified columns of the table.
Returns a TupList containing tuples of values from the specified columns. Each tuple represents the values from one row for the selected columns.
- Parameters:
args (str) – Names of columns to extract
use_numpy (bool) – Whether to use numpy methods for extraction
- Returns:
TupList of tuples containing column values
- Return type:
TupList
- Example:
>>> table = Table([{"a": 1, "b": 2}, {"a": 3, "b": 4}]) >>> result = table.take("a", "b") >>> print(result) [(1, 2), (3, 4)]
- show_rows(n1, n2=None) str ¶
Display specific rows of the table as a formatted string.
Shows the specified row or range of rows with their indices. Used internally by the string representation of the table.
- Parameters:
n1 (int) – Starting row index
n2 (int, optional) – Ending row index (if None, shows only row n1)
- Returns:
Formatted string showing the specified rows
- Return type:
str
- Example:
>>> table = Table([{"name": "Alice"}, {"name": "Bob"}, {"name": "Charlie"}]) >>> result = table.show_rows(0, 2) >>> print(result) 0 {'name': 'Alice'} 1 {'name': 'Bob'}
- head(n=10) Table ¶
Return the first n rows of the table.
Creates a new table containing only the first n rows. If the table has fewer than n rows, returns the entire table.
- Parameters:
n (int) – Number of rows to return (default: 10)
- Returns:
New table with the first n rows
- Return type:
- Example:
>>> table = Table([ ... {"name": "Alice"}, {"name": "Bob"}, {"name": "Charlie"}, ... {"name": "David"}, {"name": "Eve"} ... ]) >>> result = table.head(3) >>> print(result) [{'name': 'Alice'}, {'name': 'Bob'}, {'name': 'Charlie'}]
- peek(n=3, name=None) Table ¶
Display a preview of the table showing first, middle, and last rows.
Shows the first n rows, middle n rows, and last n rows of the table. Useful for getting an overview of large tables without printing everything.
- Parameters:
n (int) – Number of rows to show in each section (default: 3)
name (str, optional) – Optional name or message to display with the table
- Returns:
The original table (unchanged)
- Return type:
- Example:
>>> table = Table([{"id": i, "value": i*2} for i in range(10)]) >>> table.peek(2, "Sample Data") Sample Data: Table (10 rows, 2 columns): 0 {'id': 0, 'value': 0} 1 {'id': 1, 'value': 2} ... 4 {'id': 4, 'value': 8} 5 {'id': 5, 'value': 10} ... 8 {'id': 8, 'value': 16} 9 {'id': 9, 'value': 18}
- mutate(**kwargs) Table ¶
Add or modify columns in the table.
Creates new columns or modifies existing ones using various methods: - Single values applied to all rows - Lists of values for each row - Functions that operate on row data
Note: All changes are applied to the original table structure and do not take into account other changes made in the same call.
- Parameters:
kwargs (dict) – Named arguments with column names and their values
- Returns:
New table with modified columns
- Return type:
- Example:
>>> table = Table([{"x": 1, "y": 2}, {"x": 3, "y": 4}]) >>> result = table.mutate( ... z=10, # Constant value ... sum=lambda row: row["x"] + row["y"], # Function ... product=[2, 12] # List of values ... ) >>> print(result) [{'x': 1, 'y': 2, 'z': 10, 'sum': 3, 'product': 2}, {'x': 3, 'y': 4, 'z': 10, 'sum': 7, 'product': 12}]
- group_by(col) pytups.SuperDict ¶
Group rows of the table by specified column values.
Groups the table rows based on the values in the specified column(s). Returns a SuperDict where keys are the unique values and values are lists of rows that have that value.
- Parameters:
col (str or list[str]) – Column name or list of column names to group by
- Returns:
SuperDict with grouped data
- Return type:
SuperDict
- Example:
>>> table = Table([ ... {"name": "Alice", "city": "Madrid"}, ... {"name": "Bob", "city": "Barcelona"}, ... {"name": "Charlie", "city": "Madrid"} ... ]) >>> result = table.group_by("city") >>> print(result) {'Madrid': [{'name': 'Alice', 'city': 'Madrid'}, {'name': 'Charlie', 'city': 'Madrid'}], 'Barcelona': [{'name': 'Bob', 'city': 'Barcelona'}]}
- group_mutate(group_by, **func) Table ¶
Group by specified columns and apply functions to other columns.
Groups the table by the specified columns and applies aggregation functions to the remaining columns. Similar to SQL GROUP BY with aggregate functions.
- Parameters:
group_by (str or list[str]) – Column name or list of column names to group by
func (dict[str, Callable]) – Functions to apply to columns (e.g., a=sum, b=mean)
- Returns:
New table with grouped and aggregated data
- Return type:
- Example:
>>> table = Table([ ... {"city": "Madrid", "sales": 100}, ... {"city": "Madrid", "sales": 150}, ... {"city": "Barcelona", "sales": 200} ... ]) >>> result = table.group_mutate("city", sales=sum) >>> print(result) [{'city': 'Madrid', 'sales': 250}, {'city': 'Barcelona', 'sales': 200}]
- sum_all(group_by=None) Table ¶
Group by specified columns and sum all numeric columns.
Groups the table by the specified columns and sums all numeric columns in each group. Non-numeric columns are ignored.
- Parameters:
group_by (str or list[str], optional) – Column name or list of column names to group by
- Returns:
New table with grouped and summed data
- Return type:
- Example:
>>> table = Table([ ... {"category": "A", "value1": 10, "value2": 20}, ... {"category": "A", "value1": 15, "value2": 25}, ... {"category": "B", "value1": 5, "value2": 10} ... ]) >>> result = table.sum_all("category") >>> print(result) [{'category': 'A', 'value1': 25, 'value2': 45}, {'category': 'B', 'value1': 5, 'value2': 10}]
- summarise(group_by=None, default: Callable = None, **func) Table ¶
Group by specified columns and apply aggregation functions.
Groups the table by specified columns and applies custom aggregation functions to other columns. More flexible than group_mutate as it allows specifying a default function for non-explicitly handled columns.
- Parameters:
group_by (str or list[str], optional) – Column name or list of column names to group by
default (Callable, optional) – Default function to apply to columns not explicitly specified
func (dict[str, Callable]) – Functions to apply to specific columns
- Returns:
New table with grouped and summarized data
- Return type:
- Example:
>>> table = Table([ ... {"category": "A", "value": 10, "count": 1}, ... {"category": "A", "value": 15, "count": 2}, ... {"category": "B", "value": 5, "count": 1} ... ]) >>> result = table.summarise("category", value=sum, count=sum) >>> print(result) [{'category': 'A', 'value': 25, 'count': 3}, {'category': 'B', 'value': 5, 'count': 1}]
- join(table2, by=None, suffix=None, jtype='full', empty=None, if_empty_table_1=None, if_empty_table_2=None) Table ¶
Join two tables using various join types.
Performs table joins inspired by R’s dplyr join functions. Supports different join types and flexible column matching strategies.
- Parameters:
table2 (Table or list[dict]) – Second table to join with
by (list, dict, or None) – Column specification for joining
suffix (list[str], optional) – Suffixes for disambiguating column names
jtype (str) – Type of join (“full”, “left”, “right”, “inner”)
empty – Value to use for empty cells created by the join
if_empty_table_1 – Replacement if table 1 is empty
if_empty_table_2 – Replacement if table 2 is empty
- Returns:
New table containing the joined data
- Return type:
- Example:
>>> table1 = Table([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]) >>> table2 = Table([{"id": 1, "age": 30}, {"id": 3, "age": 25}]) >>> result = table1.join(table2, by="id", jtype="left") >>> print(result) [{'id': 1, 'name': 'Alice', 'age': 30}, {'id': 2, 'name': 'Bob', 'age': None}]
- left_join(table2, by=None, suffix=None, empty=None, if_empty_table_1=None, if_empty_table_2=None) Table ¶
Perform a left join with another table.
Returns all rows from the left table (self) and matching rows from the right table (table2). Rows from the left table without matches will have None values for columns from the right table.
- Parameters:
table2 (Table or list[dict]) – Second table to join with
by (list, dict, or None) – Column specification for joining
suffix (list[str], optional) – Suffixes for disambiguating column names
empty – Value to use for empty cells created by the join
if_empty_table_1 – Replacement if table 1 is empty
if_empty_table_2 – Replacement if table 2 is empty
- Returns:
New table containing the left join result
- Return type:
- Example:
>>> table1 = Table([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]) >>> table2 = Table([{"id": 1, "age": 30}, {"id": 3, "age": 25}]) >>> result = table1.left_join(table2, by="id") >>> print(result) [{'id': 1, 'name': 'Alice', 'age': 30}, {'id': 2, 'name': 'Bob', 'age': None}]
- right_join(table2, by=None, suffix=None, empty=None, if_empty_table_1=None, if_empty_table_2=None) Table ¶
Perform a right join with another table.
Returns all rows from the right table (table2) and matching rows from the left table (self). Rows from the right table without matches will have None values for columns from the left table.
- Parameters:
table2 (Table or list[dict]) – Second table to join with
by (list, dict, or None) – Column specification for joining
suffix (list[str], optional) – Suffixes for disambiguating column names
empty – Value to use for empty cells created by the join
if_empty_table_1 – Replacement if table 1 is empty
if_empty_table_2 – Replacement if table 2 is empty
- Returns:
New table containing the right join result
- Return type:
- Example:
>>> table1 = Table([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]) >>> table2 = Table([{"id": 1, "age": 30}, {"id": 3, "age": 25}]) >>> result = table1.right_join(table2, by="id") >>> print(result) [{'id': 1, 'name': 'Alice', 'age': 30}, {'id': 3, 'name': None, 'age': 25}]
- full_join(table2, by=None, suffix=None, empty=None, if_empty_table_1=None, if_empty_table_2=None) Table ¶
Perform a full outer join with another table.
Returns all rows from both tables, with None values where there are no matches. This is the default join type and combines left and right joins.
- Parameters:
table2 (Table or list[dict]) – Second table to join with
by (list, dict, or None) – Column specification for joining
suffix (list[str], optional) – Suffixes for disambiguating column names
empty – Value to use for empty cells created by the join
if_empty_table_1 – Replacement if table 1 is empty
if_empty_table_2 – Replacement if table 2 is empty
- Returns:
New table containing the full join result
- Return type:
- Example:
>>> table1 = Table([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]) >>> table2 = Table([{"id": 1, "age": 30}, {"id": 3, "age": 25}]) >>> result = table1.full_join(table2, by="id") >>> print(result) [{'id': 1, 'name': 'Alice', 'age': 30}, {'id': 2, 'name': 'Bob', 'age': None}, {'id': 3, 'name': None, 'age': 25}]
- inner_join(table2, by=None, suffix=None, empty=None, if_empty_table_1=None, if_empty_table_2=None) Table ¶
Perform an inner join with another table.
Returns only rows that have matching values in both tables. Rows without matches in either table are excluded from the result.
- Parameters:
table2 (Table or list[dict]) – Second table to join with
by (list, dict, or None) – Column specification for joining
suffix (list[str], optional) – Suffixes for disambiguating column names
empty – Value to use for empty cells created by the join
if_empty_table_1 – Replacement if table 1 is empty
if_empty_table_2 – Replacement if table 2 is empty
- Returns:
New table containing only matching rows
- Return type:
- Example:
>>> table1 = Table([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]) >>> table2 = Table([{"id": 1, "age": 30}, {"id": 3, "age": 25}]) >>> result = table1.inner_join(table2, by="id") >>> print(result) [{'id': 1, 'name': 'Alice', 'age': 30}]
- auto_join(by=None, suffix=None, empty=None) Table ¶
Join a table with itself to create combinations.
Performs a self-join to create all possible combinations of rows. Useful for creating Cartesian products or finding relationships within the same table.
- Parameters:
by (list, dict, or None) – Column specification for the self-join
suffix (list[str], optional) – Suffixes to add to column names to distinguish them
empty – Value to use for empty cells created by the join
- Returns:
New table with all combinations
- Return type:
- Example:
>>> table = Table([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]) >>> result = table.auto_join() >>> print(result) [{'id': 1, 'name': 'Alice', 'id_2': 1, 'name_2': 'Alice'}, {'id': 1, 'name': 'Alice', 'id_2': 2, 'name_2': 'Bob'}, {'id': 2, 'name': 'Bob', 'id_2': 1, 'name_2': 'Alice'}, {'id': 2, 'name': 'Bob', 'id_2': 2, 'name_2': 'Bob'}]
- select(*args) Table ¶
Select specific columns from the table.
Creates a new table containing only the specified columns. Maintains the original row order.
- Parameters:
args (str) – Names of columns to select
- Returns:
New table with only the selected columns
- Return type:
- Raises:
KeyError – If any specified column doesn’t exist
- Example:
>>> table = Table([ ... {"id": 1, "name": "Alice", "age": 30, "city": "Madrid"}, ... {"id": 2, "name": "Bob", "age": 25, "city": "Barcelona"} ... ]) >>> result = table.select("name", "age") >>> print(result) [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]
- drop(*args) Table ¶
Remove specific columns from the table.
Creates a new table with the specified columns removed. Maintains the original row order.
- Parameters:
args (str) – Names of columns to remove
- Returns:
New table without the specified columns
- Return type:
- Example:
>>> table = Table([ ... {"id": 1, "name": "Alice", "age": 30, "city": "Madrid"}, ... {"id": 2, "name": "Bob", "age": 25, "city": "Barcelona"} ... ]) >>> result = table.drop("id", "city") >>> print(result) [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]
- rename(**kwargs) Table ¶
Rename columns in the table.
Changes column names using a mapping of old names to new names. Maintains the original row order and data.
- Parameters:
kwargs (dict[str, str]) – Mapping of old column names to new names
- Returns:
New table with renamed columns
- Return type:
- Raises:
KeyError – If any old column name doesn’t exist
- Example:
>>> table = Table([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]) >>> result = table.rename(id="user_id", name="full_name") >>> print(result) [{'user_id': 1, 'full_name': 'Alice'}, {'user_id': 2, 'full_name': 'Bob'}]
- filter(func) Table ¶
Filter rows based on a condition.
Keeps only the rows for which the provided function returns True. Returns an empty table if the original table is empty.
- Parameters:
func (Callable[[dict], bool]) – Function that takes a row (dict) and returns a boolean
- Returns:
New table containing only rows that satisfy the condition
- Return type:
- Example:
>>> table = Table([ ... {"name": "Alice", "age": 30}, ... {"name": "Bob", "age": 25}, ... {"name": "Charlie", "age": 35} ... ]) >>> result = table.filter(lambda row: row["age"] > 28) >>> print(result) [{'name': 'Alice', 'age': 30}, {'name': 'Charlie', 'age': 35}]
- get_col_names(fast=False) list ¶
Get the names of all columns in the table.
Returns a list of column names. By default, scans all rows to ensure all possible columns are included. Use fast=True for better performance if you’re certain the first row contains all columns.
- Parameters:
fast (bool) – If True, only check the first row for column names
- Returns:
List of column names
- Return type:
list[str]
- Raises:
IndexError – If table is empty
- Example:
>>> table = Table([{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]) >>> columns = table.get_col_names() >>> print(columns) ['name', 'age']
- to_columns() pytups.SuperDict ¶
Convert table to column-oriented format.
Transforms the table from row-oriented (list of dicts) to column-oriented (dict of lists) format. Each column becomes a key with a list of values.
- Returns:
SuperDict with column names as keys and lists of values
- Return type:
SuperDict
- Example:
>>> table = Table([ ... {"name": "Alice", "age": 30}, ... {"name": "Bob", "age": 25} ... ]) >>> result = table.to_columns() >>> print(result) {'name': ['Alice', 'Bob'], 'age': [30, 25]}
- classmethod from_columns(dct) Table ¶
Create a table from a column-oriented dictionary.
Transforms a dictionary of lists (column-oriented) into a table (row-oriented list of dictionaries).
- Parameters:
dct (dict[str, list]) – Dictionary with column names as keys and lists of values
- Returns:
New table with row-oriented data
- Return type:
- Example:
>>> data = {"name": ["Alice", "Bob"], "age": [30, 25]} >>> result = Table.from_columns(data) >>> print(result) [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]
- get_index(cond) list ¶
Get indices of rows that satisfy a condition.
Returns a list of row indices where the condition function returns True. Useful for identifying which rows match specific criteria.
- Parameters:
cond (Callable[[dict], bool]) – Function that takes a row (dict) and returns a boolean
- Returns:
List of row indices that satisfy the condition
- Return type:
list[int]
- Example:
>>> table = Table([ ... {"name": "Alice", "age": 30}, ... {"name": "Bob", "age": 25}, ... {"name": "Charlie", "age": 35} ... ]) >>> indices = table.get_index(lambda row: row["age"] > 28) >>> print(indices) [0, 2]
- replace(replacement=None, to_replace=None, fast=False) Table ¶
Replace specific values in the table.
Replaces specified values with new values in the table. Can replace values across all columns or target specific columns.
- Parameters:
replacement (any or dict[str, any]) – New values to use as replacements
to_replace (any or dict[str, any]) – Values to be replaced
fast (bool) – If True, assume first row contains all columns
- Returns:
New table with replaced values
- Return type:
- Example:
>>> table = Table([ ... {"age": 25, "city": "Madrid"}, ... {"age": 30, "city": "Barcelona"} ... ]) >>> result = table.replace( ... replacement={"age": 35, "city": "Paris"}, ... to_replace={"age": 25, "city": "Madrid"} ... ) >>> print(result) [{'age': 35, 'city': 'Paris'}, {'age': 30, 'city': 'Barcelona'}]
- replace_empty(replacement=None, fast=False) Table ¶
Replace empty values in the table.
Replaces empty values (None, empty strings, etc.) with specified replacement values. Can use different replacements for different columns.
- Parameters:
replacement (any or dict[str, any]) – Values to use for replacing empty values
fast (bool) – If True, assume first row contains all columns
- Returns:
New table with empty values replaced
- Return type:
- Example:
>>> table = Table([ ... {"name": "Alice", "age": None}, ... {"name": "", "age": 25} ... ]) >>> result = table.replace_empty(replacement={"name": "Unknown", "age": 0}) >>> print(result) [{'name': 'Alice', 'age': 0}, {'name': 'Unknown', 'age': 25}]
- replace_nan(replacement=None) Table ¶
Replace NaN values in the table.
Replaces NaN (Not a Number) values with specified replacement values. Useful for cleaning numeric data with missing values.
- Parameters:
replacement (any or dict[str, any]) – Values to use for replacing NaN values
- Returns:
New table with NaN values replaced
- Return type:
- Example:
>>> import math >>> table = Table([ ... {"value": 10.5, "score": math.nan}, ... {"value": math.nan, "score": 85.0} ... ]) >>> result = table.replace_nan(replacement=0) >>> print(result) [{'value': 10.5, 'score': 0}, {'value': 0, 'score': 85.0}]
- pivot_longer(cols, names_to='variable', value_to='value') Table ¶
Transform table from wide to long format.
“Lengthens” data by increasing the number of rows and decreasing the number of columns. The inverse transformation of pivot_wider().
- Parameters:
cols (list[str]) – List of column names to pivot
names_to (str) – Name for the new column containing variable names
value_to (str) – Name for the new column containing values
- Returns:
New table in long format
- Return type:
- Example:
>>> table = Table([ ... {"id": 1, "Q1": 100, "Q2": 150, "Q3": 200}, ... {"id": 2, "Q1": 120, "Q2": 180, "Q3": 220} ... ]) >>> result = table.pivot_longer(["Q1", "Q2", "Q3"], "quarter", "sales") >>> print(result) [{'id': 1, 'quarter': 'Q1', 'sales': 100}, {'id': 1, 'quarter': 'Q2', 'sales': 150}, {'id': 1, 'quarter': 'Q3', 'sales': 200}, {'id': 2, 'quarter': 'Q1', 'sales': 120}, {'id': 2, 'quarter': 'Q2', 'sales': 180}, {'id': 2, 'quarter': 'Q3', 'sales': 220}]
- pivot_wider(names_from='variable', value_from='value', id_cols=None, values_fill=None) Table ¶
Transform table from long to wide format.
“Widens” data by increasing the number of columns and decreasing the number of rows. The inverse transformation of pivot_longer().
- Parameters:
names_from (str) – Name of the column containing variable names
value_from (str) – Name of the column containing values
id_cols (list[str], optional) – Columns that uniquely identify each observation
values_fill (any or dict, optional) – Value or dict to fill missing values
- Returns:
New table in wide format
- Return type:
- Example:
>>> table = Table([ ... {"id": 1, "quarter": "Q1", "sales": 100}, ... {"id": 1, "quarter": "Q2", "sales": 150}, ... {"id": 2, "quarter": "Q1", "sales": 120} ... ]) >>> result = table.pivot_wider("quarter", "sales", "id") >>> print(result) [{'id': 1, 'Q1': 100, 'Q2': 150}, {'id': 2, 'Q1': 120, 'Q2': None}]
- drop_empty(cols=None, fast=False) Table ¶
Remove rows with empty values in specified columns.
Drops rows where the specified columns contain empty values (None, empty strings, etc.).
- Parameters:
cols (str or list[str], optional) – Column name(s) to check for empty values
fast (bool) – If True, assume first row contains all columns
- Returns:
New table with empty rows removed
- Return type:
- Example:
>>> table = Table([ ... {"name": "Alice", "age": 30}, ... {"name": "", "age": 25}, ... {"name": "Bob", "age": None} ... ]) >>> result = table.drop_empty("name") >>> print(result) [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': None}]
- lag_col(col, i=1, replace=False) Table ¶
Create a lagged version of a column.
Shifts the values of a column by a specified number of steps. Useful for time series analysis and creating features from previous values.
- Parameters:
col (str) – Name of the column to lag
i (int) – Number of steps to lag (default: 1)
replace (bool) – If True, replace original column; if False, create new column
- Returns:
New table with lagged column
- Return type:
- Example:
>>> table = Table([ ... {"date": "2023-01", "sales": 100}, ... {"date": "2023-02", "sales": 150}, ... {"date": "2023-03", "sales": 200} ... ]) >>> result = table.lag_col("sales", 1) >>> print(result) [{'date': '2023-01', 'sales': 100, 'lag_sales_1': None}, {'date': '2023-02', 'sales': 150, 'lag_sales_1': 100}, {'date': '2023-03', 'sales': 200, 'lag_sales_1': 150}]
- distinct(columns) Table ¶
Keep only unique combinations of values in specified columns.
Removes duplicate rows based on the values in the specified columns. When duplicates are found, the first occurrence is kept.
- Parameters:
columns (str or list[str]) – Column name(s) to check for uniqueness
- Returns:
New table with duplicate rows removed
- Return type:
- Example:
>>> table = Table([ ... {"name": "Alice", "city": "Madrid"}, ... {"name": "Bob", "city": "Barcelona"}, ... {"name": "Alice", "city": "Madrid"} # Duplicate ... ]) >>> result = table.distinct("name") >>> print(result) [{'name': 'Alice', 'city': 'Madrid'}, {'name': 'Bob', 'city': 'Barcelona'}]
- order_by(columns, reverse=False) Table ¶
Sort the table by specified columns.
Reorders the table rows based on the values in the specified columns. Supports both ascending and descending order.
- Parameters:
columns (str or list[str]) – Column name(s) to sort by
reverse (bool) – If True, sort in descending order
- Returns:
New table with sorted rows
- Return type:
- Example:
>>> table = Table([ ... {"name": "Charlie", "age": 35}, ... {"name": "Alice", "age": 30}, ... {"name": "Bob", "age": 25} ... ]) >>> result = table.order_by("age") >>> print(result) [{'name': 'Bob', 'age': 25}, {'name': 'Alice', 'age': 30}, {'name': 'Charlie', 'age': 35}]
- drop_nested() Table ¶
Remove columns containing nested data structures.
Drops columns that contain nested values (dictionaries, lists, or tuples). Assumes homogeneous table structure and checks only the first row.
- Returns:
New table with nested columns removed
- Return type:
- Example:
>>> table = Table([ ... {"name": "Alice", "age": 30, "hobbies": ["reading", "swimming"]}, ... {"name": "Bob", "age": 25, "hobbies": ["gaming"]} ... ]) >>> result = table.drop_nested() >>> print(result) [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]
- check_type()¶
Validate that the table is a list of dictionaries.
Checks that the table structure is correct (list of dictionaries). Raises TypeError if the structure is invalid.
- Returns:
None
- Return type:
None
- Raises:
TypeError – If table is not a list of dictionaries
- Example:
>>> table = Table([{"name": "Alice"}, {"name": "Bob"}]) >>> table.check_type() # No error >>> >>> invalid_table = Table([1, 2, 3]) >>> invalid_table.check_type() # Raises TypeError
- to_set2(columns) pytups.TupList ¶
Create a list of unique combinations from specified columns.
Extracts unique combinations of values from the specified columns and returns them as a TupList.
- Parameters:
columns (str or list[str]) – Column name(s) to extract unique combinations from
- Returns:
TupList containing unique combinations
- Return type:
TupList
- Raises:
KeyError – If specified columns don’t exist in the table
- Example:
>>> table = Table([ ... {"name": "Alice", "city": "Madrid"}, ... {"name": "Bob", "city": "Barcelona"}, ... {"name": "Alice", "city": "Madrid"} # Duplicate ... ]) >>> result = table.to_set2(["name", "city"]) >>> print(result) [('Alice', 'Madrid'), ('Bob', 'Barcelona')]
- to_param(keys, value, is_list=False) pytups.SuperDict ¶
Create a parameter dictionary from specified columns.
Creates a SuperDict using specified columns as keys and values. Useful for creating lookup dictionaries or parameter mappings.
- Parameters:
keys (str or list[str]) – Column name(s) to use as dictionary keys
value (str) – Column name to use as dictionary values
is_list (bool) – If True, values can be lists; if False, expects unique keys
- Returns:
SuperDict with keys and values from specified columns
- Return type:
SuperDict
- Raises:
KeyError – If specified columns don’t exist
ValueError – If keys are not unique and is_list=False
- Example:
>>> table = Table([ ... {"id": 1, "name": "Alice", "age": 30}, ... {"id": 2, "name": "Bob", "age": 25} ... ]) >>> result = table.to_param("id", "name") >>> print(result) {1: 'Alice', 2: 'Bob'}
- is_unique(columns) bool ¶
Check if combinations of values in specified columns are unique.
Determines whether the combination of values in the specified columns forms a unique key for each row.
- Parameters:
columns (str or list[str]) – Column name(s) to check for uniqueness
- Returns:
True if all combinations are unique
- Return type:
bool
- Example:
>>> table = Table([ ... {"id": 1, "name": "Alice"}, ... {"id": 2, "name": "Bob"}, ... {"id": 1, "name": "Charlie"} # Duplicate id ... ]) >>> print(table.is_unique("id")) False >>> print(table.is_unique("name")) True
- add_row(**kwargs) Table ¶
Add a new row to the table.
Adds a new row with the specified values. Missing columns are filled with None values to maintain table structure.
- Parameters:
kwargs (dict) – Column values in the format column_name=value
- Returns:
New table with the added row
- Return type:
- Example:
>>> table = Table([{"name": "Alice", "age": 30}]) >>> result = table.add_row(name="Bob", age=25) >>> print(result) [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]
- rbind(table: list) Table ¶
Bind another table by rows (row bind).
Combines the current table with another table by stacking rows. Missing columns are filled with None values.
- Parameters:
table (list[dict] or Table) – Another table to bind (list of dicts or Table)
- Returns:
New table with combined rows
- Return type:
- Example:
>>> table1 = Table([{"name": "Alice", "age": 30}]) >>> table2 = [{"name": "Bob", "city": "Madrid"}] >>> result = table1.rbind(table2) >>> print(result) [{'name': 'Alice', 'age': 30, 'city': None}, {'name': 'Bob', 'age': None, 'city': 'Madrid'}]
- col_apply(columns, func: Callable, **kwargs) Table ¶
Apply a function to specified columns.
Applies a function to the values in specified columns, transforming the data according to the function logic.
- Parameters:
columns (str or list[str]) – Column name(s) to apply the function to
func (Callable) – Function to apply to column values
kwargs – Additional keyword arguments for the function
- Returns:
New table with transformed columns
- Return type:
- Example:
>>> table = Table([{"value": 10}, {"value": 20}]) >>> result = table.col_apply("value", lambda x: x * 2) >>> print(result) [{'value': 20}, {'value': 40}]
- classmethod format_dataset(dataset) dict ¶
Convert dataset dictionary to use Table objects for list values.
Processes a dataset dictionary, converting any list values to Table objects while keeping other values unchanged.
- Parameters:
dataset (dict) – Dictionary containing data in various formats
- Returns:
Dictionary with list values converted to Table objects
- Return type:
dict
- Example:
>>> data = { ... "users": [{"name": "Alice"}, {"name": "Bob"}], ... "config": {"setting": "value"} ... } >>> result = Table.format_dataset(data) >>> print(type(result["users"])) <class 'mango.table.pytups_table.Table'>
- classmethod dataset_from_json(path, **kwargs) dict ¶
Load a JSON file and convert list values to Table objects.
Loads a JSON file and processes it using format_dataset to convert any list values to Table objects.
- Parameters:
path (str) – Path to the JSON file
kwargs – Additional arguments for load_json
- Returns:
Dictionary with list values converted to Table objects
- Return type:
dict
- Example:
>>> # Assuming data.json contains: {"users": [{"name": "Alice"}]} >>> result = Table.dataset_from_json("data.json") >>> print(type(result["users"])) <class 'mango.table.pytups_table.Table'>
- classmethod from_pandas(df) Table ¶
Create a Table from a pandas DataFrame.
Converts a pandas DataFrame to a Table object, preserving all data and column names. Requires pandas to be installed.
- Parameters:
df (pandas.DataFrame) – Pandas DataFrame to convert
- Returns:
New Table object with the DataFrame data
- Return type:
- Raises:
ImportError – If pandas is not installed
- Example:
>>> import pandas as pd >>> df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [30, 25]}) >>> table = Table.from_pandas(df) >>> print(table) [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]
- to_pandas() pandas.DataFrame ¶
Convert the Table to a pandas DataFrame.
Creates a pandas DataFrame from the Table data, preserving all columns and rows. Requires pandas to be installed.
- Returns:
Pandas DataFrame with the table data
- Return type:
pandas.DataFrame
- Raises:
ImportError – If pandas is not installed
- Example:
>>> table = Table([{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]) >>> df = table.to_pandas() >>> print(df) name age 0 Alice 30 1 Bob 25
- pk_save(file)¶
Save the table to a pickle file.
Serializes the table object and saves it to a pickle file. Automatically adds .pickle extension if not present.
- Parameters:
file (str) – Path to the pickle file
- Returns:
None
- Return type:
None
- Example:
>>> table = Table([{"name": "Alice", "age": 30}]) >>> table.pk_save("my_table") # Creates my_table.pickle file
- classmethod pk_load(file) Table ¶
Load a Table from a pickle file.
Deserializes a Table object from a pickle file. Automatically adds .pickle extension if not present.
- Parameters:
file (str) – Path to the pickle file
- Returns:
Table object loaded from the file
- Return type:
- Example:
>>> table = Table.pk_load("my_table") >>> print(table) [{'name': 'Alice', 'age': 30}]
- to_json(path)¶
Export the table to a JSON file.
Saves the table data to a JSON file. Numeric column names are automatically converted to strings for JSON compatibility.
- Parameters:
path (str) – Path to the JSON file
- Returns:
None
- Return type:
None
- Example:
>>> table = Table([{"name": "Alice", "age": 30}]) >>> table.to_json("output.json") # Creates output.json with table data
- classmethod from_json(path, **kwargs) Table ¶
Create a table from a JSON file.
Loads data from a JSON file and creates a Table object. The JSON file should contain a list of dictionaries.
- Parameters:
path (str) – Path to the JSON file
kwargs – Additional arguments for load_json
- Returns:
New table with data from the JSON file
- Return type:
- Example:
>>> # Assuming data.json contains: [{"name": "Alice", "age": 30}] >>> table = Table.from_json("data.json") >>> print(table) [{'name': 'Alice', 'age': 30}]
- apply(func: Callable, *args, **kwargs)¶
Apply a function to the entire table.
Passes the entire table as the first argument to the specified function. Useful for chaining custom functions that operate on the whole table.
- Parameters:
func (Callable) – Function that takes the table as its first argument
args – Additional positional arguments for the function
kwargs – Additional keyword arguments for the function
- Returns:
Result of the function call
- Return type:
any
- Example:
>>> table = Table([{"value": 10}, {"value": 20}]) >>> def double_table(t): ... return t.mutate(value=lambda row: row["value"] * 2) >>> result = table.apply(double_table) >>> print(result) [{'value': 20}, {'value': 40}]
- classmethod dataset_from_excel(path, sheets=None) dict ¶
Load an Excel file and return a dictionary of Table objects.
Reads an Excel file and converts each sheet to a Table object. Returns a dictionary with sheet names as keys and Table objects as values.
- Parameters:
path (str) – Path to the Excel file
sheets (list[str], optional) – List of sheet names to read (all sheets if None)
- Returns:
Dictionary with sheet names and Table objects
- Return type:
dict
- Example:
>>> # Assuming data.xlsx has sheets "users" and "orders" >>> result = Table.dataset_from_excel("data.xlsx") >>> print(list(result.keys())) ['users', 'orders']
- to_excel(path, sheet_name=None)¶
Export the table to an Excel file.
Saves the table data to an Excel file with the specified sheet name. If no sheet name is provided, uses “Sheet1” as default.
- Parameters:
path (str) – Path to the Excel file
sheet_name (str, optional) – Name of the Excel sheet
- Returns:
None
- Return type:
None
- Example:
>>> table = Table([{"name": "Alice", "age": 30}]) >>> table.to_excel("output.xlsx", "Users") # Creates output.xlsx with "Users" sheet
- classmethod from_csv(path, sep=',', encoding=None) Table ¶
Load a table from a CSV file.
Reads a CSV file and creates a Table object from the data. Supports custom separators and encodings.
- Parameters:
path (str) – Path to the CSV file
sep (str) – Column separator (detected automatically if None)
encoding (str, optional) – File encoding
- Returns:
New table with data from the CSV file
- Return type:
- Example:
>>> # Assuming data.csv contains: name,age\nAlice,30\nBob,25 >>> table = Table.from_csv("data.csv") >>> print(table) [{'name': 'Alice', 'age': '30'}, {'name': 'Bob', 'age': '25'}]
- to_csv(path, sep=',', encoding=None)¶
Export the table to a CSV file.
Saves the table data to a CSV file with the specified separator and encoding.
- Parameters:
path (str) – Path to the CSV file
sep (str) – Column separator for the CSV file
encoding (str, optional) – File encoding
- Returns:
None
- Return type:
None
- Example:
>>> table = Table([{"name": "Alice", "age": 30}]) >>> table.to_csv("output.csv", sep=";") # Creates output.csv with semicolon separator