Table¶
The Table class is used to represent and work on a table or dataframe as a list of dict.
For example the following table
a |
b |
---|---|
1 |
False |
2 |
True |
3 |
False |
will be represented as
[{"a": 1, "b": False}, {"a": 2, "b": True}, {"a": 3, "b": False}]
This class is particularly useful for working with data in json format.
Table class¶
- class mango.table.Table(data=None, check=False)¶
- vapply(func: Callable, *args, **kwargs) Table ¶
maps function into each element of TupList
- Parameters:
func (callable) – function to apply
- Returns:
new
TupList
- kapply(func, *args, **kwargs) Table ¶
maps function into each key of TupList
- Parameters:
func (callable) – function to apply
- Returns:
new
TupList
- kvapply(func, *args, **kwargs) Table ¶
maps function into each element of TupList with indexes
- Parameters:
func (callable) – function to apply
- Returns:
new
TupList
- vfilter(function: Callable) Table ¶
returns new list with only tuples for which function returns True
- Parameters:
function (callable) – function to apply to each element
- Returns:
new
TupList
- unique() Table ¶
Applies
numpy.unique()
.- Parameters:
dtype – arguments to
numpy.asarray()
- Returns:
new
TupList
- sorted(**kwargs) Table ¶
Applies sorted function to elements and returns a TupList
- Parameters:
kwargs – arguments for sorted function main arguments for sorted are: - key - reverse
- Returns:
new
TupList
- take(*args, use_numpy=False) TupList ¶
Extract values from a columns of a table.
Example: result=Table([{“a”:1, “b”:2}, {“a”:3, “b”:4}]).take(“a”, “b”) result: [(1,2), (3,4)]
- Parameters:
args – name of the columns to extract
use_numpy – use numpy methods in take
- Returns:
a list of tuples.
- show_rows(n1, n2=None)¶
Show the n1 th row of the table or rows between n1 and n2.
- Parameters:
n1 – row number to start
n2 – row number to end (if None, only show row n1)
- Returns:
a string
- head(n=10)¶
Return the first n rows of a table.
- Parameters:
n – Number of rows to show (default:10).
- Returns:
set_a table with n rows or less.
- peek(n=3, name=None)¶
Show the first, middle and last n rows of the table.
- Parameters:
n – number of rows to show in each part.
name – name or message to print with the table
- Returns:
the printed string
- mutate(**kwargs) Table ¶
Add or modify a column in a table.
Example: mutate(table, a=3, b=[4,5,6], c=lambda v: v[“a”]+v[“b”], d = mean)
Note: all changes are applied over the input table and do not take into account the other changes.
- Parameters:
kwargs – named arguments with the changes to apply.
- The values can be:
a single value which will be applied to each row
a list with all the values of the column
a function to apply to the row.
- Returns:
a table
- group_by(col) SuperDict ¶
Group the rows of a table by the value fo some columns
- Parameters:
col – single name of list of columns to use to group the rows
:return a SuperDict
- group_mutate(group_by, **func) Table ¶
Group by the given columns and apply the given functions to the others.
- Parameters:
group_by – name of the columns to group.
func – function to apply to the named column. ex: a = first, b = mean
- Returns:
a table (Tuplist of dict).
- sum_all(group_by=None) Table ¶
Group by the given columns and sum the others.
- Parameters:
group_by – name of the columns to group.
- Returns:
a table (Tuplist of dict)
- summarise(group_by=None, default: Callable | None = None, **func) Table ¶
Group by the given columns and apply the given functions to the others.
- Parameters:
group_by – name of the columns to group.
default – default function to apply to non-grouped columns.
func – function to apply to the named column. ex: a = first, b = mean
- Returns:
a table (Tuplist of dict).
- join(table2, by=None, suffix=None, jtype='full', empty=None, if_empty_table_1=None, if_empty_table_2=None) Table ¶
Join to tables. Inspired by R dplyr join functions.
- Parameters:
table2 – 2nd table (Tuplist with dict)
by – list, dict or None. If the columns have the same name in both tables, a list of keys/column to use for the join. If the columns have different names in both tables, a dict in the format {name_table1: name_table2} If by is None, use all the shared keys.
suffix – if some columns have the same name in both tables but are not in “by”, a suffix will be added to their names. With suffix=[“_1”,”_2”], shared column “x” will become “x_1”, “x_2”
jtype – type of join: “full”
empty – values to give to empty cells created by the join.
if_empty_table_1 – (dict or list) if table 1 is empty, it will be replaced by this dict in the join.
if_empty_table_2 – (dict or list) if table 2 is empty, it will be replaced by this dict in the join.
- Returns:
a Table
- left_join(table2, by=None, suffix=None, empty=None, if_empty_table_1=None, if_empty_table_2=None) Table ¶
Shortcut to join(type=”left”)
- Parameters:
table2 – 2nd table (Tuplist with dict)
by – list, dict or None. If the columns have the same name in both tables, a list of keys/column to use for the join. If the columns have different names in both tables, a dict in the format {name_table1: name_table2} If by is None, use all the shared keys.
suffix – if some columns have the same name in both tables but are not in “by”, a suffix will be added to their names. With suffix=[“_1”,”_2”], shared column “x” will become “x_1”, “x_2”
empty – values to give to empty cells created by the join.
if_empty_table_1 – (dict or list) if table 1 is empty, it will be replaced by this dict in the join.
if_empty_table_2 – (dict or list) if table 2 is empty, it will be replaced by this dict in the join.
- Returns:
a Table
- right_join(table2, by=None, suffix=None, empty=None, if_empty_table_1=None, if_empty_table_2=None) Table ¶
Shortcut to join(type=”right”)
- Parameters:
table2 – 2nd table (Tuplist with dict)
by – list, dict or None. If the columns have the same name in both tables, a list of keys/column to use for the join. If the columns have different names in both tables, a dict in the format {name_table1: name_table2} If by is None, use all the shared keys.
suffix – if some columns have the same name in both tables but are not in “by”, a suffix will be added to their names. With suffix=[“_1”,”_2”], shared column “x” will become “x_1”, “x_2”
empty – values to give to empty cells created by the join.
if_empty_table_1 – (dict or list) if table 1 is empty, it will be replaced by this dict in the join.
if_empty_table_2 – (dict or list) if table 2 is empty, it will be replaced by this dict in the join.
- Returns:
a Table
- full_join(table2, by=None, suffix=None, empty=None, if_empty_table_1=None, if_empty_table_2=None) Table ¶
Shortcut to join(type=”full”)
- Parameters:
table2 – 2nd table (Tuplist with dict)
by – list, dict or None. If the columns have the same name in both tables, a list of keys/column to use for the join. If the columns have different names in both tables, a dict in the format {name_table1: name_table2} If by is None, use all the shared keys.
suffix – if some columns have the same name in both tables but are not in “by”, a suffix will be added to their names. With suffix=[“_1”,”_2”], shared column “x” will become “x_1”, “x_2”
empty – values to give to empty cells created by the join.
if_empty_table_1 – (dict or list) if table 1 is empty, it will be replaced by this dict in the join.
if_empty_table_2 – (dict or list) if table 2 is empty, it will be replaced by this dict in the join.
- Returns:
a Table
- inner_join(table2, by=None, suffix=None, empty=None, if_empty_table_1=None, if_empty_table_2=None) Table ¶
Shortcut to join(type=”inner”)
- Parameters:
table2 – 2nd table (Tuplist with dict)
by – list, dict or None. If the columns have the same name in both tables, a list of keys/column to use for the join. If the columns have different names in both tables, a dict in the format {name_table1: name_table2} If by is None, use all the shared keys.
suffix – if some columns have the same name in both tables but are not in “by”, a suffix will be added to their names. With suffix=[“_1”,”_2”], shared column “x” will become “x_1”, “x_2”
empty – values to give to empty cells created by the join.
if_empty_table_1 – (dict or list) if table 1 is empty, it will be replaced by this dict in the join.
if_empty_table_2 – (dict or list) if table 2 is empty, it will be replaced by this dict in the join.
- Returns:
a Table
- auto_join(by=None, suffix=None, empty=None) Table ¶
Join a table with itself. Useful to create combinations of values from columns of a table.
- Parameters:
by – list, dict or None. If by is a list of keys/column, those columns will be used for the join. If by is a dict in the format {name_table1: name_table2}, those columns will be used for the join. If by is None, all combinations of rows will be created (join by dummy).
suffix – suffix to add to column to create different names. Default is (“”, “_2”). With default suffix, column id will appear as id and id_2.
empty – values to give to empty cells created by the join.
- Returns:
a tuplist
- select(*args) Table ¶
Select columns from a table
- Parameters:
args – names of the columns to select
- Returns:
Table with the selected columns.
- drop(*args) Table ¶
Drop columns from a table
- Parameters:
args – names of the columns to drop
- Returns:
a table without the selected columns.
- rename(**kwargs) Table ¶
Rename columns from a table
- Parameters:
kwargs – names of the columns to rename and new names old_name=new_name
- Returns:
a table without the selected columns.
- filter(func) Table ¶
Filter a table.
- Parameters:
func – function to use to filter
- Returns:
the filtered table.
- get_col_names(fast=False) Table ¶
Get the names of the column of the table.
- Parameters:
fast – assume that the first row has all the columns.
- Returns:
a list of keys
- to_columns() SuperDict ¶
Create a dict with a list of values for each column of the table.
- Returns:
a dict
- static from_columns(dct) Table ¶
Create a table from a dict of list (columns)
Example: dic = dict(a=[1,2,3], b=[4,5,6]) result= Table.from_column(dic) result: [{‘a’: 1, ‘b’: 4}, {‘a’: 2, ‘b’: 5}, {‘a’: 3, ‘b’: 6}]
- Parameters:
dct – a dict of list
- Returns:
a Table
- get_index(cond) list ¶
Get row number for rows which respect a condition.
- Parameters:
cond – condition/filter to apply to the rows
- Returns:
a list of row numbers
- replace(replacement=None, to_replace=None, fast=False) Table ¶
Fill missing values of a tuplist.
- Parameters:
replacement – a single value or a dict of columns and values to use as replacement.
to_replace – a single value or a dict of columns and values to replace.
fast – assume that the first row has all the columns.
- Returns:
the table with missing values filled.
- replace_empty(replacement=None, fast=False) Table ¶
Fill empty values of a tuplist.
- Parameters:
replacement – a single value or a dict of columns and values to use as replacement.
fast – assume that the first row has all the columns.
- Returns:
the table with empty values filled.
- replace_nan(replacement=None) Table ¶
Fill nan values of a tuplist.
- Parameters:
replacement – a single value or a dict of columns and values to use as replacement.
- Returns:
the table with nan values filled.
- pivot_longer(cols, names_to='variable', value_to='value') Table ¶
pivot_longer() “lengthens” data, increasing the number of rows and decreasing the number of columns. The inverse transformation is pivot_wider()
- Parameters:
cols – a list of columns to pivot
names_to – the name of the new names column
value_to – the name of the new value column
- Returns:
the table with the new columns
- pivot_wider(names_from='variable', value_from='value', id_cols=None, values_fill=None) Table ¶
pivot_wider() “widens” data, increasing the number of columns and decreasing the number of rows. The inverse transformation is pivot_longer()
- Parameters:
names_from – the name of the new name column
value_from – the name of the new value column
id_cols – set_a set of columns that uniquely identifies each observation. If None, use all columns except names_from and value_from.
values_fill – set_a value or dict of values to fill in the missing values.
- Returns:
the table with the new columns
- drop_empty(cols=None, fast=False) Table ¶
Drop rows with empty values of a tuplist.
- Parameters:
cols – list of column names or single name.
fast – assume that the first row has all the columns.
- Returns:
the table with empty values dropped.
- lag_col(col, i=1, replace=False) Table ¶
Lag a column by i steps.
- Parameters:
col – the name of the column to lag
i – the number of steps to lag
replace – replace the former value of the column. If not, create a new column lag_{col}_i
- Returns:
the table with the new column
- distinct(columns) Table ¶
Only keeps unique combinations values of the selected columns. When there are rows with duplicated values, the first one is kept.
- Parameters:
columns – names of the columns.
- Returns:
a Table (list of dict) with unique data.
- order_by(columns, reverse=False) Table ¶
Reorder the table according to the given columns.
- Parameters:
columns – names of the columns to use to sort the table.
reverse – if True, the sorted list is sorted in descending order.
- Returns:
the sorted Table
- drop_nested() Table ¶
Drop any nested column from a table. Nested value are dict or lists nested as dict values in the table. This function assume df structure is homogenous and only look at the first row to find nested values.
- Returns:
the table without nested values.
- check_type()¶
Check that the table is a list of dict.
- to_set2(columns) TupList ¶
Create a list of unique value from some columns of the table
- Parameters:
columns – Columns to select to create the set.
- Returns:
a tuplist with unique values
- to_param(keys, value, is_list=False) SuperDict ¶
Create a dict with the given columns as keys and values.
- Parameters:
keys – columns to use as keys.
value – column to use as values.
is_list – True if the values are a list instead of a single value.
- Returns:
a superdict indexed by the given keys.
- is_unique(columns) bool ¶
Check if the combination of values of given columns is unique.
- Parameters:
columns – combination of columns to check.
- Returns:
True if the combination of values of the columns is unique.
- add_row(**kwargs) Table ¶
Add a row to the table. Missing columns are filled with value None.
- Parameters:
kwargs – values of the column in the format column_name=value
- Returns:
the table with added row.
- rbind(table: list) Table ¶
Bind two tables by rows.
- Parameters:
table – another table
- Returns:
the complete table.
- col_apply(columns, func: Callable, **kwargs) Table ¶
Apply a function to a column or a list of columns.
- Parameters:
columns – column or list of columns.
func – function to apply.
- Returns:
the table
- static format_dataset(dataset)¶
Format an entire data instance applying Table() to every table. Leave dict as they are.
- Parameters:
dataset – a data instance in dict/json format.
- Returns:
a dict of Tables
- classmethod dataset_from_json(path, **kwargs)¶
Load a json file and format it applying Table() to every table.
- Parameters:
path – path of the json file
- Returns:
a dict of Tables
- static from_pandas(df)¶
Create a table from a pandas dataframe.
- Parameters:
df – a pandas dataframe
- Returns:
a Table
- to_pandas()¶
Create a pandas dataframe from a table.
- Returns:
a pandas dataframe.
- pk_save(file)¶
Save the table in a pickle file.
- Parameters:
file – path of the file
- Returns:
nothing
- static pk_load(file)¶
Load a Table from a pickle file.
- Parameters:
file – path of the file
- Returns:
The table
- to_json(path)¶
Export the table to a json file. If any column name is a number, transform it into string.
- Parameters:
path – path of the json file
- Returns:
nothing
- static from_json(path, **kwargs)¶
Create a table from a json file.
- Parameters:
path – path to json file
- Returns:
a Table containing the data.
- apply(func: Callable, *args, **kwargs)¶
Apply a function to the entire table. Useful to chain varius functions applied to the entire table.
- Parameters:
func – a function which take the table as a first argument.
args – args of the function
kwargs – kwargs of the function
- Returns:
what the function returns.
- classmethod dataset_from_excel(path, sheets=None) dict ¶
Read an Excel file and return a dict of Table()
- Parameters:
path – path fo the Excel file.
sheets – list of sheets to read (all the sheets are read if None)
- Returns:
a dict of Table objects.
- to_excel(path, sheet_name=None)¶
Write the table to an Excel file
- Parameters:
path – path fo the Excel file.
sheet_name – Name of the Excel sheet.
- Returns:
None
- classmethod from_csv(path, sep=',', encoding=None) Table ¶
Load the table from a csv file.
- Parameters:
path – path fo the Excel file.
sep – column separator in the csv file. (detected automatically if None).
encoding – encoding.
- Returns:
a dict of Table objects.
- to_csv(path, sep=',', encoding=None)¶
Write the table to a csv file.
- Parameters:
path – path fo the Excel file.
sep – column separator in the csv file.
encoding – encoding.
- Returns:
nothing.
- add(*args) None ¶
this is just a shortcut for doing
>>> TupList().append((0, 1, 2))
by doing:
>>> TupList().add(0, 1, 2)
which is a little more friendly and short
- Parameters:
args – any number of elements to append
- Returns:
modified
TupList
- append(object, /)¶
Append object to the end of the list.
- chain() TupList ¶
Flattens a TupList by applying itertools chain method
- clear()¶
Remove all items from list.
- copy()¶
Return a shallow copy of the list.
- count(value, /)¶
Return number of occurrences of value.
- extend(iterable, /)¶
Extend list by appending elements from the iterable.
- index(value, start=0, stop=9223372036854775807, /)¶
Return first index of value.
Raises ValueError if the value is not present.
- insert(index, object, /)¶
Insert object before index.
- intersect(input_list: Iterable) TupList ¶
Converts list and argument into sets and then intersects them.
- Parameters:
input_list (list) – list to intersect
- Returns:
new
TupList
- len() int ¶
Shortcut to:
>>> len(TupList())
- Returns:
length of list
- Return type:
int
- pop(index=-1, /)¶
Remove and return item at index (default last).
Raises IndexError if list is empty or index is out of range.
- remove(value, /)¶
Remove first occurrence of value.
Raises ValueError if the value is not present.
- reverse()¶
Reverse IN PLACE.
- set_diff(input_list: Iterable) TupList ¶
Converts list and argument into sets and then subtracts one from the other.
- Parameters:
input_list (list) – list to subtract
- Returns:
new
TupList
- sort(*, key=None, reverse=False)¶
Sort the list in ascending order and return None.
The sort is in-place (i.e. the list itself is modified) and stable (i.e. the order of two equal elements is maintained).
If a key function is given, apply it once to each list item and sort them, ascending or descending, according to their function values.
The reverse flag can be set to sort in descending order.
- take_np(indices: Iterable | int) TupList ¶
filters the tuple of each element of the list according to a list of positions
- Parameters:
indices (int or list) – a list of positions
- Returns:
a new
TupList
- to_dict(result_col: Iterable | int | None = 0, is_list: bool = True, indices: Iterable | int | None = None) SuperDict ¶
- This magic function converts a tuple list into a dictionary
by taking one or several of the columns as the result.
- Parameters:
result_col (int or list or None) – a list of keys for the result (positions of the tuple or keys of the dict)
is_list (bool) – the value of the dictionary will be a TupList?
indices (list) – optional way of determining the indices instead of being the complement of result_col
- Returns:
new
pytups.superdict.SuperDict
- to_dictlist(keys: list) TupList ¶
Converts a list of tuples to a list of dictionaries.
- Parameters:
keys (list) – a unique list of dictionary keys
- Returns:
new
TupList
- to_list() list ¶
- Returns:
list
- to_set() set ¶
- Returns:
set
- to_start_finish(compare_tups: Callable, pp: int = 1, sort: bool = True, join_func: Callable | None = None) TupList ¶
Takes a calendar tuple list of the form: (id, month) and returns a tuple list of the form (id, start_month, end_month) it works with a bigger tuple too.
- Parameters:
compare_tups (callable) – returns True if tups are not consecutive. Takes 3 arguments
pp (int) – the position in the tuple where the period is
join_func (callable) – returns joined tuple from list of consecutive tuples. Takes 1 argument.
- Returns:
new
TupList
- vapply_col(pos: int | str | None, func: Callable)¶
Like vapply, but it stores the result in one of the positions of the tuple (or dictionary) :param pos: int or str :param callable func: function to apply to create col