Table¶

The Table class is used to represent and work on a table or dataframe as a list of dict.

For example the following table

a	b
1	False
2	True
3	False

will be represented as

[{"a": 1, "b": False}, {"a": 2, "b": True}, {"a": 3, "b": False}]

This class is particularly useful for working with data in json format.

Table class¶

class mango.table.Table(data=None, check=False)¶

vapply(func: Callable, *args, **kwargs) → Table¶

maps function into each element of TupList

Parameters:: func (callable) – function to apply
Returns:: new TupList

kapply(func, *args, **kwargs) → Table¶

maps function into each key of TupList

Parameters:: func (callable) – function to apply
Returns:: new TupList

kvapply(func, *args, **kwargs) → Table¶

maps function into each element of TupList with indexes

Parameters:: func (callable) – function to apply
Returns:: new TupList

copy_shallow() → Table¶

Copies the list only. Not it’s contents

Returns:: new TupList

copy_deep() → Table¶: Copies the complete object using python’s pickle

vfilter(function: Callable) → Table¶

returns new list with only tuples for which function returns True

Parameters:: function (callable) – function to apply to each element
Returns:: new TupList

unique() → Table¶

Applies numpy.unique().

Parameters:: dtype – arguments to numpy.asarray()
Returns:: new TupList

unique2() → Table¶

Converts to set and then back to TupList.

Returns:: new TupList

sorted(**kwargs) → Table¶

Applies sorted function to elements and returns a TupList

Parameters:: kwargs – arguments for sorted function main arguments for sorted are: - key - reverse
Returns:: new TupList

take(*args, use_numpy=False) → TupList¶

Extract values from a columns of a table.

Example: result=Table([{“a”:1, “b”:2}, {“a”:3, “b”:4}]).take(“a”, “b”) result: [(1,2), (3,4)]

Parameters:

args – name of the columns to extract
use_numpy – use numpy methods in take

Returns:

a list of tuples.

show_rows(n1, n2=None)¶

Show the n1 th row of the table or rows between n1 and n2.

Parameters:

n1 – row number to start
n2 – row number to end (if None, only show row n1)

Returns:

a string

head(n=10)¶

Return the first n rows of a table.

Parameters:: n – Number of rows to show (default:10).
Returns:: set_a table with n rows or less.

peek(n=3, name=None)¶

Show the first, middle and last n rows of the table.

Parameters:

n – number of rows to show in each part.
name – name or message to print with the table

Returns:

the printed string

mutate(**kwargs) → Table¶

Add or modify a column in a table.

Example: mutate(table, a=3, b=[4,5,6], c=lambda v: v[“a”]+v[“b”], d = mean)

Note: all changes are applied over the input table and do not take into account the other changes.

Parameters:: kwargs – named arguments with the changes to apply.

The values can be:

a single value which will be applied to each row
a list with all the values of the column
a function to apply to the row.

Returns:: a table

group_by(col) → SuperDict¶

Group the rows of a table by the value fo some columns

Parameters:: col – single name of list of columns to use to group the rows

:return a SuperDict

group_mutate(group_by, **func) → Table¶

Group by the given columns and apply the given functions to the others.

Parameters:

group_by – name of the columns to group.
func – function to apply to the named column. ex: a = first, b = mean

Returns:

a table (Tuplist of dict).

sum_all(group_by=None) → Table¶

Group by the given columns and sum the others.

Parameters:: group_by – name of the columns to group.
Returns:: a table (Tuplist of dict)

summarise(group_by=None, default: Callable | None = None, **func) → Table¶

Group by the given columns and apply the given functions to the others.

Parameters:

group_by – name of the columns to group.
default – default function to apply to non-grouped columns.
func – function to apply to the named column. ex: a = first, b = mean

Returns:

a table (Tuplist of dict).

join(table2, by=None, suffix=None, jtype='full', empty=None, if_empty_table_1=None, if_empty_table_2=None) → Table¶

Join to tables. Inspired by R dplyr join functions.

Parameters:

table2 – 2nd table (Tuplist with dict)
by – list, dict or None. If the columns have the same name in both tables, a list of keys/column to use for the join. If the columns have different names in both tables, a dict in the format {name_table1: name_table2} If by is None, use all the shared keys.
suffix – if some columns have the same name in both tables but are not in “by”, a suffix will be added to their names. With suffix=[“_1”,”_2”], shared column “x” will become “x_1”, “x_2”
jtype – type of join: “full”
empty – values to give to empty cells created by the join.
if_empty_table_1 – (dict or list) if table 1 is empty, it will be replaced by this dict in the join.
if_empty_table_2 – (dict or list) if table 2 is empty, it will be replaced by this dict in the join.

Returns:

a Table

left_join(table2, by=None, suffix=None, empty=None, if_empty_table_1=None, if_empty_table_2=None) → Table¶

Shortcut to join(type=”left”)

Parameters:

table2 – 2nd table (Tuplist with dict)
by – list, dict or None. If the columns have the same name in both tables, a list of keys/column to use for the join. If the columns have different names in both tables, a dict in the format {name_table1: name_table2} If by is None, use all the shared keys.
suffix – if some columns have the same name in both tables but are not in “by”, a suffix will be added to their names. With suffix=[“_1”,”_2”], shared column “x” will become “x_1”, “x_2”
empty – values to give to empty cells created by the join.
if_empty_table_1 – (dict or list) if table 1 is empty, it will be replaced by this dict in the join.
if_empty_table_2 – (dict or list) if table 2 is empty, it will be replaced by this dict in the join.

Returns:

a Table

right_join(table2, by=None, suffix=None, empty=None, if_empty_table_1=None, if_empty_table_2=None) → Table¶

Shortcut to join(type=”right”)

Parameters:

table2 – 2nd table (Tuplist with dict)
by – list, dict or None. If the columns have the same name in both tables, a list of keys/column to use for the join. If the columns have different names in both tables, a dict in the format {name_table1: name_table2} If by is None, use all the shared keys.
suffix – if some columns have the same name in both tables but are not in “by”, a suffix will be added to their names. With suffix=[“_1”,”_2”], shared column “x” will become “x_1”, “x_2”
empty – values to give to empty cells created by the join.
if_empty_table_1 – (dict or list) if table 1 is empty, it will be replaced by this dict in the join.
if_empty_table_2 – (dict or list) if table 2 is empty, it will be replaced by this dict in the join.

Returns:

a Table

full_join(table2, by=None, suffix=None, empty=None, if_empty_table_1=None, if_empty_table_2=None) → Table¶

Shortcut to join(type=”full”)

Parameters:

table2 – 2nd table (Tuplist with dict)
by – list, dict or None. If the columns have the same name in both tables, a list of keys/column to use for the join. If the columns have different names in both tables, a dict in the format {name_table1: name_table2} If by is None, use all the shared keys.
suffix – if some columns have the same name in both tables but are not in “by”, a suffix will be added to their names. With suffix=[“_1”,”_2”], shared column “x” will become “x_1”, “x_2”
empty – values to give to empty cells created by the join.
if_empty_table_1 – (dict or list) if table 1 is empty, it will be replaced by this dict in the join.
if_empty_table_2 – (dict or list) if table 2 is empty, it will be replaced by this dict in the join.

Returns:

a Table

inner_join(table2, by=None, suffix=None, empty=None, if_empty_table_1=None, if_empty_table_2=None) → Table¶

Shortcut to join(type=”inner”)

Parameters:

table2 – 2nd table (Tuplist with dict)
by – list, dict or None. If the columns have the same name in both tables, a list of keys/column to use for the join. If the columns have different names in both tables, a dict in the format {name_table1: name_table2} If by is None, use all the shared keys.
suffix – if some columns have the same name in both tables but are not in “by”, a suffix will be added to their names. With suffix=[“_1”,”_2”], shared column “x” will become “x_1”, “x_2”
empty – values to give to empty cells created by the join.
if_empty_table_1 – (dict or list) if table 1 is empty, it will be replaced by this dict in the join.
if_empty_table_2 – (dict or list) if table 2 is empty, it will be replaced by this dict in the join.

Returns:

a Table

auto_join(by=None, suffix=None, empty=None) → Table¶

Join a table with itself. Useful to create combinations of values from columns of a table.

Parameters:

by – list, dict or None. If by is a list of keys/column, those columns will be used for the join. If by is a dict in the format {name_table1: name_table2}, those columns will be used for the join. If by is None, all combinations of rows will be created (join by dummy).
suffix – suffix to add to column to create different names. Default is (“”, “_2”). With default suffix, column id will appear as id and id_2.
empty – values to give to empty cells created by the join.

Returns:

a tuplist

select(*args) → Table¶

Select columns from a table

Parameters:: args – names of the columns to select
Returns:: Table with the selected columns.

drop(*args) → Table¶

Drop columns from a table

Parameters:: args – names of the columns to drop
Returns:: a table without the selected columns.

rename(**kwargs) → Table¶

Rename columns from a table

Parameters:: kwargs – names of the columns to rename and new names old_name=new_name
Returns:: a table without the selected columns.

filter(func) → Table¶

Filter a table.

Parameters:: func – function to use to filter
Returns:: the filtered table.

get_col_names(fast=False) → Table¶

Get the names of the column of the table.

Parameters:: fast – assume that the first row has all the columns.
Returns:: a list of keys

to_columns() → SuperDict¶

Create a dict with a list of values for each column of the table.

Returns:: a dict

static from_columns(dct) → Table¶

Create a table from a dict of list (columns)

Example: dic = dict(a=[1,2,3], b=[4,5,6]) result= Table.from_column(dic) result: [{‘a’: 1, ‘b’: 4}, {‘a’: 2, ‘b’: 5}, {‘a’: 3, ‘b’: 6}]

Parameters:: dct – a dict of list
Returns:: a Table

get_index(cond) → list¶

Get row number for rows which respect a condition.

Parameters:: cond – condition/filter to apply to the rows
Returns:: a list of row numbers

replace(replacement=None, to_replace=None, fast=False) → Table¶

Fill missing values of a tuplist.

Parameters:

replacement – a single value or a dict of columns and values to use as replacement.
to_replace – a single value or a dict of columns and values to replace.
fast – assume that the first row has all the columns.

Returns:

the table with missing values filled.

replace_empty(replacement=None, fast=False) → Table¶

Fill empty values of a tuplist.

Parameters:

replacement – a single value or a dict of columns and values to use as replacement.
fast – assume that the first row has all the columns.

Returns:

the table with empty values filled.

replace_nan(replacement=None) → Table¶

Fill nan values of a tuplist.

Parameters:: replacement – a single value or a dict of columns and values to use as replacement.
Returns:: the table with nan values filled.

pivot_longer(cols, names_to='variable', value_to='value') → Table¶

pivot_longer() “lengthens” data, increasing the number of rows and decreasing the number of columns. The inverse transformation is pivot_wider()

Parameters:

cols – a list of columns to pivot
names_to – the name of the new names column
value_to – the name of the new value column

Returns:

the table with the new columns

pivot_wider(names_from='variable', value_from='value', id_cols=None, values_fill=None) → Table¶

pivot_wider() “widens” data, increasing the number of columns and decreasing the number of rows. The inverse transformation is pivot_longer()

Parameters:

names_from – the name of the new name column
value_from – the name of the new value column
id_cols – set_a set of columns that uniquely identifies each observation. If None, use all columns except names_from and value_from.
values_fill – set_a value or dict of values to fill in the missing values.

Returns:

the table with the new columns

drop_empty(cols=None, fast=False) → Table¶

Drop rows with empty values of a tuplist.

Parameters:

cols – list of column names or single name.
fast – assume that the first row has all the columns.

Returns:

the table with empty values dropped.

lag_col(col, i=1, replace=False) → Table¶

Lag a column by i steps.

Parameters:

col – the name of the column to lag
i – the number of steps to lag
replace – replace the former value of the column. If not, create a new column lag_{col}_i

Returns:

the table with the new column

distinct(columns) → Table¶

Only keeps unique combinations values of the selected columns. When there are rows with duplicated values, the first one is kept.

Parameters:: columns – names of the columns.
Returns:: a Table (list of dict) with unique data.

order_by(columns, reverse=False) → Table¶

Reorder the table according to the given columns.

Parameters:

columns – names of the columns to use to sort the table.
reverse – if True, the sorted list is sorted in descending order.

Returns:

the sorted Table

drop_nested() → Table¶

Drop any nested column from a table. Nested value are dict or lists nested as dict values in the table. This function assume df structure is homogenous and only look at the first row to find nested values.

Returns:: the table without nested values.

check_type()¶: Check that the table is a list of dict.

to_set2(columns) → TupList¶

Create a list of unique value from some columns of the table

Parameters:: columns – Columns to select to create the set.
Returns:: a tuplist with unique values

to_param(keys, value, is_list=False) → SuperDict¶

Create a dict with the given columns as keys and values.

Parameters:

keys – columns to use as keys.
value – column to use as values.
is_list – True if the values are a list instead of a single value.

Returns:

a superdict indexed by the given keys.

is_unique(columns) → bool¶

Check if the combination of values of given columns is unique.

Parameters:: columns – combination of columns to check.
Returns:: True if the combination of values of the columns is unique.

add_row(**kwargs) → Table¶

Add a row to the table. Missing columns are filled with value None.

Parameters:: kwargs – values of the column in the format column_name=value
Returns:: the table with added row.

rbind(table: list) → Table¶

Bind two tables by rows.

Parameters:: table – another table
Returns:: the complete table.

col_apply(columns, func: Callable, **kwargs) → Table¶

Apply a function to a column or a list of columns.

Parameters:

columns – column or list of columns.
func – function to apply.

Returns:

the table

static format_dataset(dataset)¶

Format an entire data instance applying Table() to every table. Leave dict as they are.

Parameters:: dataset – a data instance in dict/json format.
Returns:: a dict of Tables

classmethod dataset_from_json(path, **kwargs)¶

Load a json file and format it applying Table() to every table.

Parameters:: path – path of the json file
Returns:: a dict of Tables

static from_pandas(df)¶

Create a table from a pandas dataframe.

Parameters:: df – a pandas dataframe
Returns:: a Table

to_pandas()¶

Create a pandas dataframe from a table.

Returns:: a pandas dataframe.

pk_save(file)¶

Save the table in a pickle file.

Parameters:: file – path of the file
Returns:: nothing

static pk_load(file)¶

Load a Table from a pickle file.

Parameters:: file – path of the file
Returns:: The table

to_json(path)¶

Export the table to a json file. If any column name is a number, transform it into string.

Parameters:: path – path of the json file
Returns:: nothing

static from_json(path, **kwargs)¶

Create a table from a json file.

Parameters:: path – path to json file
Returns:: a Table containing the data.

apply(func: Callable, *args, **kwargs)¶

Apply a function to the entire table. Useful to chain varius functions applied to the entire table.

Parameters:

func – a function which take the table as a first argument.
args – args of the function
kwargs – kwargs of the function

Returns:

what the function returns.

classmethod dataset_from_excel(path, sheets=None) → dict¶

Read an Excel file and return a dict of Table()

Parameters:

path – path fo the Excel file.
sheets – list of sheets to read (all the sheets are read if None)

Returns:

a dict of Table objects.

to_excel(path, sheet_name=None)¶

Write the table to an Excel file

Parameters:

path – path fo the Excel file.
sheet_name – Name of the Excel sheet.

Returns:

None

classmethod from_csv(path, sep=',', encoding=None) → Table¶

Load the table from a csv file.

Parameters:

path – path fo the Excel file.
sep – column separator in the csv file. (detected automatically if None).
encoding – encoding.

Returns:

a dict of Table objects.

to_csv(path, sep=',', encoding=None)¶

Write the table to a csv file.

Parameters:

path – path fo the Excel file.
sep – column separator in the csv file.
encoding – encoding.

Returns:

nothing.

add(*args) → None¶

this is just a shortcut for doing

>>> TupList().append((0, 1, 2))

by doing:

>>> TupList().add(0, 1, 2)

which is a little more friendly and short

Parameters:: args – any number of elements to append
Returns:: modified TupList

append(object, /)¶: Append object to the end of the list.

chain() → TupList¶: Flattens a TupList by applying itertools chain method

clear()¶: Remove all items from list.

copy()¶: Return a shallow copy of the list.

count(value, /)¶: Return number of occurrences of value.

extend(iterable, /)¶: Extend list by appending elements from the iterable.

index(value, start=0, stop=9223372036854775807, /)¶

Return first index of value.

Raises ValueError if the value is not present.

insert(index, object, /)¶: Insert object before index.

intersect(input_list: Iterable) → TupList¶

Converts list and argument into sets and then intersects them.

Parameters:: input_list (list) – list to intersect
Returns:: new TupList

len() → int¶

Shortcut to:

>>> len(TupList())

Returns:: length of list
Return type:: int

pop(index=-1, /)¶

Remove and return item at index (default last).

Raises IndexError if list is empty or index is out of range.

remove(value, /)¶

Remove first occurrence of value.

Raises ValueError if the value is not present.

reverse()¶: Reverse IN PLACE.

set_diff(input_list: Iterable) → TupList¶

Converts list and argument into sets and then subtracts one from the other.

Parameters:: input_list (list) – list to subtract
Returns:: new TupList

sort(*, key=None, reverse=False)¶

Sort the list in ascending order and return None.

The sort is in-place (i.e. the list itself is modified) and stable (i.e. the order of two equal elements is maintained).

If a key function is given, apply it once to each list item and sort them, ascending or descending, according to their function values.

The reverse flag can be set to sort in descending order.

take_np(indices: Iterable | int) → TupList¶

filters the tuple of each element of the list according to a list of positions

Parameters:: indices (int or list) – a list of positions
Returns:: a new TupList

to_dict(result_col: Iterable | int | None = 0, is_list: bool = True, indices: Iterable | int | None = None) → SuperDict¶

This magic function converts a tuple list into a dictionary: by taking one or several of the columns as the result.

Parameters:

result_col (int or list or None) – a list of keys for the result (positions of the tuple or keys of the dict)
is_list (bool) – the value of the dictionary will be a TupList?
indices (list) – optional way of determining the indices instead of being the complement of result_col

Returns:

new pytups.superdict.SuperDict

to_dictlist(keys: list) → TupList¶

Converts a list of tuples to a list of dictionaries.

Parameters:: keys (list) – a unique list of dictionary keys
Returns:: new TupList

to_list() → list¶

Returns:: list

to_set() → set¶

Returns:: set

to_start_finish(compare_tups: Callable, pp: int = 1, sort: bool = True, join_func: Callable | None = None) → TupList¶

Takes a calendar tuple list of the form: (id, month) and returns a tuple list of the form (id, start_month, end_month) it works with a bigger tuple too.

Parameters:

compare_tups (callable) – returns True if tups are not consecutive. Takes 3 arguments
pp (int) – the position in the tuple where the period is
join_func (callable) – returns joined tuple from list of consecutive tuples. Takes 1 argument.

Returns:

new TupList

vapply_col(pos: int | str | None, func: Callable)¶: Like vapply, but it stores the result in one of the positions of the tuple (or dictionary) :param pos: int or str :param callable func: function to apply to create col