Table

The Table class is used to represent and work on a table or dataframe as a list of dict.

For example the following table

a

b

1

False

2

True

3

False

will be represented as

[{"a": 1, "b": False}, {"a": 2, "b": True}, {"a": 3, "b": False}]

This class is particularly useful for working with data in json format.

Table class

class mango.table.Table(data=None, check=False)
vapply(func: Callable, *args, **kwargs) Table

maps function into each element of TupList

Parameters:

func (callable) – function to apply

Returns:

new TupList

kapply(func, *args, **kwargs) Table

maps function into each key of TupList

Parameters:

func (callable) – function to apply

Returns:

new TupList

kvapply(func, *args, **kwargs) Table

maps function into each element of TupList with indexes

Parameters:

func (callable) – function to apply

Returns:

new TupList

copy_shallow() Table

Copies the list only. Not it’s contents

Returns:

new TupList

copy_deep() Table

Copies the complete object using python’s pickle

vfilter(function: Callable) Table

returns new list with only tuples for which function returns True

Parameters:

function (callable) – function to apply to each element

Returns:

new TupList

unique() Table

Applies numpy.unique().

Parameters:

dtype – arguments to numpy.asarray()

Returns:

new TupList

unique2() Table

Converts to set and then back to TupList.

Returns:

new TupList

sorted(**kwargs) Table

Applies sorted function to elements and returns a TupList

Parameters:

kwargs – arguments for sorted function main arguments for sorted are: - key - reverse

Returns:

new TupList

take(*args, use_numpy=False) TupList

Extract values from a columns of a table.

Example: result=Table([{“a”:1, “b”:2}, {“a”:3, “b”:4}]).take(“a”, “b”) result: [(1,2), (3,4)]

Parameters:
  • args – name of the columns to extract

  • use_numpy – use numpy methods in take

Returns:

a list of tuples.

show_rows(n1, n2=None)

Show the n1 th row of the table or rows between n1 and n2.

Parameters:
  • n1 – row number to start

  • n2 – row number to end (if None, only show row n1)

Returns:

a string

head(n=10)

Return the first n rows of a table.

Parameters:

n – Number of rows to show (default:10).

Returns:

set_a table with n rows or less.

peek(n=3, name=None)

Show the first, middle and last n rows of the table.

Parameters:
  • n – number of rows to show in each part.

  • name – name or message to print with the table

Returns:

the printed string

mutate(**kwargs) Table

Add or modify a column in a table.

Example: mutate(table, a=3, b=[4,5,6], c=lambda v: v[“a”]+v[“b”], d = mean)

Note: all changes are applied over the input table and do not take into account the other changes.

Parameters:

kwargs – named arguments with the changes to apply.

The values can be:
  • a single value which will be applied to each row

  • a list with all the values of the column

  • a function to apply to the row.

Returns:

a table

group_by(col) SuperDict

Group the rows of a table by the value fo some columns

Parameters:

col – single name of list of columns to use to group the rows

:return a SuperDict

group_mutate(group_by, **func) Table

Group by the given columns and apply the given functions to the others.

Parameters:
  • group_by – name of the columns to group.

  • func – function to apply to the named column. ex: a = first, b = mean

Returns:

a table (Tuplist of dict).

sum_all(group_by=None) Table

Group by the given columns and sum the others.

Parameters:

group_by – name of the columns to group.

Returns:

a table (Tuplist of dict)

summarise(group_by=None, default: Callable | None = None, **func) Table

Group by the given columns and apply the given functions to the others.

Parameters:
  • group_by – name of the columns to group.

  • default – default function to apply to non-grouped columns.

  • func – function to apply to the named column. ex: a = first, b = mean

Returns:

a table (Tuplist of dict).

join(table2, by=None, suffix=None, jtype='full', empty=None, if_empty_table_1=None, if_empty_table_2=None) Table

Join to tables. Inspired by R dplyr join functions.

Parameters:
  • table2 – 2nd table (Tuplist with dict)

  • by – list, dict or None. If the columns have the same name in both tables, a list of keys/column to use for the join. If the columns have different names in both tables, a dict in the format {name_table1: name_table2} If by is None, use all the shared keys.

  • suffix – if some columns have the same name in both tables but are not in “by”, a suffix will be added to their names. With suffix=[“_1”,”_2”], shared column “x” will become “x_1”, “x_2”

  • jtype – type of join: “full”

  • empty – values to give to empty cells created by the join.

  • if_empty_table_1 – (dict or list) if table 1 is empty, it will be replaced by this dict in the join.

  • if_empty_table_2 – (dict or list) if table 2 is empty, it will be replaced by this dict in the join.

Returns:

a Table

left_join(table2, by=None, suffix=None, empty=None, if_empty_table_1=None, if_empty_table_2=None) Table

Shortcut to join(type=”left”)

Parameters:
  • table2 – 2nd table (Tuplist with dict)

  • by – list, dict or None. If the columns have the same name in both tables, a list of keys/column to use for the join. If the columns have different names in both tables, a dict in the format {name_table1: name_table2} If by is None, use all the shared keys.

  • suffix – if some columns have the same name in both tables but are not in “by”, a suffix will be added to their names. With suffix=[“_1”,”_2”], shared column “x” will become “x_1”, “x_2”

  • empty – values to give to empty cells created by the join.

  • if_empty_table_1 – (dict or list) if table 1 is empty, it will be replaced by this dict in the join.

  • if_empty_table_2 – (dict or list) if table 2 is empty, it will be replaced by this dict in the join.

Returns:

a Table

right_join(table2, by=None, suffix=None, empty=None, if_empty_table_1=None, if_empty_table_2=None) Table

Shortcut to join(type=”right”)

Parameters:
  • table2 – 2nd table (Tuplist with dict)

  • by – list, dict or None. If the columns have the same name in both tables, a list of keys/column to use for the join. If the columns have different names in both tables, a dict in the format {name_table1: name_table2} If by is None, use all the shared keys.

  • suffix – if some columns have the same name in both tables but are not in “by”, a suffix will be added to their names. With suffix=[“_1”,”_2”], shared column “x” will become “x_1”, “x_2”

  • empty – values to give to empty cells created by the join.

  • if_empty_table_1 – (dict or list) if table 1 is empty, it will be replaced by this dict in the join.

  • if_empty_table_2 – (dict or list) if table 2 is empty, it will be replaced by this dict in the join.

Returns:

a Table

full_join(table2, by=None, suffix=None, empty=None, if_empty_table_1=None, if_empty_table_2=None) Table

Shortcut to join(type=”full”)

Parameters:
  • table2 – 2nd table (Tuplist with dict)

  • by – list, dict or None. If the columns have the same name in both tables, a list of keys/column to use for the join. If the columns have different names in both tables, a dict in the format {name_table1: name_table2} If by is None, use all the shared keys.

  • suffix – if some columns have the same name in both tables but are not in “by”, a suffix will be added to their names. With suffix=[“_1”,”_2”], shared column “x” will become “x_1”, “x_2”

  • empty – values to give to empty cells created by the join.

  • if_empty_table_1 – (dict or list) if table 1 is empty, it will be replaced by this dict in the join.

  • if_empty_table_2 – (dict or list) if table 2 is empty, it will be replaced by this dict in the join.

Returns:

a Table

inner_join(table2, by=None, suffix=None, empty=None, if_empty_table_1=None, if_empty_table_2=None) Table

Shortcut to join(type=”inner”)

Parameters:
  • table2 – 2nd table (Tuplist with dict)

  • by – list, dict or None. If the columns have the same name in both tables, a list of keys/column to use for the join. If the columns have different names in both tables, a dict in the format {name_table1: name_table2} If by is None, use all the shared keys.

  • suffix – if some columns have the same name in both tables but are not in “by”, a suffix will be added to their names. With suffix=[“_1”,”_2”], shared column “x” will become “x_1”, “x_2”

  • empty – values to give to empty cells created by the join.

  • if_empty_table_1 – (dict or list) if table 1 is empty, it will be replaced by this dict in the join.

  • if_empty_table_2 – (dict or list) if table 2 is empty, it will be replaced by this dict in the join.

Returns:

a Table

auto_join(by=None, suffix=None, empty=None) Table

Join a table with itself. Useful to create combinations of values from columns of a table.

Parameters:
  • by – list, dict or None. If by is a list of keys/column, those columns will be used for the join. If by is a dict in the format {name_table1: name_table2}, those columns will be used for the join. If by is None, all combinations of rows will be created (join by dummy).

  • suffix – suffix to add to column to create different names. Default is (“”, “_2”). With default suffix, column id will appear as id and id_2.

  • empty – values to give to empty cells created by the join.

Returns:

a tuplist

select(*args) Table

Select columns from a table

Parameters:

args – names of the columns to select

Returns:

Table with the selected columns.

drop(*args) Table

Drop columns from a table

Parameters:

args – names of the columns to drop

Returns:

a table without the selected columns.

rename(**kwargs) Table

Rename columns from a table

Parameters:

kwargs – names of the columns to rename and new names old_name=new_name

Returns:

a table without the selected columns.

filter(func) Table

Filter a table.

Parameters:

func – function to use to filter

Returns:

the filtered table.

get_col_names(fast=False) Table

Get the names of the column of the table.

Parameters:

fast – assume that the first row has all the columns.

Returns:

a list of keys

to_columns() SuperDict

Create a dict with a list of values for each column of the table.

Returns:

a dict

static from_columns(dct) Table

Create a table from a dict of list (columns)

Example: dic = dict(a=[1,2,3], b=[4,5,6]) result= Table.from_column(dic) result: [{‘a’: 1, ‘b’: 4}, {‘a’: 2, ‘b’: 5}, {‘a’: 3, ‘b’: 6}]

Parameters:

dct – a dict of list

Returns:

a Table

get_index(cond) list

Get row number for rows which respect a condition.

Parameters:

cond – condition/filter to apply to the rows

Returns:

a list of row numbers

replace(replacement=None, to_replace=None, fast=False) Table

Fill missing values of a tuplist.

Parameters:
  • replacement – a single value or a dict of columns and values to use as replacement.

  • to_replace – a single value or a dict of columns and values to replace.

  • fast – assume that the first row has all the columns.

Returns:

the table with missing values filled.

replace_empty(replacement=None, fast=False) Table

Fill empty values of a tuplist.

Parameters:
  • replacement – a single value or a dict of columns and values to use as replacement.

  • fast – assume that the first row has all the columns.

Returns:

the table with empty values filled.

replace_nan(replacement=None) Table

Fill nan values of a tuplist.

Parameters:

replacement – a single value or a dict of columns and values to use as replacement.

Returns:

the table with nan values filled.

pivot_longer(cols, names_to='variable', value_to='value') Table

pivot_longer() “lengthens” data, increasing the number of rows and decreasing the number of columns. The inverse transformation is pivot_wider()

Parameters:
  • cols – a list of columns to pivot

  • names_to – the name of the new names column

  • value_to – the name of the new value column

Returns:

the table with the new columns

pivot_wider(names_from='variable', value_from='value', id_cols=None, values_fill=None) Table

pivot_wider() “widens” data, increasing the number of columns and decreasing the number of rows. The inverse transformation is pivot_longer()

Parameters:
  • names_from – the name of the new name column

  • value_from – the name of the new value column

  • id_cols – set_a set of columns that uniquely identifies each observation. If None, use all columns except names_from and value_from.

  • values_fill – set_a value or dict of values to fill in the missing values.

Returns:

the table with the new columns

drop_empty(cols=None, fast=False) Table

Drop rows with empty values of a tuplist.

Parameters:
  • cols – list of column names or single name.

  • fast – assume that the first row has all the columns.

Returns:

the table with empty values dropped.

lag_col(col, i=1, replace=False) Table

Lag a column by i steps.

Parameters:
  • col – the name of the column to lag

  • i – the number of steps to lag

  • replace – replace the former value of the column. If not, create a new column lag_{col}_i

Returns:

the table with the new column

distinct(columns) Table

Only keeps unique combinations values of the selected columns. When there are rows with duplicated values, the first one is kept.

Parameters:

columns – names of the columns.

Returns:

a Table (list of dict) with unique data.

order_by(columns, reverse=False) Table

Reorder the table according to the given columns.

Parameters:
  • columns – names of the columns to use to sort the table.

  • reverse – if True, the sorted list is sorted in descending order.

Returns:

the sorted Table

drop_nested() Table

Drop any nested column from a table. Nested value are dict or lists nested as dict values in the table. This function assume df structure is homogenous and only look at the first row to find nested values.

Returns:

the table without nested values.

check_type()

Check that the table is a list of dict.

to_set2(columns) TupList

Create a list of unique value from some columns of the table

Parameters:

columns – Columns to select to create the set.

Returns:

a tuplist with unique values

to_param(keys, value, is_list=False) SuperDict

Create a dict with the given columns as keys and values.

Parameters:
  • keys – columns to use as keys.

  • value – column to use as values.

  • is_list – True if the values are a list instead of a single value.

Returns:

a superdict indexed by the given keys.

is_unique(columns) bool

Check if the combination of values of given columns is unique.

Parameters:

columns – combination of columns to check.

Returns:

True if the combination of values of the columns is unique.

add_row(**kwargs) Table

Add a row to the table. Missing columns are filled with value None.

Parameters:

kwargs – values of the column in the format column_name=value

Returns:

the table with added row.

rbind(table: list) Table

Bind two tables by rows.

Parameters:

table – another table

Returns:

the complete table.

col_apply(columns, func: Callable, **kwargs) Table

Apply a function to a column or a list of columns.

Parameters:
  • columns – column or list of columns.

  • func – function to apply.

Returns:

the table

static format_dataset(dataset)

Format an entire data instance applying Table() to every table. Leave dict as they are.

Parameters:

dataset – a data instance in dict/json format.

Returns:

a dict of Tables

classmethod dataset_from_json(path, **kwargs)

Load a json file and format it applying Table() to every table.

Parameters:

path – path of the json file

Returns:

a dict of Tables

static from_pandas(df)

Create a table from a pandas dataframe.

Parameters:

df – a pandas dataframe

Returns:

a Table

to_pandas()

Create a pandas dataframe from a table.

Returns:

a pandas dataframe.

pk_save(file)

Save the table in a pickle file.

Parameters:

file – path of the file

Returns:

nothing

static pk_load(file)

Load a Table from a pickle file.

Parameters:

file – path of the file

Returns:

The table

to_json(path)

Export the table to a json file. If any column name is a number, transform it into string.

Parameters:

path – path of the json file

Returns:

nothing

static from_json(path, **kwargs)

Create a table from a json file.

Parameters:

path – path to json file

Returns:

a Table containing the data.

apply(func: Callable, *args, **kwargs)

Apply a function to the entire table. Useful to chain varius functions applied to the entire table.

Parameters:
  • func – a function which take the table as a first argument.

  • args – args of the function

  • kwargs – kwargs of the function

Returns:

what the function returns.

classmethod dataset_from_excel(path, sheets=None) dict

Read an Excel file and return a dict of Table()

Parameters:
  • path – path fo the Excel file.

  • sheets – list of sheets to read (all the sheets are read if None)

Returns:

a dict of Table objects.

to_excel(path, sheet_name=None)

Write the table to an Excel file

Parameters:
  • path – path fo the Excel file.

  • sheet_name – Name of the Excel sheet.

Returns:

None

classmethod from_csv(path, sep=',', encoding=None) Table

Load the table from a csv file.

Parameters:
  • path – path fo the Excel file.

  • sep – column separator in the csv file. (detected automatically if None).

  • encoding – encoding.

Returns:

a dict of Table objects.

to_csv(path, sep=',', encoding=None)

Write the table to a csv file.

Parameters:
  • path – path fo the Excel file.

  • sep – column separator in the csv file.

  • encoding – encoding.

Returns:

nothing.

add(*args) None

this is just a shortcut for doing

>>> TupList().append((0, 1, 2))

by doing:

>>> TupList().add(0, 1, 2)

which is a little more friendly and short

Parameters:

args – any number of elements to append

Returns:

modified TupList

append(object, /)

Append object to the end of the list.

chain() TupList

Flattens a TupList by applying itertools chain method

clear()

Remove all items from list.

copy()

Return a shallow copy of the list.

count(value, /)

Return number of occurrences of value.

extend(iterable, /)

Extend list by appending elements from the iterable.

index(value, start=0, stop=9223372036854775807, /)

Return first index of value.

Raises ValueError if the value is not present.

insert(index, object, /)

Insert object before index.

intersect(input_list: Iterable) TupList

Converts list and argument into sets and then intersects them.

Parameters:

input_list (list) – list to intersect

Returns:

new TupList

len() int

Shortcut to:

>>> len(TupList())
Returns:

length of list

Return type:

int

pop(index=-1, /)

Remove and return item at index (default last).

Raises IndexError if list is empty or index is out of range.

remove(value, /)

Remove first occurrence of value.

Raises ValueError if the value is not present.

reverse()

Reverse IN PLACE.

set_diff(input_list: Iterable) TupList

Converts list and argument into sets and then subtracts one from the other.

Parameters:

input_list (list) – list to subtract

Returns:

new TupList

sort(*, key=None, reverse=False)

Sort the list in ascending order and return None.

The sort is in-place (i.e. the list itself is modified) and stable (i.e. the order of two equal elements is maintained).

If a key function is given, apply it once to each list item and sort them, ascending or descending, according to their function values.

The reverse flag can be set to sort in descending order.

take_np(indices: Iterable | int) TupList

filters the tuple of each element of the list according to a list of positions

Parameters:

indices (int or list) – a list of positions

Returns:

a new TupList

to_dict(result_col: Iterable | int | None = 0, is_list: bool = True, indices: Iterable | int | None = None) SuperDict
This magic function converts a tuple list into a dictionary

by taking one or several of the columns as the result.

Parameters:
  • result_col (int or list or None) – a list of keys for the result (positions of the tuple or keys of the dict)

  • is_list (bool) – the value of the dictionary will be a TupList?

  • indices (list) – optional way of determining the indices instead of being the complement of result_col

Returns:

new pytups.superdict.SuperDict

to_dictlist(keys: list) TupList

Converts a list of tuples to a list of dictionaries.

Parameters:

keys (list) – a unique list of dictionary keys

Returns:

new TupList

to_list() list
Returns:

list

to_set() set
Returns:

set

to_start_finish(compare_tups: Callable, pp: int = 1, sort: bool = True, join_func: Callable | None = None) TupList

Takes a calendar tuple list of the form: (id, month) and returns a tuple list of the form (id, start_month, end_month) it works with a bigger tuple too.

Parameters:
  • compare_tups (callable) – returns True if tups are not consecutive. Takes 3 arguments

  • pp (int) – the position in the tuple where the period is

  • join_func (callable) – returns joined tuple from list of consecutive tuples. Takes 1 argument.

Returns:

new TupList

vapply_col(pos: int | str | None, func: Callable)

Like vapply, but it stores the result in one of the positions of the tuple (or dictionary) :param pos: int or str :param callable func: function to apply to create col