Datafier

Datafier is deprecated; use plot-specific datafiers instead.

Contains data preparation modules, including interpolation, rank generation, and color generation. The data should be in the following format, with time set as the index:

    Example:
    >>> time  col1 col2 col3 ...
    >>> 2012   1    0    2
    >>> 2013   2    3    1
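
As a minimal sketch of the expected layout, the snippet below builds such a frame with pandas. The column names and values mirror the example above; parsing the index with `pd.to_datetime` and a `"%Y"` format is an assumption about how `time_format` would be applied, not something this page confirms.

```python
import pandas as pd

# Values copied from the example above; column names are placeholders.
data = pd.DataFrame(
    {"time": ["2012", "2013"], "col1": [1, 2], "col2": [0, 3], "col3": [2, 1]}
).set_index("time")

# Assumption: the index is parsed with the format passed as time_format.
data.index = pd.to_datetime(data.index, format="%Y")
print(data)
```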

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | pd.DataFrame | The data to be prepared, in the format shown above with time set as the index | required |
| time_format | str | Index datetime format | required |
| ip_freq | str | Interpolation frequency | required |
| ip_frac | float | Rank interpolation fraction (see end of docstring), by default 0.5 | 0.5 |
| n_bars | int | Number of bars to be visible on the plot, by default 10 or less | 10 |
| palettes | list[str] | List of color palettes to generate bar colors, by default ["viridis"] | ['viridis'] |

    ip_frac is the fraction of NaN values to be linearly
    interpolated for column ranks.

    Consider this example:
    >>>               a    b
    >>> date
    >>> 2021-11-13  1.0  4.0
    >>> 2021-11-14  NaN  NaN
    >>> 2021-11-15  NaN  NaN
    >>> 2021-11-16  NaN  NaN
    >>> 2021-11-17  NaN  NaN
    >>> 2021-11-18  2.0  6.0

    With ip_frac set to 0.5, 50% of the NaNs are linearly
    interpolated while the rest are back filled.

    >>>              a      b
    >>> 2021-11-13  1.00  4.00  << original value --------
    >>> 2021-11-14  1.33  4.67                            |
    >>> 2021-11-15  1.67  5.33                            |  50% linearly
    >>> 2021-11-16  2.00  6.00  <- linear interpolation   |  interpolated
    >>> 2021-11-17  2.00  6.00      up to here            |  rest are filled.
    >>> 2021-11-18  2.00  6.00  << original value---------

    This adds some stability to the barChartRace
    and reduces the constant shaking of bars.
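
The numbers in the example above can be reproduced with plain pandas. This is only a sketch of the described behaviour, not the library's implementation: it assumes a single gap of known length (`gap` is hard-coded), back fills the NaNs nearest the next known value first, and then linearly interpolates the remainder.

```python
import pandas as pd

idx = pd.date_range("2021-11-13", "2021-11-18", freq="D", name="date")
nan = float("nan")
df = pd.DataFrame(
    {"a": [1.0, nan, nan, nan, nan, 2.0], "b": [4.0, nan, nan, nan, nan, 6.0]},
    index=idx,
)

ip_frac = 0.5
gap = 4  # consecutive NaNs in this example (hard-coded for the sketch)
n_backfill = gap - int(gap * ip_frac)  # NaNs left for back filling

# Back fill the NaNs closest to the next known value, then
# linearly interpolate the ones that remain.
out = df.bfill(limit=n_backfill).interpolate(method="linear")
print(out.round(2))
```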

add_var(row_var=None, col_var=None)

Adds additional variables to the data, both row and column wise.

Row wise data format: The index should be equal to that of the actual data.

    time  leap_year col2   ...
    2012    yes      0
    2013    no       3

Column wise data format: The index should be equal to the columns of the actual data.

    index  continent   col2 ...
    ind    Asia         0
    usa    N America    3
    jap    Asia         2

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| row_var | pd.DataFrame | Dataframe containing variables related to time, by default None | None |
| col_var | pd.DataFrame | Dataframe containing variables related to columns, by default None | None |
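
The two index requirements can be illustrated with a small sketch. The frames below reuse the values from the format examples above; `data` itself is a made-up stand-in for the actual data, and nothing here calls the library.

```python
import pandas as pd

# Hypothetical "actual data": time as index, one column per bar.
data = pd.DataFrame(
    {"ind": [1, 2], "usa": [0, 3], "jap": [2, 1]},
    index=pd.Index([2012, 2013], name="time"),
)

# Row-wise variables: one row per timestamp, same index as `data`.
row_var = pd.DataFrame({"leap_year": ["yes", "no"]}, index=data.index)

# Column-wise variables: one row per column of `data`.
col_var = pd.DataFrame(
    {"continent": ["Asia", "N America", "Asia"]},
    index=["ind", "usa", "jap"],
)

# The alignment that the two formats above require.
assert row_var.index.equals(data.index)
assert list(col_var.index) == list(data.columns)
```

Frames shaped like these would then be passed as the `row_var` and `col_var` arguments.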

interpolate_even(data, freq, method='linear')

Interpolates the given dataframe according to the frequency

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | pd.DataFrame | Dataframe containing the data | required |
| freq | str | Interpolation frequency | required |
| method | str | Interpolation method, by default "linear" | 'linear' |

Returns:

| Type | Description |
| --- | --- |
| pd.DataFrame | Interpolated dataframe |
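
One way to get this effect with pandas alone is to reindex onto an evenly spaced grid and fill the new rows. The helper name `interpolate_even_sketch` is hypothetical, and the sketch assumes a `DatetimeIndex` whose original timestamps fall on the target grid; the library's own implementation may differ.

```python
import pandas as pd

data = pd.DataFrame(
    {"a": [1.0, 5.0], "b": [10.0, 2.0]},
    index=pd.to_datetime(["2021-01-01", "2021-01-05"]),
)


def interpolate_even_sketch(data, freq, method="linear"):
    # Reindex onto an evenly spaced grid, then fill the new (NaN) rows.
    return data.asfreq(freq).interpolate(method=method)


even = interpolate_even_sketch(data, "D")
print(even)
```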

get_prepared_data(data, ip_frac=0.5)

Creates interpolated data and column ranks

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | pd.DataFrame | Dataframe containing the data | required |
| ip_frac | float | Interpolation fraction, by default 0.5 | 0.5 |

Returns:

| Type | Description |
| --- | --- |
| tuple[pd.DataFrame, pd.DataFrame] | Tuple containing, in order, the interpolated data values and the interpolated column ranks |
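
For readers unfamiliar with column ranks, the sketch below derives them per timestamp with `DataFrame.rank`. The ordering (rank 1 for the largest value) and the default tie handling are assumptions; this page does not state which convention the method uses.

```python
import pandas as pd

values = pd.DataFrame(
    {"col1": [1, 2], "col2": [0, 3], "col3": [2, 1]},
    index=pd.Index([2012, 2013], name="time"),
)

# Assumption: rank 1 is the largest value within each row (timestamp).
ranks = values.rank(axis=1, ascending=False)
print(ranks)
```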

get_top_cols()

Selects columns where column_rank < n_bars at any timestamp

Returns:

| Type | Description |
| --- | --- |
| list[int] | List of columns that will appear in the animation at least once |
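
The selection rule can be sketched as a boolean reduction over a rank table. The ranks below are hypothetical and zero-based (0 is the top bar), and the sketch returns column labels rather than integers, which is an assumption made for readability.

```python
import pandas as pd

# Hypothetical zero-based ranks for four columns over three timestamps.
ranks = pd.DataFrame(
    {"a": [0, 1, 2], "b": [1, 0, 0], "c": [2, 2, 1], "d": [3, 3, 3]},
    index=[2011, 2012, 2013],
)
n_bars = 2

# Keep a column if its rank is below n_bars at any timestamp.
top_cols = ranks.columns[(ranks < n_bars).any(axis=0)].tolist()
print(top_cols)
```

Here `d` never ranks below `n_bars`, so it is the only column dropped.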

get_bar_colors()

Generates bar (column) colors based on the given color palettes

Returns:

| Type | Description |
| --- | --- |
| dict[str, str] | dict containing column to color mapping |
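
A column-to-color mapping of this shape could be built by sampling a matplotlib colormap. This is a sketch under assumptions: the column names are placeholders, the palette is sampled at evenly spaced points, and colors are returned as hex strings; the method's actual sampling scheme and color format are not documented here.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

columns = ["ind", "usa", "jap"]  # placeholder column names
cmap = plt.get_cmap("viridis")

# Sample the palette at evenly spaced points, one per column.
n = len(columns)
bar_colors = {
    col: to_hex(cmap(i / max(n - 1, 1))) for i, col in enumerate(columns)
}
print(bar_colors)
```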