BarDatafier

Bases: BaseDatafier

Contains data preparation utilities, including interpolation and rank generation. The data should be in the following format, with time set as the index.

    Example:
    >>> time  col1 col2 col3 ...
    >>> 2012   1    0    2
    >>> 2013   2    3    1
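
A minimal pandas sketch of building a frame in this shape, assuming the raw table arrives with time as an ordinary column. The column names and values are the placeholder ones from the example above.

```python
import pandas as pd

# Raw table with time as an ordinary column (illustrative values).
raw = pd.DataFrame(
    {
        "time": ["2012", "2013"],
        "col1": [1, 2],
        "col2": [0, 3],
        "col3": [2, 1],
    }
)

# Move time into the index so each row is one timestamp
# and each remaining column is one bar.
data = raw.set_index("time")
print(data)
```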

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | pd.DataFrame | The data to be prepared, with time set as the index | required |
| time_format | str | Datetime format of the index | required |
| ip_freq | str | Interpolation frequency | required |
| ip_frac | float | Rank interpolation fraction (see the ip_frac explanation below) | 0.1 |
| n_bars | int | Maximum number of bars visible on the plot | 10 |
| ip_method | str | Interpolation method | 'linear' |
| ip_fill_method | str | Fill method for the NaN values that ip_frac leaves uninterpolated | 'bfill' |
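
How time_format, ip_freq and ip_method might act on the data can be sketched with plain pandas. This is an assumption about the behavior, not the class's confirmed implementation: the index is parsed with `pd.to_datetime`, upsampled with `asfreq`, and the new rows are interpolated. The "6MS" frequency and the values are illustrative.

```python
import pandas as pd

data = pd.DataFrame(
    {"col1": [1, 2], "col2": [0, 3]},
    index=["2012", "2013"],
)

# time_format: parse the string index into datetimes.
data.index = pd.to_datetime(data.index, format="%Y")

# ip_freq: upsample to a finer grid; the new timestamps hold NaN.
upsampled = data.asfreq("6MS")

# ip_method: fill the new rows by interpolation.
interpolated = upsampled.interpolate(method="linear")
print(interpolated)
```

A finer ip_freq produces more intermediate rows, which is what gives an animation its in-between frames.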

ip_frac is the fraction of NaN values in the column ranks that is interpolated by ip_method; the remaining NaN values are filled by ip_fill_method.

    Consider this example
    >>>               a    b
    >>> date
    >>> 2021-11-13  1.0  4.0
    >>> 2021-11-14  NaN  NaN
    >>> 2021-11-15  NaN  NaN
    >>> 2021-11-16  NaN  NaN
    >>> 2021-11-17  NaN  NaN
    >>> 2021-11-18  2.0  6.0
With ip_frac set to 0.5, 50% of the NaNs are interpolated by ip_method and the rest are filled by ip_fill_method. The example uses bfill; with ffill, the filled values come first and the interpolation follows.
    >>>              a      b
    >>> 2021-11-13  1.00  4.00  << original value --------
    >>> 2021-11-14  1.33  4.67                            |
    >>> 2021-11-15  1.67  5.33                            |  50% interpolated
    >>> 2021-11-16  2.00  6.00  <- linear interpolation   |  by ip_method
    >>> 2021-11-17  2.00  6.00      up to here            |  rest are filled
    >>> 2021-11-18  2.00  6.00  << original value---------   by ip_fill_method
This stabilizes the bar chart race and reduces constant jitter of the bars.
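
The example above can be reproduced with a short pandas sketch. `partial_interpolate` and its `gap_len` argument are hypothetical names introduced here, and the sketch assumes every NaN gap has the same known length and that the fill method is bfill; it is not the class's actual code.

```python
import pandas as pd


def partial_interpolate(df, gap_len, ip_frac=0.5, ip_method="linear"):
    """Hypothetical helper: interpolate about ip_frac of each NaN gap
    and back-fill the rest."""
    n_interp = int(gap_len * ip_frac)  # NaNs left for interpolation
    n_fill = gap_len - n_interp        # NaNs nearest the next known value
    if n_fill > 0:
        # bfill(limit=k) fills only the k NaNs closest to the next valid value.
        df = df.bfill(limit=n_fill)
    return df.interpolate(method=ip_method)


nan = float("nan")
df = pd.DataFrame(
    {
        "a": [1.0, nan, nan, nan, nan, 2.0],
        "b": [4.0, nan, nan, nan, nan, 6.0],
    },
    index=pd.date_range("2021-11-13", "2021-11-18", freq="D", name="date"),
)

result = partial_interpolate(df, gap_len=4, ip_frac=0.5)
print(result.round(2))
```

Back-filling first shortens the stretch that interpolation has to cover, so the values reach their target early and then hold still until the next original timestamp.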

get_data_ranks(ip_frac=0.1)

Creates column ranks and interpolates them.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| ip_frac | float | Fraction of NaNs to interpolate by 'self.method'; the rest are backfilled | 0.1 |

Returns:

| Type | Description |
| --- | --- |
| pd.DataFrame | Interpolated column ranks |
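
As a rough illustration of what "column ranks" means, the sketch below ranks each row with `DataFrame.rank`. Descending order and the "first" tie-breaking rule are assumptions; the source does not say how the class computes ranks or whether it shifts them from pandas' one-based values.

```python
import pandas as pd

data = pd.DataFrame(
    {"a": [1, 3], "b": [4, 2], "c": [2, 5]},
    index=pd.to_datetime(["2012", "2013"], format="%Y"),
)

# Rank across columns within each row: 1.0 marks the largest value.
ranks = data.rank(axis=1, method="first", ascending=False)
print(ranks)
```

Ranks change in whole steps between timestamps, which is why interpolating them gives bars a smooth path when they swap positions.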

get_top_cols()

Selects the columns for which column_rank < n_bars at any timestamp.

Returns:

| Type | Description |
| --- | --- |
| list[str] | List of columns that will appear in the animation at least once |
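
The selection rule can be sketched as a boolean mask over a rank table. The ranks below are hand-written and zero-based (0 is the largest value), an assumption chosen so that `rank < n_bars` keeps exactly n_bars columns per timestamp; the class's own rank convention is not stated in the source.

```python
import pandas as pd

# Hypothetical zero-based ranks: rows are timestamps, columns are bars.
ranks = pd.DataFrame(
    {
        "a": [0, 1, 2],
        "b": [1, 0, 0],
        "c": [2, 2, 1],
        "d": [3, 3, 3],
    },
    index=[2012, 2013, 2014],
)
n_bars = 2

# Keep a column if its rank is below n_bars in at least one row.
visible_at_least_once = (ranks < n_bars).any(axis=0)
top_cols = ranks.columns[visible_at_least_once].tolist()
print(top_cols)
```

Column "c" is kept because it enters the top two in 2014, while "d" never does and is dropped.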