DataFrame Changes
record_df_info
pywrangle.df_changes.record_df_info.record_df_info(df: DataFrame, name: Union[str, int] = None) → dict
Returns a dict with information about a DataFrame, including its name, cols, rows, and size.
- Parameters
df (DataFrame) – DataFrame to record information from.
name (Union[str, int], optional) – Name of the DataFrame for comparison. Defaults to None.
- Returns
Information about the DataFrame: its name, cols, rows, and size.
- Return type
dict
Notes
This function lets users change a DataFrame while keeping a record of its previous state.
For instance, after filtering a DataFrame, you can compare its state before and after the filter using the print_df_info function.
Example
>>> df = create_df.create_int_df_size(cols=10, rows=20)
>>> df_info = pw.record_df_info(df)
>>> print(df_info)
{'name': None, 'cols': 10, 'rows': 20, 'size': 200}
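The record-then-change workflow described in the Notes can be sketched with plain pandas. This is an illustration only: `snapshot` is a hypothetical stand-in that mirrors the dict shape shown in the example above, not pywrangle's own implementation, and the column names and filter are made up for the demonstration.

```python
import pandas as pd


def snapshot(df: pd.DataFrame, name=None) -> dict:
    """Hypothetical stand-in: return a dict shaped like the example output above."""
    rows, cols = df.shape  # pandas reports shape as (rows, cols)
    return {"name": name, "cols": cols, "rows": rows, "size": int(df.size)}


# Assumed sample data: 20 rows, 2 columns.
df = pd.DataFrame({"a": range(20), "b": range(20)})

before = snapshot(df, name="raw")       # record the state first
df = df[df["a"] >= 10]                  # the filter overwrites df
after = snapshot(df, name="filtered")   # record the new state

print(before)  # {'name': 'raw', 'cols': 2, 'rows': 20, 'size': 40}
print(after)   # {'name': 'filtered', 'cols': 2, 'rows': 10, 'size': 20}
```

Because the earlier state is stored as a small dict, the original DataFrame does not need to be kept in memory for the later comparison.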
print_df_info
pywrangle.df_changes.print_df_info.print_df_info(*args: List[Union[df, dict]], compare_dfs: bool = True, compare_base_df: int = 0, compare_end_df: int = -1, abs_comparison: bool = True, relative_comparison: bool = True) → None
Prints DataFrame information from args.
Each argument may be either a pd.DataFrame or a dict returned by the record_df_info function.
- Parameters
args (List[Union['df', dict]]) – DataFrames and dicts to print information about.
compare_dfs (bool, optional) – Show the difference between 2 DataFrames. May show absolute and relative differences. Defaults to True.
compare_base_df (int) – Index of base DataFrame for comparison. Defaults to 0.
compare_end_df (int) – Index of DataFrame to compare to base. Defaults to -1.
abs_comparison (bool) – Whether to show the absolute comparison between DataFrames. Defaults to True.
relative_comparison (bool) – Whether to show the relative comparison between DataFrames. Defaults to True.
Notes
DataFrames are assigned a name based on the index at which they are passed in args.
The relative (%) difference is the absolute difference expressed as a percentage of the base DataFrame's total.
Example
>>> df1, df2 = (create_df.create_int_df_size(cols=i * 10, rows=i * 20) for i in range(1, 3))
>>> pw.print_df_info(df2, df1)
Name     | Cols  | Rows  | Size
-------- | ----- | ----- | -----
0        | 20    | 40    | 800
1        | 10    | 20    | 200
Abs Diff | -10   | -20   | -600
% Diff   | -50.0 | -50.0 | -75.0
Compared indices -1 & 0
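The Abs Diff and % Diff rows in the output above can be reproduced by hand from two info dicts. The sketch below is an assumption about the arithmetic, inferred from the example numbers; `diff_info` is a hypothetical helper, not part of pywrangle.

```python
from typing import Dict, Tuple


def diff_info(base: Dict, end: Dict) -> Tuple[Dict, Dict]:
    """Hypothetical helper: absolute and relative differences of end vs. base."""
    keys = ("cols", "rows", "size")
    # Absolute difference: end value minus base value.
    abs_diff = {k: end[k] - base[k] for k in keys}
    # Relative difference: absolute difference as a percentage of the base value.
    pct_diff = {k: round(100 * abs_diff[k] / base[k], 1) for k in keys}
    return abs_diff, pct_diff


# Values taken from the example output above (index 0 is the base, index -1 the end).
base = {"name": 0, "cols": 20, "rows": 40, "size": 800}
end = {"name": 1, "cols": 10, "rows": 20, "size": 200}

abs_diff, pct_diff = diff_info(base, end)
print(abs_diff)  # {'cols': -10, 'rows': -20, 'size': -600}
print(pct_diff)  # {'cols': -50.0, 'rows': -50.0, 'size': -75.0}
```

Dividing by the base value rather than the end value is what makes the size change read as -75.0% (-600 / 800) instead of -300%.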