String Cleaning

clean_strcol

pywrangle.str_cleaning.clean_strcol.clean_strcol(df: DataFrame, colname: str, case: Union['l', 't', 'u'] = 'l', trim: bool = True) → DataFrame

Cleans a string column in a DataFrame by standardizing its case and, optionally, trimming whitespace.

Parameters
  • df (DataFrame) – DataFrame to clean.

  • colname (str) – Column to clean.

  • case (Union['l', 't', 'u']) – Case to standardize the column to: 'l' (lower), 't' (title), or 'u' (upper). The options are also available in the constants.py module. Defaults to 'l' (lowercase).

  • trim (bool, optional) – Whether to trim whitespace from the column's strings. Defaults to True.

Returns

Returns a DataFrame with the strings in the specified column cleaned.

Return type

DataFrame

Example

>>> df1 = pw.clean_strcol(df1, 'animals', case='t')
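The behavior described above can be approximated in plain pandas. The following is a minimal sketch, not pywrangle's actual implementation: the function name clean_strcol_sketch and the _CASE_METHODS mapping from 'l', 't', 'u' to pandas string methods are hypothetical, and the sketch assumes the column holds strings (under the .str accessor, non-string values in an object column become NaN).

```python
import pandas as pd

# Hypothetical mapping from the documented case codes to pandas .str methods.
_CASE_METHODS = {"l": "lower", "t": "title", "u": "upper"}


def clean_strcol_sketch(
    df: pd.DataFrame, colname: str, case: str = "l", trim: bool = True
) -> pd.DataFrame:
    """Return a copy of df with colname case-standardized and optionally trimmed.

    Hypothetical illustration only; not pywrangle's code.
    """
    if case not in _CASE_METHODS:
        raise ValueError(f"case must be one of {sorted(_CASE_METHODS)}, got {case!r}")
    out = df.copy()  # leave the caller's DataFrame untouched
    values = out[colname]
    if trim:
        values = values.str.strip()  # remove leading/trailing whitespace
    # Apply str.lower, str.title, or str.upper depending on the case code.
    out[colname] = getattr(values.str, _CASE_METHODS[case])()
    return out


df = pd.DataFrame({"animals": ["  Cat", "DOG  ", " sea LION "]})
print(clean_strcol_sketch(df, "animals", case="t")["animals"].tolist())
# prints ['Cat', 'Dog', 'Sea Lion']
```

The sketch works on a copy so the input DataFrame is left unchanged; whether pywrangle itself copies or modifies the DataFrame in place is not stated in this reference.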

clean_all_strcol

pywrangle.str_cleaning.clean_all_strcol.clean_all_strcols(df: DataFrame, columns: Optional[Union[list, tuple]] = None, col_cases: Optional[Union[list, tuple]] = None, trim: bool = True, clean_case: Union['l', 't', 'u'] = 'l') → DataFrame

Cleans all string columns, or a specified subset of columns, in a DataFrame.

Parameters
  • df (DataFrame) – DataFrame to clean.

  • columns (Union[list, tuple, None], optional) – Names of the columns to clean. If not specified, will attempt to clean all string columns.

  • col_cases (Union[list, tuple, None], optional) – Cases to apply to the corresponding entries in columns. If not specified, every column is cleaned with clean_case.

  • trim (bool, optional) – Whether to trim whitespace from the string data in the columns. Defaults to True.

  • clean_case (Union['l', 't', 'u']) – Default case used when col_cases is not specified. Defaults to 'l' (lowercase).

Returns

Returns a DataFrame with the string columns cleaned.

Return type

DataFrame

Notes

  • If columns is not specified, the function cleans all string columns in the DataFrame.

  • You may optionally pass columns and col_cases to specify which columns to clean and how.

  • The available case arguments 'l', 't', and 'u' represent lower, title, and upper case, respectively.
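The notes above describe how columns, col_cases, and clean_case interact. A minimal sketch of that logic in plain pandas follows; it is not pywrangle's implementation. The function name clean_all_strcols_sketch is hypothetical, detecting string columns with pandas.api.types.infer_dtype is an assumption about how "string column" is decided, and raising ValueError when columns and col_cases differ in length is likewise an assumption.

```python
from typing import Optional, Sequence

import pandas as pd
from pandas.api.types import infer_dtype

# Hypothetical mapping from the documented case codes to pandas .str methods.
_CASE_METHODS = {"l": "lower", "t": "title", "u": "upper"}


def clean_all_strcols_sketch(
    df: pd.DataFrame,
    columns: Optional[Sequence[str]] = None,
    col_cases: Optional[Sequence[str]] = None,
    trim: bool = True,
    clean_case: str = "l",
) -> pd.DataFrame:
    """Return a copy of df with the selected string columns cleaned (hypothetical)."""
    out = df.copy()
    if columns is None:
        # Assumption: a column is a string column when all of its
        # non-null values are inferred as str.
        columns = [c for c in out.columns if infer_dtype(out[c], skipna=True) == "string"]
    columns = list(columns)
    # Without col_cases, every selected column uses the default clean_case.
    cases = [clean_case] * len(columns) if col_cases is None else list(col_cases)
    if len(cases) != len(columns):
        # Assumption: mismatched lengths are treated as an error.
        raise ValueError("columns and col_cases must have the same length")
    for col, case in zip(columns, cases):
        if case not in _CASE_METHODS:
            raise ValueError(f"unsupported case {case!r} for column {col!r}")
        values = out[col]
        if trim:
            values = values.str.strip()  # remove leading/trailing whitespace
        out[col] = getattr(values.str, _CASE_METHODS[case])()
    return out


df = pd.DataFrame({"A": [1, 2], "B": [" Cat ", "DOG"], "C": [0.5, 1.5], "D": ["sea lion", " Owl"]})
print(clean_all_strcols_sketch(df)["B"].tolist())  # prints ['cat', 'dog']
```

Pairing columns with col_cases positionally (via zip) mirrors the description of col_cases as the cases "to use with the columns"; a dict from column name to case would be an equally reasonable design.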

Example

>>> df = create_df.create_mixed_df_size(10, 10)
>>> df = pw.clean_all_strcols(df)

Record   |   Column   |   Is Str Col   |   Clean Method
------   |   ------   |   ----------   |   ------------
    1    |   A        |        False   |   None
    2    |   B        |         True   |   lower
    3    |   C        |        False   |   None
    4    |   D        |         True   |   lower
    5    |   E        |        False   |   None