SemanticScuttle - klotz.me » klotz: data quality+data cleaning

These one-liners provide quick and effective ways to assess the quality and consistency of the data within a Pandas DataFrame.

Code Snippet	Explanation
`df.isnull().sum()`	Counts the number of missing values per column.
`df.duplicated().sum()`	Counts the number of duplicate rows in the DataFrame.
`df.describe()`	Provides basic descriptive statistics of numerical columns.
`df.info()`	Displays a concise summary of the DataFrame including data types and presence of null values.
`df.nunique()`	Counts the number of unique values per column.
`df.apply(lambda x: x.nunique() / x.count() * 100)`	Computes, for each column, the number of unique values as a percentage of its non-null values.
`df.isin([value]).sum()`	Counts the occurrences of a specific value in each column (`isin` expects a list-like, so the value is wrapped in a list).
`df.applymap(lambda x: isinstance(x, type_to_check)).sum()`	Counts the number of values of a specific type (e.g., int, str) per column. In pandas 2.1 and later, `applymap` is deprecated in favor of `DataFrame.map`.
`df.dtypes`	Lists the data type for each column in the DataFrame.
`df.sample(n)`	Returns a random sample of n rows from the DataFrame.
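
As a rough illustration of how several of these one-liners behave together, here is a minimal sketch on a small made-up DataFrame. The column names (`city`, `temp`) and the values are hypothetical and chosen only so that each check has something to find; the type check uses a per-column `map` rather than `applymap` so that it does not depend on the pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing value per column and one duplicated row.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Oslo", None, "Lima"],
        "temp": [3.5, 18.0, 3.5, 7.2, np.nan],
    }
)

# Missing values per column.
missing_per_column = df.isnull().sum()

# Rows that exactly repeat an earlier row.
duplicate_rows = int(df.duplicated().sum())

# Distinct non-null values per column, as a count and as a percentage
# of the non-null entries.
unique_per_column = df.nunique()
unique_pct = df.apply(lambda x: x.nunique() / x.count() * 100)

# Occurrences of one specific value in each column.
oslo_counts = df.isin(["Oslo"]).sum()

# Number of str values per column (version-independent stand-in for applymap).
str_counts = df.apply(
    lambda col: col.map(lambda v: isinstance(v, str)).astype(bool)
).sum()

print(missing_per_column, duplicate_rows, unique_pct, sep="\n")
```

Each result except `duplicate_rows` is a Series indexed by column name, so the checks can be combined into a single summary with `pd.concat(..., axis=1)` if a one-table report is more convenient.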
