These one-liners provide quick and effective ways to assess the quality and consistency of the data within a Pandas DataFrame.
| Code Snippet | Explanation |
|---|---|
| `df.isnull().sum()` | Counts the number of missing values per column. |
| `df.duplicated().sum()` | Counts the number of duplicate rows in the DataFrame. |
| `df.describe()` | Provides basic descriptive statistics for the numerical columns. |
| `df.info()` | Displays a concise summary of the DataFrame, including data types and the presence of null values. |
| `df.nunique()` | Counts the number of unique values per column. |
| `df.apply(lambda x: x.nunique() / x.count() * 100)` | Computes the percentage of unique values for each column. |
| `df.isin([value]).sum()` | Counts the number of occurrences of a specific value across all columns. |
| `df.applymap(lambda x: isinstance(x, type_to_check)).sum()` | Counts the number of values of a specific type (e.g., int, str) per column. |
| `df.dtypes` | Lists the data type of each column in the DataFrame. |
| `df.sample(n)` | Returns a random sample of n rows from the DataFrame. |
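As a quick illustration, here is a minimal, self-contained sketch that runs a few of these checks on a tiny made-up DataFrame (the column names and values are invented for the example):

```python
import pandas as pd

# Tiny made-up DataFrame with one missing value and one duplicate row
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Bob", None],
    "age": [30, 25, 25, 40],
})

print(df.isnull().sum())       # missing values per column
print(df.duplicated().sum())   # number of fully duplicated rows
print(df.nunique())            # unique values per column
print(df.apply(lambda x: x.nunique() / x.count() * 100))  # % unique per column
print(df.isin(["Bob"]).sum())  # occurrences of the value "Bob" per column
print(df.dtypes)               # data type of each column
print(df.sample(2))            # random sample of 2 rows
```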
This article focuses on less-explored Pandas methods for data transformation and analysis that can strengthen the data manipulation skills of data scientists working in Python.
Reset a pandas DataFrame index
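A minimal sketch of the usual pattern, assuming a DataFrame `df` whose index is out of order after filtering or sorting; `drop=True` discards the old index instead of keeping it as a column:

```python
import pandas as pd

# Index left out of order (e.g. after filtering or sorting)
df = pd.DataFrame({"score": [10, 20, 30]}, index=[5, 9, 2])

# Rebuild a clean 0..n-1 RangeIndex; drop=True discards the old index
df_reset = df.reset_index(drop=True)
print(df_reset)
```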
This article demonstrates how to use Pandas' built-in plotting capabilities for common data visualization tasks, suggesting they can cover routine EDA without calling Matplotlib directly.
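As a rough illustration of that approach, here is a plot drawn directly through the DataFrame's `.plot` accessor (Matplotlib still renders it under the hood, but is never imported explicitly; the column names and output file name are invented):

```python
import pandas as pd

df = pd.DataFrame({
    "month": range(1, 13),
    "sales": [3, 4, 6, 5, 8, 9, 12, 11, 10, 9, 7, 6],
})

# Line plot straight from the DataFrame; pandas delegates the drawing to Matplotlib
ax = df.plot(x="month", y="sales", kind="line", title="Monthly sales")
ax.figure.savefig("sales.png")
```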
An exploration of the benefits of switching from the popular Python library Pandas to the newer Polars for data manipulation tasks, highlighting improvements in performance, concurrency, and ease of use.
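To make the comparison concrete, here is a small side-by-side sketch of the same group-by aggregation in both libraries (this assumes `polars` is installed; the column names and values are invented):

```python
import pandas as pd
import polars as pl

data = {"city": ["NY", "NY", "LA"], "amount": [10, 20, 30]}

# Pandas: eager evaluation, single-threaded by default
pd_result = pd.DataFrame(data).groupby("city", as_index=False)["amount"].sum()

# Polars: lazy query, optimized and executed in parallel when collect() is called
pl_result = (
    pl.LazyFrame(data)
    .group_by("city")
    .agg(pl.col("amount").sum())
    .collect()
)

print(pd_result)
print(pl_result)
```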
There’s a reason you’re confused