Data Analysis and Data Processing in a Single Line

Sep 19, 2024

When tackling a Machine Learning problem, the first essential steps are data analysis and data processing. In this article, I will guide you through handling these two tasks efficiently and effectively.

The standard steps for handling any CSV file are:

Check Data Types of Features: Ensure that each feature has the correct data type.
Check for Missing, Outlier, or Duplicate Values: Identify and handle any missing values, outliers, or duplicate entries.
Perform In-Depth Data Analysis: Analyze summary statistics, the distribution of each feature, and so on.
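For comparison, here is a minimal sketch of doing these checks by hand with plain pandas. The function name quick_checks and the small sample DataFrame are hypothetical, chosen only for illustration. The outlier check uses the common 1.5 × IQR rule, which is one choice among several and is an assumption here, not something the article prescribes.

```python
import pandas as pd


def quick_checks(df: pd.DataFrame) -> dict:
    """Hypothetical helper: run the basic checks by hand with plain pandas."""
    # 1. Data type of each feature
    dtypes = df.dtypes.astype(str).to_dict()

    # 2a. Number of missing values per column
    missing = {col: int(n) for col, n in df.isna().sum().items()}

    # 2b. Number of fully duplicated rows (first occurrence not counted)
    duplicates = int(df.duplicated().sum())

    # 2c. Outliers in numeric columns, using the 1.5 * IQR rule (an assumption)
    outliers = {}
    for col in df.select_dtypes(include="number").columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask = (df[col] < q1 - 1.5 * iqr) | (df[col] > q3 + 1.5 * iqr)
        outliers[col] = int(mask.sum())

    # 3. Summary statistics (count, mean, std, quartiles) for numeric columns
    summary = df.describe()

    return {
        "dtypes": dtypes,
        "missing": missing,
        "duplicates": duplicates,
        "outliers": outliers,
        "summary": summary,
    }


# Hypothetical sample data: one missing city, one duplicated row, one extreme age
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 29, 29, 300],
    "city": ["Hanoi", "Hue", None, "Hanoi", "Da Nang", "Da Nang", "Hue"],
})
report = quick_checks(df)
print(report["missing"], report["duplicates"], report["outliers"])
print(report["summary"])
```

Even this small version needs a fair amount of code, and it still leaves out plots, correlations, and per-feature distributions.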

Together, these steps take a significant amount of time to code and analyze by hand. YData created the ydata_profiling library to handle them far more efficiently.

How to use

You can install ydata_profiling by running the following command:

pip install ydata-profiling

Demo

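import pandas as pd
from ydata_profiling import ProfileReport

# Load the dataset
df = pd.read_csv('data.csv')
# Build the profile; explorative=True enables additional, more detailed analyses
profile = ProfileReport(df, title="Profile csv", explorative=True)
# Save the full report as a standalone HTML file
profile.to_file("output.html")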

With just two lines of code, one to build the ProfileReport and one to save it, you get a complete HTML report with a detailed statistical analysis of your data.


Written by Gia Huy (CisMine)