Data analysis and data processing in a single line

Gia Huy (CisMine)
Sep 19, 2024

When tackling a machine learning problem, the first essential steps are data analysis and data processing. In this article, I will show you how to handle both tasks quickly and effectively.

The standard steps for exploring any new CSV dataset are:

  1. Check Data Types of Features: Ensure that each feature has the correct data type.
  2. Check for Missing, Outlier, or Duplicate Values: Identify and handle any missing values, outliers, or duplicate entries.
  3. Perform In-Depth Data Analysis: Analyze summary statistics, the distribution of each feature, correlations between features, and so on.
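For comparison, here is a minimal sketch of what these three steps can look like when written by hand with pandas. The helper name `basic_checks` is hypothetical, and flagging outliers with the 1.5 * IQR rule is my own illustrative choice (one common heuristic among several), not something ydata-profiling is claimed to use.

```python
import pandas as pd


def basic_checks(df: pd.DataFrame) -> dict:
    """Hypothetical helper: run the standard checks on a DataFrame by hand."""
    numeric = df.select_dtypes(include="number")

    # 1. Data type of each feature
    dtypes = df.dtypes.astype(str).to_dict()

    # 2a. Number of missing values per column
    missing = df.isna().sum().to_dict()

    # 2b. Number of rows that exactly duplicate an earlier row
    duplicates = int(df.duplicated().sum())

    # 2c. Outliers per numeric column, using the 1.5 * IQR rule (an assumed heuristic)
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    outlier_mask = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)
    outliers = outlier_mask.sum().to_dict()

    # 3. Summary statistics (count, mean, std, min, quartiles, max) for numeric columns
    stats = numeric.describe()

    return {
        "dtypes": dtypes,
        "missing": missing,
        "duplicates": duplicates,
        "outliers": outliers,
        "stats": stats,
    }
```

On a small DataFrame with an `age` column of `[25, 30, 30, None, 200]` and a `city` column of `["A", "B", "B", "C", "A"]`, this reports one missing age, one duplicate row, and one age outlier (200). Even this simplified version takes a fair amount of code, and it still leaves out visualizations and correlations.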

Each of these steps needs its own code, so working through them by hand for every new dataset takes a significant amount of time. To speed this up, YData created the ydata-profiling library (formerly pandas-profiling), which runs these checks automatically and summarizes the results in a single report.

How to use

You can install ydata-profiling with pip:

pip install ydata-profiling

Demo

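import pandas as pd
from ydata_profiling import ProfileReport

# Load the dataset (replace 'data.csv' with the path to your own file)
df = pd.read_csv('data.csv')

# Build the profile; explorative=True turns on the library's more detailed analysis preset
profile = ProfileReport(df, title="Profile csv", explorative=True)

# Write the complete report to a standalone HTML file
profile.to_file("output.html")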

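With just two lines of code (creating the ProfileReport and calling to_file), you get a complete HTML report of your data: an overview with alerts for issues such as missing values and duplicate rows, statistics and distributions for each feature, correlations between features, and a sample of the rows.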


Gia Huy ( CisMine)

My name is Huy Gia. I am currently pursuing a B.Sc. degree. My interests include deep learning (DL) for computer vision and parallel programming with CUDA.