Handy Python Pandas for Data Normalization and Scaling
Data Cleaning & Data Preparation Series: from sklearn.preprocessing, scaler = MinMaxScaler(), scaler = StandardScaler(), scaler.fit_transform(df)
You can download the Jupyter notebook and data of this tutorial here
Table of Contents
1. Introduction
2. Data normalization using MinMaxScaler
3. Scaling data using StandardScaler
1. Introduction
Data normalization and scaling are important steps in data preprocessing before modeling. These techniques help to bring all the features to a similar scale, which is essential for certain machine learning algorithms to work effectively.
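Normalization (min-max scaling) is commonly used when features must lie in a fixed, bounded range, for example as inputs to artificial neural networks. It is also common with distance-based models such as K-nearest neighbors, where a feature with a large numeric range would otherwise dominate the distance calculation. Standardization (scaling to zero mean and unit variance) is commonly used for models that are sensitive to feature magnitudes, such as support vector machines and regularized linear models like ridge and lasso regression.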
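To make the K-nearest neighbors point concrete, here is a small illustrative sketch (not part of the original notebook). It measures how much each column contributes to the squared Euclidean distance between the first and last rows of the sample DataFrame used later in this post, before and after min-max normalization. The helper `squared_distance_share` is a hypothetical name introduced only for this example.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Same sample data used in the examples later in this post
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]})

def squared_distance_share(X, i, j):
    """Hypothetical helper: fraction of the squared Euclidean distance
    between rows i and j that each column contributes."""
    diff = X[i] - X[j]
    return diff ** 2 / np.sum(diff ** 2)

# Raw data: column B has a 10x larger range, so it dominates the distance
raw_share = squared_distance_share(df.to_numpy(dtype=float), 0, 4)

# After min-max normalization both columns span [0, 1] and contribute equally
scaled_share = squared_distance_share(MinMaxScaler().fit_transform(df), 0, 4)

print("Raw share (A, B):", raw_share.round(3))
print("Normalized share (A, B):", scaled_share.round(3))
```

In the raw data, column B accounts for roughly 99% of the squared distance simply because its values are larger; after normalization the two columns carry equal weight.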
In this post, we will discuss how to normalize and scale data stored in pandas DataFrames in Python, using the scalers from scikit-learn's preprocessing module.
2. Data normalization using MinMaxScaler
Normalization is the process of transforming the data to a common scale. With min-max normalization, each feature is rescaled to the range 0 to 1 using x' = (x - min) / (max - min), where min and max are computed separately for each column. Because all features then share the same range, differences in units or magnitude no longer distort comparisons between them.
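The min-max formula can also be written directly with pandas, which helps show what the scaler does under the hood. This is a minimal sketch; `min_max_normalize` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

def min_max_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rescale every column to [0, 1] with (x - min) / (max - min)."""
    col_min = df.min()             # per-column minimum
    col_range = df.max() - col_min  # per-column (max - min)
    # A constant column has a range of 0, so 0 / 0 produces NaN here
    return (df - col_min) / col_range

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]})
print(min_max_normalize(df))
```

One design difference worth noting: this plain pandas version returns NaN for a constant column, whereas scikit-learn's MinMaxScaler guards against a zero range and maps such a column to the lower bound of its output range (0 by default).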
Scikit-learn's sklearn.preprocessing module provides the MinMaxScaler class, which accepts a pandas DataFrame directly. Here is an example of how to use MinMaxScaler to normalize the data.
from sklearn.preprocessing import MinMaxScaler
import pandas as pd
# optionally, read your own data from a csv file instead
# df = pd.read_csv('data.csv')
# Create a sample dataframe
df = pd.DataFrame({
'A': [1, 2, 3, 4, 5],
'B': [10, 20, 30, 40, 50]
})
# create a MinMaxScaler object
scaler = MinMaxScaler()
# fit and transform the data
normalized_data = scaler.fit_transform(df)
# fit_transform returns a NumPy array, so wrap it in a DataFrame to restore the column names and index
normalized_df = pd.DataFrame(normalized_data, columns=df.columns, index=df.index)
print("Raw Data")
print(df)
print("\nNormalized Data")
print(normalized_df)
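In this example, we create a small sample DataFrame (the commented-out line shows how to load your own data from a CSV file with pandas instead). We then create a MinMaxScaler object and call fit_transform, which learns each column's minimum and maximum and rescales the values to the range 0 to 1 in a single step. Because fit_transform returns a NumPy array, we finally wrap the result in a new DataFrame with the original column names. Both columns become [0.0, 0.25, 0.5, 0.75, 1.0].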
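In a modeling workflow, the scaler is usually fitted on the training data only and then reused on new data, so that statistics from the test set do not leak into preprocessing. The sketch below assumes a hypothetical train/test split (the `train` and `test` frames are made up for illustration) and uses MinMaxScaler's fitted attributes `data_min_` and `data_max_` together with its `inverse_transform` method.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical train/test split, for illustration only
train = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]})
test = pd.DataFrame({'A': [0, 6], 'B': [25, 60]})

scaler = MinMaxScaler()
scaler.fit(train)  # learn per-column min and max from the training data only

print("Learned minimums:", scaler.data_min_)
print("Learned maximums:", scaler.data_max_)

# Reuse the training statistics on new data;
# values outside the training range land outside [0, 1]
test_scaled = pd.DataFrame(scaler.transform(test), columns=test.columns, index=test.index)
print(test_scaled)

# inverse_transform maps scaled values back to the original units
restored = pd.DataFrame(scaler.inverse_transform(test_scaled.to_numpy()),
                        columns=test.columns, index=test.index)
print(restored)
```

Note that the test values 0 and 6 in column A, which lie outside the training range of 1 to 5, are mapped to values below 0 and above 1. This is expected behavior when a scaler is applied to unseen data.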
3. Scaling data using StandardScaler
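Scaling, here in the form of standardization, transforms each feature to have a mean of 0 and a standard deviation of 1 using z = (x - mean) / std, computed separately for each column. It shifts and stretches the values without changing the shape of their distribution (a skewed feature stays skewed). Standardization is useful when features have very different ranges or units of measurement. The StandardScaler class from scikit-learn's sklearn.preprocessing module implements it and accepts a pandas DataFrame directly. Here is an example of how to use StandardScaler to scale the data.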
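The standardization formula can be reproduced in plain pandas as a sanity check. One detail matters: StandardScaler divides by the population standard deviation (ddof=0), while pandas' `DataFrame.std()` defaults to the sample standard deviation (ddof=1). The sketch below passes `ddof=0` explicitly so the two results agree.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]})

# z = (x - mean) / std per column, using the population std (ddof=0)
# to match StandardScaler; pandas' default ddof=1 would give different values
manual = (df - df.mean()) / df.std(ddof=0)

sklearn_scaled = pd.DataFrame(StandardScaler().fit_transform(df),
                              columns=df.columns, index=df.index)

print(manual)
print("Matches StandardScaler:", ((manual - sklearn_scaled).abs() < 1e-9).all().all())
```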
from sklearn.preprocessing import StandardScaler
import pandas as pd
# optionally, read your own data from a csv file instead
# df = pd.read_csv('data.csv')
# Create a sample dataframe
df = pd.DataFrame({
'A': [1, 2, 3, 4, 5],
'B': [10, 20, 30, 40, 50]
})
# create a StandardScaler object
scaler = StandardScaler()
# fit and transform the data
scaled_data = scaler.fit_transform(df)
# fit_transform returns a NumPy array, so wrap it in a DataFrame to restore the column names and index
scaled_df = pd.DataFrame(scaled_data, columns=df.columns, index=df.index)
print("Raw Data")
print(df)
print("\nScaled Data")
print(scaled_df)
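In this example, we create a small sample DataFrame (the commented-out line shows how to load your own data from a CSV file with pandas instead). We then create a StandardScaler object and call fit_transform, which learns each column's mean and standard deviation and standardizes the values in a single step. Because fit_transform returns a NumPy array, we finally wrap the result in a new DataFrame with the original column names. Both columns become approximately [-1.414, -0.707, 0.0, 0.707, 1.414], since StandardScaler divides by the population standard deviation.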
Conclusion
In this post, we discussed how to normalize and scale pandas DataFrames in Python. Normalization and scaling are important preprocessing steps before modeling: they bring all features to a similar scale, which scale-sensitive machine learning algorithms need in order to work effectively. The MinMaxScaler and StandardScaler classes from the sklearn.preprocessing module make it easy to normalize and standardize DataFrame columns with just a few lines of code.
Many thanks for reading this post!🙏.
If you found this content helpful😊, please LIKE 👍, SHARE, and FOLLOW to stay updated on our future posts.