
Project Simple Linear Regression
πΌ Salary Prediction Using Linear Regression
This project uses a simple machine learning model to predict salaries based on years of experience. Itβs ideal for beginners looking to understand supervised learning and regression techniques.
Project Link
π View the Salary Prediction Project on Google Colab
π Dataset Overview
File Name: Salary_dataset.csv
Columns:
YearsExperience
: Number of years someone has worked.Salary
: The salary associated with that experience level (in dollars).
This dataset is clean and minimal, making it great for learning linear regression.
π§ Project Type
- Category: Supervised Machine Learning
- Algorithm: Linear Regression
- Problem Type: Regression (predicting a continuous value)
Weβre using the known input feature YearsExperience
to predict a continuous output variable Salary
.
π§ Step-by-Step Procedure
1. π οΈ Importing Required Libraries
We import essential Python libraries like:
pandas
for data handlingnumpy
for numerical operationsmatplotlib
andseaborn
for data visualizationsklearn
for machine learning models and evaluation
2. π₯ Loading the Dataset
We load the Salary_dataset.csv
file into a pandas DataFrame using:
df = pd.read_csv('Salary_dataset.csv')
3. π§Ό Data Inspection and Cleaning
- Use
df.info()
anddf.describe()
to explore data types and basic statistics. - Check for missing values using
df.isnull().sum()
. - Drop unnecessary columns like
'Unnamed: 0'
if present.
4. π Exploratory Data Analysis (EDA)
- Use scatter plots to visualize the relationship between
YearsExperience
andSalary
. - Use box plots to identify outliers in both features.
5. π€ Feature Selection
Split the dataset into:
- X: Independent variable (
YearsExperience
) - Y: Dependent variable (
Salary
)
6. π§ͺ Splitting the Dataset
Split the data into training and testing sets using:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=101)
Jai hanuman
4 Reactions
2 Bookmarks