To implement linear regression using `sklearn` (scikit-learn) in Python, you can follow these basic steps:
- Import Libraries: You’ll need `sklearn.linear_model` for the model, `numpy` for data manipulation, and `matplotlib` or `seaborn` for visualization. The example below also uses `sklearn.model_selection` to split the data and `sklearn.metrics` to evaluate it.
- Prepare Data: You can use sample data or load your dataset using `pandas`. Make sure your data has features (`X`) and target values (`y`).
- Train the Model: Fit the linear regression model to your data with `LinearRegression().fit(X, y)`.
- Make Predictions: Use the trained model's `predict()` method on new data.
- Evaluate: Measure model performance with metrics such as Mean Squared Error (MSE) and R-squared (R²).
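`LinearRegression` fits an ordinary least-squares model. As a rough sanity check (not part of the original walkthrough), the sketch below fits the same ten-point dataset used in the example that follows, compares the slope and intercept with NumPy's `np.linalg.lstsq`, and then predicts on two new inputs. The new values `11` and `12` are arbitrary choices for illustration. Passing new data as a DataFrame with the same column name `'X'` used during fitting avoids scikit-learn's feature-name mismatch warning.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same toy data as the example in this section
df = pd.DataFrame({
    'X': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'y': [1, 2, 1.8, 3.6, 4.2, 5.1, 6.2, 7.3, 8.1, 9.0],
})
X = df[['X']]  # 2D: (n_samples, n_features)
y = df['y']    # 1D: (n_samples,)

# Fit with scikit-learn (ordinary least squares)
model = LinearRegression().fit(X, y)

# Fit the same line with NumPy: a column of ones models the intercept
A = np.column_stack([np.ones(len(df)), df['X'].to_numpy()])
coef, *_ = np.linalg.lstsq(A, y.to_numpy(), rcond=None)
intercept_np, slope_np = coef

print("sklearn slope/intercept:", model.coef_[0], model.intercept_)
print("numpy   slope/intercept:", slope_np, intercept_np)

# Predict on new (arbitrary) inputs, keeping the column name used during fit
X_new = pd.DataFrame({'X': [11, 12]})
y_new = model.predict(X_new)
print("Predictions for X=11, 12:", y_new)
```

Both approaches solve the same least-squares problem, so the two sets of numbers should agree up to floating-point rounding.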
Here’s an example of how to implement linear regression using `sklearn`:
Example: Linear Regression with sklearn
```python
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Sample dataset (for illustration)
# Let's assume we have some data with 'X' as independent variable and 'y' as dependent variable
# You can replace this with your own dataset
data = {
    'X': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'y': [1, 2, 1.8, 3.6, 4.2, 5.1, 6.2, 7.3, 8.1, 9.0],
}

# Convert data to pandas DataFrame
df = pd.DataFrame(data)

# Features (X) and target (y)
X = df[['X']]  # Features must be 2D: (n_samples, n_features)
y = df['y']    # Target is 1D: (n_samples,)

# Split the data into training and testing sets
# (with only 10 rows, test_size=0.2 leaves 2 points for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the Linear Regression model
model = LinearRegression()

# Train the model using the training data
model.fit(X_train, y_train)

# Make predictions using the test data
y_pred = model.predict(X_test)

# Print model coefficients (slope and intercept)
# coef_ is an array with one entry per feature; here there is a single feature
print("Slope (m):", model.coef_[0])
print("Intercept (b):", model.intercept_)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
print(f"R-squared: {r2}")

# Visualizing the results
plt.scatter(X, y, color='blue', label='Actual data')
plt.plot(X, model.predict(X), color='red', label='Regression line')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()
```
Explanation:
- Data Preparation: In this example, we create a simple dataset with `X` and `y` values. You can replace it with a real dataset loaded from a file or another source.
- Train/Test Split: We split the data into training and testing sets so the model is evaluated on data it did not see during training. With only 10 rows and `test_size=0.2`, the test set contains just 2 points, so the resulting metrics are very noisy.
- Model Fitting: The `fit()` method of `LinearRegression` estimates the slope and intercept from the training data.
- Prediction: We use `model.predict()` to make predictions on the test data.
- Evaluation: We calculate the Mean Squared Error (MSE) and R-squared (R²) to measure how closely the predictions match the actual test values.
- Visualization: A scatter plot of the actual data is shown along with the fitted regression line.
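To make the evaluation step concrete, the sketch below computes both metrics from their definitions, MSE = mean((y − ŷ)²) and R² = 1 − SS_res / SS_tot, and checks them against `mean_squared_error` and `r2_score`. The arrays `y_true` and `y_hat` are made-up values for illustration only, not outputs of the example above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical true values and predictions (illustration only)
y_true = np.array([3.0, 5.0, 7.5, 9.0])
y_hat = np.array([2.8, 5.3, 7.0, 9.4])

residuals = y_true - y_hat
mse_manual = np.mean(residuals ** 2)             # mean squared error
ss_res = np.sum(residuals ** 2)                  # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares around the mean
r2_manual = 1 - ss_res / ss_tot

print("MSE:", mse_manual, "vs sklearn:", mean_squared_error(y_true, y_hat))
print("R2: ", r2_manual, "vs sklearn:", r2_score(y_true, y_hat))
```

Because R² compares the model against simply predicting the mean of the test targets, it can become negative when the model does worse than that baseline, which is easy to hit on a test set as small as the one in the example above.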
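This is a minimal example. You can extend it to datasets with multiple features, evaluate it with cross-validation instead of a single small train/test split, and refine the model further (for example, by adding feature scaling or trying regularized variants such as `Ridge` or `Lasso`).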
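As a minimal sketch of the first two extensions, the code below generates synthetic data with two features and scores `LinearRegression` with 5-fold cross-validation using `cross_val_score`. The true coefficients (`2.0` and `-1.0`), intercept (`0.5`), sample size, and noise level are arbitrary assumptions chosen for illustration; with real data you would substitute your own feature matrix and target.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data: 100 samples, 2 features (assumed coefficients, illustration only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))
y = 0.5 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0, 1.0, size=100)

model = LinearRegression()

# 5-fold cross-validation; shuffle so folds do not depend on row order
cv = KFold(n_splits=5, shuffle=True, random_state=42)
r2_scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
# scikit-learn maximizes scores, so MSE is reported negated; flip the sign back
mse_scores = -cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")

print("R2 per fold:", r2_scores.round(3), "mean:", r2_scores.mean().round(3))
print("MSE per fold:", mse_scores.round(3), "mean:", mse_scores.mean().round(3))

# Fit on all data to inspect one coefficient per feature
model.fit(X, y)
print("Coefficients:", model.coef_, "Intercept:", model.intercept_)
```

Averaging over several folds gives a more stable estimate than a single split, and with multiple features `model.coef_` holds one slope per column of `X`.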