Thursday, January 30, 2025

How do you perform data analysis using R?

Performing Data Analysis Using R

R is a language built for data analysis, statistical computing, and data visualization. Here’s a structured approach to performing data analysis using R, from importing data to exporting results.


1️⃣ Importing Data

The first step is loading the dataset. R supports multiple file formats.

(a) Read CSV File

data <- read.csv("data.csv", header = TRUE, stringsAsFactors = FALSE)

🔹 header = TRUE → Uses the first row as column names
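🔹 stringsAsFactors = FALSE → Keeps text columns as character strings instead of converting them to factors (this is already the default since R 4.0.0, so it only matters on older versions)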
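To try the call without a real file, the sketch below writes a small data frame to a temporary CSV and reads it back. The column names and values are made up for illustration.

```r
# Illustrative data only; these columns are assumptions, not a real dataset
sample_df <- data.frame(
  Name = c("Ana", "Ben", "Cy"),
  Age = c(34, 28, 45),
  stringsAsFactors = FALSE
)

# Write to a temporary file so the example does not touch the working directory
csv_path <- tempfile(fileext = ".csv")
write.csv(sample_df, csv_path, row.names = FALSE)

# Read it back with the same arguments used above
data <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)
str(data)
```

With stringsAsFactors = FALSE, the Name column comes back as character rather than factor.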

(b) Read Excel File (readxl package)

install.packages("readxl")
library(readxl)
data <- read_excel("data.xlsx", sheet = 1)

(c) Read Data from a Database (DBI package)

install.packages(c("DBI", "RSQLite")) # RSQLite provides the SQLite driver used below
library(DBI)
conn <- dbConnect(RSQLite::SQLite(), "database.db")
data <- dbGetQuery(conn, "SELECT * FROM table_name")
dbDisconnect(conn)
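If you want to experiment without an existing database file, something like the following should work. It is a sketch that assumes the DBI and RSQLite packages are installed. The people table and its columns are hypothetical.

```r
library(DBI)

# In-memory SQLite database; nothing is written to disk
conn <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table created only for this example
people <- data.frame(name = c("Ana", "Ben", "Cy"), age = c(34, 28, 45))
dbWriteTable(conn, "people", people)

# Bind the threshold as a parameter instead of pasting it into the SQL string
over_30 <- dbGetQuery(
  conn,
  "SELECT name, age FROM people WHERE age > ?",
  params = list(30)
)

dbDisconnect(conn)
print(over_30)
```

Passing values through params keeps the query text fixed, which is generally safer than building SQL with paste().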

2️⃣ Exploring the Data

Before analysis, check the structure and contents of the dataset.

(a) View the First & Last Few Rows

head(data) # First 6 rows
tail(data) # Last 6 rows

(b) Check Structure and Summary

str(data) # Structure of dataset
summary(data) # Summary statistics

(c) Get Column Names

colnames(data)
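A few more base R checks are often useful at this stage. The sketch below uses a small made-up data frame to show the dimensions, the class of each column, and the number of missing values per column.

```r
# Illustrative data with a couple of missing values
data <- data.frame(
  Age = c(34, NA, 45, 29),
  Category = c("A", "B", NA, "A"),
  stringsAsFactors = FALSE
)

dim(data)                           # Number of rows and columns
col_types <- sapply(data, class)    # Class of each column
na_per_col <- colSums(is.na(data))  # Missing values per column

print(col_types)
print(na_per_col)
```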

3️⃣ Data Cleaning & Preprocessing

Data often requires cleaning before analysis.


(a) Handling Missing Values

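sum(is.na(data)) # Count missing values
# Choose one of the following; after na.omit() there are no NAs left to replace
data <- na.omit(data) # Option 1: remove rows with missing values
data[is.na(data)] <- 0 # Option 2: replace NA with 0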
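Dropping rows or filling with 0 can both distort results. One common alternative is to fill each numeric column with its own median. The sketch below shows this on made-up values. Whether median imputation is appropriate depends on your data.

```r
# Illustrative data; the columns and values are assumptions
data <- data.frame(
  Age = c(34, NA, 45, 29),
  Income = c(52000, 61000, NA, 48000)
)

# Replace NAs in each numeric column with that column's median
for (col in names(data)) {
  if (is.numeric(data[[col]])) {
    col_median <- median(data[[col]], na.rm = TRUE)
    data[[col]][is.na(data[[col]])] <- col_median
  }
}

print(data)
```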

(b) Convert Data Types

data$Age <- as.numeric(data$Age) # Convert column to numeric
data$Category <- as.factor(data$Category) # Convert to categorical
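One thing to watch for: as.numeric() turns any value it cannot parse into NA and only emits a warning. The sketch below, using a made-up vector, shows one way to find the entries that failed to convert.

```r
# Illustrative raw column with one non-numeric entry
raw_age <- c("34", "28", "unknown", "45")

# suppressWarnings() hides the "NAs introduced by coercion" warning
age <- suppressWarnings(as.numeric(raw_age))

# Positions where conversion failed even though a value was present
failed <- which(is.na(age) & !is.na(raw_age))
print(raw_age[failed])
```

Inspecting these values before converting the real column helps avoid silently losing data.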

(c) Remove Duplicates

data <- unique(data) # Remove duplicate rows
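unique() only removes rows that match in every column. To treat rows as duplicates when they share a key, duplicated() on that column is one option. The id and score columns below are hypothetical.

```r
# Illustrative data: id 2 appears twice with different scores
data <- data.frame(
  id = c(1, 2, 2, 3),
  score = c(10, 20, 25, 30)
)

# unique(data) would keep all four rows because no two rows are fully identical
# duplicated() flags the second and later occurrences of each id
deduped <- data[!duplicated(data$id), ]
print(deduped)
```

This keeps the first row seen for each id, so sort the data first if a different row should win.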

4️⃣ Data Visualization

R offers plotting through its built-in base graphics functions and through packages such as ggplot2.

(a) Histogram (Distribution)

hist(data$Age, col = "blue", main = "Age Distribution", xlab = "Age")

(b) Scatter Plot

plot(data$Height, data$Weight, col = "red", pch = 19, main = "Height vs Weight")

(c) Boxplot (Outlier Detection)

boxplot(data$Income, main = "Income Distribution", col = "green")
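To get the outlier values themselves rather than just seeing them as points on the plot, boxplot.stats() can be used. The income figures below are made up for illustration.

```r
# Illustrative incomes with one extreme value
income <- c(42000, 45000, 47000, 50000, 52000, 55000, 250000)

# boxplot.stats() applies the same 1.5 * IQR rule that boxplot() uses by default
outliers <- boxplot.stats(income)$out
print(outliers)
```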

(d) ggplot2 for Advanced Visualization
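The original example for this step is not shown here. The sketch below is one possible ggplot2 scatter plot. It assumes the ggplot2 package is installed. The data and column names are made up to mirror the earlier examples.

```r
library(ggplot2)

# Illustrative data; the columns and values are assumptions
data <- data.frame(
  Height = c(150, 160, 165, 170, 175, 180, 185),
  Weight = c(52, 58, 63, 68, 72, 80, 85),
  Gender = c("F", "F", "F", "M", "F", "M", "M")
)

# Map columns to aesthetics, then add layers with +
p <- ggplot(data, aes(x = Height, y = Weight, color = Gender)) +
  geom_point(size = 3) +
  labs(title = "Height vs Weight", x = "Height", y = "Weight")

print(p)
```

Because the plot is stored as an object, more layers can be added to p later without rewriting the whole call.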


5️⃣ Statistical Analysis

R is widely used for statistical computations.

(a) Mean, Median, and Standard Deviation

mean(data$Age, na.rm = TRUE) # Average
median(data$Age, na.rm = TRUE) # Middle value
sd(data$Age, na.rm = TRUE) # Standard deviation
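These summaries are often needed per group rather than for the whole column. One base R option is aggregate(), sketched here on made-up data.

```r
# Illustrative data with one missing age
data <- data.frame(
  Age = c(25, 35, 45, 30, 50, NA),
  Gender = c("F", "F", "F", "M", "M", "M")
)

# Mean age per group; with a formula, aggregate() drops rows containing NA by default
group_means <- aggregate(Age ~ Gender, data = data, FUN = mean)
print(group_means)
```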

(b) Correlation Analysis

cor(data$Height, data$Weight, use = "complete.obs") # Pearson correlation
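cor() returns only the coefficient. If you also want a p-value and confidence interval, cor.test() is the usual companion. The heights and weights below are invented for the example.

```r
# Illustrative measurements
height <- c(150, 160, 165, 170, 175, 180, 185)
weight <- c(52, 58, 63, 68, 72, 80, 85)

# Pearson correlation test (the default method)
result <- cor.test(height, weight)

print(result$estimate)  # Correlation coefficient
print(result$p.value)   # p-value for the null hypothesis of zero correlation
print(result$conf.int)  # 95% confidence interval
```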

(c) T-Test (Compare Two Groups)

t.test(Income ~ Gender, data = data) # Gender must contain exactly two groups
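The test result is a list, so individual pieces can be pulled out for reporting. The sketch below uses made-up values. Note that t.test() runs Welch's unequal-variance test unless var.equal = TRUE is set.

```r
# Illustrative data: two groups of four observations each
data <- data.frame(
  Income = c(48000, 52000, 50000, 51000, 60000, 63000, 61000, 65000),
  Gender = rep(c("F", "M"), each = 4)
)

result <- t.test(Income ~ Gender, data = data)

print(result$p.value)   # p-value for the difference in means
print(result$conf.int)  # Confidence interval for the difference
print(result$estimate)  # Mean of each group
```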

(d) Linear Regression

model <- lm(Weight ~ Height, data = data)
summary(model)
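Once the model is fitted, coef() returns the estimated intercept and slope, and predict() applies the model to new rows. The data below is made up, and new_people is a hypothetical set of new observations.

```r
# Illustrative data
data <- data.frame(
  Height = c(150, 160, 165, 170, 175, 180, 185),
  Weight = c(52, 58, 63, 68, 72, 80, 85)
)

model <- lm(Weight ~ Height, data = data)
print(coef(model))  # Intercept and slope

# New data must use the same column name as the predictor in the formula
new_people <- data.frame(Height = c(168, 178))
predicted <- predict(model, newdata = new_people)
print(predicted)
```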

6️⃣ Machine Learning in R

R supports machine learning using packages like caret, randomForest, and e1071.

(a) Train a Simple Linear Model

install.packages("caret")
library(caret)
model <- train(Weight ~ Height, data = data, method = "lm")
summary(model)
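A model is usually judged on rows it was not trained on. The sketch below does a simple 80/20 hold-out split in base R instead of caret, using simulated data. The coefficients in the simulation are arbitrary assumptions.

```r
set.seed(42)  # Make the random split reproducible

# Simulated data; the relationship below is an arbitrary assumption
n <- 100
height <- runif(n, 150, 190)
data <- data.frame(
  Height = height,
  Weight = -100 + 0.95 * height + rnorm(n, sd = 4)
)

# Randomly assign 80% of rows to training and the rest to testing
train_idx <- sample(seq_len(n), size = 0.8 * n)
train_set <- data[train_idx, ]
test_set <- data[-train_idx, ]

# Fit on the training rows, evaluate on the held-out rows
model <- lm(Weight ~ Height, data = train_set)
pred <- predict(model, newdata = test_set)
rmse <- sqrt(mean((test_set$Weight - pred)^2))
print(rmse)
```

caret can automate this kind of resampling, but the manual version makes each step visible.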

(b) Decision Tree
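The original example for this step is not shown here. The sketch below is one way to fit a classification tree with the rpart package, using the iris dataset that ships with R.

```r
library(rpart)

# Fit a classification tree predicting Species from all other columns
tree_model <- rpart(Species ~ ., data = iris, method = "class")
print(tree_model)

# Predict classes for the training data and compute accuracy
pred <- predict(tree_model, newdata = iris, type = "class")
accuracy <- mean(pred == iris$Species)
print(accuracy)
```

Accuracy measured on the training data is optimistic, so a held-out test set gives a fairer estimate.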


7️⃣ Exporting Data

After analysis, you may need to save the results.

(a) Save Processed Data to CSV

write.csv(data, "cleaned_data.csv", row.names = FALSE)

(b) Save Model for Future Use

saveRDS(model, "model.rds")

Load it later using:

loaded_model <- readRDS("model.rds")
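As a quick sanity check, the sketch below saves a model to a temporary file, loads it back, and confirms that both copies give the same prediction. It uses the built-in mtcars dataset. The new_car value is a hypothetical input.

```r
# Fit a small model on a built-in dataset
model <- lm(mpg ~ wt, data = mtcars)

# Save to a temporary file and load it back
rds_path <- tempfile(fileext = ".rds")
saveRDS(model, rds_path)
loaded_model <- readRDS(rds_path)

# Both copies should give the same prediction for the same input
new_car <- data.frame(wt = 3)
same <- all.equal(predict(model, new_car), predict(loaded_model, new_car))
print(same)
```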

📌 Summary

| Step | Function/Package | Purpose |
| --- | --- | --- |
| Import Data | read.csv(), read_excel(), DBI | Load data from files/databases |
| Explore Data | head(), summary(), str() | Check dataset structure |
| Clean Data | na.omit(), as.numeric(), unique() | Handle missing values, duplicates |
| Visualization | plot(), hist(), ggplot2 | Graphical analysis |
| Statistical Analysis | mean(), cor(), t.test(), lm() | Basic statistics & regression |
| Machine Learning | caret, rpart | Predictive modeling |
| Export Data | write.csv(), saveRDS() | Save results |

 
