Correlation and Regression

Published on October 22, 2025 by EduResHub Team

Simple Correlation and Regression Analysis

Author: Prof. CKDash

Step 1: Enter the data

x <- c(0, 15, 45, 60, 75, 90, 105, 120)
y <- c(3.3, 3.5, 4.0, 4.2, 4.6, 5.0, 5.3, 5.8)

dat <- data.frame(x, y)
dat
    x   y
1   0 3.3
2  15 3.5
3  45 4.0
4  60 4.2
5  75 4.6
6  90 5.0
7 105 5.3
8 120 5.8

Step 2: Correlation analysis

# Correlation coefficient (Pearson)
r <- cor(x, y)
r
[1] 0.9901078
# Correlation test (shows r, p-value, and confidence interval)
cor.test(x, y)

    Pearson's product-moment correlation

data:  x and y
t = 17.285, df = 6, p-value = 2.402e-06
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.9442173 0.9982792
sample estimates:
      cor 
0.9901078 
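
As a quick sanity check, the t statistic that cor.test() reports can be derived from r alone via t = r·sqrt(n − 2)/sqrt(1 − r²). A small sketch, re-entering the data from Step 1:

```r
# Reproduce the t statistic of cor.test() from the correlation itself:
# t = r * sqrt(n - 2) / sqrt(1 - r^2)
x <- c(0, 15, 45, 60, 75, 90, 105, 120)
y <- c(3.3, 3.5, 4.0, 4.2, 4.6, 5.0, 5.3, 5.8)
n <- length(x)
r <- cor(x, y)
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)
round(t_manual, 3)  # 17.285, matching the cor.test() output above
```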

Step 3: Simple linear regression

# Fit the model
model <- lm(y ~ x)

# Summary of the regression model
summary(model)

Call:
lm(formula = y ~ x)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.18559 -0.08176 -0.00473  0.06430  0.18378 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 3.154955   0.088995   35.45 3.36e-08 ***
x           0.020511   0.001187   17.29 2.40e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1326 on 6 degrees of freedom
Multiple R-squared:  0.9803,    Adjusted R-squared:  0.977 
F-statistic: 298.8 on 1 and 6 DF,  p-value: 2.402e-06
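
In simple linear regression, the Multiple R-squared is exactly the squared Pearson correlation from Step 2 (0.9901078² ≈ 0.9803), which is easy to verify:

```r
# R-squared from lm() equals cor(x, y)^2 in simple regression
x <- c(0, 15, 45, 60, 75, 90, 105, 120)
y <- c(3.3, 3.5, 4.0, 4.2, 4.6, 5.0, 5.3, 5.8)
model <- lm(y ~ x)
all.equal(summary(model)$r.squared, cor(x, y)^2)  # TRUE
```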

Step 4: ANOVA table

anova(model)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value    Pr(>F)    
x          1 5.2533  5.2533  298.78 2.402e-06 ***
Residuals  6 0.1055  0.0176                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
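
With a single predictor, the ANOVA F statistic is simply the square of the slope's t statistic (298.78 ≈ 17.285²), so both tables tell the same story. A quick check:

```r
# F = t^2 for the slope in simple linear regression
x <- c(0, 15, 45, 60, 75, 90, 105, 120)
y <- c(3.3, 3.5, 4.0, 4.2, 4.6, 5.0, 5.3, 5.8)
model <- lm(y ~ x)
f_val <- anova(model)[["F value"]][1]
t_val <- summary(model)$coefficients["x", "t value"]
all.equal(f_val, t_val^2)  # TRUE
```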

Step 5: 95% Confidence intervals for coefficients

confint(model, level = 0.95)
                 2.5 %     97.5 %
(Intercept) 2.93719242 3.37271749
x           0.01760701 0.02341401
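
confint() is just estimate ± t-critical × standard error, with the t quantile taken at the residual degrees of freedom. A sketch reproducing the table above:

```r
# Reproduce confint() by hand: estimate +/- qt(0.975, df) * SE
x <- c(0, 15, 45, 60, 75, 90, 105, 120)
y <- c(3.3, 3.5, 4.0, 4.2, 4.6, 5.0, 5.3, 5.8)
model <- lm(y ~ x)
est <- coef(model)
se  <- coef(summary(model))[, "Std. Error"]
t_crit <- qt(0.975, df = df.residual(model))  # df = 6 here
cbind(`2.5 %` = est - t_crit * se, `97.5 %` = est + t_crit * se)
```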

Step 6: Fitted values and residuals

fitted_values <- fitted(model)
residuals_values <- resid(model)

# Add them to the dataset
dat$fitted <- fitted_values
dat$resid  <- residuals_values
dat
    x   y   fitted         resid
1   0 3.3 3.154955  0.1450450450
2  15 3.5 3.462613  0.0373873874
3  45 4.0 4.077928 -0.0779279279
4  60 4.2 4.385586 -0.1855855856
5  75 4.6 4.693243 -0.0932432432
6  90 5.0 5.000901 -0.0009009009
7 105 5.3 5.308559 -0.0085585586
8 120 5.8 5.616216  0.1837837838
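
Each observation decomposes as observed = fitted + residual, so the two new columns must add back to y exactly:

```r
# Observed values decompose as fitted value plus residual
x <- c(0, 15, 45, 60, 75, 90, 105, 120)
y <- c(3.3, 3.5, 4.0, 4.2, 4.6, 5.0, 5.3, 5.8)
model <- lm(y ~ x)
all.equal(y, unname(fitted(model) + resid(model)))  # TRUE
```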

Step 7: Predict new values (optional)

# Example: predict Y for new X values
new_x <- data.frame(x = c(0, 30, 60, 90, 120))
predict(model, newdata = new_x, interval = "confidence", level = 0.95)
       fit      lwr      upr
1 3.154955 2.937192 3.372717
2 3.770270 3.619400 3.921141
3 4.385586 4.270356 4.500815
4 5.000901 4.863176 5.138626
5 5.616216 5.416634 5.815799
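
To predict an individual new observation rather than the mean response, use interval = "prediction". These intervals are always wider than the confidence intervals above because they also account for the residual scatter of single observations around the line:

```r
# Prediction intervals (for single new observations) are wider than
# confidence intervals (for the mean response) at the same x values
x <- c(0, 15, 45, 60, 75, 90, 105, 120)
y <- c(3.3, 3.5, 4.0, 4.2, 4.6, 5.0, 5.3, 5.8)
model <- lm(y ~ x)
new_x <- data.frame(x = c(0, 60, 120))
predict(model, newdata = new_x, interval = "prediction", level = 0.95)
```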

Step 8: Scatter plot with regression line

# Scatter plot of the data
plot(x, y,
     xlab = "X values",
     ylab = "Y values",
     main = "Scatter Plot with Regression Line and Equation",
     pch = 19)

# Add the regression line
abline(model, lwd = 2, col = "blue")

# Get coefficients and R² for labeling
b0 <- round(coef(model)[1], 3)
b1 <- round(coef(model)[2], 3)
R2 <- round(summary(model)$r.squared, 4)

# Create equation and R² text
eq_text <- bquote(hat(y) == .(b0) + .(b1)*x)
r2_text <- bquote(R^2 == .(R2))

# Add text to the plot
legend("top",
       legend = c(as.expression(eq_text), as.expression(r2_text)),
       bty = "n",
       text.col = "black")


Step 9: Interpretation (for students)

  • Correlation (r) shows the strength and direction of the linear relationship between X and Y.
  • The regression equation gives the prediction line:
    Ŷ = b_0 + b_1·X (here: Ŷ = 3.155 + 0.0205·X)
  • R-squared shows how much of the variation in Y is explained by X.
  • The ANOVA table and p-values test whether the relationship is statistically significant.
  • Residuals show how far the observed values fall from the fitted line.
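
The coefficient estimates themselves have closed forms that tie these pieces together: b_1 = cov(X, Y) / var(X) and b_0 = mean(Y) − b_1·mean(X). Computing them directly recovers the lm() estimates from Step 3:

```r
# Least-squares estimates from first principles
x <- c(0, 15, 45, 60, 75, 90, 105, 120)
y <- c(3.3, 3.5, 4.0, 4.2, 4.6, 5.0, 5.3, 5.8)
b1 <- cov(x, y) / var(x)
b0 <- mean(y) - b1 * mean(x)
round(c(b0 = b0, b1 = b1), 6)  # b0 = 3.154955, b1 = 0.020511, as in Step 3
```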