Just a little demo of what happens if you do, or don't, adjust your r-squared.

Here’s the bottom line:

rm(list=ls())

## Why adjust your r-squared?
## Below is a simple demo of the difference between unadjusted and
## adjusted r-squared.

## Let's do some multiple regression, with different numbers of
## explanatory variables, with completely random data.
## Round to whole numbers of variables, so the manual adjusted
## r-squared check at the bottom matches lm()'s degrees of freedom.
numb.expl.vars <- rep(round(2^seq(0, 5, 0.5)), each=50)

## Number of observations
n <- 100

## The response variable
set.seed(1)  # for reproducibility
y <- rnorm(n)

## Function to return the unadjusted and adjusted r-squared
get.r2 <- function(ne) {
  x <- as.data.frame(matrix(rnorm(n*ne), n, ne))
  m1 <- lm(y ~ ., x)
  c(summary(m1)$r.squared, summary(m1)$adj.r.squared)
}

## Use lapply to run the function over the number-of-explanatory-variables vector
rez <- do.call(rbind, lapply(numb.expl.vars, get.r2))

## Get the mean r-squared and adjusted r-squared per number of explanatory variables
means <- aggregate(rez, list(numb.expl.vars=numb.expl.vars), mean)

## Plot the data
matplot(log2(numb.expl.vars), rez, type="n", ann=FALSE, axes=FALSE)
box()
abline(h=0)
matpoints(jitter(log2(numb.expl.vars)), rez, pch=19,
          col=c("#11ff1144", "#ff111144"))
mtext(1, line=2.5, text="Number of explanatory variables")
mtext(2, line=2, text="R-squared\n(green unadjusted, red adjusted)")
axis(1, at=0:5, labels=2^(0:5))
axis(2)
matpoints(log2(means[,1]), means[,2:3], pch=21,
          bg=c("#11ff1144", "#ff111144"), cex=2, type="b", lty=1, col=1)

## So the unadjusted r-squared increases with the number of explanatory
## variables, even when they are totally random, whereas the adjusted
## r-squared stays around 0.

## For fun, calculate the adjusted r-squared manually
adj.rsquared <- 1 - (1-rez[,1])*(n-1)/(n-numb.expl.vars-1)
sum(abs(adj.rsquared - rez[,2]) > 1e-10)  ## should be zero
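To see why the inflation happens, consider the extreme case. With an intercept plus n - 1 predictors, an OLS fit has as many parameters as observations, so it interpolates the response exactly and the unadjusted r-squared is 1 even for pure noise. A minimal sketch (the n = 20 setup here is illustrative, not part of the demo above):

```r
## Saturated model on pure noise: intercept + (n - 1) predictors
## uses n parameters for n observations, so the fit is exact.
set.seed(42)  # arbitrary seed, for reproducibility
n <- 20
y <- rnorm(n)
x <- as.data.frame(matrix(rnorm(n * (n - 1)), n, n - 1))
m <- lm(y ~ ., x)
summary(m)$r.squared  ## 1 (up to numerical precision)
```

The adjusted r-squared, by contrast, divides by n - p - 1, which is zero here: the correction blows up exactly when the model has used up all its degrees of freedom.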