Adjusted r-squared demo

Just a little demo of what happens when you do, or don’t, adjust your r-squared.
Here’s the bottom line:

[Figure: rsquared (the plot produced by the code below)]

rm(list=ls())

## Why adjust your r-squared?
## Below is a simple demo of the difference between unadjusted and adjusted r-squared.

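## Let's do some multiple regression, with different numbers of explanatory variables,
## with completely random data.
## Round to whole numbers and drop duplicates: a model can't have 1.41 variables,
## and fractional values would break the manual check at the end.
numb.expl.vars <- rep(unique(round(2^seq(0, 5, 0.5))), each=50)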

## Number of observations
n <- 100

## The response variable
y <- rnorm(n)

## Function to return the unadjusted and adjusted r-squared
get.r2 <- function(ne) {
	x <- as.data.frame(matrix(rnorm(n*ne), n, ne))
	m1 <- lm(y ~ ., data=x)
	s <- summary(m1)
	c(s$r.squared, s$adj.r.squared)
}

## Use lapply to run the function once for each entry of numb.expl.vars,
## then bind the results into a two-column matrix (unadjusted, adjusted)
rez <- do.call(rbind, lapply(numb.expl.vars, get.r2))
## Get the mean r-squared and adjusted r-squared per number of explanatory variables
means <- aggregate(rez, list(numb.expl.vars=numb.expl.vars), mean)

## plot the data
matplot(log2(numb.expl.vars), rez, type="n", ann=FALSE, axes=FALSE)
box()
abline(h=0)
matpoints(jitter(log2(numb.expl.vars)), rez, pch=19, col=c("#11ff1144", "#ff111144"))
mtext(1, line=2.5, text="Number of explanatory variables")
mtext(2, line=2, text="R-squared\n(green unadjusted, red adjusted)")
axis(1, at=0:5, labels=2^(0:5))
axis(2)
matpoints(log2(means[,1]), means[,2:3], pch=21, bg=c("#11ff1144", "#ff111144"), cex=2, type="b", lty=1, col=1)

## So the unadjusted r-squared increases with the number of explanatory variables,
## even when they are totally random,
## whereas the adjusted r-squared stays around zero on average
## (individual fits scatter above and below zero).

## For fun, calculate the adjusted r-squared manually
adj.rsquared <- 1 - (1-rez[,1])*(n-1)/(n-numb.expl.vars-1)
sum(abs(adj.rsquared-rez[,2])>1e-10) ## number of mismatches; should be zero
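
The pattern in the plot can also be checked against theory. With n observations and p purely random explanatory variables, the expected unadjusted r-squared is p/(n-1). Plugging that into the adjustment formula used above gives an expected adjusted r-squared of zero. Below is a minimal, self-contained sketch of that check. The names sim.r2, n.obs, p.vals and n.sims are illustrative and not part of the script above, and unlike the script it draws a fresh response for every fit.

```r
## Sketch: compare the simulated mean r-squared with the theoretical p / (n - 1).
## All names here are illustrative assumptions, not taken from the original script.
set.seed(1)
n.obs <- 100                      # number of observations
p.vals <- c(1, 2, 4, 8, 16, 32)   # numbers of random explanatory variables
n.sims <- 200                     # simulated data sets per value of p

## Fit y against p random predictors; return unadjusted and adjusted r-squared
sim.r2 <- function(p) {
	y <- rnorm(n.obs)
	x <- as.data.frame(matrix(rnorm(n.obs * p), n.obs, p))
	s <- summary(lm(y ~ ., data = x))
	c(r2 = s$r.squared, adj.r2 = s$adj.r.squared)
}

## Mean of each statistic over n.sims simulations, one row per value of p
sim.means <- t(sapply(p.vals, function(p) rowMeans(replicate(n.sims, sim.r2(p)))))

comparison <- data.frame(p = p.vals,
	expected.r2 = p.vals / (n.obs - 1),
	simulated.r2 = sim.means[, "r2"],
	simulated.adj.r2 = sim.means[, "adj.r2"])
print(round(comparison, 3))
```

The simulated.r2 column should track expected.r2 closely, while simulated.adj.r2 should hover near zero for every p.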