Plot and Play: Finding Correlation between Attributes

It's always good to get to know our data, and visualization is a great way to do that intuitively. When it comes to visualization, plots and graphs are our first choice. In statistical analysis, people call this procedure "Plot and Play".

I am going to use the ggplot2 package in R for basic plots. To install it, simply run install.packages("ggplot2"), and then load it in each session with library(ggplot2).
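A minimal setup sketch, assuming internet access to the CRAN mirror https://cloud.r-project.org: install ggplot2 only if it is missing, then attach it.

```r
# Install ggplot2 from CRAN only if it is not already installed
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}
library(ggplot2)  # attach the package so ggplot(), geom_bar(), theme(), etc. are available
```

Checking with requireNamespace() first avoids reinstalling the package every time the script is run.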

Using as little domain knowledge as possible is generally good practice in statistical analysis, but in practice it is really hard to analyze data purely without it. In the following section I look at what kind of relationships exist between chosen attributes. These attributes were chosen based on domain knowledge, which means I think they could be interrelated. However, you are free to select any pair of attributes and look for patterns.

I created a function called corPlot that plots a graph with ggplot2, given a data frame and a pair of attributes, and saves it as a JPEG file.

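library(ggplot2)

# Plot attribute_x against attribute_y for one opposition and save the plot as a JPEG file.
# Assumes every row of dataFrame belongs to the same opposition team.
corPlot <- function(dataFrame, attribute_x, attribute_y, xName, yName){
    opposition <- as.character(dataFrame$opposition[1])  # a single value, so filePath is a single file name
    filePath <- file.path("C:","users","nsiva","Documents","My Projects", "CO421_Sabermetrics", "R", "Plots", opposition, paste0("plot_", opposition, "_", xName, "_vs_", yName, ".jpg"))
    dir.create(dirname(filePath), recursive = TRUE, showWarnings = FALSE)  # create the output folder if it does not exist
    jpeg(filename = filePath, quality = 100, res = 300, height = 10, width = 15, units = "in")  # quality is a percentage (0-100)
    on.exit(dev.off())  # close the JPEG device even if plotting fails
    p <- ggplot() + geom_bar(aes(y = attribute_y, x = attribute_x, fill = attribute_y), data = dataFrame, stat = "identity") + coord_flip() + ggtitle(paste0("Plot: ", xName, " vs ", yName)) + xlab(xName) + ylab(yName) + labs(fill = yName) + geom_text(stat = "count", aes(label = after_stat(count)), vjust = -1)
    # Enlarge titles, axis labels and legend text for the saved image
    p <- p + theme(
        plot.title = element_text(size = 24), axis.title.x = element_text(size = 24), axis.title.y = element_text(size = 24), legend.title = element_text(size = 20), axis.text.x = element_text(size = 14), axis.text.y = element_text(size = 14), legend.text = element_text(size = 14)
    )
    print(p)  # a ggplot object is only drawn on the device when printed
}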
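As a possible variant (a sketch, not the method used for the plots in this post), ggplot2 can count the matches itself: geom_bar() with its default stat = "count" makes each bar height the number of matches, and a geom_text() layer with stat = "count" labels each stacked segment. The data frame matches, its columns venue and winner, its made-up values, and the helper name venueWinnerPlot are all hypothetical.

```r
library(ggplot2)

# Hypothetical toy data for illustration only (not real match results)
matches <- data.frame(
  venue  = c("Venue 1", "Venue 1", "Venue 2", "Venue 2", "Venue 2", "Venue 3"),
  winner = c("Team A", "Team B", "Team A", "Team A", "Team B", "Team B")
)

# Hypothetical helper: matches per venue, stacked by winner, with a count label on each segment
venueWinnerPlot <- function(df) {
  ggplot(df, aes(x = venue, fill = winner)) +
    geom_bar() +                                     # default stat = "count": bar height = number of matches
    geom_text(aes(label = after_stat(count)), stat = "count",
              position = position_stack(vjust = 0.5)) +  # centre each label inside its stacked segment
    coord_flip() +
    labs(title = "Plot: Venue vs Winner", x = "Venue", y = "Matches", fill = "Winner")
}

p <- venueWinnerPlot(matches)
```

Calling print(p) draws the plot, and ggsave() can write it to a file. Because the text layer inherits x and fill from ggplot(), its counts are computed per venue and winner, matching the stacked bars.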


I am going to look for relationships between the following pairs of attributes.

  • Venue vs Winner
  • City vs Winner
  • Toss Winner vs Winner
  • Date vs Winner
  • Match Type vs Winner
  • Overs vs Winner
  • Toss Winner vs Toss Decision
  • Venue vs Toss Decision
  • Toss Decision vs Winner
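The plots show these relationships visually. To put a rough number on the strength of association for a pair of categorical attributes, one option (not part of the original analysis) is Cramér's V, computed from a contingency table and Pearson's chi-squared statistic as V = sqrt(chi2 / (n (k - 1))), where n is the number of matches and k is the smaller of the number of rows and columns. The helper name cramersV is hypothetical.

```r
# Hypothetical helper: Cramér's V between two categorical vectors
# (0 = no association, 1 = perfect association)
cramersV <- function(x, y) {
  tab <- table(as.character(x), as.character(y))  # as.character drops unused factor levels
  k <- min(nrow(tab), ncol(tab))
  if (k < 2) return(NA_real_)                    # undefined when either attribute has a single level
  # Pearson chi-squared statistic without Yates' continuity correction;
  # warnings about small expected counts are suppressed in this sketch
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  as.numeric(sqrt(chi2 / (sum(tab) * (k - 1))))
}
```

For example, cramersV(matches$toss_winner, matches$winner) would score Toss Winner vs Winner, where matches and its column names are hypothetical. Numeric attributes such as Date or Overs would need to be binned first (for instance with cut()).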
You can get the R script for the analysis here.

You can find a lot of interesting plots. Some of them are shown below.