Bridging Social Research And Programming
According to Nate Silver, the obsession with "big data" is declining and is being replaced by an informed curiosity about "data science." This transition suggests that collecting data for its own sake is not a worthwhile exercise. Instead, data scientists focus on establishing relationships within and between datasets in order to answer questions about the physical and social worlds. In other words, there is no point in collecting exabytes and zettabytes of information (that is, billions and trillions of gigabytes) if that information is never used.
Social researchers are incredibly good at formulating questions and designing tools and instruments for making sense of data. However, their ability to tap the oceans of available data is limited by the instruction they receive. Matters are further complicated by the fact that the specialized software packages available to social scientists (e.g., SPSS, SAS, Stata) use proprietary data and query formats that make cross-platform analysis and collaboration difficult.
Enter R, "a free software environment for statistical computing." R is an open-source statistical package and programming language that runs on virtually any operating system. More importantly, it enables researchers to access vast amounts of data and build their own customized tools for analysis. As data science continues to take over from big data, social researchers can become data scientists, and learning to use and program in R is a great step in that direction.
Here are some resources to get started with R:
- "The Art of R Programming" by Norman Matloff offers a substantive overview of how to get started with R. Other books are available, but this author kindly offers a free PDF of the 2009 version of the book.
- For survey researchers specifically, Thomas Lumley offers a great list of resources for survey analysis in R.
- Massive open online courses (MOOCs), most of them free, are also available. I would recommend Coursera's Code Yourself!, a programming course for beginners offered by the University of Edinburgh and Universidad ORT Uruguay. A more advanced option is Coursera's Data Science Specialization, a sequence of data science courses focused on R and offered by Johns Hopkins University.