4 M1A: Install R and RStudio
This content draws on material from STAT 545 by Jenny Bryan, licensed under CC BY-SA 4.0
Changes to the source material include addition of new material; light editing; rearranging, removing, and combining original material; adding and changing links; and adding first-person language from current author.
The resulting content is licensed under CC BY-SA 4.0.
4.1 Introduction
My formal training is in education, so I have some strong opinions about what learning looks like and what good teaching ought to look like. In particular, I hold to a sociocultural view of learning that assumes that:
knowledge is distributed in the world among individuals, the tools, artifacts, and books that they use, and the communities and practices in which they participate (Greeno et al., 1996, p. 20)
In other words, I can’t teach you data science by merely rattling off a list of facts for you to memorize and then repeat at the appropriate time. Rather, if I’m going to effectively teach you data science, I need to introduce you to data science communities, have you use the tools that data scientists use, and have you act in the way that data scientists act.
In relation to this second point, R and RStudio are software that are widely used in the world of data science, so becoming familiar with them is part of learning data science. Some data scientists prefer other software, and that’s fine, but this is what we’ve decided on teaching here in UK’s School of Information Science (and it’s what I personally use, so I’m better suited to teaching it anyway).
This activity is about introducing you to this software and helping you set it up. Even if you have a pre-existing installation of R or RStudio, I highly recommend that you re-install both and get as current as possible. It can be considerably harder to run old software than new.
4.2 R
R is an open source programming language designed for statistics. Two things are important about that initial description:
First, R is open source, meaning that it is freely available and that other programmers may add to it or modify it to their heart’s content. This is good news: it means that in addition to the basic features of R, it is possible (and relatively easy) to add new features by installing and loading packages. We’ll be doing plenty of that this semester.
Second, R is designed for statistics. That doesn’t mean it can’t be used for other things: In my research, I regularly use R, but I rarely use it for traditional statistics. Nonetheless, R is built with statistical needs and tasks in mind.
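To make the idea of "installing and loading packages" a little more concrete, here is a minimal sketch of what those two steps look like once R is up and running. The package names are my own illustrative picks, not a statement about which packages we'll use first: "tidyverse" is just a well-known CRAN package (left commented out because it needs an internet connection), and "tools" is a package that already ships with R.

```r
# Step 1: install a package from CRAN (done once per computer).
# "tidyverse" is only an example; this line is commented out because
# it needs an internet connection and can take a few minutes.
# install.packages("tidyverse")

# Step 2: load an installed package (done in every new R session).
# "tools" comes bundled with R, so this works on a fresh installation.
library(tools)

# Once a package is loaded, its functions can be called by name.
# file_ext() comes from the tools package and returns a file's extension.
extension <- file_ext("analysis.R")
print(extension)
```

The pattern to remember is that installing happens once, while loading with library() happens every time you start a new session.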
You can (and should now) install R from CRAN, the Comprehensive R Archive Network. I highly recommend that you install a precompiled binary distribution for your operating system (as opposed to the source code). On the CRAN site, follow the download link for your operating system. (You’ll probably notice that version names for R are… eccentric!)
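Once the installation finishes, one simple way to confirm that it worked (and to see one of those eccentric version names for yourself) is to ask R to describe itself. This is just a quick check I'm suggesting, not a required step; the variable names below are arbitrary.

```r
# R.version is a built-in list describing the running copy of R.
version_string <- R.version.string      # full version line, including the release date
version_nickname <- R.version$nickname  # the release's nickname

print(version_string)
print(version_nickname)

# getRversion() returns just the version number, which can be compared
# against a number of your choosing (the "4.0.0" here is only an example).
print(getRversion() >= "4.0.0")
```

What gets printed depends on which release you installed, so your output will probably differ from a classmate's.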
4.3 RStudio
Programming in R can be done in a number of ways, but in this class, we’ll be using an IDE (integrated development environment) called RStudio (developed by an organization called Posit).
To download RStudio Desktop, navigate to this link. The page will provide you with a link for downloading R; since you’ve already done that, you can ignore it. What you shouldn’t ignore, though, is the link it will provide to download RStudio for your computer’s operating system.
It’s important to understand that RStudio is one of many interfaces available for working with the R programming language. Another interface (simply called R) will also be installed on your computer as a result of installing the R language, and it will be possible to open R code in either the R interface or the RStudio interface. Make sure that you always open code in RStudio—not only do I find it cleaner and easier to work with code there, but RStudio also has some extra features that we’ll be using throughout the semester.
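If you're ever unsure which of the two interfaces you have open, a rough check is sketched below. It relies on my understanding that RStudio sets an environment variable named RSTUDIO to "1" in the R sessions it starts; treat that as an assumption to verify rather than a guarantee, and note that the variable name in_rstudio is my own.

```r
# Sys.getenv() reads an environment variable and returns "" if it is unset.
# Assumption: RStudio sets RSTUDIO to "1" for the sessions it launches.
in_rstudio <- identical(Sys.getenv("RSTUDIO"), "1")

if (in_rstudio) {
  message("This session appears to be running inside RStudio.")
} else {
  message("This session does not appear to be running inside RStudio.")
}
```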