This semester I am running my R workshops once again, and as always I start by teaching people the packages of the tidyverse. As part of Endangered Data Week, I am teaching two workshops that introduce beginner R programmers to data tidying/manipulation and data visualization.
I’ve taken this approach of using the tidyverse instead of base R for two primary reasons. First, manipulating data with dplyr and tidyr is easy to understand conceptually, and it is often easier than learning the idiosyncrasies of base R. Whenever I show students two lines of code that achieve the same thing in base R and in the tidyverse, I get the same answer: the tidyverse way is much easier to read and understand.
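To make that comparison concrete, here is a minimal sketch of the kind of side-by-side I mean. It is not taken from the workshop worksheet: it uses R's built-in `mtcars` dataset instead of the workshop's population data, and it assumes dplyr is installed. Both versions keep the four-cylinder cars that get more than 30 miles per gallon and return only their `mpg` and `hp` columns.

```r
library(dplyr)

# Base R: rows are chosen with a logical vector and columns with a character
# vector, and the data frame name is repeated inside the brackets
base_result <- mtcars[mtcars$cyl == 4 & mtcars$mpg > 30, c("mpg", "hp")]

# dplyr: each verb names the step it performs, columns are referred to by
# bare names, and %>% passes the result of one step on to the next
tidy_result <- mtcars %>%
  filter(cyl == 4, mpg > 30) %>%
  select(mpg, hp)

print(base_result)
print(tidy_result)
```

Both produce the same rows and columns. The dplyr version reads top to bottom as a series of steps ("take `mtcars`, filter it, then select two columns"), and that is the readability difference students tend to notice.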
I’m not alone in this approach: David Robinson has made the same case about teaching beginners. My rationale largely follows his: teaching students the basics of the tidyverse means they can be up and running with a powerful set of tools quickly. For Endangered Data Week, that means introducing students to messy government data, tidying that data, working with it to produce new data, and drawing conclusions. I’m able to teach these concepts relatively quickly thanks to the power of the tidyverse, and I don’t need to spend time teaching the base R syntax for the same operations. If students need base R techniques or have questions, they can always get in touch with me for more pointers.
For the data manipulation exercises in our workshop, we work through an RMarkdown worksheet together during the session. I provide the students with some population data I compiled for a project last year, and we work through most of the functions available in dplyr. If we don’t get through it all, that’s fine; they have the worksheet to complete on their own time. (I make teaching these workshops a little easier for myself by installing RStudio Server and the necessary packages on Digital Ocean, so we can be up and running quickly.)
Second, students can gain a good amount of knowledge about R, data manipulation, and visualization in a relatively short time. After an hour and a half together, even students who have never programmed before are working with the language. The grammar of data tidying helps them grasp these concepts quickly because each step builds on the previous one. Chaining together a series of tidyverse functions lets students see the steps needed to reshape, clean, and explore a dataset. Those skills apply to any dataset, so students can take what they learn and use it in other projects or classes. I also prize tidyverse methods for their consistency. I’ve seen some wild ways of accessing or manipulating columns in a data frame (just spend some time on Stack Overflow), but whenever I read someone’s tidyverse example, the process clicks faster. That consistency, again, makes the language that much easier to use, to find answers about, and to learn.
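As a small illustration of that kind of chain, the sketch below reshapes, cleans, and explores a dataset in a single pipeline. It is not from the workshop materials. It uses `table4a`, a small example dataset that ships with tidyr (tuberculosis case counts for Afghanistan, Brazil, and China in 1999 and 2000), and it assumes dplyr and tidyr 1.0 or later (for `pivot_longer()`) are installed. The 1,000-case threshold is an arbitrary choice for illustration.

```r
library(dplyr)
library(tidyr)

# table4a stores one column per year (`1999`, `2000`), so the year is
# trapped in the column names rather than stored as a variable
cases_long <- table4a %>%
  # Reshape: move the year columns into `year` and their values into `cases`
  pivot_longer(c(`1999`, `2000`), names_to = "year", values_to = "cases") %>%
  # Clean: the former column names arrive as character strings, so convert them
  mutate(year = as.integer(year)) %>%
  # Explore: keep the larger counts and sort them from largest to smallest
  filter(cases > 1000) %>%
  arrange(desc(cases))

print(cases_long)
```

Each line is one verb doing one job, and the same verbs (`filter()`, `mutate()`, `arrange()`) work on any data frame. This is the transferability described above.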
This isn’t to say I don’t teach any base R: even in these workshops, students still learn about logical operators and other base methods. But pairing some base R methods with the tidyverse gives students a powerful set of tools for manipulating and visualizing data quickly.
This approach of teaching the tidyverse with an interactive worksheet has worked well: students are up and running with R and applying new skills quickly. My goal is to help people work with data, and the tidyverse provides a powerful way to get started.