draft advice for intro to data science students

20 August 2024 - 5 minutes read - 999 words

I am, unbelievably, preparing my fourth offering of my department’s ICT/LIS 661 Intro to Data Science class, and this time around, I’ve decided to add a new section to my “about the class” page in Canvas to head off some concerns that I’ve seen over the past few years. Many of the students who take my class have no background in either statistics or programming, and it can be really intimidating for them. I’m not convinced that the advice I give below is everything that I ought to say (or exactly how I ought to say it), but I want to get out ahead of a lot of the one-on-one pep talks that I give over the course of a semester.

In the spirit of public drafts, sharing advice, and giving feedback, I include it in this blog post, too:

This course can be a lot.

Data science involves the application of statistics and programming to specific contexts. I don’t take any personal or professional pleasure in making courses demanding, and I haven’t gone out of my way to do so for this one. Yet, even in an introductory course, it would be unfair to you if I didn’t make sure that you had a strong foundation in stats and programming by the end of the semester, even if that feels like a lot to take on! Note that I’m specific about the timing there: I don’t expect you to have a strong foundation at the beginning of the semester, and the course is designed to get you there over sixteen weeks. In short, you don’t have to know any statistics or have any experience with programming to be successful in this class. That said, it’s perfectly normal to feel overwhelmed with what we’re learning, and the big secret is that most of your classmates feel the same way, even if it feels like you’re the only one struggling.

I’m not writing this section to scare anyone—I hope this semester will be a smooth and enjoyable one for you, and I’m committed to doing everything I can to make that the case! That said, there are a few things that you can do to make your lives easier in 661, and I think it’s helpful if I share them from the outset:

Follow directions carefully: Contrary to popular belief, computers are dumb and have to be told exactly what to do. The difference between a 1, an l, and an I is important in programming, so close attention pays off. Likewise, if I’ve told you to do things (like load your data into RStudio) a particular way, chances are that I have a good reason for doing so. Even if you’ve found a workaround, that workaround may not be helpful a few weeks down the line, so it pays to learn it right the first time.
Think at a higher level: I am okay in the kitchen, but what stops me from being actually good is that I’m overly dependent on line-by-line recipes, and I have weak comprehension of the underlying principles of why you do things the way that you do. When learning to program, it is really important to get past the line-by-line walkthroughs to get a firm understanding of the underlying principles of why we do things the way that we do. Even when a walkthrough does not explicitly call attention to it, you should always try to consider what the different pieces of some code are each contributing to the overall picture. If you can figure that out, you’ll not only better understand what’s going on but also more easily adapt the code for other purposes.
Be cool with making mistakes: I am a perfectionist, and in most areas of my life, making even the slightest mistake feels like a moral failure and proof that I am a terrible person. I’ve had to ditch my perfectionism when it comes to data science, because learning to program involves making mistakes: Lots of them. I have about a decade of experience with R, and I still make mistakes every dang time I try to write some code. No exceptions. Hadley Wickham is one of the best R programmers in the world, and he makes it clear in public appearances that he makes mistakes too: Lots of them. You are going to make a lot of mistakes in this class. That’s fine! It’s proof that you’re learning, and while you should be committed to fixing and overcoming those mistakes, it is unhelpful to see mistakes as failures or as evidence that data science isn’t right for you.
Ask for help: In a year, I spend more time in office hours with 661 students than with the students in all my other courses combined. That’s a good thing! To continue my earlier thought, it’s impossible for me to anticipate all the mistakes that you might make in R during the semester, and that means that I can’t warn you against all of them in my walkthroughs. I will try to head off common mistakes when I can, but you are a creative bunch, and you will come up with mistakes that I have never thought of before, just like I did when I was first learning R. I expect you to try to figure out where you went wrong (make sure you’re following directions carefully and try to think at a higher level), but when you just can’t figure it out, I expect you to get in touch with me. A dozen panicked emails in one night might not be the right way to do that, but let me know when you’re struggling with something, and I’ll answer what I can via email and schedule a Zoom meeting with you for the rest. (Plus, when you see me code in a Zoom meeting, you’ll see the proof that I make mistakes every dang time I try to code. It will make you feel better.)

