1 M1U: Course Syllabus
The syllabus for this course is available as a PDF through Canvas. However, it is also reproduced here for the purposes of annotation.
1.1 Overview
1.1.3 Contact Information
- Office—341 Lucille Little Library Bldg.
- Phone—859.218.2294
- Email—spencer.greenhalgh@uky.edu (preferred)
- Zoom—https://uky.zoom.us/my/greenhalgh
1.1.4 Response Time
During the work week (but not the weekend!), I commit to responding to all emails within 24 hours. I expect you to check Canvas and email regularly for messages from me and to respond quickly.
1.2 Required Materials
This course uses a free custom online textbook based on Creative Commons-licensed works such as Data Feminism, OpenStax’s Introductory Statistics, and the “ModernDive” Statistical Inference Via Data Science textbook.
1.3 “Life is Difficult” Statement [inspired by Dr. Andrew Heiss]
Recent years have been characterized by a global pandemic, increased (and overdue) attention to inequalities and injustices, and stressful political tensions; we might hope that the worst of all of these has passed, but the truth is that none of them have disappeared. This can be a difficult time to be in grad school—and it may be more difficult for some individuals and groups than others.
Despite these difficulties, I am fully committed to making sure that you learn everything you were hoping to learn from this class! My late policy and willingness to make accommodations are generous even during normal times, and if your life is being turned upside down, I’m willing to be as flexible as you need me to be—so long as you are active in communicating with me.
If you feel like you’re behind, not understanding everything, or just plain stressed, do not suffer in silence! I’m usually quick to respond to email and more than happy to meet with you.
1.4 Basic Needs Statement [inspired by Dr. Sara Goldrick-Rab]
Any student who has difficulty affording or accessing food to eat every day or who lacks a safe and stable place to live and believes this may affect their performance in the course is urged to contact the Center for Support and Intervention. Furthermore, if you are comfortable doing so, you can also notify me.
The following resources are also available at the University of Kentucky:
1.5 Course Information
1.5.1 Course Description
This course will provide a foundation in the area of data science based on data curation and statistical analysis. The primary goal of this course is for students to learn data analysis concepts and techniques that facilitate making decisions from a rich data set. Students will investigate data concepts, metadata creation and interpretation, the general linear model, cluster analysis, and basics of information visualization. At the beginning, this course will introduce fundamentals about data and data standards and methods for organizing, curating, and preserving data for reuse. Then, we will focus on inferential statistics: drawing conclusions and making decisions from data. This course will help students understand how to use data analysis tools and, in particular, will provide an opportunity to use an open-source data analysis tool, R, for data manipulation, analysis, and visualization. Finally, in this course we will discuss diverse issues around data including technologies, behaviors, organizations, policies, and society.
1.5.2 Course Objectives—“I Can Statements”
The following “I can” statements will guide all of the learning and assessment activities throughout this course. Although these objectives have some overlap, activities within each module will clearly and specifically relate to a single objective, and larger assessments will implicitly ask you to demonstrate all of them. As we proceed throughout the semester, you should feel increasingly comfortable making these statements about yourself:
- I can express my understanding of philosophical, ethical, statistical, research, and other concepts underpinning data science.
- I can apply that understanding—in conjunction with R programming—to completing practical projects.
- I can connect conceptual and practical elements of data science to disciplinary and contextual knowledge.
1.5.3 Course Assessment
Your grade for this course will be based on 100 points:
- 90 points – 100.0 points = A
- 80 points – 89.9 points = B
- 70 points – 79.9 points = C
- 0 points – 69.9 points = E
These 100 points come from the following assessment activities, which should all be completed honestly and individually on Canvas:
1.5.3.1 Projects
Throughout the semester, you will complete four projects worth a total of 55 points:
- Project #1: Finding and Evaluating Data (10 points)
- Project #2: Exploring and Describing Data (10 points)
- Project #3: Building and Evaluating Models (10 points)
- Final Project: Reporting Data Analysis (25 points)
Detailed instructions for these projects can be found on Canvas.
1.5.3.2 Participation
Throughout the semester, you will earn 45 points from a series of participation activities. During each of the fifteen modules of the semester, you will complete three reading or participation activities (each worth one point) that will help you extend or apply your understanding of course content; while these activities vary from module to module, a plurality of modules involve annotating a reading from the textbook, completing a programming walkthrough with provided data, and then adapting (some of) the code from the walkthrough to work with your own data.
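As a quick check on the arithmetic above, here is a minimal sketch (in Python, purely for illustration, and not part of the course's R materials) of how the point values add up and how a point total maps to a letter grade. The function and variable names are hypothetical, and the handling of totals that fall between the listed bands (for example, 89.95) is an assumption: the sketch treats anything below a cutoff as belonging to the lower band.

```python
# Point values are copied from the syllabus; all names here are illustrative.
PROJECT_POINTS = {
    "Project #1: Finding and Evaluating Data": 10,
    "Project #2: Exploring and Describing Data": 10,
    "Project #3: Building and Evaluating Models": 10,
    "Final Project: Reporting Data Analysis": 25,
}
MODULES = 15                # modules in the semester
ACTIVITIES_PER_MODULE = 3   # reading or participation activities per module
POINTS_PER_ACTIVITY = 1     # each activity is worth one point


def max_points() -> int:
    """Total points available: projects plus participation."""
    participation = MODULES * ACTIVITIES_PER_MODULE * POINTS_PER_ACTIVITY
    return sum(PROJECT_POINTS.values()) + participation


def letter_grade(points: float) -> str:
    """Map a point total to a letter using the syllabus cutoffs (90/80/70).

    Assumption: a total below a cutoff falls into the lower band.
    """
    if points >= 90:
        return "A"
    if points >= 80:
        return "B"
    if points >= 70:
        return "C"
    return "E"
```

For example, `max_points()` returns 100 (55 project points plus 45 participation points), and `letter_grade(89.9)` returns `"B"`.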
1.5.4 Late Work Policy
Officially, each assignment is due at 11:59pm on the Sunday night indicated in Canvas. Practically speaking, however, I will grade without penalty (for graded assessments) and provide feedback on (for all assessments) anything that is turned in by the time I begin reviewing that assessment. However, I will not grade or provide feedback on any work that is completed after this time unless you have made other arrangements with me. Naturally, because my schedule varies from week to week and because I try to provide feedback as quickly as possible, your best bet is to turn in your work by the official deadline or—if life has thrown you a curveball—to get in touch with me ahead of time to make other arrangements.
1.5.5 Prep Week
UK policies limit what I can assign during Week 16 of the course. However, I am “permitted to grade student participation,” “collect regularly assigned homework,” and “collect projects” so long as those assignments are scheduled ahead of time and, in the case of projects, there is nothing due during Finals Week. Please note that your final project meets these requirements and is due during Week 16.
1.6 Course Policies
All of the policies listed on this page are in effect for this course.
1.7 Plagiarism, Cheating, and Generative AI
1.7.1 Plagiarism [source]
“All academic work, written or otherwise, submitted by students to their instructors or other academic supervisors, is expected to be the result of their own thought, research or self-expression. In cases where students feel unsure about a question of plagiarism involving their work, they are obliged to consult their instructors on the matter before submission. When students submit work purporting to be their own, but which in any way borrows ideas, organization, wording or content from another source without appropriate acknowledgment of the fact, the students are guilty of plagiarism.
“Plagiarism includes reproducing someone else’s work (including, but not limited to a published article, a book, a website, computer code or a paper from a friend) without clear attribution. Plagiarism also includes the practice of employing or allowing another person to alter or revise the work which a student submits as their own, whoever that other person may be, except under specific circumstances (e.g. Writing Center review, peer review) allowed by the Instructor of Record or that person’s designee. Plagiarism may also include double submission, self-plagiarism or unauthorized resubmission of one’s own work, as defined by the instructor.
“Students may discuss assignments among themselves or with an instructor or tutor, except where prohibited by the Instructor of Record (e.g. individual take-home exams). However, the actual work must be done by the student, and the student alone, unless collaboration is allowed by the Instructor of Record (e.g. group projects). When a student’s assignment involves research in outside sources or information, the student must carefully acknowledge exactly what, where and how they have employed them. If the words of someone else are used, the student must put quotation marks around the passage in question and add an appropriate indication of its origin. Making simple changes while leaving the organization, content and phraseology intact is plagiaristic. However, nothing in this AR shall apply to those ideas which are so generally and freely circulated as to be a part of the public domain.”
1.7.2 Cheating [source]
“Cheating is defined by its general usage. It includes, but is not limited to, the wrongfully giving, taking or presenting any information or material by a student with the intent of aiding themself or another on any academic work which is considered in any way in the determination of the final grade.
“The fact that a student could not have benefited from an action is not by itself proof that the action does not constitute cheating.”
1.7.3 Code, Plagiarism, and Generative AI
It is common practice in data science and programming communities to borrow code from other, more knowledgeable programmers. Indeed, many of the weekly activities in this class will explicitly involve copying or adapting code from our textbook. While I prefer that you draw from the textbook when borrowing code from other sources, you might also find online or other sources helpful for figuring out how to complete a specific task for your class projects. When done properly, this is not plagiarism or cheating—in fact, it is good practice in data science.
Nonetheless, you are ultimately responsible for completing assessments, and plagiarism and cheating remain a serious concern for this course. If you consult other sources, please ensure that they support (rather than replace) your personal work, effort, initiative, and understanding. It is your responsibility to ensure that you understand what plagiarism is and how to avoid it; when in doubt, reach out to me with your questions.
Along these lines, I strongly discourage you from using any generative AI tool (including ChatGPT, Claude, Llama, Gemini, or Grok) to write code or text for you. AI-generated output can include errors, and as a general rule, if you know enough to catch those errors, you know enough to generate that output yourself. Furthermore, there are multiple different “styles” of coding in R, and I’m intentionally trying to teach you one particular style; even if a generative AI tool returns “correct” code for you, it may give you code that isn’t compatible with the style that we’re focusing on in this class. This isn’t a huge deal, but it will get in the way of your learning, and that’s the most important point I can make here: Generating output yourself (and asking humans like me for help when you run into trouble) will help you develop your knowledge more than relying on a tool.
If you do use any generative AI in completing your work, you must explicitly acknowledge it in your submission—and you will assume responsibility for any errors the tool makes. Students who use generative AI output in their work without acknowledging it will be penalized.
1.8 Course Schedule
1.8.1 Module 1: Course Introduction (25 Aug - 31 Aug)
- read and annotate the course syllabus
- complete “Install R and RStudio” walkthrough
- introduce yourself to the class
1.8.2 Module 2: Data Science (1 Sep - 7 Sep)
- read and annotate “The New(?) and Shiny(?) Science of Data”
- complete “Getting Started with Data in R” walkthrough
- complete “Set up GitHub” walkthrough
1.8.3 Module 3: Reproducibility and Paradigms (8 Sep - 14 Sep)
- read and annotate “Research Paradigms and Reproducibility”
- complete “Using Projects and Scripts in R” walkthrough
- complete “Writing in R Markdown” walkthrough
1.8.4 Module 4: Data Sharing (15 Sep - 21 Sep)
- read and annotate “The Value of Open Data”
- complete “Find a Dataset Relevant to You” walkthrough
- read and annotate “Show Your Work”
- submit Project 1: Finding and Evaluating Data
1.8.5 Module 5: Theory and Ethics (22 Sep - 28 Sep)
- read and annotate “Numbers Don’t Speak for Themselves”
- read and annotate “Are Ethics Enough in Data Science”
- reflect on theoretical and philosophical constraints in context
1.8.6 Module 6: Data Cleaning (29 Sep - 5 Oct)
- read and annotate “Unicorns, Janitors, and Rock Stars”
- complete “Wrangling and Tidying Data” walkthrough
- practice wrangling and tidying your own data
1.8.7 Module 7: Data Visualization (6 Oct - 12 Oct)
- read and annotate “Subjectivity in Data Visualization”
- complete “Data Visualization” walkthrough
- practice visualizing your own data
1.8.8 Module 8: Descriptive Statistics (13 Oct - 19 Oct)
- read and annotate “Statistics and Scientific Racism”
- complete “Descriptive Statistics” walkthrough
- calculate descriptive statistics for your own data
- submit Project 2: Exploring and Describing Data
1.8.9 Module 9: Basic Regression (20 Oct - 26 Oct)
- read and annotate “Basic Regression”
- complete “Basic Regression” walkthrough
- perform a basic regression with your own data
1.8.10 Module 10: Multiple Regression (27 Oct - 2 Nov)
- read and annotate “Consequences of Failed Predictions”
- complete “Multiple Regression” walkthrough
- perform a multiple regression with your own data
1.8.11 Module 11: Statistical Sampling (3 Nov - 9 Nov)
- read and annotate “Samples and Populations”
- complete “Sampling” walkthrough
- explore sampling with your own data
1.8.12 Module 12: Confidence Intervals (10 Nov - 16 Nov)
- read and annotate “Confident About What?”
- complete “Confidence Intervals” walkthrough
- explore confidence intervals with your own data
1.8.13 Module 13: Hypothesis Testing (17 Nov - 23 Nov)
- read and annotate “The Danger of False Positives”
- complete “Hypothesis Testing” walkthrough
- explore hypothesis testing with your own data
1.8.14 Module 14: Inferential Regression (24 Nov - 30 Nov)
- read and annotate “Small Stories vs. Big Data”
- complete “Inferential Regression” walkthrough
- perform an inferential regression with your own data
- submit Project 3: Building and Evaluating Models