Overview and Goals
This course is intended to provide students experience with data science within a framework of data ethics in service of equity-oriented public policy. Our primary goals are:
- Make progress on a project that advances social justice and policy understanding in collaboration with a community partner, and have a project you can point to as an example of your work.
- Practice working with real data (that is, messy data resulting from policy administration) to answer pressing questions with attention to the moral and ethical implications of our work. This includes finding, cleaning, and understanding data; exploring, analyzing, and modeling data; and visualizing, contextualizing, and communicating data, all with care, humility, and respect for the affected partners and communities throughout.
- Develop experience in data workflows that support ethical data science, including processes for working collaboratively, openly, inclusively, and reproducibly.
Spring 2022 Project
Our Spring 2022 partner is the UVA Legal Data Lab. We have been working with them to develop a platform to make anonymized court record data from the Virginia Judiciary more accessible for use by researchers and advocates to better understand Virginia’s judicial system and the potential inequities in the disposition of cases brought before the courts in both the criminal and civil arenas.
As the Legal Data Lab team works to launch and build an interface for the community to find, query, and use this valuable database, we will undertake some initial analysis of subsets of this data to create a series of policy-oriented data stories to accompany the database platform. These will serve as examples of the types of research the court data enables, provide models of reproducible analysis, and be an educational resource for others wishing to use the data platform for analysis and advocacy.
About Our Work
In this class, we’ll be learning how to use data to answer important and public-minded questions with an emphasis on justice and equity. You don’t need to be an expert in statistics or R or visualization or coding coming in; the project will provide opportunities to develop these skills further. Good data-oriented work comes from the collaboration of people with diverse talents, perspectives, and expertise. We need individuals interested in data wrangling and analysis, working in R, and visualizing data effectively; but we also need people with an understanding of or interest in the civil and criminal legal systems, who can ask probing questions and think creatively; and members who can help teams work better and keep projects moving, who can dig into a new substantive area to synthesize information, and who can communicate about our work effectively and help us convey people-centered stories and explanations. That doesn’t mean everyone won’t be responsible for learning about and contributing to each step of the project, but you may find that at some points you are more of a learner and at other points you are more of a leader.
After the first few weeks, we’ll begin working in smaller teams to define and develop analysis around specific questions. We’ll need to quickly develop a better understanding of the state court system and processes. We’ll need to understand the data we have, consider its provenance, and evaluate its limitations. There will be no right answers, and we don’t have predetermined outcomes. So it’s just like real-life data science work - embrace the ambiguity!
R, RStudio, and RStudio Cloud
We will be using the programming language R for our data work. R is a free, open-source language (😄) most commonly used with RStudio, a free user interface (❤️). R users are an amazing bunch and have generated a variety of free learning resources on which we’ll rely (🎉).
For our work together, we’ll be using RStudio Cloud. This will give us a common space to share projects and data, and it means you don’t have to install anything on your computer. You’ll need to sign up for the RStudio Cloud Free plan, which will provide you with a private workspace. I’ll share a link for you to join our class space.
As part of our ethic, we’ll center free resources and materials licensed by the library for our use. Consequently, all reading materials will be available for free. Data practice resources and references are online; readings on data ethics and power along with material on courts, criminal, and civil case analysis will be made available on Perusall or our class slack (though many of these are also available online).
Getting Help and the Class Slack
Computers are stupid, which can make programming hard. Any little error – a small typo, a missed closing parenthesis – can cause inordinate pain (especially if you’re tired). Some of that is unavoidable when learning a new language. But don’t beat your head against the wall when you get stuck. Seek help!
There are many online resources. Two of the most useful are StackOverflow (a Q&A site with hundreds of thousands of answers to all sorts of programming questions) and RStudio Community (a forum specifically designed for people using RStudio).
The UVA Library also has an amazing team of consultants in the StatLab. They’ve got online how-tos you might find useful (including this one on how to ask reproducible questions on StackOverflow or the RStudio Community) and you can email them at firstname.lastname@example.org to set up a consultation (you shouldn’t use them for your problem sets, but they’re a great resource if you’re looking for help on the final visualization or after our class is done).
Additionally, we have a class chatroom at Slack where anyone in the class can ask questions and anyone can answer. You’re encouraged to do both and you should check the slack regularly. It will be our primary means of communication. If you have questions, someone else probably does too, so slack will help us share knowledge widely. You’ll likely be able to answer others’ questions, and you should! That’s part of our collaborative learning.
Evaluation of Our Work
Grades will depend on your learning and contribution to our collective effort, which will be assessed by a combination of individual work and team-based work.
Reading Annotations, 20%
To facilitate understanding and discussion we’ll all contribute to collaborative annotation and commenting on the reading using Perusall. I will review these ahead of class to help organize our discussion.
We have reading assigned on 9 out of 13 weeks (minus the first week). Contributing annotations to readings for at least 6 weeks is necessary for full credit. Make at least four full contributions for a given day’s reading.
A full contribution will go beyond agreeing with a point or asking a question. It will engage a point, connect it to other ideas we might learn from, amend it to suggest a more limited or wider scope, note conditions under which it is more or less relevant, ponder how it might inform your project or other work of which you are aware, or ask a question and provide an initial answer. If others have already commented on a passage and you want to address and elaborate on the initial comment (beyond basic agreement), you should. Think about the kinds of things you might raise in a discussion and bring them up in the annotations.
Individual Data/R Assignments, 20%
In the first four weeks, individuals will complete assignments to become more comfortable using R and to learn about the court data. If you get stuck, feel free to ask for help on slack so that the learning can be shared. If you’re able to answer others’ questions, you should; it’s a great chance to deepen your own understanding by explaining ideas.
Midterm exam, 10%
We’ll have one exam focused on the data ethics readings and their application to your primary research questions. This is really an opportunity to revisit and synthesize the ideas we’ve been reading and reflect on what they might suggest for the question your team is exploring.
Team Assignments, 40%
In the last two-thirds of the class, we’ll begin working in groups to complete data analysis and stories centered on selected questions. Once teams have been formed, I expect groups to meet each week between classes to work together (via Zoom, through chat in slack, or on a preferred platform). In addition, each team will meet with me at least once along the way.
Team work will include
- Data work, 15%: Each team will take on responsibility for a different research question, exploring the relevant parts of the data set, setting up the data to address those questions, creating visuals and comparisons, and possibly implementing models and results. Each group will submit four updates reporting on their progress and showing their work to date (March 2, March 23, April 13, April 20). Based on that progress, we’ll collectively determine the next steps. Though I have something of an outline in my head of where I imagine we’ll be from week to week, applied research and data analysis often refuse to conform to planned timelines. In part, that’s what makes it exciting, but I know that can be uncomfortable for some, so I want to be upfront that the project will require some flexibility on all our parts.
- Literature synthesis, 10%: Each team will dive further into research and literature relevant to their question, first identifying three articles or reports that will help you make sense of your question (March 16), followed by a written synthesis of these sources (March 30).
- Presentation, Final Analysis and Documentation, 15%: Each team will complete a report of their work, as R Markdown and knitted HTML files, suitable for inclusion in the Court Data platform page. This should, at a minimum, outline the key question they’ve been addressing and how the literature intersects with the question, communicate their process and highlight concerns and caveats, and explain their findings in a way accessible to a broader community.
Collaborative Learning, 10%
Learning new things is tough. Supportive colleagues can make it easier. For this reason, you will be evaluated based on the extent to which you help each other learn through our discussions, our team practice, and our class slack.
In addition, once we begin submitting team assignments, each group will provide feedback in response to another group’s progress reports, sharing ideas, making suggestions, identifying elements that need more clarification or thought, and proposing additional steps.
- Whichever answers below describe your prior relationship with R:
  - It’s one of my top 5 favorite letters
  - I’ve admired it from afar
  - I’ve run a few lines of code in R (e.g., provided by others and tweaked)
  - I’ve written R code from scratch (e.g., without following a template)
  - I’m comfortable doing basic data manipulation and analysis in R
  - I’m comfortable doing advanced data manipulation and analysis in R
  - I’m an R masteR and should probably be TA’ing this course
- An image or gif of an adorable animal. I think we’re all going to need a little cute animal therapy this semester, so let’s start now! Here’s one that reflects my current vibe…
In response, I will invite you to the class slack!