Preparing for Data, Code, Ethics

Things to bring

It goes without saying that you should bring whatever helps you learn best: a laptop, a tablet, a notebook, colored pens, sticky notes, whatever works! But because we'll be learning to code and working on digital projects during the track, it's important that some of us have laptops. Not everyone needs one, since we'll be coding in pairs, but at least half the track will need a laptop with some important (free, open-source) software installed (see below). Because we'll be working with code, not just browsing websites or taking notes, those should be bona fide laptops (or tablet PCs) running macOS, Windows, or Linux.

Things to install

For those bringing laptops, please have the following installed before DPL if you can. (If you need help, we can definitely take care of that on Day 1.)

  • The R programming language ― alongside Python, one of the main languages for work in data science and computational statistics.
  • RStudio ― the most common and most comprehensive development environment for R.

Things to read

There is no "homework" for the first day, but here are a few optional open-access articles and books to engage with in advance of the Institute. We'll go through some of these during the Institute, but others are simply good overviews of topics we'll explore together in Fredericksburg.

Teacher Knows if You've Done the E-Reading, David Streitfeld ― a good introduction to the issue of data privacy as it applies to educational technology.

Algorithmic Harms Beyond Facebook and Google: Emergent Challenges of Computational Agency, Zeynep Tufekci ― a short, accessible academic article about the capabilities and ethics of the algorithms that increasingly function as "gatekeepers" or non-human "news editors" that determine what information we encounter online.

Hacking the Attention Economy, danah boyd ― a detailed account of how nefarious actors are using the current media landscape to spread disinformation.

The Bots That Are Changing Politics, Renee DiResta, et al. ― a summary of a number of findings from activist data scientists studying the spread of disinformation online, particularly in the context of influence operations during recent Western elections. The summary is brief, but there are a lot of links worth following if you want to explore in more detail.

Machine Learning, Kris Shaffer ― a high-level introduction to what machine learning is (and isn't), with a lot of links to helpful resources for diving deeper.

R for Data Science, Garrett Grolemund and Hadley Wickham ― the open access version of the book published by O'Reilly. An excellent introduction to coding for data analysis, and the primary resource we will use for the learn-to-code part of the track.

Data versus Democracy: How Big Data Algorithms Shape Opinions and Alter the Course of History, Kris Shaffer ― a little shameless self-promotion, but we will also be exploring concepts laid out in this book at points throughout the week.