University of Oxford

Sep 26-27, 2017

9:00--17:00

Instructors: Iain Emsley, Lucia Michielin, Pip Willcox

Helpers: Laura Fortunato, Aaron Ceross, Stephen Jones, Alex Orlek

General Information

Data Carpentry workshops are for any researcher who has data they want to analyze, and no prior computational experience is required. This hands-on workshop teaches basic concepts, skills and tools for working more effectively with data.

We will cover Data organization in spreadsheets and OpenRefine, Introduction to Python, Data analysis and visualization in Python and SQL for data management. Participants should bring their laptops and plan to participate actively. By the end of the workshop learners should be able to more effectively manage and analyze data and be able to apply the tools and approaches directly to their ongoing research.

This workshop is run by the Reproducible Research Oxford project. For announcements about future workshops and related activities, check our project website, subscribe to our mailing list, and follow us on Twitter @RR_Oxford.

Who: The course is aimed at students, researchers, and staff of the University of Oxford. Please register with your .ox.ac.uk email address.

You don't need to have any previous knowledge of the tools that will be presented at the workshop.

Where: Institute of Cognitive and Evolutionary Anthropology, 64 Banbury Road, Oxford, OX2 6PN. Get directions with OpenStreetMap or Google Maps.

Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed (listed below). They are also required to abide by Data Carpentry's Code of Conduct.

Accessibility: We are committed to making this workshop accessible to everybody. The workshop organisers have checked that:

Materials will be provided in advance of the workshop, and large-print handouts are available if you notify the organisers in advance. If we can help make learning easier for you (e.g. sign-language interpreters, lactation facilities), please get in touch and we will attempt to provide them.

Contact: Please email iain.emsley@oerc.ox.ac.uk for more information.


Schedule

Surveys

Please be sure to complete these surveys before and after the workshop.

Pre-workshop Survey

Post-workshop Survey

Day 1

Morning Data organization in spreadsheets and OpenRefine
Afternoon Introduction to Python

Day 2

Morning Data analysis and visualization in Python
Afternoon SQL for data management

Programming in Python

Data Organisation in Spreadsheets

Data Cleaning with OpenRefine

Data Management with SQL


Setup

To participate in a Data Carpentry workshop, you will need working copies of the software described below. Please make sure to install everything (or at least to download the installers) before the start of your workshop. Participants should bring and use their own laptops to ensure the proper setup of tools for an efficient workflow once they leave the workshop.

We maintain a list of common installation issues on the Configuration Problems and Solutions wiki page. It is intended as a reference for instructors, but participants may find it useful too.

Spreadsheet

To work with spreadsheets, we can use Microsoft Excel, OpenOffice.org, or other programs. Commands may differ a bit between programs, but the general ideas for thinking about spreadsheets are the same. For this lesson, if you don’t have a spreadsheet program already, you can use LibreOffice. It’s a free, open source spreadsheet program.

Windows

Only if you don't have MS Excel installed. Install LibreOffice by going to the download page. Your download should begin automatically. You will go to a page that asks about a donation, but you don’t need to make one.

Mac OS X

Only if you don't have MS Excel installed. Install LibreOffice by going to the download page. Your download should begin automatically. You will go to a page that asks about a donation, but you don’t need to make one.

Linux

Install LibreOffice by going to the download page. The version for Linux should automatically be selected. Click Download Version 5.3.X. You will go to a page that asks about a donation, but you don’t need to make one. Your download should begin automatically.

OpenRefine

For this lesson you will need OpenRefine (formerly Google Refine) and a web browser.

Note: this is a program that runs on your machine (not in the cloud). It is accessed via your browser, but no web connection is needed.

Windows

  • OpenRefine uses the Java Runtime Environment. If you don't already have it installed, install it from here.
  • Download the OpenRefine 2.7 Windows Kit from http://openrefine.org
  • Unzip the downloaded file into a directory by right-clicking and selecting “Extract…”. Name that directory something like 'OpenRefine'. Remember where you extracted it.
  • Go to your newly created OpenRefine directory using File Explorer.
  • Double click "openrefine" (the icon is a blue diamond). A black console window will appear, and your default browser will open shortly afterwards.
  • If OpenRefine does not automatically open for you, point your web browser at http://127.0.0.1:3333/ or http://localhost:3333.

Mac OS X

  • OpenRefine uses the Java Runtime Environment. To check you have Java installed, open System Preferences and look for a Java icon. If you don't have it, download and install it.
  • Download the OpenRefine 2.7 Mac Kit from http://openrefine.org
  • Open the downloaded file and drag the OpenRefine icon to Applications as instructed.
  • Launch OpenRefine from Applications.
  • If you receive a warning about installing untrusted applications, open a terminal (Applications -> Utilities -> Terminal), type spctl --add /Applications/OpenRefine.app, and try again.
  • If OpenRefine does not automatically open for you, point your web browser at http://127.0.0.1:3333/ or http://localhost:3333.

Linux

  • OpenRefine uses the Java Runtime Environment. To check if you have Java installed, open a terminal and type java -version. If you don't have it, then run sudo apt-get install default-jre (Ubuntu) or sudo dnf install java-1.8.0-openjdk (Fedora).
  • Download the OpenRefine 2.7 Linux kit from http://openrefine.org
  • Unzip the downloaded file into a directory. Name that directory something like "OpenRefine".
  • Go to your newly created OpenRefine directory.
  • Type ./refine into the terminal within the OpenRefine directory
  • If OpenRefine does not automatically open for you, point your browser at http://127.0.0.1:3333/ or http://localhost:3333 to launch the program.
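If you already have Python installed, the two checks described in the OpenRefine steps (is Java available, and is OpenRefine answering on port 3333) can be scripted. The following is only a minimal sketch using the Python standard library; the function names java_on_path and openrefine_is_running are made up for illustration and are not part of OpenRefine.

```python
import shutil
import urllib.error
import urllib.request


def java_on_path():
    """Return the path to the `java` executable, or None if it is not on PATH."""
    return shutil.which("java")


def openrefine_is_running(url="http://127.0.0.1:3333/", timeout=3):
    """Return True if an HTTP server answers at `url` without an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("java found at:", java_on_path())
    print("OpenRefine reachable:", openrefine_is_running())
```

A result of False from the second check only means that nothing answered at that address; start OpenRefine first and wait a few seconds before running it.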

Python

Python is a popular language for scientific computing, and great for general-purpose programming as well. Installing all of its scientific packages individually can be a bit difficult, so we recommend Anaconda, an all-in-one installer.

Regardless of how you choose to install it, please make sure you install Python version 3.x (e.g., 3.4 is fine).
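One way to confirm which version you ended up with is to ask the interpreter itself. This is a small sketch; the helper name is_python3 is hypothetical and exists only for this example.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True if the given version tuple belongs to the Python 3.x series."""
    return version_info[0] == 3


if __name__ == "__main__":
    # sys.version starts with the version number, e.g. "3.x.y ...".
    print("Running Python", sys.version.split()[0])
    print("Python 3.x:", is_python3())
```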

Windows

Video Tutorial
  1. Open http://continuum.io/downloads with your web browser.
  2. Download the Python 3 installer for Windows.
  3. Install Python 3 using all of the defaults for installation, except make sure to check "Make Anaconda the default Python".

Mac OS X

Video Tutorial
  1. Open http://continuum.io/downloads with your web browser.
  2. Download the Python 3 installer for OS X.
  3. Install Python 3 using all of the defaults for installation.

Linux

  1. Open http://continuum.io/downloads with your web browser.
  2. Download the Python 3 installer for Linux.
  3. Install Python 3 using all of the defaults for installation. (Installation requires using the shell. If you aren't comfortable doing the installation yourself, stop here and request help at the workshop.)
  4. Open a terminal window.
  5. Type
    bash Anaconda3-
    and then press tab. The name of the file you just downloaded should appear.
  6. Press enter. You will follow the text-only prompts. When there is a colon at the bottom of the screen press the down arrow to move down through the text. Type yes and press enter to approve the license. Press enter to approve the default location for the files. Type yes and press enter to prepend Anaconda to your PATH (this makes the Anaconda distribution the default Python).
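After installation on any of the three platforms, a short script can show which interpreter is the default and whether the scientific packages can be found. This is a rough sketch, not the official test procedure: the package list (numpy, pandas, matplotlib) is an assumption, since this page does not name specific packages, and check_packages is a hypothetical helper.

```python
import importlib.util
import sys


def check_packages(names):
    """Map each top-level package name to True if it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # If Anaconda is the default Python, this path should point inside
    # the directory where Anaconda was installed.
    print("Interpreter:", sys.executable)
    # Assumed package list; adjust it to whatever your lessons require.
    for name, found in check_packages(["numpy", "pandas", "matplotlib"]).items():
        print(name, "OK" if found else "MISSING")
```

Run it with the same python command you plan to use at the workshop, so that the check applies to the interpreter you will actually be using.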

SQLite

SQL is a specialized programming language used with databases. We use a very lightweight database system called SQLite in our lessons. On its own, it's so light, it doesn't even include a user interface! So, we use DB Browser for SQLite.
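To give a feel for what working with SQLite without a graphical interface looks like, here is a minimal sketch using the sqlite3 module from the Python standard library. The table name and values are invented for illustration and are not workshop data.

```python
import sqlite3


def demo_query():
    """Create a throwaway in-memory database and return the rows of one query."""
    conn = sqlite3.connect(":memory:")  # nothing is written to disk
    try:
        conn.execute("CREATE TABLE samples (name TEXT, n INTEGER)")
        conn.executemany(
            "INSERT INTO samples VALUES (?, ?)",
            [("a", 3), ("b", 7), ("c", 5)],  # made-up example rows
        )
        rows = conn.execute(
            "SELECT name, n FROM samples WHERE n > 4 ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return rows


if __name__ == "__main__":
    print(demo_query())
```

DB Browser for SQLite lets you run the same kind of SELECT statement and shows the result as a table instead of a printed list.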

Windows

Download and install DB Browser for SQLite (Windows)

Mac OS X

Download and install DB Browser for SQLite (Mac)

Linux

Download and install DB Browser for SQLite (Linux)

Once you are done installing the software listed above, please go to this page, which has instructions on how to test that everything was installed correctly.