Motivation

Much of a researcher’s time in modern economics and business research is spent in front of a computer performing some form of computational analysis, whether analyzing data or simulating economic models. Until recently, there has been little emphasis on teaching early-career researchers how to perform their computational tasks and manage the resulting projects in a structured and efficient way. Classroom exposure to programming languages is often limited to simple use of Stata and Matlab to solve 'toy' examples designed to illustrate a theoretical result or to implement a method with known properties and results known ex ante. These skills do not scale up in a straightforward manner to the complex projects that make up research papers, PhD theses, or typical work in government or private business settings. As a result, young economics and business researchers spend too much time wrestling with software and too little time doing research, which is where their comparative advantage lies.

This course is designed to improve learners’ programming abilities and to equip them with skills that reduce the time they spend wrestling with software. It is aimed at PhD students who expect to write their theses in a field that requires modest to heavy use of computation and data analysis. Examples include applied microeconomics, econometrics, macroeconomics, quantitative marketing, quantitative finance, and other fields that either involve real-world data or do not generally lead to analytical models with closed-form solutions.

The course introduces students to a new set of tools and programming methods that aim to reduce time spent programming while making programs more dependable and results reproducible. It draws extensively on techniques and tools that are the backbone of modern software development and large-scale data science. Students gain insight into the usefulness of these techniques and how to use them through hands-on examples from a wide variety of applications in economics and business research.

Target Audience

This course is intended for PhD students in economics and business who are transitioning from coursework to research. Aside from your economics/business background, we will only assume that you have written small pieces of code before, such as Stata .do files or Matlab .m files for problem sets in your Master’s degree or first-year PhD classes. Knowledge of a specific programming language is not required.

A large part of this course is about tool choice. We will point out which language is most appropriate for which problem and introduce you to two popular choices for data- and computation-intensive work, Python and R. We also introduce a toolkit designed to improve the replicability of your code. The programming languages and tools introduced in the course are not the only choices available, but a solid grounding in these languages and best practices will make it relatively easy to pick up others on your own.

Course Objectives

By the end of this course, students will be able to:

  1. Use the command line to navigate a computer’s file system, copy and move files, and create and edit files.
  2. Explain different variable types and their advantages and disadvantages in Python and R.
  3. Construct scripts that load, manipulate, and visualize data in Python and R.
  4. Implement statistical and economic models in Python and R.
  5. Run Python and R inside a GUI and from the command line (including passing complex arguments).
  6. Use and manage conda environments.
  7. Explain the advantages of using version control software such as Git as opposed to ‘manual’ version control.
  8. Manage a version-controlled project using Git.
  9. Explain what a workflow management system is and its advantages for economics and business research.
  10. Manage a research project using the workflow management software Snakemake.

Learning objectives for specific modules will be provided within the Course Notes.

Rules of the Game

The class is designed to be hands-on: you will write a lot of code during each session. We strongly believe the only way to learn programming is to do programming. Please bring your laptop with you to each session and install the required software before the course begins. Try to complete each activity we do in class and be prepared to ask and answer questions during class. Slides or notes will be made available at the beginning of each day, and code that solves the exercises will be posted during or after the session.