CS 303: Designing Computer Science Experiments


Spring Quarter, 2010
Wed 3:15-6:05
Wallenberg 124

Instructors and office hours:
  • Scott Klemmer
        Tuesdays 3:30-5:00pm, Gates 384
  • Philip Levis
        Thursdays 1:30-3:30pm, Gates 358
  • Christopher Manning
        Tuesdays 2-3, Gates 158

Course assistant and office hours:
  • Katherine Breeden
        Fridays 3:30-5pm, Gates 260

    CS 303 is a graduate course that examines experimental design in computer science research. Papers often succeed or fail based on their evaluation section. The goal of CS 303 is to help you improve how you design experiments to evaluate your research. It does so by teaching you how to:

    • Reason through the strengths and limitations of an experimental design
    • Design new experiments to unambiguously measure something
    • Decide which experiments to include, given limited space

    The class will also teach basic uses of R for data analysis. Note to participants: please take a few minutes prior to the first class to download R.  

    The course will have two major parts. The first is a series of experimental case studies from human-computer interaction, natural language processing, and computer systems. These case studies will include examples of exemplary depth, standard practices, innovative designs, and unforeseen flaws. The second major part of the course is a project, where students design and execute experiments for either their own research or prior work. Members of the class will constructively critique and discuss each other's designs. Coursework for the class involves problem sets in R that recreate experimental results in papers as well as a final project.

    The course is paper-centric and has no textbook. Because the readings are typically in-depth research papers from a diverse set of fields, one of the instructors will give a 30-minute presentation on each paper's material before the class reads it.


    Syllabus

    Date Topics Reading Assignment Due
    March 31 Course Introduction and In-class HCI Experiment

    Class dataset.
      Fitts' Law: The Information Capacity of the Human Motor System in Controlling the Amplitude of Movement.

    Card et al.: Evaluation of Mouse, Rate-Controlled Isometric Joystick, Step Keys, and Text Keys for Text Selection on a CRT.
       
    Preview: Introduction to Wireless  
    April 7 Challenges of many uncontrolled variables
      Aguayo et al.: Link-level Measurements from an 802.11b Mesh Network.   Systems with R ; Human Subjects Clearance

    Preview: Introduction to Clickthrough data;  Eyetracking.
    Google blog article on eye-tracking search users  
     
    April 14
    NLP I
      Joachims et al.: Accurately Interpreting Clickthrough Data as Implicit Feedback
    [longer journal version]
      (due Friday!)
    NLP/IR with R
    Preview: Where do experimental ideas come from?

     
    April 21 Project checkpoint  
    Readings: McGuire: Creative Hypothesis Generating in Psychology

      Checkpoint writeup
    Preview: Wireless routing  
    April 28
    Examining measurement bias with diversity
      Gnawali et al.: Collection Tree Protocol
    Preview: User studies
    May 5
    Parallel vs. Serial Prototyping

    Dow et al.: The Effect of Parallel Prototyping on Design Performance, Learning, and Self-Efficacy.

      Analysis of experimental data from week 1 with R
     
    Preview: Speech recognition
    May 12 NLP II (Speech recognition errors) Goldwater et al.: Prosodic, lexical, and disfluency factors that increase speech recognition error rate
    William Morgan: Statistical Hypothesis Tests for NLP or: Approximate Randomization for Fun and Profit
      Speech rec analysis with R  
    Preview: Checkpoint details
     
    May 19 Checkpoint
          Project Checkpoint Writeup (1-2 pages)

    Preview: How to be a skeptic

     
    May 26
    Experimental Round-Table

     
    June 2 Presentations   Final Project Guidelines

    June 7 (Monday)
    Due date for final project write-ups (see Final Project Guidelines for details).


    Resources

    R

    To download R, please see the R Project homepage.

    There are many GUIs available for working in R. One such interface (recommended for novices) is R Commander, but there are many others. The Wikipedia page on R lists several. 

    Looking for a good tutorial? Norman Matloff at UC Davis has written this introduction, which is geared towards programmers. There's also this page from Illinois State with many examples of R's graphing capabilities, as well as how to input data files and perform basic statistical tests. This is the standard R introduction. Stanford's CS109L is also a valuable resource.

    NLP

    Dan Klein and Christopher Manning. Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency.

    Veselin Stoyanov, Nathan Gilbert, Claire Cardie, and Ellen Riloff. Conundrums in Noun Phrase Coreference Resolution: Making Sense of the State-of-the-Art, ACL-IJCNLP 2009.

    Stefan Riezler and John T. Maxwell III. On Some Pitfalls in Automatic Evaluation and Significance Testing for MT, MTSE 2005.

    Ahmed Hassan, Rosie Jones, and Kristina Lisa Klinkner. Beyond DCG: User Behavior as a Predictor of a Successful Search, WSDM 2010.

    Dan Klein and Christopher D. Manning. Conditional Structure versus Conditional Estimation in NLP Models, ACL 2002.

    Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke, Filip Radlinski, and Geri Gay. Evaluating the Accuracy of Implicit Feedback from Clicks and Query Reformulations in Web Search, ACM TOIS 2007.

    Sharon Goldwater, Dan Jurafsky, and Christopher Manning. Prosodic, lexical, and disfluency factors that increase speech recognition error rate, Speech Communication 52 (2010).

    Systems

    David G. Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee, Lawrence Tan, and Vijay Vasudevan. FAWN: A Fast Array of Wimpy Nodes, SOSP 2009.

    Michael Piatek, Tomas Isdal, Thomas Anderson, Arvind Krishnamurthy, and Arun Venkataramani. Do incentives build robustness in BitTorrent?, NSDI 2007.

    Dattatraya Gokhale, Sayandeep Sen, Kameswari Chebrolu, and Bhaskaran Raman. On the Feasibility of the Link Abstraction in (Rural) Mesh Networks, INFOCOM 2008.

    HCI

    Joel Brandt, Mira Dontcheva, Marcos Weskamp, and Scott R. Klemmer. Example-Centric Programming: Integrating Web Search into the Development Environment, CSTR-2009-01.

    I. Scott MacKenzie, Abigail Sellen, and William Buxton. A Comparison of Input Devices in Elemental Pointing and Dragging Tasks, CHI 1991.

    Tovi Grossman and Ravin Balakrishnan. The Bubble Cursor: Enhancing Target Acquisition by Dynamic Resizing of the Cursor's Activation Area, CHI 2005.