Assignment 1: Analyzing Systems Data with R

Due: 3:15pm, April 7th, by email to cs303@cs.stanford.edu (see the submission guidelines below).

Introduction:
In this assignment, you will use R to recreate some of the plots from the Roofnet measurement paper and dig a little deeper into some of the results.

One of the very nice things about the Roofnet paper is that all of its experimental data can be downloaded as a single bzipped tarball, and this has led to several follow-on papers that examine the results more deeply. In the standard Roofnet dataset, one file describes all transmitted packets and another describes all received packets. The receive file is over 1GB, which is more than R can comfortably handle. So for this assignment we've preprocessed the dataset to make it easier to load and process in R. We've also cleaned up a few rough edges, such as a few nodes that have receive records but no send records.

The dataset can be found here and is laid out as follows:

 roofnet-links/
   1/     1Mbps measurements
   2/     2Mbps measurements
   5.5/   5.5Mbps measurements
   11/    11Mbps measurements

Within each measurement directory, there are two kinds of files: comma-separated value (.csv) files, each listing the packets received on a single directional link, and text (.txt) files, each stating how many packets a given sender sent.

The csv files have a header describing the fields. The four fields you care about are:

 src: the source of the packet
 dst: the destination (who received the packet)
 noise: the noise value as described in the paper
 signal: the signal strength value as described in the paper

Note that the src and dst fields are constant within a given csv and also appear in its file name, so you will generally only need the noise and signal values. Recall that because signal and noise are on a logarithmic scale, the signal-to-noise ratio (SNR) is simply signal - noise.
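As an illustration of the SNR arithmetic only, here is a minimal Python sketch (your solution should be in R). It assumes, as described above, that the first line of a link's csv is a header naming the `signal` and `noise` columns; any other columns are ignored. The function name `packet_snrs` is hypothetical.

```python
import csv
from typing import Iterable, List


def packet_snrs(lines: Iterable[str]) -> List[float]:
    """Return the per-packet SNR for one link's csv file.

    ASSUMPTION: the first line is a header that names at least the
    'signal' and 'noise' columns; all other columns are ignored.
    """
    reader = csv.DictReader(lines)
    # Signal and noise are both logarithmic, so their ratio is a difference.
    return [float(row["signal"]) - float(row["noise"]) for row in reader]
```

Usage: pass an open file, e.g. `with open(path) as f: snrs = packet_snrs(f)`, where `path` is whichever link csv you are examining.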

The txt files have just two entries: the sender ID and the number of packets it sent. To compute a link's packet reception ratio, count the entries (received packets) in the link's csv and divide that count by the number of packets the source sent.
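The reception-ratio calculation can be sketched in Python as follows (again, only to show the arithmetic; the assignment asks for R). The function names are hypothetical. The sketch assumes the two entries in a txt file are separated by whitespace or a comma, which the description above does not specify. It also assumes each csv has exactly one header row, which is not counted as a packet.

```python
import csv
from typing import Iterable


def sent_count(txt: str) -> int:
    """Parse a sender txt file's contents: '<sender ID> <packets sent>'.

    ASSUMPTION: the two entries are separated by whitespace or a comma.
    """
    fields = txt.replace(",", " ").split()
    return int(fields[1])


def reception_ratio(csv_lines: Iterable[str], packets_sent: int) -> float:
    """Packet reception ratio = received packets (csv rows) / packets sent."""
    reader = csv.reader(csv_lines)
    next(reader, None)  # skip the header row; it is not a received packet
    received = sum(1 for row in reader if row)  # ignore any blank lines
    return received / packets_sent
```

Because each csv covers a single directional link, its `src` field is the sender, so the divisor comes from that sender's txt file.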

Collaboration policy: You must complete Problem 1 individually. You may collaborate on Problem 2; if you do, clearly state in your answer whom you collaborated with. The R program for Problem 1 can be written in about 20 lines (probably fewer).

Problem 1 (8 points):
Recreate the four plots in Figure 14 of the Roofnet paper using R, and compare them against the paper's figure to check that they look correct.

Problem 2 (2 points):
For each link, compute the minimum SNR of any packet received on it, and plot a histogram of these values. You should see some links with packet SNRs far lower than the curves in Figure 12 suggest are possible. Why might these values occur? Do you think they are outliers that should be excluded from the analysis, or is something else at work? Make a (brief) argument for your case.

Submission guidelines:
Your submission should be an email to the staff mailing list with five attachments. The first four must be named suid-1.pdf, suid-2.pdf, suid-5.5.pdf, and suid-11.pdf, where suid is your SUID and each PDF is the corresponding plot from Problem 1. The fifth must be named suid-roofnet.r and contain your R code for Problem 1. The body of the email should contain your answer to Problem 2 (and the names of any collaborators).