At this point we have some fancy formulas and lots of nice numbers. What we need to do now is see how well our formulas model the numbers. My favorite software for this task is JMP from SAS Institute. It’s a great piece of software that will do just about anything you can imagine within the realm of statistics. Unfortunately, I’m not a student or a teacher, I don’t have a university research budget, and I’m not a “One Percenter”, so the $1350 license fee puts it out of reach for me. Instead, I use an open source (as in FREE) software package called R. You can find it here. It runs on a variety of hardware platforms and does everything we need. Unfortunately for the casual user, it has one of the most obtuse, obscure, and infuriating command line interfaces on the face of the earth. But don’t worry, I’ve spent hours figuring out the commands and will describe what’s happening as we use them. Well, at least MY understanding of what they’re doing.
Here’s the cut and paste of the data from the “Perfect Data” page. It has been cleaned up: the column headings were simplified and there is only one space between the columns.
Yep, it’s ugly but it’s not for our consumption.
Starting R gives you something that looks like this:
The “>” is where you enter all your R commands. The first commands we have to issue are to read in our data and make it so that we can access it using the column headers. Here’s how you do that:
The first line is “pronounced”: “Read the data table ask21_polar.txt into a variable named pdata. The first line of the data table contains the column names.” Of course, “/Users/wemrt/desktop/ask21_polar.txt” is particular to my machine, so substitute the path to wherever you saved the file. The attach function lets us access each column of data by its column name, and the head function just displays the first few rows of whatever is within its parentheses.
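For anyone who wants to try these steps without my file, here is a minimal sketch of the same read, attach, head sequence. It assumes nothing about the real data: the temporary file, the column names speed and sink, and the three rows of values are all made-up placeholders, not the contents of ask21_polar.txt.

```r
# Hypothetical stand-in for ask21_polar.txt. The column names and values
# are placeholders for illustration only, not the data from this post.
path <- tempfile(fileext = ".txt")
writeLines(c("speed sink",
             "40 1.5",
             "50 2.0",
             "60 3.0"), path)

# Read the table into pdata; header = TRUE says the first line of the
# file holds the column names. Whitespace is the default separator.
pdata <- read.table(path, header = TRUE)

# attach() puts the columns on R's search path so they can be used by name.
attach(pdata)

# head() displays the first rows (up to six by default) of its argument.
head(pdata)

# With pdata attached, a bare column name refers to that column.
mean(speed)

# Undo the attach when finished so the names don't linger.
detach(pdata)
```

With your own file, replace path with the quoted location of the file on your machine. If you would rather skip attach, pdata$speed reaches the same column without touching the search path.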
These commands get the data loaded and give us easy access to the columns. In the next post we’ll get busy with the heavy lifting.