CS 11 Python Track: Assignment 4: Pokemon and Elections

Goals

In this assignment you will learn some of the basics of pandas, a standard data analysis library for Python. You will also review plotting with matplotlib.

Covered this week

The pandas data analysis library
Review of plotting with matplotlib

Instructions

lab4a: Pokemon HP

In the Pokemon games, pocket monsters or "pokemon" have stats that determine how well they fight and their properties in battle. These properties include how fast they move, how hard they hit and how much punishment they can take before fainting. Before any special considerations are made (training, enhancements etc.) these stats are at their base level, a.k.a. base stats. In addition, every pokemon has a type. These fanciful monsters can be related to fire, water, grass, electricity etc. We want to know how the type impacts base stats. There's no reason that the type really should affect the base stats, but we want to know if pokemon of a particular type are more likely to have higher hitpoints (health). We could just take an average, but we want the full distribution, and we want to graph it.

Fortunately, all of the data is easily accessible. Unfortunately, it is stored in an annoying format. The CSV ("comma-separated values") files that we're providing don't directly link pokemon to their base stats and types. For reasons that will be explored in CS 121, each pokemon is mapped to an integer id, each type is mapped to an id, and pokemon ids are mapped to type ids. Matching these up would be awkward in NumPy, because a NumPy array normally holds a single type of data, and we want to store both names (strings) and data (ints). It is possible to do this in NumPy, but NumPy isn't designed for it. Fortunately, the Pandas library (Python Data Analysis) is designed for this sort of task.

The CSV files we are providing are:

type_names maps a type_id and language_id to the name of a type.
pokemon_types maps pokemon_ids to type_ids.
pokemon maps pokemon names to pokemon_ids.
stats maps the name of a stat to the stat_id.
pokemon_stats maps pokemon_id and stat_id to the base stat for that pokemon.
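
To make the id-to-name matching concrete, here is a minimal sketch of joining tables like these with DataFrame.merge. The small tables are built by hand, and the column names ("id", "identifier", "pokemon_id", "type_id", "name") and values are assumptions for illustration only; check the headers of the real CSV files before relying on them.

```python
import pandas as pd

# Toy stand-ins for three of the provided tables.
# Column names and values are assumptions, not the real file contents.
pokemon = pd.DataFrame({
    "id": [1, 4, 7],
    "identifier": ["bulbasaur", "charmander", "squirtle"],
})
pokemon_types = pd.DataFrame({
    "pokemon_id": [1, 1, 4, 7],
    "type_id": [12, 4, 10, 11],
})
type_names = pd.DataFrame({
    "type_id": [4, 10, 11, 12],
    "name": ["Poison", "Fire", "Water", "Grass"],
})

# Join pokemon -> pokemon_types on the pokemon id, then attach the type name.
joined = (
    pokemon.merge(pokemon_types, left_on="id", right_on="pokemon_id")
           .merge(type_names, on="type_id")
           .rename(columns={"identifier": "pokemon", "name": "type"})
)

# One row per (pokemon, type) pair; a pokemon with two types appears twice.
result = joined[["pokemon", "type"]]
print(result)
```

Each merge plays the role of one "is mapped to" step in the description above. The same pattern would extend to the stats tables.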

[5] Note that Pandas is better for importing CSV data than NumPy, so use the Pandas read_csv function. Play around with the data some. Take a look at pandas.DataFrame's slicing and its .loc attribute. (There is nothing to hand in for this problem.)
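
As a starting point for exploring, the sketch below reads CSV text with read_csv and selects rows with .loc. It uses an inline string instead of a file so that it runs on its own; the column names and numbers are assumed for illustration and may not match the provided files.

```python
import io

import pandas as pd

# Inline CSV text standing in for a file on disk (assumed columns and values).
csv_text = """pokemon_id,stat_id,base_stat
1,1,45
1,2,49
4,1,39
"""

# read_csv accepts a file path or any file-like object.
stats = pd.read_csv(io.StringIO(csv_text))

# Boolean mask with .loc: rows where stat_id == 1, keeping two columns.
hp_rows = stats.loc[stats["stat_id"] == 1, ["pokemon_id", "base_stat"]]

# Label-based lookup with .loc after choosing pokemon_id as the index.
by_pokemon = hp_rows.set_index("pokemon_id")
first_hp = by_pokemon.loc[1, "base_stat"]
print(hp_rows)
print(first_hp)
```

With a real file you would pass its path to read_csv instead of the StringIO object.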

[30] Now, what we want to do is graph a histogram for the HP (hit point) stat for every type. We expect to see a uniform distribution for all of the types. Do we? Using the subplots feature, graph the histograms for every pokemon type in the same figure but on different plots. Be sure to label everything. Your code should output a PNG file that you can look at in e.g. a web browser.
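
The subplots layout can be sketched as follows. The HP values here are random numbers, not the real data, and the type list, grid shape, and output filename are arbitrary choices for the example.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic HP values; the real ones would come from the joined CSV tables.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "type": np.repeat(["Fire", "Water", "Grass", "Electric"], 50),
    "hp": rng.integers(20, 120, size=200),
})

types = sorted(df["type"].unique())
ncols = 2
nrows = -(-len(types) // ncols)  # ceiling division
fig, axes = plt.subplots(nrows, ncols, figsize=(8, 6), sharex=True, sharey=True)
axes = np.atleast_1d(axes).ravel()

# One histogram per type, each on its own labeled axis.
for ax, type_name in zip(axes, types):
    ax.hist(df.loc[df["type"] == type_name, "hp"], bins=10)
    ax.set_title(type_name)
    ax.set_xlabel("Base HP")
    ax.set_ylabel("Number of pokemon")

# Hide any unused panels when the grid has more cells than types.
for ax in axes[len(types):]:
    ax.set_visible(False)

fig.tight_layout()
out_path = Path("hp_by_type_example.png")
fig.savefig(out_path)
```

Sharing the x and y axes makes the panels directly comparable, which helps when judging whether the distributions differ between types.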

lab4b: Elections

Political scientists have great interest in large data sets such as election data. Here is some election data from the 2012 US Presidential election (Obama vs. Romney), broken down by county. Note that the file format is .xlsx. We *could* transform this into a CSV, but pandas can import such data directly!

For each of the following problems, write Pandas code to answer the question (and include that in your submission!) and then write the answers in comments.

[5] How many votes did each candidate receive?

[10] Who won the popular vote for the entire country?

[10] Which candidate won more counties?
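
The first three questions reduce to column sums and row-wise comparisons. Here is a minimal sketch on made-up county rows; the column names STATE, OBAMA and ROMNEY are assumptions, so substitute whatever the spreadsheet actually uses.

```python
import pandas as pd

# Made-up county rows; column names and numbers are assumptions.
counties = pd.DataFrame({
    "STATE": ["AA", "AA", "BB", "BB", "BB"],
    "OBAMA": [100, 50, 300, 20, 10],
    "ROMNEY": [80, 70, 100, 40, 30],
})
candidates = ["OBAMA", "ROMNEY"]

# Total votes per candidate: sum each column.
totals = counties[candidates].sum()

# Popular-vote winner: the column label with the largest total.
popular_winner = totals.idxmax()

# County winner: for each row, the column label with the larger count.
county_winner = counties[candidates].idxmax(axis=1)
counties_won = county_winner.value_counts()

print(totals)
print(popular_winner)
print(counties_won)
```

In this toy data one candidate has more total votes while the other wins more counties, so the two questions need not have the same answer.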

[15] Which candidate won which states? Note that your code might output a Pandas type. Write the answers in this form:

# state-abbreviation candidate

For instance, we might have:

# AK Romney
# AL Romney
# AR Romney
# AZ Romney
# ...
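
One possible way to get from county rows to lines in this form is to group by state first. The sketch below uses made-up data and assumed column names (STATE, OBAMA, ROMNEY), with placeholder state abbreviations.

```python
import pandas as pd

# Made-up county rows; column names and numbers are assumptions.
counties = pd.DataFrame({
    "STATE": ["AA", "AA", "BB", "BB"],
    "OBAMA": [100, 50, 300, 20],
    "ROMNEY": [80, 90, 100, 40],
})

# Sum the county votes within each state.
state_totals = counties.groupby("STATE")[["OBAMA", "ROMNEY"]].sum()

# For each state, take the column label with the larger total.
state_winner = state_totals.idxmax(axis=1).str.title()

# Format as comment lines: "# state-abbreviation candidate".
lines = [f"# {state} {winner}" for state, winner in state_winner.items()]
print("\n".join(lines))
```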

[15] Did third party candidates split the vote? Plot PCT_OBM ("percent Obama") against PCT_OTHR ("percent other") and put in a trendline. Do the same for Romney. Is there anything notable here?
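
A trendline can be drawn in several ways. The sketch below uses a least-squares straight line from numpy.polyfit, which is one reasonable choice rather than a required one. The percentages are synthetic, generated with a built-in negative slope, so the plot says nothing about the real data.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic percentages standing in for the PCT_OTHR and PCT_OBM columns.
rng = np.random.default_rng(1)
pct_othr = rng.uniform(0, 5, size=100)
pct_obm = 50 - 2.0 * pct_othr + rng.normal(0, 3, size=100)

# Degree-1 polynomial fit: returns the slope first, then the intercept.
slope, intercept = np.polyfit(pct_othr, pct_obm, deg=1)

fig, ax = plt.subplots()
ax.scatter(pct_othr, pct_obm, s=10, label="counties (synthetic)")
xs = np.linspace(pct_othr.min(), pct_othr.max(), 100)
ax.plot(xs, slope * xs + intercept, color="black",
        label=f"trendline (slope {slope:.2f})")
ax.set_xlabel("PCT_OTHR")
ax.set_ylabel("PCT_OBM")
ax.legend()

out_path = Path("trendline_example.png")
fig.savefig(out_path)
```

The sign and size of the fitted slope are the quantities to compare between the Obama and Romney plots.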

[30] Let's draw a political map of the United States. Color each state Red (Romney) or Blue (Obama) based on who won. We will need a Basemap shapefile to find and draw these colors. We are supplying you with these shape files which you should download:

Note that the data we have uses the abbreviations of the states rather than the full names which Basemap uses. It would be poor style to add this data directly into your code. We want some file to convert between them. It's not fun to do this by hand, so we're providing one called states.py. (Import this into your code.)

Hints: You will want to read in the data about the states with:

    map = Basemap(... your arguments here ...)
    map.readshapefile('st99_d00', name='states', drawbounds=True)

You will want to fill in a state by building a Polygon based on this shape data, and adding it to the axis with ax.add_patch.
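
The Polygon and ax.add_patch step can be tried without Basemap. In this sketch the state outlines are hard-coded rectangles and the winners are made up; with Basemap, the point lists would instead come from the shape data loaded by readshapefile.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon

# Hypothetical outlines (lists of (x, y) points) and made-up winners.
shapes = {
    "AA": [(0, 0), (2, 0), (2, 1), (0, 1)],
    "BB": [(2, 0), (4, 0), (4, 1), (2, 1)],
}
winners = {"AA": "Romney", "BB": "Obama"}
colors = {"Romney": "red", "Obama": "blue"}

fig, ax = plt.subplots()
for abbrev, points in shapes.items():
    # Build a filled polygon from the outline and add it to the axis.
    patch = Polygon(points, closed=True,
                    facecolor=colors[winners[abbrev]], edgecolor="black")
    ax.add_patch(patch)

# add_patch does not rescale the view, so set the limits explicitly.
ax.set_xlim(0, 4)
ax.set_ylim(0, 1)
ax.set_aspect("equal")

out_path = Path("polygon_example.png")
fig.savefig(out_path)
```

A state made of several separate shapes (islands, for example) would need one Polygon per shape, all filled with the same color.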

To hand in

The files lab4a.py and lab4b.py.