In this assignment you will learn some of the basics of pandas, a standard data analysis library for Python. You will also review plotting using matplotlib.
In the Pokemon games, pocket monsters or "pokemon" have stats that determine how well they fight and their properties in battle. These properties include how fast they move, how hard they hit and how much punishment they can take before fainting. Before any special considerations are made (training, enhancements etc.) these stats are at their base level, a.k.a. base stats. In addition, every pokemon has a type. These fanciful monsters can be related to fire, water, grass, electricity etc. We want to know how the type impacts base stats. There's no reason that the type really should affect the base stats, but we want to know if pokemon of a particular type are more likely to have higher hitpoints (health). We could just take an average, but we want the full distribution, and we want to graph it.
Fortunately, all of the data is easily accessible. Unfortunately, it is stored in an annoying format. The CSV files ("comma-separated values") that we're providing you don't just link pokemon to their base stats and types. For reasons that will be explored in CS 121, individual pokemon are mapped to an integer id. Types are mapped to an id. And pokemon ids are mapped to type ids. Matching this up is going to be very difficult in NumPy because NumPy arrays can only hold one type of data. We want to store names (strings) and data (ints). It is possible to do this in NumPy, but NumPy isn't designed for this purpose. Fortunately, the Pandas library (Python Data Analysis) is designed for this sort of task.
The CSV files we are providing are:
type_names maps a type_id and language_id to the name of a type.
pokemon_types maps pokemon_ids to type_ids.
pokemon maps pokemon names to pokemon_ids.
stats maps the name of a stat to the stat_id.
pokemon_stats maps pokemon_id and stat_id to the base stat for that pokemon.
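As a rough illustration of how id-based tables like these can be matched up, here is a minimal sketch using small in-memory tables instead of the real files. The column names (in particular name and type_name) and the handful of rows are assumptions made for the example; check the actual CSV headers before relying on them.

```python
import pandas as pd

# Toy stand-ins for three of the CSV files. Column names and rows are
# assumptions for illustration, not the contents of the real files.
pokemon = pd.DataFrame({
    "pokemon_id": [1, 4, 7],
    "name": ["bulbasaur", "charmander", "squirtle"],
})
pokemon_types = pd.DataFrame({
    "pokemon_id": [1, 1, 4, 7],
    "type_id": [12, 4, 10, 11],
})
type_names = pd.DataFrame({
    "type_id": [4, 10, 11, 12],
    "type_name": ["Poison", "Fire", "Water", "Grass"],
})

# Chain two merges: pokemon -> pokemon_types (on pokemon_id),
# then the result -> type_names (on type_id).
named_types = (
    pokemon
    .merge(pokemon_types, on="pokemon_id")
    .merge(type_names, on="type_id")
)

# A pokemon with two types shows up as two rows.
print(named_types[["name", "type_name"]])
```

Each merge matches rows whose key column agrees, which is the step that is awkward to express with plain NumPy arrays.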
[5] Note that Pandas is better for importing CSV data than NumPy, so use the Pandas read_csv function. Play around with the data some. Take a look at pandas.DataFrame's slicing and .loc attribute. (There is nothing to hand in for this problem.)
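To get a feel for read_csv and .loc, the following sketch reads CSV text from an in-memory buffer rather than from one of the provided files. The column names (pokemon_id, stat_id, base_stat) and the four rows are assumptions for the example only.

```python
import io

import pandas as pd

# Toy CSV text standing in for a file on disk; column names are assumed.
csv_text = """pokemon_id,stat_id,base_stat
1,1,45
1,2,49
4,1,39
4,2,52
"""

# read_csv accepts a path or any file-like object.
stats = pd.read_csv(io.StringIO(csv_text))

# Inside .loc, a boolean mask selects rows and a list selects columns.
first_stat = stats.loc[stats["stat_id"] == 1, ["pokemon_id", "base_stat"]]

# With an index set, .loc looks rows up by label rather than by position.
by_pokemon = first_stat.set_index("pokemon_id")
value = by_pokemon.loc[4, "base_stat"]

print(first_stat)
print(value)
```

With a real file you would pass its path to read_csv in place of the StringIO buffer.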
[30] Now, what we want to do is graph a histogram for the HP (hit point) stat for every type. We expect to see a uniform distribution for all of the types. Do we? Using the subplots feature, graph the histograms for every pokemon type in the same figure but on different plots. Be sure to label everything. Your code should output a PNG file that you can look at in e.g. a web browser.
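The subplot layout can be sketched on made-up numbers before you wire in the real data. In the example below the two types, the HP values, the column names and the output filename are all invented for illustration; only the general pattern (one axes per group, labels on each, one PNG at the end) is the point.

```python
import matplotlib
matplotlib.use("Agg")  # render straight to a file; no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up HP values for two types, for illustration only.
hp = pd.DataFrame({
    "type_name": ["Fire"] * 4 + ["Water"] * 4,
    "base_stat": [39, 58, 78, 65, 44, 59, 79, 50],
})

# groupby yields (key, sub-frame) pairs, sorted by key.
groups = list(hp.groupby("type_name"))

# One column of axes per type; squeeze=False always returns a 2-D array.
fig, axes = plt.subplots(1, len(groups), sharex=True, sharey=True,
                         figsize=(8, 3), squeeze=False)

for ax, (type_name, rows) in zip(axes.flat, groups):
    ax.hist(rows["base_stat"], bins=5, range=(0, 100))
    ax.set_title(type_name)
    ax.set_xlabel("Base HP")
    ax.set_ylabel("Number of pokemon")

fig.tight_layout()
fig.savefig("hp_by_type_sketch.png")  # hypothetical output name
plt.close(fig)
```

Sharing the x and y axes across subplots makes the distributions directly comparable by eye.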
Political scientists have great interest in large data sets in the form of election data. Here is some election data from the 2012 US Presidential election (Obama vs. Romney), broken down by county. Note that the file format is .xlsx. We *could* transform this into a CSV, but pandas can import such data directly!
For each of the following problems, write Pandas code to answer the question (and include that in your submission!) and then write the answers in comments.
[5] How many votes did each candidate receive?
[10] Who won the popular vote for the entire country?
[10] Which candidate won more counties?
[15] Which candidate won which states? Note that your code might output a Pandas type. Write the answers in this form:
# state-abbreviation candidate
For instance, we might have:
# AK Romney
# AL Romney
# AR Romney
# AZ Romney
# ...
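One way to go from county rows to per-state winners is a groupby followed by idxmax, sketched here on invented data. The column names (state, obama_votes, romney_votes), the two placeholder state codes and the vote counts are all hypothetical; the real spreadsheet's columns will differ.

```python
import pandas as pd

# Invented county rows; the real spreadsheet's column names may differ.
counties = pd.DataFrame({
    "state": ["AA", "AA", "BB", "BB"],
    "obama_votes": [100, 300, 500, 50],
    "romney_votes": [250, 200, 100, 75],
})

# Sum the county rows within each state.
state_totals = counties.groupby("state")[["obama_votes", "romney_votes"]].sum()

# idxmax(axis=1) gives, for each row, the label of the column with the
# largest value; map() turns that column label into a candidate name.
winner = state_totals.idxmax(axis=1).map(
    {"obama_votes": "Obama", "romney_votes": "Romney"}
)

# Print in the "# state-abbreviation candidate" form requested above.
for state, candidate in winner.items():
    print(f"# {state} {candidate}")
```

The result is a Series indexed by state, which is the kind of "Pandas type" output the problem mentions.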
[15] Did third party candidates split the vote? Plot PCT_OBM ("percent Obama") against PCT_OTHR ("percent other") and put in a trendline. Do the same for Romney. Is there anything notable here?
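A trendline can be drawn by fitting a degree-1 polynomial with numpy.polyfit, which is one common choice and not necessarily the one intended here. The sketch below uses invented percentages that lie exactly on a line so the fit is easy to check; the output filename is also hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render straight to a file; no display needed
import matplotlib.pyplot as plt
import numpy as np

# Invented percentages, chosen to lie exactly on y = 70 - 5x.
pct_othr = np.array([1.0, 2.0, 3.0, 4.0])
pct_obm = 70.0 - 5.0 * pct_othr

# A degree-1 polyfit returns the least-squares slope and intercept.
slope, intercept = np.polyfit(pct_othr, pct_obm, 1)

fig, ax = plt.subplots()
ax.scatter(pct_othr, pct_obm, label="counties")

# Evaluate the fitted line over the observed x range.
xs = np.linspace(pct_othr.min(), pct_othr.max(), 50)
ax.plot(xs, slope * xs + intercept, color="black", label="least-squares fit")

ax.set_xlabel("PCT_OTHR")
ax.set_ylabel("PCT_OBM")
ax.legend()
fig.savefig("trendline_sketch.png")  # hypothetical output name
plt.close(fig)
```

With real county data the points will scatter around the line, and the sign and size of the slope are what the question asks you to interpret.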
[30] Let's draw a political map of the United States. Color each state Red (Romney) or Blue (Obama) based on who won. We will need a Basemap shapefile to find and draw these colors. We are supplying you with these shape files which you should download:
Note that the data we have uses the abbreviations of the states rather than the full names which Basemap uses. It would be poor style to add this data directly into your code. We want some file to convert between them. It's not fun to do this by hand, so we're providing one called states.py. (Import this into your code.)
Hints: You will want to read in the data about the states with:
map = Basemap(... your arguments here ...)
map.readshapefile('st99_d00', name='states', drawbounds=True)
You will want to fill in a state by building a Polygon based on this shape data, and adding it to the axis with ax.add_patch.
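The Polygon step can be tried without Basemap by using hand-made outlines, as in the sketch below. The two square "states", their names, the winners and the output filename are invented stand-ins. With Basemap, the readshapefile call in the hint is expected to expose the vertex lists as map.states and the per-shape attributes as map.states_info, but that is an assumption to verify against your installed version.

```python
import matplotlib
matplotlib.use("Agg")  # render straight to a file; no display needed
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon

# Hand-made outlines standing in for the (x, y) vertex lists that a
# shapefile would provide. Names and results are invented.
shapes = {
    "Left State": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "Right State": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
winners = {"Left State": "Romney", "Right State": "Obama"}
colors = {"Romney": "red", "Obama": "blue"}

fig, ax = plt.subplots()
for name, vertices in shapes.items():
    # Build a filled patch from the vertex list and attach it to the axes.
    patch = Polygon(vertices, closed=True,
                    facecolor=colors[winners[name]], edgecolor="black")
    ax.add_patch(patch)

# add_patch does not rescale the view, so set the limits explicitly.
ax.set_xlim(0, 2)
ax.set_ylim(0, 1)
ax.set_aspect("equal")
fig.savefig("map_sketch.png")  # hypothetical output name
plt.close(fig)
```

A state made of several separate shapes (islands, for example) would need one Polygon per shape, all filled with the same color.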
The files lab4a.py and lab4b.py.