In this assignment you will learn some generally useful libraries, and how to set up Jupyter.
You are expected to have an account on the CS cluster and to understand the
basics of using Unix, the filesystem, logging in and out, etc. You should also
be familiar with the man command to access on-line manual pages.
You are free to use whatever text editor you like. Good ones include emacs,
vi/vim, sublime, atom and WingIDE. We've also heard good things about pycharm.
This course is *NOT* intended to be an introduction to Python. You can and should take CS1 or look through its material if you are shaky with Python syntax. By the end of this course, we will be looking at Python bytecode, so some understanding of the stack/assembly may prove helpful.
We will be using Python 3 for this course, not Python 2, but the differences are fairly small, and you can learn about them by doing a web search on "Python 2 and 3 differences" or something similar. (We expect that you are willing to do this.)
This course was written for Python 3, which is a cross-platform language.
However, this does not mean that every library is guaranteed to work on every
operating system. For example, there appear to be issues with exiting plots
generated in matplotlib on Mac OS X. For this reason, we are providing you
with a VM. You are free to use your own local environment, but be warned that
there might be some minor issues. We will then show you how to install
Anaconda, a souped-up Python distribution that comes with all the libraries
we will be using pre-installed.
The easiest way to get set up is to follow the instructions on this page. Those instructions are for the CS 11 C track, but they will work for this track too. Once this is done you will have a fully-functional Linux system ready to use. Log in to it and continue with the rest of these instructions.
Open up a web browser in your VM (there's a button on the bottom left). Go to the Anaconda download page. Click the link to download the 64-bit installer because the VM is 64-bit. If for some reason you are installing on your own machine and it is particularly old (mid-2000s or earlier) you might need 32-bit (this is highly unlikely).
Open a terminal. You can do this with ctrl + alt + t, or from the Menu in
the bottom left. Change directory to Downloads. Run
bash Anaconda3-4.3.1-Linux-x86_64.sh. Agree to the license agreement and follow
the defaults. Anaconda will now be installed to /home/student/anaconda3. This
will also install a large number of libraries. Many of them we will not use in
this track, but we will use a number of the more important ones. Note that this
installation should add Anaconda3 to your path. If it doesn't, you will have to
add it yourself.
You can check that Anaconda has been installed by running ipython in a new
terminal. This is a package that will have been installed by Anaconda. It
provides a superior interactive Python interpreter for serious users. When it
starts, it will print out (among other things):
Python 3.6.0 |Anaconda custom (64-bit)
as one of the lines of output. Another good check would be running which conda,
which will tell you where conda is installed. It should be
/home/student/anaconda3/bin/conda. Since this is part of the core of Anaconda,
you know you have installed it correctly.
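The same check can be made from inside Python. The snippet below is a minimal sketch that only uses the standard library; the paths it prints depend on your machine, and on the VM they would be expected to point into /home/student/anaconda3.

```python
import shutil
import sys

# Version string of the running interpreter; with Anaconda it normally
# mentions "Anaconda" somewhere in the text.
print(sys.version)

# Full path of the interpreter that is executing this code.
print(sys.executable)

# shutil.which() searches the PATH much like the shell's "which" command
# and returns None when nothing is found.
conda_path = shutil.which('conda')
print(conda_path if conda_path else 'conda is not on the PATH')
```

If the last line reports that conda is not on the PATH, the installer did not update your path and you will have to add it yourself.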
You are also going to need the basemap package for a later assignment, so you
may as well install it now. In the terminal, type conda install basemap. You
are now ready for the rest of this class. Note that the username for this VM is
student and its password is spring2017. This is mostly useful when updating
software or installing new software from the VM's package manager (apt).
This part is not going to be turned in. If you've ever used Mathematica, you might wish that there was an interactive Python terminal in which you could go back and modify previous lines. This exists. Both Sage and Jupyter serve this need. Jupyter is more popular at the moment. Jupyter comes installed with Anaconda. Thus, go to your shell and type:
% jupyter-notebook
(We'll use % as the Unix shell prompt; don't type it.) This will open a web page
in your default web browser. Note that the URL is localhost:8888. This means
that you are using your browser to interact with this "site", but you are not
using any resources from the internet, just your own machine. To open a new
notebook, click New and select Python [default]. This will bring you to a new
page that will look a bit like a Mathematica notebook. Go ahead and type some
Python code. Hit Shift-Enter to run the code in a single cell. Variables stored
in one cell will be available in another. This course won't explicitly require
Jupyter notebooks, but they are a good resource for helping you to write your
code. You can close Jupyter from your browser, or by hitting Control-C in the
shell.
Both Jupyter notebooks and IPython in the terminal have access to special
commands prefixed by %. Some that are particularly useful are listed below.
%magic or %quickref gives a list of all inline commands.
%notebook -e foo.py in a terminal will export the current session to the notebook file foo.py.
%pastebin will let you do the same to pastebin, but as a text file.
%time and %cd let us access regular command utilities.
%prun runs the profiler. It is available automatically instead of requiring a special import.
%psearch lets you run a wildcard search on objects in the session.
%store and %store -r let variables be stored between sessions.
As you might recall from CS1 (or any other programming you may have done), your code will invariably have bugs in it. Rereading the code might help, but it is typically faster to find out which line is doing something unexpected by printing out some relevant variables and comparing them to your expectations. In languages like Python this is often perfectly fine. However, with code that is (for instance) more vulnerable to race conditions, the print statements can sometimes change how the code executes. This stems from the fact that printing to the screen is slow compared to most other operations. Consider:
def foo():
    s = 0
    for i in range(10):
        print(i)
        s += i
    print('done')
compared to
def bar():
    s = 0
    for i in range(10):
        s += i
    print('done')
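To get a feel for how much the print calls cost, the two versions can be timed. The following is a rough sketch, not a careful benchmark: summing_loop and timed are hypothetical helper names, the loop length of 1000 is arbitrary, and the measured times will vary from machine to machine and from run to run.

```python
import time

def summing_loop(n, verbose):
    """Sum 0..n-1, optionally printing every value along the way."""
    s = 0
    for i in range(n):
        if verbose:
            print(i)
        s += i
    return s

def timed(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

quiet_sum, quiet_time = timed(summing_loop, 1000, False)
noisy_sum, noisy_time = timed(summing_loop, 1000, True)

# Both loops compute the same sum; only the time taken differs.
print('without print: {:.6f} s'.format(quiet_time))
print('with print:    {:.6f} s'.format(noisy_time))
```

On most systems the printing version is noticeably slower when the output goes to a terminal, which is the timing difference that can disturb code sensitive to race conditions.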
A somewhat related issue is dealing with large code bases. It's fine if your hundred line script prints out an error message every once in a while, but every time something goes wrong for an operating system or a web server, it can't just be printed to the screen. However, the information as to what went wrong might still be necessary if some user wants to diagnose and fix their machine.
Thus, we introduce the logging library. Available in both Python 2 and 3,
the built-in logging module is versatile and useful. If you need more
information than what is contained in this set, look at the documentation
here. Let's take a look at how the logging module works. Logging can be
configured using logging.basicConfig(). It is a function that takes named
arguments to determine what sorts of logging should be done. You can set log
messages to go to a file using filename='mylog.txt', or set the format of the
messages to log using format='%(message)s'. Then you can log messages using
logging.debug(), .info(), .warning(), .error() and .critical().
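As a minimal sketch of the filename and format arguments, the snippet below sends messages to a file and then reads the file back. Two things here are assumptions rather than part of the assignment: the log is written into a temporary directory so that nothing real gets overwritten, and force=True (available from Python 3.8) is passed so that the configuration applies even if logging was already set up earlier in the session.

```python
import logging
import os
import tempfile

# Assumption: a throwaway directory, so no existing mylog.txt is touched.
log_path = os.path.join(tempfile.mkdtemp(), 'mylog.txt')

logging.basicConfig(
    filename=log_path,                  # write to a file, not the console
    format='%(levelname)s:%(message)s', # level name, a colon, then the text
    force=True,                         # Python 3.8+: replace earlier handlers
)

logging.warning('Disk space is running low.')
logging.error('Could not open the database.')

# Read the file back to see exactly what was recorded.
with open(log_path) as f:
    log_text = f.read()
print(log_text)
```

On Python versions older than 3.8, drop force=True and run the snippet in a fresh interpreter instead.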
Let's try some examples:
import logging
logging.warning('This is a test warning.')
logging.critical('It\'s important to understand logging levels.')
logging.info('The five different logging levels were mentioned above.')
[5] What happened? Write your answer in a comment. (Numbers in brackets are time estimates in minutes, not mark counts.)
Logging handles messages differently depending on their logging level. A critical message is on a higher level than an info or a warning. We can change the default settings to force all of the levels to be treated the same.
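# basicConfig() only takes effect if the root logger has no handlers yet,
# so run this in a fresh interpreter (or pass force=True on Python 3.8+).
logging.basicConfig(level=logging.DEBUG)
logging.debug('This message should now display.')
logging.error('And so should this one')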
Because the level was set to DEBUG, all levels DEBUG and up will be
handled by the basic handler. We'll get to handlers in just a minute. Note that
we said "DEBUG and up." As should be fairly clear, critical messages are more
important than debug or info messages. The tiers, in order of increasing
importance, are: debug, info, warning, error, critical. A debug message is like
a print statement that you want for developing or bug testing your code, but
that an end-user shouldn't see.
logging.debug('The velocity is {}'.format(velociraptor1.velocity))
An info alerts the end user as to what the code is doing.
logging.info('Purging all files on C:\\.')
A warning warns that something might be going wrong, but that the program is trying to handle it.
logging.warning('No initial item database provided. Generating default table.')
An error implies that something is wrong, but can perhaps be dealt with. It won't necessarily cause a crash, but likely some issue.
logging.error('The sum of these positive numbers is negative.')
A critical implies a catastrophic failure. The program, and possibly even the whole system, is likely to crash.
logging.critical('Insufficient resources for this many requests.')
Adjacent logging levels can be fairly similar, and there is often some overlap. The reason to keep in mind which levels are which is that you can customize how each type of message is handled. A given kind of message can be ignored, written to a file or printed to the console.
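The ordering of the levels is easy to inspect, because each level is just an integer constant in the logging module. The sketch below prints them and then asks a logger which levels it would handle; the logger name level_demo is a made-up name used only for illustration.

```python
import logging

# The five standard levels, from least to most important.
levels = [logging.DEBUG, logging.INFO, logging.WARNING,
          logging.ERROR, logging.CRITICAL]

for level in levels:
    # getLevelName() maps the number back to its name, e.g. 10 -> 'DEBUG'.
    print(logging.getLevelName(level), level)

# Hypothetical logger name, used only for this demonstration.
demo = logging.getLogger('level_demo')
demo.setLevel(logging.WARNING)

# A logger handles a message only if the message's level is at least
# as high as the logger's own level.
print(demo.isEnabledFor(logging.INFO))   # False
print(demo.isEnabledFor(logging.ERROR))  # True
```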
Now we are going to talk about handlers, and then you're going to get to log what happens in some code.
There are three main parts to how logging actually works: the logger, the handler and the formatter.
The logger contains the information about logging itself. It knows what level
of logging to pass to the handler(s) and has some assigned handler(s). Instead
of using logging.info(), we could create a custom logger called mylogger
(using logging.getLogger('mylogger'), which is the recommended way to obtain a
logging.Logger instance) and use mylogger.info(). The handler tells us what to
do with the message. It has an assigned formatter, and can also have a logging
level. The real beauty here is in the number of handler classes available. The
two most common ones are the StreamHandler and the FileHandler. However, more
exotic ones exist, like the SMTPHandler, which emails messages, or the
RotatingFileHandler, which automatically starts a new log file (and discards
the oldest backups) when the file hits a particular maximum size, to avoid
wasting disk space.
The formatter is the simplest part. It takes the message from
logger.info('Test'), and formats it. It can do things like add a timestamp
(useful if tasks are taking longer than they should), inform you as to what
module the message came from (useful if many modules have similar messages)
and more.
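The three parts can be wired together by hand. The following is a minimal sketch with a single handler; the in-memory stream stands in for the console so the result can be inspected, and the format string and message texts are illustrative choices, not requirements of the assignment.

```python
import io
import logging

# Stands in for the console so the formatted output can be examined.
stream = io.StringIO()

# Formatter: decides what each output line looks like.
formatter = logging.Formatter('%(name)s [%(levelname)s] %(message)s')

# Handler: decides where messages go, and may filter by its own level.
handler = logging.StreamHandler(stream)
handler.setLevel(logging.WARNING)
handler.setFormatter(formatter)

# Logger: the object the program calls; it passes records to its handlers.
mylogger = logging.getLogger('mylogger')
mylogger.setLevel(logging.DEBUG)
mylogger.addHandler(handler)
mylogger.propagate = False  # keep records away from the root logger's handlers

mylogger.info('Test')                  # passes the logger, dropped by the handler
mylogger.warning('Disk almost full')   # passes both, so it is written

output = stream.getvalue()
print(output)
```

Only the warning reaches the stream, formatted as the logger name, the level in brackets, and the message.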
[5] Why is it a good idea to let the logger set the level (rather than the handler(s)) if possible? Answer in a comment.
More information can be found in this tutorial. Please read it now and refer to it in what follows.
[10] Ben Bitfiddle has written the code in lab1b.py. Help him debug it by using
logging. Do not actually fix the code, but only the debug messages. By that, we
mean remove the print statements and convert them to logging debug statements.
Use the basicConfig tool.
[40] Ben Bitfiddle has also written the code in lab1c.py. Help him by using
logging. Use good judgment as to what levels of logging to use. This code is
production code, so debug information should be suppressed, info information
should be saved to lab1c.log, and warnings (and higher) should be sent to the
console. Hint: This will require multiple handlers. Don't worry about the
format you choose.
collections Library
Trainer Codestar is working on making his own Pokedex. He wants to count how many Pokemon of each species he has encountered. Unfortunately, he's really busy getting ready for the Elite Four, so you'll have to help. Write the following function so that it takes a string, and adds it to the dictionary. If it is already in the dictionary, increment the counter instead.
GLOBAL_POKEMON_DICT = {}
def poke_sort(pokemon_name):
    # TODO
Professor Oak looks over your code and explains to you what a defaultdict
is. The collections library contains a number of useful data types, one of
which is a defaultdict. A defaultdict is initialized as follows:
a = defaultdict(arg)
where the argument arg is either a builtin type or a lambda function that
returns a value (more generally, any callable that takes no arguments). When a
piece of code tries to extract a value from the defaultdict and the defaultdict
does not explicitly have any data in that slot, the lambda is evaluated to get
the default value, or the default value of that type is used. For instance:
from collections import defaultdict
a = defaultdict(int)
a['hi'] = 9
print(a['hi']) # 9
print(a['fish']) # 0: default value of type int
Since we used int as our default type, we get back 0 in the second case.
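The default does not have to be a number. The sketch below shows two other common choices, a list and a lambda; the words and server names are invented purely for illustration.

```python
from collections import defaultdict

# Group words by their first letter. A missing key starts out as an
# empty list, so append() works without checking for the key first.
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)
print(dict(by_letter))

# A lambda lets you pick any default value you like.
status = defaultdict(lambda: 'unknown')
status['server1'] = 'up'
print(status['server2'])     # unknown

# Looking up a missing key also stores the default under that key.
print('server2' in status)   # True
```

The last line is worth remembering: merely reading a missing key from a defaultdict adds that key to it.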
[10] Rewrite the poke_sort function with a defaultdict. Note the increased
cleanliness of the code. If you need the documentation, it's here.
GLOBAL_POKEMON_DEF_DICT = ????
def poke_sort2(pokemon_name):
    # TODO
Sometimes you want some benefits of a class structure, but without the
overhead of making an entire class. The namedtuple satisfies some of these
needs. Suppose you want a rectangle object, but there aren't any methods you can
think of. It seems like a bit of a pain to make a new class just for that. So
maybe a tuple is in order. However, should the tuple be (x1,x2,y1,y2) or
should it be (x1,y1,x2,y2)? Or maybe (x1,y1,x_side_length,y_side_length)?
It doesn't really matter, but if you come back to this project in a few weeks,
you might not remember the convention. And anyone who looks at this code in the
future is going to spend a few minutes thinking about it. t[0] + t[1] doesn't
mean anything by itself. However, rect1.x1 + rect1.x2 does mean something.
Thus, we have the namedtuple.
a = namedtuple(class_name, lst)
where the class_name refers to the name of the class you would have made, and
the lst is a list of the field names (as strings) of the proposed class.
Thus, we would create the Rect class by doing
from collections import namedtuple
Rect = namedtuple('Rect', ['x1', 'x2', 'y1', 'y2'])
Now we can do either of:
t = Rect(0, 0, 1, 1)
t = Rect(x1=0, x2=0, y1=1, y2=1)
And we can access pieces using t.x1 or t[0].
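A namedtuple is still a tuple, so it also supports unpacking and comparison, and it cannot be modified in place. The following self-contained sketch illustrates this with the same Rect definition; the variable name moved is just for illustration.

```python
from collections import namedtuple

Rect = namedtuple('Rect', ['x1', 'x2', 'y1', 'y2'])
t = Rect(x1=0, x2=0, y1=1, y2=1)

# Unpacks like any ordinary tuple, in field order.
x1, x2, y1, y2 = t

# Fields cannot be assigned to; _replace() builds a modified copy instead.
moved = t._replace(x2=5)
try:
    t.x1 = 3
except AttributeError:
    print('namedtuples are immutable')

# Compares equal to a plain tuple with the same contents.
print(t == (0, 0, 1, 1))   # True
print(moved)               # Rect(x1=0, x2=5, y1=1, y2=1)
```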
NOTE: There is nothing to hand in for this section (on namedtuple).
It's for informational purposes only.
Hand in the lab1a.py, lab1b.py, lab1c.py and lab1d.py programs. Part A
should be only comments. Please follow the same style guidelines as CS1, which
follow PEP8.