Scientific Graphing in Python
In my last few articles, I looked at several different Python modules that are useful for doing computations. But, what tools are available to help you analyze the results from those computations? Although you could do some statistical analysis, sometimes the best tool is a graphical representation of the results. The human mind is extremely good at spotting patterns and seeing trends in visual information. To this end, the standard Python module for this type of work is matplotlib. With matplotlib, you can create complex graphics of your data to help you discover relations.
You always can install matplotlib from source; however, it's easier to install it from your distribution's package manager. For example, in Debian-based distributions, you would install it with this:
sudo apt-get install python-matplotlib
The python-matplotlib-doc package also includes extra documentation for matplotlib.
Like other large Python modules, matplotlib is broken down into several sub-modules. Let's start with pyplot. This sub-module contains most of the functions you will want to use to graph your data. Because of the long names involved, you likely will want to import it as something shorter. In the following examples, I'm using:
import matplotlib.pyplot as plt
The pyplot interface is modeled on the plotting commands of MATLAB. The graphical functions are broken down into two broad categories: high-level functions and low-level functions. These functions don't work directly with your screen. All of the graphic generation and manipulation happens via an abstract graphical display device, which matplotlib calls a backend. This means the functions behave the same way everywhere, and all of the display details are handled by the backend. Backends may represent display screens (through a GUI toolkit) or file storage formats, such as PNG, PDF or SVG. The general work flow is to do all of your drawing in memory on the abstract graphics device. You then push the final image out to the screen or to a file in one go.
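As a rough illustration of that work flow, the following sketch selects Agg, a file-oriented graphics device (matplotlib's name for such a device is a backend), draws in memory and only then writes the image out. The filename and the data are made up for this example:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders in memory, opens no window
import matplotlib.pyplot as plt

# Draw everything in memory first.
plt.plot([1, 2, 3, 4, 3, 2, 1])

# Then push the finished image out to a file in one go.
# The path is hypothetical; any writable location works.
out_path = os.path.join(tempfile.gettempdir(), "backend_demo.png")
plt.savefig(out_path)
plt.close()

print("wrote", out_path)
```

The plotting call is the same one you would use with a window on screen; only the backend selection and the final output step differ.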
The simplest example is to plot a series of numbers stored as a list. The code looks like this:
plt.plot([1,2,3,4,3,2,1])
plt.show()
The first command plots the data stored in the given list as a regular line plot. If you have a single list of values, they are assumed to be the y-values, with the list index giving the x-values. Because you did not set up a specific graphics device, matplotlib assumes a default device mapped to whatever physical display you are using. After executing the first line, you won't see anything on your display. To see something, you need to execute the second command, show(). This pushes the graphics data out to the physical display (Figure 1).
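If you want to check how a single list is interpreted, a small sketch like this one inspects the line object that plot() returns. The Agg line is an assumption that lets the code run without a display; drop it and call show() to get the window instead:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: run without a display; remove for a window
import matplotlib.pyplot as plt

values = [1, 2, 3, 4, 3, 2, 1]
lines = plt.plot(values)  # plot() returns a list of line objects
line = lines[0]

# The x-values are the list indices; the y-values are the list itself.
x_used = [int(x) for x in line.get_xdata()]
y_used = [int(y) for y in line.get_ydata()]
print(x_used)
print(y_used)
plt.close()
```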
You should notice that there are several control buttons along the
bottom of the window, allowing you to do things like save the image
to a file. You also will notice that the graph you generated is
rather plain. You can add labels with these commands:
plt.xlabel('Index')
plt.ylabel('Power Level')
Figure 1. A basic plot window includes controls on the bottom of the pane.
You then get a graph with a bit more context (Figure 2). You can add a title for your plot with the title() command, and the plot() command is even more versatile than that. You can change the marker being drawn, along with its color, by adding a format string after the data. For example, you can make green triangles with 'g^' or blue circles with 'bo'. If you want more than one plot in a single window, you simply add them as extra arguments to plot(). So, you could plot squares and cubes on the same plot with something like this:
t = [1.0,2.0,3.0,4.0]
plt.plot(t,[1.0,4.0,9.0,16.0],'bo',t,[1.0,8.0,27.0,64.0],'sr')
plt.show()
Figure 2. You can add labels with the xlabel and ylabel functions.
Now you should see both sets of data in the new plot window (Figure 3). If you import the numpy module and use arrays, you can simplify the plot command to:
plt.plot(t,t**2,'bo',t,t**3,'sr')
Figure 3. You can draw multiple plots with a single command.
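Put together, the numpy version might look like the following sketch. The label and title strings are placeholders, and the Agg line is only an assumption so the code runs without a display:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: run without a display; remove for a window
import matplotlib.pyplot as plt

# With an array, t**2 and t**3 are computed element by element.
t = np.array([1.0, 2.0, 3.0, 4.0])

# One plot() call, two data sets: blue circles and red squares.
squares, cubes = plt.plot(t, t**2, 'bo', t, t**3, 'sr')

plt.xlabel('t')
plt.ylabel('Value')
plt.title('Squares and cubes')

print(squares.get_ydata())
print(cubes.get_ydata())
plt.close()
```

Note that a plain Python list does not support t**2, so the conversion to an array is what makes the shorter command work.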
What if you want to add some more information to your plot, maybe a text box? You can do that with the text() command, and you can set the location for your text box, along with its contents. For example, you could use:
plt.text(3,3,'This is my plot')
This will put a text area at x=3, y=3. A specialized form of text box is an annotation. This is a text box linked to a specific point of data. You can define the location of the text box with the xytext parameter and the location of the point of interest with the xy parameter. You even can set the details of the arrow connecting the two with the arrowprops parameter. An example may look like this:
plt.annotate('Max value', xy=(2, 1), xytext=(3, 1.5),
             arrowprops=dict(facecolor='black', shrink=0.05))
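To have the arrow point at real data rather than a fixed coordinate, you could compute the location first. This is only a sketch; the data, the text positions and the Agg line (used so the code runs without a display) are assumptions for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: run without a display; remove for a window
import matplotlib.pyplot as plt

values = [1, 2, 3, 4, 3, 2, 1]
plt.plot(values)

# Locate the peak so the annotation points at an actual data point.
peak_y = max(values)
peak_x = values.index(peak_y)

# A free-standing text box, positioned at x=0, y=3.5.
label = plt.text(0, 3.5, 'This is my plot')

# xy is the point of interest; xytext is where the text itself sits.
note = plt.annotate('Max value', xy=(peak_x, peak_y),
                    xytext=(peak_x + 1.5, peak_y - 0.25),
                    arrowprops=dict(facecolor='black', shrink=0.05))

print(note.xy, label.get_position())
plt.close()
```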
Several other high-level plotting commands are available. The bar() command lets you draw a bar plot of your data. You can change the width, height and colors with various input parameters. You even can add in error bars with the xerr and yerr parameters. Similarly, you can draw a horizontal bar plot with the barh() command. Or, you can draw box and whisker plots with the boxplot() command. You can create plain contour plots with the contour() command. If you want filled-in contour plots, use contourf(). The hist() command will draw a histogram, with options to control items like the bin size. There is even a command called xkcd() that sets a number of parameters so all of the subsequent drawings will be in the same style as the xkcd comics.
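As a brief sketch of two of these commands, the following draws a bar plot with error bars and a histogram with a chosen number of bins. All of the numbers are invented for illustration, and the Agg line is an assumption so the code runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: run without a display; remove for a window
import matplotlib.pyplot as plt

# Made-up measurements and uncertainties.
positions = [1, 2, 3]
heights = [4.0, 7.0, 5.0]
errors = [0.5, 1.0, 0.3]

# Bar plot: width and color are set explicitly, yerr adds vertical error bars.
plt.figure()
bars = plt.bar(positions, heights, width=0.6, color='g', yerr=errors)

# Histogram in a second figure: bins sets how many bins the data is split into.
plt.figure()
samples = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
counts, edges, patches = plt.hist(samples, bins=4)

print(len(bars), counts.sum(), len(edges))
plt.close('all')
```

hist() hands back the per-bin counts and the bin edges along with the drawn bars, which is handy if you want the numbers as well as the picture.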
Sometimes, you may want to be able to interact with your graphics. matplotlib needs to interact with several different toolkits, like GTK or Qt. But, you don't want to have to write code for every possible toolkit. The pyplot sub-module includes the ability to add event handlers in a GUI-agnostic way. The FigureCanvasBase class contains a function called mpl_connect(), which you can use to connect some callback function to an event. For example, say you have a function called onClick(). You can attach it to the button press event with this command:
fig = plt.figure()
...
cid = fig.canvas.mpl_connect('button_press_event', onClick)
Now when your plot gets a mouse click, it will fire your callback function. The mpl_connect() call returns a connection ID, stored in the variable cid in this example, that you can use to work with this callback function. When you are done with the interaction, disconnect the callback function with:
fig.canvas.mpl_disconnect(cid)
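A complete version of this pattern could be sketched as follows. The body of onClick() and the clicks list are hypothetical; the callback here simply records where each click landed. The Agg line is an assumption so the code runs without a display, so in a real session you would remove it and call show() between connecting and disconnecting:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: run without a display; remove for a window
import matplotlib.pyplot as plt

clicks = []  # hypothetical store for the clicked data coordinates

def onClick(event):
    # event.inaxes is None when the click lands outside the plotting area.
    if event.inaxes is not None:
        clicks.append((event.xdata, event.ydata))

fig = plt.figure()
plt.plot([1, 2, 3, 4, 3, 2, 1])

# Register the callback and keep the connection ID.
cid = fig.canvas.mpl_connect('button_press_event', onClick)

# In an interactive session, plt.show() would go here.

# Unregister the callback once the interaction is finished.
fig.canvas.mpl_disconnect(cid)
plt.close(fig)
```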
If you just need to do basic interaction, you can use the ginput() command. It will wait for a set number of clicks or until a timeout expires, and return a list of the coordinates of the clicks that happened on your plot. You then can process those clicks and do some kind of interactive work.
The last thing I want to cover here is animation. matplotlib includes a sub-module called animation that provides all the functionality that you need to generate MPEG videos of your data. These movies can be made up of frames of various file formats, including PNG, JPEG or TIFF. There is a base class, called Animation, that you can subclass and add extra functionality. If you aren't interested in doing too much work, there are included subclasses. One of them, FuncAnimation, can generate an animation by repeatedly applying a given function and generating the frames of your animation. Several other low-level functions are available to control creating, encoding and writing movie files. You should have all the control you require to generate any movie files you may need.
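To give a feel for FuncAnimation, here is a minimal sketch that animates a shifting sine wave. The update() function, the frame count and the timing are all choices made for this example, and the Agg line is an assumption so the code runs without a display:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: run without a display; remove for a window
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

x = np.linspace(0, 2 * np.pi, 100)
fig = plt.figure()
line, = plt.plot(x, np.sin(x))

def update(frame):
    # Called once per frame: shift the sine wave a little further each time.
    line.set_ydata(np.sin(x + 0.1 * frame))
    return line,

# 50 frames, 40 milliseconds apart when shown on screen.
anim = FuncAnimation(fig, update, frames=50, interval=40)
```

From here, something like anim.save('wave.mp4', writer='ffmpeg') should write a movie file, assuming FFmpeg is installed on your system; the filename is just a placeholder.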
Now that you have matplotlib under your belt, you can generate some really stunning visuals for your latest paper. Also, you will be able to find new and interesting relationships by graphing them. So, go check your data and see what might be hidden there.