Physics Analysis Workstation
CERN is the European Laboratory for Particle Physics. It has been in the news quite a bit lately with the discovery of the Higgs boson at the Large Hadron Collider. What many people may not know is that CERN also has a long tradition of developing software for scientific use. The HTML document format and the first Web browser were both developed there as a way to share rich documents that could include links among many different sources of information. The idea proved so useful that it sparked the World Wide Web. Along with such widely used software, CERN has produced a great deal of scientific software, especially for physics.
In this article, I take a look at a fairly large group of modules and libraries called the Physics Analysis Workstation (PAW). PAW contains several thousand subroutines and programs written in FORTRAN, C and even some assembly language, and it is built on top of a library called the CERN Program Library (CERNLIB).
If you have special needs, you can download the source code from the main Web site and build it yourself, but given the long list of required external libraries, I suggest avoiding that if possible. Packages should be available for your distribution. On Debian-based distros, you can install everything you need with the command:
sudo apt-get install paw
PAW also includes a large set of graphing and data visualization routines to help with data analysis. Sometimes you need to see what your data looks like before you can figure out what further analysis to pursue.
PAW is an interactive system in which you apply commands to your data set. The original interface was a command line, but several other interfaces have since been added that you can try out. If you open a terminal, type the command paw and press Enter, you are asked which terminal type you want to use (Figure 1). The default is type 1, which opens a HIGZ graphics window where your plots are displayed (Figure 2). If you are using PAW on a remote machine, you probably will want to use a different type. You can get a list of the available types by typing a question mark (?). For a regular xterm, enter 7879.
Figure 1. You can select the terminal type to use when you start PAW.
Figure 2. The default is to open a graphics window to draw your plots into, along with a command interface.
Once everything has finished loading, you are presented with a prompt that looks like this:
PAW >
Now you can start typing commands and doing data analysis. But what commands can you use? Luckily, PAW includes a built-in help system that you can access by typing the help command, which pops up a list of topics.
Commands in PAW are organized in a tree structure, with the top-most level being the topics that are listed when you start the help system. Quite a bit of documentation also is available on the main Web site, including tutorials and a very large FAQ.
Because PAW is used for data analysis, let's start with the kinds of data you can use. PAW has three main data types: VECTORS, HISTOGRAMS and NTUPLES. VECTORS store arrays of reals or integers and can have up to three dimensions (indexes). They are manipulated with the VECTOR group of commands. Commands in PAW are not case-sensitive, but most documentation shows them in uppercase. You also can abbreviate commands, as long as each abbreviation matches exactly one full command. For example, you can create a new VECTOR of 20 elements with the command:
VECTOR/CREATE vec1(20)
This creates a VECTOR named "vec1". You then can fill in its elements with this command:
VECTOR/INPUT vec1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
The command takes a vector name followed by a list of values. That is fine if you are dealing with only a small set of data. If you have larger data sets stored in files, you can use the VECTOR/READ command instead. It takes a filename, along with several other options, such as the format of the elements, and loads the data from the file into the given VECTORS.
The optional format string is similar to the format specifications used for reading and writing data in FORTRAN code, so a refresher may be a good idea if it has been a while since you last used FORTRAN.
You can write data out to a file with the inverse command, VECTOR/WRITE.
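If your FORTRAN is rusty, the central idea behind a format such as 3F10.2 is fixed-width columns: a repeat count (3), a field width (10 characters) and the number of digits after the decimal point (2). The following Python sketch models only that single F descriptor, to show how such records are laid out and read back. It is an illustration, not PAW's parser: real FORTRAN formats support many more descriptors, exponent notation isn't handled, and the function names here are hypothetical.

```python
import re

# Matches a single F edit descriptor such as "F10.2", "3F10.2" or "(3F10.2)":
# optional repeat count, field width, digits after the decimal point.
F_FORMAT = re.compile(r"^\(?(\d*)F(\d+)\.(\d+)\)?$", re.IGNORECASE)


def parse_f_format(fmt):
    """Return (repeat, width, decimals) for a simple nFw.d format string."""
    match = F_FORMAT.match(fmt.strip())
    if match is None:
        raise ValueError(f"unsupported format: {fmt!r}")
    repeat, width, decimals = match.groups()
    return int(repeat or 1), int(width), int(decimals)


def write_f_record(values, fmt):
    """Write values right-justified in fixed-width fields, one record per call."""
    repeat, width, decimals = parse_f_format(fmt)
    if len(values) > repeat:
        raise ValueError("more values than fields in the format")
    fields = []
    for value in values:
        text = f"{value:{width}.{decimals}f}"
        # FORTRAN fills a field with asterisks when the value does not fit.
        fields.append(text if len(text) <= width else "*" * width)
    return "".join(fields)


def read_f_record(line, fmt):
    """Read `repeat` fixed-width fields from one record and return floats."""
    repeat, width, decimals = parse_f_format(fmt)
    values = []
    for i in range(repeat):
        field = line[i * width:(i + 1) * width].strip()
        if not field:
            values.append(0.0)  # a blank (or missing) field reads as zero
        elif "." in field:
            values.append(float(field))  # an explicit decimal point wins
        else:
            # No decimal point: the last `decimals` digits are the fraction.
            values.append(int(field) / 10 ** decimals)
    return values
```

Two FORTRAN conventions are modeled here: a value too wide for its field is written as asterisks, and a field read without an explicit decimal point gets an implied one, so 12345 read with F10.2 becomes 123.45.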
To visualize your data, use the VECTOR/DRAW command. The options available allow you to select whether to draw a histogram, a smooth curve or a bar chart. You also can draw this visualization over the top of another graph.
The VECTOR/LIST command lists all of the VECTORS you have created, and the VECTOR/DELETE command cleans up the ones you no longer need.
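The abbreviation rule and the VECTOR lifecycle (create, fill, list, delete) can be modeled in a few lines. The Python sketch below is a toy, not PAW: the class and function names are hypothetical, only one-dimensional vectors are handled, and the matching rule (each slash-separated part of the abbreviation must be a case-insensitive prefix of the corresponding part of a full command) is an assumption that captures the "unique abbreviation" idea rather than PAW's exact parser.

```python
# Toy model of a PAW-like VECTOR session; NOT PAW itself.
# Command names come from this article; behavior is simplified.
COMMANDS = [
    "VECTOR/CREATE", "VECTOR/INPUT", "VECTOR/READ", "VECTOR/WRITE",
    "VECTOR/DRAW", "VECTOR/LIST", "VECTOR/DELETE", "VECTOR/FIT",
]


def resolve(abbrev):
    """Expand an abbreviation like 'vec/cre' to the one command it matches.

    Matching is case-insensitive, and each slash-separated part must be a
    prefix of the corresponding part of a full command name.
    """
    parts = abbrev.upper().split("/")
    matches = [
        cmd for cmd in COMMANDS
        if len(cmd.split("/")) == len(parts)
        and all(full.startswith(part) for full, part in zip(cmd.split("/"), parts))
    ]
    if len(matches) != 1:
        raise ValueError(f"{abbrev!r} matches {len(matches)} commands: {matches}")
    return matches[0]


class VectorSession:
    def __init__(self):
        self.vectors = {}  # vector name -> list of floats

    def run(self, line):
        cmd, *args = line.split()
        command = resolve(cmd)
        if command == "VECTOR/CREATE":
            # "vec1(20)" -> name "vec1", length 20 (1-D only in this toy;
            # elements start at zero, an assumption of this sketch).
            name, size = args[0].rstrip(")").split("(")
            self.vectors[name] = [0.0] * int(size)
        elif command == "VECTOR/INPUT":
            vec = self.vectors[args[0]]
            values = [float(v) for v in args[1:]]
            if len(values) > len(vec):
                raise ValueError("more values than vector elements")
            vec[:len(values)] = values  # fill from the first element on
        elif command == "VECTOR/LIST":
            return sorted(self.vectors)
        elif command == "VECTOR/DELETE":
            del self.vectors[args[0]]
        else:
            raise NotImplementedError(f"{command} is not modeled in this toy")
        return None
```

With this command table, VEC/CRE and V/L are accepted, but V/D is rejected because it could mean either VECTOR/DRAW or VECTOR/DELETE; typing V/DE removes the ambiguity.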
Once you have loaded your data and taken a look at it, you may have an idea of how the different parts are related to each other. You then can use the VECTOR/FIT command to fit the data to a function that you define in a subroutine. You also can supply a set of associated errors for the data points when issuing the command.
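Behind a command like VECTOR/FIT is the idea of choosing the function's parameters so that the error-weighted sum of squared differences between data and function (the chi-square) is as small as possible. The Python sketch below shows this for the simplest case, a straight line y = a + b*x, where the minimum has a closed-form solution. It is not PAW's fitting code, and fit_line is a hypothetical name. It also shows why supplying errors matters: each point is weighted by 1/sigma**2.

```python
import math


def fit_line(x, y, sigma):
    """Weighted least-squares (minimum chi-square) fit of y = a + b*x.

    Each point is weighted by 1/sigma**2, so points with larger errors
    pull on the fit less. Returns (a, b, err_a, err_b, chi2).
    """
    if not (len(x) == len(y) == len(sigma)) or len(x) < 2:
        raise ValueError("need at least two points with matching lengths")
    w = [1.0 / s ** 2 for s in sigma]
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx ** 2
    if delta == 0:
        raise ValueError("x values are all identical; slope is undefined")
    a = (Sxx * Sy - Sx * Sxy) / delta  # intercept
    b = (S * Sxy - Sx * Sy) / delta    # slope
    chi2 = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    return a, b, math.sqrt(Sxx / delta), math.sqrt(S / delta), chi2
```

A function that is not linear in its parameters, such as a Gaussian, has no closed-form solution like this one, so its best-fit parameters have to be found with an iterative minimizer.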
The HISTOGRAM group of commands gives you a larger selection of plotting and analysis tools to apply to your data. The commands are broken down into subgroups for creating histograms and 2D plots and for applying operations to histograms. You can use the GET_VECT and PUT_VECT subgroups to move data between histograms and VECTORS, such as the one you created above. You also can use the FUNCTION commands to create functions for use in data fitting, among other things.
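Filling a 1D histogram comes down to splitting a range into equal-width bins and counting the entries that fall into each. Operations on histograms then work bin by bin, and exchanging data with a vector amounts to copying the array of bin contents out or in. The Python sketch below models those pieces. The Hist1D class and its method names are hypothetical, and keeping separate underflow and overflow counters for out-of-range entries is a common histogramming convention assumed here, not a description of PAW's internals.

```python
class Hist1D:
    """Fixed-width 1-D histogram: nbins equal bins spanning [low, high)."""

    def __init__(self, nbins, low, high):
        if nbins < 1 or high <= low:
            raise ValueError("need nbins >= 1 and high > low")
        self.nbins, self.low, self.high = nbins, low, high
        self.contents = [0.0] * nbins
        self.underflow = 0.0  # total weight of entries below low
        self.overflow = 0.0   # total weight of entries at or above high

    def fill(self, x, weight=1.0):
        """Add one entry to the bin that contains x."""
        if x < self.low:
            self.underflow += weight
        elif x >= self.high:
            self.overflow += weight
        else:
            width = (self.high - self.low) / self.nbins
            i = int((x - self.low) / width)
            self.contents[min(i, self.nbins - 1)] += weight  # guard float rounding

    def add(self, other):
        """Return a new histogram holding the bin-by-bin sum of two histograms."""
        if (self.nbins, self.low, self.high) != (other.nbins, other.low, other.high):
            raise ValueError("histograms must have the same binning")
        result = Hist1D(self.nbins, self.low, self.high)
        result.contents = [a + b for a, b in zip(self.contents, other.contents)]
        result.underflow = self.underflow + other.underflow
        result.overflow = self.overflow + other.overflow
        return result

    def get_contents(self):
        """Copy the bin contents out (analogous in spirit to GET_VECT)."""
        return list(self.contents)

    def put_contents(self, values):
        """Overwrite the bin contents from a vector (analogous to PUT_VECT)."""
        if len(values) != self.nbins:
            raise ValueError("vector length must equal the number of bins")
        self.contents = [float(v) for v in values]
```

Because get_contents returns a copy, changing the returned list does not alter the histogram; only put_contents writes back into it.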
The NTUPLE group of commands is used to manipulate ntuple objects. Ntuples essentially are lists of lists, and you can think of them as matrices. In the PAW documentation, each row is called an event, and each column is called a variable. There are commands to merge data together and to make cuts that select subsets of the data. Ntuples have their own plotting commands that let you plot different variables against each other in various forms. If you have lots of data to deal with, you can use the CHAIN command to link multiple ntuples together into data sets of essentially unlimited size.
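To make the events-and-variables picture concrete, here is a Python sketch of an ntuple as a list of rows with named columns, with a cut that keeps only the events satisfying a condition and a chain that lets several ntuples be iterated as one. The Ntuple class, its methods, chain_events and the sample variable names are all hypothetical, and this is not PAW's syntax for cuts or chains.

```python
from itertools import chain


class Ntuple:
    """Rows are events; each event has one value per named variable (column)."""

    def __init__(self, variables):
        self.variables = list(variables)
        self.events = []

    def fill(self, *values):
        if len(values) != len(self.variables):
            raise ValueError("one value per variable is required")
        self.events.append(tuple(values))

    def column(self, name):
        """All values of one variable, e.g. to plot it against another."""
        j = self.variables.index(name)
        return [event[j] for event in self.events]

    def cut(self, predicate):
        """New ntuple with only the events for which predicate(event) is true.

        The predicate receives each event as a {variable: value} dict.
        """
        selected = Ntuple(self.variables)
        for event in self.events:
            if predicate(dict(zip(self.variables, event))):
                selected.events.append(event)
        return selected


def chain_events(*ntuples):
    """Lazily iterate over several ntuples with the same variables as one."""
    if not ntuples:
        return iter(())
    first = ntuples[0].variables
    if any(n.variables != first for n in ntuples):
        raise ValueError("chained ntuples must share the same variables")
    return chain.from_iterable(n.events for n in ntuples)
```

Because chain_events yields events one at a time instead of copying them into one big list, the same approach could stream each ntuple from disk in turn, which is the idea behind chaining very large data sets; in this toy, everything simply lives in memory.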
Although PAW is no longer under active development, there still is more than enough useful code here to keep any scientist busy. If you are doing any work involving data analysis or modeling, especially in C or FORTRAN, it is well worth your time to search through the available modules and subroutines in PAW to see whether anything there can help your work progress more quickly. This article covers only a very small portion of the available functionality, so be sure to dig deeper to see what you can mine for your own work.