Popcon - Are You In Or Out?
Those of you who regularly install Debian may have noticed a prompt asking whether you would like to install Popcon, the Debian Popularity Contest. Popcon gathers statistics about package usage and periodically submits them to Debian. The anonymous statistics gathered by the script are freely available on the Debian website, and the script can be invoked manually to give a clearer idea of package usage on your own system.
I must admit that I had always declined to take part in the survey. Some people will object on privacy grounds, but personally, I trust that Debian aren't going to do anything devious with the info. I had opted out because it sounded like another possible point of failure, and because I didn't actually know what the project did.
If you didn't select it when installing Debian, you can install Popcon at any time via the package manager, and installing it later doesn't hamper the quality of the data. If you're installing it manually, bear in mind that its installation script prompts for user input, so make sure that you can view the text output of your package management system. The information that it actually gathers is the installation date and most recent access date of every package on your system. By default, Popcon gathers the information and submits it once a week using a cron job.
Once installed, you can invoke it manually by typing (as root):
popularity-contest
You'll receive a long list of all of the packages on your system arranged in order of most recently accessed. Here is a sample of the output when I ran it on my Debian Sid box.
1290877204 1290877209 iptables /usr/sbin/ip6tables-apply OLD
1290877204 1290877339 ed /usr/bin/red OLD
1290877204 1290877401 laptop-detect /usr/sbin/laptop-detect OLD
1290877204 1290877230 libnfsidmap2 /usr/lib/libnfsidmap/static.so OLD
1290877204 1290877414 libruby1.8 /usr/lib/ruby/1.8/net/ftp.rb OLD
1290877204 1290877455 google-gadgets-gst /usr/lib/google-gadgets/modules/gst-audio-framework.so OLD
1290877204 1290877246 tcpd /usr/sbin/tcpd OLD
The first two numbers are the access time (atime) and the change time (ctime) of the most recently accessed file within the package. The time is presented in Unix time format, that is, the number of seconds elapsed since midnight UTC on 1 January 1970. This is followed by the name of the package and the most recently accessed file in that package. The last piece of information is a tag which indicates if that package is considered old (not accessed for more than a month). There are also tags to indicate that a package was recently installed or contains no runnable programs.
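If you'd rather read those fields as dates than as raw numbers, something like the following Python sketch might help. It assumes the whitespace-separated layout shown in the sample above (two timestamps, package name, file path, tag); the names PopconEntry and parse_entry are my own invention rather than anything Popcon provides, and the handling of rows that lack a path or a tag is a guess.

```python
from datetime import datetime, timezone
from typing import NamedTuple, Optional


class PopconEntry(NamedTuple):
    """One row of popularity-contest output (hypothetical helper type)."""
    atime: datetime          # first column in the sample output
    ctime: datetime          # second column in the sample output
    package: str
    path: Optional[str]      # most recently accessed file, if listed
    tag: Optional[str]       # e.g. "OLD", if present


def parse_entry(line: str) -> PopconEntry:
    """Split one output row into named fields, converting Unix time to UTC."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"not a package row: {line!r}")
    atime, ctime = (
        datetime.fromtimestamp(int(f), tz=timezone.utc) for f in fields[:2]
    )
    rest = fields[3:]
    # Assumption: a file path always starts with "/", anything else is a tag.
    path = rest[0] if rest and rest[0].startswith("/") else None
    # Assumption: the tag may or may not be wrapped in angle brackets.
    tag = rest[-1].strip("<>") if rest and not rest[-1].startswith("/") else None
    return PopconEntry(atime, ctime, fields[2], path, tag)


if __name__ == "__main__":
    entry = parse_entry(
        "1290877204 1290877209 iptables /usr/sbin/ip6tables-apply OLD"
    )
    print(entry.package, entry.atime.isoformat(), entry.tag)
```

Converting with an explicit UTC timezone keeps the result independent of the machine's local time setting.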
Obviously, the output for a typical system is going to be vast. For this reason, if you're invoking it from the command line, redirecting the output to a file or piping it through grep is the best approach. For example, redirecting it to a file with
popularity-contest >popcon.txt
yielded a file that worked fine when dropped onto the Gnumeric spreadsheet application. It's worth noting that Gnumeric has a function to convert Unix time into a typical date format.
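If a spreadsheet feels like overkill, a few lines of Python can pull the unused packages out of a saved file instead. This is only a rough sketch: stale_packages is a made-up helper, it assumes rows laid out like the sample output earlier, and it simply skips any line that doesn't begin with two integers.

```python
from datetime import datetime, timezone


def stale_packages(lines, tag="OLD"):
    """Return (package, last access date) pairs for rows carrying `tag`.

    Results are ordered by access time, oldest first.
    """
    found = []
    for line in lines:
        fields = line.split()
        # Skip anything that is not a data row (blank lines, other text).
        if len(fields) < 4 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        # Assumption: the tag is the last field, with or without <...>.
        if fields[-1].strip("<>") != tag:
            continue
        found.append((int(fields[0]), fields[2]))
    found.sort()
    return [
        (pkg, datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat())
        for ts, pkg in found
    ]


if __name__ == "__main__":
    # Two rows from the sample output; with a saved file you could pass
    # the open file object instead, e.g. stale_packages(open("popcon.txt")).
    sample = [
        "1290877204 1290877209 iptables /usr/sbin/ip6tables-apply OLD",
        "1290877204 1290877339 ed /usr/bin/red OLD",
    ]
    for package, last_used in stale_packages(sample):
        print(last_used, package)
```

Sorting by access time puts the packages that have gone unused the longest at the top, which is handy if the goal is to find candidates for removal.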
You can obtain the statistics that have been collated from all participating systems via the Debian website. Obviously, these results are tainted by the classic voluntary survey weakness of self-selection. Who knows, perhaps people who choose to participate in Popcon have different usage patterns to people who don't?
Personally, in future, I'm going to enable Popcon on my main system as I'm sure the data is useful to the Debian project. In addition, I've often wondered what stuff is installed on my system yet never actually used.
The Debian Popularity Contest website
The readme file, which gives detailed instructions on how to use Popcon.
The FAQ file which addresses potential concerns that users might have in terms of privacy issues etc.