Linux for Suits - The World Live Web

by Doc Searls

There's a split in the Web. It's been there from the beginning, like an elm grown from a seed that carried the promise of a trunk that forks twenty feet up toward the sky.

The main trunk is the static Web. We understand and describe the static Web in terms of real estate. It has “sites” with “addresses” and “locations” in “domains” we “develop” with the help of “architects”, “designers” and “builders”. Like homes and office buildings, our sites have “visitors” unless, of course, they are “under construction”.

One layer down, we describe the Net in terms of shipping. “Transport” protocols govern the “routing” of “packets” between end points where unpacked data resides in “storage”. Back when we still spoke of the Net as an “information highway”, we used “information” to label the goods we stored on our hard drives and Web sites. Today “information” has become passé. Instead we call it “content”.

Publishers, broadcasters and educators are now all in the business of “delivering content”. Many Web sites are now organized by “content management systems”.

The word content connotes substance. It's a material that can be made, shaped, bought, sold, shipped, stored and combined with other material. “Content” is less human than “information” and less technical than “data”, and more handy than either. Like “solution” or the blank tiles in Scrabble, you can use it anywhere, though it adds no other value.

I've often written about the problems that arise when we reduce human expression to cargo, but that's not where I'm going this time. Instead I'm making the simple point that large portions of the Web are either static or conveniently understood in static terms that reduce everything within it to a form that is easily managed, easily searched, easily understood: sites, transport, content.

The static Web hasn't changed much since the first browsers and search engines showed up. Yes, the “content” we make and ship is far more varied and complex than the “pages” we “authored” in 1996, when we were still guided by Tim Berners-Lee's original vision of the Web: a world of documents connected by hyperlinks. But the way we value hyperlinks hasn't changed much at all. In fact, it was Sergey Brin's and Larry Page's insights about the meaning of links that led them to build Google: a search engine that finds what we want by giving maximal weighting to sites with the most inbound links from other sites that have the most inbound links. Although Google's PageRank algorithm now includes many dozens of variables, its founding insight has proven extremely valid and durable. Links have value. More than anything else, this accounts for the success of Google and the search engines modeled on it.
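That founding insight, that a link counts for more when it comes from a page that is itself well linked, can be sketched in a few lines. What follows is the simplified textbook form of PageRank computed by repeated passes over a tiny link graph. It is not Google's production algorithm, which the text notes has many more variables. The function name, the sample pages and the iteration count are assumptions made for illustration; the damping factor of 0.85 is the conventional textbook value.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank sketch.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank along, split among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over everyone.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page Web: "a" and "b" both link to "c"; "c" links to "a".
example_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
example_rank = pagerank(example_links)
```

In this toy graph, "c" ends up ranked highest because two pages link to it, and "a" outranks "b" because its single inbound link comes from the best-linked page.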

Among the unchanging characteristics of the static Web is its nature as a haystack. The Web does have a rudimentary directory in the Domain Name System (DNS), but beyond that, everything to the right of the first single slash is a big “whatever”. UNIX paths (/whatever/whatever/whatever/) make order a local option of each domain. Of all the ways there are to organize things (chronologically, alphabetically, categorically, spatially, geographically, numerically), none prevails in the static Web. Organization is left entirely up to whoever manages the content inside a domain. Outside those domains, the sum is a chaotic mass beyond human (and perhaps even machine) comprehension.

Although the Web isn't organized, it can still be searched as it is, through the countless conditional hierarchies implied by links. These hierarchies, most of them small, are what allow search engines to find needles in the World Wide Haystack. In fact, search engines do this so well that we hardly pause to contemplate the casually miraculous nature of what they do. I assume that when I look up linux journal diy-it (no boolean operators, no quotes, no tricks, just those three words), any of the big search engines will lead me to the columns I wrote on that subject for the January and February 2004 issues of Linux Journal. In fact, they probably do a better job of finding old editorial than our own internal searchware. “You can look it up on Google” is the most common excuse for not providing a search facility for a domain's own haystack.

I bring this up because one effect of the search engines' success has been to concretize our understanding of the Web as a static kind of place, not unlike a public library. The fact that the static Web's library lacks anything resembling a card catalog doesn't matter a bit. The search engines are virtual librarians who take your order and retrieve documents from the stacks in less time than it takes your browser to load the next page.

In the midst of that library, however, there are forms of activity that are too new, too volatile, too unpredictable for conventional Web search to understand fully. These compose the live Web that's now branching off the static one.

The live Web is defined by standards and practices that were nowhere in sight when Tim Berners-Lee was thinking up the Web, when the “browser war” broke out between Netscape and Microsoft, or even when Google began its march toward Web search domination. The standards include XML, RSS, OPML and a growing pile of others, most of which are coming from small and independent developers, rather than from big companies. The practices are blogging and syndication. Lately podcasting (with OPML-organized directories) has come into the mix as well.

These standards and practices are about time and people, rather than about sites and content. Of course blogs still look like sites and content to the static Web search engines, but to see blogs in static terms is to miss something fundamentally different about them: they are alive. Their live nature, and their humanity, defines the live Web.

It is essential that we understand the live Web on its own terms, rather than those leveraged from the static Web.

Blogs are journals, not sites. They are written, not built. The best ones have a heart that beats daily or faster. The writing itself is more conversational than homiletic (which is how I'm behaving here, in a print publication with a monthly heartbeat). That means its authors are speaking, and not just “creating content”. They speak to readers and other bloggers who speak back, through e-mails, comments or on blogs of their own. That means what each blogger says is often incomplete and provisional. Like all forms of life, blogging remains unfinished for the duration. (Site content, on the other hand, is finished at any one time, then replaced with other finished content.)

A few months back, I was asked to explain blogging to somebody who knew nothing about it. When I finished, the guy understood that blogging was a new form of journalism that gave individuals a higher degree of leverage than ever before. He then instructed me, as a fairly well-known blogger, to devote my remaining life immediately to correcting the familiar evils of the world.

I replied that I was already 57 years old and tired of pushing large rocks up steep hills for short distances—also of getting flattened by the rocks that rolled back over me. I told him blogging might make Sisyphus' life a bit easier in some cases, but that its better leverage was on snowballs. My work as a blogger, I explained, is rolling snowballs downhill. Some I create new; others I push along, adding a small measure of mass along the way.

My point: rolling snowballs is way different from building sites and transporting content. Not totally different, perhaps, but enough to fork the Web.

Blogging predated syndication, but it was syndication that began to give form to the live Web. Syndication provided a way for people, and the tools they use, to pay attention (through subscription) to feeds from syndicated sources. At first these sources were blogs and publications, but later they came to include searches for topics of conversation, including the names of authors, URLs and permalinks for particular blog posts or news stories. Many of those sources were not the blogs themselves, but search engines reporting the results of keyword and URL searches.
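Paying attention through subscription comes down to a simple loop: fetch a feed, then keep only the items not seen before. Here is a minimal sketch of that step using Python's standard library and an RSS 2.0 document. The feed, its URLs and the function name are hypothetical, and a real aggregator would also fetch the feed over HTTP and cope with malformed XML, which this sketch does not.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# A hypothetical RSS 2.0 feed, inlined so the sketch needs no network access.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://blog.example.com/</link>
    <description>A hypothetical feed</description>
    <item>
      <title>First post</title>
      <link>http://blog.example.com/2005/11/30/first-post</link>
      <pubDate>Wed, 30 Nov 2005 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>http://blog.example.com/2005/12/01/second-post</link>
      <pubDate>Thu, 01 Dec 2005 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def new_items(feed_xml, seen_links):
    """Return feed items whose links are not in `seen_links`, newest first."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.findall("./channel/item"):
        link = item.findtext("link")
        if link in seen_links:
            continue  # the subscriber already has this one
        items.append({
            "title": item.findtext("title"),
            "link": link,
            # RSS 2.0 dates use the RFC 822 style, which email.utils can parse.
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        })
    items.sort(key=lambda entry: entry["published"], reverse=True)
    return items


# The subscriber has already seen the first post, so only the second is new.
already_seen = {"http://blog.example.com/2005/11/30/first-post"}
fresh = new_items(SAMPLE_FEED, already_seen)
```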

At the time of this writing, the most popular live Web search engine is Technorati (now about #700 on Alexa, with around 80-million page views per day). It was born in November 2002 on a Linux box from Penguin Computing that sat in David Sifry's basement. The box was loaned to help the two of us write a feature on blogging that ended up running in the February 2003 issue of Linux Journal. David wrote Technorati to help him do research for the story. The first time I saw it, I also saw the fork in the Web. What Technorati searched was alive, moving, changing. Its results were also radically different from what I got from the static Web.

This past spring somebody who works for Victoria's Secret complained to a friend about the limited knowledge the company had obtained regarding its IPEX bra, which had hit the stores only a few weeks before. A search on Google brought up only Victoria's Secret's own site and a few others that offered retail information. My friend showed her a Technorati search for “ipex” that brought up hundreds of posts, mostly by women telling other women how much they liked the bra. That search was a window on Unfiltered Truth that barely resembled anything the company would get from focus groups or other customary forms of market research.

Today there are a half-dozen engines devoted to searching the live Web. They're all different. Blogpulse stresses trending and ranking (with a great UI and excellent graphics). PubSub doesn't offer Web search but instead concentrates on keyword search feeds to users' aggregators. Bloglines integrates search with aggregation and other services. IceRocket emphasizes performance and simplicity. Technorati focuses on rapid indexing, tag search and hot topics. Feedster leads with personalization and index size.

All those characterizations are simplistic and incomplete. They are also obsolete by the time you read this. The whole category is changing as rapidly as the individuals and social trends they follow, as well as the technologies that make them possible and the developers who do new things with those technologies. A couple days ago I talked with a new company that gathers and syndicates conversation around local businesses and services, making the Live branch of the Wide Web as local as possible. I have at least one of these conversations every week.

This morning I had a conversation with some techies involved in “microformats”. These are described on the microformats.org site as “a set of simple, open data formats built upon existing and widely adopted standards. Instead of throwing away what works today, microformats intend to solve simpler problems, first by adapting to current behaviors and usage patterns (for example, XHTML, blogging).” Rather than specifications and standards, microformats are “design principles”, “methods of adaptation to usage patterns”, “correlated with semantic XHTML and the Real World” and “a way of thinking about data”. Far as I know, nobody around microformats wants to patent them or to patent a business model that makes use of them. Just as nobody patented RSS (which first meant “rich site summary” but came to mean “really simple syndication” after Dave Winer led its evolution into a stable live Web enabler). We can thank this kind of largesse for the Net and the Web, as well as for Linux and the Free Software and Open Source movements.

Tagging is a perfect example of standards and practices evolving in a live, organic way. Tags are labels that serve as categories, attached by users to photographs, lists, blog posts or anything they put up on the Web (or that others put up). Tags first appeared on del.icio.us, a social bookmarks manager, and on Flickr, a photo sharing service. In both cases, developers put users in control of their own creations (note that I avoid saying “content”) and the descriptions of those creations. Later, Technorati began not only doing tag searches but also establishing standards for tagging in links (including the rel=“tag” attribute). Authors and users began adding tags to all kinds of stuff. As a result, tags are now becoming a form of live Web organization.
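To make tagging in links concrete, here is a minimal sketch of how a tool might pull tags out of a blog post. It assumes the rel-tag convention, in which a link marked rel="tag" names its tag in the last segment of the link's path. The class and function names, the sample markup and the example.com URLs are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class RelTagParser(HTMLParser):
    """Collects tag names from <a> elements whose rel value includes "tag"."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, e.g. "tag nofollow".
        rel_values = (attributes.get("rel") or "").split()
        href = attributes.get("href")
        if "tag" in rel_values and href:
            # Assumption: the tag is the last path segment of the link target.
            path = urlparse(href).path.rstrip("/")
            self.tags.append(unquote(path.rsplit("/", 1)[-1]))


def extract_tags(html):
    """Return the tag names found in `html`, in document order."""
    parser = RelTagParser()
    parser.feed(html)
    return parser.tags


# Hypothetical post footer: two tagged links and one ordinary link.
sample_html = (
    '<p>Filed under <a href="http://example.com/tag/linux" rel="tag">Linux</a> '
    'and <a href="http://example.com/tag/syndication/" rel="tag nofollow">feeds</a>. '
    '<a href="http://example.com/about">About</a></p>'
)
sample_tags = extract_tags(sample_html)
```

The ordinary link is ignored because it carries no rel value, so the author of the post stays in charge of which links count as tags.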

The blogging branch of the live Web has another kind of order: chronological. Whether served up by TypePad or Drupal or Manila or some other system, blogs are all organized the same way: blogname.suffix/year/month/day/post. The permalink of the post is its unique URL.
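Because the date is right there in the permalink, chronological order can be recovered from the URL alone. The sketch below assumes exactly the year/month/day/post layout described above. Real blogging systems vary in how they build URLs, so treat the function name and the example.com addresses as hypothetical.

```python
from datetime import date
from urllib.parse import urlparse


def parse_permalink(url):
    """Split a /year/month/day/post permalink into (date, post name).

    Raises ValueError if the path does not follow the assumed layout.
    """
    parts = [segment for segment in urlparse(url).path.split("/") if segment]
    if len(parts) < 4:
        raise ValueError("expected /year/month/day/post, got: " + url)
    year, month, day = (int(segment) for segment in parts[:3])
    # date() itself rejects impossible dates such as month 13.
    return date(year, month, day), "/".join(parts[3:])


# Hypothetical permalinks, deliberately out of order.
permalinks = [
    "http://blog.example.com/2005/12/01/world-live-web",
    "http://blog.example.com/2004/01/15/diy-it",
    "http://blog.example.com/2005/03/09/snowballs",
]

# Sorting by the parsed date puts the posts in chronological order.
in_order = sorted(permalinks, key=lambda url: parse_permalink(url)[0])
```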

Any pile of organized data can be archived. This means that the part of the Web that's least static is also the part that can be archived and organized like a library—and researched the same way, only better. Think about the amount of data that can be gathered from a sum of sources organized by date and category (tags). Think of the intelligence that can be gleaned from that. Also think about the business there might be in facilitating or selling that intelligence.

I see by Netcraft that all the live Web search engines I've named so far run on Linux. So do Google, AskJeeves and A9. Even MSN Search runs on Linux, through Akamai's giant server farms. The only exception is Yahoo, running its own breed of BSD (which is still an open-source OS).

As I write this, I'm also helping put together the Syndicate conference in San Francisco (December 12–14, 2005, at the Hilton downtown—this issue of Linux Journal should be on the newsstands at that time). It is customary at tradeshows to look to vendors and large service providers for leadership. With the live Web, however, leadership doesn't just come from the big guys. In fact, most of it comes from independent developers and pioneering users. In this respect, the live Web is more an ecosystem than an industrial category. The folks standing on stage will have lots to say, but so will the folks who compose what we used to call “the audience”. It will be interesting to see how conversations go.

It also will be interesting to see which way the live Web carries Linux innovations and conversations about them. Linux and open-source development have always had their live qualities. As the live Web grows, we can expect those to become more organized (by chronology or tag, for example) at the very least.

Is it possible that “live” will join “free” and “open” in our pantheon of adjectives? Possibly. Whether or not it does, I'd like to thank my son Allen for being the first to utter “World Live Web”, providing me with a perspective I never knew I lacked, until I heard it.

His original vision of the World Live Web was a literal one: a Web where anybody could contact anybody else and ask or answer a question in real time. When he first encountered the Web, as a researcher, he saw it as something fundamentally deficient at supporting the most human forms of interaction: the kind where one person increased the knowledge of another directly.

We've moved a long way in the live direction since Allen first introduced me to the concept. VoIP alone is a huge live category. Mobile Web progress will all happen along its live branch.

Where it goes exactly is anybody's guess. All we can say for sure is it's headed toward the sky.

Doc Searls is Senior Editor of Linux Journal.
