An in-depth study undertaken by AltaVista, Compaq, and IBM reveals that not all pages on the World Wide Web are as well connected as we think.
The Web is shaped like a large bow tie, with many poorly connected sites out on its hard-to-reach fringes, say the researchers, who hope to use their indexed results to design better search engines and to help electronic-commerce sites get noticed.
To determine the Web's structure, the companies used the AltaVista search engine and Compaq AlphaServer hardware to perform two massive "crawls" of more than 200 million pages by following the 1.5 billion hyperlinks connecting them.
Search engines normally perform crawls to create the indexes that help speed up searches, says Jim Schissler, an AltaVista spokesperson.
IBM researchers analysed the results and found that about a third of all Web pages sit in a "strongly connected core," the knot of the figurative bow tie.
Within the core, you can travel from any page to any other page by following hyperlinks. Meanwhile, one side of the tie, containing about a quarter of all Web pages, consists of "origination" pages that eventually lead you into the core but cannot be reached from it.
Likewise, "termination" pages on the other side of the tie can be reached from the core but cannot lead back to it. Finally, one-fifth of the pages are cut off from the core entirely: they can neither reach it nor be reached from it, and connect only to origination or termination pages.
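The bow-tie regions described above can be computed from a link graph using two reachability searches. What follows is a minimal sketch, assuming a tiny hand-made graph and a "seed" page already known to lie in the core. The function names (reachable, reverse, bow_tie) and the sample links are hypothetical illustrations, not the tooling the researchers ran on their 200-million-page crawl. Pages the seed can reach and that can also reach the seed form the core. Pages that only reach it are origination pages, and pages only reachable from it are termination pages. Everything else falls into the cut-off group.

```python
from collections import deque

def reachable(graph, start):
    """Breadth-first search: return every page reachable from start via links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

def reverse(graph):
    """Flip every hyperlink, so a link u -> v becomes v -> u."""
    flipped = {}
    for source, targets in graph.items():
        for target in targets:
            flipped.setdefault(target, []).append(source)
    return flipped

def bow_tie(graph, seed):
    """Split pages into bow-tie regions relative to the core containing seed.

    Assumption: seed is known to belong to the strongly connected core.
    """
    pages = set(graph)
    for targets in graph.values():
        pages.update(targets)
    forward = reachable(graph, seed)            # pages the core can reach
    backward = reachable(reverse(graph), seed)  # pages that can reach the core
    core = forward & backward                   # mutually reachable with seed
    return {
        "core": core,
        "origination": backward - core,         # lead into the core, unreachable from it
        "termination": forward - core,          # reachable from the core, no way back
        "other": pages - forward - backward,    # never touch the core at all
    }

# Hypothetical toy Web: A, B and C link in a cycle, forming the core.
links = {
    "A": ["B"],
    "B": ["C"],
    "C": ["A", "D"],  # the core links out to D, which links on to E
    "D": ["E"],
    "X": ["A", "T"],  # X leads into the core and also to T, which cannot reach it
    "Z": [],          # an isolated page
}

result = bow_tie(links, "A")
for region in ("core", "origination", "termination", "other"):
    print(region, sorted(result[region]))
# core ['A', 'B', 'C']
# origination ['X']
# termination ['D', 'E']
# other ['T', 'Z']
```

Note that the "other" bucket in this sketch lumps together pages hanging off origination or termination pages (like T) with completely isolated pages (like Z). A finer analysis would separate the two.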