If searching the World Wide Web for that one nugget of information already seems like a bad trip into a jungle of data, Internet experts have some bad news: The situation is getting worse.
Even the most comprehensive search engine today is aware of no more than 16 percent of the estimated 800 million pages on the Web, according to a study published in the scientific journal Nature. In addition, the gap between what is posted on the Web and what is retrievable by search engines is widening fast. The study underscores a little-understood feature of the Internet. While many users believe that Web pages are automatically available to the search programs employed by such sites as Yahoo, Excite and Alta Vista, the truth is that finding, identifying and categorizing new Web pages requires great expenditures of time, money and technology.
Researchers from the NEC Research Institute in Princeton, N.J., found that most major search engines index less than 10 percent of the Web. Even when all the major search engines are combined, only 42 percent of the Web has been indexed.
The rest of the Web--trillions of bytes of data ranging from scientific papers to family photos--exists in a kind of black hole of information, accessible only if surfers have the exact address of a given site. Even Web pages that are indexed take an average of six months to be discovered by search engines.
There is hope for the future. Steve Lawrence, one of the researchers, believes that new indexing technologies will eventually enable search engines to start gaining on the proliferating data. "I'm pretty optimistic that over a period of years the trend will reverse," he said. But he added, "The next 10 to 20 years could be really rough."