

This White Paper is a version of the one on the BrightPlanet site. Although it is designed as a marketing tool for a program "for existing Web portals that need to provide targeted, comprehensive information to their site visitors," its insight into the structure of the Web makes it worthwhile reading for all those involved in e-publishing.

Searching on the Internet today can be compared to dragging a net across the surface of the ocean. While a great deal may be caught in the net, there is still a wealth of information that is deep, and therefore, missed. The reason is simple: most of the Web's information is buried far down on dynamically generated sites, and standard search engines never find it.

Traditional search engines create their indices by spidering or crawling surface Web pages. To be discovered, a page must be static and linked to other pages. Traditional search engines cannot "see" or retrieve content in the deep Web; those pages do not exist until they are created dynamically as the result of a specific search. Because traditional search engine crawlers cannot probe beneath the surface, the deep Web has heretofore been hidden.

The deep Web is qualitatively different from the surface Web. Deep Web sources store their content in searchable databases that produce results only dynamically, in response to a direct request. But a direct query is a laborious, "one at a time" way to search. BrightPlanet's search technology automates the process of making dozens of direct queries simultaneously using multiple-thread technology and thus is the only search technology, so far, that is capable of identifying, retrieving, qualifying, classifying, and organizing both "deep" and "surface" content.

If the most coveted commodity of the Information Age is indeed information, then the value of deep Web content is immeasurable. With this in mind, BrightPlanet has quantified the size and relevancy of the deep Web in a study based on data collected between March 13 and 30, 2000. The study found that public information on the deep Web is currently 400 to 550 times larger than the commonly defined World Wide Web. The deep Web contains 7,500 terabytes of information compared to nineteen terabytes of information in the surface Web. The deep Web contains nearly 550 billion individual documents compared to the one billion of the surface Web.
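The crawling limitation described in this paper can be illustrated with a toy model: a spider reaches only static pages that are linked from other pages, while a database-backed result page exists only after someone submits a query. The sketch below is a hypothetical illustration, not how any particular search engine works; the page names, the `STATIC_LINKS` graph, and the `DATABASE` contents are all made up for the example.

```python
from collections import deque

# Hypothetical toy Web: each static page maps to the pages it links to.
STATIC_LINKS = {
    "/index.html": ["/about.html", "/search.html"],
    "/about.html": ["/index.html"],
    "/search.html": [],  # a search form: it links to no result pages
}

# Hypothetical searchable database sitting behind /search.html.
DATABASE = {"ocean": ["record 17", "record 42"], "net": ["record 3"]}


def crawl(start):
    """Breadth-first spider: reaches only static pages connected by links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in STATIC_LINKS.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


def query(term):
    """Deep Web source: a result page is created only when a query is made."""
    hits = DATABASE.get(term, [])
    return f"/search?q={term}", hits


indexed = crawl("/index.html")
url, hits = query("ocean")
print(sorted(indexed))  # ['/about.html', '/index.html', '/search.html']
print(url in indexed)   # False: the crawler never sees the result page
print(hits)             # ['record 17', 'record 42']
```

The crawler indexes the search form itself but none of the records behind it, because no static link ever points at a result page.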

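The point that a direct query is a "one at a time" way to search, and that issuing many queries simultaneously with threads reduces the wait, can be sketched with Python's standard `concurrent.futures` thread pool. This is a generic illustration of parallel querying under stated assumptions, not BrightPlanet's implementation: the source names, their records, the `direct_query` and `federated_search` helpers, and the simulated 0.1-second latency are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical deep Web sources: each is a searchable database that answers
# only a direct query. Real sources would be reached over the network.
SOURCES = {
    "patents": {"turbine": ["US-001", "US-002"]},
    "journals": {"turbine": ["J-77"], "ocean": ["J-12"]},
    "census": {"ocean": ["C-9"]},
}


def direct_query(source, term):
    """One direct query against a single source, with simulated latency."""
    time.sleep(0.1)  # stands in for a network round trip (assumed value)
    return source, SOURCES[source].get(term, [])


def federated_search(term, max_workers=8):
    """Send the same query to every source at once using a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(direct_query, name, term) for name in SOURCES]
        # Collect results in submission order, keyed by source name.
        return dict(f.result() for f in futures)


start = time.perf_counter()
results = federated_search("turbine")
elapsed = time.perf_counter() - start
print(results)  # {'patents': ['US-001', 'US-002'], 'journals': ['J-77'], 'census': []}
# elapsed is typically close to one query's latency, not the sum of all three
print(f"{elapsed:.2f}s")
```

Threads suit this workload because each query spends most of its time waiting on I/O rather than computing, so the waits overlap instead of adding up.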