April 28, 2003
Weaving search database with distributed crawling
THIS WEEK NetSpeak takes a look at a project that attempts to build a search engine database using the distributed computing model.
A search service works by scanning its web page index, collecting the pages that contain a search string, and then organising and displaying the results on a web page. An important factor in the effectiveness of a search service is therefore the size of its web page database, the raw material from which the search output is generated. However, the Net contains billions of web pages, and it is practically impossible for any single search engine to scan the entire Net and collect them all.
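To make the idea concrete, the following is a minimal sketch in Python of how a search service might answer a query from a pre-built index of pages rather than scanning the Net at query time. The page names and index contents here are purely illustrative, not drawn from any real search engine.

```python
# Minimal sketch of a search lookup against a pre-built web page index.
# The pages and words below are purely illustrative.

# Inverted index: each word maps to the set of pages that contain it.
index = {
    "distributed": {"page1.html", "page3.html"},
    "crawling":    {"page1.html"},
    "search":      {"page1.html", "page2.html", "page3.html"},
}

def search(query):
    """Return the pages that contain every word of the query."""
    words = query.lower().split()
    results = None
    for word in words:
        pages = index.get(word, set())
        results = pages if results is None else results & pages
    return sorted(results or [])

print(search("distributed search"))  # ['page1.html', 'page3.html']
```

The sketch also shows why the size of the index matters: a page that was never crawled and indexed can never appear in the results, however relevant it may be.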
