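Search engines have been around since 1994, when the web was in its infancy. Many people think of Yahoo! as the first search engine, but Yahoo! actually began as a directory and only later started searching the web. The difference between a search engine and a directory lies in how information is stored and retrieved. In a directory, such as Yahoo!, DMOZ, or socengine, websites are submitted and added to specific categories by human reviewers, who help classify the sites and weed out those that don't belong. In a search engine, web pages are visited automatically by the engine's spiders, computer programs that index and catalog pages without human intervention. Search engines keep cached copies of pages in their index and retrieve that information to decide when to show a link to a page to a user who is searching.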
Let's examine the primary functions of a search engine:
Spider pages on the web by visiting a page, then following all of the links on that page and spidering the next set of pages (this process repeats indefinitely).
Cache/index each spidered page in an enormous database of web pages that is easily and quickly accessible for search.
Create an algorithm that ranks web search results, using a set of factors to list them from most relevant to least relevant.
Return results to users based on the search queries they enter.
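The four functions above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the "web" is a hypothetical in-memory dictionary (`TOY_WEB`) rather than real pages fetched over the network, and relevance is approximated by simple word counts, which is far cruder than what any real engine uses. All names here are made up for the example.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example": ("search engines crawl the web", ["b.example", "c.example"]),
    "b.example": ("directories are reviewed by people", ["a.example"]),
    "c.example": ("spiders crawl pages and follow links", ["b.example", "d.example"]),
    "d.example": ("crawl crawl crawl", []),
}


def crawl(seed, web):
    """Spider: visit a page, then follow its links breadth-first, skipping pages already seen."""
    seen, queue, pages = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, links = web[url]
        pages[url] = text  # keep a cached copy of the page text
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def build_index(pages):
    """Index: map each word to the pages containing it, with occurrence counts."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] = index[word].get(url, 0) + 1
    return index


def search(query, index):
    """Rank and return: score pages by summed word counts, most relevant first."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for url, count in index.get(word, {}).items():
            scores[url] += count
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(scores, key=lambda u: (-scores[u], u))


pages = crawl("a.example", TOY_WEB)
index = build_index(pages)
results = search("crawl", index)
print(results)
```

The `seen` set is what keeps the spider from looping forever on pages that link back to each other; a real crawler would also revisit pages over time to keep its cached copies fresh.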
The modern search engine is like a giant catalogue of every web page its spiders have crawled. Google, Yahoo!, MSN, Teoma, and others store billions of web pages in their server banks, ready to call upon any of them to appear in the search results should they properly match a user's query. At the present time, search engine index sizes are probably close to these numbers:
Google ~8 billion pages
Yahoo! ~6 billion pages
MSN Beta ~4 billion pages
Teoma (Ask Jeeves) ~4 billion pages
Gigablast ~1 billion pages
Search engine index size is actually an excellent measure of quality and thoroughness. This is not only because search engines with more pages indexed can return more results, but also because these higher-powered engines spider pages more frequently and thus have the freshest data available, as well as the best understanding of the web's link structure (which pages link to which other pages), which gives them a better measure of popularity and quality.
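To make the idea of link structure as a popularity signal concrete, here is a minimal sketch that counts how many distinct pages link to each page. It assumes a hypothetical link graph (`LINKS`) and uses a plain inbound-link count; this is a deliberately simplified stand-in, not the formula any particular search engine uses.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "c.example"],  # repeated links count once
}


def inbound_link_counts(links):
    """Count, for each page, how many other distinct pages link to it."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):  # de-duplicate links from the same page
            if target != source:  # ignore self-links
                counts[target] += 1
    return counts


popularity = inbound_link_counts(LINKS)
# Most-linked pages first; ties broken alphabetically.
ranked = sorted(popularity, key=lambda p: (-popularity[p], p))
print(ranked)
```

The more of the web an engine has crawled, the more complete this graph is, which is one reason a larger index can support better popularity estimates.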