jiangdm's personal blog http://blog.sciencenet.cn/u/jiangdm

Distributed Web Retrieval

Posted 2012-6-18 23:00 | Personal category: AI & ML | System category: Paper Exchange

Distributed web retrieval.pdf

Distributed Web Retrieval
Ricardo Baeza-Yates
WWW 2011 – Tutorial
 
ABSTRACT
In the ocean of Web data, Web search engines are the primary way to access content. As the data is on the order of petabytes, current search engines are very large centralized systems based on replicated clusters. Web data, however, is always evolving. The number of Web sites continues to grow rapidly (over 270 million at the beginning of 2011) and there are currently more than 20 billion indexed pages. On the other hand, there are more than one billion Internet users, and hundreds of millions of queries are issued each day. In the near future, centralized systems are likely to become less effective against such a data-query load, which suggests the need for fully distributed search engines. Such engines need to maintain high-quality answers, fast response time, high query throughput, high availability, and scalability, in spite of network latency and scattered data. In this tutorial we present the architecture of current search engines and explore the main challenges behind the design of all the processes of a distributed Web retrieval system: crawling, indexing, and query processing.
 
Keywords: Web Retrieval, Distributed Systems, Crawling, Indexing, Query Processing
 
 


https://wap.sciencenet.cn/blog-468147-583536.html
