Advanced metasearch engine technology [electronic resource] / Weiyi Meng, Clement T. Yu.

Meng, Weiyi.
San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) : Morgan & Claypool, c2011.
Synthesis digital library of engineering and computer science.
Synthesis lectures on data management, 2153-5426 ; # 11.
1 electronic text (x, 117 p.) : ill., digital file
Web search engines -- Mathematical models.
Federated searching -- Mathematical models.
System Details:
Mode of access: World Wide Web.
Among the search tools currently on the Web, search engines are the most well known thanks to the popularity of major search engines such as Google and Yahoo! While extremely successful, these major search engines do have serious limitations. This book introduces large-scale metasearch engine technology, which has the potential to overcome the limitations of the major search engines. Essentially, a metasearch engine is a search system that supports unified access to multiple existing search engines by passing the queries it receives to its component search engines and aggregating the returned results into a single ranked list. A large-scale metasearch engine has thousands or more component search engines. While metasearch engines were initially motivated by their ability to combine the search coverage of multiple search engines, there are also other benefits such as the potential to obtain better and fresher results and to reach the Deep Web. The following major components of large-scale metasearch engines will be discussed in detail in this book: search engine selection, search engine incorporation, and result merging. Highly scalable and automated solutions for these components are emphasized. The authors make a strong case for the viability of the large-scale metasearch engine technology as a competitive technology for Web search.
1. Introduction
Finding information on the web
A brief overview of text retrieval
System architecture
Document representation
Document-query matching
Query evaluation
Retrieval effectiveness measures
A brief overview of search engine technology
Special characteristics of the web
Web crawler
Utilizing tag information
Utilizing link information
Result organization
Book overview

2. Metasearch engine architecture
System architecture
Why metasearch engine technology
Challenging environment
Heterogeneities and their impact
Standardization efforts

3. Search engine selection
Rough representative approaches
Learning-based approaches
Sample document-based approaches
Statistical representative approaches
Number of potentially useful documents
Similarity of the most similar document
Search engine representative generation

4. Search engine incorporation
Search engine connection
HTML form tag for search engines
Automatic search engine connection
Search result extraction
Semiautomatic wrapper generation
Automatic wrapper generation

5. Result merging
Merging based on full document content
Merging based on search result records
Merging based on local ranks of results
Round-robin based methods
Similarity conversion based methods
Voting based methods
Machine learning based method

6. Summary and future research
Authors' biographies.
Part of: Synthesis digital library of engineering and computer science.
Series from website.
Includes bibliographical references (p. 107-115).
Yu, Clement T.
9781608451937 (electronic bk.)
9781608451920 (pbk.)
Publisher Number:
10.2200/S00307ED1V01Y201011DTM011 doi
Access Restriction:
Restricted for use by site license.
