Stanford Digital Library Project
[http://www-diglib.stanford.edu/diglib/]
SIDL-WP-1997-0072
[http://www-diglib.stanford.edu/diglib/WP/PUBLIC/DOC159.html]
Larry Page
page@cs.stanford.edu
Abstract: The Problem
The web creates new challenges for information retrieval. Both the amount of information on the web and the number of new users inexperienced in the art of web research are growing rapidly. People are likely to surf the web using its link graph, often starting with high-quality, human-maintained indexes such as "Yahoo!" or with search engines. Human-maintained lists cover popular topics effectively but are subjective, expensive to build and maintain, slow to improve, and unable to cover all esoteric topics. Automated search engines that rely on keyword matching usually return too many low-quality matches. To make matters worse, some advertisers attempt to gain people's attention by taking measures meant to mislead or "spam" automated search engines.
PageRank, Providing Part of the Solution
The citation graph of the web is an important resource that has largely gone unused in existing web search engines. We have created maps containing as many as 518 million of these hyperlinks, a significant sample of the total. These maps allow rapid calculation of a web page's "PageRank", an objective measure of its citation importance that corresponds well with people's subjective idea of importance. Because of this correspondence, PageRank is an excellent way to prioritize the results of web keyword searches. For most popular subjects, a simple text matching search that is restricted to web page titles performs admirably when PageRank prioritizes the results.
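The title-restricted search described above can be sketched as follows, assuming the PageRank scores have already been computed. This is only an illustration: `search_titles` is a hypothetical helper, and the URLs, titles, and scores are made-up data, not taken from the prototype.

```python
def search_titles(query, titles, pagerank):
    """Return URLs whose title contains every query word, highest PageRank first."""
    words = query.lower().split()
    matches = [
        url
        for url, title in titles.items()
        if all(word in title.lower().split() for word in words)
    ]
    # Text matching only selects candidates; PageRank alone decides the order.
    return sorted(matches, key=lambda url: pagerank.get(url, 0.0), reverse=True)


# Hypothetical data for illustration only.
titles = {
    "a.example/home": "Stanford University Home Page",
    "b.example/fans": "Unofficial Stanford Fan Page",
    "c.example/misc": "Cooking Recipes",
}
pagerank = {
    "a.example/home": 0.62,
    "b.example/fans": 0.05,
    "c.example/misc": 0.33,
}

results = search_titles("stanford", titles, pagerank)
```

Here both pages with "Stanford" in the title match, and the one with the higher assumed PageRank is listed first.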
Although PageRank draws on the academic citation literature, it depends on properties of the web that are not present in typical academic citations. In addition, we will discuss how PageRank measures how often each web page is visited according to an idealized model of user behavior. We will demonstrate the difficulty of artificially inflating a page's PageRank as advertisers may attempt to do. We will describe expected discrepancies between a web page's actual usage count and its PageRank.
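One way to picture the idealized model of user behavior is a "random surfer" who usually follows a link chosen at random from the current page and occasionally jumps to a random page. The sketch below estimates how often each page is visited under that model by simple iteration. It rests on assumptions not stated in this abstract: the `damping` value of 0.85, the handling of pages without outgoing links, and the tiny link graph are all illustrative choices, not the authors' implementation.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate the random-surfer visit probability of each page.

    links maps each page to the list of pages it links to.
    damping is the assumed probability of following a link rather than
    jumping to a random page (an assumption, not given in the abstract).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives the random-jump share first.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a surfer on a page with no links jumps anywhere.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: every other page links to "hub".
links = {"hub": ["a"], "a": ["hub"], "b": ["hub"], "c": ["hub", "a"]}
scores = pagerank(links)
```

In this toy graph "hub" ends up with the highest score because every other page links to it, while "b" and "c", which nothing links to, receive only the random-jump share.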
A Demonstration of our Prototype
A prototype that contains the PageRank for 16 million pages and searches over web page titles will be demonstrated. The attendees will be asked to offer queries to help us demonstrate the system and assess the quality of the search results. We will also demonstrate a web browsing accessory that graphically annotates each link in the current page with its PageRank, allowing users to visually spot the destinations with the highest citation importance.
Note: Papers in this series are in development and are not in a final form for publication or general dissemination. They are subject to change. Please do not quote or further distribute them without explicit permission from the authors.
This paper was created on 08/18/97 and last revised on 9/16/1997.
Author's Comments: This is currently a talk -- it will evolve into a paper.
Status: PUBLIC
Full text of SIDL-WP-1997-0072 [http://www-diglib.stanford.edu/diglib/WP/PUBLIC/DOC159.html] (HTML)
Revision History
Version | Format | Date | Comments |
---|---|---|---|
1* | HTML | 9/15/1997 | This is currently a talk -- it will evolve into a paper. |
*http://www-diglib.stanford.edu/cgi-bin/WP/get/SIDL-WP-1997-0072?1
Steve Cousins [http://www-pcd.stanford.edu/cousins/] (cousins@cs.stanford.edu)
Copyright 1997