Hyperlink Search Engine(sm)

Demonstration site (patent pending)


(New!) How it works . . .

This web-search engine relies on votes by hundreds of thousands of web page authors to determine which sites are the best ones. It is the first public application of a new technique known as Hyperlink Vector Voting(sm).

The Hyperlink Vector Voting algorithm, developed by IDD Information Services, is at once subtle and easy to understand. A typical hyperlink is composed of the internet address of a referenced site, and a more user-friendly description displayed to the reader, also called anchor text. This is the anchor text description of a hyperlink pointing to the Hyperlink Search Engine [ http://rankdex.gari.com/ ].

The crucial innovation of Hyperlink Vector Voting is to collect anchor descriptions as the content of a searchable database. Each description is associated in the database with the referenced site, not the site where the link appeared. In this way, the descriptions constitute "votes" as to the subject of the referenced site. Counting these votes provides a decentralized and democratic way to identify what each web site is about. The "vector" in Hyperlink Vector Voting refers to the mathematics used for weighting the votes according to frequency.
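The collection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not IDD's implementation: the class name AnchorCollector, the function index_page, and the example.org and example.net pages are all hypothetical. The point it shows is that each description is filed under the page the link points to, not the page where the link was found.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collects (target URL, anchor text) pairs from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pairs = []      # finished (target, description) pairs
        self._href = None    # target of the link currently open, if any
        self._text = []      # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page they appear on.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the description.
            description = " ".join("".join(self._text).split())
            if description:
                self.pairs.append((self._href, description))
            self._href = None


def index_page(index, page_url, html):
    """File each anchor description under the page it points TO."""
    collector = AnchorCollector(page_url)
    collector.feed(html)
    collector.close()
    for target, description in collector.pairs:
        index[target].append(description)


# Hypothetical pages that both link to the same site.
index = defaultdict(list)
index_page(index, "http://example.org/cars.html",
           '<p>See <a href="http://www.kbb.com/">Kelly Blue Book</a>.</p>')
index_page(index, "http://example.net/links.html",
           '<a href="http://www.kbb.com/">Used car values</a>')
```

After both calls, the two descriptions are stored under http://www.kbb.com/, and nothing is stored under the pages that contained the links.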

It is important to recognize that the Hyperlink Search Engine, unlike other web search engines, does not index the text of web pages. According to the Hyperlink Vector Voting algorithm, what a site says about itself is not considered reliable. The only information the Hyperlink Search Engine stores about any page is the text of hyperlinks that point to it -- that is, only what others say that page is about!

Because the Hyperlink Search Engine avoids the need to index most of the content of web pages, its index is inherently smaller and more efficient than those of other web search engines. A database of hyperlinks requires less than 10% of the computer resources of a conventional web index. For this reason, we estimate that a search engine relying on Hyperlink Vector Voting alone is more efficient than other search engines by an order of magnitude. This demonstration site, running on a Sparc20, contains the results of an extensive web crawl and can handle up to 10 searches a second.

As an example of how Hyperlink Vector Voting works, consider the web site of the Kelley Blue Book car pricing service [ http://www.kbb.com/ ]. Hundreds of web authors have established hyperlinks to this popular site. They each describe the link in their own way. Examples of descriptions (anchor text) of hyperlinks pointing to the site are:

Kelly Blue Book [note popular misspelling of "Kelley"]
Check the value of the vehicle you own or want
Used car bluebook values [note misspelling of "blue book" as one word]
New and used car dealer prices
Kelley's Blue Book
Automobile prices
Used car values
Used car pricing
New car cost
Automotive bluebooks

According to the Hyperlink Vector Voting algorithm, each of these descriptions constitutes an opinion, or vote, as to what the referenced page is about. The more popular any particular description for a site, the more votes it gets -- and thus the more weight it receives in the Hyperlink Search Engine. A search on the misspelled "Kelly bluebook" will correctly return the Kelley site with a strong score, because other web authors used the same misspellings to describe the site!
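
The vote counting in the Kelley Blue Book example can be illustrated with a small Python sketch. The actual weighting formula is not given here, so the scoring below (one vote per word per link, summed over the query words) is an assumption chosen for simplicity. The functions tokenize, tally_votes and search, and the example.com site, are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tally_votes(descriptions_by_site):
    """Count, per site, how many link descriptions mention each word."""
    votes = {}
    for site, descriptions in descriptions_by_site.items():
        counts = Counter()
        for description in descriptions:
            # A word counts once per link, however often it is repeated.
            counts.update(set(tokenize(description)))
        votes[site] = counts
    return votes


def search(query, votes):
    """Rank sites by the total votes cast for the query words (assumed rule)."""
    words = set(tokenize(query))
    ranked = []
    for site, counts in votes.items():
        score = sum(counts[word] for word in words)
        if score > 0:
            ranked.append((score, site))
    return sorted(ranked, reverse=True)


descriptions_by_site = {
    "http://www.kbb.com/": [
        "Kelly Blue Book",
        "Used car bluebook values",
        "Kelley's Blue Book",
        "Used car values",
        "Used car pricing",
    ],
    # Hypothetical unrelated site that also matches the word "Kelly".
    "http://example.com/kelly/": ["Kelly family home page"],
}

votes = tally_votes(descriptions_by_site)
results = search("Kelly bluebook", votes)
```

In this sketch the misspelled query still ranks the Kelley site first, because both misspellings appear among the descriptions other authors wrote for it.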

PUT US TO THE TEST

We think the best way to evaluate the Hyperlink Search Engine is as follows. Ask it about subjects for which you already know the strongest web sites. For example, if your hobby is raising guinea pigs, and you know the major guinea pig web sites, then search on guinea pigs, and see which sites it returns among those you would rate as the top ones. Then submit the same query to other search engines and let us know how the results compare!

SEARCH LOGIC

The Hyperlink Search Engine is designed to make query formulation as simple as possible. No connectors or logical operators are needed. Simply enter a group of key words or a natural language phrase.

For those interested in the underlying search logic, the most useful principle to understand is: The more words in your query, the less weight each individual word receives when site scores are calculated. Thus if your search returns too few or no sites, and it is a long query, try deleting less important words; more weight will be given to the remaining words. On the other hand, try adding synonyms or related terms to a short query. If plenty of sites were found, but they are not what you wanted: add concepts, or substitute different wording for terms that might be rare or ambiguous.
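
The effect of query length on word weight can be shown with a short Python sketch. It assumes, purely for illustration, that every query word receives an equal share of a total weight of 1. The real weighting is not specified here, and the names query_weights and site_score and the vote counts are hypothetical.

```python
def query_weights(query):
    """Give each query word an equal share of a total weight of 1 (assumed rule)."""
    words = query.lower().split()
    if not words:
        return {}
    share = 1.0 / len(words)
    weights = {}
    for word in words:
        weights[word] = weights.get(word, 0.0) + share
    return weights


def site_score(query, votes):
    """Weighted sum of the votes a site has received for each query word."""
    weights = query_weights(query)
    return sum(weight * votes.get(word, 0) for word, weight in weights.items())


# Hypothetical vote counts for one site.
site_votes = {"used": 3, "car": 3, "values": 2, "pricing": 1}

long_query = site_score("what are good used car values today", site_votes)
short_query = site_score("used car values", site_votes)
```

Under this assumption the shorter query scores the site higher: the same three matching words each carry one third of the weight instead of one seventh.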

WHAT'S REAL, WHAT'S VAPOR

This site is not engineered as a robust commercial operation. That is why it is labeled as a demo. If our system administrator is out for the day, and a disk crashes or a routing table gets corrupted, then we will be down for a while. Our plan is to look to others to implement our technology on a larger scale, using their own look and feel. This demo site has some other known limitations:

The database of web sites is incomplete. We have crawled (spidered) a large portion of the web, but we know we have missed substantial pieces of it. So if we missed your site, that is why. The collection will grow with each update.

There are too many stale links. We know that. An upcoming release will encompass link validation checking.

Many site listings are missing page titles; only the URL appears. An upcoming database reload will include a more comprehensive set of titles. (Hey -- we're a development shop, not a production site!)

Too many duplicate pages appear under slightly different URLs. We agree. We have an enhancement in the works that will compare URLs and consolidate them if they point to the same page.

The search engine sometimes returns pages that make no sense. The search engine itself -- the dbms, as opposed to the content set or the voting algorithm -- is a new piece of code, and may have bugs. Please do tell us if you encounter any anomalies or nonsensical results.
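
The duplicate-URL consolidation mentioned in the list above could work along the following lines. This Python sketch is an assumption about one plausible approach, not a description of the planned enhancement: it treats URLs as the same page when they differ only in letter case of the scheme or host, an explicit default port, a missing root path, or a fragment. It also discards any user name or password in the URL. The function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def canonical_url(url):
    """Reduce trivially different spellings of a URL to one form."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it is not the default for the scheme.
    if port in (None, DEFAULT_PORTS.get(scheme)):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    # The fragment never changes which page is fetched, so drop it.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def consolidate(urls):
    """Group URLs that share the same canonical form."""
    groups = {}
    for url in urls:
        groups.setdefault(canonical_url(url), []).append(url)
    return groups


groups = consolidate([
    "http://www.kbb.com/",
    "HTTP://WWW.KBB.COM",
    "http://www.kbb.com:80/#top",
    "http://www.kbb.com:8080/",
])
```

Here the first three spellings collapse into a single entry, while the URL on port 8080 stays separate. Deciding whether two genuinely different URLs serve the same page would need more than string comparison, for example comparing the fetched content.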

IDD'S BUSINESS MODEL

The Hyperlink Search Engine was created to illustrate the functionality of the pure hyperlink voting algorithm. IDD believes, however, that hyperlink voting is complementary to other web search techniques. Any web search engine's raw output can be ranked according to hyperlink voting scores. Also, other search methods are still needed for unusual queries which might match no hyperlinks.
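
As a rough illustration of that complementary use, the Python sketch below reorders another engine's result list by hyperlink voting scores. The function name rerank, the sample URLs, and the scores are hypothetical. Pages with no votes are assumed to score zero and keep their original relative order at the end of the list.

```python
def rerank(raw_results, vote_scores):
    """Order another engine's results by hyperlink voting score, highest first."""
    # sorted() is stable, so results with equal scores keep their raw order.
    return sorted(raw_results,
                  key=lambda url: vote_scores.get(url, 0.0),
                  reverse=True)


# Hypothetical raw output from a conventional full-text engine.
raw_results = [
    "http://example.com/a",
    "http://example.com/b",
    "http://example.com/c",
    "http://example.com/d",
]
# Hypothetical voting scores; pages a and d match no hyperlinks.
vote_scores = {"http://example.com/b": 2.0, "http://example.com/c": 5.0}

reranked = rerank(raw_results, vote_scores)
```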

IDD has applied for patent protection for the Hyperlink Vector Voting algorithm. The patent application is pending in the USA and other countries. Assuming a patent is received, others who want to use the algorithm will need permission (a license) from IDD. We are offering licenses to web site operators who may want to integrate this web-search technique with other applications or create a unique user interface for it. The algorithm may be licensed with or without IDD's application development tools and text dbms on which the Hyperlink Search Engine is built.

In view of the rapid evolution the search-engine business is undergoing, we do not have an "off the shelf" price list. A likely business arrangement would involve a split of advertising revenue. If you are the business manager responsible for a major web site, and would like to consider leveraging this technology, please contact us.

ABOUT IDD

IDD Information Services, a division of IDD Enterprises, L.P. [ http://www.idd.net/ ], is devoted to developing financial-information and text-filtering applications. IDD has delivered many solutions for distributing news [ http://bis.dowjones.com/dowvision/ ] and stock market information to financial professionals [ http://www.webfinance.net/ ] as well as personal investors [ http://nestegg.iddis.com/ ], via both the internet and private institutional networks. IDD has about 300 employees and is based in New York City, with major development sites in Waltham, Mass., and Livingston, N.J. The majority owner of IDD is Dow Jones & Company.

CONTACT
Doran Howitt [ dhowitt@iddis.com ]
Director, Business Development
IDD Information Services
293 Eisenhower Parkway, Suite 250
Livingston, New Jersey 07039 USA
Tel +1 (973) 740 2605 (fax 973 994 3570)

URL submissions: Sorry, we do not accept submissions. The Rankdex engine was designed to be as objective as possible, and for this reason it determines 100% automatically which sites are included in our database.

Copyright © 1997 IDD Enterprises, L.P. All rights reserved.