From: boo...@cs.buffalo.edu (Ricky B Teh)
Subject: Search Engine Implementaion
Date: 1997/02/28
Message-ID: <5f5pfp$abq@Holly.aa.net>#1/1
X-Deja-AN: 222007008
Organization: University at Buffalo
Reply-To: boo...@cs.buffalo.edu (Ricky B Teh)
Newsgroups: comp.infosystems.www.authoring.cgi

Hi.. I've recently implemented a mini search engine, and I'm wondering whether it's working efficiently. Basically, I have a flat-file database, and what the script (Perl) does is read in the file and match against one of the fields line by line (a sequential search, which is O(n)). It seems to run okay (time-wise) when searching through a couple hundred items, but I don't know what will happen when searching through a larger amount of data, like 500,000 items. As a matter of fact, what is the best method for searching a huge amount of data? What are the most common methods used by commercial search engines? A binary search tree? Any hint would be greatly appreciated. Thanks. -Rick
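For reference, the O(n) scan the post describes might look like the sketch below. This is Python rather than the poster's Perl, and the pipe-delimited record layout and field positions are invented for illustration; every record must be read and split on every query, which is why the cost grows linearly with the file.

```python
import io

def sequential_search(lines, field_index, term):
    """O(n) scan: split every record and test one field for the term."""
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("|")
        # Case-insensitive substring match on the chosen field.
        if field_index < len(fields) and term.lower() in fields[field_index].lower():
            hits.append(fields)
    return hits

# A tiny stand-in for the flat-file database (layout is hypothetical).
records = io.StringIO(
    "1|Apple pie|dessert\n"
    "2|Lentil soup|starter\n"
    "3|Apple tart|dessert\n"
)
print(sequential_search(records, 1, "apple"))
# matches records 1 and 3
```

With a few hundred records this is fine; at 500,000 records the whole file is parsed on every single query, which is the cost the poster is worried about.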
From: y...@cs.buffalo.edu (Yanhong Li)
Subject: Re: Search Engine Implementaion
Date: 1997/03/10
Message-ID: <5g1ga9$d4j@Holly.aa.net>#1/1
X-Deja-AN: 224438311
References: <5f5pfp$abq@Holly.aa.net>
Organization: State University of New York at Buffalo/Computer Science
Reply-To: y...@cs.buffalo.edu (Yanhong Li)
Newsgroups: comp.infosystems.www.authoring.cgi

The basic technique used by search engines is called an inverted file: you index every word in the documents in your database, so a query looks up words in the index instead of scanning the documents. Searching a flat file takes forever when your database is large. If you want relevance ranking, you need a more sophisticated structure.