From: boo...@cs.buffalo.edu (Ricky B Teh)
Subject: Search Engine Implementaion
Date: 1997/02/28
Message-ID: <5f5pfp$abq@Holly.aa.net>#1/1
X-Deja-AN: 222007008
Organization: University at Buffalo
Reply-To: boo...@cs.buffalo.edu (Ricky B Teh)
Newsgroups: comp.infosystems.www.authoring.cgi

Hi.. I've recently implemented a mini search engine, and I'm wondering whether it's working efficiently. Basically, I have a flat-file database, and what the script (Perl) does is read in the file and match against one of the fields line by line (a sequential search, which is O(n)). It seems to run okay (time-wise) when searching through a couple hundred items, but I don't know what will happen when searching through a larger amount of data, like 500,000 items. As a matter of fact, what is the best method for searching a huge amount of data? What are the most common methods used by commercial search engines? A binary search tree? Any hint would be greatly appreciated. Thanks. -Rick
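For reference, the O(n) scan the post describes might look like the sketch below. This is Python rather than the poster's Perl, and the pipe-delimited record layout and field positions are invented for illustration; every record must be read and split on every query, which is why the cost grows linearly with the file.

```python
import io

def sequential_search(lines, field_index, term):
    """O(n) scan: split every record and test one field for the term."""
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("|")
        # Case-insensitive substring match on the chosen field.
        if field_index < len(fields) and term.lower() in fields[field_index].lower():
            hits.append(fields)
    return hits

# A tiny stand-in for the flat-file database (layout is hypothetical).
records = io.StringIO(
    "1|Apple pie|dessert\n"
    "2|Lentil soup|starter\n"
    "3|Apple tart|dessert\n"
)
print(sequential_search(records, 1, "apple"))
# matches records 1 and 3
```

With a few hundred records this is fine; at 500,000 records the whole file is parsed on every single query, which is the cost the poster is worried about.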
From: y...@cs.buffalo.edu (Yanhong Li)
Subject: Re: Search Engine Implementaion
Date: 1997/03/10
Message-ID: <5g1ga9$d4j@Holly.aa.net>#1/1
X-Deja-AN: 224438311
References: <5f5pfp$abq@Holly.aa.net>
Organization: State University of New York at Buffalo/Computer Science
Reply-To: y...@cs.buffalo.edu (Yanhong Li)
Newsgroups: comp.infosystems.www.authoring.cgi

The basic technique used by search engines is called an inverted file: you index every word in the documents in your database, so a query looks up words in the index instead of scanning the documents. Searching a flat file takes forever when your database is large. If you want relevance ranking, you need a more sophisticated structure.