From: k...@socrates.hr.att.com (Kenneth Almquist)
Subject: Infoseek Ultra - search engine review
Date: 1996/08/23
Message-ID: <4vivda$f7l@nntpa.cb.lucent.com>
X-Deja-AN: 175927455
organization: Lucent Technologies, Columbus, Ohio
newsgroups: comp.infosystems.www.misc


Infoseek Ultra is a new WWW search engine from Infoseek which has just
entered public beta testing.  It appears to be the promised Ultraseek
search engine under a different name.

1.  Size

Size may not be everything, but bigger is usually better.  To estimate
the relative sizes of the indexes used by various search engines, I
searched each of them for "gorbachev".  The numbers of matches they
reported are:

		    Excite ........... 12363
		    Hotbot ........... 12318
		    Altavista ......... 5263
		--> Infoseek Ultra .... 4558
		    Lycos ............. 1516
		    Open Text .......... 850
		    NlightN ............ 546
		--> Infoseek Guide ..... 453
		    MCI Search ......... 410
		    WebCrawler ......... 158
		    Yahoo ................ 7

As you can see, Ultra has a data base about ten times that of the
Infoseek Guide.  Ultra lags behind Excite and Hotbot, but there is no
reason to suppose that it can't catch up.  Ultra is about where Hotbot
was when Hotbot first went public.

(Disclaimer:  Lycos and Yahoo don't index the entire text of pages, so
the numbers for those aren't directly comparable to the others.
My impression is that Lycos *is* smaller than Ultra.  Also, the number
of matches is the number reported by the search engine.  Excite
sometimes reports more matches than it actually finds, and Ultra
sometimes reports the same page multiple times.  Finally, these
numbers provide only a rough estimate.  The numbers above suggest that
Altavista's index is 15% larger than Ultra's, but searching for the
word "the" produces 15,572,658 documents in Altavista and 10,912,404
documents in Ultra, making Altavista 43% larger.)
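
The disclaimer's arithmetic is easy to reproduce.  A small Python
sketch, assuming (as the comparison above does) that an engine's hit
count for a word is proportional to the size of its index:

```python
# Hit counts quoted above; proportionality to index size is an assumption.
GORBACHEV = {"Altavista": 5263, "Infoseek Ultra": 4558, "Infoseek Guide": 453}
THE = {"Altavista": 15_572_658, "Infoseek Ultra": 10_912_404}


def percent_larger(counts, a, b):
    """How much larger engine a's index appears than engine b's, in percent."""
    return (counts[a] / counts[b] - 1) * 100


print(f"{percent_larger(GORBACHEV, 'Altavista', 'Infoseek Ultra'):.0f}%")   # 15%
print(f"{percent_larger(THE, 'Altavista', 'Infoseek Ultra'):.0f}%")         # 43%
print(f"{GORBACHEV['Infoseek Ultra'] / GORBACHEV['Infoseek Guide']:.1f}x")  # 10.1x
```

Two probes of the same pair of engines giving 15% and 43% is why these
figures should be read as rough estimates only.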


2.  Query language

Ultra has reasonably decent documentation, although some details are
not explained.  The documentation is under "help"; "about Ultra" calls
up marketing literature containing more hype than substance.  Some
features are:

 * Searching for phrases.  The ability to match phrases rather than
   just individual words is one of the most useful features that a
   search engine can support.  The marketing literature stresses that
   Ultra can handle phrases containing common words, which is nice
   but in practice I've never had a problem with the way other search
   engines handle phrases.

 * Require or exclude particular words or phrases.  This is a poor
   man's substitute for a boolean search capability.  Preceding a
   word or phrase with a plus sign will exclude pages which do not
   contain the query term.  Preceding a word or phrase with a minus
   sign will exclude pages which DO contain the query term.  This
   is not as flexible as a true boolean search, but is easy to learn.

 * Match capitalization.  If any letters in a word are capitalized,
   Ultra will require the case of the word to match exactly.  This
   is probably inferior to the Altavista approach of allowing a
   lower case letter to match the corresponding upper case letter.
   Altavista allows "Gates" to find "THE BILL GATES PAGE".  The
   Ultra approach handles names like "NeXT" and "TeX" nicely.

 * Region constraints.  Ultra will allow you to limit matches to
   the title, url, host name, or link URLs.  This is less extensive
   than Altavista's list of constraints, but includes the important
   ones.

 * Match word variants.  This apparently uses a dictionary, which is
   a significant improvement over the "match any suffix" approach
   used by search engines such as Lycos, Altavista and Excite.  False
   matches still occur; for example a search for "Little Women" turns
   up "The Little Woman Page" and "Internet Sleuth" matches "Internet
   Sleuths."  It also misses some cases:  "UFO" will not match "UFOs".
   In general this seems to be a useful feature, but it would be nice
   if there were a way to require exact matches.
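
To make the +/- operators and the capitalization rule concrete, here
is a minimal Python sketch of a filter that follows them as the bullets
above describe them.  It is not Infoseek's code: word variants and
ranking are not modeled, and treating unsigned terms as "at least one
must appear" when there is no + term is an assumption.

```python
import re

# Sketch only: semantics taken from the review, not from Infoseek's code.
TERM = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


def parse_query(query):
    """Split a query into (sign, words) pairs; sign is '+', '-' or ''."""
    terms = []
    for sign, phrase, word in TERM.findall(query):
        words = re.findall(r"\w+", phrase or word)
        if words:  # a bare '-' or '+' yields no words and is dropped
            terms.append((sign, words))
    return terms


def word_matches(q, p):
    """A query word with any capital letter must match exactly;
    an all-lowercase query word matches any capitalization."""
    if q != q.lower():
        return q == p
    return q == p.lower()


def contains(words, page_words):
    """True if the word sequence occurs consecutively in page_words."""
    n = len(words)
    return any(
        all(word_matches(q, p) for q, p in zip(words, page_words[i:i + n]))
        for i in range(len(page_words) - n + 1)
    )


def passes(query, page_text):
    page_words = re.findall(r"\w+", page_text)
    terms = parse_query(query)
    for sign, words in terms:
        found = contains(words, page_words)
        if (sign == "+" and not found) or (sign == "-" and found):
            return False
    plain = [words for sign, words in terms if sign == ""]
    # Assumption: with no '+' terms, at least one unsigned term must appear.
    if plain and not any(sign == "+" for sign, _ in terms):
        return any(contains(words, page_words) for words in plain)
    return True


page = "Little Women, starring Winona Ryder"
print(passes('"Little Women" -"Winona Ryder"', page))   # False
print(passes('"Little Women" - "Winona Ryder"', page))  # True: detached '-' dropped
print(passes("Gates", "THE BILL GATES PAGE"))            # False
print(passes("gates", "THE BILL GATES PAGE"))            # True
```

Note that this tokenizer silently drops a minus sign separated from its
term by a space, which is exactly the behavior criticized later in this
review.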

In short, Ultra cannot quite match Altavista in terms of search
features--it lacks a full boolean search and a NEAR operator, among
other things--but it has sufficient features for most queries.  It
does not have the "find similar pages" feature of Infoseek Guide.


As far as I can tell, Ultra interprets anything you type at it as
a valid query of some sort.  For example, suppose you type

	"Little Women" - "Winona Ryder"

rather than

	"Little Women" -"Winona Ryder"

Ultra does not allow a space following the minus sign, which is simple
and clearly documented, but easy to get wrong.  Rather than informing
the user that the query is invalid, Ultra simply ignores the minus
sign.  Similarly, Ultra names the host constraint "site" rather than
"host."  If you type

	host:ultra.infoseek.com

rather than

	site:ultra.infoseek.com

Ultra will search for the phrase "host ultra infoseek com", with no
indication (other than the failure to find any results) that it has
done this.
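
What is missing here is a warning rather than a silent
reinterpretation.  A minimal Python sketch of such a check, assuming
only the two field names that appear in this review ("title:" and
"site:"); Ultra's names for its URL and link constraints are not given
here, so they are left out:

```python
import re

# Assumption: only the field names mentioned in this review.
KNOWN_FIELDS = {"title", "site"}


def query_warnings(query):
    """List mistakes that, per the review, Ultra accepts without comment."""
    warnings = []
    # A '+' or '-' followed by a space (or the end of the query) is ignored.
    for m in re.finditer(r"(?:^|\s)([+-])(?=\s|$)", query):
        warnings.append(f"'{m.group(1)}' at offset {m.start(1)} is not attached to a term")
    # An unknown 'name:' prefix is searched as ordinary words.
    for m in re.finditer(r"(?:^|\s)[+-]?([A-Za-z]+):", query):
        if m.group(1).lower() not in KNOWN_FIELDS:
            warnings.append(f"unknown field '{m.group(1)}:' will be searched as ordinary words")
    return warnings


print(query_warnings('"Little Women" - "Winona Ryder"'))
print(query_warnings("host:ultra.infoseek.com"))
print(query_warnings("site:ultra.infoseek.com"))  # []
```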

A final "gotcha" of the query language is that adjacent capitalized
words are treated as a phrase.  This is allegedly a feature, but it
is one more thing you have to watch out for.
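
The rule can be sketched in a few lines of Python.  This illustrates
the behavior as described, not Ultra's actual parser, and it works on
a list of words that has already been split and unquoted:

```python
def group_capitalized(words):
    """Merge runs of adjacent capitalized words into single phrase terms."""
    terms, run = [], []
    for word in words:
        if word[:1].isupper():
            run.append(word)
            continue
        if run:
            terms.append(" ".join(run))
            run = []
        terms.append(word)
    if run:
        terms.append(" ".join(run))
    return terms


print(group_capitalized("films with Winona Ryder".split()))
# ['films', 'with', 'Winona Ryder']
print(group_capitalized("Gates Microsoft".split()))
# ['Gates Microsoft']
```

The second call shows the gotcha: two names typed next to each other
become one phrase, which most pages will not contain.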


3.  Search results

Ultra returns 10 results per page.  The current search form does
not allow you to modify this, but if you download the search form
and edit it, you can set "nh" (which controls the number of results
per page) to any value from 5 through 25.  Ultra also has a "lk"
(short for "look," I presume) variable which controls whether the
results include abstracts or just titles.  (Setting it to 1 gives
abstracts; setting it to 2 gives titles only.)  Again, there is no
way to set this in the search form; after you get the first page
of query results, you can select "hide summaries" to set it to 2.
There does not appear to be any limit on the total number of results
that Ultra will return for a query.
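
A sketch of building such a query by hand in Python.  Only "nh" and
"lk" and their ranges come from the review; the form's action URL and
the name of the query-text field ("qt" below) are hypothetical
placeholders, and sending the form as a GET query string is an
assumption:

```python
from urllib.parse import urlencode

# Hypothetical placeholders: the real form action and query field are not given here.
SEARCH_URL = "http://ultra.infoseek.com/search"
QUERY_FIELD = "qt"


def ultra_query_url(query, nh=10, lk=1):
    """nh: results per page (5-25); lk: 1 = abstracts, 2 = titles only."""
    if not 5 <= nh <= 25:
        raise ValueError("nh must be between 5 and 25")
    if lk not in (1, 2):
        raise ValueError("lk must be 1 (abstracts) or 2 (titles only)")
    return SEARCH_URL + "?" + urlencode({QUERY_FIELD: query, "nh": nh, "lk": lk})


print(ultra_query_url('"Little Women"', nh=25, lk=2))
```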

The abstracts are not necessarily the first words of the document.
It seems that Ultra will sometimes skip forward to the first
occurrence of the document title within the document body, and
start the abstract with the first word following this occurrence.
This seems a bit weird at first, but is probably fine once you get
used to it.
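
The heuristic inferred above can be sketched like this in Python.  It
is a guess at the observed behavior, not Infoseek's method, and the
25-word limit is an arbitrary choice:

```python
def make_abstract(title, body, max_words=25):
    """Start the abstract right after the first occurrence of the title in
    the body, if there is one; otherwise start at the top of the body."""
    start = body.find(title) if title else -1
    text = body[start + len(title):] if start >= 0 else body
    return " ".join(text.split()[:max_words])


print(make_abstract("Little Women", "Little Women  Chapter One: Playing Pilgrims", 4))
# Chapter One: Playing Pilgrims
```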

An important feature of a search engine is how it orders the results.
It is not easy to compare the quality of search engines in this
regard.  I did a search for "Little Women" and compared the first
fifty results returned from Altavista with the first fifty results
returned from Ultra.  The 8th result from Ultra was the same as the
26th result from Altavista, and there were no other matches.  This
seems to indicate that the ordering functions used by Altavista and
Ultra differ significantly, but it is not obvious which is better.
Altavista gave higher priority than Ultra to pages containing chapters
from the book, suggesting that Ultra was more sensitive to word
frequency than Altavista.  (In the chapters, "Little Women" probably
appeared at the top of the page and nowhere else, making the
frequency of occurrence within the text low.)
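
The comparison itself is mechanical.  A Python sketch that reports
which pages appear in the top n of both lists and at what positions;
the lists in the example are made up, and only the 8th and 26th
positions come from the search described above:

```python
def rank_overlap(a, b, n=50):
    """Pages in the top n of both ranked lists, as (page, position in a,
    position in b) with 1-based positions. Repeats keep their first position."""
    pos_b = {}
    for i, page in enumerate(b[:n], start=1):
        pos_b.setdefault(page, i)
    shared, seen = [], set()
    for i, page in enumerate(a[:n], start=1):
        if page in pos_b and page not in seen:
            seen.add(page)
            shared.append((page, i, pos_b[page]))
    return shared


ultra = [f"ultra-{i}" for i in range(1, 51)]
altavista = [f"altavista-{i}" for i in range(1, 51)]
ultra[7] = altavista[25] = "shared-page"
print(rank_overlap(ultra, altavista))  # [('shared-page', 8, 26)]
```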


4.  Problems

Obviously the folks at Infoseek have not entirely mastered HTML.  I
did not encounter a single page on the entire site which didn't
contain multiple HTML errors.  Even the search results contain invalid
HTML.  The most amusing example comes from the "special searches"
page, which includes the HTML code:

    <h4>Find HTML errors on a web page (via Imagiware)</h4>
    <A name=html>
    <FORM action="http://imagiware.com/RxHTML/doc.cgi#Summary" method=post>
    </a>

The writer has managed to transpose the <FORM> and </a> tags.  If the
client corrects the error by adding a </form> tag before the </a> tag
(a reasonable approach because it is easier to omit a close tag than
to transpose tags), the subsequent <input> tags will be outside a form
and these must be deleted by subsequent error correction.  Of course,
if the form is unusable you can still "find HTML errors on a web page"
by selecting "display source!"
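
A stack is enough to catch this kind of mistake.  A minimal Python
sketch using the standard library's html.parser that reports end tags
which do not close the most recently opened element; the list of void
elements is partial, and real HTML error recovery is far more involved
than this:

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (a partial list, enough for this sketch).
VOID = {"br", "hr", "img", "input", "meta", "link"}


class NestingChecker(HTMLParser):
    """Record end tags that do not close the most recently opened element."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements, innermost last
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # <tag/> opens and closes in one step, so nothing stays open

    def handle_endtag(self, tag):
        if tag not in self.stack:
            self.errors.append(f"</{tag}> has no matching open tag")
            return
        # Anything opened after <tag> is implicitly closed here: report it.
        while self.stack[-1] != tag:
            inner = self.stack.pop()
            self.errors.append(f"</{tag}> appears while <{inner}> is still open")
        self.stack.pop()


snippet = """<h4>Find HTML errors on a web page (via Imagiware)</h4>
<A name=html>
<FORM action="http://imagiware.com/RxHTML/doc.cgi#Summary" method=post>
</a>"""

checker = NestingChecker()
checker.feed(snippet)
checker.close()
print(checker.errors)  # ['</a> appears while <form> is still open']
print(checker.stack)   # []
```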

Ultra is more solid than Hotbot was when it was first made available
to the public.  (For that matter, Ultra may be more solid than Hotbot
is today.)  It is not bug-free, though.  The query

	title:"Internet Sleuth"

returned three entries for a page titled "Institute of Geology and
Paleontology".  Ultra gave these entries a relevance score of 0% and
displayed them after all of the valid results, so the problem is not
major.

A more serious problem, mentioned in the documentation, is that the
same page can get entered into Ultra more than once.  For example, the
Ultra home page (http://ultra.infoseek.com/) is entered in the data
base six times, so any query that matches this page will list it six
times.  I assume that this will be fixed before the system moves out
of public beta.
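
Until that is fixed, exact duplicates are easy to drop on the client
side.  A Python sketch that collapses results whose URLs are identical
after some simple normalization; the normalization rules chosen here
are an assumption and will not catch every alias of a page:

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url):
    """Lower-case scheme and host, drop a default port and any #fragment,
    and treat an empty path as "/"."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""          # .hostname is already lower-cased
    if parts.port and parts.port != DEFAULT_PORTS.get(scheme):
        host += f":{parts.port}"
    path = parts.path or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{scheme}://{host}{path}{query}"


def dedupe(results):
    """Keep the first result for each normalized URL, preserving order."""
    seen, unique = set(), []
    for url in results:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


hits = ["http://ultra.infoseek.com/"] * 6 + ["http://ULTRA.infoseek.com:80"]
print(dedupe(hits))  # ['http://ultra.infoseek.com/']
```

Normalizing before comparing, rather than comparing raw strings, is
what catches the last form of the home page URL.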


5.  Speed

Ultra appears to be quite fast, although comparisons are difficult
because so much of the delay is in the network rather than in the
search engine.

A page containing 10 results is about 8.5K bytes, which is comparable
to the Altavista advanced search, and about 1K smaller than the
Infoseek Guide's.  However, Ultra includes a different advertising image
in each result page, which will slow things down if you have image
loading enabled.


6.  Summary

In conclusion, Ultra is a serious challenge to the other search
engines out there.  Assuming that the index is expanded and the
remaining kinks are resolved during the public beta period, Ultra
has a good shot at establishing itself as the best search engine
on the net.
				Kenneth Almquist

From: y...@cs.buffalo.edu (Yanhong Li)
Subject: Re: Infoseek Ultra - search engine review
Date: 1996/08/23
Message-ID: <4vksk4$n0m@prometheus.acsu.buffalo.edu>
X-Deja-AN: 176032698
references: <4vivda$f7l@nntpa.cb.lucent.com>
organization: State University of New York at Buffalo/Computer Science
nntp-posting-user: yli
newsgroups: comp.infosystems.www.misc


In article <4vivda$f...@nntpa.cb.lucent.com>,
Kenneth Almquist <k...@socrates.hr.att.com> wrote:
>Infoseek Ultra is a new WWW search engine from Infoseek which has just
>entered public beta testing.  It appears to be the promised Ultraseek
>search engine under a different name.
>
>1.  Size
>
>Size may not be everything, but bigger is usually better.  To estimate
>the relative sizes of the indexes used by various search engines, I
>searched for "gorbachev" using various search engines.  The results

Guessing the size from just one single query result is not fair.
As you mentioned, they can cheat: if they only found 5000 matches
but claim 10,000, how can you tell?  If the number of hits is
larger than a certain number, say 50, how many hits you get is
no longer important.

Did anyone EVER find something they were looking for from one
search engine but not the others BECAUSE the others are smaller?

I never did.  Most searches result in too many hits.  When you
cannot find something you really want, you never know whether it is
because the search engines do not have it or because you did not
examine all the hits (you could not afford to do so!).

>2.  Query language
>
>Ultra has reasonably decent documentation, although some details are
>not explained.  The documentation is under "help"; "about Ultra" calls
>up marketing literature containing more hype than substance.  Some
>features are:

As for features, I like Hotbot's interface: you need only specify
phrase, all words, person, etc.  AltaVista's is too dumb, and with
the others you have to make some effort to learn (or remember) their
rules.

The best general-purpose search engines are those that do not require
a user to learn a lot, yet can find the web pages the user is looking
for.

A casual user usually does not want to think in boolean logic when
searching.  Most of the time, when users type in a few keywords,
they really mean "find the web pages that best match my interest";
they do not care about the logic.

>
>3.  Search results
>
>An important feature of a search engine is how it orders the results.

This is the key!  When your search results in hundreds or even
thousands of hits, you do not care how large their databases are; you
care about the ORDER.

The technology behind search engines belongs to information
retrieval, an area that has been studied for more than 50 years.
The focus has been on how to order the hits.  Unfortunately, IR
people seem to order the hits only by relevance, which is wrong!



>A more serious problem, mentioned in the documentation, is that the
>same page can get entered into Ultra more than once.  For example, the
>Ultra home page (http://ultra.infoseek.com/) is entered in the data
>base six times, so any query that matches this page will list it six
>times.  I assume that this will be fixed before the system moves out
>of public beta.
>

Duplication is definitely a problem!  Not only duplicates of the same
URL but, more seriously, mirror sites.  Unfortunately, lots of popular
sites are mirrored in many places.
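
Mirrors are harder because the URLs differ.  One simple approach,
sketched in Python, is to fingerprint each page's text and keep only
the first URL per fingerprint.  This is an illustration, not something
any engine mentioned here is known to do, and the example URLs are
hypothetical.  An exact hash misses mirrors that differ even slightly
(an added "mirrored at" line, say); near-duplicate techniques such as
shingling are meant for that case.

```python
import hashlib
import re


def content_key(text):
    """Fingerprint a page's text after collapsing whitespace and folding case."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def drop_mirrors(pages):
    """pages: (url, text) pairs. Keep the first URL seen for each fingerprint."""
    seen, kept = set(), []
    for url, text in pages:
        key = content_key(text)
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept


# Hypothetical pages: the first two are the same FAQ on two mirror sites.
pages = [
    ("http://www.example.edu/faq.html", "Frequently Asked  Questions\n..."),
    ("http://mirror.example.org/faq.html", "Frequently asked questions ..."),
    ("http://www.example.com/other.html", "Something else entirely"),
]
print(drop_mirrors(pages))
# ['http://www.example.edu/faq.html', 'http://www.example.com/other.html']
```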

>5.  Speed
>
>Ultra appears to be quite fast, although comparisons are difficult
>because so much of the delay is in the network rather than in the
>search engine.

Speed is not so important; it is always faster than the graphical
ads that come with your search results:-)

>
>6.  Summary
>
>In conclusion, Ultra is a serious challenge to the other search
>engines out there.  Assuming that the index is expanded and the
>remaining kinks are resolved during the public beta period, Ultra
>has a good shot at establishing itself as the best search engine
>on the net.
>				Kenneth Almquist


Infoseek is using technology licensed from U. Mass., but I am
sure they have made a lot of changes.  Yahoo uses AltaVista's or
Open Text's, and Hotbot uses Inktomi's.  All the others use their own.

From: k...@socrates.hr.att.com (Kenneth Almquist)
Subject: Re: Infoseek Ultra - search engine review
Date: 1996/08/27
Message-ID: <4vvhel$667@nntpa.cb.lucent.com>
X-Deja-AN: 176816676
references: <4vivda$f7l@nntpa.cb.lucent.com> <4vksk4$n0m@prometheus.acsu.buffalo.edu>
organization: Lucent Technologies, Columbus, Ohio
newsgroups: comp.infosystems.www.misc


> Guessing the size from just one single query result is not fair.
> As you mentioned, they can cheat: if they only found 5000 matches
> but claim 10,000, how can you tell?

I also tried searching for "orgami" which produces a small enough
number of hits that I could request them all.  This is how I
discovered that Excite can overstate the number of hits.  (It reported
53 hits and returned 51.)  Ultra's position in the ordering of search
engines by size was the same for both "orgami" and "gorbachev."

> Did anyone EVER find something they were looking for from one
> search engine but not the others BECAUSE the others are smaller?

Real example:  The "Twining memo" was discussed in sci.skeptic a while
back.  Searching for "Twining memo" in Infoseek Ultra turns up three
hits.  The same search using Infoseek Guide turns up nothing.

> The best general-purpose search engines are those that do not require
> a user to learn a lot, yet can find the web pages the user is looking
> for.
>
> A casual user usually does not want to think in boolean logic when
> searching.  Most of the time, when users type in a few keywords,
> they really mean "find the web pages that best match my interest";
> they do not care about the logic.

My impression is that it is easier to learn search engine features
than to learn how to construct effective queries.  Of course the
difficulty of constructing an effective query varies considerably
depending on what you are searching for.

Suppose you are searching for information on the TV program
"Mission Impossible."  It is reasonably easy to learn that a phrase
like this can be enclosed in double quotes.  If you don't know about
phrase searches, or are using a search engine which doesn't support
them, you may be able to narrow in on the TV program by adding
additional search terms.  But it is harder to learn to select
additional search terms than to learn the mechanics of phrase search,
and the results will almost certainly be worse.

>> An important feature of a search engine is how it orders the results.
>
> This is the key!  When your search results in hundreds or even
> thousands of hits, you do not care how large their databases are; you
> care about the ORDER.

Unfortunately, result ordering is very difficult to evaluate.  Since I
don't know which search engine does the best job of ordering, I tend
to use search engines which have advanced search features which make
it easier to eliminate irrelevant pages from the results.  If the
results contain mostly relevant pages, then ordering is less important.

> Speed is not so important; it is always faster than the graphical
> ads that come with your search results:-)

I eliminate the graphical ads by turning off image downloading.

I've used Ultra a little more and concluded that it is significantly
faster than Alta Vista when accessed from my ISP.  The speed really
does make a subjective difference.
					Kenneth Almquist