Newsgroups: comp.os.linux.development
Path: bga.com!news.sprintlink.net!hookup!swrinde!gatech!
newsxfer.itd.umich.edu!isclient.merit.edu!msuinfo!
harbinger.cc.monash.edu.au!yarrina.connect.com.au!
warrane.connect.com.au!kralizec.zeta.org.au!socs.uts.edu.au!metro!
mama.research.canon.oz.au!luke
From: lu...@research.canon.oz.au (Luke Kendall)
Subject: Linux seems to perform terribly for large directories
Message-ID: <Cs76BL.2F2@research.canon.oz.au>
Lines: 50
Sender: ne...@research.canon.oz.au
Nntp-Posting-Host: tosh
Organization: Canon Information Systems Research Australia
Date: Thu, 30 Jun 1994 06:35:45 GMT

I have a strong suspicion that Linux has a problem with large
directories.  An early pointer to this was that doing an `ls'
on a directory with (say) 5000 files took several minutes
to begin producing output.  This is _far_ slower than on other
versions of Unix.
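
To separate the raw directory scan from ls itself (ls sorts its
output, so it can't print anything until it has read the whole
directory), a rough test program along these lines should show
whether the readdir() scan is the slow part.  This is only a sketch,
not something I've run as written, and the one-second resolution of
time() is crude but good enough when the runs take minutes:

    /* rough sketch only -- times a full readdir() scan of a directory */
    #include <stdio.h>
    #include <dirent.h>
    #include <time.h>

    int main(int argc, char **argv)
    {
        const char *path = (argc > 1) ? argv[1] : ".";
        time_t start = time(NULL);
        long count = 0;
        DIR *dir = opendir(path);

        if (dir == NULL) {
            perror("opendir");
            return 1;
        }
        while (readdir(dir) != NULL)    /* one entry per library call */
            count++;
        closedir(dir);

        printf("%ld entries in %ld seconds\n",
               count, (long)(time(NULL) - start));
        return 0;
    }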

The 2nd indicator was what happened when I used cpio to read a large
number of files from a floppy containing articles from the Linux
newsgroups (in particular, the voluminous comp.os.linux.help).

(My pattern of use was to dump a whole lot of files to my home machine,
and every now and then read them and delete uninteresting articles.
I started loading from around news item 20000; I'm now up in the
early 40,000's.  So about 20,000 files have been added and removed.)

There was 40Mb free at the time (the hard disc had recently been
filled to within 500kb of full).  Then I deleted lots of junk.

So, this time, reading 761 files from the floppy (2295 blocks,
i.e. 1.15Mb), the elapsed time was something like 7 minutes!
Normally reading a floppy like this takes between 1 and 2 minutes.

I timed a 2nd, similar floppy of files.  Elapsed time just over 4
minutes; 20secs user, 117secs system, 49% of CPU.  I believe that
the process was swift until it read a fixed amount from floppy into
internal memory, and then slowed down dramatically when writing the
files out to the hard disc (judging by the screen output & drive
access light).

Just listing the files on the floppy was as fast as normal.

Reading a floppy into /tmp was normal speed.  Moving the files into
the right directory took only seconds.

Reading a floppy into a directory that had contained far fewer files
also took only a reasonable amount of time.

Processes running were an xview X session with a performance monitor
and clock in the background (as normal).

A ps of the cpio process about 2/3 or 3/4 of the way through the
process showed it had used 1m30s CPU time.

Processor is a 486DX33, with a VLB controller and a 340Mb Western
Digital IDE drive, 8Mb of memory, running Linux 0.99.13.  The
floppies are 1.44Mb.

So: what gives?  Have others noticed this problem?

luke

-- 
Luke Kendall, Senior Software Engineer.      | Net:   lu...@research.canon.oz.au
Canon Information Systems Research Australia | Phone: +61 2 805 2982
P.O. Box 313 North Ryde, NSW, Australia 2113 | Fax:   +61 2 805 2929

Newsgroups: comp.os.linux.development
Path: bga.com!news.sprintlink.net!hookup!yeshua.marcam.com!
MathWorks.Com!europa.eng.gtefsd.com!emory!swrinde!pipex!uknet!festival!
dcs.ed.ac.uk!sct
From: s...@dcs.ed.ac.uk (Stephen Tweedie)
Subject: [ANSWER] Linux seems to perform terribly for large directories
In-Reply-To: kjb@cs.vu.nl's message of Tue, 5 Jul 1994 09:31:29 GMT
Message-ID: <SCT.94Jul8143930@ascrib.dcs.ed.ac.uk>
Sender: cn...@dcs.ed.ac.uk (UseNet News Admin)
Organization: Department of Computer Science, University of Edinburgh
References: <Cs76BL.2F2@research.canon.oz.au> <Cs94z0.s9@pe1chl.ampr.org>
	<1994Jul4.140054.10696@uk.ac.swan.pyr>
	<2vb2kk$hvs@wombat.cssc-syd.tansu.com.au> <CsGnsH.KC8@cs.vu.nl>
Date: Fri, 8 Jul 1994 13:39:29 GMT
Lines: 32

Hi,

I just thought I'd mention that this is very much a work-in-progress
topic.  At the Heidelberg conference, Ted Ts'o and I discussed this at
some length, and Ted even found a glaring bug in the existing
readdir() system call (one which affects all long-filename
filesystems, by the way).

So, I've currently got tentative bug-fix patches and performance
improvements for the directory handling code.  The bug-fix should also
mean that the ext2fs directory cache may now be re-enabled.

There are two major performance enhancements: first, readdir() now
returns more than one directory entry if requested (up to a whole
block-full, in fact).  This requires library support too, by the way,
but will not require any applications to be recompiled: *all*
applications use readdir(3) from the library, not readdir(2) directly.
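
To make the "library support but no recompilation" point concrete:
an application's entire view of a directory is essentially the loop
below (my illustration, not code from the patches).  Whether the
library fills its buffer one entry at a time with the old system
call, or a whole block at a time with the new one, is invisible at
this level, which is why nothing needs recompiling:

    #include <stdio.h>
    #include <dirent.h>

    /* the only directory interface an application normally sees */
    int list_directory(const char *path)
    {
        DIR *dir = opendir(path);
        struct dirent *entry;

        if (dir == NULL)
            return -1;
        while ((entry = readdir(dir)) != NULL)
            printf("%s\n", entry->d_name);  /* one entry handed back per call */
        return closedir(dir);
    }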

Secondly, there is the directory cache.  This is a major win for
things like "ls -l", where an application does a repeated
readdir()/stat().  Up to 128 directory names resolved by readdir() and
lookup() will be cached, and these names may then be referenced
without scanning the entire directory tree again.
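
In other words, the cache is aimed at the pattern sketched below
(again my own illustration, not the ls source): each stat() after
readdir() has to resolve the same name all over again, and on a big
directory with no cache every one of those lookups means walking the
directory blocks once more.

    #include <stdio.h>
    #include <dirent.h>
    #include <sys/stat.h>

    /* the readdir()/stat() pattern that "ls -l" performs */
    void long_listing(const char *path)
    {
        DIR *dir = opendir(path);
        struct dirent *entry;
        struct stat st;
        char full[1024];

        if (dir == NULL)
            return;
        while ((entry = readdir(dir)) != NULL) {
            snprintf(full, sizeof(full), "%s/%s", path, entry->d_name);
            if (stat(full, &st) == 0)   /* second lookup of the same name */
                printf("%8ld %s\n", (long) st.st_size, entry->d_name);
        }
        closedir(dir);
    }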

Once they are tested, these patches should be in a kernel soon.  Watch
this space... :-)

Cheers,
 Stephen.
---
Stephen Tweedie <s...@dcs.ed.ac.uk>   (JANET: sct@uk.ac.ed.dcs)
Department of Computer Science, Edinburgh University, Scotland.