Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!linus!philabs!prls!pyramid!ut-sally!std-unix
From: std-u...@ut-sally.UUCP (Guest Moderator, John B. Chambers)
Newsgroups: mod.std.unix
Subject: Case sensitive file names
Message-ID: <5860@ut-sally.UUCP>
Date: Thu, 2-Oct-86 02:59:13 EDT
Article-I.D.: ut-sally.5860
Posted: Thu Oct  2 02:59:13 1986
Date-Received: Fri, 3-Oct-86 05:35:41 EDT
Organization: IEEE 1003 Portable Operating System for Computer Environments Committee
Lines: 53
Approved: j...@sally.utexas.edu

From cbosgd!cbosgd.ATT.COM!m...@seismo.CSS.GOV Wed Oct  1 16:55:45 1986
Date: Mon, 29 Sep 86 12:33:36 edt
From: m...@cbosgd.att.com (Mark Horton)
Message-Id: <8609291633.AA10479@cbosgd.ATT.COM>
Newsgroups: mod.std.unix
Subject: Case sensitive file names

OK, here's a new topic.  File names.

I note that the committee recently decided that all file names
in conforming systems must be case sensitive, for example,
makefile and Makefile must be different files.  (I've forgotten
where I read this, it was probably Communixations.)

I think this is a mistake.  UNIX is the only major operating system
that treats things like file names, logins, host names, and commands
as case sensitive.  The net effect of this is that users get
confused, since they have to get the capitalization right every time.
To avoid confusion, everybody always just uses lower case.  So
there are few, if any, benefits from a two-case system, and any time
anyone tries to do something that isn't pure lower case, it causes
confusion for somebody and often breaks some program.

Another problem is that emulations on other operating systems,
such as VMS or MS DOS, will become impossible without drastic
changes to their file systems.  Given the problems in the above
paragraph, plus politics as usual, I think it is unlikely that
other systems will be changed to have case sensitive file systems.
After all, it's not like it was easiest to make the VMS filesystem
case insensitive - that took extra effort on their part.

I think it's a mistake to move in the direction of requiring other
operating systems to become case sensitive.  If anything, motion in
the other direction might be of more benefit.

Note: I am NOT suggesting that UNIX should have a case insensitive
filesystem that maps everything to UPPER CASE like MS DOS.  There is
nothing wrong with mapping everything to lower case, for example.
It's also reasonable to leave the case alone, but ignore case in
comparisons.  There is also probably a good argument for keeping
it case sensitive (after all, there are probably 5 or 6 people out
there who really need both makefile and Makefile, or both mail and
Mail, for some reason that escapes me at the moment.)  But I think
it would be a mistake to require other systems to change if they
are to support a POSIX emulation on top of them.  (On the other hand,
it may be reasonable to expect other operating systems to support
more general file name lengths and character sets, rather than things
like the MS DOS 8+3 convention.  But in practice, this may be too
painful to fix.)

	Mark Horton

Volume-Number: Volume 7, Number 11

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!decvax!tektronix!teklds!cae780!amdcad!amd!intelca!qantel!lll-lcc!lll-crg!rutgers!husc6!ut-sally!std-unix
From: std-u...@ut-sally.UUCP
Newsgroups: mod.std.unix
Subject: Re: Case sensitive file names
Message-ID: <5865@ut-sally.UUCP>
Date: Thu, 2-Oct-86 12:08:21 EDT
Article-I.D.: ut-sally.5865
Posted: Thu Oct  2 12:08:21 1986
Date-Received: Fri, 3-Oct-86 07:55:03 EDT
Organization: IEEE 1003 Portable Operating System for Computer Environments Committee
Lines: 53
Approved: j...@sally.utexas.edu

From @SUMEX-AIM.ARPA:MRC@PANDA Thu Oct  2 05:09:39 1986
Date: Thu 2 Oct 86 01:59:26-PDT
From: Mark Crispin <MRC%PA...@SUMEX-AIM.Stanford.EDU>
Subject: Re: Case sensitive file names
To: std-unix%ut-sally.U...@SALLY.UTEXAS.EDU
In-Reply-To: <5860@ut-sally.UUCP>
Postal-Address: 1802 Hackett Ave.; Mountain View, CA  94043-4431
Phone: +1 (415) 968-1052
Message-Id: <12243533720.7.MRC@PANDA>

I would like to add a loud "Bravo!" to Mark Horton's message!  The present
case sensitivity of the Unix filesystem is a real drag, and something that
has regularly and reliably caused me problems when working in a heterogeneous
environment.  As far as I can tell, the only individuals who actually *like*
case sensitivity in Unix are the high-schoolish hackers who think it's really
cute to write programs with separate -1, -l, -I, and -L switches.

I think that the most reasonable proposal is to do a free case match on input,
so that "more foobar" is the same as "More FooBar", etc.  On output, you first
do a free case match to see if there is an extant file and if so preserve the
case of that file.  In other words, if I overwrite FooBar but specify foobar
or FOOBAR, the file is still called FooBar.  Otherwise, use whatever case the
user specifies.  Renaming would always use the case the user specifies, so the
user can rename foobar to FooBar, etc.
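
The matching rules proposed above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the class
CasePreservingDir and its methods are hypothetical names, the "directory" is
an in-memory table rather than a real filesystem, and case is folded with
str.lower(), which is adequate only for ASCII names.

```python
class CasePreservingDir:
    """Toy directory: case-insensitive lookup, case-preserving storage."""

    def __init__(self):
        # Maps the folded (lower-case) name to (name as stored, contents).
        self._entries = {}

    def read(self, name):
        # Input: free case match, so "foobar" finds "FooBar".
        stored, data = self._entries[name.lower()]
        return data

    def write(self, name, data):
        key = name.lower()
        if key in self._entries:
            # Output to an extant file: keep the case the file already has.
            stored, _ = self._entries[key]
            self._entries[key] = (stored, data)
        else:
            # New file: use whatever case the user specified.
            self._entries[key] = (name, data)

    def rename(self, old, new):
        # Renaming always takes the case the user specifies.
        # (Collisions with a different existing file are not handled here.)
        _, data = self._entries.pop(old.lower())
        self._entries[new.lower()] = (new, data)

    def names(self):
        return sorted(stored for stored, _ in self._entries.values())


d = CasePreservingDir()
d.write("FooBar", "first")
d.write("FOOBAR", "second")   # overwrites, but the file is still "FooBar"
d.rename("foobar", "foobar")  # an explicit rename changes the stored case
```

Note that the folded key does all the work: the stored name is consulted
only when a name has to be shown to the user.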

Now, if I can convince you guys to do this for usernames, I will take back at
least 50% of the nasty things I've ever said about Unix.  Golly gee, it would
be nice to be MRC or Crispin, not "mrc" or "crispin"...

Another way of doing it is how TOPS-20 does it.  TOPS-20's filesystem isn't
*really* case independent.  All lowercase characters are coerced into upper
case, so if I say foobar.txt it becomes FOOBAR.TXT in the actual filename.
This is both from the user interface and from the filename lookup system call.
It is, however, possible for any of the 128 ASCII characters to be in a filename,
provided that the "oddball" characters are quoted using CTRL/V.  In other words,
a FooBar.Txt file is possible on TOPS-20, but only by F<^V>o<^V>oB<^V>a<^V>r.T<^V>x<^V>t.
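
As a rough illustration of the coercion rule just described (not the actual
TOPS-20 code, which is not shown in this thread), the following Python sketch
upper-cases every character unless it is preceded by CTRL/V.  The function
name tops20_coerce is hypothetical.

```python
CTRL_V = "\x16"  # ASCII 22, the quoting character mentioned above


def tops20_coerce(typed):
    """Upper-case a typed name, except characters quoted with CTRL/V."""
    out = []
    chars = iter(typed)
    for ch in chars:
        if ch == CTRL_V:
            # The quoted character is taken literally, case included.
            out.append(next(chars, ""))
        else:
            out.append(ch.upper())
    return "".join(out)


plain = tops20_coerce("foobar.txt")
quoted = tops20_coerce("F\x16o\x16oB\x16a\x16r.T\x16x\x16t")
```

Here plain comes out in upper case, while quoted keeps the mixed case because
each lower-case letter was individually quoted.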

For once, I don't favor the TOPS-20 way of doing things.  TOPS-20's scheme is
alright if you started with case independence to begin with, but I don't think
it would fit in well into Unix, and certainly not without a major flag day.  I
hope that my suggestion above could fit in with only minimal inconvenience.

I found on TOPS-20 that no serious user used case-sensitive filenames.  Everybody
appreciated the case-insensitivity of the interface, even though it took the form
of coercing to upper case.  My experience also suggests that case sensitivity is
a pain in the a**; I tried writing a major utility in Interlisp using mixed case
function and variable names and eventually gave up when most of my errors turned
out to be case errors.  It's *so* much easier to keep the shift lock key down...

-- Mark --
-------

Volume-Number: Volume 7, Number 12

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!decvax!ucbvax!ucbcad!nike!sri-spam!rutgers!husc6!ut-sally!std-unix
From: std-u...@ut-sally.UUCP (Guest Moderator, John B. Chambers)
Newsgroups: mod.std.unix
Subject: Re:  Case sensitive file names
Message-ID: <5875@ut-sally.UUCP>
Date: Fri, 3-Oct-86 13:56:07 EDT
Article-I.D.: ut-sally.5875
Posted: Fri Oct  3 13:56:07 1986
Date-Received: Sat, 4-Oct-86 07:22:45 EDT
Organization: IEEE 1003 Portable Operating System for Computer Environments Committee
Lines: 64
Approved: j...@sally.utexas.edu

From im4u!...@prophet.bbn.com Fri Oct  3 04:42:00 1986
Message-Id: <8610030928.AA14794@im4u.UTEXAS.EDU>
Date:     Thu, 2 Oct 86 12:43:49 EDT
From: Dan Franklin <im4u!...@prophet.bbn.com>
To: "Guest Moderator, John B. Chambers" <std-unix%ut-sally.U...@im4u.UTEXAS.EDU>
Subject:  Re:  Case sensitive file names

I can see that it will be hard to emulate POSIX filenames on top of an
operating system such as MS-DOS or VMS, but the benefits of changing the
POSIX spec must be weighed against the costs.  Suppose we changed the spec
so that it permitted a POSIX implementor to provide either a
case-sensitive or case-insensitive filesystem, their choice (which I think
is what Mark is proposing).  There are three groups of people who will be
affected: those who write POSIX emulators, those who write programs for
POSIX, and those who *use* POSIX and its programs.  The last group will be
the largest and most important by far; the emulator writers will be the
smallest group.

So how would users be affected?  It might benefit them, because
case-insensitivity might really be better than case-sensitivity.  However,
in the absence of a controlled study, let's assume the null hypothesis:
that it makes no big difference.  More than "proof by assertion" is needed!

Regardless of which is really better, some users will probably benefit
because they will be used to other operating systems providing
case-insensitivity, particularly MS-DOS.

However, if we really make it an implementor's choice, users will
be hurt by the fact that each POSIX system they encounter will be
different.  In fact, this system-to-system difference will probably
cause more problems than optional case insensitivity would solve.

What about people who write POSIX programs?  They will lose.  To the extent
that POSIX permits two possible underlying filesystems, a truly portable
POSIX program will have to be prepared for either one.  For many programs
it may not matter what the FS looks like, but if it does matter, it will
mean extra work.
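
To make the "extra work" concrete, here is a minimal Python sketch of the
kind of probe a portable program might need if either kind of filesystem
were permitted.  The function name is_case_sensitive is hypothetical, and
the probe assumes the program may create a scratch directory under the
directory being tested.

```python
import os
import tempfile


def is_case_sensitive(directory):
    """Probe whether names differing only in case are distinct here."""
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        lower = os.path.join(scratch, "probe")
        upper = os.path.join(scratch, "PROBE")
        # Create only the lower-case name ...
        with open(lower, "w"):
            pass
        # ... and see whether the upper-case name now resolves to it.
        return not os.path.exists(upper)


sensitive = is_case_sensitive(tempfile.gettempdir())
```

The answer can differ from one mounted filesystem to the next, so a careful
program would have to probe each directory it cares about rather than once
per machine.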

Finally, there are all those emulator writers.  They might find it easier;
then again, they might not.  If I were going to do an emulator on top of
MS-DOS, then (since I don't work for Microsoft) I would probably use the
existing filesystem just as a base to build the POSIX filesystem, almost
the way UNIX builds a named hierarchical filesystem space out of inodes.
Going to case insensitivity wouldn't help me a bit, because of the other
limitations Mark mentioned.  It might help Microsoft, because they could
change the 8+3 convention at the same time.  But unless they were willing
to do that, it wouldn't help them either.  VAX-VMS might be easier, but
again there are other problems I would have to solve.  Case-insensitivity
would help me some, but I'd still have a lot of work ahead of me.

But arguments regarding emulator-writing are beside the point.  No matter
what POSIX does on this, it will always be possible to write a POSIX
emulator on top of an existing operating system.  So the ease of *using*
the system must take precedence over the ease of writing it.

For the reasons above, I believe that making case-insensitivity an *option*
would be a bad idea.  Changing the spec to *insist* on case-insensitivity
might be a good idea, but it would cause enough problems w.r.t. existing
UNIX systems that it ought to be very strongly motivated.  To start with:
is it really much easier for people to use such a system?

	Dan Franklin

Volume-Number: Volume 7, Number 14

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!decvax!ucbvax!ucbcad!nike!think!husc6!ut-sally!std-unix
From: std-u...@ut-sally.UUCP (Guest Moderator, John B. Chambers)
Newsgroups: mod.std.unix
Subject: Re:  Case sensitive file names
Message-ID: <5913@ut-sally.UUCP>
Date: Sun, 5-Oct-86 18:25:25 EDT
Article-I.D.: ut-sally.5913
Posted: Sun Oct  5 18:25:25 1986
Date-Received: Mon, 6-Oct-86 05:41:41 EDT
Organization: IEEE 1003 Portable Operating System for Computer Environments Committee
Lines: 49
Approved: j...@sally.utexas.edu

Date: Fri, 3 Oct 86 23:56:26 edt
From: m...@cbosgd.att.com (Mark Horton)
Subject: Re:  Case sensitive file names

>Finally, there are all those emulator writers.  They might find it easier;
>then again, they might not.  If I were going to do an emulator on top of
>MS-DOS, then (since I don't work for Microsoft) I would probably use the
>existing filesystem just as a base to build the POSIX filesystem, almost
>the way UNIX builds a named hierarchical filesystem space out of inodes.
>Going to case insensitivity wouldn't help me a bit, because of the other
>limitations Mark mentioned.  It might help Microsoft, because they could
>change the 8+3 convention at the same time.  But unless they were willing
>to do that, it wouldn't help them either.  VAX-VMS might be easier, but
>again there are other problems I would have to solve.  Case-insensitivity
>would help me some, but I'd still have a lot of work ahead of me.

I'm not concerned very much about the amount of work the emulator
writer has to do, but I am concerned about the quality of the
resulting emulation.  If I'm a user of an emulator which is written
on an otherwise-reasonable case insensitive filesystem (VMS comes
to mind) which emulates case sensitivity, then apparent POSIX filenames
will bear little resemblance to real native filenames.  Either there's
an external table somewhere not unlike the UNIX directory/inode # tables,
or else file names are somehow encoded into longer native filenames.
I'm living with the latter kind of system now (Sun's PC/NFS, which makes
UNIX filesystems look like DOS filesystems) and the contortions it has
to go through to fit ordinary UNIX file names into DOS filenames are
a serious inconvenience.  The former kind of system makes it impossible
to access native files from inside the POSIX environment, unless someone
is awfully clever.
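
A minimal sketch of the second approach (encoding names into longer native
names) might look like the Python below.  The scheme is invented purely for
illustration and is not how PC/NFS or any real product does it: an escape
character, assumed here to be "^", marks each upper-case ASCII letter, so the
encoded name survives a filesystem that folds case.

```python
ESCAPE = "^"  # hypothetical escape character, assumed legal in native names


def encode_name(posix_name):
    """Encode a case-sensitive name as a longer, single-case native name."""
    out = []
    for ch in posix_name:
        if ch == ESCAPE:
            out.append(ESCAPE + ESCAPE)      # escape the escape itself
        elif ch.isascii() and ch.isupper():
            out.append(ESCAPE + ch.lower())  # mark an upper-case letter
        else:
            out.append(ch)
    return "".join(out)


def decode_name(native_name):
    """Recover the original name, whatever case the native system reports."""
    out = []
    chars = iter(native_name.lower())
    for ch in chars:
        if ch == ESCAPE:
            nxt = next(chars, "")
            out.append(nxt if nxt == ESCAPE else nxt.upper())
        else:
            out.append(ch)
    return "".join(out)


native = encode_name("Makefile")         # differs from encode_name("makefile")
restored = decode_name(native.upper())   # native system may report upper case
```

The cost is visible immediately: an all-capitals name such as README doubles
in length, and the native name no longer looks like the name the user typed.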

On the other hand, if case insensitive is an option for the emulator,
then two possibilities occur: (1) the vendor of the native operating
system can otherwise upgrade their filesystem to allow a clean POSIX
implementation (maybe they will arrange that their native OS conforms
directly to POSIX; wouldn't you consider it strongly if the market
starts to demand POSIX compatibility?) and (2) True UNIX systems have
the option to evolve to case insensitive, should a study be done and
the world conclude that insensitive is better.

I agree that a study should be done; I have my own intuitive feelings
on the subject, and there is quite a collection of operating systems
out there that went to extra work to be case insensitive; they can't
all be wrong, can they?  But by all means, this would make a great
human factors study for somebody.

	Mark

Volume-Number: Volume 7, Number 18

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!decvax!ucbvax!ucbcad!nike!think!husc6!ut-sally!std-unix
From: std-u...@ut-sally.UUCP (Guest Moderator, John B. Chambers)
Newsgroups: mod.std.unix
Subject: Case sensitive file names
Message-ID: <5914@ut-sally.UUCP>
Date: Sun, 5-Oct-86 18:26:23 EDT
Article-I.D.: ut-sally.5914
Posted: Sun Oct  5 18:26:23 1986
Date-Received: Mon, 6-Oct-86 05:42:02 EDT
Organization: IEEE 1003 Portable Operating System for Computer Environments Committee
Lines: 25
Approved: j...@sally.utexas.edu

Date: Sat, 4 Oct 86 04:19:12 CDT
From: dutoit!...@research.UUCP
Subject:  Case sensitive file names

The suggestion that POSIX be required (worse, permitted) to conflate
cases in file names is utterly loony.  We have enough portability
problems already in reconciling System V with 4.x without trying to
make Unix compatible with MS-DOS.

It is granted that Stu Feldman committed a rare lapse of taste in
accepting both `makefile' and `Makefile' (thus dooming everyone to
typing `cat ?akefile') and that Fowler apparently compounded the
distinction to the point of felony by encouraging both kinds of
?akefiles to exist and have different meanings.

Nevertheless, neither the possibility of silliness in choosing file
name conventions nor the dubious advantages of permitting Unix to be
embedded in other systems are relevant; what is important is that such
a subtle yet central change would be certain to make transport of
programs and of files more onerous.  This is not a wise thing for an
endeavor devoted to promoting portability.

	Dennis Ritchie

Volume-Number: Volume 7, Number 19

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!decvax!ucbvax!ucbcad!nike!think!husc6!ut-sally!std-unix
From: std-u...@ut-sally.UUCP (Guest Moderator, John B. Chambers)
Newsgroups: mod.std.unix
Subject: Re: Case sensitive file names
Message-ID: <5915@ut-sally.UUCP>
Date: Sun, 5-Oct-86 18:31:52 EDT
Article-I.D.: ut-sally.5915
Posted: Sun Oct  5 18:31:52 1986
Date-Received: Mon, 6-Oct-86 05:42:22 EDT
Organization: IEEE 1003 Portable Operating System for Computer Environments Committee
Lines: 51
Approved: j...@sally.utexas.edu

Date: Sat, 4 Oct 86 16:54:37 PDT
From: hoptoad!...@lll-crg.ARPA (John Gilmore)
Subject: Re: Case sensitive file names

> From: m...@cbosgd.att.com (Mark Horton)
> Another problem is that emulations on other operating systems,
> such as VMS or MS DOS, will become impossible without drastic
> changes to their file systems.

I think we should eliminate the hierarchical file system too (-:).
After all, VM/370 doesn't use it, nor does CP/M.  It would be too hard
to emulate.  (Thank Bog that MSDOS and the Mac added the feature, and
that Atari and Amiga started that way, or somebody might actually take
me seriously!)  We could consider getting rid of devices-as-files, though --
there's an idea that none of those people have picked up :-).

> After all, it's not like it was easiest to make the VMS filesystem
> case insensitive - that took extra effort on their part.

Their feeling it was worth the work for VMS doesn't make it right for Unix.

> I think it's a mistake to move in the direction of requiring other
> operating systems to become case sensitive.

Nobody is requiring anything of any other operating system.  We're
defining a *new* operating system here.

My impression was that the "new operating system" was supposed to look
very much like the set of features-in-common to the various Unix operating
systems.  If we are trying to standardize an environment that will
run under other operating systems, somebody better tell us quick.
I thought the "Portable Operating System" stuff was just a legalese hack
because we can't use the trademarked name "Unix".  Was I wrong?

>                                                        But I think
> it would be a mistake to require other systems to change if they
> are to support a POSIX emulation on top of them.  (On the other hand,
> it may be reasonable to expect other operating systems to support
> more general file name lengths and character sets, rather than things
> like the MS DOS 8+3 convention.  But in practice, this may be too
> painful to fix.)

Either they will implement POSIX compatibility or they won't.  If we
define POSIX systems to be case insensitive, MSDOS would not qualify
anyway, since you can't use an arbitrary 14-character file name.  VMS
would have problems with files whose names contained [, ], or colon,
etc.  So they will have to provide some form of file name translation,
and they should handle the case issue at the same time they handle the
length and allowable character set issues.

Volume-Number: Volume 7, Number 20

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!decvax!ucbvax!ucbcad!nike!sri-spam!rutgers!husc6!ut-sally!std-unix
From: std-u...@ut-sally.UUCP (Moderator, John Quarterman)
Newsgroups: mod.std.unix
Subject: Re:  Case sensitive file names
Message-ID: <5929@ut-sally.UUCP>
Date: Mon, 6-Oct-86 18:55:36 EDT
Article-I.D.: ut-sally.5929
Posted: Mon Oct  6 18:55:36 1986
Date-Received: Tue, 7-Oct-86 03:41:10 EDT
Organization: IEEE 1003 Portable Operating System for Computer Environments Committee
Lines: 21
Approved: j...@sally.utexas.edu

The discussion has been interesting and has brought up some topics,
such as what case insensitivity means in non-English languages, that
many of the readers were evidently unaware of.  However, it's getting
a bit out of hand.

IEEE P1003.1 is interested in promoting portability of applications
by defining a UNIX-like operating system interface.  Any major change
from a feature of *every* variant of UN*X, such as case-sensitive
file names (really, filenames as uninterpreted byte strings), needs
major justification before being considered.  So further assertions
of the form "I want it because I like it" are not of interest.  It
would be most interesting to see the results of a survey on user
reaction to case sensitivity or insensitivity, but this newsgroup
isn't the place to conduct such a survey, and it's not clear that
the results would be relevant to 1003.1 anyway (what does case
mean in Japanese or Finnish?).

So, unless you've got something new to say on this subject, please
let's go on to something else.

Volume-Number: Volume 7, Number 27

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!decvax!ucbvax!ucbcad!nike!rutgers!husc6!ut-sally!std-unix
From: std-u...@ut-sally.UUCP (Moderator, John Quarterman)
Newsgroups: mod.std.unix
Subject: Re: Case sensitive file names
Message-ID: <6018@ut-sally.UUCP>
Date: Thu, 16-Oct-86 11:47:50 EDT
Article-I.D.: ut-sally.6018
Posted: Thu Oct 16 11:47:50 1986
Date-Received: Thu, 16-Oct-86 21:59:52 EDT
References: <6002@ut-sally.UUCP> <5865@ut-sally.UUCP>
Organization: IEEE P1003 Portable Operating System for Computer Environments Committee
Lines: 101
Approved: j...@sally.utexas.edu

[ *sigh*  Below you will find two examples of proof by emotion,
one for case sensitivity, one for case insensitivity.  Now that
we have one on each side together like this, how about let's
either use facts and arguments or go on to another subject?

Below the second example there is a somewhat new point, marked
by another interjection from the moderator.  -mod ]

From: seismo!mcvax!gec-mi-at.co.uk!adam
Date: Thu, 16 Oct 86 09:29:20 -0100
Organization: Marconi Instruments Ltd., St. Albans, Herts, UK

>I would like to add a loud "Bravo!" to Mark Horton's message!  The present
>case sensitivity of the Unix filesystem is a real drag....

No NO nO NO nO No no! Case sensitivity is a bonus. If you can't handle it,
it's your problem. I've worked with both case-sensitive, -preserving and
-insensitive systems, and I prefer them in that order.

       -Adam.

From: pyramid!lll-crg!nike!ucbcad!ucbvax!excelan!donp (Don Provan)
Date: Wed, 15 Oct 86 09:58:48 pdt

This is a good example of why people coming from other operating
systems so often dislike UNIX.  Two people pointed out what is
clearly a bug in UNIX which particularly upsets them.  Many people
responded that it was a feature.  Hrumph!

[ Below is the new point.  -mod ]

If you're so concerned about correct handling of foreign languages,
why don't you start by handling English correctly?  In English,
"Make" and "make" are considered identical.  Capitalization rarely
has an effect on meaning.  Yet in UNIX, "Makefile" and "makefile" are
two different files with different "meanings".  Where are your *NEW*
users that are going to understand this sudden departure from a rule
of their native tongue?

[ The point is wrong.  Capitalization is significant in English:
internet and Internet do not have the same meaning, nor do john and
John (for readers outside the States, perhaps I should point out that
john with no capital refers to a toilet).  The distinction applies
not only to proper names but also in Emphasis and in syntax at the
beginning of sentences.  -mod ]

I am not sufficiently versed in foreign languages to understand the
issues concerning capitalization there.  It sounds like in some cases
the rules of what letters are equivalent (such as "A" and "a" in
English) might require tailoring.  If you're going to support foreign
languages in a meaningful way, i assume you're going to make lots of
other modifications, too.  For example, "Makefile" would need to have
a different name, right?  (I suppose the UNIX utilities themselves
already have names far enough removed from English so that they're no
problem.  What *does* "ls" stand for, anyway?)

[ As a moderately good reader of French and Spanish, I believe I can
state that the same sort of capitalization conventions exist in them as
in English, but with different details as to when capitalization is
appropriate.  The lexical details also differ:  the capital of ll (a single
letter in Spanish) is usually Ll, except when it's LL; in French, whether
an e with an acute accent still has an accent in its capital E form
depends on whether you're in France, Belgium, Quebec, Louisiana, etc.

I understand Greek is an interesting language:  there are several kinds
of lower case forms of some letters, to be used in different places in
a word (beginning, middle, end).  Similar distinctions exist in Arabic.

And, as several people have pointed out, case isn't meaningful in
Chinese, Korean, or Japanese kanji.  Also, the number of bytes used to
encode a character changes with the language, and multiple languages
should be supportable on the same system (in Japan, they commonly use
English, Japanese in romaji, and Japanese in kanji; in Scandinavian
countries I suspect they have a lot of English interspersed with the
national language in technical literature).

In most European countries, UNIX command names are used unchanged,
and Makefile does not in fact have a different name.  Would some
Europeans care to comment?
-mod ]
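
The lexical details mentioned above can be observed with the Unicode case
mappings built into Python.  The sketch below is only an illustration: it
uses the Greek sigma (two lower-case forms, one capital) and, as an extra
example not raised in this thread, the German sharp s, whose upper-case form
is two letters.  The helper name round_trips is hypothetical.

```python
def round_trips(text):
    """True if upper-casing and then lower-casing gives back the same text."""
    return text.upper().lower() == text


medial_sigma = "\u03c3"   # lower-case sigma used inside a word
final_sigma = "\u03c2"    # lower-case sigma used at the end of a word
sharp_s = "\u00df"        # German sharp s

# Both lower-case sigmas share a single capital letter.
same_capital = medial_sigma.upper() == final_sigma.upper()

# "make" survives the round trip; the final sigma and sharp s do not.
results = [round_trips("make"), round_trips(final_sigma), round_trips(sharp_s)]
```

A filesystem that "ignores case" therefore has to pick one particular folding
rule, and names that differ only in such letters may or may not collide
depending on that choice.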

Having done a lot of case insensitive work, i've always felt that the
UNIX case sensitivity was from laziness.  If i were to be charitable,
i might go so far as to call it a shortcut.

[ See Doug Gwyn's previous article for a good explanation of why file
names are case sensitive (or, rather, byte streams uninterpreted by the
kernel) in UNIX (see Barry Shein's article for a good explanation of why
some other systems are case insensitive).  In places where there was a
reason for case insensitivity (e.g., to match mail standards), it has
been done.  -mod ]

  But it's ridiculous to
say it makes more sense or it makes UNIX easier for new users or it
allows UNIX to support foreign languages.

[ "Ridiculous" is not an argument.  -mod ]

						don provan

Volume-Number: Volume 7, Number 62

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!decvax!decwrl!amdcad!lll-crg!seismo!ut-sally!std-unix
From: std-u...@ut-sally.UUCP
Newsgroups: mod.std.unix
Subject: Re: Case sensitive file names
Message-ID: <6029@ut-sally.UUCP>
Date: Fri, 17-Oct-86 12:35:48 EDT
Article-I.D.: ut-sally.6029
Posted: Fri Oct 17 12:35:48 1986
Date-Received: Fri, 17-Oct-86 21:20:24 EDT
References: <6002@ut-sally.UUCP> <5865@ut-sally.UUCP> <6018@ut-sally.UUCP>
Organization: IEEE P1003 Portable Operating System for Computer Environments Committee
Lines: 73
Approved: j...@sally.utexas.edu

From: cbosgd!cbosgd.ATT.COM!m...@ucbvax.berkeley.edu (Mark Horton)
Date: Fri, 17 Oct 86 11:20:32 edt
Organization: AT&T Medical Information Systems, Columbus

Don Provan raises some interesting questions about foreign languages.
In general, I think we know how to do a case insensitive comparison
appropriately, by extending a function (I think it's called strcoll,
but I don't have my X3J11 draft handy) defined in ANSI C; the function
is like strcpy, but the destination buffer gets a translation of the
string that will collate properly when a lexicographic comparison like
strcmp is used.  If we extend this function to also translate to one
case (as appropriate) and allow each country to define its own function,
it's technically possible to ignore case.  Whether it's fast enough for
the UNIX filesystem is unclear, although this problem is not restricted
to UNIX.
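
In the C standard as eventually published, the function matching this
description (copy a transformed string so that strcmp orders it correctly)
is strxfrm, while strcoll performs the comparison directly.  The Python
sketch below uses locale.strxfrm, the wrapper for that routine, to show the
proposed extension.  The names fold_key and same_name are hypothetical, the
case folding step is Python's own str.casefold rather than anything defined
by ANSI C, and the "C" locale is selected only to keep the result
predictable.

```python
import locale

# Use the plain "C" collation so the example behaves the same everywhere.
locale.setlocale(locale.LC_COLLATE, "C")


def fold_key(name):
    """Fold to one case, then transform so plain comparison collates."""
    return locale.strxfrm(name.casefold())


def same_name(a, b):
    """Case-insensitive equality built on the transformed keys."""
    return fold_key(a) == fold_key(b)


listing = sorted(["makefile.bak", "README", "Makefile"], key=fold_key)
```

With a per-country locale in place of "C", the same key function would also
give that country's collating order, which is the extension suggested above.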

I think it would be interesting to hear what other, case-insensitive
operating systems do about these issues.  What do MS DOS, or VM/CMS,
or VMS, or whatever, do with their case insensitive file names in
Europe, or Japan, or wherever?

If the answer is that file names are restricted to use the same character
set as in the USA, and that extra letters are disallowed, then we need to
know how well this is accepted by the users on other systems.  Maybe it's
good enough.  Do users in other countries often create files whose names
contain extra letters?  If they try, does the shell get in the way if their
letter happens to be "|", for example?

If the answer is that other operating systems have forced other countries
to put up with Americanisms, and that POSIX is an opportunity to break new
ground by handling other languages properly, then by all means let's do it
right.  This might require 8 bit characters in file names, for example.

Incidentally, I've seen it claimed here that UNIX allows arbitrary byte
streams in file names.  Perhaps this is the intent, but in reality the
UNIX filesystem is far from a transparent path.  There are lots of
restrictions, some of which are:

	The slash character is special.
	The null character is special.
	Sequences of more than 14 chars not containing a slash are
		either illegal or only significant to 14 chars or
		significant to 256 chars, depending on the version of UNIX.
	Characters with the 8th bit turned on are not allowed.
	Since many commands take names beginning with "-" as flags,
		file names beginning with "-" don't always work.
	Since the shell treats many of the punctuation characters
		specially, file names containing space, #, $, &, *, (, ),
		[, ], ;, ', ", \, |, <, >, and ? do not always work
		properly.  Even if you quote them, the shell strips
		off the quotes, so that if multiple layers of shell
		are involved (for example, uux) it still fails.

Because some of these problems only affect certain uses of the filesystem
(whether or not you go through the shell, whether or not you're going
through a command that takes arguments) it's not unusual for casual users
to create a file and then have trouble using, renaming, or even removing it.
I recall that removing a file whose 8th bit has been set is a frequent topic
on net.unix.
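
The distinction drawn above, between the filesystem itself and the commands
layered on it, can be seen by creating and removing awkward names without
going through a shell at all.  The Python sketch below is an illustration
only: the list of names and the helper create_and_remove are made up for the
example, and it calls the unlink operation directly via os.remove.

```python
import os
import tempfile

# Names that tend to confuse commands or the shell, but not the kernel.
AWKWARD_NAMES = ["-rf", "two words", "dollar$sign", "semi;colon", "amp&ersand"]


def create_and_remove(names):
    """Create each name in a scratch directory, then remove it again."""
    with tempfile.TemporaryDirectory() as scratch:
        for name in names:
            with open(os.path.join(scratch, name), "w"):
                pass
        created = sorted(os.listdir(scratch))
        for name in names:
            # No shell, no option parsing: the name is passed through as is.
            os.remove(os.path.join(scratch, name))
        leftover = os.listdir(scratch)
    return created, leftover


created, leftover = create_and_remove(AWKWARD_NAMES)
```

From the shell, the usual workaround for a leading "-" is to give a path
such as ./-rf, so that the name no longer looks like a flag.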
	
If the filesystem were really transparent, the designers of /proc would
not have had to encode process ID's in ASCII digits, they could have
directly used the binary representation.

It's for these reasons that I feel that a conservative UNIX user should
restrict themselves to certain "reasonable" filename conventions; basically
using only lower case letters, digits, and a few safe punctuation characters
such as . and - in their filenames.  Just because it's possible to put a
space in a file name doesn't make it a good idea.

	Mark

Volume-Number: Volume 7, Number 67

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!ut-sally!std-unix
From: std-u...@ut-sally.UUCP (Moderator, John Quarterman)
Newsgroups: mod.std.unix
Subject: Re: Case sensitive file names
Message-ID: <6036@ut-sally.UUCP>
Date: Fri, 17-Oct-86 19:09:37 EDT
Article-I.D.: ut-sally.6036
Posted: Fri Oct 17 19:09:37 1986
Date-Received: Sat, 18-Oct-86 00:30:44 EDT
References: <6002@ut-sally.UUCP> <5865@ut-sally.UUCP> <6018@ut-sally.UUCP> <6029@ut-sally.UUCP>
Organization: IEEE P1003 Portable Operating System for Computer Environments Committee
Lines: 29
Approved: j...@sally.utexas.edu

From: mordor!...@sally.utexas.edu (John Bruner)
Reply-To: j...@s1-c.arpa 
Date: Fri, 17 Oct 86 14:39:08 PDT
Organization: S-1 Project, LLNL

It seems to me that there are three alternatives.  POSIX can specify
that conforming implementations must be case sensitive, must be case
insensitive, or may be either case sensitive or case insensitive.

If a conforming system must be case insensitive, then UNIX doesn't
conform.  If UNIX is to be included in the set of POSIX-compatible
systems, then case sensitivity must be permitted.

If a conforming system may be case sensitive or case insensitive,
then a lot of programs won't be portable.  Ignore for the moment
all existing UNIX code and consider new program development.  I
believe that programmers on one kind of system won't bother
with the library routines that are used to compare and/or convert
mixed-case names to monocase.  It doesn't matter what people "ought"
to do.  A well-known example of this effect is 4.2BSD.  The source
code is full of variables that should be declared "long" but --
since on the VAX "long" and "int" are identical -- are not.  In the
same way, optional case sensitivity will spawn code that only runs
correctly in the environment where it was written.

Therefore, I believe that case sensitivity must be retained, and
it should not be made optional.

Volume-Number: Volume 7, Number 68

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!decvax!decwrl!pyramid!ut-sally!std-unix
From: std-u...@ut-sally.UUCP
Newsgroups: mod.std.unix
Subject: Re: Case sensitive file names
Message-ID: <6049@ut-sally.UUCP>
Date: Mon, 20-Oct-86 05:13:29 EDT
Article-I.D.: ut-sally.6049
Posted: Mon Oct 20 05:13:29 1986
Date-Received: Mon, 20-Oct-86 21:40:36 EDT
References: <6002@ut-sally.UUCP> <5865@ut-sally.UUCP> <6018@ut-sally.UUCP> 
<6029@ut-sally.UUCP> <6036@ut-sally.UUCP>
Organization: IEEE P1003 Portable Operating System for Computer Environments Committee
Lines: 65
Approved: j...@sally.utexas.edu

From: cbosgd!cbosgd.ATT.COM!m...@ucbvax.berkeley.edu (Mark Horton)
Date: Sun, 19 Oct 86 23:11:35 edt
Organization: AT&T Medical Information Systems, Columbus

>If a conforming system may be case sensitive or case insensitive,
>then a lot of programs won't be portable.  Ignore for the moment
>all existing UNIX code and consider new program development.  I
>believe that programmers on one kind of system won't bother
>with the library routines that are used to compare and/or convert
>mixed-case names to monocase.  It doesn't matter what people "ought"
>to do.  A well-known example of this effect is 4.2BSD.  The source
>code is full of variables that should be declared "long" but --
>since on the VAX "long" and "int" are identical -- are not.  In the
>same way, optional case sensitivity will spawn code that only runs
>correctly in the environment where it was written.
>
>Therefore, I believe that case sensitivity must be retained, and
>it should not be made optional.

I'm sorry, but I don't buy this argument.  It seems to be based on
the assumption that case insensitivity will be implemented by the
use of subroutines for case-insensitive operations, with a different
user interface from that available today.  I think such an implementation
is silly, even if other operating systems may do it that way.

I'm talking about file names only.  I do not advocate even considering
making all of the user interfaces in UNIX case insensitive.  While it
might have once been a good idea to design them that way, I feel it's
far too late for someone to decree that all the upper and lower case
keys in, say, vi must be equivalent.

I think it's a given that existing code won't be rewritten to use new
interfaces, even if we come up with a wonderful way to do it.  Vi still
uses raw terminfo, even though curses would have been much easier and
better.  Also, there are lots of binaries out there that can't even be
recompiled.  Any solution to this problem must be in the kernel, or possibly
in libc underneath such subroutines as open, unlink, and chmod, (if you
have shared libraries or full source to recompile) or it won't work all
the time.

The obvious implementation is for the code in the kernel, when mapping a
filename to an inode number, to do a case-insensitive comparison when
checking each filename element in a directory.  This would be pretty
simple to add, although issues such as speed and international variations
would probably require a clever case-insensitive comparison, possibly
using a country-specific case mapping table with some flags or other
hacks to deal with single-multiple glyph mappings like SS to ess-tset.
There might even be a performance GAIN if creation of a directory entry
included calculating an appropriate hash value, which is also stored
in the directory and used for initial comparisons.
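
A minimal sketch of the lookup described above, written in Python rather
than kernel C.  Everything named here is hypothetical: the Directory
class, fold(), name_hash(), and the choice of CRC-32 as the stored hash
are illustrative assumptions, not part of any real kernel.  Entries keep
the case they were created with; the hash of the folded name is stored
alongside so that most entries are rejected by an integer compare.

```python
import zlib


def fold(name):
    # Assumed case mapping. str.casefold() handles one-to-many mappings,
    # e.g. the German sharp s folds to "ss".
    return name.casefold()


def name_hash(name):
    # Hash of the folded name, so names differing only in case collide.
    return zlib.crc32(fold(name).encode("utf-8"))


class Directory:
    """Toy directory: a list of (hash, name as created, inode number)."""

    def __init__(self):
        self.entries = []

    def create(self, name, inode):
        if self.lookup(name) is not None:
            raise FileExistsError(name)
        # The entry keeps its mixed case; only the hash is case-blind.
        self.entries.append((name_hash(name), name, inode))

    def lookup(self, name):
        wanted_hash = name_hash(name)
        wanted = fold(name)
        for stored_hash, stored_name, inode in self.entries:
            if stored_hash != wanted_hash:
                continue  # cheap integer compare rejects most entries
            if fold(stored_name) == wanted:
                return stored_name, inode
        return None
```

With this sketch, creating "FooBar" and then looking up "foobar" returns
the entry under its original spelling, which is the over-the-phone case.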

I see no need to map everything to lower case when creating the directory
entry.  Let the entries be in mixed case; this allows more readable names.
I don't know what to do about sorting (e.g. in the shell or ls) - it might
be case sensitive or insensitive sorting, and good arguments can probably
be made for both.

The behavior I'm concerned about is that, if the user types, say, "mail"
and there's a command "Mail" in the search path, it should still work.
If the file "FooBar" exists and the user cats "foobar", because somebody
read that name over the phone, it should find it.

	Mark

Volume-Number: Volume 7, Number 72

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!decvax!ucbvax!ucbcad!nike!rutgers!seismo!ut-sally!std-unix
From: std-u...@ut-sally.UUCP (Moderator, John Quarterman)
Newsgroups: mod.std.unix
Subject: Re: case sensitive filenames
Message-ID: <6107@ut-sally.UUCP>
Date: Sun, 26-Oct-86 01:19:05 EST
Article-I.D.: ut-sally.6107
Posted: Sun Oct 26 01:19:05 1986
Date-Received: Sun, 26-Oct-86 07:17:13 EST
References: <5860@ut-sally.UUCP>
Organization: IEEE P1003 Portable Operating System for Computer Environments Committee
Lines: 77
Approved: j...@sally.utexas.edu

From: mcken...@sri-unix.arpa (Paul E. McKenney)
Date: Thu, 23 Oct 86 17:27:21 pdt
Organization: SRI, Menlo Park, CA.

Ok, how about a compromise proposal?

Keep roughly the same case-sensitivity in the kernel interface that exists
now.  This means that (for example) 'unlink("abc")' and 'unlink("ABC")' will
remove two different files.

Keep the normal shell interface for filenames.  This means that (again, for
example) 'rm abc' and 'rm ABC' will again remove two different files.

Make escape completion case insensitive.  (Escape completion is used in some
versions of BSD 4.x csh, perhaps elsewhere also.  It allows a user to
type the first part of a filename (or command name) and then hit
ESC.  The system will complete the filename as best it can.  If it cannot
unambiguously determine the filename from the part given by the user, it
will beep after having supplied as much of the filename as it can without
problems with ambiguity.  There is also usually a feature that allows the
user to display all filenames that match what he has typed so far --
control-D serves this function in some variants of BSD 4.2 csh.)

In other words, if a user types 'rm abc<ESC>' (where <ESC> represents the
ESC key), and there is a file named 'ABC', and there is no other file that
matches the pattern '[aA][bB][cC]', the shell (-not- the kernel) will
backspace over the 'abc' and overwrite it with 'ABC' so that the command
line will look as if the user had typed 'rm ABC'.  The user may then
hit RETURN if he wishes to execute the command, or he may further edit
the command line (using his usual backspace/delete, etc. characters).

This escape-mapping facility should be supplied in a library routine so that
application programs can easily act the same way.  It would be nice if such
a function could work with keywords, hostnames, etc. as well as filenames.
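
A rough sketch of what such a library routine might look like, in Python
for brevity.  The name complete() and its return convention are
assumptions made for illustration, and the sketch assumes ASCII names
(so that lower() never changes a name's length).  It returns the text
the command line should show and a flag saying whether the match was
unambiguous; a False flag corresponds to the beep.

```python
import os


def complete(prefix, names):
    """Case-insensitive completion of prefix against candidate names."""
    folded = prefix.lower()
    matches = [n for n in names if n.lower().startswith(folded)]
    if not matches:
        return prefix, False
    if len(matches) == 1:
        # Unambiguous: overwrite what was typed with the real spelling.
        return matches[0], True
    # Ambiguous: extend only as far as all candidates agree, ignoring case.
    common = os.path.commonprefix([m.lower() for m in matches])
    spellings = {m[:len(common)] for m in matches}
    if len(spellings) == 1:
        # Every candidate spells the shared part the same way; use it.
        return spellings.pop(), False
    # Candidates disagree on case; keep the user's text and extend it.
    return prefix + common[len(prefix):], False
```

For the example above, complete("abc", ["ABC", "notes"]) yields "ABC"
with the flag set.  If both "ABC" and "abc" exist, the text is left as
typed and the flag is clear.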

This proposal has the following advantages:

o	It does not impact existing software (addition of the case-insensitive
	ESC does not add any functionality, it just makes it easier on users).

o	It answers Mark Horton's 'filename-over-the-phone' problem
	<6...@ut-sally.UUCP> (just tell the user to type 'foobar<ESC>').

o	It allows users from a case-insensitive environment a helpful tool
	to ease their transition (let's face it -- if it is different than
	whatever you are used to, it ain't friendly -- regardless of whether
	you are used to case sensitivity, case insensitivity, or hieroglyphics).

o	Removes the need for millions and millions of 'upper()' calls in
	application code mentioned by Dan Libes <5...@ut-sally.UUCP>
	(although the additional code to do good escape-completion is far
	from trivial!).

o	Removes the need for 'isfsense()' or 'isflegal()' (Chris Lent,
	<5...@ut-sally.UUCP>) since all implementations could use the same
	definition of legal characters in a pathname.  Note that 'isflegal()'
	is still useful for programs that are trying to be portable across
	different operating systems.

This proposal leaves the following two issues unresolved:

o	Whether the eighth bit on characters within a filename should be
	significant.  The developers of BSD 4.[23] must have had some good
	reason for making it insignificant, but the only reason that comes
	to mind is that most terminals cannot easily specify the eighth bit
	(just like some older terminals cannot easily specify lower case!).

o	Whether there should be some escaping mechanism to allow slash ("/")
	and ASCII NUL in a filename.  I cannot think of a reason for allowing
	this that seems worth the trouble -- any comments?


			Paul E. McKenney
			mcken...@sri-unix.arpa
			{pyramid,rutgers,ucbvax!hplabs}!sri-unix!mckenney

Volume-Number: Volume 7, Number 89

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!rutgers!sri-spam!mordor!lll-crg!seismo!ut-sally!std-unix
From: std-u...@ut-sally.UUCP (Moderator, John Quarterman)
Newsgroups: mod.std.unix
Subject: Re: Case sensitive file names
Message-ID: <6226@ut-sally.UUCP>
Date: Tue, 4-Nov-86 12:36:22 EST
Article-I.D.: ut-sally.6226
Posted: Tue Nov  4 12:36:22 1986
Date-Received: Wed, 5-Nov-86 06:23:59 EST
Organization: IEEE P1003 Portable Operating System for Computer Environments Committee
Lines: 54
Approved: j...@sally.utexas.edu

From: ch...@mimsy.umd.edu (Chris Torek)
Date: Tue, 4 Nov 86 07:33:44 EST

We seem to have three proposals:

CS: Case sensitive file systems.  This is what all major Unix variants
    (V6, V7, SysIII, SysV, 2BSD, and 4BSD) now support.

CC: Case coercive file systems (file names forced to all upper or all
    lower case).

CR: Case retaining but otherwise insensitive file systems (new names
    are created according to the given case; matches are not case
    sensitive).
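
The difference between the three can be sketched with a toy name table.
This is an illustration only: the ToyFS class and its methods are
hypothetical, and plain lower() stands in for whatever case mapping a
real system would use.

```python
class ToyFS:
    """Toy name table implementing the CS, CC, or CR policy."""

    def __init__(self, policy):
        if policy not in ("CS", "CC", "CR"):
            raise ValueError(policy)
        self.policy = policy
        self.files = {}  # lookup key -> name as stored

    def _key(self, name):
        # CS matches names exactly; CC and CR match without regard to case.
        return name if self.policy == "CS" else name.lower()

    def create(self, name):
        # CC coerces the stored name; CS and CR keep it as given.
        stored = name.lower() if self.policy == "CC" else name
        # An existing entry wins: creating a matching name changes nothing.
        return self.files.setdefault(self._key(name), stored)

    def lookup(self, name):
        return self.files.get(self._key(name))

    def listing(self):
        return sorted(self.files.values())
```

Creating "ReadMe" and then "README" leaves two entries under CS, the
single entry "readme" under CC, and the single entry "ReadMe" under CR.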

I sincerely hope that no one is seriously suggesting POSIX adopt
CC: no one seems to like such systems much.  That leaves CS and
CR.  The case for CR appears to be that those who have used both
CS and CR prefer CR.  This may be true; I have seen no studies,
but the anecdotes do seem to favour it.  I have used such a system,
and did not think it so wonderful, but for the sake of argument,
let us assume that CR really is objectively better than CS---so
much so that 5BSD and System V Release N+1 will have CR style file
systems.  Fine.

But as I understand it, POSIX is intended to be an interface
specification for something that resembles `Unix' (whatever `Unix'
may be).  If that is indeed the case, the only sensible choice is
CS, for, as I noted above, this is what all major Unix variants
*do*.  *They all agree:* file names are case sensitive.  Should
we make standard something that no one uses?  I say no!  When
5BSD and Release N+1 come out, then we can create a new standard
to describe these wonderful new systems, but until then, let
us write something that describes what we have now.

I believe that the first standard for *anything* that already exists
should describe the existing implementations, at least wherever
they agree.  Afterward, feel free to invent new improved standards,
so as to foist progress upon vendors.  Indeed, it might not be a
bad idea to publish two standards virtually simultaneously: That
Which Is, and That Which Should Be.  But list first That Which Is.

[ There really are (or at least were) two discussions going on here:
one about what should be in POSIX, the other about what UNIX should do.
I haven't seen any recent arguments that POSIX should do anything but
reflect what UNIX currently does, i.e., case sensitive file names
(really file names as uninterpreted byte streams).  -mod ]

-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	ch...@mimsy.umd.edu

Volume-Number: Volume 8, Number 34

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!ut-sally!std-unix
From: std-u...@ut-sally.UUCP (Guest Moderator, John B. Chambers)
Newsgroups: mod.std.unix
Subject: Re: Case sensitive file names
Message-ID: <6412@ut-sally.UUCP>
Date: Fri, 21-Nov-86 16:11:54 EST
Article-I.D.: ut-sally.6412
Posted: Fri Nov 21 16:11:54 1986
Date-Received: Fri, 21-Nov-86 21:49:42 EST
Organization: IEEE P1003 Portable Operating System for Computer Environments Committee
Lines: 29
Approved: j...@sally.utexas.edu

References:


>From bu-cs!...@harvard.UUCP Wed Nov 19 07:19:28 1986
Date: Tue, 18 Nov 86 21:35:03 EST
From: bu-cs!bu-cs.BU.EDU!...@harvard.UUCP (Barry Shein)


The problem with a file system where you cannot have ReadMe and
README is that you are throwing away possibilities. This also
means that I cannot have tmp01234A, tmp01234B, ... , tmp01234a, ...

I fear that although many people have applications that are small and
have small requirements, they should not place restrictions on those
with large requirements.  Use your imagination: consider MasterCard's
data base for a moment, or some of the multi-library catalog systems
people are building.  They may need (and have machines that have no
trouble with) many thousands of files whose names may serve as primary
keys (why not, it's one way to guarantee write-through on update...)

Next they'll be telling us we should only allow 16-bit ints because
any number larger than 16 bits is hard to type in and error prone
anyhow.

I still suggest the use of 'stty lcase' if that's what you want
(alias run 'stty -lcase; \!* ; stty lcase' :-)

	-Barry Shein, Boston University



Volume-Number: Volume 8, Number 58