Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!rutgers!ames!sdcsvax!darrell
From: mar...@hplabs.HP.COM (Martin McKendry)
Newsgroups: comp.os.research
Subject: Transaction support in Unix
Message-ID: <3357@sdcsvax.UCSD.EDU>
Date: Tue, 23-Jun-87 17:31:02 EDT
Article-I.D.: sdcsvax.3357
Posted: Tue Jun 23 17:31:02 1987
Date-Received: Thu, 25-Jun-87 04:33:26 EDT
Sender: darr...@sdcsvax.UCSD.EDU
Lines: 37
Approved: mod...@sdcsvax.uucp


I'm interested in Unix support for transactions.  Our product
has the usual split: there is a standard Unix filesystem,
but the database people use raw device I/O.  I would like to
unify the two systems, so that one engineering/maintenance
effort covers both.  It seems to me that you would need to worry
about such things as granularity and mappings.  For example, you
might use mapped files, but you would somehow need to say that
'any pages touched by this activity belong to transaction T', then
later do "fsync T", or "msync T", or whatever.  Maybe you'd do some
physical locking too (multiple processes, but only a single 
machine beating on each file).
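
To make the "msync T" idea concrete, here is a minimal sketch in Python.
Everything in it is an assumption for illustration: the TxnMap class, its
write() and msync_txn() methods, and demo_txn() are hypothetical names, not
an existing Unix interface.  It only records which pages of a mapped file
each transaction touched and flushes those pages on request.  It gives no
atomicity, isolation, or recovery, and the kernel remains free to write
back any dirty page at any time.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


class TxnMap:
    """Hypothetical wrapper: remembers which mapped pages each transaction touched."""

    def __init__(self, path, npages):
        self.fd = os.open(path, os.O_RDWR)
        self.mem = mmap.mmap(self.fd, npages * PAGE)
        self.touched = {}  # transaction id -> set of page numbers

    def write(self, txn, offset, data):
        """Store data in the mapping and tag the affected pages with txn."""
        if not data:
            return
        self.mem[offset:offset + len(data)] = data
        first = offset // PAGE
        last = (offset + len(data) - 1) // PAGE
        self.touched.setdefault(txn, set()).update(range(first, last + 1))

    def msync_txn(self, txn):
        """The 'msync T' step: flush only the pages tagged with txn."""
        pages = sorted(self.touched.pop(txn, ()))
        for p in pages:
            self.mem.flush(p * PAGE, PAGE)
        return pages

    def close(self):
        self.mem.close()
        os.close(self.fd)


def demo_txn():
    # Create a four-page scratch file to map.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\0" * (4 * PAGE))
        path = f.name
    m = TxnMap(path, 4)
    m.write("T", 10, b"hello")             # page 0, transaction T
    m.write("T", 2 * PAGE + 5, b"world")   # page 2, transaction T
    m.write("U", 3 * PAGE, b"other")       # page 3, transaction U
    flushed = m.msync_txn("T")             # flushes pages 0 and 2 only
    m.close()
    with open(path, "rb") as f:
        data = f.read()
    os.unlink(path)
    return flushed, data[10:15]
```

Calling demo_txn() returns ([0, 2], b"hello"): only the pages tagged with T
were explicitly flushed.  Tracking at page granularity is the simplest
choice here; going below the page level, as asked for in this post, would
need byte-range bookkeeping on top of it.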

Has anyone had any successful experience modifying a file system
this way?  What is the current state of the art, or where can
I read about it?  Note that I am not interested in transaction
support in which a file is the smallest unit of locking/recovery
granularity.  I want to get to the page level, and even below
that if possible.  The system has to be complete enough that
a database implementor will use it voluntarily (to dream the
impossible dream...)

An interesting quote:

    "If you are looking for a hard problem, here is one: define
an interface to page management which is useable by data management
in lieu of buffer management."
	-- Jim Gray, in Notes on Database Operating Systems, p.412,
in "Operating Systems," Springer Verlag 1979.

    That was written in 1977.  I figure that by now someone should
have worked it out.


	Martin S. McKendry
	FileNet Corp
	{hplabs,trwrb}!felix!martin

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!husc6!cmcl2!beta!hc!ames!sdcsvax!darrell
From: Avadis.Tevanian%wb1.cs.cmu.edu@sdcsvax (Avie)
Newsgroups: comp.os.research
Subject: Re: Transaction support in Unix
Message-ID: <3379@sdcsvax.UCSD.EDU>
Date: Fri, 26-Jun-87 22:25:00 EDT
Article-I.D.: sdcsvax.3379
Posted: Fri Jun 26 22:25:00 1987
Date-Received: Sun, 28-Jun-87 01:35:16 EDT
Sender: darr...@sdcsvax.UCSD.EDU
Lines: 37
Approved: mod...@sdcsvax.uucp

| Has anyone had any successful experience modifying a file system
| this way?  What is the current state of the art, or where can
| I read about it?  Note that I am not interested in transaction
| support in which a file is the smallest unit of locking/recovery
| granularity.  I want to get to the page level, and even below
| that if possible.  The system has to be complete enough that
| a database implementor will use it voluntarily (to dream the
| impossible dream...)
|
| An interesting quote:
|
|    "If you are looking for a hard problem, here is one: define
| an interface to page management which is useable by data management
| in lieu of buffer management."
| 	-- Jim Gray, in Notes on Database Operating Systems, p.412,
| in "Operating Systems," Springer Verlag 1979.
|
|     That was written in 1977.  I figure that by now someone should
| have worked it out.
|
|	Martin S. McKendry

In Mach we've worked it out.  Mach supports a notion of "external paging."
An "external pager" is a server that implements a set of routines used for
page management - routines such as pagein, pageout, etc.  In addition there
are several routines which the pager may use to cause the kernel to perform
certain cache (VM cache) functions, such as flush, lock, etc.  It's all
fairly complicated, but seems to be general enough to implement network
shared memory (fully coherent) outside of the kernel.  In addition, another
project here at CMU (Camelot) plans to use external pagers to implement
recoverable memory - they are writing a server that manages the recoverable
memory transparently to client tasks.  Look for the paper "The Duality
of Memory and Communication in the Implementation of a Multiprocessor
Operating System" by Mike Young, myself, Rick Rashid and others in the
upcoming SOSP for a reasonable description of the external pager mechanism.
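
The division of labor described above can be sketched as a toy model in
Python.  This is not Mach's actual interface: the DictPager and VMCache
classes and all of their method names are assumptions made up for
illustration.  The pager supplies pagein and pageout; the cache, standing
in for the kernel's VM cache, calls them on faults and offers flush and
lock operations that a pager could invoke.

```python
class DictPager:
    """Toy 'external pager': the backing store is a dict of page number -> bytes."""

    def __init__(self, page_size=16):
        self.page_size = page_size
        self.store = {}
        self.log = []  # records the requests the cache makes

    def pagein(self, n):
        self.log.append(("pagein", n))
        return bytearray(self.store.get(n, bytes(self.page_size)))

    def pageout(self, n, data):
        self.log.append(("pageout", n))
        self.store[n] = bytes(data)


class VMCache:
    """Toy stand-in for the kernel's VM cache (hypothetical, not Mach code)."""

    def __init__(self, pager):
        self.pager = pager
        self.pages = {}      # resident pages: page number -> bytearray
        self.dirty = set()
        self.locked = set()

    def _fault(self, n):
        # A miss is resolved by asking the pager for the page.
        if n not in self.pages:
            self.pages[n] = self.pager.pagein(n)
        return self.pages[n]

    def read(self, n):
        return bytes(self._fault(n))

    def write(self, n, offset, data):
        if n in self.locked:
            raise PermissionError(f"page {n} is locked by the pager")
        page = self._fault(n)
        if offset + len(data) > len(page):
            raise ValueError("write crosses the page boundary")
        page[offset:offset + len(data)] = data
        self.dirty.add(n)

    # Operations a pager could request from the cache.
    def lock(self, n):
        self.locked.add(n)

    def unlock(self, n):
        self.locked.discard(n)

    def flush(self, n):
        # Write the page back if dirty, then drop it from the cache.
        if n in self.dirty:
            self.pager.pageout(n, self.pages[n])
            self.dirty.discard(n)
        self.pages.pop(n, None)


def demo_pager():
    pager = DictPager()
    cache = VMCache(pager)
    cache.write(0, 0, b"abc")   # faults page 0 in, then dirties it
    cache.flush(0)              # pager receives the pageout
    cache.lock(0)
    try:
        cache.write(0, 0, b"x")
        blocked = False
    except PermissionError:
        blocked = True
    return pager.store[0][:3], pager.log, blocked
```

In this toy, a recoverable-memory server would be one more pager whose
pageout writes a log record before updating its store, and which uses
lock and flush to control when modified pages leave the cache.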

	Avie