Path: utzoo!mnetor!uunet!husc6!mit-eddie!ll-xn!ames!sgi!jmb
From: j...@patton.SGI.COM (Jim Barton)
Newsgroups: comp.sys.sgi
Subject: TLB Synchronization Paper
Message-ID: <12193@sgi.SGI.COM>
Date: 4 Mar 88 20:16:07 GMT
Sender: dae...@sgi.SGI.COM
Organization: Silicon Graphics Inc, Mountain View, CA
Lines: 870
Keywords: multiprocessing

Here's the second of the papers given at USENIX.  This one is quite
technical, and a bit outdated by now (we've gone to local instead
of global TLB id allocation).  It will, however, give you the flavor
of a high-performance pure software solution to this problem.

-- Jim Barton
From the UNIX asylum at SiliconGraphics, Inc.
j...@sgi.sgi.com,  sgi!...@decwrl.dec.com, ...{decwrl,sun}!sgi!jmb

---------- cut here -----------------
.TL
Translation Lookaside Buffer Synchronization in a Multi\-Processor System
.AU
M. Y. Thompson
.AU
J. M. Barton
.AU
T. A. Jermoluk
.AU
J. C. Wagner
.AI
Silicon Graphics, Incorporated
.AB
.LP
Most current computer architectures use a high\-speed cache
to translate user virtual addresses into physical memory addresses.
On machines that require software to implement cache
fills and invalidations, the software task is fairly straightforward.
In a multi\-processor multi\-cache configuration, however,
where processes are allowed to migrate across processors,
there is an inherant synchronization problem,
as well as performance issues.
.LP
This paper discusses a solution to these issues that is general
enough to implement without specialized hardware,
yet offers good performance.
.AE
.NH 1
Introduction
.LP
Most current computer architectures use a high\-speed cache
to translate user virtual addresses into physical memory addresses (a
.I
translation lookaside buffer,
.R
or TLB).
When a translation entry does not exist for
a particular user virtual address,
some combination of software and hardware must be employed to create
that translation and supply it to the TLB.
When a current virtual\/physical translation changes or becomes invalid,
as happens when a physical page is ``stolen'' from
one process and assigned to another,
an extant TLB entry must be replaced or removed.
The methodology to perform these functions is well\-known on a traditional
single\-processor (SP) computer system.
.LP
It was found, however, that the methodology available was insufficient
when applied to a multi\-processor (MP) configuration where processes are
allowed to migrate across processors.
In particular, the methodology fails on a multi\-processor system
where each processor is coupled with a private TLB: replacing or
removing an entry in one TLB does not change or invalidate other,
possibly extant, entries on other system processors.
.LP
This paper discusses the overall strategy that was
devised to manage the TLB.
The various situations in which TLB entries must be replaced or
invalidated are enumerated, as are the details of both the
SP and MP implementation.
.NH 1
Translation Lookaside Buffer
.LP
The target hardware is a system using the MIPS R2000
simplified\-instruction\-set processor.
The TLB is part of the system coprocessor, one of which
is associated with each processor.
The TLB does not have a direct connection to memory, and it knows
neither the form nor the location for page tables.
TLB management is accomplished by software via coprocessor instructions.
This approach requires slightly longer refill times than might occur
with dedicated hardware, but has the advantages of simplified hardware
and flexibility.[MIPS86]
.LP
Each TLB entry consists of two words.
The low word contains a physical page frame
number and various hardware bits (valid, dirty, etc.).
The high word contains a virtual page number (VPN) and an id (TLBID).
The id field is currently six bits \-\- thus, 64 TLB ids are available.
Additionally, there exist two index registers which are used to
address TLB entries (an Index register and a Random register),
and an EntryLo and EntryHi register pair.
The formats of the EntryLo and EntryHi registers pairs are the same as
the TLB entries.
Figure 1 shows these formats.
A TLB match occurs when there exists an entry which matches the input virtual
address and the current TLBID field in the EntryHi register.
Misses cause an exception, as do references to invalid entries,
or stores to an address that matches a TLB entry that is not marked dirty.
.KS
.DS B
.PS
#
#	Set any necessary parameters
#
boxht = 0.25
#
#	Draw the boxes
#
B1: box wid 0.75 "\s-2VPN\s0"
"\s-220\s0" at B1.s below
"EntryHi" at B1.nw + (0.0, 0.2) ljust
#
B2: box wid 0.5 "\s-2TLBID\s0" with .sw at B1.se
"\s-26\s0" at B2.s below
#
B3: box wid 0.5 "\s-20\s0" with .sw at B2.se
"\s-26\s0" at B3.s below
#
move to B3.e
move right 0.5
B4: box wid 0.75 "\s-2PFN\s0"
"\s-220\s0" at B4.s below
"EntryLo" at B4.nw + (0.0, 0.2) ljust
B5: box wid 0.20 "\s-2N\s0" with .sw at B4.se
"\s-21\s0" at B5.s below
B6: box wid 0.20 "\s-2D\s0" with .sw at B5.se
"\s-21\s0" at B6.s below
B7: box wid 0.20 "\s-2V\s0" with .sw at B6.se
"\s-21\s0" at B7.s below
B8: box wid 0.20 "\s-2G\s0" with .sw at B7.se
"\s-21\s0" at B8.s below
B9: box wid 0.5 "\s-20\s0" with .sw at B8.se
"\s-28\s0" at B9.s below
.PE
.DE
.DS C
\fBFigure 1.\fP Translation Lookaside Buffer Format
.DE
.KE
Coprocessor instructions exist to probe for an extant entry (the index
of the entry is left in the Index register); to read a specific TLB
entry (EntryHi and EntryLo receive the contents of the TLB entry indexed
by the Index register); to write to a specific entry (EntryHi and EntryLo
via the Index register); and to write to a ``random'' TLB entry
(the pseudo\-random Random register is used as the index).
.KS
.DS B
.PS
#
#	Set any necessary parameters
#
boxht = 0.25
#
#	Draw the boxes
#
B4: box wid 0.75 "\s-2PFN\s0"
"\s-220\s0" at B4.s below
B5: box wid 0.20 "\s-2N\s0" with .sw at B4.se
"\s-21\s0" at B5.s below
B6: box wid 0.20 "\s-2D\s0" with .sw at B5.se
"\s-21\s0" at B6.s below
B7: box wid 0.20 "\s-2V\s0" with .sw at B6.se
"\s-21\s0" at B7.s below
B8: box wid 0.20 "\s-2G\s0" with .sw at B7.se
"\s-21\s0" at B8.s below
B9: box wid 0.20 with .sw at B8.se
"\s-21\s0" at B9.s below
B10: box wid 0.20 "\s-2SV\s0" with .sw at B9.se
"\s-21\s0" at B10.s below
B11: box wid 0.20 "\s-2CW\s0" with .sw at B10.se
"\s-21\s0" at B11.s below
B12: box wid 0.22 with .sw at B11.se
"\s-22\s0" at B12.s below
B13: box wid 0.25 "\s-2NR\s0" with .sw at B12.se
"\s-23\s0" at B13.s below
#
#	Draw the legend
#
line <-> from B4.nw + (0.0, 0.1) to B8.ne + (0.0, 0.1) "\s-2hard bits\s0" \
	above
line <-> from B8.ne + (0.0, 0.1) to B13.ne + (0.0, 0.1) "\s-2soft bits\s0" \
	above
.PE
.DE
.DS C
\fBFigure 2.\fPSoftware Page Table Format
.DE
.KE
The page table is the software counterpart of the TLB.
When a TLB entry is written, it is the software page table entry
that is copied into the TLB.
Bit fields not used by the TLB hardware are used for
(software) valid and copy\-on\-write flags,
and for a reference counter.
Figure 2 shows the page table entry format.
.NH 1
Operating System Support
.LP
There are five different situations in which, on our system,
(a port of the 5.3 UNIX Operating System)
.FS
"UNIX is a trademark of AT&T Bell Laboratories"
.FE
TLB entries can become inconsistent with process state.
They are:
.RS
.IP 1.
A process shrinking its address space.
.IP 2.
Physical pages being ``stolen'' from a process.
.IP 3.
System virtual address reallocation.
.IP 4.
System physical address reallocation.
.IP 5.
Writes to copy\-on\-write pages.
.RE
.LP
The first situation occurs when a process sets its maximum data value
to a lower value, when it releases a shared memory segment,
or when it releases all its address space on exit.
.LP
The second scenario occurs in low\-memory situations when the
memory management daemon takes physical pages from a process
to be available to others.
.LP
The system also keeps a map of virtual addresses
which are allocated for short durations
for purposes such as mapping user physical addresses into system
virtual space for DMA.
After each use, the virtual addresses are returned to the
system address map for reuse.
.LP
Similarly, physical pages are often assigned for varying durations
to steadfast system virtual addresses such as file system buffers.
Over time, pages may be assigned, usurped, and new pages assigned.
.LP
Lastly, when a process writes to a shared copy\-on\-write page,
a copy of the page is created and the new page is assigned
to the writing process.
.LP
In all of these situations,
there can exist entries in the various TLBs that are suddenly incorrect.
In all of these situations
it is necessary to ensure that the process doesn't access addresses
that it has surrendered.
To that end, there must be no entries in the TLB on the processor
on which a user process is running that map virtual/physical addresses
which are no longer correct.
Similarly, the kernel process must take pains not to access kernel
.\" virtual addresses which are no longer valid.\s-2\u 1\s0\d
virtual addresses which are no longer valid.\s-2\v'-0.4m'1\v'0.4m'\s+2
.FS
1. The implication is that, while user processes are not to be trusted,
the kernel can certainly understand its own memory management state
and take care not to abrogate its policies.
.FE
.LP
In our original SP port,
the TLB replacement and invalidation policies were situational.
That is, for each situation an expedient method was devised to
keep the TLB synchronized with the system state,
but there existed no overall strategy for TLB management.
On the MP system, it became clear that an over\-all policy was essential,
both to make the various mechanisms work efficiently (severally and together),
and to make the problem manageable.
.LP
While on an SP system it was often appropriate
to replace or remove TLB entries immediately as the entry became invalid,
on an MP system this strategy suffers from overenthusiasm.
It could well be the case that a process which has divested itself
of pages or has had new physical pages assigned to particular virtual
addresses never runs on other processors on the system,
or runs on another processor only after ``natural'' events have
caused the invalid entries to be replaced or removed.
We decided to accentuate this tendency and put off TLB invalidations
until absolutely necessary.
.LP
To implement this strategy (which we labeled ``lazy devaluation''),
system and process state is recorded to understand when
TLB entries on a particular processor must be replaced or invalidated.
When such an event does occur, the entire TLB is flushed,
and the state structures are adjusted so that, logically,
the flush creates the greatest effect.
.LP
The following sections explicate the various situations and mechanisms
involved.
.NH 2
Shrinking Processes
.LP
There are several scenarios in which a process might divest itself
of current address space.
These range from a process resetting its break value to
a process detaching a shared memory region or unmapping a mapped file region
to a process exiting.
.LP
The last case is benign \-\-
an exiting process no longer has the ability to reference its
address space.
.LP
The other cases are surmountable.
Since a TLB match requires that an entry match both the input virtual address
and the current TLBID (in EntryHi), assigning a new TLB id to the process
effectively renders current (possibly stale) entries inaccessible.
This approach is more efficient than the alternatives \-\-
flushing the entire TLB whenever a process shrinks its address space,
or probing for and invalidating each possible (now invalid) TLB entry.
.LP
It is only when there are no readily available TLB ids that drastic
action needs to be taken.
In that case, each process' TLB id is set to an invalid value
(the id is kept in the proc structure) and the TLB is flushed.
It is safe to invalidate the id field
of an active process since it is guaranteed that, on an SP system,
no other process besides the one requesting an id is currently running,
and thus, there is no process actively using TLB ids.
When a process resumes, it checks if its TLB id is still valid;
if not, it requests a valid id from the id allocator.
.LP
On an MP system, there is no such guarantee \-\- processes on
other processors may well be active.
The TLB id reallocation problem is easily solved, however,
by freeing only those ids whose associated process is not currently running.
A field in the proc structure indicates whether the process is
currently running on any processor.
With suitable spin locks and semaphores to protect bit fields and
TLB id allocation code, process shrinking becomes quite tenable
for the operating system.
.LP
TLBIDs are managed as a site\-wide resource, so, at the time
that ids must be recycled all TLBs on the site must be flushed.
To effect site\-wide flushing,
it is only necessary to set bits in a global bit field,
one bit for each active processor.
Whenever a processor flushes its TLB,
it clears its corresponding bit in the field.
The initiating routine merely sets all appropriate bits beforehand,
flushes its own TLB,
and waits until the entire field has been cleared.
(On systems that have an inter\-processor interrupt facility,
this wait is minimal.
On systems without hardware support, simple messaging can be used
to initiate TLB flushing on the various processors.)
.NH 2
Reclaiming Pages
.LP
The major functions of the paging daemon are to determine page usage
and to free pages into the page pool when memory gets tight.
As there are no hardware reference bits available,
page usage on our system is determined by periodically
decrementing software reference counters
and turning off the hardware valid bits in
the page table entries.
The paging daemon has only to invalidate the corresponding
entries in the TLB to cause subsequent references to produce reference faults.
The fault code resets the valid bit and the reference counter for the
faulting page, and drops the entry into the TLB.
.LP
Similarly, to reclaim a page for the free pool, the paging daemon clears the
software and hardware valid bits in the page table entries,
and inserts the pages into the free list.
Semaphores associated with each virtual memory region [Bach86]
are used to ensure that page faults and page manumission are,
effectively, atomic.
.LP
For both reference fault enabling and page manumission,
TLB entries are not invalidated individually.
Instead, a number of pages, possibly spanning several regions,
are operated on at once.
Before the region semaphores are released,
TLBs are flushed site\-wide.
.NH 2
System Virtual Addresses
.LP
In general, the operating system runs without TLB mappings.
The kernel is divided into three segments which carve out the
addresses from 0x80000000 through 0xffffffff (the user segment \-\-
kuseg \-\- includes all virtual addresses from zero through 0x7fffffff).
References to kseg0 (0x80000000 to 0xa0000000) are cached
but not mapped into the TLB.
Most of the kernel's executable code and some of its data reside here.
The kseg1 segment (0xa0000000 to 0xc0000000) provides uncached,
unmapped references \-\-
I\/O registers and ROM code are mapped to these addresses.
Both kseg0 and kseg1 addresses are direct\-mapped onto the first
512MB of physical address space.
Like kuseg, the kseg2 segment (0xc0000000 through 0xfffffff) uses
TLB entries to map virtual address to arbitrary physical ones.\*(AA
The operating system allocates kseg2 addresses for some dynamic
structures and for performing DMA into user space.
.KS
.DS B
.PS
#
#	Set any necessary parameters
#
userh = 2.0
kseg0h = userh/4
kseg1h = userh/4
kseg2h = userh/2
physmemh = userh/8
iomemh = userh - physmemh + kseg0h + kseg1h + kseg2h
boxwid = 0.75
#
#	Draw the address space and attach labels
#
B1: box ht userh "kuseg" "\s-2(TLB mapped)\s0"
"0x00000000 " at B1.sw rjust
B2: box ht kseg0h with .sw at B1.nw "kseg0" "\s-2(cached)\s0"
"0x80000000 " at B2.sw rjust
B3: box ht kseg1h with .sw at B2.nw "kseg1" "\s-2(uncached)\s0"
"0xa0000000 " at B3.sw rjust
B4: box ht kseg2h with .sw at B3.nw "kseg2" "\s-2(TLB mapped)\s0"
"0xc0000000 " at B4.sw rjust
"Virtual Memory Map" at B4.n + (0.0, 0.1)
#
#	Draw the physical address space and attach labels
#
B5: box ht physmemh "Memory" with .sw at B1.se + (1.5, 0.0)
"0x00000000 " at B5.sw rjust
B6: box ht iomemh with .sw at B5.nw "I/O Space" 
"0x20000000 " at B6.sw rjust
"Physical Memory Map" at B6.n + (0.0, 0.1)
#
#	Connect up the dots
#
line -> from B2.e to B5.w
line -> from B3.e to B5.w
.PE
.DE
.DS C
\fBFigure 3.\fP Hardware Defined Virtual Memory Map
.DE
.KE
For user DMA, the system allocates kernel virtual addresses
from a system address map
and double\-maps the user's pages into the system space.
The interrupt code which transfers data then does not need
knowledge of the user process for which the transfer occurs.
On an SP system, dropping in new TLB entries for the system
virtual pages when they are allocated is sufficient to ensure
that no stale TLB entries exist from the previous allocation.
(The dropin code probes for a current entry for
the TLBID/virtual\-address pair and replaces that entry if it exists.)
But on an MP system, dropping in new TLB entries on one processor does
not affect other processors' TLBs.
Again, instead of signalling each processor and having each processor
replace or invalidate entries, we take the lazy devaluation approach.
The various TLBs are allowed to fill with new entries ``naturally'',
that is, by reference.
Upon deallocation, however, the page is not returned to the free map,
but is instead placed in a stale address map.
If the system map becomes depleted,
the site\-wide TLB flush routine is called.
This routine always merges the stale address map back into the system map
while waiting for other processors to flush their TLBs.
.NH 2
System Kseg2 Mappings
.LP
A variation of the page reclaiming problem exists with
certain kernel routines that allocate and free physical pages associated with
kernel virtual addresses (for example, pages for file system buffers).
Unlike the memory management paging daemon, which frees large numbers
of pages at a time, pages are released in small numbers.
Because of this, wholesale TLB flushing is inappropriate.
Instead, we apply the precept of lazy devaluation.
We track page usage through state tables and postpone TLB flushing.
.LP
When a page is returned to the free list, the valid bits are reset,
but the page frame number persists in the system page tables.
It is only when the virtual address is surrendered that the
page table entry's page frame number is cleared, indicating that
there is no ``remembered'' association with a physical page.
.LP
When allocating a page for a system virtual address,
if there exists a page frame number in the page table entry,
the named page it is reassigned to the virtual address if the page is free.
If the page is not available,
the system\-wide TLB flush routine is called.
At this time, all invalid system page table entries
have their physical page frame number fields cleared,
indicating that there is no longer a residual relationship
between the virtual addresses and the physical pages.
.LP
As a performance enhancement, when a physical page that was mapped
to a system virtual address gets returned to the free memory list,
its corresponding system page table entry is linked on a dirty list.
The TLB flush routine traverses this list when clearing
the physical page frame numbers.
If a process is surrendering both the virtual and physical pages,
this linking is not necessary \-\- returning the virtual addresses
into a stale map ensures that the system won't use the address
without first flushing the TLB.
.LP
When a previously\-assigned page is reassigned to the same
virtual address, it must, of course, be dequeued from the
dirty list.
.LP
Overloaded fields in the parallel disk block descriptor (DBD) [Bach86]
are currently used for this chore.
(The descriptors are unused since the corresponding pages are never
swapped to disk, and, in the current implementation,
the DBDs are not separably allocatable.)
To facilitate dequeuing, a doubly\-linked list is used;
in order to fit into the DBDs,
the fields are actually offsets into the system page table.
.KS
.DS B
.PS
#
#	Set any necessary parameters
#
boxht = 0.25
boxwid = 1.0
tcenter = (boxwid + boxwid/2)/2
define pfn X
box $1 with .nw at last box.sw
X
define sv X
box wid boxwid/2 $1 with .nw at last box.sw
X
#
#	Draw the page frame array.
#
B1: box ht 0.0 wid 0.0 invisible
pfn("\s-2(unused)\s0")
T1: "\s-1Page Frame #\s0" at last box.n above
"System Page Table" at B1.n + (tcenter, 0.4)
pfn("\s-2(list head)\s0")
pfn("\s-2AAA\s0")
pfn("\s-2BBB\s0")
pfn("\s-20\s0")
pfn("\s-2DDD\s0")
pfn("\s-2EEE\s0")
pfn("\s-2FFF\s0")
move to B1.sw + (boxwid, 0.0)
B2: box ht 0.0 wid 0.0 invisible
sv("\s-2X\s0")
"SV" at last box.n above
sv("\s-2X\s0")
sv("\s-21\s0")
sv("\s-20\s0")
sv("\s-20\s0")
sv("\s-21\s0")
sv("\s-20\s0")
sv("\s-20\s0")
#
#	Now draw the disk block descriptors
#
boxwid = .5
move to B1.e + (2.0, 0.0)
B4: box wid 0.0 ht 0.0 invisible
pfn()
T2: "\s-1Forward\s0" at last box.n above
"System Disk Block Descriptors" at last box.ne + (0.0, 0.4)
pfn("\s-23\s0")
pfn()
pfn("\s-26\s0")
pfn()
pfn()
pfn("\s-27\s0")
pfn("\s-21\s0")
move to B4.se + (boxwid, 0.0)
B5: box wid 0.0 ht 0.0 invisible
pfn()
T3: "\s-1Back\s0" at last box.n above
pfn("\s-27\s0")
pfn()
pfn("\s-21\s0")
pfn()
pfn()
pfn("\s-23\s0")
pfn("\s-26\s0")
#
#	clean up.
#
.PE
.DE
.DS C
\fBFigure 4.\fP System Page Table with Stale Relationships
.DE
.KE
.LP
Figure 4 shows an example in which page table entries three, six and seven
have been chained into the ``stale relationships'' list.
Entry four is not chained \-\- the virtual address was released with
the physical page.
Figure 5 shows the same system page table entries after
a system\-TLB flush.
.KS
.DS B
.PS
#
#	Set any necessary parameters
#
boxht = 0.25
boxwid = 1.0
tcenter = (boxwid + boxwid/2)/2
define pfn X
box $1 with .nw at last box.sw
X
define sv X
box wid boxwid/2 $1 with .nw at last box.sw
X
#
#	Draw the page frame array.
#
B1: box ht 0.0 wid 0.0 invisible
pfn("\s-2(unused)\s0")
T1: "\s-1Page Frame #\s0" at last box.n above
"System Page Table" at last box.nw + (tcenter, 0.4)
pfn("\s-2(list head)\s0")
pfn("\s-2AAA\s0")
pfn("\s-20\s0")
pfn("\s-20\s0")
pfn("\s-2DDD\s0")
pfn("\s-20\s0")
pfn("\s-20\s0")
move to B1.sw + (boxwid, 0.0)
B2: box ht 0.0 wid 0.0 invisible
sv("\s-2X\s0")
"SV" at last box.n above
sv("\s-2X\s0")
sv("\s-21\s0")
sv("\s-20\s0")
sv("\s-20\s0")
sv("\s-21\s0")
sv("\s-20\s0")
sv("\s-20\s0")
#
#	Now draw the disk block descriptors
#
boxwid = .5
move to B1.e + (2.0, 0.0)
B4: box wid 0.0 ht 0.0 invisible
pfn()
T2: "\s-1Forward\s0" at last box.n above
"System Disk Block Descriptors" at last box.ne + (0.0, 0.4)
pfn("\s-21\s0")
pfn()
pfn("\s-20\s0")
pfn()
pfn()
pfn("\s-20\s0")
pfn("\s-20\s0")
move to B4.se + (boxwid, 0.0)
B5: box wid 0.0 ht 0.0 invisible
pfn()
T3: "\s-1Back\s0" at last box.n above
pfn("\s-21\s0")
pfn()
pfn("\s-20\s0")
pfn()
pfn()
pfn("\s-20\s0")
pfn("\s-20\s0")
#
#	clean up.
#
.PE
.DE
.DS C
\fBFigure 5.\fP System Page Table After TLB FLush"
.DE
.KE
.NH 2
Faults \-\- Misses, Reference, Protection
.LP
The strategy for handling TLB misses is fairly straightforward.
For first\-level misses, the page table entry is copied to EntryLo,
the VPN/TLBID pair is written into EntryHi, and the pair is randomly
deposited into the TLB.
Second\-level misses are handled in a similar manner, except that the
second\-level entry (the TLB entry for the page table itself) is
deposited into a specific TLB location,
that location determined by software.
On the current implementation, the processor constrains the Random
register to contain a value from eight to 63.
This allows entries zero through seven to be reserved for page tables,
the kernel stack, and the like.
.LP
Reference faults and protection faults are handled similarly \-\-
the page table entry for the faulting address is fetched (possibly
causing a second\-level miss), sanity checking is performed, and
the new entry is dropped in, either replacing an extant entry,
or, if none exists, dropped into a random TLB location.
When a valid reference is made to an address
to which a physical page is not currently
assigned, the fault code must assign a physical page for the
process and fill it appropriately.
.LP
None of these actions are a problem on an SP system, and,
for the most part, on an MP system.
Dropping an unchanged entry into a TLB is innocuous.
Dropping in an entry for a newly\-assigned page is trouble\-free, too \-\-
the assumption is made that the routine that disassociated the page
from its previous process\/address took care to purge the (possibly
extant) entries from the TLB(s).
.LP
Protection faults pose a problem on an MP system, however,
when the fault is on a copy\-on\-write page.
A copy\-on\-write page might be referenced by multiple processes at the
time one process writes to it.
The SP approach is simple:  if more than one process is currently
referencing the page, a new page is assigned for the writer,
the data are copied,
and a new TLB entry is deposited (with the dirty bit set).
But on an MP system, there could exist entries on other TLBs
(had the process previously run on other processors)
that reflect the previous virtual/physical mapping.
If the process migrated to another processor (on which
it had previously run) without ensuring that the entry was purged,
further references could access the wrong page.
.LP
Again, the approach is to keep state tables and avoid action until necessary.
Instead of actively searching out and removing entries on TLBs
throughout the system, it is just noted that there exist (possibly)
stale TLB entries for this process on other processors.
The minimal data structure is a bit field the size of the number
of processors in the system, one for each TLB id
(or, one for each proc structure, for small systems).
After assigning a new page, it is only necessary to set the bits
corresponding to all but the current processor,
indicating there might be invalid entries for this TLBID (process)
on the flagged processors.
(A new entry is deposited in the current processor's TLB,
so it is not necessary to set the dirty bit for the current processor.)
When a process resumes, it checks whether a bit is set
for the processor on which the process is now running.
If so, the current processor's TLB is flushed.
To further performance, a parallel bit field is kept that
indicates the processors on which the process has actually run.
It is only necessary to set dirty bits for processors
other than the current one on which the process has previously run.
.LP
When a process is assigned a new TLB id (either because the process
is just starting up, or because it shrank, or because its id
was taken away for reassignment),
dirty bits in the entry indexed by the TLB id are cleared,
as are the history bits
for all but the current processor.
(TLB ids are delivered ``clean'', that is, without any entries
in any TLB using that id.)
Similarly, whenever a processor flushes its TLB,
the dirty bits for that processor in all entries are cleared,
as are the history bits in all entries except for the currently
running process.
.KS
.DS B
.PS
#
#	Set up any necessary information
#
boxwid = .75
boxht = .25
#
#	Draw everything
#
box "00100000"
"1 " at last box.w rjust
"\s-1history\s0" at last box.n above
box "00000000" with .sw at last box.se
"\s-1dirty\s0" at last box.n above
box "01010001" with .ne at last box.sw
"2 " at last box.w rjust
box "00000000" with .sw at last box.se
"\fBBefore Write\fP" at last box.e + (0.25, 0.0) ljust
box "00000101" with .ne at last box.sw
"3 " at last box.w rjust
"\s-27\s0" at last box.sw below
"\s-20  \s0" at last box.se below
box "00000001" with .sw at last box.se
"\s-2  7\s0" at last box.sw below
"\s-20\s0" at last box.se below
#
#	Move down and draw it again.
#
move to last box.sw - (boxwid, 0.75)
box "00100000"
"1 " at last box.w rjust
"\s-1history\s0" at last box.n above
box "00000000" with .sw at last box.se
"\s-1dirty\s0" at last box.n above
box "01010001" with .ne at last box.sw
"2 " at last box.w rjust
box "01000001" with .sw at last box.se
"\fBAfter Write\fP" at last box.e + (0.25, 0.0) ljust
box "00000101" with .ne at last box.sw
"3 " at last box.w rjust
"\s-27\s0" at last box.sw below
"\s-20  \s0" at last box.se below
box "00000001" with .sw at last box.se
"\s-2  7\s0" at last box.sw below
"\s-20\s0" at last box.se below
#
#	Move down and draw it a third time
#
move to last box.sw - (boxwid, 0.75)
box "00100000"
"1 " at last box.w rjust
"\s-1history\s0" at last box.n above
box "00000000" with .sw at last box.se
"\s-1dirty\s0" at last box.n above
box "01010000" with .ne at last box.sw
"2 " at last box.w rjust
box "01000000" with .sw at last box.se
"\fBAfter TLB Flush on Processor 0\fP" at last box.e + (0.25, 0.0) ljust
box "00000101" with .ne at last box.sw
"3 " at last box.w rjust
"\s-27\s0" at last box.sw below
"\s-20  \s0" at last box.se below
box "00000000" with .sw at last box.se
"\s-2  7\s0" at last box.sw below
"\s-20\s0" at last box.se below 
.PE
.DE
.DS C
\fBFigure 6.\fP Example TLBID State Structures
.DE
.KE
Figure 6 shows three snapshots of an abbreviated TLBPID state structure.
The first and second are just before a process owning TLBPID 2 running
on processor 4 writes to a shared copy\-on\-write page.
At the time of the write, the process has a history of having run
on processors 0, 4 and 6.
The last snapshot is the same TLBPID state structure just after processor 0
has flushed its TLB.
Note that the persistence of the history bit 0 for TLBPID 3 implies
that it is currently running on processor 0.
.NH 1
Summary
.LP
There are certainly other approaches that might have been
followed to solve the problem of keeping multiple TLBs correct.
Preliminary performance figures, however,
indicate that the lazy devaluation approach succeeds without
causing excessive TLB flushing.
.LP
Most importantly,
the various state structures can be enhanced and routines tuned
to take advantage of added information without changing the
underlying mechanisms.
.LP
For example, the TLBPID state structures could be extended to
list the individual stale entries.
Instead of flushing the entire TLB, those entries (if still extant)
could be individually flushed.
If TLB id information were to be made available to the memory
management daemon (currently, there is no path from a region
structure, upon which the daemon operates, to process and TLB ids),
a similar refinement could be made.
It is not clear if these changes would be of benefit,
but the changes could be implemented and tested without
restructuring the entire TLB management system.
.bp
.SH
References
.XP
.I
MIPS System Programmer Guide, Beta Version.
.R
Mips Computer Systems, Inc., Mountain View, CA, 1986.
.XP
Bach, Maurice J.,
.I
The Design of the UNIX Operating System,
.R
Prentice Hall, New Jersey, 1986.
---------- cut here -----------------