ULTRACOMPUTER PROTOTYPE NOTE #2

Jim Lipkis
24 Nov. 1981

The Motorola MC68451 Memory Management Unit:
Prospects for Use in the Ultracomputer Prototype

[The following proposal has evolved from conversations with Ralph Grishman, Allan Gottlieb, and Kevin McAuliffe, and is based on a suggestion by Ralph.]

I. A general description of the new MMU chip, based on the preliminary documentation available from Motorola ("Product Preview").

The MC68451 provides a single-level mapping from 24-bit virtual (logical) address to 24-bit real (physical) address. It supports variable-length segments, which may be separately relocated in real memory. Each task in the system may have an unlimited number of segments, each of which may vary independently in size from 256 bytes to 16 megabytes, in powers of 2. (The MMU actually translates only the high-order 16 bits of the address; the seven low-order address lines and the upper/lower-byte-select lines from the MC68000 are connected directly to the memory (or bus) and bypass the MMU. Each MMU chip contains 32 "segment descriptors", and any number of chips can be connected in parallel to a single processor.)

Each segment descriptor designates the one or more address spaces of which the particular segment is a piece. During a memory reference an associative search is done for all segments designating the "current" address space. The virtual address ranges of those segments are then examined to determine whether a successful translation can be performed.

An individual task in the system is considered to consist of four address spaces (user mode and supervisor mode; program and data for each). The MMU contains four 8-bit registers identifying the four address spaces for the "current" task. At context switch, only these registers need to be reloaded by the software; segment descriptors need not change. The function code lines from the CPU chip (FC0-FC2) are used to select automatically among the four registers at each memory reference.
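The translation and context-switch behavior described above can be illustrated with a small software model. This is a sketch only and is not taken from the Motorola documentation: the names SegmentDescriptor, MMU, spaces, and context_switch are hypothetical, membership in address spaces is modeled as a plain set rather than whatever encoding the chip uses, and the requirement that bases be aligned to the segment size is an assumption that follows from the power-of-2 sizing.

```python
from dataclasses import dataclass

ADDRESS_MASK = 0xFFFFFF   # 24-bit logical and physical addresses
MIN_SEGMENT = 256         # the low 8 address bits bypass the MMU


@dataclass(frozen=True)
class SegmentDescriptor:
    """Hypothetical model of one descriptor (field names are assumptions)."""
    logical_base: int        # start of the segment in the virtual address space
    physical_base: int       # start of the segment in real memory
    size: int                # power of two, 256 bytes to 16 megabytes
    spaces: frozenset        # address-space numbers that include this segment
    write_protected: bool = False

    def __post_init__(self):
        too_small = self.size < MIN_SEGMENT
        too_large = self.size > ADDRESS_MASK + 1
        not_power_of_two = self.size & (self.size - 1) != 0
        if too_small or too_large or not_power_of_two:
            raise ValueError("size must be a power of two from 256 bytes to 16 MB")
        # Assumed: bases are aligned to the segment size.
        if self.logical_base % self.size or self.physical_base % self.size:
            raise ValueError("bases must be aligned to the segment size")


class MMU:
    def __init__(self, descriptors):
        self.descriptors = list(descriptors)
        self.space_regs = {}   # (mode, kind) -> 8-bit address-space number

    def context_switch(self, user_program, user_data,
                       supervisor_program, supervisor_data):
        """Reload only the four address-space registers; descriptors stay put."""
        self.space_regs = {
            ("user", "program"): user_program,
            ("user", "data"): user_data,
            ("supervisor", "program"): supervisor_program,
            ("supervisor", "data"): supervisor_data,
        }

    def translate(self, mode, kind, logical, write=False):
        """Map a logical address to a physical one for the current task."""
        space = self.space_regs[(mode, kind)]   # selected by the function code
        for d in self.descriptors:
            if space not in d.spaces:
                continue                        # segment not in the current space
            offset_mask = d.size - 1
            if (logical & ADDRESS_MASK & ~offset_mask) != d.logical_base:
                continue                        # address outside this segment
            if write and d.write_protected:
                raise PermissionError("write to a write-protected segment")
            return d.physical_base | (logical & offset_mask)
        raise LookupError("no segment descriptor maps this address")


# Two tasks use the same logical addresses but different physical segments.
mmu = MMU([
    SegmentDescriptor(0x000000, 0x040000, 0x1000, frozenset({1})),
    SegmentDescriptor(0x000000, 0x080000, 0x1000, frozenset({2})),
])
mmu.context_switch(1, 1, 0, 0)
print(hex(mmu.translate("user", "data", 0x000123)))   # 0x40123
mmu.context_switch(2, 2, 0, 0)
print(hex(mmu.translate("user", "data", 0x000123)))   # 0x80123
```

In this model the switch between the two tasks touches only the four register values, which is the property that makes the context switch fast.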
The fast context switch is, in general, an attractive feature of this MMU. Furthermore, since segments can be allocated very flexibly, the number of segments required will be much lower than with, for example, the WICAT MMU, in which segments are fixed at 4K.

Each segment may be shared, either globally or between individual address spaces (on one processor). (The sharing mechanism is somewhat awkward, however, and would be difficult to use in a system with more than eight address spaces defined.)

Write protection can be set on any segment. Each segment descriptor contains reference and change flags, which would be useful for a demand-paging system. However, a two-level address translation mechanism (segment map and page map) would be required to support demand paging within software-allocated segments.

II. Proposal for ultracomputer

WICAT UNIX, (presumably) the operating system for the prototype ultracomputer (PUC), is currently understood to be a virtual but non-paging system. It requires memory mapping to support the FORK primitive, in which two sets of physical memory areas are accessed as a single virtual address space by each of the two resulting processes.

The further requirement of the PUC is a special status for memory segments that are shared read-write between PEs. These must be recognizable by the PNI during memory reference so that caching restrictions, etc., can be enforced. It seems natural for the shared read-write (SRW) memory references to be detected in the MMU, since in the flat address architecture of the MC68000 the processor does not recognize segments.

SRW segments may be identified to the operating system through a new specification on the "Allocate" system call, which would be used by the loader and by compiler-generated code. The system can pass this information on to the MMU when the MMU segment descriptors are being loaded (which will occur whenever a task is initiated on a new PE).
This might be done by using the high-order bit of the base physical address in each segment descriptor as an SRW flag. The PNI (which is assumed to follow the MMU) would then receive the SRW notification with each physical memory reference. If an SRW segment is to be temporarily accessed read-only by some task, such that full caching is desired, either the segment descriptor could be rewritten with the high-order bit off, or a new segment descriptor could be added and the old one temporarily disabled.

Since the high-order bit would then be unavailable for addressing, the maximum attachable physical memory would be 8 megabytes. By the time the 64-PE prototype is being designed the MMU will almost certainly support 32-bit addresses, in which case this scheme would allow 2 gigabytes of physical memory.

A further advantage of this scheme is that the minimal Level-0 PNI becomes extremely simple, consisting only of the Motorola MMU chip and a bus interface with the high-order address line tied to zero. All further hardware developments (cache, network) are then transparent to the software. (The code for Fetch-and-add will change when the full PNI is available, but this is not a critical-path dependency.)

The corresponding drawback is that more software work will be required to bring up a Level-0 system. Conceptually, this involves changing the WICAT UNIX memory-management component to deal with a small number of segments rather than a large number of pages. (If the current scheme were mapped directly, using fixed 4K segments, 16 MMU chips would be required on each processor.) Since the complexity of the UNIX mechanisms is unknown, it is impossible to estimate the difficulty of this change.

Nonetheless, there appear to be several advantages in using the Motorola MMU. It is (or soon will be) commercially available, hopefully debugged, and supports a flexible segmentation scheme. Furthermore, updating MMU tables can be the most time-consuming part of a context switch.
The fast inter-user context switch of this MMU may make individual-PE multiprogramming feasible at an earlier stage than would otherwise be the case.
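
The SRW flag convention proposed in section II, and the memory limits it implies, can be checked with a short sketch. This is an illustration only: the function names are hypothetical, and the note does not specify how the PNI would actually decode the flag, so decode_reference simply assumes that the PNI splits the MMU output into the flag bit and the remaining address bits.

```python
def srw_flag(address_bits=24):
    """Bit mask for the high-order physical address bit used as the SRW flag."""
    return 1 << (address_bits - 1)


def encode_physical_base(base, shared_read_write, address_bits=24):
    """Value the system would load into a descriptor's physical base field."""
    flag = srw_flag(address_bits)
    if not 0 <= base < flag:
        raise ValueError("base must fit in the bits left over for addressing")
    return base | flag if shared_read_write else base


def decode_reference(physical, address_bits=24):
    """Assumed PNI view of an MMU output: (is_srw, real memory address)."""
    flag = srw_flag(address_bits)
    return bool(physical & flag), physical & (flag - 1)


def max_physical_memory(address_bits):
    """Bytes addressable once the high-order bit is reserved for the flag."""
    return 1 << (address_bits - 1)


print(max_physical_memory(24) // 2**20, "megabytes with 24-bit addresses")
print(max_physical_memory(32) // 2**30, "gigabytes with 32-bit addresses")
print(decode_reference(encode_physical_base(0x040000, True)))
```

Temporarily treating an SRW segment as read-only with full caching corresponds, in this sketch, to re-encoding the same base with shared_read_write set to False.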