MPEG-4 promoted to Committee Draft
Fribourg, October 31, 1997 - During its 41st meeting in Fribourg, Switzerland, MPEG took a major step towards finalizing the first stage of the new MPEG-4 Multimedia Standard by promoting the ‘Audio’, ‘Video’, ‘Systems’, ‘DMIF’ and ‘Reference Software’ parts of the Standard to Committee Draft (CD). This means that the MPEG-4 standard under development has reached a very stable status, and that ‘National Standardization Bodies’ will now be asked to comment and vote on it. The subsequent stages towards reaching the status of International Standard in February ’99 are Final Committee Draft (July ’98) and Draft International Standard (December ’98).
The meeting in Fribourg, held at the kind invitation of the Schweizerische Normen-Vereinigung (SNV) and hosted by the College of Engineering of Fribourg, Switzerland, was the 41st the Moving Picture Experts Group has held since its establishment nearly 10 years ago. Alongside the hectic activity on MPEG-4 Version 1, delegates were present to work on the development of the MPEG-7 Standard. The MPEG-7 group has produced new Requirements and Applications documents, and has started work on a Call for Proposals, which will be issued in Fall ’98. Also, some preliminary work was done on MPEG-4 Version 2. Version 2 will not replace, but rather extend, what is in MPEG-4 Version 1 by adding new profiles. Version 2 will follow Version 1 with all phases shifted one year in time.
The MPEG-4 standard as it is currently defined in the ‘CDs’ makes it possible to integrate natural and synthetic audio, ‘classic’ rectangular video and moving video ‘objects’ with an arbitrary shape, animated faces and animated 2D meshes with several kinds of textures. Scalability is built into all the tools. The most bandwidth-hungry element, moving video, is currently optimized for operation at bitrates from as low as 5 kbit/s to as high as 5 Mbit/s. Interlaced as well as progressive content are supported. Audio covers the range from the extremely low bitrates (mainly for speech and synthetic audio) to transparent quality, multichannel audio. An exceptional speech quality was demonstrated at a mere 2 kbit/s. The Systems layer allows complex ‘scenes’ to be created, from the classical rectangular video with sound to virtual environments. The synchronised, real-time play-out of the different objects is taken care of by the MPEG-4 Systems layer, which also supports user interaction with the individual objects. A number of profiles have been defined for the Audio, Visual and Systems parts. These profiles define tool subsets that cater for a large class of applications.
MPEG is making available freely usable software, donated by companies participating in MPEG, for all relevant parts of the standards (Audio, Visual, Systems, DMIF), to any party wishing to use it for the development of MPEG-4 compliant products.
Consensus on how to identify the existence of Intellectual Property Rights on audiovisual content in MPEG-4 was reached between technical experts and representatives from several content industries, resulting in an optional ‘Intellectual Property Identification’ dataset that can be attached to audiovisual objects. In future work, MPEG will also address the persistence of this identification information, and the protection of the content itself.
MPEG was very pleased with the presence of representatives of VRML and of SC24 (the ISO/IEC committee that standardizes VRML). A cooperation framework has been defined, that combines the strengths of both the VRML and MPEG approaches. To MPEG, the result of this cooperation is to have the ‘BIFS’ specification harmonized with VRML’s textual description (BIFS stands for BInary Format for Scenes).
The remainder of this document gives more detailed information, organized according to the different subgroups.
Profiles have been defined for Audio, Video and Facial Animation. Profiles are subsets of the complete tool set that can be used by many applications. (The best-known profile for MPEG-2 is called ‘Main’, usually deployed at ‘Main Level’.) For Audio, three hierarchical so-called ‘Composition Profiles’ have been defined: Main, Simple and Speech. Both the Simple and Main profiles contain tools for decoding of speech, natural and synthetic audio, Main being more complex and containing more tools.
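The hierarchical relationship between these Audio profiles can be pictured with a small sketch. The tool names below are illustrative placeholders, not the normative MPEG-4 tool identifiers:

```python
# Hypothetical sketch of hierarchical profiles: each profile's tool set
# contains the one below it, so a decoder that implements a higher
# profile can always decode content made for a lower one.
SPEECH = frozenset({"speech_coding"})
SIMPLE = SPEECH | frozenset({"natural_audio", "synthetic_audio"})
MAIN = SIMPLE | frozenset({"additional_tools"})

def can_decode(decoder_tools, content_tools):
    """A decoder handles content if it implements every tool the content needs."""
    return content_tools <= decoder_tools
```

With such a hierarchy, a Main-profile decoder plays Simple and Speech content, while a Speech-only decoder cannot play Main content.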
Five Systems profiles were identified: 2D, 3D, VRML, Audio and Complete. The 2D and 3D profiles speak for themselves, but note that 3D is not a superset of 2D. The VRML profile maximizes the interoperation with the VRML specification; Audio is meant for applications that only need audio, and Complete includes all the so-called ‘BIFS-nodes’ (see below).
The most important visual ‘object profiles’ are called Simple and Core. Both are able to decode ‘arbitrary shape’ video objects. Core defines a more complex and more powerful tool set than Simple. Simple addresses low complexity applications at lower bitrates, while Core is meant for more demanding applications. Also two profiles for Facial Animation exist: Simple and Advanced.
For DMIF (see below) three profiles were defined: 1) broadcast, 2) broadcast and storage and 3) broadcast, storage and interaction.
MPEG-4 Version 1 Systems contains the basic set of tools to reconstruct a synchronous, interactive and streamed audiovisual scene: timing and buffer model, scene description (BIFS), scene-to-stream association through the Object Descriptor, synchronization of streams through the ‘Access Unit Layer’ and efficient multiplexing of streams (FlexMux). In addition to these basic tools, Systems provides for the coding of information about the objects (OCI, ‘object content information’) and ‘back channel’ signalling functionalities.
Systems Version 2 will complete this set of tools. It will include: Advanced BIFS (new BIFS nodes, currently undergoing experiments), Adaptive Audio-visual Session (interfaces for interoperation of MPEG-4 media with Java), Content Return Channel (interaction with the sending side, within the MPEG-4 framework) and the MPEG-4 Intermedia Format. According to the MPEG-4 Requirements Document, this format shall support exchange/distribution of MPEG-4 content on storage media. It must also allow access to and ‘publishing’ of (parts of) the content in a flexible way. A Call for Proposals for this format was issued at the Fribourg meeting, and contributions will be evaluated in San Jose, February ’98. The collaborative work will begin after the Tokyo meeting, in March ’98.
The draft MPEG-4 Audio Standard covers the range of audio applications, from low bitrate – down to 2 kbit/s per channel – for communications applications, through medium bitrate – in the order of 16 to 24 kbit/s per channel – for Internet radio and related applications, up to full broadcast quality applications – approximately 64 kbit/s per channel. The Audio standard supports natural speech and audio, synthetic or structured audio (e.g. music synthesis) and provides a communications or transport interface for Text to Speech (TTS) applications.
Software for both the encoder and decoder for MPEG-4 Audio has been donated by the companies who developed it, with copyright release for MPEG-4 development purposes.
Whenever a draft standard is in the course of preparation, one of the essential requirements of the process is to prove the interchangeability of software code and compressed bitstream files between development sites. During the inter-meeting gap since July ’97, nearly 900 bitstreams and 74 software code packages were exchanged between the members of the group, to be run on different computer platforms at different sites.
During the Fribourg meeting, several implementations of MPEG-4 technology were demonstrated. Giving a full list here is impossible, but key amongst them were the following. AT&T and the Fraunhofer Gesellschaft, Institut für Integrierte Schaltungen, demonstrated good quality AAC (Advanced Audio Coding) compression down to 16 kbit/s mono and stereo, and showed high quality stereo coding at 64 kbit/s. The principle of fine-step scalable quantization was demonstrated by Samsung, who showed the effects – or lack thereof! – as the decoder was swept through the bitrate range of 128 kbit/s to 48 kbit/s for stereo and back again. Text to Speech (TTS) Interface demonstrations were given by ETRI, Korea, showing TTS packages for English, Japanese and Korean speech working through the standardized MPEG-4 TTS Interface. (Note that MPEG only specifies this interface, and not the TTS system itself.)
Note: Contrary to what could be concluded from what was stated in the previous MPEG press release, 3-D audio (or ‘spatialized sound’) is planned for Version 1 of MPEG-4. It has already been demonstrated to work in test implementations of MPEG-4 audio composition. Version 2 of MPEG-4 will add ‘environmental auralization’ to the 3D audio capability. Environmental auralization allows a model of the environment (e.g. a furnished room) to be taken into account in the audio rendering. MPEG regrets any confusion caused by this part of the previous press release.
The Video Group added tools for interlaced coding to the tool set. Until the previous meeting, the focus had been mainly on progressive material, but as the deadline was nearing, several companies realized that they wanted MPEG-4 to be useful for interlaced material as well, and stepped up the effort on the exchange of ‘interlaced bitstreams’. As the results were of very good quality, the Video Group decided to accept these tools in its Committee Draft.
Synthetic Natural Hybrid Coding (SNHC) Group
Tools will be included in the Visual CD for composition of natural and synthetic, 2D and 3D scenes that can integrate downloaded and streaming media. A dynamic 2D mesh tool provides animation of 2D objects for special effects in manipulating texture and video. A face animation tool provides the ability to animate 3D synthetic talking faces at extremely low bitrates. Image coding based on wavelets provides tools for high-quality compression of texture with SNR Scalability and Resolution Scalability. View-dependent scalable texture provides incremental decoding of 3D images (e.g. a landscape) in response to a user’s viewpoint in a terminal. These tools can manipulate or augment objects specified in an MPEG-4 audio/visual scene.
The animated 2D mesh tool provides efficient compression of 2D geometry and motion. If combined with Visual coding tools, it allows video object manipulation. SNHC tools such as 2D mesh and face animation are harmonized for use with Systems scene description tools.
Technologies such as body animation and 3D model coding for MPEG-4 version 2 have been reviewed, and work will continue.
Facial Animation has two ‘object profiles’: ‘Simple’ and ‘Advanced’. Whereas compliance with the ‘Advanced’ profile implies using all the syntactic elements in the bitstream, ‘Simple’ allows discarding of the more complex instructions, while only mandating use of ‘facial animation parameters’ (FAPs). A local model is necessary when only FAPs are used.
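The difference between the two profiles can be sketched as a filter on the incoming bitstream elements. The element names here are illustrative, not the normative syntax:

```python
def elements_to_process(bitstream_elements, profile):
    """Hypothetical sketch: an 'Advanced' decoder must act on every
    syntactic element, while a 'Simple' decoder may discard everything
    except the facial animation parameters (FAPs), which it applies to
    its own local face model."""
    if profile == "Advanced":
        return list(bitstream_elements)
    # Simple profile: keep only the FAP elements, drop the rest.
    return [e for e in bitstream_elements if e[0] == "FAP"]
```

For example, a stream carrying both FAPs and model-update instructions would be processed in full by an Advanced decoder, while a Simple decoder would animate its local model from the FAPs alone.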
DMIF (Delivery Multimedia Integration Framework) Group
At the Fribourg meeting, the DMIF group completed the Committee Draft of the Digital Storage Media - Command and Control (DSM-CC) conformance testing part of the MPEG-2 standard. That part allows vendors of MPEG-2 DSM-CC implementations to indicate their conformance to the DSM-CC specification and to carry out appropriate tests to verify this.
In addition, the DMIF group has completed the Committee Draft document for the DMIF part of MPEG-4 Version 1. DMIF presents a consistent interface, the DMIF-Application Interface (DAI), to applications. The DAI allows transparent access to MPEG-4 content on a remote interactive end-system, on local file storage or on broadcast media, using the native signaling proper to the access technology used. DMIF Version 1 will undergo verification until July 1998, when a Final Committee Draft DMIF Version 1 document will be available.
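The role of the DAI can be sketched as a single access function that hides the delivery technology behind it. The function name and address schemes below are invented for illustration only:

```python
def dai_attach(address):
    """Hypothetical sketch of transparent access through the DAI: the
    application asks for content by address, and the DMIF layer selects
    the native signaling for remote, local-file or broadcast delivery."""
    if address.startswith("remote://"):
        return "remote interactive end-system"
    if address.startswith("file://"):
        return "local file storage"
    if address.startswith("broadcast://"):
        return "broadcast media"
    raise ValueError("unknown delivery technology")
```

The application code stays the same whichever delivery technology is used; only the address changes.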
Implementation Study Group
The Implementation Study Group identifies ways to reduce the implementation complexity of the standard without adversely affecting its functionality and quality. Additionally, it provides guidance on the setting of conformance points, which was the focus of attention during this meeting. A conformance point essentially specifies how complex the bitstreams are that a decoder should be able to decode without problems. Implementers of decoders should build their machines to these conformance points in order to comply with the standard.
In this way the content developer can be confident that the conformant MPEG-4 content will be displayed in the manner originally intended on a conformant terminal. To provide this guidance, ISG has identified the most computationally expensive tools. Over the coming months, it will characterize metrics for these tools and formalize conformance test procedures.
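A conformance check of this kind can be sketched as comparing a bitstream's declared complexity against the limits of a conformance point. The limit values and field names below are illustrative only, not normative figures:

```python
# Hypothetical conformance point: the maximum complexity a compliant
# decoder must be able to handle (illustrative limits, not normative).
CONFORMANCE_POINT = {"max_bitrate_kbps": 384, "max_objects": 4}

def conforms(bitstream_params, point):
    """A bitstream fits a conformance point if it stays within every limit."""
    return all(bitstream_params[k] <= limit for k, limit in point.items())
```

Content authored within the limits is then guaranteed to play on any decoder built to that conformance point.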
The work on ‘Computational Graceful Degradation’ (CGD) has come to fruition with the specification of CGD syntax for video compression applications. This syntax provides advance notice of the processing load for each video frame. The decoder is therefore in a position to request extra resources from the operating system in order to meet a peak in processing and/or memory demand.
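The idea can be sketched as a decode loop that inspects the advance load estimate before committing to each frame. All names and the resource model below are invented for illustration:

```python
def decode_sequence(frames, available_cycles):
    """Hypothetical CGD sketch: each frame carries an advance estimate
    of its processing load, so the decoder can react *before* the peak
    arrives -- request more resources, or degrade gracefully."""
    results = []
    for frame in frames:
        if frame["estimated_cycles"] > available_cycles:
            # Not enough headroom: degrade gracefully,
            # e.g. skip an enhancement layer for this frame.
            results.append((frame["id"], "degraded"))
        else:
            results.append((frame["id"], "full"))
    return results
```

Without the advance notice, the decoder would only discover the overload mid-frame, when it is too late to react cleanly.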
Future MPEG meetings will be held in San Jose, US (2-6 February ’98), Tokyo, JP (March ’98), Dublin, IE (July ’98), Israel (December ’98) and Korea (March ’99). An MPEG-7 seminar with invited speakers will be organized during the meeting in San Jose on Tuesday 4 February. This seminar is open to non-MPEG delegates as well.
For further information about MPEG, please contact:
Dr. Leonardo Chiariglione (Convenor of MPEG)
Via G. Reiss Romoli, 274
10148 Torino, ITALY
Tel.: +39 11 228 6120; Fax: +39 11 228 6299
This press release and a wealth of other MPEG-related information can be found on the MPEG homepage:
The MPEG homepage has links to other MPEG pages that are maintained by some of the subgroups. It also contains links to public documents that are freely available for download to non-MPEG members.
Journalists that wish to receive MPEG Press Releases automatically can contact:
tel. +31 70 332 5310; fax +31 70 332 5567