ASF: A universal container file format for synchronized media

January 1998

INTRODUCTION

ASF -- THE SPARK FOR THE MULTIMEDIA REVOLUTION
WHAT IS ASF?
BENEFITS
ASF IN A STREAMING MEDIA FRAMEWORK
 
DEVELOPMENT OF ASF
HISTORY
STANDARDS ADOPTION OF ASF
MICROSOFT SUPPORT FOR ASF
 
TECHNICAL OVERVIEW OF ASF
TECHNICAL GOALS OF ASF
ASF FILE STRUCTURE
ASF OBJECTS
HEADER OBJECT
DATA OBJECT
INDEX OBJECT
 

Introduction: ASF -- The spark for the multimedia revolution

Multimedia streaming -- the remote or local delivery of synchronized media data like video, audio, text, and animation -- is a critical link in the digital multimedia revolution. Today, synchronized media is primarily about video and audio, but a richer, broader digital media era is emerging with profound and growing impact on the Internet and digital broadcasting.

Advanced Streaming Format (ASF) fills a critical need in this digital multimedia revolution. ASF provides an open, file-level interoperability solution for synchronized media across operating systems, transmission protocols, and composition frameworks.

ASF is in essence an open, extensible container format rather than a standard for the contents of the container. ASF is "agnostic" as to any particular codec (like MPEG), communication protocol (like HTTP, RTP, or multicast IP), or media composition framework (like MPEG-4 or Dynamic HTML). But ASF is intended to support all of these, and more.

ASF has been developed by a community of companies inspired by a vision of metadata-level interchange of multimedia data as a way to overcome the current balkanization of multimedia formats.

ASF's file interoperability solution is gaining wide industry support among media tools, servers, clients and standards organizations. This is because of a growing recognition that ASF's interoperability and wide support will enable vendors, empower authors, excite consumers, and help to build the enabling infrastructure for the digital media era.


Introduction: What Is ASF?

ASF is a universal container file format for synchronized media. 

ASF is a storage container. Media streams in an ASF file are read by a media server and transmitted over a data communications transport protocol to a local client for rendering or local storage. The local client could also play an ASF file from its local storage.

The core definitional concepts are "universal container file format" and "synchronized media":

"Universal container file format." ASF is a presentation format, as opposed to an edit format. This means that it is designed for efficient playback of multiple media streams by media servers and clients. ASF files are editable, but ASF is not intended to be a replacement for high-end video editing formats, cut lists, or media authoring systems.

"Synchronized media." Synchronized media means multiple media objects that share a common timeline. An elementary example is video and audio -- each is a separate stream with its own data structure that must be played back concurrently. But virtually any media type can have a timeline. For example, an image object can change like an animated .gif file: text can change and move, and animation and digital effects happen over time. This concept of synchronizing multiple media types is gaining greater meaning and currency with the emergence of more sophisticated media composition frameworks implied by MPEG-4, Dynamic HTML, and other media playback environments.


Introduction: Benefits

ASF provides essential benefits across media industry segments.


Introduction: ASF in a Streaming Media Framework

To understand ASF and its benefits, it is important to understand its relationship to other aspects of streaming multimedia. Specifically, it is important to understand that ASF is not a codec, a data communications (or "wireline") protocol, or a media composition framework.

ASF and codecs. ASF is codec-independent -- it does not replace MPEG or any other media compression-decompression format. Data for any particular media type can be contained within ASF files and optionally synchronized with other media. An ASF file's header can contain component download information to help a client locate and download code to decompress or render a particular media type.

ASF and data communications protocols. ASF is data communications "agnostic." ASF data units may be carried by any conceivable underlying data communications transport. ASF is similarly agnostic about how the data is packetized by network or transmission protocols (for example, whether the multimedia data is sent in an interleaved or non-interleaved fashion).

When media data from an ASF file is transmitted over a particular protocol like RTP, the individual headers of ASF data units and streams may be examined and used to convert the media stream into a native packetized format of the network protocol. This eliminates a double-wrapper problem of duplicate data being transmitted both in protocol packet headers and ASF data unit headers.

Moreover, ASF is not a network control protocol like RTSP that tells a server when to start or stop playing a media file. However, ASF files contain information that should prove useful to control protocols.

ASF and media composition frameworks. ASF does not describe where different media streams should appear on the screen. At first glance, this might seem to be a shortcoming of ASF -- if the receiving client does not know where to display the various streams and objects on the screen, of what use is it to support multiple media streams in the first place?

Actually, this independence from a media composition or layout system is one of ASF's strengths and is essential for efficient media stream storage and transmission. Layout information -- for example, MPEG-4, streaming Dynamic HTML, and so on -- is "out-of-band," and must be determined by the receiving client. This could be handled in several ways -- for example, one media stream could contain the composition data that describes the intended presentation layout for the other streams in the file, or the presentation layout could be sent ahead of the media streams.

So why not embed layout information with each stream? This would dramatically complicate transmission of individual streams, make editing of the streams more difficult, and conflict with the authoring process. If ASF attempted to define a composition framework, it would require media types for MPEG-4, DHTML, SMIL, or any other composition information to be translated into and out of this ASF-specific framework. This is a fundamental problem with other formats that intertwine media composition and codecs with stream storage. A composition-agnostic system such as ASF can be remarkably efficient for storing any media, regardless of whether it has composition associated with it or not.


Development of ASF: History

File formats such as WAVE, AVI and QuickTime were originally designed only for local random access playback of synchronized media, in a time before the explosion of the Internet. These formats were not designed to be streamed: if served over a network, they are downloaded in their entirety and played locally.

But today, digital media file formats must handle efficient remote and local random access playback. Vendors of streaming multimedia products have thus found it necessary to design proprietary file formats for storing multimedia data on servers. Microsoft has ASFv1, Real Networks has RMFF, VDONet has VDO, Vivo has VIV, VXtreme had VXI, and there are others.

The design roots of ASF stem from the RIFF format (for example, AVI and WAVE), which IBM and Microsoft defined over a decade ago. Subsequently, several different companies defined new streaming file formats (for example, the ASFv1, VIV, RMFF, and VXI file formats) to correct limitations within RIFF. Independently, AVI2 was defined to correct local recording problems with AVI, and QuickTime was also developed. The creators of WAVE, AVI, ASFv1, VIV, RMFF, and AVI2 then teamed together to define ASF.

The subsequent version was enhanced by an additional 40 companies. The most recent version was constructed as a result of a public design review process involving over 100 companies (and universities).

Thus, ASF draws on perhaps tens of thousands of man-hours of experience with multimedia file formats. The result is a highly flexible and expandable multimedia file format tailored to simultaneously support the diverse needs of local playback (for example, CD-ROM, DVD, and hard disk), HTTP playback, and media server streaming.


Development of ASF: Standards Adoption of ASF

ASF has gained momentum as a complementary standard in several industry standards domains:

MPEG-4. ASF has been proposed to the International Organization for Standardization (ISO) as the container format for MPEG-4.

SMPTE. ASF has been submitted to SMPTE as a file format to satisfy its "Metadata and File Wrapper" request for technology.

RTP. A payload format has been defined for encapsulating ASF streams in the Real-Time Transport Protocol.

MBone. An Internet-Draft has been submitted to the IETF for recording MBone sessions to ASF files.

IETF. ASF is being submitted as an Informational RFC (publication as an Informational RFC does not constitute standardization).

More details on standards initiatives underway relating to ASF are available.


Development of ASF: Microsoft Support for ASF

Microsoft has taken a lead role in the design of ASF, with the stated goal of replacing WAVE and AVI with ASF.

In furtherance of this commitment, Microsoft will release an SDK that will help tool, server, and other developers to implement ASF support in their products. Additionally, an update to NetShow 3.0 will support ASF (NetShow 3.0 and prior versions support ASFv1, a previous version of ASF that has not been released publicly and will be superseded).


Technical Overview of ASF: Technical Goals of ASF

This background of what ASF is and is not gives a context for the core goals of ASF: extensible media types, efficient media playback, support for scalable media types, authoring control over media stream relationships and prioritization, and O/S, protocol, and composition framework independence.

Extensible media types

One of the fundamental aspects of ASF is its extensibility. Although ASF defines structures to hold indexing, scalability, and content information (which any server needs in order to manipulate the file efficiently in basic ways), all of these structures, and the file format itself, are extensible.

Extensibility is accomplished in the following object-oriented way. ASF consists of a sequence of objects. Each object contains a header and a body. The header contains a type field (to identify the object type) and a length field (to specify the length of the object). The body contains type-specific fields, followed by a sequence of zero or more sub-objects. Any object type, or the file format itself, can be extended by adding sub-objects having new types. The type field for an object is always a 128-bit universally unique identifier (UUID). UUIDs can be generated at any time by any computer with a network card, and they are guaranteed to be globally unique. In this way, anyone at any time can generate a new object type, and no central registration is required. A pre-existing server, or any tool that manipulates the file, can safely ignore objects with types that it does not understand. In the event that a pre-existing tool has access to a distributed network registry, it can look up any unrecognized UUIDs, and possibly download code to handle the extension information. In this way, most tools can be automatically upgraded to handle the extension.
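
To make this traversal concrete, the following sketch (Python; not part of the ASF specification) walks a flat sequence of such objects and steps over any object type it does not recognize. It assumes the 24-byte object header described in the Technical Overview below (a 16-byte UUID followed by a 64-bit size), little-endian byte order, and a hypothetical handler registry; these details are illustrative assumptions.

    import struct
    import uuid

    # Hypothetical registry mapping known object UUIDs to handler functions.
    KNOWN_OBJECT_HANDLERS = {}

    def walk_objects(buf, offset, end):
        """Walk a flat sequence of ASF-style objects, skipping unknown types."""
        while offset + 24 <= end:
            # Object header: 16-byte UUID (the object type) + 64-bit object size.
            obj_type = uuid.UUID(bytes_le=buf[offset:offset + 16])
            obj_size, = struct.unpack_from("<Q", buf, offset + 16)
            body = buf[offset + 24:offset + obj_size]
            handler = KNOWN_OBJECT_HANDLERS.get(obj_type)
            if handler is not None:
                handler(body)
            # An unrecognized type is simply skipped; the size field alone is
            # enough to find the next object, so no central registration is needed.
            offset += obj_size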

Efficient media playback

Efficient playback from media servers, HTTP servers, and local storage devices is a significant design challenge. Some of the issues include multiple simultaneous users, rapid processing of scalable content for different levels of bandwidth, indexes and markers, stream prioritization, recording, and live and on-demand playback. Historically, media file formats like AVI and QuickTime were designed for local playback only and did not need to take these design requirements into account.

Scalable media types

ASF is designed to express the dependency relationships between logical "bands" of scalable media types. It stores each band as a distinct media stream. Dependency information among these media streams is stored in the file header, providing sufficient information for clients to interpret scalability options (such as spatial, temporal, or quality scaling for video) in a compression-independent manner.

Authoring control over media stream relationships

Modern multimedia delivery systems can dynamically adjust to changing constraints (for example, available bandwidth). Authors of multimedia content must be able to express their preferences in terms of relative stream priorities as well as a minimum set of streams to deliver. Stream prioritization is complicated by the presence of scalable media types, since it is not always possible to determine the order of stream application at authoring time. ASF allows content authors to effectively communicate their preferences, even when scalable media streams are present.
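
As an illustration of how a server might act on such preferences, the sketch below models per-stream priority, a mandatory minimum set, and scalable-band dependencies, and selects a subset of streams for a given bandwidth budget. The field names and the selection policy are hypothetical; the actual ASF header structures are not shown in this overview.

    from dataclasses import dataclass

    @dataclass
    class StreamInfo:
        number: int              # stream number within the file
        bitrate: int             # bits per second
        priority: int            # author-assigned rank; lower means more important (assumed convention)
        mandatory: bool = False  # part of the author's minimum stream set
        depends_on: tuple = ()   # stream numbers this band builds on (e.g. an enhancement band)

    def select_streams(streams, available_bandwidth):
        """Choose streams in author-priority order without exceeding the
        bandwidth budget, always keeping the mandatory set and never taking a
        scalable band before the bands it depends on."""
        chosen, used = set(), 0
        for s in sorted(streams, key=lambda s: (not s.mandatory, s.priority)):
            if not all(dep in chosen for dep in s.depends_on):
                continue
            if s.mandatory or used + s.bitrate <= available_bandwidth:
                chosen.add(s.number)
                used += s.bitrate
        return chosen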

O/S, composition, and protocol independence

ASF is designed to be independent of any particular multimedia composition system, computer operating system, or data communications protocol.

Component Download

Stream-specific information about playback components (for example, decompressors and renderers) can be stored in the file header. This information enables each client implementation to retrieve the appropriate version of the required playback component if it is not already present on the client machine.

Multiple Languages

ASF is designed to support multiple languages. Media streams can optionally indicate the language of the contained media. This feature is typically used for audio or text streams. A multilingual ASF file indicates that a set of media streams contains different language versions of the same content, allowing an implementation to choose the most appropriate version for a given client.
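
A minimal sketch of the client-side choice this enables, assuming (hypothetically) that each stream's language is exposed as a simple tag once the header has been parsed:

    def pick_audio_stream(streams, preferred_languages=("de", "en")):
        """Pick the audio stream whose language tag best matches the client's
        preferences, falling back to the first audio stream in the file."""
        audio = [s for s in streams if s["type"] == "audio"]
        for lang in preferred_languages:
            for s in audio:
                if s.get("language") == lang:
                    return s
        return audio[0] if audio else None

    # Example: a file carrying one video stream and English and German audio.
    streams = [
        {"number": 1, "type": "video"},
        {"number": 2, "type": "audio", "language": "en"},
        {"number": 3, "type": "audio", "language": "de"},
    ]
    print(pick_audio_stream(streams)["number"])   # -> 3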

Bibliographic Information

ASF provides the capability to maintain extensive bibliographic information in a manner that is highly flexible and very extensible. All bibliographic information is stored in the file header in Unicode and is designed for multiple language support, if needed. Bibliographic fields can either be predefined (for example, author and title) or author-defined (for example, search terms). Bibliographic entries can apply to either the whole file or a single media stream.


Technical Overview of ASF: ASF File structure

An ASF file consists of three top-level objects: a Header Object, a Data Object, and an Index Object.

The Header and Data Objects are required, and there is exactly one of each per ASF file. The Index Object is optional, but strongly recommended. A minimum implementation of an ASF file consists of a Header Object (containing only a File Properties Object, a Stream Properties Object, and a Language List Object) and a Data Object containing only a single ASF data unit.

Header Object (required; one only). The Header Object describes the ASF multimedia stream as a whole. It provides global information as well as specific information about the content contained within the media streams. This component can be transmitted separately over a reliable protocol.

Data Object (required; one only). This object contains the streamable multimedia data, organized as data units sorted in order of increasing send time.

Index Object (optional; zero or one). This object contains index entries to data units within the Data Object. The index section is not itself streamed, but can be used for fast lookup, search, and maintenance.

Other objects (optional; one or more). The ASF definition permits additional objects, each identified by its own unique UUID, to be defined.
 

Technical Overview of ASF: ASF Objects

The basic unit of organization for an ASF file is an ASF Object. It consists of a 128-bit globally unique identifier (GUID) for the object, a 64-bit integer object size, and variable-length object data. The value of the object size field is 24 bytes (the combined size of the GUID and size fields) plus the size of the object data in bytes.
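
The sketch below illustrates this arithmetic by serializing a single object. The GUID serialization and little-endian byte order are assumptions made for the example; this overview specifies only the field widths.

    import struct
    import uuid

    def pack_asf_object(object_guid: uuid.UUID, object_data: bytes) -> bytes:
        """Serialize one ASF Object: 16-byte GUID, 64-bit object size, object data.
        The size field counts the entire object, i.e. the 24 header bytes plus
        the data that follows."""
        object_size = 24 + len(object_data)
        return object_guid.bytes_le + struct.pack("<Q", object_size) + object_data

    # An object carrying 100 bytes of data therefore has an object size of 124.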

An ASF Object is similar to a Resource Interchange File Format (RIFF) chunk, which is the basis for AVI and WAV files. ASF makes three fundamental changes to the RIFF format to improve its scalability, versatility, and flexibility.


Technical Overview of ASF: Header Object

The Header Object provides global information about the ASF file as a whole, as well as specific information about the streams stored in the Data Object.

The Header Object is a container for more specific objects, such as the File Properties Object, the Stream Properties Object, and the Language List Object.

A client must receive the Header Object before it can interpret the media data in the Data Object. This is a requirement of ASF, but how this information actually gets to the client is not specified by ASF and is a "local implementation issue." Possible approaches include transmitting the Header Object ahead of the media data over a reliable protocol, or reading it directly from the file when playback is from local storage.


Technical Overview of ASF: Data Object

The Data Object contains the data for all of the media streams stored within the file. The Data Object is a sorted collection of ASF data units, organized in terms of send time. A data unit is usually sized to contain only one media object (for example, a frame) but it may contain multiple (small) media objects or only a fragment of a (large) media object.

Each data unit potentially has two timestamps associated with it: the Send Time and the Presentation Time. The Send Time is when the server should transmit the data to the receiver; the Presentation Time is when the receiver should render the data. This send/presentation time mechanism helps to smooth out the presentation of data that might be transmitted over a bursty transmission medium. All time fields in an ASF file use a common timeline, which begins at time zero.
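
The sketch below shows one way a server could use the two timestamps: data units are transmitted at their send times, and the receiver holds each unit until its later presentation time. Millisecond units and the field names are assumptions made for the example.

    import time
    from collections import namedtuple

    # Assumed millisecond timestamps on the file's common timeline (starts at zero).
    DataUnit = namedtuple("DataUnit", ["send_time", "presentation_time", "payload"])

    def serve(data_units, transmit):
        """Pace transmission of data units by their send times.  The Data Object
        is already sorted by increasing send time, so a single pass suffices."""
        start = time.monotonic()
        for unit in data_units:
            delay = unit.send_time / 1000.0 - (time.monotonic() - start)
            if delay > 0:
                time.sleep(delay)
            transmit(unit)   # receiver buffers until unit.presentation_time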


Technical Overview of ASF: Index Object

The purpose of the Index Object is to speed access to the streams contained in the Data Object, permitting random access into the file (for example, fast-forward, rewind, and jumps to specific points) as well as efficient processing of stream prioritizations. The Index contains a time-based index into the data in the Data Object. An Index Object is not required, and if one exists it may not index every media stream in the file.

A key reason to include an index is that a single server needs to simultaneously serve thousands of streams, and the file format must be computationally easy to manipulate. A networked server must not only support continuous playback and random access of the media streams; it must also support scalability to network and client conditions by demultiplexing the file, selecting an appropriate subset of streams, and remultiplexing them into the transcoded transmission format. Thus, it is vitally important that new stored file formats be designed for efficient remote service while simultaneously supporting the needs of local use. Should the Index inadvertently become lost or corrupted, it can be recreated using information from the Index Parameters Object that was sent with the Header Object.
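
The sketch below shows the kind of lookup a time-based index makes cheap. The entry spacing, millisecond units, and in-memory layout are all hypothetical.

    import bisect

    # Each index entry pairs a time on the common timeline (ms, assumed) with the
    # byte offset of a data unit inside the Data Object.
    index_times   = [0, 1000, 2000, 3000]
    index_offsets = [50, 48210, 95002, 140770]

    def seek_offset(target_time_ms):
        """Return the offset of the latest indexed data unit at or before the
        requested time, e.g. to service a jump or fast-forward request."""
        i = bisect.bisect_right(index_times, target_time_ms) - 1
        return index_offsets[max(i, 0)]

    print(seek_offset(2500))   # -> 95002 (the entry at 2000 ms)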

[Please note that the Marker Object in the Header Object is the normal way for client-based random accesses (for example, jump points to specific, named parts of a presentation) to occur.]

 

© 1997 Microsoft Corporation. All rights reserved.