Video for Linux API Proposal

These proposals do not cover the entire Video for Linux spec. The rest of the spec I consider OK as is, or I have no opinion on it.

Multiple Devices per System

Drivers should be able to support multiple devices, as long as the hardware can do it. It's trivial if the driver writer keeps all global variables in a device structure that begins with a struct video_device. All entry points into the driver pass in a pointer to this structure.
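A minimal sketch of that pattern. The struct video_device stand-in and the driver structure (mydrv_device) are hypothetical, shown only to make the layout rule concrete: because the struct video_device is the first member, a pointer to it can be cast back to the full per-device structure.

```c
/* Stand-in for struct video_device; the real one comes from the
 * Video for Linux headers.  Shown here only for self-containment. */
struct video_device {
	char name[32];
	int type;
	int minor;
};

/* Hypothetical per-device state for a driver supporting several cards.
 * The struct video_device must be the FIRST member so that a pointer
 * to it is also a pointer to the whole device structure. */
struct mydrv_device {
	struct video_device v;   /* must come first */
	unsigned long io_base;   /* example hardware state */
	int frame_count;
};

/* Every driver entry point receives the struct video_device pointer
 * and recovers the full per-device structure with a cast. */
struct mydrv_device *mydrv_from_vdev(struct video_device *vdev)
{
	return (struct mydrv_device *)vdev;
}
```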

Multiple Opens per Device

Supporting multiple simultaneous capture operations on the same device is not practical because there is no open handle which can be used to differentiate the different capture contexts, and because streaming or frame buffer capture is impractical for more than one instance.

However, it would be really good to support two opens on a device: one for capturing, and the other for a GUI control panel application that can change brightness, select inputs, etc. alongside the capturing application. A standard video control panel that works with all Video for Linux devices and that can run concurrently with any capturing application would be very cool, and would relieve application developers from each having to incorporate their own control panel.

[We could also support a scheme where there is one open controlling the capture, but other opens have access to the mmapped buffers and can select() on the driver.]

 

Query Capabilities - VIDIOC_G_CAP

This ioctl call is used to obtain the capability information for a video device. The driver will fill in a struct video_capability object.

struct video_capability
char name[32]   Friendly name for this device
int type   Device type and capability flags (see below)
int inputs   Number of video inputs that can be selected
int audios   Number of audio inputs that can be selected
int maxwidth   Best case maximum image capture width in pixels
int maxheight   Best case maximum image capture height in pixels
int minwidth   Minimum capture width in pixels
int minheight   Minimum capture height in pixels
int maxframerate   Maximum capture frame rate
int reserved[4]   reserved for future capabilities
     
Capability flags used in the type field:
VID_TYPE_CAPTURE   Can capture frames via the read() call
VID_TYPE_STREAMING   Can capture frames asynchronously into pre-allocated buffers
VID_TYPE_FRAMEBUF   Can capture directly into compatible graphics frame buffers
VID_TYPE_SELECT   Supports asynchronous I/O via the select() call
VID_TYPE_TUNER   Has a tuner of some form
VID_TYPE_MONOCHROME   Image capture is grey scale only
VID_TYPE_CODEC   Can compress/decompress images separately from capturing
VID_TYPE_FX   Can do special effects on images separately from capturing

Note that the minimum and maximum image capture dimensions are for comparison purposes only. The actual maximum size you can capture may depend on the capture parameters, including the pixel format, compression (if any), the video standard (PAL is higher resolution than NTSC), and possibly other parameters. The same applies to the maximum frame rate. The minimum and maximum sizes do not imply that all combinations of height/width within the range are possible. For example, the Quickcam has only three discrete settings.

Capture to a frame buffer might not work depending on the capabilities of the graphics card, the graphics mode, the X Windows server, etc.
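An application would test the type bits from VIDIOC_G_CAP before committing to a capture mode. A small sketch, assuming hypothetical flag values (the real ones would be assigned in the header):

```c
/* Hypothetical values; the real constants come from the V4L header. */
#define VID_TYPE_CAPTURE    0x0001
#define VID_TYPE_STREAMING  0x0002
#define VID_TYPE_FRAMEBUF   0x0004

/* After VIDIOC_G_CAP fills in struct video_capability, check the type
 * field before attempting a capture mode. */
int can_stream(int type)
{
	return (type & VID_TYPE_STREAMING) != 0;
}

/* read() capture is the only option when neither streaming nor frame
 * buffer capture is advertised. */
int must_use_read(int type)
{
	return (type & VID_TYPE_CAPTURE) &&
	       !(type & (VID_TYPE_STREAMING | VID_TYPE_FRAMEBUF));
}
```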

 

The Video Image Format Structure - struct video_format

The video image format structure is used in several ioctls. This structure completely defines the layout and format of an image or image buffer, including width, height, depth, pixel format, stride, and total size.

struct video_format
int width   Width in pixels
int height   Height in pixels
int depth   Average number of bits allocated per pixel. Does not apply to compressed images.
int pixelformat   The pixel format or type of compression
int flags   Format flags
int bytesperline   Stride from one line to the next. Only applies if the FMT_FLAG_BYTESPERLINE flag is set.
int sizeimage   Total size of the buffer to hold a complete image, in bytes

The depth is the amount of space in the buffer per pixel, in bits. The pixel information may not fill all bits allocated, e.g. RGB555 and RGB32. Leftover bits are undefined. For planar YUV formats the depth is the average number of bits per pixel. For example, YUV420 is eight bits per component, but the U and V planes are 1/4 the size of the Y plane so the average bits per pixel is 12. The pixelformat values and flags values are defined in the tables below.

Bytesperline is the number of bytes of memory between two adjacent lines. Since most of the time it's not needed, bytesperline only applies if the FMT_FLAG_BYTESPERLINE flag is set. Otherwise the field is undefined and must be ignored. For YUV planar formats, it's the stride of the Y plane.

Sizeimage is usually width*height*depth/8 for uncompressed images, but it's different if bytesperline is used, since there could be some padding between lines.
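The sizeimage rule above can be sketched as a helper. The struct mirrors the table earlier in this section; the flag value is hypothetical, and the planar-with-padding case (where the chroma planes must be added to the Y-plane stride) is driver-specific and omitted here.

```c
#define FMT_FLAG_BYTESPERLINE 0x0001  /* hypothetical value */

struct video_format {
	int width, height, depth;
	int pixelformat;
	int flags;
	int bytesperline;
	int sizeimage;
};

/* Buffer size for an uncompressed packed image: stride * height when a
 * stride is given, otherwise width * height * depth / 8.  (A padded
 * planar format would also need its chroma planes added; not shown.) */
int image_size(const struct video_format *f)
{
	if (f->flags & FMT_FLAG_BYTESPERLINE)
		return f->bytesperline * f->height;
	return f->width * f->height * f->depth / 8;
}
```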

Values for the pixelformat and depth fields
PIX_FMT_RGB555   16   RGB-5-5-5 packed RGB format. High bit undefined
PIX_FMT_RGB565   16   RGB-5-6-5 packed RGB format
PIX_FMT_RGB24   24   RGB-8-8-8 packed into 24-bit words. B is at byte address 0.
PIX_FMT_RGB32   32   RGB-8-8-8 into 32-bit words. B is at byte address 0. Top 8 bits are undefined.
PIX_FMT_GREY   8   Linear grey scale. Greater values are brighter.
PIX_FMT_YVU9   9   YUV, planar, 8 bits/component. Y plane, 1/16-size V plane, 1/16-size U plane. (Note: V before U)
PIX_FMT_YUV420   12   YUV 4:2:0, planar, 8-bits per component. Y plane, 1/4-size U plane, 1/4-size V plane. (Note: U before V)
PIX_FMT_YUYV   16   YUV 4:2:2, 8 bits/component. Byte0 = Y0, Byte1 = U01, Byte2 = Y1, Byte3 = V01, etc.
PIX_FMT_UYVY   16   Same as YUYV, except U-Y-V-Y byte order
PIX_FMT_HI240   8   Bt848 8-bit color format
PIX_FMT_YUV422P8   8   8 bits packed as Y:4 bits, U:2 bits, V:2 bits
         
Flags defined for the video_format flags field
FMT_FLAG_BYTESPERLINE   The bytesperline field is valid
FMT_FLAG_COMPRESSED   The image is compressed. The depth and bytesperline fields do not apply.
FMT_FLAG_INTERLACED   The image consists of two interlaced fields
    [some of the flags bits should be set aside for format-specific use]

 

 

Capture Image Format - VIDIOC_G_FMT, VIDIOC_S_FMT

Use VIDIOC_S_FMT to set the capture image format. VIDIOC_G_FMT retrieves the current capture format. Both ioctls use a struct video_format to pass the format. Devices will not be able to support every combination of width and height. Upon a VIDIOC_S_FMT call, the driver will find the width and height compatible with the hardware which are as close as possible to the requested width and height without going over in either dimension. The driver will modify the structure to indicate the granted dimensions and the resulting size of the image. Applications must check the granted values to make sure they are suitable.

Sizeimage is ignored on VIDIOC_S_FMT. On VIDIOC_G_FMT the driver will fill in the sizeimage field with the minimum required size of the capture buffer. A capture operation such as read() is allowed to fail if the buffer is smaller than sizeimage, since a partial image read may be nonsensical or impractical to implement.

An interlaced image will have "comb" or "feathering" artifacts around moving objects. If the FMT_FLAG_INTERLACED flag is not set on VIDIOC_S_FMT, then the driver is not permitted to capture interlaced images. If the flag is set then the driver may (but is not required to) capture interlaced images if the requested vertical resolution is too high for a single field. FMT_FLAG_INTERLACED is set on return from VIDIOC_G_FMT only if the driver is actually going to capture interlaced images.
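From the driver side, the rounding rule ("as close as possible without going over in either dimension") might look like the sketch below. The size list is hypothetical, modeling a Quickcam-like device with three fixed settings; what a driver should grant when the request is smaller than every supported size is not specified above, so this sketch falls back to the smallest size.

```c
struct cap_size { int w, h; };

/* Hypothetical supported sizes, sorted ascending. */
static const struct cap_size supported[] = {
	{ 160, 120 }, { 320, 240 }, { 640, 480 },
};

/* Grant the largest supported size not exceeding the request in either
 * dimension; fall back to the smallest size if nothing fits. */
struct cap_size negotiate_size(int req_w, int req_h)
{
	struct cap_size best = supported[0];
	int i;
	for (i = 0; i < (int)(sizeof supported / sizeof supported[0]); i++)
		if (supported[i].w <= req_w && supported[i].h <= req_h)
			best = supported[i];
	return best;
}
```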

 

Compressed Capture - VIDIOC_G_COMP, VIDIOC_S_COMP

These ioctls set additional capture parameters needed for compressed capture. They both pass the information in a struct video_compression object. The keyframerate field only applies to temporal compression algorithms. The quality factor ranges from 0 to 65535.

struct video_compression
int quality   The quality factor
int keyframerate   How often to make a keyframe, in frames
int reserved[4]   reserved for more parameters

 

Reading Captured Images - read()

This capture mode is supported if the VID_TYPE_CAPTURE flag is set in the struct video_capability. Each call to read() will fill the buffer with a new frame. The driver may fail the read() if the length parameter is less than the required buffer size specified by the VIDIOC_G_FMT ioctl. This is reasonable since each call to read() starts over with a new frame, and a partial frame may be nonsense (e.g. for a compressed image) or impractical or inefficient to implement in the driver.

Non-blocking read() mode is supported in the usual way. Read() does not work if either streaming capture or hardware frame buffer capture is active.
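The application-side contract can be wrapped in a small helper. The name capture_frame is hypothetical; the point is that the buffer must be at least sizeimage bytes, and one read() delivers one complete frame. (The test below substitutes a pipe for the device so the sketch runs anywhere.)

```c
#include <errno.h>
#include <unistd.h>

/* Refuse a buffer smaller than the size reported by VIDIOC_G_FMT
 * rather than attempt a partial-frame read. */
ssize_t capture_frame(int fd, void *buf, size_t buflen, size_t sizeimage)
{
	if (buflen < sizeimage) {
		errno = EINVAL;  /* partial frames are not supported */
		return -1;
	}
	return read(fd, buf, sizeimage);  /* one call, one complete frame */
}
```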

 

Capturing to a Hardware Frame Buffer - VIDIOC_G_FBUF, VIDIOC_S_FBUF, VIDIOC_G_WIN, VIDIOC_S_WIN, VIDIOC_CAPTURE

This capture mode is supported if the VID_TYPE_FRAMEBUF flag is set in the struct video_capability. [This is very much like the current spec. We might add some get-capture-card-capabilities thing. For example the card I have can only DMA YUV 4:2:2 data.]

VIDIOC_S_FBUF sets the frame buffer parameters. VIDIOC_G_FBUF returns the current parameters. The structure used by these ioctls is a struct video_buffer. Ideally the frame buffer would be a YUV 4:2:2 buffer the exact size (or possibly with some line padding) of the capture. It could also be the primary graphics surface, though. You must also use VIDIOC_S_WIN to set up the placement of the video window.

struct video_buffer
void *base   Physical base address of the frame buffer.
struct video_format fmt   Physical layout of the frame buffer
int flags   Additional frame buffer type flags
     
Flags for the struct video_buffer flags field
FBUF_FLAG_PRIMARY   The frame buffer is the primary graphics surface
FBUF_FLAG_OVERLAY   The frame buffer is an overlay surface the same size as the capture

Note that the buffer is often larger than the visible area, and so the fmt.bytesperline field is most likely valid. XFree86 DGA can provide the parameters required to set up this ioctl.

VIDIOC_G_WIN and VIDIOC_S_WIN work just like the existing VIDIOCGWIN and VIDIOCSWIN ioctls. Except:

  1. The width and height fields of the struct video_window reflect the width and height of the image on the screen, not the width and height of the capture. In other words, the captured image may appear stretched on screen.
  2. If the buffer is an overlay surface, the video data is always written into the buffer at coordinate 0,0 at the capture dimensions. (And it is up to X Windows and the application to place the overlay on the screen.)
  3. These ioctls only apply to frame buffer capture. The capture dimensions are set with the VIDIOC_S_FMT ioctl.

VIDIOC_CAPTURE is the same as the existing VIDIOCCAPTURE ioctl.

 

Capturing Continuously to Pre-Allocated Buffers - VIDIOC_STREAMBUFS, VIDIOC_QUERYBUF, VIDIOC_STREAM, VIDIOC_QBUF, VIDIOC_NEXTBUF, VIDIOC_DQBUF

This capture mode is supported if the VID_TYPE_STREAMING flag is set in the struct video_capability.

First, the application must call VIDIOC_STREAMBUFS with the number and type of buffers that it wants. Upon return the driver will fill in how many buffers it will allow to be allocated. This ioctl takes a struct video_streambuffers object, see below. The only flag that's valid on VIDIOC_STREAMBUFS is BUF_FLAG_DEVICEMEM. To allocate the buffers call VIDIOC_QUERYBUF for each buffer to get the details about the buffer, and call mmap() to allocate and map the buffer. VIDIOC_QUERYBUF takes a struct video_buffer object with the index field filled in to indicate which buffer is being queried.

To do the capturing, call VIDIOC_QBUF to enqueue the buffers you want to be filled. This ioctl takes a struct video_buffer with the index field filled in to indicate which buffer to queue. The driver will internally queue the buffers in a capture queue. Then call VIDIOC_STREAM with the value of 1 to commence the capturing process. [I want a separate ioctl that starts the streaming mode because knowing when the stream began lets the driver compute each frame's place in the stream, adjust the timestamps to be integral multiples of the frame period (erasing any interrupt latency), and compute performance stats including the number of dropped frames and the actual delivered frames per second. Also, the rest of the driver knows streaming is active and can therefore disallow changing the format or activating another capture mode.]

The driver will begin filling the buffers with frame data. Only buffers that have been queued will be filled. Once a buffer is filled, it will not be filled again until it has been explicitly dequeued and requeued by the application. The application can sleep until the next frame is done by calling VIDIOC_NEXTBUF, or select(). The two are equivalent. VIDIOC_NEXTBUF has no parameter. If no buffers are done then VIDIOC_NEXTBUF/select() will block until a buffer is done. If one or more buffers are already done, then VIDIOC_NEXTBUF/select() will return immediately. It is not possible to wait on a specific buffer if there is more than one buffer queued. Call VIDIOC_DQBUF to dequeue the next ready buffer. VIDIOC_DQBUF takes a struct video_buffer object. The driver will fill in all the fields. It is not possible to dequeue a specific buffer; buffers are always dequeued in the order in which they were captured. The bytesused field indicates how much data is in the buffer. After the data has been read out, the buffer should be queued up again to keep the frames flowing continuously. VIDIOC_DQBUF immediately returns an error if there is no buffer ready.

An application can call VIDIOC_QUERYBUF at any time for any buffer, and the driver will return the current status of the buffer. You can dynamically throttle the capture frame rate by only queueing buffers at the rate you want to capture.

Call VIDIOC_STREAM with the value of 0 to turn off streaming. If any buffers are queued for capture when streaming is turned off, they remain in the queue. Use munmap() to free the buffers.

There are certain things you can't do when streaming is active, for example changing the capture format, reading data through the read() call, or munmap()ing buffers.
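The buffer life cycle above (QBUF, driver fills, DQBUF in capture order) can be modeled in plain C without touching a real device. This is only a toy simulation of the flag transitions and FIFO ordering the proposal describes; it assumes dequeues keep pace with captures and deliberately ignores wrap-around beyond a few buffers.

```c
#define BUF_FLAG_QUEUED 0x01  /* hypothetical values */
#define BUF_FLAG_DONE   0x02

#define NBUFS 4

struct sim {
	int flags[NBUFS];
	int fifo[NBUFS];   /* indices of done buffers, in capture order */
	int head, tail;
};

void sim_qbuf(struct sim *s, int i)        /* models VIDIOC_QBUF */
{
	s->flags[i] = BUF_FLAG_QUEUED;         /* DONE is cleared on queue */
}

void sim_capture(struct sim *s, int i)     /* driver fills a buffer */
{
	if (!(s->flags[i] & BUF_FLAG_QUEUED))
		return;                            /* only queued buffers fill */
	s->flags[i] = BUF_FLAG_DONE;
	s->fifo[s->tail++ % NBUFS] = i;
}

int sim_dqbuf(struct sim *s)               /* models VIDIOC_DQBUF */
{
	int i;
	if (s->head == s->tail)
		return -1;                         /* no buffer ready: error */
	i = s->fifo[s->head++ % NBUFS];
	s->flags[i] = 0;                       /* owned by the app again */
	return i;
}
```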

 

struct video_streambuffers
int count   The number of buffers requested or granted
int flags   Flags concerning buffer attributes
int reserved[2]   reserved
     
struct video_buffer
int index   Which buffer number this is or which to query
int offset   Offset parameter to pass to mmap() to allocate this buffer
int length   Length parameter to pass to mmap()
int bytesused   The number of bytes of data in the buffer
struct timeval timestamp   Timestamp for the frame relative to when streaming was started
int flags   Flags concerning the attributes and current status of the buffer
int reserved[4]   reserved
     
Flags for the flags fields of struct video_buffer or struct video_streambuffers
BUF_FLAG_ALLOCATED   The buffer is currently allocated (and mmap()ed)
BUF_FLAG_DEVICEMEM   The buffer is physically located in the device's on-board memory
BUF_FLAG_QUEUED   The buffer is queued for capture (set by the driver on VIDIOC_QBUF)
BUF_FLAG_DONE   The buffer has data in it (set by the driver when the frame is captured, cleared by the driver on VIDIOC_QBUF)
BUF_FLAG_KEYFRAME   This frame is a keyframe or I frame (always set for uncompressed)
BUF_FLAG_PFRAME   This frame is a predicted frame (only for some compressions)
BUF_FLAG_BFRAME   This frame is a bidirectionally predicted frame (only for some compressions)
     

 

Waiting for Frames Using select()

The driver supports the select() call on its file descriptors if the VID_TYPE_SELECT flag is set in the struct video_capability. If neither streaming nor frame buffer capture is active, select() returns when there is data ready to be read with the read() call. If streaming capture is running, select() returns when the next buffer is filled. The caller should be sure there is a buffer in the queue first. If frame buffer capture is running, select() returns when the next frame has been written to the frame buffer.

 

Capture Parms - VIDIOC_G_PARM, VIDIOC_S_PARM

This is to control various parameters related to video capture. These ioctls use struct video_parm objects. The microsecperframe field only applies to read() and streaming capture. Capture to frame buffer always runs at the natural frame rate of the video.

High quality mode is intended for still imaging applications. The idea is to get the best possible image quality that the hardware can deliver. It is not defined how the driver writer may achieve that; it will depend on the hardware and the ingenuity of the driver writer. High quality mode is a different mode from the regular motion video capture modes. In high quality mode:

  1. The driver may be able to capture higher resolutions than for motion capture.
  2. The driver may support fewer pixel formats than motion capture (e.g. true color).
  3. The driver may capture and arithmetically combine multiple successive fields or frames to remove color edge artifacts and reduce the noise in the video data.
  4. The driver may capture images in slices like a scanner in order to handle larger format images than would otherwise be possible.
  5. An image capture operation may be significantly slower than motion capture.
  6. Moving objects in the image might have excessive motion blur.
  7. Capture might only work through the read() call.

struct video_parm
int input   Which video input is selected
int capability   The supported standards and capturemode flags
int capturemode   Capture mode flags
unsigned long microsecperframe   The desired frame rate expressed as microseconds per frame
int reserved[4]   reserved for future parameters
     
Flags for the capturemode and capability fields
CAP_MODE_HIGHQUALITY   High quality capture mode for imaging applications
CAP_MODE_VFLIP   The captured image is flipped vertically
CAP_MODE_HFLIP   The captured image is flipped horizontally
CAP_MICROSECPERFRAME   The driver supports programmable frame rates (capability field only)
     

 

Video Inputs - VIDIOC_G_INPUT

This ioctl retrieves the properties of a video input into a struct video_input object. Before calling VIDIOC_G_INPUT the caller fills in the number field to indicate which input is being queried.

struct video_input
int number   The input to which these properties apply (set by the caller)
char name[32]   Friendly name of the input, preferably reflecting the label on the input itself
int tuners   Number of tuners on this input [do we need this?]
int type   Type of device, if known
int capability   Capability flags of this input
int reserved[4]   reserved for future input properties
     
Values for the type field
INPUT_TYPE_TUNER   This input is a TV tuner
INPUT_TYPE_CAMERA   This is a general purpose input
     
Flags for the capability field
INPUT_CAP_AUDIO   The input has an associated audio channel
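A sketch of how an application might enumerate inputs. The struct layout follows the table above; the type constant values and the helper name input_type_name are hypothetical.

```c
#define INPUT_TYPE_TUNER  1  /* hypothetical values */
#define INPUT_TYPE_CAMERA 2

struct video_input {
	int number;
	char name[32];
	int tuners;
	int type;
	int capability;
	int reserved[4];
};

const char *input_type_name(int type)
{
	switch (type) {
	case INPUT_TYPE_TUNER:  return "tuner";
	case INPUT_TYPE_CAMERA: return "camera";
	default:                return "unknown";
	}
}

/* Against a real device this would be driven by cap.inputs from
 * VIDIOC_G_CAP:
 *
 *   struct video_input in;
 *   for (in.number = 0; in.number < cap.inputs; in.number++) {
 *       ioctl(fd, VIDIOC_G_INPUT, &in);
 *       printf("%d: %s (%s)\n", in.number, in.name,
 *              input_type_name(in.type));
 *   }
 */
```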

 

Video Standard - VIDIOC_G_STD, VIDIOC_S_STD

These ioctls query and switch the video standard, e.g. NTSC, PAL, etc. The video standard selected applies to all inputs on the device. These ioctls pass a struct video_std object. VIDIOC_G_STD returns the current standard, and which standards are supported on the device. VIDIOC_S_STD selects a new standard.

It is worth stressing that switching the video standard is a big deal. Many capabilities of the capture card depend on the video standard selected, including the image resolution and frame rate. After changing the standard, the capture dimensions, required image buffer size, or other capture parameters may have changed. The caller should re-set-up the capture. [The driver may only allow a standard change when there is only one open on the device.]

[Logically, the standard should be on a per-input basis, but since changing the standard is so dangerous, and we want to be able to have a separate control panel that can select inputs independent of a capturing application, we don't want an input change to possibly change the standard. Also mixed standard devices on one system is extremely rare.]

struct video_std
int capability   The supported standards
int standard   The current video standard
int flags   undefined
int reserved   reserved
     
Flags for the capability field
CAP_STD_AUTO   The device supports standard auto-detect
CAP_STD_PAL   The device supports PAL mode
CAP_STD_NTSC   The device supports NTSC mode
CAP_STD_SECAM   The device supports SECAM mode
     
Values for the standard field
VIDEO_STD_AUTO   The device adjusts automatically or video standard does not apply
VIDEO_STD_PAL   PAL mode
VIDEO_STD_NTSC   NTSC mode
VIDEO_STD_SECAM   SECAM mode
    [there are some more regional flavors to the above standards too]
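Since switching the standard is a big deal, an application would check the capability mask from VIDIOC_G_STD before attempting VIDIOC_S_STD. A trivial sketch, with hypothetical flag values:

```c
#define CAP_STD_AUTO  0x01  /* hypothetical values */
#define CAP_STD_PAL   0x02
#define CAP_STD_NTSC  0x04
#define CAP_STD_SECAM 0x08

/* Nonzero if the device advertises the requested standard; only then
 * is a VIDIOC_S_STD attempt worthwhile. */
int std_supported(int capability, int requested)
{
	return (capability & requested) != 0;
}
```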

 

Video Tone Controls - VIDIOC_G_PICT, VIDIOC_S_PICT

These get or set the video tone control settings for the currently selected input. The settings are passed in a struct video_picture object. There are separate tone control settings for each input, so an application must do a VIDIOC_G_PICT after changing the input. All values are scaled between 0 and 65535. 32768 is always a safe neutral position, unless noted otherwise.

struct video_picture
int capability   Flags indicating which controls are supported
int brightness   Brightness or black level
int contrast   Contrast or luma gain
int colour   Color saturation or chroma gain (color only)
int hue   Hue (color only)
int whiteness   Whiteness (greyscale only)
int reserved[4]   reserved for future controls
     
struct video_picture capability flags
PICT_BRIGHTNESS   Brightness is supported
PICT_CONTRAST   Contrast is supported
PICT_COLOUR   Colour is supported (dig the British spelling)
PICT_HUE   Hue is supported
PICT_WHITENESS   Whiteness is supported
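A control panel would scale its UI values into the 0..65535 range and respect the capability flags. A sketch, with a hypothetical flag value and helper names; 32768 is the neutral fallback noted above.

```c
#define PICT_BRIGHTNESS 0x01  /* hypothetical value */

/* Map a 0..100 UI percentage onto the 0..65535 control range. */
int pct_to_control(int pct)
{
	if (pct < 0)   pct = 0;
	if (pct > 100) pct = 100;
	return pct * 65535 / 100;
}

/* Only touch a control the driver advertises; otherwise keep the
 * safe neutral position. */
int brightness_value(int capability, int pct)
{
	return (capability & PICT_BRIGHTNESS) ? pct_to_control(pct) : 32768;
}
```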

 

Tuning - VIDIOC_G_TUNER, VIDIOC_S_TUNER

Let's fill this in. Something like the existing VIDIOxTUNER is probably pretty close. Someone wanted to add fine tuning hint feedback from the tuner. We can probably get rid of the FREQ ioctls if we add a frequency field to struct video_tuner.

 

Compression/Decompression and Effects

This refers to performing operations on video frames that have been captured previously. This does not refer to compressed capture. It's possible that a device implementing some of these functions may not have capture capability at all.

Compression/Decompression is really just image format conversion, so we can have a general purpose image conversion interface. An ioctl to set up the conversion: input format, output format, other parameters. Use write() to send the input image and read() to read the result of the conversion. Or use mmap()ed buffers.

For special effects I'm thinking of devices that can accelerate fades, wipes, etc. in a video editing application. Again, this is image conversion, but there could be two (or more?) input images.

Still a research topic....