Video for Linux API Proposal

These proposals do not cover the entire Video for Linux spec. The rest of the spec I consider OK as is, or I have no opinion on it.

Multiple Devices per System

Drivers should be able to support multiple devices, as long as the hardware can do it. It's trivial if the driver writer keeps all global variables in a device structure that begins with a struct video_device. All entry points into the driver pass in a pointer to this structure.

Multiple Opens per Device

Supporting multiple simultaneous capture operations on the same device is not practical because there is no open handle which can be used to differentiate the different capture contexts, and because streaming or frame buffer capture is impractical for more than one instance.

However, it would be really good to support two opens on a device for the purpose of having one open be for capturing, and the other for a GUI control panel application that can change brightness, select inputs, etc. alongside the capturing application. A standard video control panel that works with all Video for Linux devices and can run concurrently with any capturing application would be very cool, and would relieve application developers from each having to incorporate their own control panel.

 

Query Capabilities - VIDIOC_G_CAP

This ioctl call is used to obtain the capability information for a video device. The driver will fill in a struct video_capability object.

struct video_capability
char name[32]   Friendly name for this device
int type   Device type and capability flags (see below)
int inputs   Number of video inputs that can be selected
int audios   Number of audio inputs that can be selected
int maxwidth   Best case maximum image capture width in pixels
int maxheight   Best case maximum image capture height in pixels
int minwidth   Minimum capture width in pixels
int minheight   Minimum capture height in pixels
int maxframerate   Maximum capture frame rate
int reserved[4]   reserved for future capabilities
     
Capability flags used in the type field:
VID_TYPE_CAPTURE   Can capture frames via the read() call
VID_TYPE_STREAMING   Can capture frames asynchronously into pre-allocated buffers
VID_TYPE_FRAMEBUF   Can capture directly into compatible graphics frame buffers
VID_TYPE_SELECT   Supports asynchronous I/O via the select() call
VID_TYPE_TUNER   Has a tuner of some form
VID_TYPE_MONOCHROME   Image capture is grey scale only
VID_TYPE_CODEC   Can compress/decompress images separately from capturing
VID_TYPE_FX   Can do special effects on images separately from capturing

Note that the minimum and maximum image capture dimensions are for comparison purposes only. The actual maximum size you can capture may depend on the capture parameters, including the pixel format, compression (if any), the video standard (PAL is higher resolution than NTSC), and possibly other parameters. The same applies to the maximum frame rate. The minimum and maximum sizes do not imply that all combinations of height/width within the range are possible. For example, the Quickcam has three settings.

Capture to a frame buffer might not work depending on the capabilities of the graphics card, the graphics mode, the X server, etc.

 

Capture Image Format - VIDIOC_G_FMT, VIDIOC_S_FMT

The capture format defines how the image is laid out in the capture buffer. It defines the dimensions of the image and the pixel format or compression format. The information is exchanged in a struct video_format object.

struct video_format
int width   Capture width in pixels
int height   Capture height in pixels
int depth   Average number of bits allocated per pixel. Does not apply to compressed images.
int pixelformat   The pixel format or type of compression
int flags   Format flags
int bytesperline   Stride from one line to the next. Only applies if the FMT_FLAG_BYTESPERLINE flag is set.
int sizeimage   Minimum required size of the buffer to hold a complete image (Get only)

Devices will not be able to support every combination of width and height. The driver will find the width and height compatible with the hardware which are as close as possible to the requested width and height without going over in either dimension. Applications must do a VIDIOC_G_FMT to get the actual dimensions granted and make sure they are suitable.

The depth is the amount of space in the buffer per pixel, in bits. The pixel information may not fill all bits allocated, e.g. RGB555 and RGB32. Leftover bits are undefined. For planar YUV formats the depth is the average number of bits per pixel. For example, YUV420 is eight bits per component, but the U and V planes are 1/4 the size of the Y plane so the average bits per pixel is 12. The pixelformat values and flags values are defined in the tables below.

Bytesperline is the number of bytes of memory between two adjacent lines. Since most of the time it's not needed, bytesperline only applies if the FMT_FLAG_BYTESPERLINE flag is set. Otherwise the field is undefined and must be ignored. For YUV planar formats, it's the stride of the Y plane.

Sizeimage is usually width*height*depth/8 bytes for uncompressed images, but it's different if bytesperline is used since there could be some padding between lines. Sizeimage is ignored on set. A capture operation such as read() is allowed to fail if the buffer is smaller than sizeimage since a partial image read may be nonsensical or impractical to implement.

Values for the pixelformat and depth fields
PIX_FMT_RGB555   16   RGB-5-5-5 packed RGB format. High bit undefined
PIX_FMT_RGB565   16   RGB-5-6-5 packed RGB format
PIX_FMT_RGB24   24   RGB-8-8-8 packed into 24-bit words. B is at byte address 0.
PIX_FMT_RGB32   32   RGB-8-8-8 into 32-bit words. B is at byte address 0. Top 8 bits are undefined.
PIX_FMT_GREY   8   Linear grey scale. Greater values are brighter.
PIX_FMT_YVU9   9   YUV, planar, 8 bits/component. Y plane, 1/16-size V plane, 1/16-size U plane. (Note: V before U)
PIX_FMT_YUV420   12   YUV 4:2:0, planar, 8-bits per component. Y plane, 1/4-size U plane, 1/4-size V plane. (Note: U before V)
PIX_FMT_YUYV   16   YUV 4:2:2, 8 bits/component. Byte0 = Y0, Byte1 = U01, Byte2 = Y1, Byte3 = V01, etc.
PIX_FMT_UYVY   16   Same as YUYV, except U-Y-V-Y byte order
PIX_FMT_HI240   8   Bt848 8-bit color format
PIX_FMT_YUV422P8   8   8 bits packed as Y:4 bits, U:2 bits, V:2 bits
         
Flags defined for the video_format flags field
FMT_FLAG_BYTESPERLINE   The bytesperline field is valid
FMT_FLAG_COMPRESSED   The image is compressed. The depth and bytesperline fields do not apply.
FMT_FLAG_INTERLACED   The image consists of two interlaced fields
     
[some of the flags bits should be set aside for format-specific use]    

An interlaced image will have "comb" or "feathering" artifacts around moving objects. If the FMT_FLAG_INTERLACED flag is not set on VIDIOC_S_FMT, then the driver is not permitted to capture interlaced images. If the flag is set then the driver may (but is not required to) capture interlaced images if the requested vertical resolution is too high for a single field. FMT_FLAG_INTERLACED is set on return from VIDIOC_G_FMT only if the driver is actually going to capture interlaced images.

 

Compressed Capture - VIDIOC_G_COMP, VIDIOC_S_COMP

These ioctls set additional capture parameters needed for compressed capture. They both pass the information in a struct video_compression object. The keyframerate field only applies to temporal compression algorithms. The quality factor ranges from 0 to 65535.

struct video_compression
int quality   The quality factor
int keyframerate   How often to make a keyframe, in frames
int reserved[4]   reserved for more parameters

 

 

Reading Captured Images - read()

This capture mode is supported if the VID_TYPE_CAPTURE flag is set in the struct video_capability. Each call to read() will fill the buffer with a new frame. The driver may fail the read() if the length parameter is less than the required buffer size specified by the VIDIOC_G_FMT ioctl(). This is reasonable since each call to read() starts over with a new frame, and a partial frame may be nonsense (e.g. for a compressed image) or impractical or inefficient to implement in the driver.

Non-blocking read() mode is supported in the usual way. read() does not work if either streaming capture or hardware frame buffer capture is active.

 

Capturing to a Hardware Frame Buffer - VIDIOC_G_FBUF, VIDIOC_S_FBUF, VIDIOC_G_WIN, VIDIOC_S_WIN, VIDIOC_CAPTURE

This capture mode is supported if the VID_TYPE_FRAMEBUF flag is set in the struct video_capability. [This is very much like the current spec. We might add some get-capture-card-capabilities thing. For example the card I have can only DMA YUV 4:2:2 data.]

VIDIOC_S_FBUF sets the frame buffer parameters. VIDIOC_G_FBUF returns the current parameters. The structure used by these ioctls is a struct video_buffer. Ideally the frame buffer would be a YUV 4:2:2 buffer the exact size (or possibly with some line padding) of the capture. It could also be the primary graphics surface, though. You must also use VIDIOC_S_WIN to set up the placement of the video window in the frame buffer.

struct video_buffer
void *base   Physical base address of the frame buffer.
struct video_format fmt   Physical layout of the frame buffer

Note that the buffer is often larger than the visible area, and so the fmt.bytesperline field is most likely valid. XFree86 DGA can provide the parameters required to set up this ioctl.

VIDIOC_G_WIN and VIDIOC_S_WIN work just like the existing VIDIOCGWIN and VIDIOCSWIN ioctls. Except:

  1. The width and height fields of the struct video_window reflect the width and height of the image on the screen, not the width and height of the capture. In other words the captured image may appear stretched on screen.
  2. These ioctls only apply to frame buffer capture. The capture dimensions are set with the VIDIOC_x_FMT ioctls.

VIDIOC_CAPTURE is the same as the existing VIDIOCCAPTURE ioctl.

 

Capturing Continuously to Pre-Allocated Buffers - VIDIOC_G_STREAM, VIDIOC_S_STREAM, VIDIOC_MCAPTURE, VIDIOC_SYNC

This capture mode is supported if the VID_TYPE_STREAMING flag is set in the struct video_capability.

I need help on this one! I guess similar to the way the Bt848 driver is doing it now is the best we can do. The driver allocates page-locked buffers in kernel space. The app uses VIDIOC_S_STREAM to request how many buffers it wants, and VIDIOC_G_STREAM to find out how many buffers the driver agreed to allocate. The layout of the buffers is defined with VIDIOC_S_FMT. If the format changes, the buffers are freed. mmap() is used to map the buffers to user space. Calls to VIDIOC_MCAPTURE add buffers to a queue. When a buffer is filled the app is notified (VIDIOC_SYNC or select() unblocks?). Somehow the caller needs to know which buffer is ready. When the app has consumed the data it requeues the buffer again.

When a buffer is filled, the caller will need a way to get the number of bytes in the buffer (varies for compressed formats) and the time stamp of when the frame was captured. I guess the caller can do a VIDIOC_G_STREAM after a frame completes to get this information. Ok, how about VIDIOC_G_STREAM returns a structure something like this:

int numbuffers   Number of buffers allocated by the driver
int buffer   Buffer that the following fields refer to (caller fills in)
unsigned long timecaptured   Time stamp in milliseconds when the frame was captured. Time is relative to when capture started.
int bytesused   Number of bytes of data that need to be read from the buffer
int flags   ISQUEUED, KEYFRAME, ...?

 

Waiting for Frames Using select()

The driver supports the select() call on its file descriptors if the VID_TYPE_SELECT flag is set in the struct video_capability. Details to come... (Thanks, Aaron)

 

Capture Parms - VIDIOC_G_PARM, VIDIOC_S_PARM

This is to control various parameters related to video capture. These ioctls use struct video_parm objects. The microsecperframe field only applies to read() and streaming capture. Capture to frame buffer always runs at the natural frame rate of the video.

It is worth stressing that switching the video standard is a big deal. Many capabilities of the capture card depend on the video standard selected, including the image resolution and frame rate. After changing the standard the capture dimensions, required image buffer size, or other capture parameters may have changed. The caller should set up the capture again. [The driver may only allow a standard change when there is only one open on the device.]

[Logically, the standard should be on a per-input basis, but since changing the standard is so dangerous, and we want to be able to have a separate control panel that can select inputs independent of a capturing application, we don't want an input change to possibly change the standard. Also mixed standard devices on one system is extremely rare. So I put it here.]

struct video_parm
int input   Which video input is selected
int capability   The supported standards and capturemode flags
int standard   The video standard (NTSC, PAL, SECAM)
int capturemode   Capture mode flags
unsigned long microsecperframe   The desired frame rate expressed as microseconds per frame
int reserved[4]   reserved for future parameters

Flags for the capturemode and capability fields [not done yet]

Settings for the standard field [not done yet]

 

 

Video Inputs - VIDIOC_G_INPUT

This ioctl retrieves the properties of a video input into a struct video_input object. Before calling VIDIOC_G_INPUT the caller fills in the number field to indicate which input is being queried.

struct video_input
int number   The input to which these properties apply (set by the caller)
char name[32]   Friendly name of the input, preferably reflecting the label on the input itself
int tuners   Number of tuners on this input [do we need this?]
int capability   Capability flags of this input
int type   Type of device if known
int reserved[4]   reserved for future input properties

 

Video Tone Controls - VIDIOC_G_PICT, VIDIOC_S_PICT

These get or set the video tone control settings for the currently selected input. The settings are passed in a struct video_picture object. There are separate tone control settings for each input, so an application must do a VIDIOC_G_PICT after changing the input. All values are scaled between 0 and 65535. 32768 is always a safe neutral position.

struct video_picture
int capability   Flags indicating which controls are supported
int brightness   Brightness or black level
int contrast   Contrast or luma gain
int colour   Color saturation or chroma gain (color only)
int hue   Hue (color only)
int whiteness   Whiteness (greyscale only)
int reserved[4]   reserved for future controls
     
struct video_picture capability flags
PICT_FLAG_BRIGHTNESS   Brightness is supported
PICT_FLAG_CONTRAST   Contrast is supported
PICT_FLAG_COLOUR   Colour is supported
PICT_FLAG_HUE   Hue is supported
PICT_FLAG_WHITENESS   Whiteness is supported

 

Tuning - VIDIOC_G_TUNER, VIDIOC_S_TUNER

Let's fill this in. Something like the existing VIDIOCGTUNER/VIDIOCSTUNER is probably pretty close. Someone wanted to add fine tuning hint feedback from the tuner. We can probably get rid of the FREQ ioctls if we add a frequency field to struct video_tuner.

 

Compression/Decompression and Effects

This refers to performing operations on video frames that have been captured previously. This does not refer to compressed capture. It's possible that a device implementing some of these functions may not have capture capability at all.

Ideas?