Video for Linux API Proposal

These proposals do not cover the entire Video for Linux spec. The rest of the spec I consider OK as is, or I have no opinion on it.

Multiple Devices per System

Drivers should be able to support multiple devices, as long as the hardware can do it. It's trivial if the driver writer keeps all global variables in a device structure that begins with a struct video_device. All entry points into the driver pass in a pointer to this structure.

Multiple Opens per Device

Supporting multiple simultaneous capture operations on the same device is not practical because there is no open handle which can be used to differentiate the different capture contexts, and because streaming or frame buffer capture is impractical for more than one instance.

However, it would be really good to support two opens on a device for the purpose of having one open be for capturing, and the other for a GUI control panel application that can change brightness, select inputs, etc. alongside the capturing application. A standard video control panel that works with all Video for Linux devices and can run concurrently with any capturing application would be very cool, and would relieve application developers from each having to incorporate their own control panel.

 

Query Capabilities - VIDIOC_G_CAP

This ioctl call is used to obtain the capability information for a video device. The driver will fill in a struct video_capability object.

struct video_capability
char name[32]   Friendly name for this device
int type   Device type and capability flags (see below)
int inputs   Number of video inputs that can be selected
int audios   Number of audio inputs that can be selected
int maxwidth   Best case maximum image capture width in pixels
int maxheight   Best case maximum image capture height in pixels
int minwidth   Minimum capture width in pixels
int minheight   Minimum capture height in pixels
int maxframerate   Maximum capture frame rate
int reserved[4]   reserved for future capabilities
     
Capability flags used in the type field:
VID_TYPE_CAPTURE   Can capture frames via the read() call
VID_TYPE_STREAMING   Can capture frames asynchronously into pre-allocated buffers
VID_TYPE_FRAMEBUF   Can capture directly into compatible graphics frame buffers
VID_TYPE_SELECT   Supports asynchronous I/O via the select() call
VID_TYPE_TUNER   Has a tuner of some form
VID_TYPE_MONOCHROME   Image capture is grey scale only
VID_TYPE_CODEC   Can compress/decompress images separately from capturing
VID_TYPE_FX   Can do special effects on images separately from capturing

Note that the minimum and maximum image capture dimensions are for comparison purposes only. The actual maximum size you can capture may depend on the capture parameters, including the pixel format, compression (if any), the video standard (PAL is higher resolution than NTSC), and possibly other parameters. The same applies to the maximum frame rate. The minimum and maximum sizes do not imply that all combinations of height/width within the range are possible. For example, the Quickcam has three settings.

Capture to a frame buffer might not work depending on the capabilities of the graphics card, the graphics mode, the X server, etc.

 

Capture Image Format - VIDIOC_G_FMT, VIDIOC_S_FMT

The capture format defines how the image is laid out in the capture buffer. It defines the dimensions of the image and the pixel format or compression format. The information is exchanged in a struct video_format object.

struct video_format
int width   Capture width in pixels
int height   Capture height in pixels
int depth   Average number of bits allocated per pixel. Does not apply to compressed images.
int pixelformat   The pixel format or type of compression
int flags   Format flags
int bytesperline   Stride from one line to the next. Only applies if the FMT_FLAG_BYTESPERLINE flag is set.
int sizeimage   Minimum required size of the buffer to hold a complete image (Get only)

Devices will not be able to support every combination of width and height. The driver will find the width and height compatible with the hardware which are as close as possible to the requested width and height without going over in either dimension. Applications must do a VIDIOC_G_FMT to get the actual dimensions granted and make sure they are suitable.

The depth is the amount of space in the buffer per pixel, in bits. The pixel information may not fill all bits allocated, e.g. RGB555 and RGB32. Leftover bits are undefined. For planar YUV formats the depth is the average number of bits per pixel. For example, YUV420 is eight bits per component, but the U and V planes are 1/4 the size of the Y plane so the average bits per pixel is 12. The pixelformat values and flags values are defined in the tables below.

Bytesperline is the number of bytes of memory between two adjacent lines. Since most of the time it's not needed, bytesperline only applies if the FMT_FLAG_BYTESPERLINE flag is set. Otherwise the field is undefined and must be ignored. For YUV planar formats, it's the stride of the Y plane.

Sizeimage is usually width*height*depth/8 bytes for uncompressed images, but it's different if bytesperline is used since there could be some padding between lines. Sizeimage is ignored on set. A capture operation such as read() is allowed to fail if the buffer is smaller than sizeimage since a partial image read may be nonsensical or impractical to implement.

Values for the pixelformat and depth fields
PIX_FMT_RGB555   16   RGB-5-5-5 packed RGB format. High bit undefined
PIX_FMT_RGB565   16   RGB-5-6-5 packed RGB format
PIX_FMT_RGB24   24   RGB-8-8-8 packed into 24-bit words. B is at byte address 0.
PIX_FMT_RGB32   32   RGB-8-8-8 into 32-bit words. B is at byte address 0. Top 8 bits are undefined.
PIX_FMT_GREY   8   Linear grey scale. Greater values are brighter.
PIX_FMT_YVU9   9   YUV, planar, 8 bits/component. Y plane, 1/16-size V plane, 1/16-size U plane. (Note: V before U)
PIX_FMT_YUV420   12   YUV 4:2:0, planar, 8-bits per component. Y plane, 1/4-size U plane, 1/4-size V plane. (Note: U before V)
PIX_FMT_YUYV   16   YUV 4:2:2, 8 bits/component. Byte0 = Y0, Byte1 = U01, Byte2 = Y1, Byte3 = V01, etc.
PIX_FMT_UYVY   16   Same as YUYV, except U-Y-V-Y byte order
PIX_FMT_HI240   8   Bt848 8-bit color format
PIX_FMT_YUV422P8   8   8 bits packed as Y:4 bits, U:2 bits, V:2 bits
         
Flags defined for the video_format flags field
FMT_FLAG_BYTESPERLINE   The bytesperline field is valid
FMT_FLAG_COMPRESSED   The image is compressed. The depth and bytesperline fields do not apply.
FMT_FLAG_INTERLACED   The image consists of two interlaced fields
     
[some of the flags bits should be set aside for format-specific use]    

An interlaced image will have "comb" or "feathering" artifacts around moving objects. If the FMT_FLAG_INTERLACED flag is not set on VIDIOC_S_FMT, then the driver is not permitted to capture interlaced images. If the flag is set then the driver may (but is not required to) capture interlaced images if the requested vertical resolution is too high for a single field. FMT_FLAG_INTERLACED is set on return from VIDIOC_G_FMT only if the driver is actually going to capture interlaced images.

 

Compressed Capture - VIDIOC_G_COMP, VIDIOC_S_COMP

These ioctls set additional capture parameters needed for compressed capture. They both pass the information in a struct video_compression object. The keyframerate field only applies to temporal compression algorithms. The quality factor ranges from 0 to 65535.

struct video_compression
int quality   The quality factor
int keyframerate   How often to make a keyframe, in frames
int reserved[4]   reserved for more parameters

 

 

Reading Captured Images - read()

This capture mode is supported if the VID_TYPE_CAPTURE flag is set in the struct video_capability. Each call to read() will fill the buffer with a new frame. The driver may fail the read() if the length parameter is less than the required buffer size specified by the VIDIOC_G_FMT ioctl(). This is reasonable since each call to read() starts over with a new frame, and a partial frame may be nonsense (e.g. for a compressed image) or impractical or inefficient to implement in the driver.

Non-blocking read() mode is supported in the usual way. read() does not work if either streaming capture or hardware frame buffer capture is active.

 

Capturing to a Hardware Frame Buffer - VIDIOC_G_FBUF, VIDIOC_S_FBUF, VIDIOC_G_WIN, VIDIOC_S_WIN, VIDIOC_CAPTURE

This capture mode is supported if the VID_TYPE_FRAMEBUF flag is set in the struct video_capability. [This is very much like the current spec. We might add some get-capture-card-capabilities thing. For example the card I have can only DMA YUV 4:2:2 data.]

VIDIOC_S_FBUF sets the frame buffer parameters. VIDIOC_G_FBUF returns the current parameters. The structure used by these ioctls is a struct video_buffer. Ideally the frame buffer would be a YUV 4:2:2 buffer the exact size (or possibly with some line padding) of the capture. It could also be the primary graphics surface, though. You must also use VIDIOC_S_WIN to set up the placement of the video window in the frame buffer.

struct video_buffer
void *base   Physical base address of the frame buffer.
struct video_format fmt   Physical layout of the frame buffer

Note that the buffer is often larger than the visible area, and so the fmt.bytesperline field is most likely valid. XFree86 DGA can provide the parameters required to set up this ioctl.

VIDIOC_G_WIN and VIDIOC_S_WIN work just like the existing VIDIOCGWIN and VIDIOCSWIN ioctls. Except:

  1. The width and height fields of the struct video_window reflect the width and height of the image on the screen, not the width and height of the capture. In other words the captured image may appear stretched on screen.
  2. These ioctls only apply to frame buffer capture. The capture dimensions are set with the VIDIOC_x_FMT ioctls.

VIDIOC_CAPTURE is the same as the existing VIDIOCCAPTURE ioctl.

 

Capturing Continuously to Pre-Allocated Buffers - VIDIOC_G_STREAM, VIDIOC_S_STREAM, VIDIOC_MCAPTURE, VIDIOC_SYNC

This capture mode is supported if the VID_TYPE_STREAMING flag is set in the struct video_capability.

I need help on this one! I guess similar to the way the Bt848 driver is doing it now is the best we can do. The driver allocates page-locked buffers in kernel space. The app uses VIDIOC_S_STREAM to request how many buffers it wants, and VIDIOC_G_STREAM to find out how many buffers the driver agreed to allocate. The layout of the buffers is defined with VIDIOC_S_FMT. If the format changes, the buffers are freed. mmap() is used to map the buffers to user space. Calls to VIDIOC_MCAPTURE add buffers to a queue. When a buffer is filled the app is notified (VIDIOC_SYNC or select() unblocks?). Somehow the caller needs to know which buffer is ready. When the app has consumed the data it requeues the buffer again.

When a buffer is filled, the caller will need a way to get the number of bytes in the buffer (varies for compressed formats) and the time stamp of when the frame was captured. I guess the caller can do a VIDIOC_G_STREAM after a frame completes to get this information. Ok, how about VIDIOC_G_STREAM returns a structure something like this:

int numbuffers   Number of buffers allocated by the driver
int buffer   Buffer that the following fields refer to (caller fills in)
unsigned long timecaptured   Time stamp in milliseconds when the frame was captured. Time is relative to when capture started.
int bytesused   Number of bytes of data that need to be read from the buffer
int flags   ISQUEUED, KEYFRAME, ...?

 

Waiting for Frames Using select()

The driver supports the select() call on its file descriptors if the VID_TYPE_SELECT flag is set in the struct video_capability. Details to come... (Thanks, Aaron)

 

Capture Parms - VIDIOC_G_PARM, VIDIOC_S_PARM

This is to control various parameters related to video capture. These ioctls use struct video_parm objects. The microsecperframe field only applies to read() and streaming capture. Capture to frame buffer always runs at the natural frame rate of the video.

It is worth stressing that switching the video standard is a big deal. Many capabilities of the capture card depend on the video standard selected, including the image resolution and frame rate. After changing the standard the capture dimensions, required image buffer size, or other capture parameters may have changed. The caller should set up the capture again. [The driver may only allow a standard change when there is only one open on the device.]

[Logically, the standard should be on a per-input basis, but since changing the standard is so dangerous, and we want to be able to have a separate control panel that can select inputs independent of a capturing application, we don't want an input change to possibly change the standard. Also mixed standard devices on one system is extremely rare. So I put it here.]

struct video_parm
int input   Which video input is selected
int capability   The supported standards and capturemode flags
int standard   The video standard (NTSC, PAL, SECAM)
int capturemode   Capture mode flags
unsigned long microsecperframe   The desired frame rate expressed as microseconds per frame
int reserved[4]   reserved for future parameters

Flags for the capturemode and capability fields [not done yet]

Settings for the standard field [not done yet]

 

 

Video Inputs - VIDIOC_G_INPUT

This ioctl retrieves the properties of a video input into a struct video_input object. Before calling VIDIOC_G_INPUT the caller fills in the number field to indicate which input is being queried.

struct video_input
int number   The input to which these properties apply (set by the caller)
char name[32]   Friendly name of the input, preferably reflecting the label on the input itself
int tuners   Number of tuners on this input [do we need this?]
int capability   Capability flags of this input
int type   Type of device if known
int reserved[4]   reserved for future input properties

 

Video Tone Controls - VIDIOC_G_PICT, VIDIOC_S_PICT

These get or set the video tone control settings for the currently selected input. The settings are passed in a struct video_picture object. There are separate tone control settings for each input, so an application must do a VIDIOC_G_PICT after changing the input. All values are scaled between 0 and 65535. 32768 is always a safe neutral position.

struct video_picture
int capability   Flags indicating which controls are supported
int brightness   Brightness or black level
int contrast   Contrast or luma gain
int colour   Color saturation or chroma gain (color only)
int hue   Hue (color only)
int whiteness   Whiteness (greyscale only)
int reserved[4]   reserved for future controls
     
struct video_picture capability flags
PICT_FLAG_BRIGHTNESS   Brightness is supported
PICT_FLAG_CONTRAST   Contrast is supported
PICT_FLAG_COLOUR   Colour is supported
PICT_FLAG_HUE   Hue is supported
PICT_FLAG_WHITENESS   Whiteness is supported

 

Tuning - VIDIOC_G_TUNER, VIDIOC_S_TUNER

Let's fill this in. Something like the existing VIDIOCGTUNER/VIDIOCSTUNER is probably pretty close. Someone wanted to add fine tuning hint feedback from the tuner. We can probably get rid of the FREQ ioctls if we add a frequency field to struct video_tuner.

 

Compression/Decompression and Effects

This refers to performing operations on video frames that have been captured previously. This does not refer to compressed capture. It's possible that a device implementing some of these functions may not have capture capability at all.

Ideas?