These proposals do not cover the entire Video for Linux spec. The rest of the spec I consider OK as is, or I have no opinion on it.
Multiple Devices per System
Drivers should be able to support multiple devices, as long as the hardware can do it. It's trivial if the driver writer keeps all global variables in a device structure that begins with a struct video_device. All entry points into the driver pass in a pointer to this structure.
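The pattern above can be sketched in a few lines of C. The layout of struct video_device here is a stand-in, not the real definition from the Video for Linux headers, and struct mydev is a hypothetical driver's state:

```c
#include <assert.h>
#include <stddef.h>

struct video_device {            /* stand-in for the V4L core struct */
    char name[32];
    int  minor;
};

struct mydev {
    struct video_device v;       /* MUST be first: entry points cast back */
    int  brightness;             /* ...all former globals live here... */
    int  frames_captured;
};

/* Every entry point receives a struct video_device *; recover our state: */
static struct mydev *to_mydev(struct video_device *vdev)
{
    return (struct mydev *)vdev; /* valid because v is the first member */
}
```

Because the struct video_device is the first member, the cast is just a pointer reinterpretation and each device instance carries its own independent state.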
Multiple Opens per Device
Supporting multiple simultaneous capture operations on the same device is not practical because there is no open handle which can be used to differentiate the different capture contexts, and because streaming or frame buffer capture is impractical for more than one instance.
However, it would be very useful to support two opens on a device: one for capturing, and the other for a GUI control panel application that can change brightness, select inputs, etc. alongside the capturing application. A standard video control panel that works with all Video for Linux devices and that can run concurrently with any capturing application would be very cool, and would relieve application developers from each having to incorporate their own control panel.
[We could also support a scheme where there is one open controlling the capture, but other opens have access to the mmapped buffers and can select() on the driver.]
Query Capabilities - VIDIOC_G_CAP
This ioctl call is used to obtain the capability information for a video device. The driver will fill in a struct video_capability object.
Field | Description
char name[32] | Friendly name for this device
int type | Device type and capability flags (see below)
int inputs | Number of video inputs that can be selected
int audios | Number of audio inputs that can be selected
int maxwidth | Best case maximum image capture width in pixels
int maxheight | Best case maximum image capture height in pixels
int minwidth | Minimum capture width in pixels
int minheight | Minimum capture height in pixels
int maxframerate | Maximum capture frame rate
int reserved[4] | Reserved for future capabilities
Flag | Meaning
VID_TYPE_CAPTURE | Can capture frames via the read() call
VID_TYPE_STREAMING | Can capture frames asynchronously into pre-allocated buffers
VID_TYPE_FRAMEBUF | Can capture directly into compatible graphics frame buffers
VID_TYPE_SELECT | Supports asynchronous I/O via the select() call
VID_TYPE_TUNER | Has a tuner of some form
VID_TYPE_MONOCHROME | Image capture is grey scale only
VID_TYPE_CODEC | Can compress/decompress images separately from capturing
VID_TYPE_FX | Can do special effects on images separately from capturing
Note that the minimum and maximum image capture dimensions are for comparison purposes only. The actual maximum size you can capture may depend on the capture parameters, including the pixel format, compression (if any), the video standard (PAL is higher resolution than NTSC), and possibly other parameters. The same applies to the maximum frame rate. The minimum and maximum sizes do not imply that all combinations of height/width within the range are possible. For example, the Quickcam supports only three discrete frame sizes.
Capture to a frame buffer might not work depending on the capabilities of the graphics card, the graphics mode, the X server, etc.
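A capability query could look like the sketch below. The struct fields follow the table above; the bit values for the VID_TYPE_* flags are invented for illustration, since the proposal does not assign them:

```c
#include <assert.h>

struct video_capability {
    char name[32];
    int  type, inputs, audios;
    int  maxwidth, maxheight, minwidth, minheight;
    int  maxframerate;
    int  reserved[4];
};

#define VID_TYPE_CAPTURE   0x0001   /* bit assignments are assumptions */
#define VID_TYPE_STREAMING 0x0002
#define VID_TYPE_SELECT    0x0008

/* Can we run a simple blocking read() capture loop on this device? */
static int can_read_capture(const struct video_capability *cap)
{
    return (cap->type & VID_TYPE_CAPTURE) != 0;
}
```

An application would fill in the structure with ioctl(fd, VIDIOC_G_CAP, &cap) and then test the flags before choosing a capture method.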
The Video Image Format Structure - struct video_format
The video image format structure is used in several ioctls. This structure completely defines the layout and format of an image or image buffer, including width, height, depth, pixel format, stride, and total size.
Field | Description
int width | Width in pixels
int height | Height in pixels
int depth | Average number of bits allocated per pixel. Does not apply to compressed images.
int pixelformat | The pixel format or type of compression
int flags | Format flags
int bytesperline | Stride from one line to the next. Only applies if the FMT_FLAG_BYTESPERLINE flag is set.
int sizeimage | Total size of the buffer to hold a complete image, in bytes
The depth is the amount of space in the buffer per pixel, in bits. The pixel information may not fill all bits allocated, e.g. RGB555 and RGB32. Leftover bits are undefined. For planar YUV formats the depth is the average number of bits per pixel. For example, YUV420 is eight bits per component, but the U and V planes are 1/4 the size of the Y plane so the average bits per pixel is 12. The pixelformat values and flags values are defined in the tables below.
Bytesperline is the number of bytes of memory between two adjacent lines. Since most of the time it's not needed, bytesperline applies only if the FMT_FLAG_BYTESPERLINE flag is set. Otherwise the field is undefined and must be ignored. For YUV planar formats, it's the stride of the Y plane.
Sizeimage is usually width*height*depth/8 for uncompressed images, but it differs when bytesperline is used, since there may be padding between lines.
Format | Depth | Description
PIX_FMT_RGB555 | 16 | RGB-5-5-5 packed RGB format. High bit undefined.
PIX_FMT_RGB565 | 16 | RGB-5-6-5 packed RGB format
PIX_FMT_RGB24 | 24 | RGB-8-8-8 packed into 24-bit words. B is at byte address 0.
PIX_FMT_RGB32 | 32 | RGB-8-8-8 into 32-bit words. B is at byte address 0. Top 8 bits are undefined.
PIX_FMT_GREY | 8 | Linear grey scale. Greater values are brighter.
PIX_FMT_YVU9 | 9 | YUV, planar, 8 bits/component. Y plane, 1/16-size V plane, 1/16-size U plane. (Note: V before U)
PIX_FMT_YUV420 | 12 | YUV 4:2:0, planar, 8 bits/component. Y plane, 1/4-size U plane, 1/4-size V plane. (Note: U before V)
PIX_FMT_YUYV | 16 | YUV 4:2:2, 8 bits/component. Byte0 = Y0, Byte1 = U01, Byte2 = Y1, Byte3 = V01, etc.
PIX_FMT_UYVY | 16 | Same as YUYV, except U-Y-V-Y byte order
PIX_FMT_HI240 | 8 | Bt848 8-bit color format
PIX_FMT_YUV422P8 | 8 | 8 bits packed as Y:4 bits, U:2 bits, V:2 bits
Flag | Meaning
FMT_FLAG_BYTESPERLINE | The bytesperline field is valid
FMT_FLAG_COMPRESSED | The image is compressed. The depth and bytesperline fields do not apply.
FMT_FLAG_INTERLACED | The image consists of two interlaced fields
[some of the flags bits should be set aside for format-specific use]
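The buffer-size rules above can be sketched as a small helper. The struct layout is transcribed from the tables; the flag bit values are assumptions, and the BYTESPERLINE branch applies to packed formats (planar formats would need per-plane stride arithmetic):

```c
#include <assert.h>

#define FMT_FLAG_BYTESPERLINE 0x0001  /* assumed bit values */
#define FMT_FLAG_COMPRESSED   0x0002

struct video_format {
    int width, height;
    int depth;           /* average bits per pixel */
    int pixelformat;
    int flags;
    int bytesperline;    /* valid only with FMT_FLAG_BYTESPERLINE */
    int sizeimage;
};

/* Minimum buffer size needed to hold one complete image. */
static int min_sizeimage(const struct video_format *f)
{
    if (f->flags & FMT_FLAG_COMPRESSED)
        return f->sizeimage;                 /* driver-supplied, no formula */
    if (f->flags & FMT_FLAG_BYTESPERLINE)
        return f->bytesperline * f->height;  /* stride includes line padding */
    return f->width * f->height * f->depth / 8;
}
```

For example, 640x480 at 16 bits per pixel needs 614400 bytes, but with a padded stride of 2048 bytes per line the same image needs 2048*480 bytes.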
Capture Image Format - VIDIOC_G_FMT, VIDIOC_S_FMT
Use VIDIOC_S_FMT to set the capture image format. VIDIOC_G_FMT retrieves the current capture format. Both ioctls use a struct video_format to pass the format. Devices will not be able to support every combination of width and height. Upon a VIDIOC_S_FMT call, the driver will find the width and height compatible with the hardware which are as close as possible to the requested width and height without going over in either dimension. The driver will modify the structure to indicate the granted dimensions and the resulting size of the image. Applications must check the returned values to make sure they are suitable.
Sizeimage is ignored on VIDIOC_S_FMT. On VIDIOC_G_FMT the driver will fill in the sizeimage field with the minimum required size of the capture buffer. A capture operation such as read() is allowed to fail if the buffer is smaller than sizeimage since a partial image read may be nonsensical or impractical to implement.
An interlaced image will have "comb" or "feathering" artifacts around moving objects. If the FMT_FLAG_INTERLACED flag is not set on VIDIOC_S_FMT, then that indicates the driver is not permitted to capture interlaced images. If the flag is set then the driver may (but is not required to) capture interlaced images if the requested vertical resolution is too high for a single field. FMT_FLAG_INTERLACED is set on return from VIDIOC_G_FMT only if the driver is actually going to capture interlaced images.
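The driver-side size negotiation can be illustrated with a device that supports discrete sizes, Quickcam-style. The size list here is purely an example; a real driver would consult its hardware limits:

```c
#include <assert.h>

struct size { int w, h; };

/* Hypothetical discrete capture sizes for an example device. */
static const struct size supported[] = {
    { 160, 120 }, { 320, 240 }, { 640, 480 },
};

/* Grant the largest supported size not exceeding the request in either
 * dimension; fall back to the smallest size if the request is below the
 * device's minimum (a driver must grant something). */
static struct size negotiate(int req_w, int req_h)
{
    struct size best = supported[0];
    unsigned i;
    for (i = 0; i < sizeof supported / sizeof supported[0]; i++)
        if (supported[i].w <= req_w && supported[i].h <= req_h)
            best = supported[i];
    return best;
}
```

A request for 400x300 would be granted 320x240 on this device, and the application would read the granted values back out of the modified struct video_format.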
Compressed Capture - VIDIOC_G_COMP, VIDIOC_S_COMP
These ioctls set additional capture parameters needed for compressed capture. They both pass the information in a struct video_compression object. The keyframerate field only applies to temporal compression algorithms. The quality factor ranges from 0 to 65535.
Field | Description
int quality | The quality factor
int keyframerate | How often to make a keyframe, in frames
int reserved[4] | Reserved for more parameters
Reading Captured Images - read()
This capture mode is supported if the VID_TYPE_CAPTURE flag is set in the struct video_capability. Each call to read() will fill the buffer with a new frame. The driver may fail the read() if the length parameter is less than the required buffer size reported by the VIDIOC_G_FMT ioctl. This is reasonable since each call to read() starts over with a new frame, and a partial frame may be nonsense (e.g. for a compressed image) or impractical or inefficient to implement in the driver.
Non-blocking read() mode is supported in the usual way. Read() does not work if either streaming capture or hardware frame buffer capture is active.
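A capture wrapper following these rules might look like the sketch below. The function name is illustrative; only read() itself is a real system call, and sizeimage is assumed to have been obtained earlier via VIDIOC_G_FMT:

```c
#include <errno.h>
#include <unistd.h>

/* Read one complete frame into buf.  Returns 0 on success, -1 on error.
 * Rejects short buffers up front, mirroring the driver's right to fail
 * a read() whose length is below the required buffer size. */
static int capture_one_frame(int fd, void *buf, size_t len, size_t sizeimage)
{
    ssize_t n;

    if (len < sizeimage) {
        errno = EINVAL;
        return -1;
    }
    n = read(fd, buf, len);   /* blocks until a full new frame arrives */
    return n < 0 ? -1 : 0;
}
```

Each successful call delivers one whole frame; there is no notion of resuming a partially-read frame on the next call.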
Capturing to a Hardware Frame Buffer - VIDIOC_G_FBUF, VIDIOC_S_FBUF, VIDIOC_G_WIN, VIDIOC_S_WIN, VIDIOC_CAPTURE
This capture mode is supported if the VID_TYPE_FRAMEBUF flag is set in the struct video_capability. [This is very much like the current spec. We might add some get-capture-card-capabilities thing. For example the card I have can only DMA YUV4:2:2 data.]
VIDIOC_S_FBUF sets the frame buffer parameters. VIDIOC_G_FBUF returns the current parameters. The structure used by these ioctls is a struct video_buffer. Ideally the frame buffer would be a YUV 4:2:2 buffer the exact size (or possibly with some line padding) of the capture. It could also be the primary graphics surface, though. You must also use VIDIOC_S_WIN to set up the placement of the video window.
Field | Description
void *base | Physical base address of the frame buffer
struct video_format fmt | Physical layout of the frame buffer
int flags | Additional frame buffer type flags

Flag | Meaning
FBUF_FLAG_PRIMARY | The frame buffer is the primary graphics surface
FBUF_FLAG_OVERLAY | The frame buffer is an overlay surface the same size as the capture
Note that the buffer is often larger than the visible area, and so the fmt.bytesperline field is most likely valid. XFree86 DGA can provide the parameters required to set up this ioctl.
VIDIOC_G_WIN and VIDIOC_S_WIN work just like the existing VIDIOCGWIN and VIDIOCSWIN ioctls, with a few differences.
VIDIOC_CAPTURE is the same as the existing VIDIOCCAPTURE ioctl.
Capturing Continuously to Pre-Allocated Buffers - VIDIOC_STREAMBUFS, VIDIOC_QUERYBUF, VIDIOC_STREAM, VIDIOC_QBUF, VIDIOC_NEXTBUF, VIDIOC_DQBUF
This capture mode is supported if the VID_TYPE_STREAMING flag is set in the struct video_capability.
First, the application must call VIDIOC_STREAMBUFS with the number and type of buffers that it wants. Upon return the driver will fill in how many buffers it will allow to be allocated. This ioctl takes a struct video_streambuffers object, see below. The only flag that's valid on VIDIOC_STREAMBUFS is BUF_FLAG_DEVICEMEM. To allocate the buffers call VIDIOC_QUERYBUF for each buffer to get the details about the buffer, and call mmap() to allocate and map the buffer. VIDIOC_QUERYBUF takes a struct video_buffer object with the index field filled in to indicate which buffer is being queried.
To do the capturing, call VIDIOC_QBUF to enqueue the buffers you want to be filled. This ioctl takes a struct video_buffer with the index field filled in to indicate which buffer to queue. The driver will internally queue the buffers in a capture queue. Then call VIDIOC_STREAM with the value 1 to commence the capturing process. [I want to have a separate ioctl that starts the streaming mode because knowing when the stream began lets the driver compute each frame's place in the stream, adjust the timestamps to be integral multiples of the frame period (erasing any interrupt latency), and compute performance stats, including the number of dropped frames and the actual delivered frames per second. Also, the rest of the driver knows streaming is active and therefore can disallow changing the format or activating another capture mode.]
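One motivation given above for an explicit stream-start ioctl is that the driver can snap each frame's timestamp to an integral multiple of the frame period, hiding interrupt latency. A minimal sketch of that adjustment, with all values in microseconds relative to stream start:

```c
#include <assert.h>

/* Round a raw timestamp to the nearest multiple of the frame period.
 * A driver would apply this to the interrupt-time timestamp of each
 * captured frame. */
static long snap_timestamp(long raw_usec, long frame_period_usec)
{
    long n = (raw_usec + frame_period_usec / 2) / frame_period_usec;
    return n * frame_period_usec;
}
```

For a 25 fps stream (40000 us per frame), a frame whose interrupt fired at 41500 us snaps to 40000 us, frame slot 1; gaps in the slot sequence reveal dropped frames.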
The driver will begin filling the buffers with frame data. Only buffers that have been queued will be filled. Once a buffer is filled, it will not be filled again until it has been explicitly dequeued and requeued by the application. The application can sleep until the next frame is done by calling VIDIOC_NEXTBUF, or select(); the two are equivalent. VIDIOC_NEXTBUF has no parameter. If no buffers are done then VIDIOC_NEXTBUF/select() will block until a buffer is done. If one or more buffers are already done, then VIDIOC_NEXTBUF/select() will return immediately. It is not possible to wait on a specific buffer if there is more than one buffer queued. Call VIDIOC_DQBUF to dequeue the next ready buffer. VIDIOC_DQBUF takes a struct video_buffer object; the driver will fill in all the fields. It is not possible to dequeue a specific buffer; buffers are always dequeued in the order in which they were captured. The bytesused field indicates how much data is in the buffer. After the data has been read out, the buffer should be queued up again to keep the frames flowing continuously. VIDIOC_DQBUF immediately returns an error if there is no buffer ready.
An application can call VIDIOC_QUERYBUF at any time for any buffer, and the driver will return the current status of the buffer. You can dynamically throttle the capture frame rate by only queueing buffers at the rate you want to capture.
Call VIDIOC_STREAM with the value of 0 to turn off streaming. If any buffers are queued for capture when streaming is turned off, they remain in the queue. Use munmap() to free the buffers.
There are certain things you can't do when streaming is active, for example changing the capture format, reading data through the read() call, or munmap()ing buffers.
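The whole sequence can be sketched end to end. The ioctl request codes below are invented stand-ins (the proposal assigns no numbers), and the struct layouts are transcribed from the tables that follow:

```c
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/time.h>

struct video_streambuffers { int count; int flags; int reserved[2]; };
struct video_buffer {
    int index, offset, length, bytesused;
    struct timeval timestamp;
    int flags, reserved[4];
};

/* Hypothetical request codes; none of these exist in any real header. */
#define VIDIOC_STREAMBUFS _IOWR('v', 60, struct video_streambuffers)
#define VIDIOC_QUERYBUF   _IOWR('v', 61, struct video_buffer)
#define VIDIOC_QBUF       _IOW('v', 62, struct video_buffer)
#define VIDIOC_NEXTBUF    _IO('v', 63)
#define VIDIOC_DQBUF      _IOR('v', 64, struct video_buffer)
#define VIDIOC_STREAM     _IOW('v', 65, int)

#define MAXBUFS 32

static int stream_frames(int fd, int nframes)
{
    struct video_streambuffers req = { 4, 0, { 0, 0 } };
    struct video_buffer b;
    void *map[MAXBUFS];
    size_t len[MAXBUFS];
    int i, on = 1, off = 0;

    if (ioctl(fd, VIDIOC_STREAMBUFS, &req) < 0 || req.count > MAXBUFS)
        return -1;

    for (i = 0; i < req.count; i++) {            /* allocate and queue */
        b = (struct video_buffer){ .index = i };
        if (ioctl(fd, VIDIOC_QUERYBUF, &b) < 0)
            return -1;
        map[i] = mmap(0, b.length, PROT_READ, MAP_SHARED, fd, b.offset);
        len[i] = b.length;
        if (map[i] == MAP_FAILED || ioctl(fd, VIDIOC_QBUF, &b) < 0)
            return -1;
    }

    if (ioctl(fd, VIDIOC_STREAM, &on) < 0)       /* start streaming */
        return -1;

    while (nframes-- > 0) {
        if (ioctl(fd, VIDIOC_NEXTBUF) < 0 ||     /* sleep until a frame is done */
            ioctl(fd, VIDIOC_DQBUF, &b) < 0)     /* then dequeue it */
            return -1;
        /* ...process b.bytesused bytes at map[b.index]... */
        if (ioctl(fd, VIDIOC_QBUF, &b) < 0)      /* requeue to keep frames flowing */
            return -1;
    }

    if (ioctl(fd, VIDIOC_STREAM, &off) < 0)      /* stop streaming */
        return -1;
    for (i = 0; i < req.count; i++)
        munmap(map[i], len[i]);                  /* free the buffers */
    return 0;
}
```

Waiting with VIDIOC_NEXTBUF before each VIDIOC_DQBUF matters because VIDIOC_DQBUF errors out immediately when no buffer is ready.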
struct video_streambuffers:

Field | Description
int count | The number of buffers requested or granted
int flags | Flags concerning buffer attributes
int reserved[2] | Reserved

struct video_buffer:

Field | Description
int index | Which buffer number this is or which to query
int offset | Offset parameter to pass to mmap() to allocate this buffer
int length | Length parameter to pass to mmap()
int bytesused | The number of bytes of data in the buffer
struct timeval timestamp | Timestamp for the frame relative to when streaming was started
int flags | Flags concerning the attributes and current status of the buffer
int reserved[4] | Reserved

Flag | Meaning
BUF_FLAG_ALLOCATED | The buffer is currently allocated (and mmap()ed)
BUF_FLAG_DEVICEMEM | The buffer is physically located in the device's on-board memory
BUF_FLAG_QUEUED | The buffer is queued for capture (set by the driver on VIDIOC_QBUF)
BUF_FLAG_DONE | The buffer has data in it (set by the driver when the frame is captured, cleared by the driver on VIDIOC_QBUF)
BUF_FLAG_KEYFRAME | This frame is a keyframe or I frame (always set for uncompressed)
BUF_FLAG_PFRAME | This frame is a predicted frame (only for some compressions)
BUF_FLAG_BFRAME | This frame is a bidirectionally predicted frame (only for some compressions)
Waiting for Frames Using select()
The driver supports the select() call on its file descriptors if the VID_TYPE_SELECT flag is set in the struct video_capability. If neither streaming nor frame buffer capture is active, select() returns when there is data ready to be read with the read() call. If streaming capture is running, select() returns when the next buffer is filled. The caller should be sure there is a buffer in the queue first. If frame buffer capture is running, select() returns when the next frame has been written to the frame buffer.
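A timed wait built on select() might look like this. wait_for_frame() works on any readable descriptor, so for experimenting a pipe can stand in for the video device:

```c
#include <sys/select.h>
#include <unistd.h>   /* only for the pipe() stand-in demo */

/* Wait up to timeout_ms for the descriptor to become readable.
 * Returns 1 when a frame (or data) is ready, 0 on timeout, -1 on error. */
static int wait_for_frame(int fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    return select(fd + 1, &rfds, NULL, NULL, &tv);
}
```

With a video device in streaming mode the same call would return once the next queued buffer is filled, which is why a buffer must be queued before waiting.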
Capture Parms - VIDIOC_G_PARM, VIDIOC_S_PARM
This is to control various parameters related to video capture. These ioctls use struct video_parm objects. The microsecperframe field only applies to read() and streaming capture. Capture to frame buffer always runs at the natural frame rate of the video.
High quality mode is intended for still imaging applications. The idea is to get the best possible image quality that the hardware can deliver. It is not defined how the driver writer may achieve that; it will depend on the hardware and the ingenuity of the driver writer. High quality mode is a different mode from the regular motion video capture modes.
Field | Description
int input | Which video input is selected
int capability | The supported standards and capture mode flags
int capturemode | Capture mode flags
unsigned long microsecperframe | The desired frame rate expressed as microseconds per frame
int reserved[4] | Reserved for future parameters

Flag | Meaning
CAP_MODE_HIGHQUALITY | High quality capture mode for imaging applications
CAP_MODE_VFLIP | The captured image is flipped vertically
CAP_MODE_HFLIP | The captured image is flipped horizontally
CAP_MICROSECPERFRAME | The driver supports programmable frame rates (capability field only)
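The microsecperframe encoding converts to and from frames per second with simple integer arithmetic, sketched here (integer truncation is an acceptable approximation for the whole-number rates involved):

```c
#include <assert.h>

static unsigned long fps_to_microsecperframe(unsigned long fps)
{
    return 1000000UL / fps;
}

static unsigned long microsecperframe_to_fps(unsigned long usec)
{
    return 1000000UL / usec;
}
```

For example, 25 fps encodes as 40000 microseconds per frame, and 30 fps as 33333.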
Video Inputs - VIDIOC_G_INPUT
This ioctl retrieves the properties of a video input into a struct video_input object. Before calling VIDIOC_G_INPUT the caller fills in the number field to indicate which input is being queried.
Field | Description
int number | The input to which these properties apply (set by the caller)
char name[32] | Friendly name of the input, preferably reflecting the label on the input itself
int tuners | Number of tuners on this input [do we need this?]
int type | Type of device, if known
int capability | Capability flags of this input
int reserved[4] | Reserved for future input properties

Flag | Meaning
INPUT_TYPE_TUNER | This input is a TV tuner
INPUT_TYPE_CAMERA | This is a general purpose input
INPUT_CAP_AUDIO | The input has an associated audio channel
Video Standard - VIDIOC_G_STD, VIDIOC_S_STD
These ioctls query and switch the video standard, e.g. NTSC, PAL, etc. The video standard selected applies to all inputs on the device. These ioctls pass a struct video_std object. VIDIOC_G_STD returns the current standard, and which standards are supported on the device. VIDIOC_S_STD selects a new standard.
It is worth stressing that switching the video standard is a big deal. Many capabilities of the capture card depend on the video standard selected, including the image resolution and frame rate. After changing the standard, the capture dimensions, required image buffer size, or other capture parameters may have changed, so the caller should set up the capture again. [The driver may only allow a standard change when there is only one open on the device.]
[Logically, the standard should be on a per-input basis, but since changing the standard is so dangerous, and we want to be able to have a separate control panel that can select inputs independent of a capturing application, we don't want an input change to possibly change the standard. Also mixed standard devices on one system is extremely rare.]
Field | Description
int capability | The supported standards
int standard | The current video standard
int flags | Undefined
int reserved | Reserved

Capability flag | Meaning
CAP_STD_AUTO | The device supports standard auto-detect
CAP_STD_PAL | The device supports PAL mode
CAP_STD_NTSC | The device supports NTSC mode
CAP_STD_SECAM | The device supports SECAM mode

Standard value | Meaning
VIDEO_STD_AUTO | The device adjusts automatically or video standard does not apply
VIDEO_STD_PAL | PAL mode
VIDEO_STD_NTSC | NTSC mode
VIDEO_STD_SECAM | SECAM mode
[there are some more regional flavors to the above standards too]
Video Tone Controls - VIDIOC_G_PICT, VIDIOC_S_PICT
These get or set the video tone control settings for the currently selected input. The settings are passed in a struct video_picture object. There are separate tone control settings for each input, so an application must do a VIDIOC_G_PICT after changing the input. All values are scaled between 0 and 65535. 32768 is always a safe neutral position, unless noted otherwise.
Field | Description
int capability | Flags indicating which controls are supported
int brightness | Brightness or black level
int contrast | Contrast or luma gain
int colour | Color saturation or chroma gain (color only)
int hue | Hue (color only)
int whiteness | Whiteness (greyscale only)
int reserved[4] | Reserved for future controls

Flag | Meaning
PICT_BRIGHTNESS | Brightness is supported
PICT_CONTRAST | Contrast is supported
PICT_COLOUR | Colour is supported (dig the British spelling)
PICT_HUE | Hue is supported
PICT_WHITENESS | Whiteness is supported
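A control panel exposing these settings as percentage sliders would map its 0..100 range onto the 0..65535 control range; a minimal sketch:

```c
#include <assert.h>

/* Map a UI slider position (0..100) to a struct video_picture control
 * value (0..65535).  50 lands at 32767, adjacent to the neutral 32768. */
static int slider_to_control(int pct)
{
    return pct * 65535 / 100;
}
```

The returned value would then be stored into the relevant field (brightness, contrast, etc.) before calling VIDIOC_S_PICT, checking the capability flags first to skip unsupported controls.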
Tuning - VIDIOC_G_TUNER, VIDIOC_S_TUNER
Let's fill this in. Something like the existing VIDIOCGTUNER/VIDIOCSTUNER is probably pretty close. Someone wanted to add fine tuning hint feedback from the tuner. We can probably get rid of the FREQ ioctls if we add a frequency field to struct video_tuner.
Compression/Decompression and Effects
This refers to performing operations on video frames that have been captured previously. This does not refer to compressed capture. It's possible that a device implementing some of these functions may not have capture capability at all.
Compression/Decompression is really just image format conversion, so we can have a general purpose image conversion interface. An ioctl to set up the conversion: input format, output format, other parameters. Use write() to send the input image and read() to read the result of the conversion. Or use mmap()ed buffers.
For special effects I'm thinking of devices that can accelerate fades, wipes, etc. in a video editing application. Again, this is image conversion, but there could be two (or more?) input images.
Still a research topic....