H.261 Encoder

Ultimedia Services Version 2 for AIX: Programmer's Guide and Reference


The UMSH261Encoder compresses video frames into H.261 compressed frames. H.261 is a Recommendation of the International Telecommunication Union (ITU).

For introductory information, see Programming with Video Codec Objects.

Output Format

H.261 video frames can be transmitted by several mechanisms, such as RTP, H.320, and H.324.

Algorithm Overview

The UMSH261Encoder object implements ITU-T Recommendation H.261, which is part of ITU-T Recommendation H.320. The object implements all required portions of the Recommendation, but not the special and optional modes, such as split-screen and the optional still-image mode. In addition, the UMSH261Encoder object can "packetize" the compressed frame for use with the Real-time Transport Protocol (RTP). The interface for the object is essentially a combination of those for the UMSMPEG1VideoEncoder and UMSH263Encoder objects.

This object accepts input image data in either YUV (YCbCr) or RGB format. The encoder accepts frame sizes of 352x288 (CIF) and 176x144 (QCIF); because a CIF frame has four times as many pixels as a QCIF frame, encoding is proportionately slower for CIF input. The compressed bit rate can be set to any positive value up to 2 Mbits/sec.
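As a minimal sketch of the limits just described, the following Python checks a requested frame size and bit rate before they would be handed to the encoder. The names FRAME_SIZES, check_settings, and pixel_ratio are hypothetical helpers, not UMSH261Encoder methods, and the sketch assumes that "2 Mbits/sec" means 2,000,000 bits per second. The pixel ratio shows why CIF encoding is proportionately slower.

```python
# Hypothetical pre-flight check for the frame-size and bit-rate limits.
# FRAME_SIZES, check_settings and pixel_ratio are illustrative names only;
# they are not part of the UMSH261Encoder interface.

FRAME_SIZES = {"CIF": (352, 288), "QCIF": (176, 144)}
SIZE_NAMES = {size: name for name, size in FRAME_SIZES.items()}
MAX_BIT_RATE = 2_000_000  # assumption: "2 Mbits/sec" read as 2,000,000 bits/s


def check_settings(width, height, bit_rate):
    """Return "CIF" or "QCIF", or raise ValueError for unsupported settings."""
    name = SIZE_NAMES.get((width, height))
    if name is None:
        raise ValueError(f"unsupported frame size {width}x{height}")
    if not 0 < bit_rate <= MAX_BIT_RATE:
        raise ValueError(f"bit rate {bit_rate} is not in (0, {MAX_BIT_RATE}]")
    return name


def pixel_ratio(name):
    """How many times more pixels a frame has than a QCIF frame."""
    width, height = FRAME_SIZES[name]
    qcif_width, qcif_height = FRAME_SIZES["QCIF"]
    return (width * height) / (qcif_width * qcif_height)


print(check_settings(352, 288, 384_000), pixel_ratio("CIF"))  # CIF 4.0
```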

The encoded frame rate can be set to approximately 7.5, 10, 15, or 30 fps. Note that the actual encoding speed depends on the frame size, video content, frequency of P-frames, the CPU used, and other factors. For a given requested bit rate, the quality of the decoded frames increases as the frame rate is decreased. Also, in general, the video quality increases if P-frames are used, but the encoding is slower than I-frame-only encoding.
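The frame-rate choices are only approximate. Assuming the encoder derives them from the H.261 picture clock of 30000/1001 Hz (about 29.97 Hz), each choice keeps every first, second, third, or fourth picture. The sketch below (hypothetical helper names) computes those rates and the average bit budget per frame at a fixed bit rate, which is why a lower frame rate gives better per-frame quality.

```python
from fractions import Fraction

# Assumption: the selectable rates come from the H.261 picture clock of
# 30000/1001 Hz (about 29.97 Hz) by keeping every 1st, 2nd, 3rd or 4th picture.
PICTURE_RATE = Fraction(30000, 1001)


def selectable_rates():
    """Map picture-skip factor -> encoded frame rate in frames per second."""
    return {n: PICTURE_RATE / n for n in (1, 2, 3, 4)}


def bits_per_frame(bit_rate, fps):
    """Average bit budget per compressed frame at a given frame rate."""
    return bit_rate / fps


for n, rate in selectable_rates().items():
    fps = float(rate)
    print(f"keep 1 picture in {n}: {fps:.2f} fps, "
          f"{bits_per_frame(128_000, fps):.0f} bits/frame at 128 kbits/s")
```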

There are actually two modes for the frame rate, fixed and variable, and they are implemented by two different compression methods. In the fixed mode, the encoder compresses frames to be played back at the specified rate. In the variable mode, the setting is treated as a maximum, and the encoder varies the playback rate of the frames to try to maximize video quality.

Although H.261 does not distinguish between I-frames and P-frames, the UMSH261Encoder object can compress a frame as an I-frame or as a P-frame. The interface for determining the frame type for each frame is similar to that for the UMSMPEG1VideoEncoder object.

Note that the H.261 specification limits the maximum size of a compressed frame: 64 kbits for QCIF input and 256 kbits for CIF input. The UMSH261Encoder object does not guarantee that every compressed frame stays below this limit. How often a frame exceeds it depends on the video content, whether P-frames are used, and the requested bit rate. Scene changes and motion increase the variation in compressed frame sizes within a clip. If P-frames are used, the I-frames are larger than the average compressed frame and can exceed the limit. Finally, at extreme settings of the bit rate and the frame rate, even the average size of a compressed frame can equal or exceed the limit.
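The last point can be checked with simple arithmetic: the average compressed frame size is the bit rate divided by the frame rate. This hypothetical helper compares that average with the per-picture limit. For example, at 2 Mbits/s and 7.5 fps the average CIF frame is about 266,667 bits, above the 262,144-bit (256 kbit) limit. A False result does not guarantee compliance, since I-frames and scene changes produce frames larger than the average.

```python
# H.261 per-picture limits, taking 1 kbit as 1024 bits (assumption; the
# examples below give the same answers with 1 kbit = 1000 bits).
FRAME_LIMIT_BITS = {"QCIF": 64 * 1024, "CIF": 256 * 1024}


def average_frame_bits(bit_rate, fps):
    """Average compressed frame size implied by a bit rate and frame rate."""
    return bit_rate / fps


def average_exceeds_limit(fmt, bit_rate, fps):
    """True when even an average-sized frame would reach the H.261 limit."""
    return average_frame_bits(bit_rate, fps) >= FRAME_LIMIT_BITS[fmt]


for fmt, fps in (("QCIF", 30), ("CIF", 30), ("CIF", 7.5)):
    bits = average_frame_bits(2_000_000, fps)
    print(f"{fmt} at {fps} fps, 2 Mbits/s: {bits:,.0f} bits/frame, "
          f"over limit: {average_exceeds_limit(fmt, 2_000_000, fps)}")
```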

Rate Control

As mentioned before, there are two modes for the frame rate: fixed and variable. These correspond to two different modes for the rate control.

The fixed-frame-rate mode is similar to that in the UMSMPEG1VideoEncoder, and uses the compress_frame method. In this mode, the frame rate of the compressed video is assumed to be identical to that of the source video, and is set by the set_frame_rate method. The encoder attempts to control the sizes of the compressed frames to get the required average bit rate.

In this mode, if the real capture rate for the input video is greater than the frame rate set by the set_frame_rate method, the application must "skip" frames to get the correct source frame rate for the encoder.
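One way for an application to skip frames in the fixed-frame-rate mode is a credit accumulator, a common decimation technique that this guide does not itself prescribe. The following sketch, using the hypothetical function frames_to_encode, picks which captured frames to pass to compress_frame so that about encode_fps out of every capture_fps frames are kept, spread evenly over time.

```python
def frames_to_encode(capture_fps, encode_fps, n_captured):
    """Return indices of captured frames to pass to compress_frame.

    A credit accumulator keeps about encode_fps of every capture_fps
    frames and spreads the skipped frames evenly; the first captured
    frame is always kept.
    """
    if not 0 < encode_fps <= capture_fps:
        raise ValueError("need 0 < encode_fps <= capture_fps")
    kept = []
    credit = capture_fps - encode_fps  # pre-load so frame 0 is kept
    for index in range(n_captured):
        credit += encode_fps
        if credit >= capture_fps:
            credit -= capture_fps
            kept.append(index)
    return kept


print(frames_to_encode(30, 10, 12))  # [0, 3, 6, 9]
```

Using a running credit rather than a fixed "keep every Nth frame" rule also handles capture rates that are not an integer multiple of the encoder's rate, such as 25 fps capture encoded at 15 fps.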

The variable-frame-rate mode works similarly to that of the UMSH263Encoder object and is implemented in the compress_frame_vf method. In this mode, the frame rate of the source video is assumed to be 30 fps, and the set_frame_rate method sets a target frame rate for the compressed video. The frame_incr parameter of compress_frame_vf lets the encoder and the application cooperate to regulate the frame rate locally. In normal operation, the application initializes frame_incr to 1, and the encoder updates frame_incr each time it compresses a frame. The application then skips ahead by frame_incr source frames to choose the frame for the next call to compress_frame_vf. If the encoder is lagging too far behind, the application can instead force a value of frame_incr in the next call to compress_frame_vf.
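The frame_incr handshake can be simulated in a few lines. In this sketch, compress_vf and stub_encoder are hypothetical stand-ins for compress_frame_vf (nothing is actually compressed), and force is a hypothetical application hook for overriding frame_incr when the encoder lags. The real method's signature and argument-passing conventions are not modeled here.

```python
def vf_schedule(compress_vf, n_frames, force=None):
    """Return (source_index, frame_incr_passed_in) for each encoder call.

    compress_vf(index, frame_incr) is a stand-in for compress_frame_vf and
    returns the encoder's updated frame_incr. force(index, frame_incr) is an
    optional application hook that returns an overriding increment when the
    encoder is lagging, or None to accept the encoder's value.
    """
    calls = []
    index, frame_incr = 0, 1  # frame_incr must be 1 for the first frame
    while index < n_frames:
        calls.append((index, frame_incr))
        frame_incr = compress_vf(index, frame_incr)  # encoder updates frame_incr
        if force is not None:
            forced = force(index, frame_incr)
            if forced is not None:
                frame_incr = forced  # application override for the next call
        if frame_incr < 1:
            raise ValueError("frame_incr must be at least 1")
        index += frame_incr  # skip ahead to the next source frame
    return calls


def stub_encoder(index, frame_incr):
    """Hypothetical encoder that always asks for every 2nd frame (15 of 30 fps)."""
    return 2


print(vf_schedule(stub_encoder, 10))  # [(0, 1), (2, 2), (4, 2), (6, 2), (8, 2)]
```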

RTP Packets

The UMSH261Encoder object can break the compressed frame into smaller pieces suitable for sending as RTP packets. RTP is a networking protocol defined by the Internet Engineering Task Force (IETF), and a separate payload format specification (RFC 2032) defines how H.261 video data is carried in RTP packets. To enable this feature, set the packet size, in bytes, with the set_packet_size method, and then call the get_packet_info method repeatedly after the frame is compressed. Each call to get_packet_info returns the RTP header and packet boundary information for one packet, so that each packet's data can be extracted from the compressed frame and sent individually.
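The extraction step might look like the following sketch. PacketInfo and build_packets are hypothetical; the actual layout of the information returned by get_packet_info is not described in this section, so the list of PacketInfo records simply stands in for the results of the repeated calls. Boundaries are allowed to overlap by one byte because the RTP payload format for H.261 lets a packet start or end in the middle of a byte (the SBIT and EBIT header fields say how many bits to ignore).

```python
from dataclasses import dataclass


@dataclass
class PacketInfo:
    """Hypothetical record of what one get_packet_info call reports."""
    header: bytes  # header bytes supplied by the encoder for this packet
    offset: int    # first byte of this packet's data in the compressed frame
    length: int    # number of data bytes in this packet


def build_packets(frame, infos):
    """Prepend each header to its slice of the compressed frame."""
    packets = []
    for info in infos:
        end = info.offset + info.length
        if info.offset < 0 or info.length <= 0 or end > len(frame):
            raise ValueError(f"packet {info.offset}+{info.length} lies outside the frame")
        packets.append(info.header + frame[info.offset:end])
    return packets


# Byte 3 ("d") appears in both packets, as when a boundary falls mid-byte.
print(build_packets(b"abcdef", [PacketInfo(b"H1", 0, 4), PacketInfo(b"H2", 3, 3)]))
# [b'H1abcd', b'H2def']
```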

Encoding H.261 Video

To encode H.261 video, follow these steps:

Set up the source for the video.
Create the UMSH261Encoder object.
Change any default settings for the image size, input image format, quantization factor, or maximum motion displacement with the corresponding methods.
Call the encoder's get_max_buffer_size method to determine the required buffer size for the compressed frame.
Allocate memory for the input and output buffers. Initialize the SOM sequence data structures associated with the input and output buffers.
Capture the image frame and write the frame to the image buffer.
Call compress_frame or compress_frame_vf to compress the frame. For the first frame in a sequence, frame_type must be FirstFrame; when using compress_frame_vf, frame_incr must also be set to 1.
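For the buffer-allocation step, the size of an uncompressed input buffer follows directly from the frame size and input format, while the output size should come from get_max_buffer_size. The sketch below assumes 8-bit samples; the "RGB24" entry (packed 8-bit R, G, B) is an assumption about the RGB layout, and input_buffer_bytes is a hypothetical helper.

```python
# Bytes per pixel for the uncompressed input buffer.
# YUV422 with 8-bit samples stores one Y per pixel plus one Cb and one Cr
# per horizontal pixel pair, i.e. 2 bytes per pixel on average.
# "RGB24" (packed 8-bit R, G, B) is an assumption; check which RGB formats
# your UMS configuration actually supports.
BYTES_PER_PIXEL = {"YUV422": 2, "RGB24": 3}


def input_buffer_bytes(width, height, fmt):
    """Size in bytes of one uncompressed input frame."""
    if fmt == "YUV422" and width % 2:
        raise ValueError("4:2:2 chroma is shared by pixel pairs, so width must be even")
    return width * height * BYTES_PER_PIXEL[fmt]


for fmt in BYTES_PER_PIXEL:
    print(fmt, input_buffer_bytes(176, 144, fmt), input_buffer_bytes(352, 288, fmt))
# YUV422 50688 202752
# RGB24 76032 304128
```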

Tips for Using the H.261 Encoder

The following tips provide important information for using the H.261 encoder:

The application can control whether a particular frame is compressed as an I-frame or a P-frame, or just specify an interval between I-frames, which the encoder can use to set the frame type. I-frames require more bits than P-frames, and should be inserted at a rate of one I-frame every 1 to 4 seconds.
Compression is faster if only I-frames are used. Also, if all other parameters are identical, the object encodes QCIF video about 4 times faster than CIF video. If P-frames are used, the speed of encoding increases as the maximum motion displacement is decreased.
The image quality improves significantly if P-frames are used. The absolute minimum size for a QCIF frame, compressed as an I-frame, is 6545 bits; for a CIF input frame, it is 25850 bits. At a fixed frame rate of 15 fps, these sizes correspond to bit rates of about 98 kbits/s and 390 kbits/s. With these minimum frame sizes, I-frames appear blocky, with no image detail.
If P-frames are used, 15 fps QCIF video can look very good at a bit rate of 128 kbits/s. As the bit rate decreases, any H.261 encoder has to trade off spatial detail against smoothness of motion. At still lower bit rates, both the spatial detail and the motion deteriorate. The H.261 Recommendation suggests a minimum bit rate of 40 kbits/s.
For given input video content, and video quality, the bit rate is approximately proportional to the frame rate and the frame size.
If P-frames are used, and all other parameters are fixed, video quality improves as the maximum motion displacement increases.
For given parameter settings, the minimum and maximum attainable bit rates depend on the video content.
If image quality within a frame is important, then the maximum quantization value can be lowered (values in the range of 4-8 give very good quality).
Do not mix the compress_frame and compress_frame_vf methods.
The UMSH261Encoder object's default settings are:
frame rate	15 fps
image format	YUV422
image size	176x144 (QCIF)
maximum motion displacement	15
maximum quantization value	31
RTP packet size	0 (OFF)
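Several of the tips above reduce to arithmetic. This sketch (hypothetical helper names) converts the minimum I-frame sizes into the lowest bit rate for I-frame-only encoding, giving 98,175 and 387,750 bits/s at 15 fps, which the tips round to about 98 and 390 kbits/s. It also converts the recommended spacing of one I-frame every 1 to 4 seconds into an interval in frames.

```python
# Minimum I-frame sizes quoted in the tips, in bits.
MIN_I_FRAME_BITS = {"QCIF": 6545, "CIF": 25850}


def min_intra_only_bit_rate(fmt, fps):
    """Lowest bit rate (bits/s) at which every frame can be an I-frame."""
    return MIN_I_FRAME_BITS[fmt] * fps


def i_frame_interval(fps, seconds):
    """Frames between I-frames for one I-frame every `seconds` seconds."""
    return max(1, round(fps * seconds))


print(min_intra_only_bit_rate("QCIF", 15), min_intra_only_bit_rate("CIF", 15))  # 98175 387750
print(i_frame_interval(15, 2))  # 30
```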
