Overview of MPEG, H.261, and H.263 Codecs

Ultimedia Services Version 2 for AIX: Programmer's Guide and Reference

MPEG, H.261, and H.263 are three closely related codecs for motion video. All three are international, non-proprietary standards. MPEG is an International Standard of the International Organization for Standardization (ISO), while H.261 and H.263 are Recommendations of the International Telecommunication Union (ITU). MPEG is intended for playback of movies from digital storage media, while the other two codecs are intended for teleconferencing.

MPEG stands for Moving Picture Experts Group, the name of the ISO committee that developed the standard. There are two standards for MPEG video. The one referred to here is commonly called MPEG-1. A more recent standard, usually called MPEG-2, allows for wider ranges of visual quality, bit rate, and application support. Ultimedia Services supports only MPEG-1.

All three codecs are based on the discrete cosine transform (DCT), predicted frames, and motion estimation. Each compresses frames with the DCT in a way similar to that used in JPEG, and uses predicted frames and motion estimation to take advantage of the correlation between frames. This allows a codec to maintain a given level of visual quality at lower bit rates than would be possible with motion JPEG.
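As a rough illustration of the DCT step, the following Python sketch transforms one 8x8 block and quantizes the coefficients. The block size of 8, the orthonormal scaling, and the uniform step size of 16 are assumptions chosen for the example; they are not the quantization rules of any of the three standards.

```python
import math

N = 8  # block size assumed for this sketch, as in JPEG-style DCT coding


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, step=16):
    """Uniform quantization; the step size is an arbitrary illustration."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


# A perfectly flat block: all of its energy lands in the DC coefficient.
flat_block = [[100] * N for _ in range(N)]
levels = quantize(dct_2d(flat_block))
nonzero_levels = sum(1 for row in levels for v in row if v != 0)
```

For the flat block, only the DC level is nonzero, which is why smooth image areas cost very few bits after the transform.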

A predicted frame is essentially a difference frame: the difference between the current input frame and the previously encoded and reconstructed frame. This difference should be small over most of the frame area, except around the edges of moving objects and where new objects are introduced into the frame. A predicted frame is often called a P-frame; another name is inter-frame. A frame that is encoded independently of other frames is called an intra-frame or I-frame.
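The idea of a difference frame can be sketched in a few lines of Python. The tiny 6x4 frames and the function names below are hypothetical, and the sketch is lossless, whereas a real encoder would transform and quantize the difference before sending it.

```python
def difference_frame(current, reference):
    """Pixel-wise difference between the current frame and the reference."""
    return [[c - r for c, r in zip(c_row, r_row)]
            for c_row, r_row in zip(current, reference)]


def reconstruct(reference, diff):
    """Decoder side: add the difference back onto the reference frame."""
    return [[r + d for r, d in zip(r_row, d_row)]
            for r_row, d_row in zip(reference, diff)]


reference = [
    [10, 10, 10, 10, 10, 10],
    [10, 90, 90, 10, 10, 10],
    [10, 90, 90, 10, 10, 10],
    [10, 10, 10, 10, 10, 10],
]
# The bright 2x2 object has moved one pixel to the right.
current = [
    [10, 10, 10, 10, 10, 10],
    [10, 10, 90, 90, 10, 10],
    [10, 10, 90, 90, 10, 10],
    [10, 10, 10, 10, 10, 10],
]

diff = difference_frame(current, reference)
nonzero = sum(1 for row in diff for v in row if v != 0)
```

Here the difference is nonzero only at the leading and trailing edges of the moving object; the rest of the frame is zero and compresses to almost nothing.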

Motion estimation estimates the translational motion of objects in the current frame relative to the previous frame. This allows the encoder to reduce the energy in the frame difference by moving pixels around to simulate object motion. The cost is that the encoder must insert a small amount of motion information in the compressed data so that the decoder can reproduce the pixel shuffling exactly. The encoder keeps the quantity of motion parameters low by only estimating the motion for 8x8 or 16x16 blocks of pixels in the input frames.
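A minimal sketch of block-based motion estimation follows, assuming an exhaustive search that minimizes the sum of absolute differences (SAD). The search strategy, the search range of 3 pixels, and the synthetic test pattern are all assumptions for illustration; the standards specify only how the decoder uses motion vectors, not how an encoder finds them.

```python
def sad(cur, ref, bx, by, rx, ry, size):
    """Sum of absolute differences between a current block and a reference block."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
    return total


def best_motion_vector(cur, ref, bx, by, size=8, search=3):
    """Exhaustive search; returns (cost, dx, dy) for the best-matching reference block."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, bx, by, rx, ry, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


def pattern(x, y):
    """Synthetic texture used only to build test frames."""
    return (3 * x * x + 5 * y * y + 7 * x * y) % 251


SIZE = 24
previous = [[pattern(x, y) for x in range(SIZE)] for y in range(SIZE)]
# Current frame: the same texture moved 2 pixels right and 1 pixel down.
current = [[pattern(x - 2, y - 1) for x in range(SIZE)] for y in range(SIZE)]

cost, dx, dy = best_motion_vector(current, previous, bx=8, by=8)
```

The vector found points from the current block back to where its content sat in the previous frame, so a shift of 2 right and 1 down yields (dx, dy) = (-2, -1) with zero residual. Only that one vector per block needs to be sent, which is what keeps the motion information small.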

Both MPEG and H.263 enhance prediction and motion estimation by using bidirectionally predicted (B) frames. One can think of a B-frame as the average of two P-frames that use previous and future input frames as predictors for the current input frame. Generally speaking, a B-frame can be about one third the size of a P-frame. The use of B-frames implies out-of-order encoding, because the encoder can only encode a B-frame after encoding the requisite previous and future frames. Note that a B-frame never predicts another B-frame; only I- and P-frames are used to predict B-frames.
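The out-of-order encoding implied by B-frames can be sketched as a reordering from display order to coding order. The function name and the handling of trailing B-frames are assumptions of this sketch, not behavior taken from any particular encoder.

```python
def coding_order(display_types):
    """Reorder frames so each B-frame follows both of its anchor frames.

    display_types is a sequence of 'I', 'P', or 'B' in display order.
    Returns a list of (display_index, frame_type) in coding order.
    """
    pending_b = []
    order = []
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            # Hold the B-frame until its future anchor has been coded.
            pending_b.append((index, frame_type))
        else:
            order.append((index, frame_type))
            order.extend(pending_b)
            pending_b = []
    # Assumption: trailing B-frames without a future anchor are simply
    # appended; a real encoder would avoid ending a sequence this way.
    order.extend(pending_b)
    return order


order = coding_order("IBBPBBP")
```

For the display sequence I B B P B B P, the sketch codes frame 3 (P) before frames 1 and 2 (B), and frame 6 (P) before frames 4 and 5 (B).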

One feature that H.261 and H.263 share that MPEG does not have is support for variable frame rate within a video sequence. This is important in teleconferencing for two reasons. First, the bit rate in a teleconference can be very low, so an encoder must be able to lower the frame rate to maintain reasonable visual quality. Second, the encoder must be able to adjust to sudden changes in video content in real time without warning. For example, at a scene change, the first compressed frame tends to be large; with variable-frame-rate encoding, the encoder can encode the frame and then skip a few input frames before encoding the next frame. The skipped frames are unlikely to be noticed, because the human eye does not see motion in the video for a while after the scene change.
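The frame-skipping behavior can be illustrated with a toy rate model. Everything here is an assumption made for the example: the channel drains a fixed number of bits per input-frame interval, the encoder skips an input frame whenever more than one interval's worth of bits is still queued, and the frame sizes are invented. Real H.261 and H.263 encoders use their own rate-control schemes.

```python
def select_frames(frame_bits, bit_rate, input_fps):
    """Return indices of input frames the toy encoder chooses to encode.

    frame_bits[i] is the hypothetical compressed size of input frame i.
    """
    budget = bit_rate / input_fps  # bits the channel carries per frame interval
    backlog = 0.0                  # bits queued but not yet transmitted
    encoded = []
    for index, bits in enumerate(frame_bits):
        if backlog > budget:
            # Channel is still busy with earlier data: skip this input frame.
            backlog -= budget
            continue
        encoded.append(index)
        backlog = max(0.0, backlog + bits - budget)
    return encoded


# Frame 0 is a scene change and compresses to a much larger size.
sizes = [30000, 4000, 4000, 4000, 4000, 4000, 4000, 4000]
kept = select_frames(sizes, bit_rate=64000, input_fps=10)
```

With these numbers the large first frame takes several frame intervals to transmit, so the sketch skips input frames 1 to 3 and resumes encoding at frame 4.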

Historically, MPEG-1 was derived from H.261 and JPEG. H.263 is based on H.261 and MPEG-1, and adds some enhancements of its own. On the other hand, MPEG has some enhancements present in neither H.261 nor H.263. All other things being equal, at bit rates of 1 Mbps or above, MPEG-1 video will look better than the same content encoded with either H.261 or H.263. At the common rate of about 1.2 Mbps, with a frame resolution of 352x240 and a frame rate of 30 fps, MPEG-1 visual quality is comparable to or better than that from an analog VCR. In low-bit-rate teleconferencing (128 kbps and below), H.261 and H.263 video might look better than MPEG video because the first two codecs can vary the frame rate within a video sequence. H.263 can run at lower bit rates than H.261. It can also run at higher bit rates and support larger frames (up to four times larger in each dimension). However, if an H.263 encoder does not use the optional modes of H.263, its output should be comparable to that from a similar H.261 encoder.
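To put the 1.2 Mbps figure in perspective, the following arithmetic estimates the compression ratio it implies. The sketch assumes that the uncompressed source is 8-bit video with 4:2:0 chroma subsampling (12 bits per pixel on average) and that 1.2 Mbps means 1,200,000 bits per second; neither figure is stated in this guide.

```python
width, height, fps = 352, 240, 30
bit_rate = 1_200_000      # assumed meaning of "about 1.2 Mbps"
raw_bits_per_pixel = 12   # assumption: 8-bit samples, 4:2:0 subsampling

raw_rate = width * height * fps * raw_bits_per_pixel  # uncompressed bits/s
ratio = raw_rate / bit_rate                           # compression ratio
avg_bits_per_frame = bit_rate / fps
coded_bits_per_pixel = avg_bits_per_frame / (width * height)

print(f"raw rate: {raw_rate} bits/s")
print(f"compression ratio: {ratio:.1f}:1")
print(f"average coded frame: {avg_bits_per_frame:.0f} bits")
print(f"coded bits per pixel: {coded_bits_per_pixel:.2f}")
```

Under these assumptions the raw rate is about 30.4 Mbps, so the codec is compressing by roughly 25:1 and spending about 40,000 bits, or less than half a bit per pixel, on an average frame.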

MPEG supports a whole range of frame sizes and frame rates; the most common are 352x240 at 30 fps and 352x288 at 25 fps. H.261 supports 176x144 and 352x288 frames at target frame rates of 7.5 to 30 fps; H.263 adds a smaller size (128x96) and larger sizes (704x576 and 1408x1152). In Ultimedia Services, the H.263 encoder only supports the 128x96, 176x144, and 352x288 sizes.
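Because motion is estimated per block, the frame size determines how many blocks the encoder must process for each frame. The sketch below counts 16x16 blocks for the frame sizes listed above; the choice of 16x16 (one of the two block sizes mentioned earlier) and the function name are assumptions for the example.

```python
FRAME_SIZES = [
    (128, 96),
    (176, 144),
    (352, 240),
    (352, 288),
    (704, 576),
    (1408, 1152),
]


def block_count(width, height, size=16):
    """Number of size x size blocks needed to cover the frame."""
    cols = -(-width // size)   # ceiling division
    rows = -(-height // size)
    return cols * rows


counts = {(w, h): block_count(w, h) for (w, h) in FRAME_SIZES}
for (w, h), n in counts.items():
    print(f"{w}x{h}: {n} blocks of 16x16")
```

The count grows by a factor of four with each doubling of both dimensions, which gives a rough feel for how encoder workload scales with frame size.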

When pushed to the limit, these codecs can produce visual artifacts similar to those of JPEG. These include blockiness and artifacts near the edges of objects. Such artifacts can be common in low-bit-rate teleconferencing and at unexpected scene changes. H.261 will be the most sensitive to such problems, followed by H.263, and then MPEG. The artifacts should be rare for normal MPEG formats and bit rates.

All of these codecs can outperform the proprietary codecs mentioned in the previous section in visual quality and compression ratio. They should also, in general, outperform motion JPEG. It is now feasible to implement these standard codecs in software, so they should supersede non-standard codecs in general applications.

For introductory information, see Comparison of Video Codecs.