Single-chip MPEG2 MP@HL Decoder with Multi-decode and Seamless Display Features

Hidenaga Takahashi  Yukio Otobe  Kiyoshi Kohiyama
(Manuscript received December 1, 1999)

This paper reports on the development of an MPEG video decoder LSI. The new LSI can process MPEG MP@HL, which contains six times as much information as the normal MPEG MP@ML used in DVDs and CS digital broadcasting, and can receive HDTV broadcasts. Also, the chip has a multi-decode and seamless display capability for use in the digital broadcasts of the future. This paper gives an outline of the new chip and its architecture and describes the realization of the multi-decode and seamless display functions.

1. Introduction

The digitization of TV broadcasts using the Moving Picture Experts Group (MPEG) video standard is proceeding swiftly on a global scale. Japan will digitize TV broadcasts, starting with broadcast satellite (BS) transmissions and moving on to terrestrial transmissions after 2000. Digital broadcasts will feature high-definition TV (HDTV) compressed with MPEG main profile at high level (MPEG MP@HL), multiple channels, high-volume data transmission using data broadcast technology, and an electronic program guide (EPG), and will make it possible to provide high-value digital services that are impossible with existing analog TV broadcasts. Also, digitization will enable conventional TVs to evolve into home information terminals. With an eye on the huge digital TV market of the future, TV and PC makers are hurrying to develop digital receivers.1,2)

With these factors in mind, we developed an MPEG video decoder LSI with HDTV receiving capability. This will be a key device for digital receivers. The key point was to develop a compact and high-quality MPEG MP@HL core for displaying HDTV images that need six times the processing power of MP@ML or normal standard definition TV (SDTV) images. By using this core in time division mode, we can simultaneously decode and display four SDTV programs. Also, the architecture we adopted is simple and has highly independent core circuits, making enhancements of functions easy in the future.

2. LSI overview

2.1 LSI specifications

Table 1 gives a functional outline of the new chip, and Figure 1 shows a photograph of it.

<table>
<thead>
<tr>
<th>Item</th>
<th>Specification</th>
</tr>
</thead>
<tbody>
<tr>
<td>Technology</td>
<td>0.25-micron, 4 metal layers</td>
</tr>
<tr>
<td>Circuit size</td>
<td>1 million gates/10 mm²</td>
</tr>
<tr>
<td>Power dissipation</td>
<td>3.2 W at 2.5 V</td>
</tr>
<tr>
<td>Clock rate</td>
<td>125 MHz</td>
</tr>
<tr>
<td>Package</td>
<td>HQFP-304</td>
</tr>
<tr>
<td>Memory</td>
<td>64-Mbit SDRAM x 2</td>
</tr>
</tbody>
</table>

All the functions necessary for digital TV video processing, for example, an MPEG video (MP@HL) decoder core, TS de-multiplexer (transport stream de-multiplexer), and an image formatter, are integrated onto a single chip using 0.25-micron, 4-layer metal technology. The chip contains 1 million gates. A 125 MHz clock drives the MPEG video decoder and SDRAM memory controller. The formatter and On-screen Display (OSD) use a 27 to 74 MHz clock. The clock rate can be selected depending on the display.

A 2.5 V power supply is used to drive internal circuitry, while a 3.3 V power supply is used to drive external I/Os. The power dissipation is 3.2 W when receiving HDTV broadcasts. The chip supports sleep mode (low-power mode), which can be used to receive mail at night. In sleep mode, only the demultiplexer and CPU interface are active and the power dissipation is only 0.2 W.

2.2 Functional blocks

This LSI consists of seven main functional blocks.

1) TS demultiplexer block
This block is used to select video packetized elementary stream (PES), audio PES, data broadcast, EPG information, etc., from the original TS stream.

2) Video decoder block
This block is used to decode the MP@HL video stream. It can also be used to decode up to four MP@ML video streams.

3) Format converter block
This block is used to reduce or enlarge images for display. It can be used to display up to four images simultaneously. The four images can be reduced or enlarged independently and can be overlapped. Thus, image layout can be customized.

4) OSD block
This block is used to realize graphics display. There are 2 Mbytes of graphics data space in the external SDRAM, and the following two types of display modes are supported:
- 1920 x 1080 maximum resolution color look-up table (CLUT) 8 bits/pixel mode,
- 960 x 1080 maximum resolution RGB 16 bits/pixel mode.

The size of the graphic plane can be customized, and a maximum of two layers are supported.

5) Memory controller (MC) block
This controls external SDRAM.

6) Audio IF block
This block extracts the elementary stream (ES) and presentation time stamp (PTS) from PES audio streams.

7) CPU IF block
This block transmits and receives data from an external CPU.
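
As a rough check of the OSD figures in item 4), the following sketch computes the memory needed for one full-size graphics plane in each display mode and compares it with the 2-Mbyte graphics space. It assumes that "2 Mbytes" means 2 × 2^20 bytes and that a plane is stored as a packed bitmap with no padding; the name `plane_bytes` is illustrative and not part of the chip's interface.

```python
# Assumption: "2 Mbytes" is 2 * 2**20 bytes; planes are packed with no padding.
OSD_SPACE_BYTES = 2 * 2**20

def plane_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes needed to store one packed graphics plane."""
    return width * height * bits_per_pixel // 8

clut8 = plane_bytes(1920, 1080, 8)    # CLUT 8 bits/pixel mode
rgb16 = plane_bytes(960, 1080, 16)    # RGB 16 bits/pixel mode

for name, size in (("CLUT8", clut8), ("RGB16", rgb16)):
    print(name, size, "bytes, fits:", size <= OSD_SPACE_BYTES)
```

Under these assumptions both modes need 2,073,600 bytes for one full-size plane, which is consistent with the horizontal resolution being halved when the pixel depth is doubled.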

Table 2 shows the main functions for each block, and Figure 2 shows an example of images displayed on the monitor.

3. Decoder architecture

We decided to adopt a simple architecture for the digital TV LSI. In addition to the video decoder function, this LSI has a format converter, OSD, and other peripheral functions. Therefore, functionally the chip is fairly complicated. However, by making the interface between functions simple and regular we simplified the architecture as much as possible. As a result, we were able to develop the chip in a short time. This architecture will make it easy to add or enhance functions in the future and also makes it possible to reuse core circuits. The chip architecture is described in detail below.

3.1 MPEG (MP@HL) video decoder core

An MP@HL compatible decoder core needs six times the capability of conventional MP@ML decoders. We considered using an existing MP@ML decoder developed for DVD MPEG decoders in a parallel architecture, but decided instead to develop a compact and efficient MP@HL decoder core. This was because we wanted to reuse our video decoder core and make our circuit compact and easy to use. Also, we wanted to avoid the difficulties in the distribution of streams and memory access which would occur in a parallel architecture.

The newly developed MP@HL video decoder realizes variable length decoding (VLD), inverse quantization (IQ), and inverse DCT (IDCT) processing using a single 125 MHz pipeline. This made it easy to realize multiple decoding by time division processing.

3.2 Memory access using fixed memory period

A key point in the development of an MPEG decoder architecture is how to access external memory. Figure 3 shows the functional blocks of the memory controller and the memory access requirements. The blocks in the LSI transfer data with each other through external SDRAM, and the memory controller is a key unit for this purpose.

The memory access requirements are as follows: 1) writing data extracted from the TS demultiplexer, 2) reading the video bit stream, 3) reading the MPEG reference picture, 4) writing the decoded MPEG picture, 5) reading the display image. Items 6) to 9) in Figure 3 are other memory accesses.

Memory accesses in conventional MPEG decoders are realized using arbitration. Each functional block makes a request to use the memory, and the memory controller responds to the requests if possible. Arbitration has the following three problems. First, it is difficult to implement efficient priority algorithms. Second, inefficient memory accesses occur. Third, since MPEG realizes video compression using variable length codes and memory access of reference pictures depends on the particular stream, it is difficult to guarantee memory accesses for worst-case streams.

We addressed the above problems by using fixed time periods and time slots for each memory access requirement. These time slots and periods are described below.

3.2.1 Periods

Some of the memory accesses we described above are regular accesses and some are irregular ones. Also, the amount of time needed and the amount of data transferred varies greatly. We call a set of these memory accesses with a fixed time duration a “period.”

MPEG video decoding is done in 16 × 16-pixel units called “macroblocks.” We designed a period to correspond to the time it takes to process one macroblock. The time it takes to process a macroblock depends on the number of pictures that need to be referenced, but the duration of a period is the duration in the worst case; that is, the duration when pictures in both the forward and backward direction are referenced. The hardware MPEG decoder pipeline is synchronized to this period.

There are different types of periods. The main types of periods are the decode period, video stream period, audio stream period, and refresh period. Figure 4 shows the operations performed in these periods. The basic decode period contains all the memory accesses needed to decode one macroblock, for example, 3) reading the MPEG reference picture and 4) writing the decoded MPEG picture. The video stream period includes 2) read access to the video bit stream memory. Memory accesses which are needed on a regular basis such as 1) writing data extracted from the TS demultiplexer and 5) reading of the display image are performed in both periods.

3.2.2 MPEG decoding using periods

Figure 5 shows how MPEG decoding proceeds using period memory accesses. Normally, the decode period is generated and decoding proceeds one macroblock at a time. At the end of a period the LSI checks to see how many bits are left in the stream buffer. If the number of remaining bits is above a certain threshold, the next decode period is started. If the number is below the threshold, the video stream period is started and video stream data is sent to the stream buffer. MPEG decoding stops during the video stream period. Decoding proceeds by alternating between the decode period and video stream period. Other periods such as the refresh period and audio stream period are started as needed. A total of 8160 decode periods are needed to process one HDTV (MP@HL) image.
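
The period selection described above can be sketched as a small scheduling loop. This is a minimal model and not the LSI's control logic: the threshold value, the coded size of each macroblock, and the names `next_period` and `decode_picture` are hypothetical, and the refresh and audio stream periods are omitted. Only the 16 K bits transferred per video stream period is taken from Section 3.2.3.

```python
from enum import Enum

class Period(Enum):
    DECODE = "decode"              # decodes one macroblock
    VIDEO_STREAM = "video_stream"  # refills the stream buffer; decoding stops

STREAM_REFILL_BITS = 16 * 1024     # bits read per video stream period (Section 3.2.3)
THRESHOLD_BITS = 4096              # hypothetical refill threshold

def next_period(buffer_bits: int) -> Period:
    """Choose the next period from the number of bits left in the stream buffer."""
    return Period.DECODE if buffer_bits > THRESHOLD_BITS else Period.VIDEO_STREAM

def decode_picture(macroblock_bits: list[int]) -> list[Period]:
    """Return the sequence of periods used to decode one picture.

    macroblock_bits[i] is the hypothetical coded size of macroblock i.
    Each size is assumed to be no larger than THRESHOLD_BITS, so the
    modeled buffer never underflows.
    """
    buffer_bits = 0
    done = 0
    schedule = []
    while done < len(macroblock_bits):
        period = next_period(buffer_bits)
        if period is Period.DECODE:
            buffer_bits -= macroblock_bits[done]
            done += 1
        else:
            buffer_bits += STREAM_REFILL_BITS
        schedule.append(period)
    return schedule

schedule = decode_picture([1000] * 20)
print([p.value for p in schedule])
```

Because the check is made only at period boundaries, the decode pipeline never has to stall in the middle of a macroblock in this model.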

3.2.3 Guaranteed MP@HL decoding for worst-case MPEG streams

Using this architecture, it is possible to guarantee MPEG decoding from the initial design phase, even for worst-case streams. We define "worst-case" here to mean an MPEG stream in which every macroblock references pictures in both the forward and backward directions, all the video data in the VBV buffer is used, and the rate of data transfer from the video decoder to external memory is maximized. MP@HL decoding for such worst-case streams is guaranteed as follows.

An MPEG MP@HL image consists of $1920 \times 1080$ pixels, which corresponds to $120 \times 68 = 8160$ macroblocks (the 1080 lines are rounded up to 68 macroblock rows). Therefore, 8160 decode periods are necessary.

The maximum number of video data bits needed for one picture is $9{,}781{,}248$ for MP@HL. Since 16 K bits ($16{,}384$ bits) are read out in one video stream period, 597 periods ($9{,}781{,}248 / 16{,}384 = 597$) are needed.

Therefore, 8757 periods ($597 + 8160 = 8757$) are needed to guarantee MPEG decoding.

When periods such as refresh periods are included, the present LSI processes 9000 periods per frame. This guarantees MPEG decoding even for worst-case streams.
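
The worst-case budget above can be reproduced with a few lines of arithmetic. The sketch below uses only the figures given in this section, together with the fact that 1080 lines must be rounded up to a whole number of 16-line macroblock rows; the variable names are illustrative.

```python
import math

MB_SIZE = 16                        # macroblock edge in pixels
WIDTH, HEIGHT = 1920, 1080          # MP@HL picture size
MAX_PICTURE_BITS = 9_781_248        # maximum video data per picture (MP@HL)
BITS_PER_STREAM_PERIOD = 16 * 1024  # 16 K bits per video stream period
PERIODS_PER_FRAME = 9000            # periods the LSI processes per frame

decode_periods = math.ceil(WIDTH / MB_SIZE) * math.ceil(HEIGHT / MB_SIZE)
stream_periods = math.ceil(MAX_PICTURE_BITS / BITS_PER_STREAM_PERIOD)
required = decode_periods + stream_periods
spare = PERIODS_PER_FRAME - required

print(decode_periods, stream_periods, required, spare)  # 8160 597 8757 243
```

The 243 spare periods are what remains in each frame for refresh periods, audio stream periods, and other accesses.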

Also, this method of accessing memory simplifies the control circuits and reduces the memory access overhead.

3.3 Asynchronous clocks for display and decode units

We adopted an architecture in which display control circuits such as the formatter/OSD and MPEG decode circuits are driven by asynchronous clocks.

System users require all kinds of resolutions and picture qualities. To meet this need and ensure that core circuits can be easily removed and adapted to users' needs, it is necessary for the display control circuits to be independent from the decode function. There are two clock modes: internal mode and external mode. The internal clock mode supports 27 MHz and 74 MHz clocks. These clocks can be selected and are generated by internal PLL circuits. An external clock can be used when external clock mode is selected.

The decoder and display units are loosely synchronized frame-by-frame. The decoder starts MPEG decoding after receiving frame pulses from the display unit. After processing a picture, decoding stops until the next frame pulse is received. The picture buffers accessed by the display and decoding unit are different. This makes it easy to realize frame-rate conversion and error concealment.

4. LSI features (functions for new services)

In this section, we explain the multi-decode and seamless decode functions. Although these two functions are not mandatory for conventional MPEG decoders, we implemented them since they are needed to adapt to programming changes made at the broadcasting stations and to realize new digital broadcast services. These functions are described below.

4.1 Multi-decode function

Digital broadcasting makes it possible to multiplex digital data and broadcast multiple programs within a single TV channel. For example, in BS digital broadcasting, it is not necessary to allocate the entire 23 Mbps bit rate to a single MP@HL video program because it can be shared among multiple MP@ML video streams. These video streams can be interrelated; for example, they can be views of the same scene taken from different angles. This kind of service is already being studied in Japan. Our MPEG decoder has enough processing power to decode six MP@ML streams. To support this new service, we designed the video decoder so that a maximum of four MP@ML streams can be decoded and displayed at the same time.

4.1.1 Using the MPEG MP@HL core in time-division mode

Figure 6 shows how our decoder works in multi-decode mode. The LSI decodes macroblock-by-macroblock, using the decode periods explained earlier. Therefore, it is possible to decode multiple video streams by sending them in time-division sequence. Changing from one MP@ML stream to another involves an overhead, but this overhead is easily dealt with since there are 9000 periods in a single frame. We therefore implemented multi-decoding of up to four MP@ML video streams.
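
A minimal sketch of time-division multi-decoding follows. It interleaves decode periods among several streams and checks the total against the 9000 periods available per frame. The round-robin order, the 720 × 480 picture size assumed for each MP@ML stream, the burst length, and the fixed per-switch overhead are all assumptions made for illustration; the LSI's actual switching granularity and overhead are not specified here, and video stream and refresh periods are not counted.

```python
import math
from itertools import cycle

PERIODS_PER_FRAME = 9000

def macroblocks(width: int, height: int) -> int:
    """Number of 16 x 16 macroblocks needed to cover a picture."""
    return math.ceil(width / 16) * math.ceil(height / 16)

def time_division_schedule(remaining: dict[str, int], burst: int):
    """Yield (stream, macroblock_count) bursts in round-robin order.

    remaining maps a stream name to the macroblocks still to decode; burst is
    the hypothetical number of decode periods granted before switching streams.
    """
    remaining = dict(remaining)  # work on a copy
    for name in cycle(list(remaining)):
        if not any(remaining.values()):
            return
        n = min(burst, remaining[name])
        if n:
            remaining[name] -= n
            yield name, n

streams = {f"ML{i}": macroblocks(720, 480) for i in range(4)}  # assumed SDTV size
bursts = list(time_division_schedule(streams, burst=45))       # one macroblock row per burst
SWITCH_OVERHEAD = 1                                            # hypothetical periods per switch
total = sum(n for _, n in bursts) + SWITCH_OVERHEAD * len(bursts)
print(total, total <= PERIODS_PER_FRAME)
```

With these assumed numbers, four streams need 5400 decode periods plus 120 switching periods, which leaves a wide margin within the 9000 periods of a frame.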

4.1.2 Buffer management for multi-decoding

Figure 7 shows the buffers in external SDRAM when multi-decoding is in progress. The VBV buffer (used for storing video streams from the TS demultiplexer) and frame buffers (used for storing decoded MPEG pictures) are divided into four regions. There is also a space for storing four sets of MPEG picture parameters.

Our LSI has functions to automatically decide which stream should be decoded first and whether decoding is possible. This means that it is possible to decode MPEG pictures with different picture resolutions and frame rates.

4.2 Seamless decode and display

In the digital broadcast era, it is possible that broadcasting stations will broadcast MPEG pictures with different resolutions and frame rates. For example, commercial programs and viewer programs might have different resolutions and it might be desirable for the decoder to be able to switch seamlessly from one picture resolution to another so that the viewer is unaware of the change in resolution.

4.2.1 Conversion of picture size

Changing picture size automatically can be easily realized since our LSI has both a decoder unit and a display unit. The decoder memorizes the frame rate and picture resolution of the picture currently being processed. When a frame is displayed, the decoder sends the picture information to the display unit/formatter, and the unit calculates the amount of picture enlargement or reduction necessary during the vertical blanking interval. There is an on-chip dedicated hardware calculator so that the picture information that is sent can be calculated every frame. Conventional decoders find it hard to calculate this information on time since there is no dedicated calculator.
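
The per-frame calculation can be illustrated as follows. This sketch assumes that the formatter needs a horizontal and a vertical scaling ratio from the decoded picture size to a fixed display size; the function `scale_factors` and the 1920 × 1080 display size are assumptions, and the on-chip calculator may derive other parameters as well.

```python
from fractions import Fraction

def scale_factors(src: tuple[int, int], display: tuple[int, int] = (1920, 1080)):
    """Return exact (horizontal, vertical) ratios that map src onto display."""
    (sw, sh), (dw, dh) = src, display
    return Fraction(dw, sw), Fraction(dh, sh)

# The resolution may change between consecutive pictures,
# for example SDTV pictures followed by an HDTV picture.
for size in [(720, 480), (720, 480), (1920, 1080)]:
    h, v = scale_factors(size)
    print(size, "->", f"x{float(h):.3f} horizontally, x{float(v):.3f} vertically")
```

Recomputing the ratios for every frame, as the loop does, is what allows the displayed size to stay constant across a resolution change in this model.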

4.2.2 Changing the frame rate

A change in the frame rate has a big influence on the decoder. For example, if the frame rate changes from 60 frames/s to 30 frames/s, the frame immediately after the change must be processed at 60 frames/s instead of the normal 30 frames/s. Otherwise, the frame cannot be displayed normally. This means that the decoder needs to process MPEG data at twice the normal rate. To deal with this problem, a circuit which automatically repeats or skips MPEG pictures has been included. As a result, seamless display is realized while keeping processing needs to a minimum.

Figure 8 shows an example of how the LSI works when the frame rate changes from 60 to 30 frames/s. In this case, display of the first 30-frames/s picture is delayed by 1/30 second. To fill in the gap, pictures B7 and P8 (the two pictures immediately before the change) are repeated for 1/60 second each.

Figure 9 shows what happens when the rate changes from 30 to 60 frames/s. In this case, display of the first 60-frames/s picture is advanced by 1/30 second. Two 30-frames/s pictures, B7 and P8, are skipped.

Therefore, in a 60-to-30 change, pictures are repeated for 1/30 second and in a 30-to-60 change, pictures are skipped for 1/30 second. Since the time needed to display the source image and the actual display time match, even when there are multiple frame rate changes, the timing between the bit stream input and display output is kept constant and seamless display is realized.

5. Conclusion

We have developed an MPEG MP@HL decoder LSI that performs all the video functions necessary for a digital TV receiver. The chip also supports audio data extraction so that audio can be easily processed by an external DSP. In addition to normal MP@HL decoding, the chip supports multi-decoding and the other functions that will be needed for the digital broadcast services of the future. The LSI's architecture makes it easy to reuse its MPEG core functions and add new functions. We intend to further enhance this chip in the near future.

References

Hidenaga Takahashi received the B.S. degree in Electrical Engineering from Chiba University, Chiba, Japan in 1985. He joined Fujitsu Ltd., Kawasaki, Japan in 1985, where he has been engaged in development of digital television circuits and LSIs.

Yukio Otobe received the B.S. degree in Electronic Engineering from Kyushu University, Fukuoka, Japan in 1984. He joined Fujitsu Ltd., Kawasaki, Japan in 1984, where he has been engaged in development of LSIs for image processing. Currently, he is engaged in development of MPEG video LSIs for Digital TV.

Kiyoshi Kohiyama received the B.S. degree in Electronic Engineering from Keio University, Yokohama, Japan in 1977. He joined Fujitsu Ltd., Kawasaki, Japan in 1977, where he was engaged in evaluation of high-speed computer memory and ECL logic circuits. Since then, he has been engaged in development of digital TV circuits and LSIs. He transferred to Fujitsu Laboratories Ltd., Kawasaki, Japan in 1994. He received the Oyama Matsujiro Award from the Promotion Foundation for Electronic Science and Engineering, Tokyo in 1992 and the Development of Technical Promotion Award from the Institute of Television Engineers in 1996. He is a member of the Institute of Image Information and Television Engineers and the Association of Radio Industries and Businesses (ARIB).