ECN Asia
Issue > May 2007 > Cover Story
 
 
Video compression and data flow for video surveillance


( 01 May 2007 )

By Zhengting He, Texas Instruments

The desire for increased security has driven the popularity of video surveillance systems, which are now widely deployed in places such as airports, banks, public transportation centers and even private homes. The many problems associated with traditional analog systems are driving the push toward digital systems. Furthermore, with advances in computer networking, semiconductor and video compression technologies, next-generation video surveillance systems will undoubtedly be digital and based on standard technologies and IP networking.



In a video surveillance system over Internet Protocol (VSIP), the hardware that handles network traffic is an integral part of the camera: video signals are digitized by the camera and compressed before being transmitted to the video server, to overcome the bandwidth limitation of the network. A heterogeneous processor architecture that pairs a digital signal processor (DSP) with a general-purpose processor (GPP) is desirable for achieving maximum system performance. Interrupt-intensive tasks such as video capturing, storing and streaming can be assigned to the GPP, while the MIPS-intensive video compression is implemented on the DSP. Once the data reaches the video server, the server stores the compressed video streams as files on a hard disk, avoiding the quality degradation traditionally associated with analog storage devices. Various standards have been developed for compressing digital video signals. They can be classified into two categories:

• Motion-estimation (ME) based approaches: Every N frames are defined as a group of pictures (GOP) in which the first frame is encoded independently. For each of the other (N-1) frames, only the difference between that frame and the previously encoded frame(s) (the reference frame(s)) is encoded. Typical standards are MPEG-2, MPEG-4, H.263 and H.264.

• Still image compression: Each video frame is encoded independently as a still image. The best-known standard is JPEG. Motion JPEG (MJPEG) encodes each frame using the JPEG algorithm.

ME VS. STILL IMAGE COMPRESSION

Figure 1 shows the block diagram of an H.264 encoder. Like other ME-based video coding standards, it processes each frame one macroblock (MB) at a time, where an MB is 16 × 16 pixels. The encoder has a forward path and a reconstruction path. The forward path encodes a frame into bits. The reconstruction path generates a reference frame from the encoded bits. Here (I)DCT stands for (inverse) discrete cosine transform and (I)Q stands for (inverse) quantization. ME and MC stand for motion estimation and motion compensation, respectively.



In the forward path (DCT to Q), each MB can be encoded in either intra mode or inter mode. In inter mode, the reference MB is found in previously encoded frame(s) by the motion estimation (ME) module. In intra mode, the reference MB is formed from samples in the current frame.



The purpose of the reconstruction path (IQ to IDCT) is to ensure that the encoder and decoder use an identical reference frame to create the image. Otherwise, the mismatch between the encoder's and the decoder's reference frames accumulates from frame to frame.
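The effect of the reconstruction path can be illustrated with a toy one-dimensional model. This is only a sketch: each "frame" is a single integer sample, the quantizer simply rounds to a multiple of a step size, and all function names are hypothetical. The open-loop encoder predicts from the original previous sample, while the closed-loop encoder predicts from the value the decoder will actually reconstruct.

```python
def quantize(value, step):
    """Round value to the nearest multiple of step (halves round up)."""
    return step * ((2 * value + step) // (2 * step))


def decode(first, residuals):
    """Decoder: start from the intra-coded first sample, then add residuals."""
    recon = [first]
    for r in residuals:
        recon.append(recon[-1] + r)
    return recon


def encode_open_loop(samples, step):
    """Predict from the ORIGINAL previous sample (no reconstruction path)."""
    return [quantize(cur - prev, step)
            for prev, cur in zip(samples, samples[1:])]


def encode_closed_loop(samples, step):
    """Predict from the RECONSTRUCTED previous sample, as the decoder does."""
    residuals = []
    ref = samples[0]
    for cur in samples[1:]:
        r = quantize(cur - ref, step)
        residuals.append(r)
        ref += r  # the same reconstruction the decoder performs
    return residuals


def errors(samples, residuals):
    """Absolute difference between each original and decoded sample."""
    return [abs(a - b) for a, b in zip(samples, decode(samples[0], residuals))]


samples = [0, 3, 6, 9, 12]
STEP = 4
open_err = errors(samples, encode_open_loop(samples, STEP))
closed_err = errors(samples, encode_closed_loop(samples, STEP))
print("open loop errors:  ", open_err)
print("closed loop errors:", closed_err)
```

In this example the open-loop error grows by one with every sample, while the closed-loop error never exceeds half the quantization step.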



Figure 2 is a JPEG encoder block diagram. The encoder divides the input image into 8×8 pixel blocks and processes them one by one. Each block first passes through the DCT module. The quantizer then rounds off the DCT coefficients according to the quantization matrix. The encoding quality and compression ratio are adjustable through the quantization step. The output of the quantizer is encoded by the entropy encoder to generate the JPEG image.
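The DCT and quantization stages for a single block can be sketched as follows. This is an unoptimized illustration, not a production encoder: the quantization matrix `QMATRIX` is a made-up example that simply grows with frequency, the `scale` parameter is a hypothetical stand-in for the adjustable quantization step, and the entropy encoder is omitted.

```python
import math

N = 8

# Hypothetical quantization matrix: coarser steps at higher frequencies.
QMATRIX = [[16 + 4 * (u + v) for v in range(N)] for u in range(N)]


def dct_2d(block):
    """Direct (unoptimized) 2D 8x8 DCT-II with the usual JPEG scaling."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out


def encode_block(pixels, qmatrix, scale=1):
    """Level-shift 8-bit pixels, transform, then quantize each coefficient."""
    shifted = [[p - 128 for p in row] for row in pixels]
    coeffs = dct_2d(shifted)
    return [[math.floor(coeffs[u][v] / (qmatrix[u][v] * scale) + 0.5)
             for v in range(N)]
            for u in range(N)]


# A flat gray block carries all of its energy in the DC coefficient.
flat = [[136] * N for _ in range(N)]
fine = encode_block(flat, QMATRIX)             # quantization step as given
coarse = encode_block(flat, QMATRIX, scale=2)  # step doubled
print("DC with scale=1:", fine[0][0])
print("DC with scale=2:", coarse[0][0])
```

A larger `scale` means coarser steps, so more coefficients round to zero and the entropy encoder has less to encode, at the cost of image quality.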



Since sequential video frames often contain a lot of correlated information, ME-based approaches can achieve a higher compression ratio. For example, for NTSC standard resolution at 30 frames/s, an H.264 encoder can encode video at 2 Mbps to achieve average image quality, a compression ratio of about 60:1. To achieve similar quality, MJPEG's compression ratio is about 10:1 to 15:1.
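The 60:1 figure can be checked with a short calculation. The frame size and pixel format are assumptions, since the text does not state them: the sketch takes 720 × 480 pixels and 4:2:0 chroma subsampling, which averages 12 bits per pixel.

```python
def raw_bitrate_bps(width, height, bits_per_pixel, fps):
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * fps


# Assumed format: 720x480, 4:2:0 sampling (12 bits/pixel), 30 frames/s.
raw = raw_bitrate_bps(720, 480, 12, 30)

h264_ratio = raw / 2_000_000    # H.264 at 2 Mbps
mjpeg_low_mbps = raw / 15 / 1e6  # MJPEG at 15:1
mjpeg_high_mbps = raw / 10 / 1e6  # MJPEG at 10:1

print(f"raw bit rate: {raw / 1e6:.1f} Mbps")
print(f"H.264 compression ratio at 2 Mbps: {h264_ratio:.1f}:1")
print(f"MJPEG bit rate: {mjpeg_low_mbps:.1f} to {mjpeg_high_mbps:.1f} Mbps")
```

Under these assumptions the raw stream is about 124 Mbps, so 2 Mbps corresponds to roughly 62:1, and an MJPEG stream of similar quality needs roughly 8 to 12 Mbps.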



MJPEG has several advantages over the ME-based approach. Foremost, JPEG requires significantly less computation and consumes less power. Also, most PCs already have software to decode and display JPEG images. MJPEG is also more effective when a single image or a few images record a specific event, such as a person walking through a door. If the network bandwidth cannot be guaranteed, MJPEG is preferred since the loss or delay of one frame does not affect other frames. With the ME-based method, the delay or loss of one frame delays or loses every later frame in the same GOP, since a frame cannot be decoded until its reference frame is available.
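The difference in loss behavior can be sketched with a small model. It assumes the simplest GOP structure, in which the first frame of each GOP is intra-coded and every other frame references only the frame immediately before it. The function name and parameters are hypothetical.

```python
def decodable_frames(num_frames, gop_size, lost):
    """Return the indices of frames a decoder can still reconstruct.

    gop_size == 1 models MJPEG, where every frame is independent.
    `lost` is the set of frame indices dropped by the network.
    """
    ok = []
    chain_broken = False
    for i in range(num_frames):
        if i % gop_size == 0:
            chain_broken = False  # an intra frame starts a fresh GOP
        if i in lost:
            chain_broken = True   # this frame and its dependents are gone
        if not chain_broken:
            ok.append(i)
    return ok


lost = {2}
print("ME-based, GOP of 6:", decodable_frames(12, 6, lost))
print("MJPEG:             ", decodable_frames(12, 1, lost))
```

With a GOP of six frames, losing frame 2 also makes frames 3 to 5 undecodable and decoding resumes at frame 6. With MJPEG only frame 2 is missing.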



Since many VSIP cameras have multiple video encoders, users can choose to run the most appropriate one based on the specific application requirement. Some cameras even have the ability to execute multiple codecs simultaneously with various combinations. MJPEG is typically considered to be the minimal requirement, and almost all VSIP cameras have a JPEG encoder installed.



MOTION JPEG IMPLEMENTATION

In a typical digital surveillance system, video is captured from a sensor, compressed and then streamed to the video server. Interrupting the video encoder task on a modern DSP architecture is undesirable, since each context switch may involve saving a large number of registers and can cause cache thrashing. A heterogeneous architecture is therefore desirable so that the video capturing and streaming tasks can be offloaded from the DSP. The block diagram below illustrates an example of a DSP/GPP processor architecture used in video surveillance applications.



When implementing Motion JPEG on a DSP/GPP SoC-based system, developers should first partition the function modules appropriately to achieve better system performance. The EMAC driver, TCP/IP network stack and HTTP server, which work together to stream compressed images to the outside, should be implemented on the ARM core, which serves as the GPP, along with the video capture driver and ATA driver. This offloads processing from the DSP. The JPEG encoder should be implemented on the DSP core, since its VLIW architecture is particularly good at this type of computation-intensive task. Once video frames are captured from the camera via the video input port on the processor, the raw image is compressed by the JPEG encoder, and the compressed JPEG image files are then saved to the hard disk on the board.



Typically, PCs are used to monitor a video scene in real time by retrieving the streams from the video server and decoding and displaying them on the monitor. Encoded JPEG image files can be retrieved from the board via the Internet. Multiple streams can be monitored on a single PC, and the same stream can be watched simultaneously from multiple points in the network. As a major benefit over traditional analog systems, the VSIP central office can contact the video server through the TCP/IP network and can be physically located anywhere in the network. The single point of failure becomes the digital camera, not the central office. The quality of the JPEG images can also be configured dynamically to meet varying video quality specifications.



OPTIMIZING THE JPEG ENCODER

Of the three main function modules in a JPEG encoder, the DCT and the quantizer are computationally intensive. The performance difference between highly optimized assembly code and unoptimized C code for these two modules can be dramatic, so optimizing them is necessary.



Optimizing the two-dimensional (2D) 8×8 DCT function means reducing the number of additions/subtractions and multiplications by removing the redundant computations in the original equation. Many fast DCT algorithms have been published, among which Chen’s algorithm is widely accepted by the industry. For a 2D 8×8 DCT, Chen’s algorithm requires 448 additions/subtractions and 224 multiplications.
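One common way of removing redundant computation, shown here only as a sketch, is the row-column decomposition: the 2D transform is computed as eight 1D transforms on the rows followed by eight on the columns. The 1D routine below is the direct formula, not Chen's algorithm. A fast 1D algorithm would replace `dct_1d` without changing the surrounding structure. If the 448 additions/subtractions and 224 multiplications quoted above are spread evenly over those 16 passes, that works out to 28 additions/subtractions and 14 multiplications per 8-point pass, though this division is an inference rather than a figure from the text.

```python
import math

N = 8


def scale(k):
    """Orthonormal DCT-II scale factor for frequency index k."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct_1d(vec):
    """Direct 8-point DCT-II; a fast algorithm would go here."""
    return [scale(k) * sum(vec[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                           for n in range(N))
            for k in range(N)]


def dct_2d_separable(block):
    """2D DCT as 8 row transforms followed by 8 column transforms."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d([rows[r][c] for r in range(N)]) for c in range(N)]
    # cols[c][u] holds coefficient (u, c), so transpose back.
    return [[cols[c][u] for c in range(N)] for u in range(N)]


def dct_2d_direct(block):
    """Reference 2D DCT straight from the double-sum definition."""
    return [[scale(u) * scale(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for x in range(N) for y in range(N))
             for v in range(N)]
            for u in range(N)]


block = [[(x * 8 + y) % 17 for y in range(N)] for x in range(N)]
fast = dct_2d_separable(block)
ref = dct_2d_direct(block)
max_diff = max(abs(fast[u][v] - ref[u][v]) for u in range(N) for v in range(N))

# Multiplications by a cosine term, before any fast 1D algorithm is applied.
direct_mults = N ** 4         # 64 products for each of 64 coefficients
separable_mults = 2 * N ** 3  # 16 passes of 64 products each
print(max_diff < 1e-9, direct_mults, separable_mults)
```

Even with the naive 1D routine, the decomposition cuts the cosine multiplications from 4096 to 1024. A fast 1D algorithm reduces the count further.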



These additions/subtractions and multiplications can be further distributed across multiple function units in the DSP core so that instructions execute in parallel, improving performance. Highly optimized DSP assembly code can finish a 2D DCT within 100 cycles, excluding overhead. Some other fast DCT algorithms require even fewer computations, but they often need more buffer space to hold intermediate results. On a modern DSP with a pipelined VLIW architecture, loading data from or storing data to memory takes more cycles than a multiplication. It is therefore important for developers to balance computation against memory access when optimizing the algorithm.



Quantizing each pixel requires a multiplication and an addition. The computation typically requires only 16-bit precision, while the DSP registers are 32 bits wide. The first objective in optimizing the quantizer module is to pack two pixels into one register and perform the additions and multiplications on a pair of pixels. The second is to use multiple DSP function units in parallel. Since the DSP core in the TMS320DM6446 has two multipliers and two adders, up to four pixels can be quantized simultaneously. The last, but not least, objective is to take advantage of the pipelined DSP architecture: while the DSP core is quantizing the current four pixels, the next four can be loaded from memory so that data is fed to the multipliers and adders in every cycle. The first two objectives usually have to be realized by developers themselves, by writing optimized C or assembly code. Pipelining the code can be left to the DSP compiler.
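The packing idea can be emulated in plain Python. This is a sketch under several assumptions: division by the quantization step is replaced by multiplication with a precomputed 16-bit reciprocal followed by a rounding addition and a shift, the helper names are hypothetical, and `quantize_packed` merely stands in for the packed instructions of a real DSP, which are not shown here. The reciprocal only fits in a signed 16-bit lane for steps of 3 or more.

```python
MASK16 = 0xFFFF
SHIFT = 16


def pack2(lo, hi):
    """Pack two signed 16-bit values into one 32-bit word."""
    return ((hi & MASK16) << 16) | (lo & MASK16)


def to_signed16(value):
    """Interpret the low 16 bits of value as a signed number."""
    value &= MASK16
    return value - 0x10000 if value & 0x8000 else value


def unpack2(word):
    """Split a 32-bit word into its (low, high) signed 16-bit lanes."""
    return to_signed16(word), to_signed16(word >> 16)


def reciprocal(step):
    """Rounded 16-bit fixed-point reciprocal of a quantization step >= 3."""
    return ((1 << SHIFT) + step // 2) // step


def quantize_lane(coef, recip):
    """One multiplication and one addition per value, then a shift."""
    return (coef * recip + (1 << (SHIFT - 1))) >> SHIFT


def quantize_packed(coef_word, recip_word):
    """Emulate one packed operation acting on both lanes of a register."""
    c_lo, c_hi = unpack2(coef_word)
    r_lo, r_hi = unpack2(recip_word)
    return pack2(quantize_lane(c_lo, r_lo), quantize_lane(c_hi, r_hi))


coefs = pack2(100, 25)                          # two coefficients, one word
recips = pack2(reciprocal(16), reciprocal(10))  # steps of 16 and 10
print(unpack2(quantize_packed(coefs, recips)))
```

Because the reciprocal is rounded, the result can differ from exact division for some inputs. A real implementation would have to verify that the precision is sufficient for its coefficient range.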



Besides optimizing each function module, a ping-pong buffering scheme should be deployed to optimize the JPEG encoder at the system level. The DSP core accesses data in internal RAM (IRAM) much faster than data in external DDR2 memory. However, IRAM is very limited in size and cannot hold a whole input frame, so only a subset of the blocks is processed in IRAM at a time. While the PING (PONG) set of blocks is being processed, the PONG (PING) set is transferred by DMA from DDR2 to IRAM, so that the DSP core can start processing the next set immediately after completing the current one.
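The control flow of the ping-pong scheme can be sketched sequentially. The functions `dma_copy` and `process` are hypothetical placeholders. On real hardware the DMA transfer into the idle buffer runs concurrently with the processing of the active buffer, which this single-threaded model does not reproduce.

```python
def dma_copy(src, start, count):
    """Stand-in for a DMA transfer from external memory to a fast buffer."""
    return list(src[start:start + count])


def process(chunk):
    """Placeholder for the per-block work (DCT, quantization, entropy coding)."""
    return [2 * value for value in chunk]


def encode_with_ping_pong(frame, chunk_size):
    """Process a frame chunk by chunk, alternating between two buffers."""
    buffers = [None, None]  # index 0 is PING, index 1 is PONG
    output = []
    schedule = []
    active = 0
    buffers[active] = dma_copy(frame, 0, chunk_size)  # prime PING
    offset = chunk_size
    while buffers[active]:
        idle = 1 - active
        # Start filling the idle buffer before working on the active one.
        buffers[idle] = dma_copy(frame, offset, chunk_size)
        offset += chunk_size
        output.extend(process(buffers[active]))
        schedule.append("PING" if active == 0 else "PONG")
        active = idle  # swap roles for the next round
    return output, schedule


result, order = encode_with_ping_pong(list(range(10)), 4)
print(order)
print(result)
```

The chunk size is chosen so that two buffers fit in fast memory together. The loop ends when the prefetch returns an empty chunk.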



It is clear that the move to digital video surveillance systems is well under way. Understanding video compression, system partitioning and codec optimization is key to developing next-generation video surveillance systems that meet the escalating demand.

 

 
 
 