Current Research


Image Processing and Coding

Variable Block Size Transform Coding of HD Video Using Order-4, -8 and -16 Transforms

(C.K. Fong, W.K. Cham)

Digital TV systems are being installed in many regions. At first they carry standard-definition TV. In the near future, they will evolve into high-definition TV (HDTV) systems with a resolution of 1920 × 1080 pixels and even ultra high-definition video (UHDV) with a resolution of 7680 × 4320 pixels. This project investigates video coding techniques with even higher compression ability to handle the sharp increase in video data.

Today's best video coding standards, such as H.264/AVC, use only order-4 and order-8 integer cosine transforms (ICT). We propose to improve the compression ability by using order-16 integer transforms that can be realized on 16-bit digital signal processors and easily incorporated into existing standards. Preliminary results have shown that a 1 dB gain in PSNR can be obtained.
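
As a rough illustration of what a 1 dB PSNR gain means, the sketch below computes PSNR for 8-bit samples and the mean squared error ratio implied by a given gain. The function names `psnr` and `mse_ratio_for_gain` are illustrative only and are not taken from the project.

```python
import math

def psnr(original, decoded, peak=255):
    """PSNR in dB between two equal-length sequences of pixel values."""
    if len(original) != len(decoded):
        raise ValueError("inputs must have the same number of samples")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10 * math.log10(peak ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which MSE is multiplied when PSNR rises by gain_db."""
    return 10 ** (-gain_db / 10)
```

Under this definition, a 1 dB gain corresponds to a mean squared error that is roughly 21% lower at the same bit rate.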



Statistical Modeling for Transform Coefficients

(C.K. Fong, W.K. Cham)

Statistical models of transform coefficients are important for image and video processing and are often used in coding and restoration. The accuracy of these models affects the performance of the processing. We are investigating a more accurate model than the conventional Laplacian model.
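
For reference, the conventional Laplacian model can be fitted with a one-line estimator. The sketch below assumes zero-mean coefficients (as is usual for AC coefficients) and uses the maximum-likelihood estimate of the scale parameter; the function names are illustrative and the more accurate model under investigation is not shown.

```python
import math

def fit_laplacian_scale(coeffs):
    """Maximum-likelihood scale b of a zero-mean Laplacian: mean of |x|."""
    return sum(abs(c) for c in coeffs) / len(coeffs)

def laplacian_pdf(x, b):
    """Density of a zero-mean Laplacian with scale b."""
    return math.exp(-abs(x) / b) / (2 * b)

def log_likelihood(coeffs, b):
    """Log-likelihood of the coefficients under the Laplacian model."""
    return sum(math.log(laplacian_pdf(c, b)) for c in coeffs)
```

Comparing `log_likelihood` values for competing models on the same coefficients is one simple way to judge which model fits better.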



Postprocessing of Low Bit Rate Block DCT Coded Images

(Deqing Sun, W.K. Cham)

Transform coding using the Discrete Cosine Transform (DCT) has been widely used in image and video coding standards. However, at low bit rates, the coded images suffer from severe visual distortions which prevent further bit reduction. Postprocessing can reduce these distortions and alleviate the conflict between bit rate reduction and quality preservation. In this project, we view postprocessing as an inverse problem and use a Bayesian approach to solve it. At present, we are developing statistical models to describe the distortions caused by coding, which can also be used in other applications, such as high resolution reconstruction of compressed videos.



Fast Video Object Segmentation

(Lawrence C.M. Mak, W.K. Cham)

Video object segmentation from raw video data is a difficult problem. The computation involved is usually very high, which prevents it from being used in real-time environments. We are now designing algorithms to speed up critical steps in the segmentation process, such as motion estimation, preliminary object clustering, and object boundary refinement. Multiple moving objects will be segmented and tracked across the frames simultaneously. The target frame rate is 10 fps for CIF-size frames.



Variable Block Size Motion Estimation

(Lawrence C.M. Mak, W.K. Cham)

In the newly emerged video coding standard H.264/AVC, variable block size motion estimation (VBSME) is used to reduce temporal redundancy. We have proposed a novel VBSME algorithm which performs motion estimation in the Walsh-Hadamard domain. It allows us to reject most of the mismatched candidates at an early stage, and thus significantly reduces the computational requirement of the motion estimation process. The proposed algorithm achieves accuracy close to that of the ideal full search but requires only 10% of the computation time.
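
The details of the proposed algorithm are not given here. The sketch below only illustrates one well-known reason why candidates can be rejected early in the Walsh-Hadamard domain: the transform preserves energy up to a constant factor, so a partial sum of squared coefficient differences is a lower bound on the sum of squared differences (SSD) between two blocks. The function names, the use of SSD as the matching cost, and the flattening of blocks into vectors are all assumptions made for illustration. A practical scheme would also compute candidate coefficients incrementally rather than all at once.

```python
def wht(vec):
    """Unnormalised fast Walsh-Hadamard transform; len(vec) must be a power of 2."""
    out = list(vec)
    n = len(out)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    return out

def ssd_with_early_reject(target_wht, cand_wht, best_so_far):
    """Return the SSD between two blocks, or None if it must exceed best_so_far.

    Both inputs are unnormalised WHT coefficients of integer-valued blocks,
    so the coefficient-domain energy is n times the pixel-domain energy.
    """
    n = len(target_wht)
    limit = best_so_far * n
    partial = 0
    for t, c in zip(target_wht, cand_wht):
        partial += (t - c) ** 2
        if partial > limit:
            return None  # lower bound already exceeds the best match so far
    return partial // n
```

Because the first coefficients carry most of the energy for typical image blocks, poor candidates tend to exceed the bound after only a few terms.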



Model Fusion from Multiple Depth Maps

(Wei Zhang, W.K. Cham, H.T. Tsui)

The goal of this project is to develop a new method for rendering a high-quality surface model mapped with a high-quality texture by merging multiple depth maps and textures. Given a depth map for every input image, we aim to fuse the depth estimates into one consistent model surface. At the same time, a refined texture should also be generated. In this way, texture mapping yields a more vivid and complete 3D model with considerably better visual quality. The method can be widely applied in image-based rendering, virtual reality, communication and so on.



Visual Signal Processing and Communications

Automatic Video Segmentation and Tracking for Real Time Multimedia Services

(Hongliang Li, King N. Ngan)

In the past several years, there has been rapidly growing interest in content-based applications of video data, such as video retrieval and browsing, video summarization, video event analysis, and video editing. Video segmentation has been a key technique for semantic object extraction and plays an important role in digital video processing, pattern recognition, and computer vision. The task of segmenting and tracking a video object arises in many applications, such as bank transaction monitoring, surveillance and video conferencing.

Generally, it is difficult to segment objects automatically without any primary criteria for segmentation. An intrinsic problem of 'blind-segmentation' algorithms, which make no contextual knowledge assumption regarding the object being segmented, is that objects of interest may not be homogeneous with respect to low-level features or may change with environmental factors.

The objectives of the project are:

• To investigate a saliency model for extracting objects of interest from videos.

• To develop novel automatic video object segmentation techniques for generic and specific (e.g., video surveillance, videoconferencing) applications based on the saliency model.

• To develop real-time object segmentation techniques based on the saliency model and fast transforms.

The successful completion of this project will provide new video segmentation tools for many applications in multimedia services such as videotelephony, videoconferencing, computer games and digital entertainment.



Multiview Image Segmentation for Video-based Rendering

(Wenxian Yang, King N. Ngan)

User interactivity has been a key feature of the new and rapidly developing content-based multimedia, and can be achieved using video-based rendering. Video object segmentation has been one of the vital tasks in video-based rendering. Although humans can identify semantic entities effortlessly, video object segmentation remains a fundamental research problem. Fortunately, multiview video provides depth information which not only indicates the 3D scene structure, but also functions as an important cue for segmentation. In addition, integration of multiple image cues including depth, motion and color will produce more robust segmentation results.

However, most existing multiview video segmentation algorithms segment the depth field and the color image independently or alternately, and fuse the results for the final mask. They fail to utilize all the information simultaneously and efficiently, and may lack accuracy or generality, or require expensive computation. Moreover, most existing methods can only be applied to densely sampled stereoscopic images, and cannot be used for practical video-based rendering systems where sparsely sampled multiview video sequences are processed. In this research, we focus on the segmentation of multiview video sequences obtained with sparse camera settings so that the system is cost-effective and practical. We also emphasize the fusion of multiple image cues by including multiple constraints in a global energy function, which will be minimized using graph cuts.



Transform Domain Downsizing Transcoding for HDTV

(Haiyan Shu, King N. Ngan)

High-definition television (HDTV) refers to the broadcasting of television signals with a higher resolution than traditional formats allow. In order to transmit HDTV to wireless users, transcoding is required to adapt these high-resolution, high-quality video sources to bandwidth-constrained networks and devices with limited display size.

Video transcoding is a tool to convert a pre-coded bit stream to another coded bit stream with a different format, size, or transmission rate, or simply to translate it to a new syntax according to different requirements. When a high-definition video is transmitted over a limited-bandwidth network, an efficient solution is to adopt a downsizing transcoder to reduce the spatial resolution of the bit stream.

For video downsizing transcoding, an attractive solution is to realize it in the transform domain. This is because transform domain downsizing generally gives better video quality than spatial domain downsizing. In addition, the computational complexity can be reduced since the forward and inverse transformations are avoided.

The purpose of this project is to perform downsizing transcoding entirely in the transform domain and to solve the drift propagation problem that arises in transform domain transcoding.
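
To make the idea of transform domain downsizing concrete, the sketch below halves the length of a one-dimensional signal directly from its DCT coefficients by keeping the low-frequency half and rescaling. It assumes an orthonormal DCT-II and a 2:1 ratio; the function names are illustrative, and the project's actual two-dimensional method and its drift handling are not shown. The `dct` and `idct` helpers are included only to check the result.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out

def idct(c):
    """Inverse of the orthonormal DCT-II above."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1 / n)
        s += sum(c[k] * math.sqrt(2 / n) * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                 for k in range(1, n))
        out.append(s)
    return out

def downsize_coeffs(c):
    """Halve the resolution in the DCT domain: keep the low half, rescale by 1/sqrt(2)."""
    half = len(c) // 2
    return [v / math.sqrt(2) for v in c[:half]]
```

The output of `downsize_coeffs` is itself a set of DCT coefficients for the half-size block, so no pixel-domain round trip is needed before re-encoding.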



Error Resilient Video Transmission over Wireless Networks

(Jie Li, King N. Ngan)

Due to the enormous amount of raw video data and the limited bandwidth, common video coding standards employ very efficient compression techniques which introduce complicated dependencies in the coded bit stream. As a result, highly compressed video is very sensitive to transmission errors because of error propagation. Video transmission over wireless networks is especially challenging because of limited available bandwidth and high bit and packet error rates.

This project focuses on source and channel coding of videos to implement a bandwidth-efficient, error-resilient method for the transmission of coded video sequences over wireless channels. VLC/FLC (variable/fixed length coded) data partitioning is proposed to reduce the effect of spatial error propagation introduced by bit errors, based on an analysis of the effect of spatial error propagation for each macroblock (MB) level FLC syntax element. FLC syntax elements that cannot introduce spatial error propagation are grouped into a separate partition. RCPC (Rate-Compatible Punctured Convolutional) channel coding is employed to reduce the channel bit error rate. Error detection and concealment techniques are implemented at the decoder side to mitigate the loss of picture quality.

An error-sensitivity-based redundant macroblock strategy is proposed to efficiently protect sensitive MBs against packet losses. MB DMSE (Differential MSE) is employed to evaluate the error sensitivity of MBs. The most sensitive MBs are transmitted in separate additional slices while coarsely quantized copies of the MBs are placed in the original slice. Future directions include robust motion estimation and mode selection, efficient error concealment, and error-resilient techniques for scalable video coding.



Highly Scalable Video Object Coding Using Wavelet Packet Transform

(Yu Liu, King N. Ngan)

One of the outstanding features of the MPEG-4 standard is the ability to directly access and manipulate objects within a video sequence. Such object-based access and manipulation can be used in video editing, video games, advertising, news broadcasting, etc. However, despite these features, some important aspects of MPEG-4, such as scalability and coding efficiency, still need further work. In this research, we will extend our object-based CCAQO algorithm, which is based on context classification and quadtree ordering in the wavelet packet domain, from image object coding to video object coding. For video object texture coding, the coding performance of the OB-CCAQO algorithm can be further improved by updating online the probability distribution function (pdf) of the estimated significance probabilities obtained from previous video object planes. To further improve the accuracy and efficiency of motion estimation in the shift-invariant wavelet domain, we propose to develop two advanced ME/MC modes for arbitrarily shaped video objects: block-based and mesh-based ME/MC in the shift-invariant wavelet domain. All these key techniques will be integrated into a new framework for a 3D scalable wavelet/wavelet packet video object coding algorithm using overcomplete in-band motion compensated temporal filtering (OIBMCTF). The new framework will include the coding of still images and moving video objects of arbitrary shape.



Advanced Techniques for HD Video Coding

(Jie Dong, King N. Ngan)

High-definition (HD) video is now widely used in many applications, such as HDTV broadcasting, HD DVD and surveillance, and good visual quality is its main attraction. For HD video coding, high fidelity therefore becomes the most important issue. On the other hand, since the raw data rate of HD video ranges from 265 to 746 Mbps, a high compression ratio is the key enabler for the development of these applications.
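
The text does not say which formats give the two ends of this range. One combination consistent with the figures, assumed here purely for illustration, is 8-bit 4:2:0 sampling (12 bits per pixel) at 1280 × 720 and 24 frames per second for the low end and at 1920 × 1080 and 30 frames per second for the high end. The helper name `raw_bit_rate_mbps` is illustrative.

```python
def raw_bit_rate_mbps(width, height, fps, bits_per_pixel=12):
    """Uncompressed video bit rate in Mbps (10**6 bits per second).

    The default of 12 bits per pixel assumes 8-bit 4:2:0 chroma subsampling.
    """
    return width * height * bits_per_pixel * fps / 1e6

low = raw_bit_rate_mbps(1280, 720, 24)    # assumed low-end format
high = raw_bit_rate_mbps(1920, 1080, 30)  # assumed high-end format
print(f"{low:.1f} Mbps to {high:.1f} Mbps")
```

Under these assumptions the two rates come out at about 265.4 and 746.5 Mbps.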

These two conflicting requirements make HD video coding a very challenging topic that deserves much research effort.

Our research starts by investigating the unique properties of HD video, for example relatively smoother texture and higher frame rate, which mean higher spatial and temporal correlations. Special coding tools are then proposed to fully exploit these properties and bring a breakthrough in performance. We have designed larger transforms for HD video coding, which improve the coding efficiency significantly.

Another goal of the project is to extend the functionalities for different applications, including scalable solutions, de-interlacing techniques, rate control, and so on. These functionalities are critical when HD videos are transmitted under various channel conditions and received by a variety of devices, ranging from high-resolution projection systems down to small mobile terminals.

From an academic perspective, coding HD video offers a unique opportunity to revisit and reinvent most coding techniques, ranging from motion prediction/compensation to transforms, from quantization to entropy coding, from bit allocation to rate control.



Implementation of H.264 on Mobile Device

(Zhenyu Wei, Kai Lam Tang, King N. Ngan)

H.264/AVC is the newest video coding standard of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. Our work is to implement an H.264 baseline codec on an HP 4700 PDA, which has a powerful Intel PXA27x embedded processor. Through system, algorithm and instruction-level optimization, we have successfully implemented the H.264 baseline codec on the mobile device and greatly improved coding speed with very little degradation in PSNR and bit rate. Experimental results show that the speed of our codec increases significantly after optimization, to more than 25 frames per second. At QCIF (176×144) resolution, our encoder and decoder can run in real time on the mobile device. The work has significant engineering and market value.



Perceptual Optimized Algorithms for H.264 Video Coding

(Zhenyu Wei, King N. Ngan)

H.264 (JVT) is a new video coding standard jointly developed by ITU-T and ISO MPEG. It uses many advanced techniques to improve coding performance. At the same bit rate, H.264 achieves much less coding distortion than previous standards. The distortion metrics adopted in H.264 are mean squared error (MSE) and peak signal-to-noise ratio (PSNR). However, they have been widely criticized for not correlating well with perceived quality. It is therefore important to develop an objective video quality assessment method that incorporates perceptual quality measures by considering the human visual system (HVS). To address this problem, the goal of this research is to propose efficient, perceptually optimized algorithms for the H.264 encoder. We expect to produce an efficient and perceptually optimized encoder that not only has better visual quality under a constrained bit rate but is also useful for real-time or low-power applications.



Optimization of H.264 Encoder and Decoders

(Kai Lam Tang, King N. Ngan)

H.264 is a new video coding standard which outperforms previous video coding standards such as H.261 and H.263. It employs many advanced video coding techniques to improve the coding performance. Variable block size motion estimation (VBSME) is one of these techniques, but it is computationally intensive. Enhanced SAD Reuse Fast Motion Estimation is a fast VBSME algorithm which reuses computed SADs (sums of absolute differences) to improve the speed of VBSME. In addition to SAD reuse, pattern-based ME and refinement search are also applied to maintain good coding performance. By employing many fast algorithms, including mode decision, early skip, fast motion estimation and fast sub-pixel motion estimation, together with code-level optimization techniques, the H.264 baseline profile codec can run in real time on a Pocket PC at more than 25 frames per second for the QCIF 4:2:0 format. This is a notable achievement because H.264 is computationally intensive. Besides the H.264 codec, these techniques can also be applied to an AVS decoder to achieve fast decoding speed for HD video sequences.
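
The basic idea of SAD reuse can be sketched as follows: for one candidate motion vector, the SADs of the sixteen 4 × 4 sub-blocks of a 16 × 16 macroblock are computed once, and the SADs of every larger partition are obtained by adding them up. This is only a minimal sketch of that general idea, not the Enhanced SAD Reuse algorithm itself; the function names and the list-of-lists block layout are assumptions.

```python
# Partition sizes as (width, height) in pixels, following H.264 naming.
PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]

def sad_4x4_grid(cur, ref):
    """Return a 4x4 grid of SADs, one per 4x4 sub-block of two 16x16 blocks."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid

def partition_sads(grid):
    """Build the SADs of all partition sizes by summing the 4x4 SADs."""
    result = {}
    for w, h in PARTITIONS:
        cw, ch = w // 4, h // 4  # partition size measured in 4x4 cells
        sads = []
        for top in range(0, 4, ch):
            for left in range(0, 4, cw):
                sads.append(sum(grid[r][c]
                                for r in range(top, top + ch)
                                for c in range(left, left + cw)))
        result[(w, h)] = sads
    return result
```

Pixel differences are computed only once per candidate; the 41 partition SADs then cost a handful of additions each.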



Joint Disparity Estimation and Segmentation for Image-based Rendering

(Chunhui Cui, King N. Ngan)

Interactivity is a key feature of new and emerging visual applications, where the user has the opportunity to be active in some way, for example, to look around within a visual scene by freely choosing a viewpoint. An effective and applicable technology to provide such functionality is image-based rendering (IBR), which focuses on the generation of intermediate images from other images instead of using 3-D models. Typical IBR exploits scene geometry in the form of depth or disparity data, so early vision tasks such as object segmentation and correspondence estimation remain crucial issues.

This research aims to develop a joint disparity estimation and segmentation scheme that can provide an accurate and cost-effective stereo representation for subsequent multiview coding and rendering. Solving this problem is key to the development of most leading-edge interactive visual communication and rendering systems.



Rate Control for Multiple Video Object Video Coding

(Zhenzhong Chen, King N. Ngan)

MPEG-4 is the first international multimedia standard that supports object-based video coding. In MPEG-4, a scene is viewed as a composition of video objects with intrinsic properties such as shape, motion, and texture. It differs from classical video coding standards in that each object in the scene is encoded separately. This poses several difficulties that classical rate control algorithms do not face. This research project addresses multiple video object rate control for MPEG-4. The first question is how to distribute the available bit rate among the various objects in the scene. The second question is how many bits should be allocated to encoding the shape and how many to encoding the texture. It should also be noted that the efficiency of the rate control algorithm selected to regulate the video at a given bit rate heavily affects the visual quality of the video reconstructed at the decoder. The results from this project will therefore be of interest not only to the research community but also to the telecommunications and multimedia industry.





© 2008 Image and Video Processing Laboratory
All Rights Reserved