Current Research





Prof. W.K. Cham

Variable Block Size Transform Coding of HD Video Using Order-4, -8 and -16 Transforms

(C.K. Fong, W.K. Cham)

Digital TV systems are being installed in many regions. At first they carry standard-definition TV. In the near future, they will evolve into high-definition TV (HDTV) systems with a resolution of 1920 × 1080 pixels, and even ultra high-definition video (UHDV) with a resolution of 7,680 × 4,320 pixels. This project investigates video coding techniques with higher compression ability to cope with the sharp increase in video data.

Today's best video coding standards, such as H.264/AVC, use only order-4 and order-8 integer cosine transforms (ICT). We propose to improve compression by using order-16 integer transforms that can be realized on 16-bit digital signal processors and easily incorporated into existing standards. Preliminary results show that a gain of 1 dB in PSNR can be obtained.
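The order-16 transforms proposed in this project are not specified on this page. As a point of reference only, the sketch below applies the order-4 forward integer core transform of H.264/AVC to a block using integer arithmetic, and checks that the rows of the matrix are mutually orthogonal. The helper names (`matmul`, `transpose`, `forward_ict`) are illustrative, and the scaling that the standard folds into quantization is omitted.

```python
# Order-4 forward integer core transform matrix of H.264/AVC.
C4 = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def forward_ict(block):
    """Return C4 * block * C4^T using integer arithmetic only (scaling omitted)."""
    return matmul(matmul(C4, block), transpose(C4))

# C4 * C4^T is diagonal, so the basis rows are mutually orthogonal.
gram = matmul(C4, transpose(C4))

# A constant block puts all of its energy into the single DC coefficient.
flat_block = [[7] * 4 for _ in range(4)]
coeffs = forward_ict(flat_block)
```

An order-16 transform would follow the same pattern with a 16 × 16 integer matrix, chosen so that intermediate values stay within 16-bit arithmetic.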



Statistical Modeling for Transform Coefficients

(C.K. Fong, W.K. Cham)

Statistical models of transform coefficients are important for image and video processing, where they are often used in coding and restoration. The accuracy of these models affects the performance of the processing. We are investigating a model that is more accurate than the conventional Laplacian model.
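For context, the conventional Laplacian baseline mentioned above can be fitted in a few lines. This is a minimal sketch that assumes zero-mean coefficients, for which the maximum-likelihood scale is the mean absolute value. The sample values are made up for illustration, and the more accurate model under investigation is not shown.

```python
import math

def fit_laplacian_scale(coeffs):
    """Maximum-likelihood scale b of a zero-mean Laplacian: the mean absolute value."""
    return sum(abs(c) for c in coeffs) / len(coeffs)

def laplacian_pdf(x, b):
    """Density of a zero-mean Laplacian with scale b."""
    return math.exp(-abs(x) / b) / (2.0 * b)

# Hypothetical AC coefficients from one frequency position.
samples = [-3, -1, 0, 0, 1, 1, 2, -4]
b = fit_laplacian_scale(samples)
peak = laplacian_pdf(0, b)
```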



Postprocessing of Low Bit Rate Block DCT Coded Images

(Deqing Sun, W.K. Cham)

Transform coding using the Discrete Cosine Transform (DCT) has been widely used in image and video coding standards. However, at low bit rates, the coded images suffer from severe visual distortions which prevent further bit reduction. Postprocessing can reduce these distortions and alleviate the conflict between bit rate reduction and quality preservation. In this project, we view postprocessing as an inverse problem and use a Bayesian approach to solve it. At present, we are developing statistical models to describe the distortions caused by coding, which can also be used in other applications, such as high-resolution reconstruction of compressed videos.



Fast Video Object Segmentation

(Lawrence C.M. Mak, W.K. Cham)

Segmenting video objects from raw video data is a difficult problem. The computation involved is usually very high, which prevents its use in real-time environments. We are designing algorithms to speed up critical steps in the segmentation process, such as motion estimation, preliminary object clustering, and object boundary refinement. Multiple moving objects will be segmented and tracked from the frames simultaneously. The target frame rate is 10 fps for CIF-size frames.



Variable Block Size Motion Estimation

(Lawrence C.M. Mak, W.K. Cham)

In the recent video coding standard H.264/AVC, variable block size motion estimation (VBSME) is used to reduce temporal redundancy. We propose a novel VBSME algorithm that performs motion estimation in the Walsh-Hadamard domain. It allows us to reject most of the mismatched candidates at an early stage, which significantly reduces the computation required by the motion estimation process. The proposed algorithm achieves accuracy close to that of the ideal full search, but requires only 10% of the computation time.
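The early-rejection idea can be illustrated with a simplified sketch. It is not the proposed algorithm: it works on 1-D blocks, and it assumes a sum-of-squared-differences (SSD) criterion. Because the unnormalized Walsh-Hadamard transform of length n scales energy by exactly n, the partial sum of squared coefficient differences divided by n never exceeds the true SSD, so a candidate can be dropped as soon as that partial sum reaches the best SSD found so far. The function names are illustrative.

```python
def wht(vec):
    """Unnormalized fast Walsh-Hadamard transform of a list whose length is a power of two."""
    out = list(vec)
    h = 1
    while h < len(out):
        for i in range(0, len(out), 2 * h):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    return out

def best_match(target, candidates):
    """Return (index, ssd, comparisons) for the candidate closest to target.

    comparisons counts the coefficient pairs examined, to show the saving.
    A practical implementation would also compute the coefficients of each
    candidate incrementally instead of transforming every candidate in full.
    """
    n = len(target)
    t = wht(target)
    best_idx, best_ssd = -1, float("inf")
    comparisons = 0
    for idx, cand in enumerate(candidates):
        c = wht(cand)
        partial = 0
        rejected = False
        for tk, ck in zip(t, c):
            partial += (tk - ck) ** 2
            comparisons += 1
            # partial / n is a lower bound on the SSD of this candidate.
            if partial >= best_ssd * n:
                rejected = True
                break
        if not rejected:
            best_idx, best_ssd = idx, partial / n
    return best_idx, best_ssd, comparisons

# Hypothetical 4-sample block and four candidate blocks.
target = [10, 12, 11, 9]
candidates = [[0, 0, 0, 0], [10, 12, 11, 10], [50, 50, 50, 50], [10, 12, 11, 9]]
index, ssd, comparisons = best_match(target, candidates)
```

In this toy case the third candidate is rejected after its first coefficient, since a large difference in the block sum alone already exceeds the best SSD found so far.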



Model Fusion from Multiple Depth Maps

(Wei Zhang, W.K. Cham, H.T. Tsui)

The goal of this project is to develop a new method for rendering a high-quality surface model with a high-quality texture by merging multiple depth maps and textures. Given a depth map for every input image, we fuse the depth estimates into one consistent model surface and, at the same time, generate a refined texture. Texture mapping then yields a more vivid and complete 3D model with improved visual quality. The method can be applied in image-based rendering, virtual reality, communication and other areas.





Prof. K.N. Ngan

Free-View Video Communication Using Multiple Color-Depth Cameras

(S. Li, L. Sheng, K.N. Ngan)

We aim to develop a system that can deliver both 3D and free-view experiences to the end user. The system consists of a server and a client. At the server side, two color-depth cameras (Kinect) are used to capture the scene. The captured content is compressed and transmitted over the Internet to the client. At the client side, the system detects the head position of the viewer and displays the corresponding view to create the free-view experience. To enhance the feeling of immersion, the system displays the scene in 3D mode.



Full Reference Visual Quality Assessment

(S. Li, L. Ma, F. Zhang, K.N. Ngan)

The objective is to design full-reference image and video quality metrics that accurately predict human perception of visual quality. To this end, we develop a decoupling algorithm that separates spatial distortions into two categories: additive impairments and detail losses. The final quality prediction balances the influences of these two distortion types, and has been shown experimentally to correlate well with human quality ratings.



3D Dynamic Scene Reconstruction Rendering via Multiple RGBZ Cameras

(L. Sheng, S. Li, K.N. Ngan)

Several depth cameras are now able to capture medium-resolution depth at video frame rates (15 Hz to 30 Hz). With multiple RGBZ cameras, we propose to reconstruct a reliable and accurate 3D dynamic scene using two separate threads that process the foreground and the background, respectively. Since depth sensors lack accuracy, we propose an overall depth enhancement approach consisting of spatial-temporal detail enhancement and geometric completion of missing data. The estimated dynamic scene is then rendered from a novel viewpoint to produce a free-view experience.



Adaptive Transform Kernels for Image/Video Coding

(M.H. Wang, L. Xu, K.N. Ngan)

The mode-dependent directional transform (MDDT) employs the Karhunen-Loève transform (KLT) to compress the directional residue signal of intra prediction along its direction. The transform bases were derived from the singular value decomposition (SVD) of residue signals drawn from a wide variety of video sequences, and were therefore expected to be efficient for most video sequences. However, the advantage of the KLT comes from its being a "signal content dependent transform". MDDT and its variants do not exploit this property, so they do not realize the full efficiency of the KLT.

In our work, we employ a separate set of KLT bases for each category of video sequences. All video sequences in the same category are assumed to share the same content characteristics. The classification could use any criterion capable of distinguishing video content. We use the histogram of the intra prediction residues to classify the video sequences in the training set into several classes. The encoder is thus provided with multiple sets of KLT bases. During encoding, one set of KLT bases is selected for each frame through a feature matching process.
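A minimal sketch of the two steps described above, written with NumPy. It assumes residual blocks are vectorized into rows and treated as zero-mean, so the right singular vectors of the data matrix serve as the KLT basis for a class. The L1 histogram distance used for feature matching is an assumption made for illustration, not necessarily the rule used in the project, and the function names are hypothetical.

```python
import numpy as np

def klt_basis(residues):
    """Return KLT basis vectors (as rows) for one class of residues.

    residues: array of shape (num_samples, dim), one vectorized residual
    block per row, treated as zero-mean.
    """
    # The right singular vectors are the eigenvectors of residues^T residues.
    _, _, vt = np.linalg.svd(residues, full_matrices=False)
    return vt

def select_class(histogram, class_histograms):
    """Pick the class whose reference histogram is closest in L1 distance (assumed rule)."""
    dists = [np.abs(histogram - ref).sum() for ref in class_histograms]
    return int(np.argmin(dists))

# Hypothetical training residues for one class: correlated 4-sample vectors.
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.8, 0.5, 0.2],
                   [0.0, 1.0, 0.8, 0.5],
                   [0.0, 0.0, 1.0, 0.8],
                   [0.0, 0.0, 0.0, 1.0]])
train = rng.normal(size=(200, 4)) @ mixing
basis = klt_basis(train)

# Transform coefficients of the training data are decorrelated by construction.
coeffs = train @ basis.T
```

At encoding time, `select_class` would choose which precomputed basis to apply to the residues of a frame.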



Subjective and Objective Object Segmentation Quality Evaluation

(R. Shi, K.N. Ngan)

Object segmentation is a challenging and important task in computer vision. In the past decade, a large number of object segmentation methods have been proposed, yet there is still no reliable way to evaluate their performance. Traditional objective segmentation metrics use only area and distance information to measure the difference between a segmentation result and the reference (ground truth), without including any perceptual factors. In this project, we aim to design an objective object segmentation metric that is derived from perceptual information gathered through subjective experiments and that agrees with subjective evaluation.



Intelligent Surveillance System

(Q. Liu, K.N. Ngan)

A fast Head and Shoulder Detector (HSD) and face recognition based on bag-of-words are proposed to find and identify objects of interest. Human body segmentation and object video coding then process the detected objects further. These algorithms are integrated in an indoor security system that includes camera setup and calibration, object detection, object segmentation and recognition, human identification, and object coding.



Arbitrarily Shaped Object Coding Based on H.264/AVC

(Q. Liu, K.N. Ngan)

Recent video coding research has produced the latest video coding standard, H.264/AVC, whose compression performance exceeds that of previous standards by more than 50%. Compared with MPEG-4, however, the latest standard lacks the capability of coding arbitrarily shaped objects. In this work, we propose a new arbitrarily shaped object codec, comprising an encoder and a decoder, based on H.264/AVC. It adopts improved binary alpha coding with a novel motion estimation that facilitates prediction of the binary alpha blocks. For texture coding, it uses a new arbitrarily shaped integer transform derived from the 4x4 ICT in H.264, together with associated coding techniques. The codec is extended to high-definition sequences, and subjective evaluations are carried out. Experimental results demonstrate the coding efficiency and flexibility of our proposal, and potential applications are shown.





© 2012 Image and Video Processing Laboratory
All Rights Reserved