A multistage predictive coding approach to unsupervised text detection in video images

Eliza Yingzi Du, Paul D. Thouin, Chein I. Chang

Research output: Contribution to journal › Conference article › peer-review

2 Citations (Scopus)

Abstract

Text detection in video images is a challenging research problem because of the poor spatial resolution and the complex backgrounds, which may contain a variety of colors. This paper presents a multistage predictive coding scheme, referred to as Multistage Pulse Code Modulation (MPCM), which can be used to effectively detect text in color video frames. It converts a video image to a coded image with each pixel encoded by a priority code ranging from 7 down to 0. A priority code "7" retains the most significant information while a priority code "0" represents the least significant information which can be dropped without loss of much information. Using the global mean of the coded image as a threshold value, a set of potential text regions can be detected from each video frame. A series of spatial filters is then implemented in order to eliminate regions that are unlikely to contain text. As a final step, we eliminate those potential text regions where Optical Character Recognition (OCR) produces no results. An extensive set of experiments demonstrates that our proposed MPCM-based text detection technique is effective in detecting text in a wide variety of video images.
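
The thresholding step described in the abstract can be illustrated with a minimal sketch. It assumes the coded image (priority codes 0 to 7) has already been produced; the MPCM encoding itself is not specified in the abstract and is not reproduced here. The function name `threshold_by_global_mean`, the sample data, and the choice of a strict "greater than" comparison are all assumptions made for illustration, not the authors' implementation.

```python
def threshold_by_global_mean(coded):
    """Flag pixels whose priority code exceeds the global mean of the coded image.

    `coded` is a list of rows of integer priority codes in 0..7.
    Assumption: candidate text pixels are those strictly above the mean.
    """
    pixels = [code for row in coded for code in row]
    mean = sum(pixels) / len(pixels)
    mask = [[code > mean for code in row] for row in coded]
    return mean, mask


# Hypothetical 4x6 coded image: a block of high-priority codes
# surrounded by low-priority background codes.
coded_image = [
    [0, 1, 0, 0, 1, 0],
    [0, 7, 6, 7, 1, 0],
    [1, 6, 7, 6, 0, 0],
    [0, 0, 1, 0, 0, 1],
]

mean, mask = threshold_by_global_mean(coded_image)
for row in mask:
    # Print '#' for candidate pixels and '.' for background.
    print("".join("#" if flagged else "." for flagged in row))
```

In a full pipeline, the flagged pixels would then be grouped into regions and passed to the spatial filters and the OCR check that the abstract mentions; those stages are omitted here.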

Original language: English
Pages (from-to): 12-19
Number of pages: 8
Journal: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 4670
Publication status: Published - 2002
Event: Documentation Recognition and Retrieval IX - San Jose, CA, United States
Duration: 2002 Jan 21 to 2002 Jan 22

All Science Journal Classification (ASJC) codes

  • Electronic, Optical and Magnetic Materials
  • Condensed Matter Physics
  • Computer Science Applications
  • Applied Mathematics
  • Electrical and Electronic Engineering
