Archived content

NOTE: this is an archived page and the content is likely to be out of date.

Seal Retrieval Technique for Chinese Ancient Document Images

Fujitsu Research & Development Center Co. Ltd.

Beijing, China, March 30, 2016

Fujitsu R&D Center today announced a seal retrieval technique for images of ancient Chinese documents. By combining an advanced seal extraction technique with a high-precision, two-level hierarchical seal matching technique, we can look up the seals found in ancient document images in our massive seal database and obtain basic information about each seal, such as its text, its author, and its dynasty. The technique not only reveals the classical beauty of this unique intangible cultural heritage to the public, but also gives librarians and researchers a convenient and powerful tool for understanding and managing existing ancient seal resources. We hope the technique will contribute further to the protection of ancient Chinese archives.

Background

Seals in ancient Chinese documents are often regarded as among the documents' most important information carriers. Information such as the owner of the seal, its dynasty, and its type can be retrieved by analyzing the seal images stamped on the documents. This information provides significant support for the research and management of ancient Chinese documents.

Currently, most seal image processing, such as extraction, retrieval, and analysis, is done manually. The enormous number of seals makes this work very costly, so there is a strong need to replace the manual work with automatic techniques. To meet this need, we developed an advanced seal extraction technique, which helped us build a massive seal database from ancient document images. We also designed a high-performance seal retrieval technique for matching a given seal against that database.

Topics

Traditional seal extraction approaches usually rely on the shape and color of the seal. They work well on clear, modern document images. However, most ancient Chinese documents are seriously degraded, and the seals on them come in various shapes, so the traditional methods are not appropriate. To solve this problem, we treat a seal as an enclosed region with dense red strokes inside. Based on this characteristic, we designed a robust seal extraction method that is capable of handling ancient document images with complex backgrounds.

Technology

1. Seal extraction based on SSR (Stroke Stable Region)

In most ancient Chinese documents, a seal appears as an enclosed region with dense red strokes inside. We first enhance the red regions of the image by transforming it into a human vision-based color space, as shown in Figure 1. We then use the SSR technique to extract the regions with stable stroke features, which can be regarded as red strokes (see Figure 2). By analyzing the closure property of the extracted red strokes, we can locate the seal region precisely.

Once the seal is located, we apply a color separation technique to split the seal from the background image and obtain a clear seal image (illustrated in Figure 3).

Figure 1: Using human vision-based color space to enhance red region

Figure 2: Applying SSR technique to extract the region with stable stroke feature

Figure 3: Locating the seal and splitting it from the background by color separation technique
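
The first and last steps of this pipeline (enhancing red, then locating the seal region) can be illustrated with a minimal Python sketch. It is not the SSR technique, whose details are not given in this release. As stand-ins, it assumes HSV as the "human vision-based" color space, a fixed threshold in place of stroke-stability analysis, and a bounding box in place of closure analysis. The names `redness`, `red_mask`, `bounding_box`, and both constants are hypothetical.

```python
import colorsys

# Both constants are illustrative assumptions, not values from the release.
RED_HUE_TOLERANCE = 0.08  # how far (on a 0..1 hue circle) a hue may sit from pure red
RED_THRESHOLD = 0.35      # minimum redness score for a pixel to count as seal ink


def redness(r, g, b):
    """Score in [0, 1]: high for saturated, bright pixels whose hue is near red."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # Hue is circular and red sits at 0.0 (== 1.0), so measure distance both ways.
    hue_dist = min(h, 1.0 - h)
    hue_weight = max(0.0, 1.0 - hue_dist / RED_HUE_TOLERANCE)
    return hue_weight * s * v


def red_mask(image):
    """image: rows of (r, g, b) tuples. Returns rows of booleans (True = red ink)."""
    return [[redness(*px) >= RED_THRESHOLD for px in row] for row in image]


def bounding_box(mask):
    """Smallest (left, top, right, bottom) box containing every True pixel, or None."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, on in enumerate(row) if on]
    if not rows:
        return None
    return (min(cols), min(rows), max(cols), max(rows))


if __name__ == "__main__":
    # Tiny synthetic page: yellowed paper, one black ink pixel, a 3x3 red seal.
    page = [[(230, 220, 190)] * 8 for _ in range(6)]
    page[0][1] = (30, 30, 30)
    for y in range(2, 5):
        for x in range(3, 6):
            page[y][x] = (200, 30, 30)
    print(bounding_box(red_mask(page)))
```

On a real scan, a fixed threshold like this would likely be too brittle for faded or stained pages, which is presumably why the release describes a stroke-stability criterion instead.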

2. Two-level hierarchical matching

Seals in ancient China came in a large variety of styles and contents. They can be square, rectangular, circular, or even irregular in shape. Traditional Optical Character Recognition (OCR) techniques face difficult obstacles here, such as segmenting connected characters and a lack of training samples. We developed a two-level hierarchical seal image matching technology that adaptively selects a matching strategy according to the query sample. The technology treats the whole seal image as the matching item, which avoids the difficult character segmentation problem. For simple seals with clear strokes, we calculate global stroke distribution features to achieve fast matching. For difficult seal images, such as those that were detected inaccurately or that overlap heavily with text, we apply local feature matching to find the correct match accurately within a small set of candidates.

Figure 4: Two-level hierarchical matching
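
The coarse-to-fine idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the matching algorithm used in the product: the global feature here is the stroke density in each cell of a grid, the "local" stage is a plain pixel overlap score standing in for real local feature matching, and the rule for escalating to the second level (a small gap between the two best global distances) is a guess. All function names and the `top_k` and `margin` parameters are hypothetical.

```python
def grid_density(img, cells=2):
    """Global feature: fraction of stroke pixels in each cell of a cells x cells grid."""
    h, w = len(img), len(img[0])
    feats = []
    for gy in range(cells):
        for gx in range(cells):
            ys = range(gy * h // cells, (gy + 1) * h // cells)
            xs = range(gx * w // cells, (gx + 1) * w // cells)
            ink = sum(img[y][x] for y in ys for x in xs)
            feats.append(ink / (len(ys) * len(xs)))
    return feats


def l1(a, b):
    """L1 distance between two feature vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def overlap(a, b):
    """Fine comparison (stand-in for local features): IoU of stroke pixels."""
    pairs = [(p, q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    union = sum(p | q for p, q in pairs)
    return sum(p & q for p, q in pairs) / union if union else 0.0


def match(query, database, top_k=3, margin=0.1):
    """Return (best_name, level). Images are equal-sized rows of 0/1 ints."""
    qf = grid_density(query)
    dist = {name: l1(qf, grid_density(img)) for name, img in database.items()}
    ranked = sorted(dist, key=dist.get)
    # Level 1: accept the global result when it clearly beats the runner-up.
    if len(ranked) == 1 or dist[ranked[1]] - dist[ranked[0]] >= margin:
        return ranked[0], "global"
    # Level 2: re-rank only a small candidate set with the finer comparison.
    candidates = ranked[:top_k]
    return max(candidates, key=lambda n: overlap(query, database[n])), "local"
```

The design point this illustrates is cost: the global feature is cheap enough to compare against every database entry, while the expensive comparison runs only on the few candidates that survive the first level.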

3. Massive seal database establishment

To include most of the famous and classical seals from ancient documents, we expanded our cooperation with Chinese libraries and universities. Using the techniques described above, we established a massive seal database with over 50,000 seal samples.

Future Plan

Fujitsu R&D Center will expand local trials of our seal retrieval technique with major Chinese libraries and museums. We will improve our technology based on customer feedback and enlarge our database to provide more robust seal analysis services.

About Fujitsu

Fujitsu is the leading Japanese information and communication technology (ICT) company, offering a full range of technology products, solutions, and services. Approximately 159,000 Fujitsu people support customers in more than 100 countries. We use our experience and the power of ICT to shape the future of society with our customers. Fujitsu Limited (TSE: 6702) reported consolidated revenues of 4.8 trillion yen (US$40 billion) for the fiscal year ended March 31, 2015.
For more information, please see http://www.fujitsu.com.

About Fujitsu Research and Development Center

Established in 1998, Fujitsu Research and Development Center Co., Ltd. is a wholly owned R&D center of Fujitsu Limited, located in Beijing. The center's research areas cover the major business fields of the Fujitsu Group, including information processing, telecommunications, semiconductors, and software and services. For more information, please see: http://www.fujitsu.com/cn/frdc/en/

Contacts

E-mail: itl-ocr@cn.fujitsu.com
Company: Fujitsu R&D Center Co., Ltd.


All company or product names mentioned herein are trademarks or registered trademarks of their respective owners. Information provided in this press release is accurate at time of publication and is subject to change without advance notice.

Press Release ID: 2016-03-30
Date: 30 March, 2016
City: Beijing, China
Company: Fujitsu Research and Development Center Co., Ltd.