Beijing, China, March 30, 2016
Fujitsu R&D Center today announced a seal retrieval technique for images of ancient Chinese documents. By combining an advanced seal extraction technique with a high-precision, two-level hierarchical seal matching technique, we can match the seals in ancient document images against our massive seal database and obtain basic information about each seal, such as its text, its author, and its dynasty. The technique not only reveals the classical beauty of this unique intangible cultural heritage to the public, but also gives librarians and researchers a convenient and powerful tool for better understanding and managing existing ancient seal resources. We hope the technique will contribute further to the protection of ancient Chinese archives.
Background
Seals in ancient Chinese documents are often regarded as among the documents' most important information carriers. Information such as the owner of the seal, its dynastic background, and its type can be retrieved by analyzing the seal images stamped on the documents. This information provides significant support for the research and management of ancient Chinese documents.
Currently, most seal image processing, including extraction, retrieval, and analysis, is done manually. The enormous number of seals makes this work highly costly, so there is a strong need to replace manual work with automatic techniques. To address this, we developed an advanced seal extraction technique that helped build a massive seal database from ancient document images. We also designed a high-performance seal retrieval technique for matching a given seal against that database.
Topics
Traditional seal extraction approaches usually rely on the shape and color of the seal. They work well on clear, modern document images. However, most ancient Chinese documents are seriously degraded, and the seals on them come in a variety of shapes, so the traditional methods are not appropriate. To solve this problem, we treat a seal as an enclosed region with dense red strokes inside. Based on this characteristic, we designed a robust seal extraction method that can handle ancient document images with complex backgrounds.
Technology
1. Seal extraction based on SSR (Stroke Stable Region)
In most ancient Chinese documents, a seal appears as an enclosed region with dense red strokes inside. We first apply a color space transform, using a color space based on human vision, to enhance the red regions of the image, as shown in Figure 1. We then use the SSR technique to extract the regions whose stroke features are stable and can therefore be regarded as red strokes (see Figure 2). By analyzing the closure property of the extracted red strokes, the seal region can be located precisely.
Once the seal has been located, we apply a color separation technique to split the seal from the background and obtain a clear seal image (illustrated in Figure 3).
Figure 1: Using human vision-based color space to enhance red region
Figure 2: Applying SSR technique to extract the region with stable stroke feature
Figure 3: Locating the seal and splitting it from the background by the color separation technique
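The first and last steps of the extraction pipeline (enhancing red regions, then locating the seal) can be illustrated with a minimal Python sketch. It rests on several assumptions that do not come from the announcement: HSV stands in for the unnamed color space based on human vision, the threshold values are arbitrary, and the function names red_mask and bounding_box are hypothetical. The SSR stroke analysis and the closure check are not reproduced here; a plain bounding box of the red pixels takes their place.

```python
import colorsys

def red_mask(image, sat_min=0.4, val_min=0.3, hue_tol=0.05):
    """Return a binary mask marking pixels whose HSV colour is close to red.

    `image` is a list of rows of (r, g, b) tuples in the range 0..255.
    HSV and all thresholds are assumptions made for this illustration.
    """
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            # Red sits at both ends of the hue circle (near 0.0 and near 1.0).
            near_red_hue = h <= hue_tol or h >= 1.0 - hue_tol
            mask_row.append(1 if near_red_hue and s >= sat_min and v >= val_min else 0)
        mask.append(mask_row)
    return mask

def bounding_box(mask):
    """Return (top, left, bottom, right) of the marked pixels, or None if there are none."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]

# Toy 5x5 "page": yellowed paper, a hollow red square (the seal), one dark ink pixel.
PAPER, RED, INK = (230, 220, 190), (200, 30, 30), (40, 40, 40)
page = [[PAPER] * 5 for _ in range(5)]
for y in range(1, 4):
    for x in range(1, 4):
        if (y, x) != (2, 2):
            page[y][x] = RED
page[0][4] = INK

mask = red_mask(page)
box = bounding_box(mask)
print(box)
```

On this toy page the paper and the ink both fail the saturation test, so only the red frame survives in the mask. A real degraded document would need the stroke and closure analysis described above to reject red stains and faded regions.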
2. Two-level hierarchical matching
Seals in ancient China came in a large variety of styles and contents. They can be square, rectangular, circular, or even irregular in shape. Traditional Optical Character Recognition (OCR) techniques face challenging obstacles here, such as the segmentation of connected characters and the lack of training samples. We developed a two-level hierarchical seal image matching technology that adaptively selects a matching strategy according to the query sample. The technology treats the whole seal image as the matching item, which avoids the difficult character segmentation problem. For simple seals with clear strokes, we calculate global stroke distribution features to achieve fast matching. For difficult seal images, such as those affected by inaccurate detection or severe overlap with text, we apply local feature matching to find the correct match accurately within a small set of candidates.
Figure 4: Two-level hierarchical matching
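The two-level idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the matching method used by Fujitsu R&D Center: the global feature is a simple grid of stroke densities, the local stage is a pixel-wise comparison standing in for real local feature matching, and the names grid_density, pixel_mismatch, and match, the shortlist size k, and the accept threshold are all hypothetical.

```python
def to_img(rows):
    """Turn strings of '0'/'1' into a binary image (list of lists of ints)."""
    return [[int(c) for c in row] for row in rows]

def grid_density(img, grid=2):
    """Global feature: fraction of stroke pixels in each cell of a grid x grid layout."""
    h, w = len(img), len(img[0])
    feats = []
    for gy in range(grid):
        for gx in range(grid):
            ys = range(gy * h // grid, (gy + 1) * h // grid)
            xs = range(gx * w // grid, (gx + 1) * w // grid)
            total = sum(img[y][x] for y in ys for x in xs)
            feats.append(total / (len(ys) * len(xs)))
    return feats

def l1(a, b):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))

def pixel_mismatch(a, b):
    """Stand-in for local matching: fraction of pixels that differ (same-size images)."""
    h, w = len(a), len(a[0])
    return sum(a[y][x] != b[y][x] for y in range(h) for x in range(w)) / (h * w)

def match(query, database, k=2, accept=0.05):
    """Return (name, level). Level 1 ranks by global features; level 2 refines the top k."""
    q = grid_density(query)
    ranked = sorted(database, key=lambda name: l1(q, grid_density(database[name])))
    best = ranked[0]
    if l1(q, grid_density(database[best])) <= accept:
        return best, "global"   # clear query: the fast global match is trusted
    shortlist = ranked[:k]      # difficult query: compare in detail, but only k candidates
    return min(shortlist, key=lambda name: pixel_mismatch(query, database[name])), "local"

# Toy database of 4x4 seals. "frame" and "cross" share the same global feature.
database = {
    "frame": to_img(["1111", "1001", "1001", "1111"]),
    "cross": to_img(["0110", "1111", "1111", "0110"]),
    "bar":   to_img(["1111", "1111", "0000", "0000"]),
}

clean_query = to_img(["1111", "1111", "0000", "0000"])  # exact copy of "bar"
noisy_query = to_img(["0110", "1111", "1111", "0111"])  # "cross" plus one stray pixel

print(match(clean_query, database))
print(match(noisy_query, database))
```

The toy data is chosen so that "frame" and "cross" have identical grid densities. The global stage alone cannot separate them for the noisy query, so the decision falls to the detailed comparison over the two shortlisted candidates, which is the role the local stage plays in the description above.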
3. Massive seal database establishment
To include most of the famous and classical seals from ancient documents, we expanded our cooperation with Chinese libraries and universities. Using the techniques described above, we established a massive seal database with over 50,000 seal samples.
Future Plan
Fujitsu R&D Center will expand local trials of our seal retrieval technique with major Chinese libraries and museums. We will improve our technology based on customer feedback and enlarge our database to provide more robust seal analysis services.
All company or product names mentioned herein are trademarks or registered trademarks of their respective owners. Information provided in this press release is accurate at time of publication and is subject to change without advance notice.