Archived content

NOTE: this is an archived page and the content is likely to be out of date.

Integrated Multi-Language Patent Retrieval System Successfully Developed: Easy Multi-Language Retrieval and Reading of Patents

Using a single language, you can easily retrieve and read patents written in multiple languages.

Fujitsu Research and Development Center Co., Ltd.

Beijing, August 01, 2012

Fujitsu Limited, Fujitsu Research Laboratories Ltd and Fujitsu R&D Center Co. Ltd have successfully developed an integrated multi-language patent retrieval system. With this system, a user can enter keywords in Chinese, English or Japanese and run a single, integrated patent search across all three languages. The system also has a built-in patent translation function and returns the retrieved information in the language the user entered, which makes it much easier for users to find and understand patent contents. The machine translation module, which is the kernel module of the system, outperforms similar products from other companies.

【Development Background】

Patent quantity and quality are two key indicators of a country's innovation capability and industrial potential. America, Japan and China rank in the top three for patent applications, and together they accounted for over 50% of global patent applications in 2010. Enterprises in China in particular are paying close attention to patent applications. The number of patent applications continues to increase every year. Under these circumstances, every country needs to keep track of patent applications, and the enterprises and individuals concerned need to keep track of published patents.

How can related patent references from different countries be retrieved easily and quickly? How can one learn which patents have been published by different countries and enterprises? How can one understand the specific contents of patent references written in different languages? To answer these questions, the Fujitsu R&D Center developed the integrated multi-language patent retrieval system. The system currently supports retrieval of Chinese, English and Japanese patents, and the same technology can be extended to other languages.

【Solution】

To enable different readers to view patent references written in different languages, Fujitsu R&D Center developed a machine translation system for patent translation. This system can translate between Chinese, Japanese and English. The translation precision is 80.7% for Chinese-English translation and 67.6% for Chinese-Japanese translation, which is sufficient for the common retrieval and analysis needs of professionals.

When a user searches for patents from China, Japan, Europe and America in any one of Chinese, Japanese and English, the system automatically translates the retrieval request into the other two languages, searches the patent databases of all three languages in a unified way, and returns the patents to the user in the language the user entered. The system can also summarize and analyze the retrieved results and present the analysis visually.
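The retrieval flow described above can be illustrated with a minimal Python sketch. It is not Fujitsu's implementation: the glossary-based `toy_translate` function, the `INDEXES` data and the substring matching are all hypothetical stand-ins for the real machine translation module and patent databases.

```python
from typing import Callable, Dict, List, Tuple

LANGS = ("zh", "en", "ja")

# Hypothetical one-entry glossary standing in for a real MT engine.
GLOSSARY = [{"zh": "电池", "en": "battery", "ja": "電池"}]

# Hypothetical per-language "patent databases" (one title per patent).
INDEXES: Dict[str, List[str]] = {
    "zh": ["电池 管理 系统"],
    "en": ["battery cooling device", "antenna array"],
    "ja": ["電池 パック"],
}


def toy_translate(text: str, src: str, tgt: str) -> str:
    """Replace known glossary terms; everything else is left untranslated."""
    for entry in GLOSSARY:
        text = text.replace(entry[src], entry[tgt])
    return text


def cross_language_search(
    query: str,
    query_lang: str,
    indexes: Dict[str, List[str]],
    translate: Callable[[str, str, str], str] = toy_translate,
) -> List[Tuple[str, str]]:
    """Search every language index and return hits in the query language."""
    hits = []
    for lang in LANGS:
        # Step 1: translate the request into the language of this database.
        q = query if lang == query_lang else translate(query, query_lang, lang)
        # Step 2: search that database (naive substring match here).
        for doc in indexes[lang]:
            if q in doc:
                # Step 3: present the hit in the language the user entered.
                shown = doc if lang == query_lang else translate(doc, lang, query_lang)
                hits.append((lang, shown))
    return hits
```

Calling `cross_language_search("battery", "en", INDEXES)` returns one hit from each of the three toy databases, each paired with the language of the database it came from.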

【Development Technology】

The kernel module of the system is the machine translation module. We developed a machine translation technology that combines rules and statistics to translate patents between the different languages. The key technical points are as follows:

1) Rule-based conversion technology

Patent references feature long, highly structured sentences, which a common machine translation system handles poorly. Converting a long sentence into multiple short sentences before translation can greatly improve translation quality. The rule-based conversion technology uses the structural features of patent references to convert long sentences into shorter ones and then feeds the shorter sentences into the machine translation system. This technology greatly improves the readability of the translation.

2) Statistical machine translation model based on hierarchical phrases
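As an illustration of the idea, the following Python sketch splits a long English claim sentence before translation. The splitting rules (break at semicolons and before "wherein") and the `max_words` threshold are assumptions chosen for the example; the actual conversion rules used in the system are not described in this release.

```python
import re
from typing import Callable, List

# Assumed structural cues: a semicolon (optionally followed by "and"),
# or a comma that introduces a "wherein" clause.
SPLIT_PATTERN = re.compile(r"\s*;\s*(?:and\s+)?|,\s*(?=wherein\b)")


def split_long_sentence(sentence: str, max_words: int = 12) -> List[str]:
    """Return the sentence unchanged if short, otherwise its segments."""
    if len(sentence.split()) <= max_words:
        return [sentence.strip()]
    parts = [p.strip() for p in SPLIT_PATTERN.split(sentence)]
    return [p for p in parts if p]


def translate_by_segments(sentence: str, translate: Callable[[str], str]) -> List[str]:
    """Translate each short segment separately with the given MT function."""
    return [translate(segment) for segment in split_long_sentence(sentence)]
```

For a claim such as "A device comprising a sensor that measures temperature; a controller that receives the measured temperature; and a fan driven by the controller, wherein the fan speed depends on the temperature.", `split_long_sentence` yields four short segments that can be translated one at a time.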

The popular statistical machine translation models today include the phrase-based model, the hierarchical phrase-based model and the syntax-based model. The phrase-based model is mature and widely applied, but its translation capability is limited. The syntax-based model is not practical because it translates slowly and has high system requirements. The hierarchical phrase-based model balances translation quality against translation speed and supports hierarchical phrases, so it can effectively solve the long-distance reordering problem in machine translation.

3) Machine translation system based on dependency tree information
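To show how a hierarchical phrase handles reordering, here is a toy Python sketch. A hierarchical rule contains gaps (X1, X2) that are themselves filled by translated phrases, so one rule can move a phrase of any length. The phrase table, the single rule and the unscored, first-match search are hypothetical simplifications; a real hierarchical phrase-based decoder learns weighted rules from data and searches for the best-scoring derivation.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical phrase table (source tokens -> target phrase).
PHRASES: Dict[Tuple[str, ...], str] = {
    ("专利",): "patent",
    ("检索",): "retrieval",
    ("系统",): "system",
    ("性能",): "performance",
}

# Hypothetical hierarchical rule "X1 的 X2" -> "the X2 of the X1".
# Each entry is (source terminal between the two gaps, target template).
HIER_RULES: List[Tuple[str, str]] = [("的", "the {x2} of the {x1}")]


def translate(tokens: List[str]) -> Optional[str]:
    """Translate a token list, or return None if no derivation exists."""
    if tuple(tokens) in PHRASES:
        return PHRASES[tuple(tokens)]
    # Hierarchical rules: translate both gaps recursively, then reorder.
    for terminal, template in HIER_RULES:
        for i in range(1, len(tokens) - 1):
            if tokens[i] == terminal:
                x1, x2 = translate(tokens[:i]), translate(tokens[i + 1:])
                if x1 is not None and x2 is not None:
                    return template.format(x1=x1, x2=x2)
    # Glue rule: otherwise concatenate two sub-translations in source order.
    for i in range(1, len(tokens)):
        left, right = translate(tokens[:i]), translate(tokens[i:])
        if left is not None and right is not None:
            return left + " " + right
    return None
```

With this toy grammar, `translate("专利 检索 系统 的 性能".split())` moves "performance" in front of the whole three-word modifier and produces "the performance of the patent retrieval system".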

As machine translation technology continues to improve, different kinds of syntactic knowledge are gradually being introduced into machine translation to improve its performance. This system uses dependency tree information to improve the machine translation system in two ways. First, it uses the dependency tree information to filter the translation rules, which reduces the number of rules and improves translation speed. Second, it uses the dependency tree structure to adjust the order of syntactic blocks, which improves translation quality.
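The first point, rule filtering, can be sketched as follows. The release does not state which filtering criterion the system uses, so the criterion here is an assumption made for illustration: a rule is kept only if its source span has exactly one word whose head lies outside the span. The example sentence and its `HEADS` array are also hypothetical.

```python
from typing import List, Tuple

# Hypothetical parse of "the system translates long sentences".
# HEADS[k] is the index of the head of token k; -1 marks the root.
TOKENS = ["the", "system", "translates", "long", "sentences"]
HEADS = [1, 2, -1, 4, 2]


def is_single_headed(heads: List[int], start: int, end: int) -> bool:
    """True if exactly one token in [start, end) has its head outside the span."""
    external = [k for k in range(start, end) if not (start <= heads[k] < end)]
    return len(external) == 1


def filter_rule_spans(spans: List[Tuple[int, int]], heads: List[int]) -> List[Tuple[int, int]]:
    """Keep only the candidate rule spans that pass the dependency check."""
    return [(s, e) for (s, e) in spans if is_single_headed(heads, s, e)]
```

Here `filter_rule_spans([(0, 2), (2, 4), (3, 5)], HEADS)` keeps "the system" and "long sentences" but discards "translates long", whose two words both attach to words outside the span. Discarding such spans shrinks the rule table, which is the kind of speed benefit the paragraph above describes.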

【Main achievements】

The R&D of the cross-language patent retrieval system has produced the following achievements:

Patent applications: 11

Published papers: multiple papers published at top international conferences such as the Association for Computational Linguistics (ACL) and the International Conference on Computational Linguistics (COLING).

Competition awards:

Ranked first among all participating teams from China in the Chinese-English translation competition held by the National Institute of Standards and Technology (NIST 2009).

Ranked first among all participating teams from China in the patent Japanese-English translation competition held by NII Test Collection for IR Systems (NTCIR 2011).

【Future】

Machine translation is the kernel technology of the system. We will remain dedicated to improving machine translation quality and providing a better user experience. Patent data is the foundational resource of the system. We will continue to collect and organize patent data to improve the data coverage of our system. We will also further improve retrieval speed in the future.

Information Technology Research Division

Phone: 010-5969-1000
E-mail: mengyao@cn.fujitsu.com
Website: http://www.fujitsu.com/cn/about/local/subsidiaries/frdc/
Company: Fujitsu R&D Center Co. Ltd

Press Release ID: August 1st, 2012
Date: 01 August, 2012
City: Beijing
Company: Fujitsu R&D Center Co., Ltd