Fujitsu Develops Technology that Identifies Applicable Areas from Within Materials Being Discussed

Speaker's voice is linked to a material's content in real time with high accuracy

Fujitsu Laboratories Ltd.

Kawasaki, Japan, April 01, 2015

Fujitsu Laboratories Ltd. today announced that it has developed technology that, based on a speaker's voice, detects in real time and with high accuracy the applicable area in presentation or remote-conference materials.

For meeting materials, product pamphlets, and other presentation materials, providing supplementary information and displaying a section as it is being discussed by the presenter is effective in promoting understanding of the speaker's explanation. To achieve this, the system must identify almost instantly which place in the materials is being explained. However, detecting the correct place with high precision after just a few words has proved difficult.

Fujitsu has developed technology that compares spoken words against the content of the presentation materials, and uses characteristics of the presentation's sequence based on statistical calculations to filter candidate sections of the presentation materials, in order to accurately identify the correct section in real time, based on only a few spoken words. When tested in a prototype system designed to automatically highlight the correct place in presentation materials, this technology was found to detect the correct section with 97% accuracy.

It is expected that this technology can be used to create a communication-support system that uses ICT to recognize the content of speech and provide appropriate information in a broad range of settings where information is explained, such as teleconferences, electronic educational materials, and consultations with customers in stores.

Background

Business communications often rely on shared materials: pamphlets used for product explanations, agendas that meetings follow, or slides that are shared with participants during talks. Given this, speakers need to communicate so that listeners understand quickly, clearly, and easily.

To improve the efficiency of such work-related communications, Fujitsu has developed a communication-support system for discussions involving text materials. The system uses speech-recognition technology to recognize what is being said in real time and provide the appropriate information (Figure 1).

Figure 1: Applications of the communication-support system using shared materials

Technological Issues

Commonly, the place being discussed is identified by how frequently the spoken words appear in each part of the presentation materials. This method relies on techniques such as detecting words in recorded speech, and it is effective when enough words can be extracted. However, it is not suited to real-time identification of the correct section when the presenter has spoken only a few words, because too few words are available for word frequencies to distinguish one section from another. Also, with current speech-recognition technologies, a misrecognition rate of up to 10% is unavoidable. As a result, with inferences based on just a few words, errors in recognition have a significant impact on accuracy.
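The limitation of the frequency-based approach can be illustrated with a minimal sketch. The function name `score_sections` and the sample section texts are hypothetical and are not taken from Fujitsu's system; the sketch only shows why a single recognized word can leave two sections tied, while a longer utterance separates them.

```python
from collections import Counter


def score_sections(spoken_words, sections):
    """Count, for each section, how often the spoken words occur in its text."""
    scores = {}
    for name, text in sections.items():
        counts = Counter(text.lower().split())
        scores[name] = sum(counts[word.lower()] for word in spoken_words)
    return scores


# Hypothetical materials with two sections that share the word "plans".
sections = {
    "pricing": "monthly price and annual price plans",
    "support": "support hours and contact plans",
}

# One recognized word: both sections score the same, so no decision is possible.
print(score_sections(["plans"], sections))

# Three recognized words: the pricing section clearly scores higher.
print(score_sections(["annual", "price", "plans"], sections))
```

With only one word available, a single misrecognition would also change the result entirely, which is the accuracy problem described above.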

About the Technology

Fujitsu has developed technology that compares what a speaker is saying with text materials and accurately detects the place being explained within the materials in real time, as the words are being spoken.

Features of the technology are as follows:

1. Automatically generates a speech-recognition dictionary to avoid recognition errors (Figure 2)

A challenge in speech recognition is that many short words have similar pronunciation, which increases the likelihood of errors in recognition. Fujitsu solved this problem by combining these short words with the words located in their immediate proximity and storing them in a speech-recognition dictionary as single words. This reduced recognition errors by roughly 60% compared to previous technologies.

Figure 2: Automatically generating the speech-recognition dictionary
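The idea of merging short words with their neighbors can be sketched as follows. This is an illustration under stated assumptions, not Fujitsu's implementation: the function name `build_dictionary`, the use of character count as a stand-in for pronunciation length, the `short_len` threshold, and the choice to merge with the following word are all hypothetical.

```python
def build_dictionary(text, short_len=2):
    """Merge each short word with the word that follows it into one entry.

    Assumption: a word of at most `short_len` characters counts as "short".
    A real system would more likely judge length by pronunciation.
    """
    words = text.lower().split()
    entries = []
    i = 0
    while i < len(words):
        word = words[i]
        if len(word) <= short_len and i + 1 < len(words):
            # Store the short word and its neighbor as a single dictionary entry.
            entries.append(word + " " + words[i + 1])
            i += 2
        else:
            entries.append(word)
            i += 1
    return entries


print(build_dictionary("cost of ownership is low"))
```

Because the merged entries are longer, they are less likely to be confused with one another than the short words would be on their own.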

2. Increases detection accuracy with characteristics of statistically generated explanatory sequences (Figure 3)

Statistical analysis of the relationship between the sequence of a spoken presentation and the materials' structural information, including layout, paragraphing, and location of explanations, showed that once a place in the materials is more than a certain distance from the content currently being discussed, the frequency with which the spoken presentation transitions to that place drops precipitously. Using this sequential characteristic together with the frequency of words contained in a given part of the spoken presentation, the technology filters the candidates for the next part of the presentation. It can therefore accurately infer which part of the materials corresponds to the spoken presentation, even when only a few spoken words have been recognized.

Figure 3: How characteristics of presentation sequence and word frequency are used to infer spot in presentation
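A rough sketch of how a sequence characteristic and word frequency might be combined is shown below. The release does not describe the actual statistical model, so everything here is an assumption for illustration: the names `transition_weight` and `infer_section`, the step-shaped weighting, and the values of `max_jump` and `far_weight` are hypothetical.

```python
def transition_weight(current, candidate, max_jump=2, far_weight=0.05):
    """Assumed prior: nearby sections are likely next, distant ones rarely are."""
    return 1.0 if abs(candidate - current) <= max_jump else far_weight


def infer_section(spoken_words, sections, current):
    """Return the index of the section with the best weighted word-match score."""
    best_index, best_score = current, 0.0
    for index, text in enumerate(sections):
        words = text.lower().split()
        matches = sum(words.count(word.lower()) for word in spoken_words)
        score = matches * transition_weight(current, index)
        if score > best_score:
            best_index, best_score = index, score
    return best_index


# Hypothetical materials: sections 1 and 5 both mention "price".
sections = [
    "overview of the product",
    "price plans for small teams",
    "support plans and hours",
    "security features",
    "roadmap",
    "appendix price plans archive",
]

# Starting from section 0, the nearby section 1 is chosen over the distant section 5.
print(infer_section(["price"], sections, current=0))
```

On word matches alone, sections 1 and 5 would tie for the single word "price". The distance-based weighting is what breaks the tie, which mirrors how the filtering described above allows a decision after only a few recognized words.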

Results

Applying the developed technology, Fujitsu prototyped and evaluated an "automatic pointing system" that highlights the section of the materials corresponding to the spoken explanation, for use with shared slide materials in a teleconference (Figure 4). When, for example, the system was configured to display the highlighted information within roughly two seconds from the start of an explanation, this technology boosted detection accuracy to 97%, up from the previous 70%.

When evaluated in comparison to existing pointing methods, such as using a mouse cursor, this technology was found to increase ease of understanding by 30% and cut bothersome display issues in half, demonstrating its usefulness as a communication-support system for remote conferences.

Figure 4: The automatic pointing system being used in a remote conference

Future Plans

Fujitsu aims to have a practical implementation of this technology in a remote communications-support system within 2015. In addition, when combined with the company's sightline-detection technology and translation technology, this technology has a broad range of potential applications to help businesses run more efficiently, such as giving support to operators in call centers by providing information related to frequently asked questions, or providing information-desk support or educational support.

About Fujitsu

Fujitsu is the leading Japanese information and communication technology (ICT) company offering a full range of technology products, solutions and services. Approximately 162,000 Fujitsu people support customers in more than 100 countries. We use our experience and the power of ICT to shape the future of society with our customers. Fujitsu Limited (TSE: 6702) reported consolidated revenues of 4.8 trillion yen (US$46 billion) for the fiscal year ended March 31, 2014. For more information, please see http://www.fujitsu.com.

About Fujitsu Laboratories

Founded in 1968 as a wholly owned subsidiary of Fujitsu Limited, Fujitsu Laboratories Ltd. is one of the premier research centers in the world. With a global network of laboratories in Japan, China, the United States and Europe, the organization conducts a wide range of basic and applied research in the areas of Next-generation Services, Computer Servers, Networks, Electronic Devices and Advanced Materials. For more information, please see: http://www.fujitsu.com/jp/group/labs/en/.

Press Contacts

Public and Investor Relations Division
Inquiries

Company: Fujitsu Limited

Technical Contacts

Media Processing Laboratory

E-mail: spsol-inquiry@ml.labs.fujitsu.com
Company: Fujitsu Laboratories Ltd.

All company or product names mentioned herein are trademarks or registered trademarks of their respective owners. Information provided in this press release is accurate at time of publication and is subject to change without advance notice.

