NOTE: this is an archived page and the content is likely to be out of date.

Fujitsu Technology to Elicit New Insights from Graph Data that Expresses Ties between People and Things

Machine learning technology that acquires knowledge surpasses the limits of conventional deep learning; effectiveness verified in IoT, finance, and pharmaceutical fields

Fujitsu Laboratories Ltd.

Kawasaki, Japan, October 20, 2016

Fujitsu Laboratories Ltd. today announced the development of machine learning technology that enables highly accurate analysis of graph-structured data that expresses the relationships between people and things.

Fujitsu Laboratories has now developed new technology that allows existing deep learning technology, which has already achieved extremely high accuracy in image and voice recognition, to be applied to graph-structured data. Graph-structured data has a complicated structure and mixes data of varying sizes and methods of expression. By transforming these different data to a uniform expression called a "tensor"(1), used in cutting-edge mathematics, it becomes possible to do highly accurate machine learning on graph-structured data using deep learning technology.

This technology was used to learn the structures and activities of chemical compounds, based on data from the PubChem BioAssay(2) open database of chemical compounds. It was able to learn the relationships between the structures of several hundred thousand chemical compounds, about 100 times as many as previous technology could handle, and their individual activities. Also, by extracting features that could not be grasped with existing technology, it achieved about 80% accuracy in predicting activity, a 10% increase compared to existing technology.

This technology will be used as part of Human Centric AI Zinrai, Fujitsu Limited's AI technology.

Fig. 1 Data expressed in a graph structure and tensor expression

Development Background

In recent years, a variety of fields, including finance and drug discovery, have come to rely on databases such as those of account transactions and chemical compositions. Together with IoT logs of communication between things, these sources continue to generate an enormous amount of data that can be expressed in a graph structure to show the relationships between people and things (Fig. 1). Fujitsu Laboratories had previously developed technology to retrieve and analyze graph-structured data known as Linked Open Data, or "LOD"(3). It is expected that accurately categorizing and analyzing this graph-structured data will lead to the creation of new value and the opening up of new business areas.

Issues

Previously, graph-structured data was categorized on the basis of whether it contained partial graphs (subgraphs) that people had chosen to focus on in advance. When categorizing large volumes of graph-structured data, however, many features could not be expressed by the partial graphs that had been explored beforehand, so there were limits to achieving accurate categorization.

Deep learning technology can automatically extract characteristic features from data and has attracted attention in such areas as image and voice recognition. Because of the complicated structure of graph-structured data, and the variety of data sizes and expressions mixed within it, however, it was difficult to apply deep learning technology to the problem.

About the Technology

Fujitsu Laboratories has now developed new deep learning technology that can learn with high accuracy from a variety of graph-structured data that express the connections between people and things. Features of the technology are as follows:

1. New tensor factorization technology converts graph-structured data to a uniform expression

This technology uses a type of mathematical expression called a tensor, an extension of vectors and matrices, to express graph-structured data that has a variety of expression formats (Fig. 1). It uses a mathematical operation called tensor factorization(4), a cutting-edge data mining technology, to transform data to a uniform expression format (Fig. 2). Conventional tensor factorization could not always transform similar graph-structured data into similar tensor expressions, but now Fujitsu Laboratories has developed a technology that can perform tensor factorization in a way that maximizes the degree of similarity to an arbitrary pattern chosen as a basis.
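As a rough illustration of the idea above, the following sketch encodes a small labeled graph as a three-way tensor and then factorizes it. It is a sketch under stated assumptions only: the function names `adjacency_tensor`, `unfold`, `hosvd`, and `reconstruct` are hypothetical, the toy graph is made up, and the factorization shown is a standard higher-order SVD (Tucker form) swapped in for illustration. It is not Fujitsu Laboratories' factorization method, whose details are not given in this release.

```python
import numpy as np

def adjacency_tensor(num_nodes, edges, num_edge_types):
    """Encode an undirected graph with typed edges as T[i, j, t] = 1."""
    t = np.zeros((num_nodes, num_nodes, num_edge_types))
    for i, j, kind in edges:
        t[i, j, kind] = 1.0
        t[j, i, kind] = 1.0  # undirected: mirror the entry
    return t

def unfold(t, mode):
    """Flatten a tensor into a matrix whose rows index the given mode."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def hosvd(t):
    """Higher-order SVD: one orthogonal factor per mode plus a core tensor."""
    factors = [np.linalg.svd(unfold(t, m), full_matrices=False)[0]
               for m in range(t.ndim)]
    core = t
    for m, u in enumerate(factors):
        # Multiply mode m of the core by u.T, then restore the axis order.
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, m)), 0, m)
    return core, factors

def reconstruct(core, factors):
    """Multiply the core by each factor to rebuild the original tensor."""
    t = core
    for m, u in enumerate(factors):
        t = np.moveaxis(np.tensordot(u, t, axes=(1, m)), 0, m)
    return t

# Hypothetical toy graph: 3 nodes, 2 edge types (e.g. two kinds of bond).
edges = [(0, 1, 0), (1, 2, 1)]
tensor = adjacency_tensor(3, edges, num_edge_types=2)
core, factors = hosvd(tensor)
rebuilt = reconstruct(core, factors)
```

Because no components are discarded here, `rebuilt` matches `tensor` up to floating-point error. A practical pipeline would usually truncate the factors to obtain a compact, fixed-size expression.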

Fig. 2 Tensor-based uniform expression and graph-structured data classification
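The limitation described in feature 1, that similar graphs do not always yield similar tensor expressions, can be seen in a toy case: the same graph with its nodes numbered differently produces a different tensor. The sketch below works around this by choosing, for each graph, the node ordering whose tensor is most similar to a chosen basis pattern. The brute-force search over orderings, the inner-product similarity measure, the random basis pattern, and the names `adjacency_tensor` and `align_to_basis` are all assumptions made for illustration. The release does not describe how the actual factorization performs this maximization, and brute force is feasible only for very small graphs.

```python
import itertools
import numpy as np

def adjacency_tensor(num_nodes, edges, num_edge_types):
    """Encode an undirected graph with typed edges as T[i, j, t] = 1."""
    t = np.zeros((num_nodes, num_nodes, num_edge_types))
    for i, j, kind in edges:
        t[i, j, kind] = 1.0
        t[j, i, kind] = 1.0
    return t

def align_to_basis(tensor, basis):
    """Try every node ordering and keep the one most similar to the basis.

    Similarity is the element-wise inner product with the basis pattern.
    """
    n = tensor.shape[0]
    best_score, best = -np.inf, None
    for perm in itertools.permutations(range(n)):
        p = list(perm)
        candidate = tensor[np.ix_(p, p)]  # reorder both node axes together
        score = float(np.sum(candidate * basis))
        if score > best_score:
            best_score, best = score, candidate
    return best, best_score

# The same hypothetical graph under two different node numberings.
tensor_a = adjacency_tensor(3, [(0, 1, 0), (1, 2, 1)], num_edge_types=2)
tensor_b = adjacency_tensor(3, [(2, 0, 0), (0, 1, 1)], num_edge_types=2)

# An arbitrary basis pattern (random here purely for illustration).
rng = np.random.default_rng(0)
basis = rng.random((3, 3, 2))

aligned_a, score_a = align_to_basis(tensor_a, basis)
aligned_b, score_b = align_to_basis(tensor_b, basis)
```

The raw tensors `tensor_a` and `tensor_b` differ, while the aligned versions coincide, so a downstream classifier sees one expression for one underlying graph.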

2. Technology that optimizes uniform expressions and neural network learning

By extending the scope of application of backpropagation(5), which is commonly used in the learning process for neural networks, to tensor expressions, this technology optimizes the uniform expressions at the same time as the neural network, so as to maximize the accuracy of categorization (Fig. 3). Specifically, it updates the basis pattern for tensor expressions according to how much the categorization error of the neural network changes when the basis pattern is changed.

Fig. 3 Learning of neural network and optimization of a uniform expression
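The joint update described in feature 2 can be sketched with a deliberately small model. Here each input tensor is flattened and compared with a set of learnable basis patterns, and the classification error is propagated back to both the classifier weights and the basis patterns. Everything in this sketch is an assumption for illustration: the single `tanh` layer, the inner-product comparison, the synthetic data and labels, the learning rate, and the names `forward`, `loss_fn`, and `gradients`. It is not the architecture used by Fujitsu Laboratories.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, B, w, c):
    """Compare inputs with basis patterns B, then classify."""
    hidden = np.tanh(X @ B.T)           # similarity to each basis pattern
    prob = sigmoid(hidden @ w + c)      # predicted probability of class 1
    return hidden, prob

def loss_fn(X, y, B, w, c):
    """Mean binary cross-entropy (the categorization error)."""
    _, prob = forward(X, B, w, c)
    eps = 1e-12
    return float(-np.mean(y * np.log(prob + eps)
                          + (1 - y) * np.log(1 - prob + eps)))

def gradients(X, y, B, w, c):
    """Backpropagate the error to the classifier and the basis patterns."""
    hidden, prob = forward(X, B, w, c)
    g = (prob - y) / len(y)                        # d loss / d logit
    grad_w = hidden.T @ g
    grad_c = g.sum()
    d_pre = np.outer(g, w) * (1.0 - hidden ** 2)   # back through tanh
    grad_B = d_pre.T @ X                           # error w.r.t. basis patterns
    return grad_B, grad_w, grad_c

# Synthetic stand-in data: 200 flattened 3x3x2 binary tensors.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 18)).astype(float)
y = X[:, 0].copy()  # toy label: whether one particular entry is set

B = 0.1 * rng.standard_normal((2, 18))  # two learnable basis patterns
w = 0.1 * rng.standard_normal(2)
c = 0.0

initial_loss = loss_fn(X, y, B, w, c)
for _ in range(2000):
    grad_B, grad_w, grad_c = gradients(X, y, B, w, c)
    B -= 0.1 * grad_B   # basis patterns and classifier are updated together
    w -= 0.1 * grad_w
    c -= 0.1 * grad_c
final_loss = loss_fn(X, y, B, w, c)
```

The entry `grad_B[k, d]` is the limit of the change in error divided by the change in one element of a basis pattern, which is one way to read "the amount of the difference in the categorization error when the basis pattern is changed".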

Effects

With this new deep learning technology, it is now possible to use data that can be expressed with a graph structure, such as the communication logs of computers or IoT devices, financial transactions, or chemical compositions, in new analyses.

In a trial, this technology was applied to data from the PubChem BioAssay open database of the structure and activity of chemical compounds, and then to virtual screening, which searches on a computer for candidate chemical compounds for drugs. It was able to learn the relationships between the structure and activity of several hundred thousand chemical compounds, about 100 times as many as previous technology using support vector machines(6) could handle. By extracting features that could not be grasped with previous technology, it achieved an activity prediction accuracy of about 80%, an improvement of 10% compared with existing technology. It is expected that this will greatly reduce development time and cost, which are pressing issues in drug development.

In addition, Fujitsu Laboratories conducted a trial to detect illicit activity or attacks in which this technology was applied to benchmark data(7) derived from graph-structured data representing the communication relationships between hosts. The result was that false positives were successfully reduced by more than 20% compared with existing methods using support vector machines. It is expected that this will increase the efficiency of network monitoring tasks. Beyond that, by applying this technology to such data as records of transactions with digital currencies or the financing records of social lending services, such improvements as highly accurate detection of improper monetary manipulation or sophisticated judgements of suitability for lending become possible.

Future Plans

Fujitsu Laboratories will continue to further improve the accuracy of its categorization technology for graph-structured data, aiming to bring it into practical implementation as a core technology of Human Centric AI Zinrai. In addition, Fujitsu Laboratories will continue to expand the applicability of deep learning technology to more diverse data formats, providing advanced data analysis in a variety of fields from the first half of fiscal 2017.

Endorsement

As a technology that makes it possible to learn from large volumes of diverse data from the life sciences, and that is excellent at learning from large-scale data, deep learning has also been attracting attention in the pharmaceutical industry, where designing chemical-compound feature quantities suited to various predicted effects, such as drug efficacy and side effects, is a significant issue. I expect that Fujitsu's new deep learning technology, which can automatically create a number of features suited for prediction from the learning data, will have a huge impact on the pharmaceutical field.

Professor Yasushi Okuno, Department of Biomedical Data Intelligence, Graduate School of Medicine, Kyoto University

Related links

- Fujitsu and DERI Revolutionize Access to Open Data by Jointly Developing Technology for Linked Open Data (April 3, 2013 press release)

- Fujitsu Takes Systematic Approach to Artificial Intelligence with "Human Centric AI Zinrai" (Nov. 2, 2015 press release)


  • [1] Tensor

    Data representing multidimensional arrays, a generalization of the concepts of vectors and matrices.

  • [2] PubChem BioAssay

    The world’s largest dataset recording data on the structure and activity of chemical compounds in tests of medicinal efficacy and toxicity.

  • [3] Linked Open Data (LOD)

    Open data that can be mutually connected to from anywhere in the world.

  • [4] Tensor factorization

    A technology that factors multidimensional arrays based on the sum of the correlations between multiple elements.

  • [5] Backpropagation

    An algorithm that reduces classification error in neural networks.

  • [6] Support vector machines

    A machine learning method that calculates hyperplanes in multidimensional space, and that can accurately separate data.

  • [7] Benchmark data

    DARPA Intrusion Detection Data Sets.

About Fujitsu

Fujitsu is the leading Japanese information and communication technology (ICT) company, offering a full range of technology products, solutions, and services. Approximately 156,000 Fujitsu people support customers in more than 100 countries. We use our experience and the power of ICT to shape the future of society with our customers. Fujitsu Limited (TSE: 6702) reported consolidated revenues of 4.7 trillion yen (US$41 billion) for the fiscal year ended March 31, 2016. For more information, please see http://www.fujitsu.com.

About Fujitsu Laboratories

Founded in 1968 as a wholly owned subsidiary of Fujitsu Limited, Fujitsu Laboratories Ltd. is one of the premier research centers in the world. With a global network of laboratories in Japan, China, the United States and Europe, the organization conducts a wide range of basic and applied research in the areas of Next-generation Services, Computer Servers, Networks, Electronic Devices and Advanced Materials. For more information, please see: http://www.fujitsu.com/jp/group/labs/en/.

Press Contacts

Public and Investor Relations Division
Inquiries

Company: Fujitsu Limited

Technical Contacts

Knowledge Information Processing Laboratory

E-mail: deeptensor@ml.labs.fujitsu.com
Company: Fujitsu Laboratories Ltd.


All company or product names mentioned herein are trademarks or registered trademarks of their respective owners. Information provided in this press release is accurate at time of publication and is subject to change without advance notice.
