Archived content

NOTE: this is an archived page and the content is likely to be out of date.

Fujitsu Laboratories Develops Technology to Accelerate Analysis of Genomic Information

Speeds up in-database, large-scale analyses by a factor of approx. 400, helping to advance genomic medical research

Fujitsu Laboratories Ltd.

Kawasaki, Japan, March 15, 2016

Fujitsu Laboratories Ltd. announced the development of a technology that accelerates database analyses of the correlations between genomic variations and environmental information, such as disease and lifestyle habits. This technology speeds up the process by a factor of roughly 400 compared to existing methods.

Thanks to advances in genomic medicine, it is possible to analyze genomic and genetic information in combination with clinical and environmental information to study the relationship between genetic factors and environmental factors. This kind of research relies on genomic information stored in databases so that the information can be analyzed from different perspectives, but because the volume of genomic information is so large, processing takes a long time.

Fujitsu Laboratories has greatly accelerated analysis processing by introducing a new data structure that makes it possible to rapidly analyze large-scale genomic information within a database.

This technology makes it possible to acquire knowledge that previously was difficult to obtain quickly, aiding the advance of genomic medical research.

Details of this technology are being presented at the 19th International Conference on Extending Database Technology (EDBT 2016), opening March 15 in Bordeaux, France.

Background

The advent of next-generation sequencers which quickly read enormous volumes of genomic information has opened up the possibility of measuring and analyzing a genome to reveal what diseases a person might be susceptible to, to predict a patient's response to a drug and the drug's side effects, and to design personalized preventative and therapeutic treatments (Figure 1, lower section). Making effective use of genomic medicine will require studying and understanding the relationship between genomic information and clinical and environmental information.

A person's entire genome is approximately three billion bases long, and it can contain tens of millions of variations, known as "variants"[1], that account for differences between individuals. With type-2 diabetes, for example, dozens of variants and several lifestyle habits are known to cause the disease, and there may be synergies among these factors. One method for gaining such insights is the genome-wide association study[2], in which huge volumes of genomic, clinical, and environmental information are collected and subjected to statistical analysis (Figure 1, upper half).
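As an illustration of the kind of statistical analysis a genome-wide association study repeats for each variant, the sketch below runs a chi-square test on a 2x2 table of variant carriers versus disease status. This is one simple, commonly used test, chosen here as an assumption; the release does not say which statistics are applied. The function names and the counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 contingency table.

    a: carriers with the disease      b: carriers without it
    c: non-carriers with the disease  d: non-carriers without it
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain observations")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Made-up counts for a single variant in a population of 200 people.
chi2 = chi_square_2x2(a=30, b=70, c=10, d=90)
p = p_value_1df(chi2)
print(f"chi-square = {chi2:.2f}, p = {p:.5f}")
```

A real study would repeat a test like this across millions of loci and correct for multiple testing, which is why the per-variant aggregation cost matters so much.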

Figure 1: Genome-wide association study and sample genomic treatment

Issues

Aggregating data on a single variant across a population of 100,000 people takes about one second of processing time using existing open-source database software (according to Fujitsu Laboratories' research). At that rate, aggregating variants at 10 million loci in a study population of 100,000 people for a single disease would take roughly 120 days. Genome-wide association studies require multiple iterations of this kind of analysis, making improvements in processing speed a pressing issue.
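The estimate above can be checked with a few lines of arithmetic. The sketch below uses the figures stated in this release (about one second per variant, 10 million loci) and, as an assumption, applies the roughly 400-fold speedup uniformly to the same workload. The function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def baseline_days(n_loci, seconds_per_variant=1.0):
    """Days needed when each locus costs one separate aggregation."""
    return n_loci * seconds_per_variant / SECONDS_PER_DAY


def accelerated_hours(n_loci, speedup, seconds_per_variant=1.0):
    """Hours needed if the whole workload runs `speedup` times faster (assumed uniform)."""
    return n_loci * seconds_per_variant / speedup / 3600


n_loci = 10_000_000
print(f"baseline:    {baseline_days(n_loci):.1f} days")
print(f"accelerated: {accelerated_hours(n_loci, speedup=400):.1f} hours")
```

Ten million seconds is just under 116 days, which the release rounds to roughly 120 days. Under the uniform-speedup assumption, the same single pass would take about seven hours.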

About the Technology

Fujitsu Laboratories has developed a data structure, together with a method for processing it, that quickly aggregates genomic information in a database and so greatly accelerates genome-wide association studies. The structure stores an individual's genomic information in a single database column and encodes the information on each variant with a fixed bit length (Figure 2).

Figure 2: Genome-type data structure columns

This genome-type data structure has the following benefits:

1. A data structure that enables simultaneous aggregation of variants

When each instance of variant information is stored in a conventional database table structure, aggregation requires as many database queries as there are variants (Figure 3). With the new genome-type data structure, all variants are stored in a single column, so they can be aggregated simultaneously with a single query, dramatically improving the aggregation processing performance per variant (Figure 4).

2. Encoding technology allows for faster aggregation

Most variant types can be expressed on a computer as a two-bit code. Many variants, however, require codes of three or more bits, which calls for variable-length data handling, and variable-length data structures rule out high-speed aggregation processing. Fujitsu Laboratories devised a method for storing and aggregating this kind of variable-length data without breaking the fixed bit-length structure, which preserves high-speed aggregation processing.

In addition, the encoding technique compresses the genomic information to one-sixteenth of the size it occupies when variants are stored as text strings. As a result, data for even several hundred thousand people can be handled in-memory, enabling high-speed processing.
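The two benefits above can be pictured with a minimal sketch: each person's variants are packed into one value at a fixed two bits per locus, and a single pass over that one column counts every locus at once. This is not Fujitsu Laboratories' implementation. The meaning of the codes, the function names, and the sample data are assumptions made for illustration, and the handling of variants that need more than two bits is not shown because the release does not describe it.

```python
BITS = 2                  # fixed code width per locus (assumed for this sketch)
MASK = (1 << BITS) - 1    # 0b11, selects one code


def pack_genotypes(codes):
    """Pack per-locus codes (0..3) into one integer, locus 0 in the lowest bits."""
    packed = 0
    for locus, code in enumerate(codes):
        if not 0 <= code <= MASK:
            raise ValueError(f"code {code} does not fit in {BITS} bits")
        packed |= code << (locus * BITS)
    return packed


def aggregate(column, n_loci):
    """Single pass over the column: count how often each code occurs at every locus."""
    counts = [[0] * (MASK + 1) for _ in range(n_loci)]
    for packed in column:
        for locus in range(n_loci):
            # Fixed width means the code's position is computed, not searched for.
            code = (packed >> (locus * BITS)) & MASK
            counts[locus][code] += 1
    return counts


# Hypothetical codes: 0 = no variant, 1 = one copy, 2 = two copies, 3 = unused here.
people = [
    [0, 1, 2, 0],
    [1, 1, 0, 0],
    [0, 2, 2, 0],
]
column = [pack_genotypes(person) for person in people]  # one value per person
for locus, per_code in enumerate(aggregate(column, n_loci=4)):
    print(f"locus {locus}: {per_code}")
```

Because every code has the same width, the position of any locus is a simple shift and mask, which is the property that variable-length codes would break.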

Figure 3: Conventional aggregation processing

Figure 4: Aggregation processing using genome-type data structure

Results

With this technology, a genome-wide association study that uses all genome variants, covering tens of millions of loci, can be performed on a conventional computer in a short period of time. Furthermore, studies can now cover correlations with diseases that were overlooked in the past because time constraints limited the number of variants studied. This will help promote next-generation genomic medical research and comprehensive analyses of genomes and other molecular information in living things using "omics" big data analyses.

Future Plans

Fujitsu Laboratories is continuing work to further accelerate aggregation processing and to add features that will be needed for practical use. Following joint research with medical institutions and ethics reviews, the company plans to apply this technology to the solutions in Fujitsu Limited's Healthcare Systems Unit.


  • [1] Variant

    The DNA in the human genome is made up of roughly three billion bases, which come in four types (represented by the letters A, G, C, and T). The differences in these bases between the genomes of different people, called mutations and polymorphisms, can be the source of variations between individuals. These differences are called variants. Although variants account for less than 1% of the total length of the genome, that still amounts to some tens of millions out of the approximately three billion base pairs.

  • [2] Genome-wide association study (GWAS)

    A comprehensive analytic method that statistically studies the correlations between hundreds of thousands of variants (genotypic) and diseases and drug response (phenotypic).

About Fujitsu

Fujitsu is the leading Japanese information and communication technology (ICT) company, offering a full range of technology products, solutions, and services. Approximately 159,000 Fujitsu people support customers in more than 100 countries. We use our experience and the power of ICT to shape the future of society with our customers. Fujitsu Limited (TSE: 6702) reported consolidated revenues of 4.8 trillion yen (US$40 billion) for the fiscal year ended March 31, 2015. For more information, please see http://www.fujitsu.com.

About Fujitsu Laboratories

Founded in 1968 as a wholly owned subsidiary of Fujitsu Limited, Fujitsu Laboratories Ltd. is one of the premier research centers in the world. With a global network of laboratories in Japan, China, the United States and Europe, the organization conducts a wide range of basic and applied research in the areas of Next-generation Services, Computer Servers, Networks, Electronic Devices and Advanced Materials. For more information, please see: http://www.fujitsu.com/jp/group/labs/en/.

Press Contacts

Public and Investor Relations Division
Inquiries

Company: Fujitsu Limited

Technical Contacts

Computer Systems Laboratory

E-mail: genome-db@ml.labs.fujitsu.com
Company: Fujitsu Laboratories Ltd.


All company or product names mentioned herein are trademarks or registered trademarks of their respective owners. Information provided in this press release is accurate at time of publication and is subject to change without advance notice.