NOTE: this is an archived page and the content is likely to be out of date.

Fujitsu Develops Column-Oriented Data-Processing Engine Enabling Fast, High-Volume Data Analysis in Database Systems

Accelerates analysis processing more than fifty-fold on a single server with PostgreSQL open-source database

Fujitsu Laboratories Ltd.

Kawasaki, Japan, February 26, 2015

Fujitsu Laboratories Ltd. today announced that it has developed a column-oriented data-storage and processing engine that enables fast analysis of large volumes of data in a database system.

In recent years, column-oriented databases have emerged as a faster way to read and analyze large volumes of data, complementing existing row-oriented databases, which are suited to handling data updates. Existing approaches, however, have had one of two problems: either changes to row-oriented data cannot be automatically reflected in the column-oriented data, or the size of the column-oriented data is constrained by installed memory.

Fujitsu has developed an engine that runs on the PostgreSQL open-source database, does not depend on memory capacity, instantly updates column-oriented data in response to changes in row-oriented data, and processes column-oriented data quickly. The engine enables fast analysis through indexes⁽¹⁾, a mechanism that most database systems provide, so developers can use it without having to consider whether the storage method is row-oriented or column-oriented. With a parallel-processing engine designed for column-oriented data, analyses run four times faster than before on a single CPU core, and at least 50 times faster on one server equipped with 15 CPU cores.

Even on smaller computer systems with little memory, this technology enables real-time data analysis reflecting the latest data.

Details of this technology are being presented at the Seventh Forum on Data Engineering and Information Management (DEIM 2015), opening March 2 in Koriyama, Fukushima.

Background

Database systems are widely used for online transaction processing (OLTP), in which processing results are reported back to a terminal efficiently, and for processing changes to data, such as storing and using data from business systems.

Issues

In recent years, there has been increasing demand for high-volume data analysis that is fast and available on demand, creating a need for a single database system that can handle OLTP and high-volume data analysis simultaneously. Row-oriented data is best suited to OLTP, while column-oriented data is better for data analysis but slow to process changes to data. One relatively recent solution is to store data in both row-oriented and column-oriented form to accelerate analyses. With previous technologies, however, changes to the row-oriented data are not automatically reflected in the column-oriented data, and the size of the column-oriented data can be constrained by installed memory.
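
The trade-off between the two layouts can be illustrated with a minimal Python sketch. This is not Fujitsu's implementation; the records and the names `rows`, `columns`, and `update_price` are made up for illustration. An aggregate over one attribute reads a single list in the column layout, while a change to one record has to be applied to both copies when both are kept.

```python
# Illustrative only: the same records in a row-oriented and a column-oriented layout.
rows = [
    {"id": 1, "item": "disk", "price": 120},
    {"id": 2, "item": "cable", "price": 15},
    {"id": 3, "item": "rack", "price": 400},
]

# Column-oriented copy: one list per attribute, aligned by position.
columns = {name: [r[name] for r in rows] for name in ("id", "item", "price")}

def total_price_rows(rows):
    # Analysis over rows visits every whole record to use one field.
    return sum(r["price"] for r in rows)

def total_price_columns(columns):
    # Analysis over columns reads only the one list it needs.
    return sum(columns["price"])

def update_price(rows, columns, record_id, new_price):
    # An OLTP-style change must reach both copies to keep them consistent.
    pos = columns["id"].index(record_id)
    rows[pos]["price"] = new_price
    columns["price"][pos] = new_price

update_price(rows, columns, 2, 20)
print(total_price_rows(rows), total_price_columns(columns))
```

If `update_price` skipped the second assignment, the two totals would diverge, which is the synchronization problem described above.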

About the Technology

Fujitsu has developed an engine for PostgreSQL open-source databases that instantly reflects updates to row-oriented data in the column-oriented data, stores column-oriented data without being dependent on memory capacity, and quickly analyzes column-oriented data. Massive volumes of column-oriented data can be stored by taking advantage of a new technique for managing column-oriented data. The engine also enables high-speed analysis through the indexes that typical database systems provide, and can be used without special consideration for whether the data is stored as row-oriented or column-oriented. On Query1 of the DBT-3 benchmark⁽²⁾, which reads, filters, and aggregates data, the parallel-processing analysis engine, which has been optimized for column-oriented data, runs four times faster than before on a single CPU core. On a single server with 15 CPU cores, performance is at least 50 times faster.
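
As a rough reading of these two figures, the short calculation below separates the single-core gain from the gain attributable to parallelism. It assumes that both figures are measured against the same baseline, which the text does not state explicitly, and it treats the "at least 50 times" figure as a lower bound.

```python
single_core_speedup = 4    # from the text: four times faster on one CPU core
multi_core_speedup = 50    # from the text: at least 50 times faster on 15 cores
cores = 15

# Assumption: both speedups are relative to the same baseline.
scaling = multi_core_speedup / single_core_speedup   # gain from going from 1 to 15 cores
efficiency = scaling / cores                          # fraction of ideal linear scaling

print(f"{scaling:.1f}x from parallelism, {efficiency:.0%} of linear")
```

Under that assumption, moving from 1 to 15 cores contributes a factor of at least 12.5, or roughly 83% of ideal linear scaling.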

Key features of the technology are as follows:

Large-volume column-oriented data storage
To efficiently manage large volumes of column-oriented data that cannot fit into memory, storage areas are managed in "extents," large units of roughly 260,000 records. Storage is allocated and deleted, and free space is reclaimed, one extent at a time. Managing such large units while analyses are running could result in long wait times, so Fujitsu has adopted MultiVersion Concurrency Control (MVCC⁽³⁾), which allows analyses to run at the same time that storage areas are being managed.
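
The idea can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the engine's actual design: the class names, the version counter, and the visibility rule are hypothetical, and `EXTENT_RECORDS` is set to 4 so that the example stays small. Each extent records the version at which it was created and, if applicable, dropped, and a reader sees only the extents that were live at its snapshot version.

```python
from dataclasses import dataclass, field
from typing import List, Optional

EXTENT_RECORDS = 4  # tiny for the demo; the text cites roughly 260,000 records

@dataclass
class Extent:
    created: int                      # version at which the extent became visible
    dropped: Optional[int] = None     # version at which it was logically deleted
    values: List[int] = field(default_factory=list)

class ColumnStore:
    def __init__(self):
        self.version = 0
        self.extents: List[Extent] = []
        self.active_snapshots: List[int] = []

    def load(self, values):
        """Write values into fresh extents, then publish them as one new version."""
        new_version = self.version + 1
        for i in range(0, len(values), EXTENT_RECORDS):
            chunk = list(values[i:i + EXTENT_RECORDS])
            self.extents.append(Extent(new_version, values=chunk))
        self.version = new_version

    def drop_oldest(self):
        """Logically delete the oldest live extent; older snapshots still see it."""
        new_version = self.version + 1
        for e in self.extents:
            if e.dropped is None:
                e.dropped = new_version
                break
        self.version = new_version

    def begin_read(self):
        self.active_snapshots.append(self.version)
        return self.version

    def end_read(self, snapshot):
        self.active_snapshots.remove(snapshot)

    def scan(self, snapshot):
        """Return the values visible at the given snapshot version."""
        out = []
        for e in self.extents:
            if e.created <= snapshot and (e.dropped is None or e.dropped > snapshot):
                out.extend(e.values)
        return out

    def reclaim(self):
        """Physically free dropped extents that no active snapshot can still see."""
        oldest = min(self.active_snapshots, default=self.version)
        self.extents = [e for e in self.extents
                        if e.dropped is None or e.dropped > oldest]

store = ColumnStore()
store.load(list(range(10)))        # three extents: 4 + 4 + 2 records
snap = store.begin_read()          # an analysis starts on version 1
store.drop_oldest()                # management work proceeds without waiting
store.reclaim()                    # nothing is freed: the reader may still need it
during = sum(store.scan(snap))     # the reader still sees all ten values
store.end_read(snap)
store.reclaim()                    # now the dropped extent is freed
after = sum(store.scan(store.version))
print(during, after, len(store.extents))
```

The reader that started before the drop keeps a consistent view and never waits, and the space is reclaimed only once no snapshot can reach the dropped extent.
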
Column-oriented indexes (column-store indexes)
A column-oriented index (column-store index) is created in the same way as other indexes. Once it exists, a data-storage method (row-oriented or column-oriented) suited to the data being queried is selected and used for processing. When the row-oriented data from which the column-store index was created is updated, the column-oriented data is updated automatically. Users therefore do not need to think about the data-storage method.
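
A toy Python model of this behavior is shown below. It is only a sketch under assumptions: the `Table` class and its methods are hypothetical and are not the PostgreSQL or Fujitsu API. The column-oriented copy is built from existing rows like an index, every later change made through the row path is propagated to it, and the aggregate function picks a layout without the caller naming one.

```python
class Table:
    def __init__(self):
        self.rows = {}            # row-oriented storage: primary key -> record dict
        self.col_index = None     # optional column-oriented copy, built like an index

    def create_column_index(self, names):
        """Build a column-oriented copy of the named attributes from existing rows."""
        self.col_index = {n: {pk: r[n] for pk, r in self.rows.items()} for n in names}

    def upsert(self, pk, record):
        """Write the row, then propagate the change to the column-oriented copy."""
        self.rows[pk] = dict(record)
        if self.col_index is not None:
            for name, column in self.col_index.items():
                column[pk] = record[name]

    def delete(self, pk):
        del self.rows[pk]
        if self.col_index is not None:
            for column in self.col_index.values():
                column.pop(pk, None)

    def total(self, name):
        """Aggregate one attribute; the caller never says which layout to use."""
        if self.col_index is not None and name in self.col_index:
            return sum(self.col_index[name].values()), "column"
        return sum(r[name] for r in self.rows.values()), "row"

t = Table()
t.upsert(1, {"qty": 5, "price": 10})
t.upsert(2, {"qty": 7, "price": 20})
t.create_column_index(["qty"])
t.upsert(2, {"qty": 9, "price": 20})   # change made through the ordinary row path
t.delete(1)
print(t.total("qty"), t.total("price"))
```

The `qty` aggregate is served from the column-oriented copy and already reflects the later update and delete, while `price`, which is not covered by the index, falls back to the rows.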

Figure 1: Architecture of the new technology
Analysis engine optimized for column-oriented data and parallel processing using an original shared-memory structure
Simply using column-oriented data to improve read performance does not make the most of the benefits that column-oriented data can offer. Fujitsu developed an analysis engine that applies the same operation to multiple data items at once (vector processing), which improves performance even without parallelization. As a parallel-analysis mechanism, the company also developed a new shared-memory structure so that multiple processes operating in parallel in PostgreSQL can hand off data with little slowdown. On a server with 15 CPU cores, this achieves at least a fifty-fold performance improvement over the previous PostgreSQL.
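
The general pattern of batch-at-a-time processing combined with parallel partial aggregation can be sketched in Python. The sketch makes several simplifying assumptions: the data is made up, threads stand in for PostgreSQL's parallel processes, ordinary Python lists stand in for the shared-memory structure, and the query (filter on a date column, then sum a quantity per flag) only loosely resembles a read, filter, and aggregate workload. It shows the structure of the computation, not its performance.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Column-oriented input: three aligned lists (made-up data).
flag     = ["A", "N", "A", "R", "N", "A", "R", "N"]
quantity = [ 10,   5,   7,   3,   8,   2,   6,   4]
shipday  = [  1,   9,   3,   4,   2,   8,   5,   7]

def aggregate_chunk(start, stop, cutoff):
    """Filter and aggregate one slice of the columns in a batch-at-a-time style."""
    # Step 1: one pass over the filter column yields the selected positions.
    selected = [i for i in range(start, stop) if shipday[i] <= cutoff]
    # Step 2: one pass over the selected positions builds a partial aggregate.
    partial = Counter()
    for i in selected:
        partial[flag[i]] += quantity[i]
    return partial

def parallel_sum_by_flag(cutoff, workers=2, chunk=3):
    """Split the columns into chunks, aggregate each in a worker, merge the results."""
    bounds = [(s, min(s + chunk, len(flag))) for s in range(0, len(flag), chunk)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda b: aggregate_chunk(b[0], b[1], cutoff), bounds):
            total.update(partial)   # Counter.update adds the partial sums
    return dict(total)

print(parallel_sum_by_flag(cutoff=5))
```

Each worker hands back only a small partial aggregate rather than the rows it scanned, which keeps the amount of data exchanged between workers low.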

Results

This technology enables existing smaller systems with limited memory to analyze and use big data in real time, in ways that were not possible before.

Future Plans

Fujitsu is aiming for a commercial implementation of this technology during fiscal 2015, as a part of Symfoware Server, Fujitsu's database product.

[1] Index
Auxiliary information used to search a database more quickly.

[2] DBT-3 benchmark
A benchmark for measuring the performance of decision-support systems.

[3] MultiVersion Concurrency Control
A technique for ensuring consistency when there are simultaneous requests from multiple users. Used in many database systems.

About Fujitsu

Fujitsu is the leading Japanese information and communication technology (ICT) company offering a full range of technology products, solutions and services. Approximately 162,000 Fujitsu people support customers in more than 100 countries. We use our experience and the power of ICT to shape the future of society with our customers. Fujitsu Limited (TSE: 6702) reported consolidated revenues of 4.8 trillion yen (US$46 billion) for the fiscal year ended March 31, 2014. For more information, please see http://www.fujitsu.com.

About Fujitsu Laboratories

Founded in 1968 as a wholly owned subsidiary of Fujitsu Limited, Fujitsu Laboratories Ltd. is one of the premier research centers in the world. With a global network of laboratories in Japan, China, the United States and Europe, the organization conducts a wide range of basic and applied research in the areas of Next-generation Services, Computer Servers, Networks, Electronic Devices and Advanced Materials. For more information, please see: http://www.fujitsu.com/jp/group/labs/en/.

Press Contacts

Public and Investor Relations Division
Company: Fujitsu Limited

Technical Contacts

ICT Systems Laboratories
Data Platform Lab.

E-mail: csi-db@ml.labs.fujitsu.com
Company: Fujitsu Laboratories Ltd.

All company or product names mentioned herein are trademarks or registered trademarks of their respective owners. Information provided in this press release is accurate at time of publication and is subject to change without advance notice.

