Fujitsu Develops Industry's First Integrated Development Platform for Big Data

Slashes data processing development time by 80%; enables integrated development for large-scale stored data analysis and complex event processing

Fujitsu Laboratories Ltd.

Kawasaki, Japan, August 20, 2012

Fujitsu Laboratories Limited today announced the development of the industry's first integrated big data development platform for processing large volumes of diverse time-series data.

In recent years, the volume of diverse data, as represented by sensor data, human location data, and other kinds of time-series data, has grown at an explosive pace. This has prompted the development of parallel batch processing technologies such as Hadoop [1], as well as complex event processing technologies [2] for processing data in real time. However, because each processing technology has employed a different development and execution environment, it has been difficult to quickly apply insights gained from data analysis results to real-time processing applications. Moreover, maximizing the performance of Fujitsu's event processing engine [3] has required considerable knowledge of parallel application design, such as how to estimate network traffic.

By developing an integrated development platform able to handle description languages for both stored data analysis and complex event processing, Fujitsu was able to reduce development time for both batch and event processing by roughly 80% (from 8 weeks to 1.5 weeks) in a case study involving POS analysis-based coupon issuing. The platform is also equipped with a newly developed parallelism extraction function that automatically improves the processing efficiency of complex event processing. This function automatically extracts parallelism opportunities from event processing applications and recommends how to combine each analysis step in a way that optimizes execution plans without any extra effort.

This is one of the technologies that will be put to use to support human-centric computing, which will provide precisely targeted services anywhere.

Details of the new technology will be published at the IPSJ/SIGSE Software Engineering Symposium 2012 (SES2012), to be held August 27 to 29, 2012.

Background

In recent years, the volume of diverse data (sensor data, human location data, and other kinds of time-series data) has grown at an explosive pace. There is strong demand for efficiently extracting valuable information from this kind of "big data" and putting it to immediate use in delivering services, such as various navigation services.

Challenges

To process big data, Hadoop and other parallel batch processing technologies are deployed to analyze large volumes of stored data, while complex event processing technologies are deployed to process event data in real time as it arrives. However, these technologies are supported by different development and execution environments that, until now, have not been integrated. As a result, it has been difficult for analysts to quickly apply insights gained from data analysis results to complex event processing.

Furthermore, high-speed processing across multiple servers in the cloud has proven crucial for complex event processing of large volumes of events. While Fujitsu has implemented a distributed event processing engine, its ability to raise performance simply by provisioning new servers, and so take advantage of a cloud computing environment, has relied on a complex application design phase to identify effective ways of distributing each processing step.

Newly Developed Technology

Fujitsu has developed an integrated development platform that combines big data analysis and complex event processing. Using this platform, for example, companies can analyze the most up-to-date purchasing trends from accumulated POS data and then home in on a specific customer segment to issue coupons in real time, all as part of a simple process that does not require additional programming.
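
The coupon scenario above can be illustrated with a minimal Python sketch. It is not Fujitsu's platform or its generated code; the record layout, segment names, and the functions top_product_by_segment and make_coupon_rule are hypothetical, chosen only to show how a result computed over stored data might parameterize a rule applied to each arriving event.

```python
from collections import Counter

# Hypothetical accumulated POS records: (customer_segment, product).
HISTORY = [
    ("students", "coffee"),
    ("students", "coffee"),
    ("students", "tea"),
    ("commuters", "tea"),
    ("commuters", "tea"),
    ("commuters", "coffee"),
]


def top_product_by_segment(records):
    """Batch step: find the most purchased product for each segment."""
    counts = {}
    for segment, product in records:
        counts.setdefault(segment, Counter())[product] += 1
    return {seg: c.most_common(1)[0][0] for seg, c in counts.items()}


def make_coupon_rule(trends, target_segment):
    """Real-time step: build a per-event rule from the batch result."""
    trending = trends[target_segment]

    def rule(event):
        # Issue a coupon only when the targeted segment buys the trending product.
        if event["segment"] == target_segment and event["product"] == trending:
            return f"coupon:{trending}:{event['customer_id']}"
        return None

    return rule


trends = top_product_by_segment(HISTORY)
rule = make_coupon_rule(trends, "students")

# Hypothetical incoming purchase events, processed one at a time.
stream = [
    {"customer_id": "c1", "segment": "students", "product": "coffee"},
    {"customer_id": "c2", "segment": "commuters", "product": "coffee"},
    {"customer_id": "c3", "segment": "students", "product": "tea"},
]
coupons = [c for c in map(rule, stream) if c is not None]
print(coupons)
```

Changing the target segment here only changes an argument, which loosely mirrors the idea of adjusting processing parameters without additional programming.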

This technology consists of two parts: 1) a development platform integration feature that automatically generates programs, regardless of development language; and 2) a parallelism extraction function that automatically improves the processing efficiency of complex event processing (Figure 1).

Figure 1: Overview of the Integrated Development Platform

Features of the newly developed technology are as follows.

1. Development platform integration feature

After the processing details have been defined via data-flow diagrams and properties (corresponding to processing parameters), an appropriate set of patterns is selected and used to automatically generate either batch or real-time processing programs. During this generation phase, the operations produced by the selected templates are automatically supplemented with data conversion steps wherever necessary. The generated programs are finally deployed and executed on either a batch or real-time execution environment (Figure 2).

Figure 2: Development Platform Integration Feature
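
The generation flow described above might be sketched as follows. This is an illustrative assumption rather than the platform's actual design: the Step class, the TEMPLATES table, the format names, and the generate function are all hypothetical. The sketch shows one data-flow definition producing either a batch or a real-time program, with a conversion step inserted wherever adjacent steps disagree on data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    consumes: str  # data format expected as input
    produces: str  # data format emitted as output


# Hypothetical templates: one code pattern per execution target.
TEMPLATES = {
    "batch": "run {name} over stored records",
    "realtime": "run {name} on each incoming event",
}


def generate(flow, target):
    """Return the program lines for the chosen execution target."""
    template = TEMPLATES[target]
    program = []
    previous = None
    for step in flow:
        # Insert a conversion step when adjacent formats do not match.
        if previous is not None and previous.produces != step.consumes:
            program.append(f"convert {previous.produces} -> {step.consumes}")
        program.append(template.format(name=step.name))
        previous = step
    return program


# The same data-flow definition is used for both targets.
flow = [
    Step("filter_purchases", consumes="csv", produces="csv"),
    Step("aggregate_by_segment", consumes="record", produces="record"),
]
batch_program = generate(flow, "batch")
realtime_program = generate(flow, "realtime")
print(batch_program)
print(realtime_program)
```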

2. Parallelism extraction function for complex event processing

The parallelism extraction function extracts parallelism from real-time processing programs that have been automatically generated by the integrated development platform. The function will automatically recommend optimal combinations of parallelization schemes in order to decrease network traffic (Figure 3).

In real-time processing, incoming events can be distributed among multiple servers for parallel execution. For each processing step, various distribution schemes are applicable, and performance varies greatly depending on the scheme that is chosen. Generally speaking, a distribution scheme with finer granularity makes it easier to distribute loads evenly and facilitates better performance. For event processing, however, better performance is achieved by reducing network traffic. Here, traffic is minimized by applying a uniform distribution scheme that tries to place inter-dependent processing steps on the same server whenever possible. Doing so avoids intermediary transfers and optimizes overall application performance.

At runtime, the recommended distribution scheme will be used to select an optimal server allocation strategy in response to event volume fluctuations, resulting in better overall performance.

Figure 3: Parallelism Extraction Function for Complex Event Processing
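
To make the traffic argument concrete, the following Python sketch counts how many events would have to move between servers under two distribution schemes. It is a toy model, not the parallelism extraction function itself: the count_transfers function, the modulo-based server assignment, and the event fields are assumptions, and the resulting numbers are illustrative only and unrelated to the measured results reported in this release.

```python
def count_transfers(events, step_keys, n_servers):
    """Count events that must move between servers between consecutive steps.

    step_keys names, for each processing step, the event field used to
    choose a server (server = field value modulo n_servers).
    """
    transfers = 0
    for event in events:
        servers = [event[key] % n_servers for key in step_keys]
        # A transfer is needed whenever consecutive steps run on different servers.
        transfers += sum(1 for a, b in zip(servers, servers[1:]) if a != b)
    return transfers


# Hypothetical events: every combination of 4 customers and 4 products.
events = [{"customer": c, "product": p} for c in range(4) for p in range(4)]

# Uniform scheme: both dependent steps are distributed by the same key.
uniform = count_transfers(events, ["customer", "customer"], n_servers=4)

# Isolated scheme: each step is distributed independently by its own key.
isolated = count_transfers(events, ["customer", "product"], n_servers=4)

print(uniform, isolated)
```

In this model the uniform scheme needs no intermediary transfers because each event stays on one server for both steps, while the isolated scheme moves every event whose two keys map to different servers.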

Results

1. Development platform integration feature

Using the new integrated development platform, as demonstrated during a Fujitsu case study, it was possible to shorten development time for both batch and event processing by approximately 80% (from 8 weeks to 1.5 weeks). Moreover, because parameters for each kind of processing can be modified without additional programming, trial-and-error testing is easy to perform on the development platform, for example to quickly apply insights gained from data analysis results to event search criteria.
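
As a quick check of the figures above, the reduction from 8 weeks to 1.5 weeks works out as follows, which is consistent with the stated "approximately 80%":

```latex
\[
\frac{8 - 1.5}{8} = \frac{6.5}{8} = 0.8125 \approx 81\%
\]
```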

2. Parallelism extraction function for complex event processing

The new parallelism extraction function generates executable programs that are specifically adapted for dynamic load balancing. These programs can be easily scaled out or in without recompiling the original application.

In addition, after measuring the performance of sample programs with different event distribution schemes, Fujitsu Laboratories confirmed that a uniform distribution scheme, which places inter-dependent processing steps on the same server, reduces communications traffic by 60% and achieves a 3.5x improvement in processing efficiency compared to isolated distribution schemes that distribute each processing step independently (Figure 4).

Figure 4: Results from Parallelism Extraction Function for Complex Event Processing

Future Developments

Going forward, Fujitsu plans to further expand the features of the new technology while aiming to commercialize it in the company's platforms and middleware for big data by fiscal year 2013. Fujitsu will also explore deploying the technology in a wide range of applications, such as its services and products, in order to enable the utilization of valuable information generated through the process of collecting, accumulating and analyzing large volumes of sensor data.

[1] Parallel batch processing technologies such as Hadoop

A technique in which massive data sets are converted to batches, which are processed in parallel. Developed and released by the Apache Software Foundation (ASF), Hadoop is an open-source framework for efficiently performing distributed parallel processing of massive volumes of data.

[2] Complex event processing (CEP)

A method of extracting valuable information from a stream of big data in real time. By processing data in memory in accordance with pre-defined rules (described by EPL), the data can be processed in real time. Event Processing Language (EPL) is a description language for describing the content of complex event processes.

[3] Event processing engine

PRESS RELEASE "Fujitsu Develops Distributed and Parallel Complex Event Processing Technology that Rapidly Adjusts Big Data Load Fluctuations" (December 16, 2011)

About Fujitsu

Fujitsu is the leading Japanese information and communication technology (ICT) company offering a full range of technology products, solutions and services. Over 170,000 Fujitsu people support customers in more than 100 countries. We use our experience and the power of ICT to shape the future of society with our customers. Fujitsu Limited (TSE:6702) reported consolidated revenues of 4.5 trillion yen (US$54 billion) for the fiscal year ended March 31, 2012. For more information, please see http://www.fujitsu.com.

About Fujitsu Laboratories

Founded in 1968 as a wholly owned subsidiary of Fujitsu Limited, Fujitsu Laboratories Limited is one of the premier research centers in the world. With a global network of laboratories in Japan, China, the United States and Europe, the organization conducts a wide range of basic and applied research in the areas of Next-generation Services, Computer Servers, Networks, Electronic Devices and Advanced Materials. For more information, please see: http://jp.fujitsu.com/labs/en.

Press Contacts

Public and Investor Relations Division
Inquiries

Company: Fujitsu Limited

Technical Contacts

Software Systems Laboratories
Software Innovation Laboratory

E-mail: soft-bigdata@ml.labs.fujitsu.com
Company: Fujitsu Laboratories Ltd.

All company or product names mentioned herein are trademarks or registered trademarks of their respective owners. Information provided in this press release is accurate at time of publication and is subject to change without advance notice.

Date: 20 August, 2012
City: Kawasaki, Japan
Company: Fujitsu Laboratories Ltd.