Kawasaki, Japan, April 01, 2013
Fujitsu Laboratories Limited today announced the development of high-speed data processing technology that enables the timely utilization of big data, which is ever-expanding on account of such sources as social media. This development opens up new potential applications for e-commerce and other businesses.
Fujitsu Laboratories has tackled the problem of conducting real-time analysis of huge volumes of data, increasing overall system speed by a factor of five. Servers may be constantly becoming more technically advanced, but real-time analysis of big data has remained problematic without system-wide performance improvements. Fujitsu Laboratories' insight was to closely link data-analysis software running on a server and the data-management software that handles the data-storage process, then to vary the volume of data being processed at any one time in response to the frequency of processing requests from the data-analysis side. The result is the ability to execute quick analysis even when there is a sharp increase in server access.
An example of a real-world application of this technology could be the distribution of information to multiple users on a moving train. Based on the train's location obtained from the locational information on the mobile devices of the train's passengers, this technology could quickly offer any number of users information that they would find useful, such as a list of nearby shops or restaurants that have been trending for the past several minutes.
Background
Social media is generating enormous volumes of data, and real-world time-series data from sensors and locational information is continuously increasing. Beyond simply storing this data, it is important to subject it to a variety of analyses so that valuable information can be extracted quickly.
A typical example of big data use is recommendation analysis, which estimates a person's next action based on social-media data or purchasing history. The process of tracing the connections between data elements contained in the flood of incoming messages, however, is hampered by the fact that the volume of analysis results and intermediate data from ongoing analysis is too large to be stored in memory.
Figure 1. Background to development and positioning of newly developed technology
Technological Issues
To handle a volume of data that cannot be held in memory, a hard drive must be used as the storage device. A hard drive performs best when large units of data are recorded continuously, but if the unit is too big, performance declines and processing times lengthen. Conversely, when data is recorded in small units, the number of disk accesses rises as data arrives more frequently, which diminishes performance. The ideal read/write unit is dictated by the frequency with which data arrives, so a fixed unit size will yield different levels of efficiency and performance under different conditions.
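The trade-off described above can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not Fujitsu Laboratories' model: the seek time, record size, disk bandwidth, the 80% utilization limit, and all function names are hypothetical values chosen for illustration. It shows that a small unit overloads the disk as the arrival rate rises, while a large unit makes each record wait longer before it is written.

```python
# Toy model of the read/write unit trade-off. All constants are assumptions.

def disk_utilization(arrival_rate, unit_records, record_bytes=1024,
                     seek_s=0.01, bandwidth_bytes_per_s=100e6):
    """Fraction of each second the disk is busy (above 1.0 means overload)."""
    accesses_per_s = arrival_rate / unit_records
    transfer_s = unit_records * record_bytes / bandwidth_bytes_per_s
    return accesses_per_s * (seek_s + transfer_s)


def mean_buffer_wait(arrival_rate, unit_records):
    """Average time in seconds a record waits for its unit to fill,
    assuming records arrive at evenly spaced intervals."""
    return (unit_records - 1) / (2 * arrival_rate)


def smallest_sustainable_unit(arrival_rate, max_utilization=0.8,
                              max_unit=100_000):
    """Smallest unit size that keeps the disk below the utilization limit,
    or None if no unit size up to max_unit is sustainable."""
    for unit in range(1, max_unit + 1):
        if disk_utilization(arrival_rate, unit) <= max_utilization:
            return unit
    return None


# The sustainable unit size grows with the arrival rate (records per second).
for rate in (100, 1_000, 10_000):
    unit = smallest_sustainable_unit(rate)
    print(rate, unit, round(mean_buffer_wait(rate, unit), 4))
```

In this model, no single unit size suits every arrival rate, which is the motivation for adjusting the unit at run time.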
About the Technology
Fujitsu Laboratories has developed a technology that increases overall system performance by creating a close link between the data-analysis software running on the server and the data-management software that handles the data-storage process, then varying the volume of data being processed at any one time in response to the frequency of processing requests from the data-analysis side. Even when there is a sharp increase in people accessing the server, high-speed analyses can still be performed. Key features of this technology are as follows (Figure 2).
1. Data read/write in bulk
When reading data, the data-management software reads not only the data requested by the data-analysis side but also other data located nearby on the hard drive. The data-analysis software then selects and uses the parts of this data that it needs. Likewise, when writing data, the data-analysis software specifies data that is not necessary and passes it to the data-management side, which then takes the bulk data it receives and places it on the disk as close together as the physical layout allows.
Performing read/write operations in larger bulks reduces the number of disk accesses and increases the system's overall throughput.
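The effect of bulk reads on the number of disk accesses can be sketched as follows. This is a minimal illustration, not the actual implementation: `SimulatedDisk`, `BulkReader`, and the unbounded in-memory cache are hypothetical, and only the read path is shown.

```python
# Minimal sketch of bulk reads against a simulated disk (all names assumed).

class SimulatedDisk:
    """Stands in for a hard drive and counts how often it is accessed."""

    def __init__(self, records):
        self.records = list(records)
        self.accesses = 0

    def read_range(self, start, count):
        # One call models one disk access, regardless of how much is read.
        self.accesses += 1
        return self.records[start:start + count]


class BulkReader:
    """Plays the data-management role: fetches neighbors with each request."""

    def __init__(self, disk, bulk_size):
        self.disk = disk
        self.bulk_size = bulk_size
        self.cache = {}  # unbounded here for simplicity

    def get(self, index):
        if index not in self.cache:
            # Read the whole aligned bulk that contains the requested record.
            start = (index // self.bulk_size) * self.bulk_size
            bulk = self.disk.read_range(start, self.bulk_size)
            for offset, record in enumerate(bulk):
                self.cache[start + offset] = record
        return self.cache[index]


def count_accesses(bulk_size, n_records=100):
    """Read every record once and report how many disk accesses it took."""
    disk = SimulatedDisk(range(n_records))
    reader = BulkReader(disk, bulk_size)
    for i in range(n_records):
        reader.get(i)
    return disk.accesses


print(count_accesses(bulk_size=1), count_accesses(bulk_size=10))
```

The benefit depends on the analysis side actually using the neighboring records. If it does not, the extra data read in each bulk is wasted transfer time.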
2. Dynamic adjustment of bulk read/write size
To process as much as possible at one time, the data-analysis software reads more data than is needed, then the needed data is selected for processing. The ideal size for a bulk read will vary depending on conditions, so the system monitors the volume of arriving data and the pace of analysis to decide the size for bulk read/writes, making automatic adjustments for the best performance.
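One way to picture this kind of automatic adjustment is a simple feedback rule. The announcement does not describe the actual adjustment algorithm, so the doubling and halving policy, the thresholds, and the `BulkSizeController` name below are all assumptions made for illustration.

```python
# Hypothetical feedback rule for choosing the bulk size; the real policy
# used by the technology is not described in the source.

class BulkSizeController:
    def __init__(self, initial=8, min_size=1, max_size=1024):
        self.size = initial
        self.min_size = min_size
        self.max_size = max_size

    def update(self, arrived, analysis_pace):
        """Adjust the bulk size after one monitoring interval.

        arrived:       records that arrived during the interval
        analysis_pace: records the analysis side handled in the interval
        """
        if arrived > analysis_pace:
            # Analysis is falling behind: use larger bulks to cut disk accesses.
            self.size = min(self.size * 2, self.max_size)
        elif arrived * 2 <= analysis_pace:
            # Plenty of headroom: use smaller bulks so less unneeded data is read.
            self.size = max(self.size // 2, self.min_size)
        # Otherwise the load is roughly balanced and the size is kept.
        return self.size


controller = BulkSizeController(initial=8)
for arrived, pace in [(1000, 500), (1000, 500), (100, 500), (400, 500)]:
    print(controller.update(arrived, pace))
```

A production system would likely smooth its measurements over several intervals to avoid oscillating between sizes, but that refinement is omitted here.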
Figure 2. Rationale behind newly developed technology
This technology delivers throughput five times higher than was previously possible (Figure 3). As a result, events that occurred only minutes earlier can be reflected in analytic results, delivering valuable information quickly.
Figure 3. Impact of technology
Results
This technology can be used to distribute information, such as updates on nearby sites, events of interest, or trending restaurants, to multiple users on a moving train based on the train's location. In e-commerce, were a website to experience a sharp increase in the number of users accessing it before Christmas, it could still remain highly responsive. Performing big data analysis in real time opens up new potential applications and business uses.
Future Plans
Fujitsu Laboratories will move forward in applying the technology to a variety of analysis applications and conduct verification testing with the aim of bringing it into commercial use in fiscal 2014.