Addressing the issue of stream data processing in conventional systems
With the rapid progress of IoT technologies, companies across many industries increasingly want to analyze and process, in real time, the large volumes of data sent from sensor devices so that they can put it to use quickly in their services. In the automobile industry in particular, the shift toward connected cars, which make real-time use of the huge amount of data sent from vehicles, has accelerated as the wave of “CASE” (Connected, Autonomous, Shared & Service, Electric) sweeps through the industry. For example, automobile makers are designing and developing platforms that analyze vehicle speed and location data in real time to visualize road conditions. If a road is congested or an accident occurs, drivers heading toward that road receive a warning message prompting them to change routes. In this way, considerable effort is going into designing and developing systems that support safer driving.
However, conventional data processing methods have a problem. Senior Manager Miwa Ueki, who leads R&D on cyber-physical systems (CPS), including IoT systems, in the Cyber Physical System Project of the Super Middleware Unit at Fujitsu Laboratories, says:
“In the case of connected cars, detecting the status of each vehicle and road at high speed, based on the large amount of data sent second by second from moving vehicles, requires a parallel distributed system for stream data processing. With a conventional system, however, we have to stop the target service every time we add or change even a small part of it. To provide information to self-driving cars, whose operation cannot be suspended even for a moment, we needed a new technology that makes nonstop operation possible.”
Implementing a highly flexible data processing platform by combining various insights
To solve this issue, members of the Cyber Physical System Project began developing a stream data processing architecture that allows service content to be added or changed without stopping ongoing data processing. Ms. Ueki adds that the theme was originally taken up by a group engaged in IoT system development at Fujitsu Laboratories. However, the group had difficulty finding an optimal solution and could not make the progress it intended.
At that time, a group working on software development, also at Fujitsu Laboratories, introduced us to a technology that would eventually help us break through this problem. Mr. Kota Itakura of the Cyber Physical System Project explains:
“We were told about ‘Apache Flink,’ an open source distributed stream data processing engine. We set up a cross-sectoral team of researchers with different specialties and started developing this unique technology using the engine. By bringing together diverse in-house insights, we developed ‘Dracena (Dynamically-Reconfigurable Asynchronous Consistent EveNt-processing Architecture)’.”
Dracena is a data processing platform that allows data processing programs to be added or changed dynamically, as plug-ins, while large-scale real-time data processing and the target service keep running. It is no longer necessary to stop the target system to add a new function, or to fall back on batch processing for data analysis, as before. This makes stream data processing highly extensible and gives developers flexibility, for example when adding new services. Ms. Ueki adds, “In particular, when we try to realize a service using sensor data, we have to take an agile approach, repeatedly changing processes or parameters and checking the results by trial and error. Dracena is a very effective platform for utilizing real-time data because it supports both agile development and nonstop operation.”
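The plug-in idea can be illustrated with a small Python sketch. It is not Dracena's implementation or API. The class `HotPluggablePipeline`, its methods, and the event fields are hypothetical names chosen for illustration. The sketch only shows the general pattern of registering a new processing function between events while the pipeline keeps accepting data.

```python
import threading
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Processor = Callable[[Event], None]


class HotPluggablePipeline:
    """Toy pipeline (hypothetical) whose processors can be added while events flow."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._processors: Dict[str, Processor] = {}

    def plug_in(self, name: str, processor: Processor) -> None:
        # Adding or replacing a processor only swaps the registry;
        # event handling is never paused or restarted.
        with self._lock:
            self._processors = {**self._processors, name: processor}

    def process(self, event: Event) -> None:
        # Snapshot the active processors so a concurrent plug_in
        # cannot disturb the handling of this event.
        with self._lock:
            active = list(self._processors.values())
        for processor in active:
            processor(event)


pipeline = HotPluggablePipeline()
speeds: List[float] = []
alerts: List[float] = []

pipeline.plug_in("record_speed", lambda e: speeds.append(e["speed_kmh"]))
pipeline.process({"vehicle_id": "car-1", "speed_kmh": 40.0})

# A new service is added between two events, without restarting anything.
pipeline.plug_in(
    "overspeed_alert",
    lambda e: alerts.append(e["speed_kmh"]) if e["speed_kmh"] > 100 else None,
)
pipeline.process({"vehicle_id": "car-1", "speed_kmh": 120.0})

print(speeds)  # [40.0, 120.0]
print(alerts)  # [120.0]
```

The first event is seen only by the original processor, while the second is seen by both, which is the behavior a nonstop platform needs when a service is added mid-stream. A real distributed engine must also coordinate such changes across many parallel workers, which this single-process sketch does not attempt.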
Moreover, the wide variety of IoT data in the real world tends to be collected separately for each use or system, which has made it difficult to connect such data across different services at high speed. Dracena, in contrast, handles data in stream processing on an object basis, such as a “person, thing and event,” and shares all processing results. It does so by taking special measures to preserve the state of the data and its processing, which improves its usability across various services.
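A minimal Python sketch of object-based state sharing follows. The names `ObjectStore`, `ObjectState`, and the three service functions are assumptions made for illustration and do not reflect Dracena's actual design. The point is only that when state is keyed by object rather than by service, a result written by one service becomes visible to another.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ObjectState:
    """State kept per real-world object (hypothetical structure)."""
    object_id: str
    attributes: Dict[str, Any] = field(default_factory=dict)


class ObjectStore:
    """Shared store keyed by object ID rather than by service."""

    def __init__(self) -> None:
        self._objects: Dict[str, ObjectState] = {}

    def get(self, object_id: str) -> ObjectState:
        # Create the state on first access, then reuse it for every service.
        if object_id not in self._objects:
            self._objects[object_id] = ObjectState(object_id)
        return self._objects[object_id]


def location_service(store: ObjectStore, event: Dict[str, Any]) -> None:
    store.get(event["vehicle_id"]).attributes["location"] = event["location"]


def congestion_service(store: ObjectStore, event: Dict[str, Any]) -> None:
    # The 10 km/h cutoff is an arbitrary value for this example.
    store.get(event["vehicle_id"]).attributes["in_congestion"] = event["speed_kmh"] < 10.0


def warning_service(store: ObjectStore, vehicle_id: str) -> str:
    # Reads results produced by the two other services for the same object.
    attrs = store.get(vehicle_id).attributes
    if attrs.get("in_congestion"):
        return f"Congestion near {attrs['location']}"
    return "No warning"


store = ObjectStore()
event = {"vehicle_id": "car-7", "location": (35.68, 139.76), "speed_kmh": 4.0}
location_service(store, event)
congestion_service(store, event)
print(warning_service(store, "car-7"))  # Congestion near (35.68, 139.76)
```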
Using Dracena, Fujitsu Laboratories simulated a system to which a huge amount of data, such as vehicle speeds and locations, is sent from 1 million vehicles (objects) per second, and checked its performance in a scenario where a new service or function (such as detecting a vehicle that has stopped suddenly) is added. The results confirmed that even when a new data processing program is added while data processing is under way, the delay in service operation stays within 5 ms on average.
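As a rough illustration of the kind of function mentioned in the simulation, the following Python sketch flags a vehicle whose speed drops sharply between consecutive readings. The thresholds, the function name `detect_sudden_stops`, and the sample data are all assumptions. The article does not describe how the actual detection was implemented.

```python
from typing import Dict, List, Tuple

# Hypothetical thresholds chosen only for this example.
SUDDEN_DROP_KMH = 30.0   # minimum speed drop between two readings
STOPPED_BELOW_KMH = 5.0  # speed at or below which a vehicle counts as stopped


def detect_sudden_stops(readings: List[Tuple[str, float]]) -> List[str]:
    """Return IDs of vehicles whose speed fell sharply to a near stop."""
    last_speed: Dict[str, float] = {}
    flagged: List[str] = []
    for vehicle_id, speed in readings:
        previous = last_speed.get(vehicle_id)
        if (
            previous is not None
            and previous - speed >= SUDDEN_DROP_KMH
            and speed <= STOPPED_BELOW_KMH
        ):
            flagged.append(vehicle_id)
        # Keep per-vehicle state so the next reading can be compared.
        last_speed[vehicle_id] = speed
    return flagged


readings = [
    ("car-1", 60.0),
    ("car-2", 50.0),
    ("car-1", 58.0),
    ("car-2", 2.0),   # 48 km/h drop to a near stop
    ("car-1", 55.0),
]
print(detect_sudden_stops(readings))  # ['car-2']
```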
Continuing the R&D considering “speed” and “user-friendliness”
Dracena is being developed by highly skilled engineers. Mr. Itakura leads the development and implementation of its core system.
He explains the team’s current initiatives and future plans: “As Dracena is basically a platform for real-time stream data processing, the application programs that run on it are also very important for making use of the processing results. From this viewpoint, we are also putting emphasis on developing a framework that makes application development easy, as well as APIs for connecting services, with an eye to making Dracena user-friendly for both application developers and end users.”
Mr. J. Michaelis is improving the system from a different angle. He is an engineer who previously worked on database and network development.
“I am fixing and modifying the source code written by Itakura-san to improve system speed and performance. As a stream data processing platform, Dracena must never stop operating. I therefore run tests from various angles to find bugs and fix them.”
Mr. Takafumi Onishi, also a member of the Cyber Physical System Project, repeatedly runs tests with Mr. Michaelis to enhance the system. He describes his role: “I check the performance of data processing systems connected to Dracena and tune them by changing the setting parameters so that they can process a much larger amount of data at high speed.”
As mentioned earlier, a characteristic of this project is that experts from multiple fields, such as software and IoT, collaborate on the development of Dracena. Mr. Onishi says, “Although it is just my second year at Fujitsu Laboratories, I am enjoying my work as a member of this cutting-edge technology project alongside many expert researchers.”
Starting with the mobility field, aiming to expand its application areas
Because it was difficult for conventional stream data processing systems to perform parallel processing, connect data from different systems, or add or change services, they were regarded as vertically integrated, silo-type systems whose analysis results could be used only for a specific service. In contrast, Dracena enables an agile and flexible development approach: first implement a simple data analysis system, then add new services one after another.
Development of Dracena first targeted the mobility field, which requires real-time processing of large amounts of data, nonstop system operation, and frequent addition and change of services. Dracena is now incorporated as a key technology into a Fujitsu product called “Future Mobility Accelerator.” We have already started a joint project with Autonomic, a U.S. mobility service company and a subsidiary of Ford Motor Company. We are also advancing proofs of concept (PoCs) with automobile makers in and outside Japan.
Connected cars have diverse potential to create new services, such as detecting signs of drunk driving from steering data, predicting crosswind strength at a tunnel exit in combination with map data, or finding illegally parked vehicles in image data.
Although Dracena currently targets the mobility field, it can be applied to other fields as well. The use of IoT will expand rapidly into various fields, and processing and utilizing the huge amount of data collected will pose more challenges than it does now. To meet those challenges, we will need a system that collects high-quality data appropriately, and such a system will be a turning point for new forms of data utilization.
Ms. Ueki concluded by saying, “In the future, we plan to use Dracena for all kinds of real-time services that address real-world problems, such as supporting the economical use of home appliances, watching over elderly people who need care, and providing route guidance during events or disasters.”