Archived content

NOTE: this is an archived page and the content is likely to be out of date.

Fujitsu Develops Industry's First System-Failure Management Technology for Cloud Computing Era

- Delivers high-reliability non-stop system services; automatically detects and resolves system failures -

Fujitsu Laboratories Ltd.

Kawasaki, Japan, February 23, 2010

Fujitsu Laboratories Ltd. today announced the development of technology that will enable the company to implement the Trusted-Service Platform it has been advocating for cloud computing services as the industry shifts toward the era of cloud computing. In an industry first, Fujitsu has developed technologies that detect signs of system failures before the failures occur, by improving the ability to gather and analyze cloud system data, and that then narrow down the causes of the failures and automatically resolve them. Cloud systems play an important role in supporting various societal infrastructure systems and must deliver services continuously; even if a failure does occur, services must not be interrupted. With Fujitsu's new technology, cloud system failures can be addressed before they occur. Furthermore, because failures can be resolved automatically, the technology reduces the workload of administrators and delivers cloud services that users can rely on with confidence.

Background

Cloud computing is a delivery model whereby remotely located IT resources, such as servers, storage, networks, middleware and business applications, are provided as services over the Internet. Users have the ability to use the functions they need, in whatever amounts they need, and only when they need them.

In addition to its use as a platform for further enhancing work efficiency and productivity, cloud computing is also employed to support various societal infrastructure systems, such as those used in entertainment or lifestyle-related applications. To support the creation of a human-centric networked society, in which information and communication technologies (ICT) are employed with an emphasis on people, cloud systems need to keep delivering secure and stable services without interruption.

Traditionally, many companies have addressed system failures immediately after their occurrence. However, because companies cannot afford downtime for cloud systems that play an important role in supporting infrastructure systems, a different approach is required. In addition, large-scale systems thus far have ensured the continuous operation of services through expensive, redundant configurations. In order to deliver high reliability and stability to cloud systems - which aim to operate economically - what is needed is technology that can predict and resolve failures before they emerge.

Technological Challenges

Cloud computing systems have the following characteristics:

  1. Large-scale:
    When companies take existing systems that operate independently and consolidate them into data centers and enterprise IT systems, the scale of the systems increases.
  2. Complexity:
    When companies employ virtualization technologies and operate numerous services on the same physical server, system configurations and system dependency relationships become complex.

Given the aforementioned characteristics, when a failure takes place in a cloud system, it can affect various services, in addition to requiring a great deal of manpower and time to locate where the failure has occurred.

Newly-Developed Technology

To provide highly reliable and stable services via cloud computing, Fujitsu Laboratories developed a technology that detects signs of failures and averts the failures before they occur. Specifically, the technology monitors the system, predicts failures, narrows down their causes, and quickly resolves them (Figure 1).

Figure 1: Previous vs. newly-developed method of detecting and handling failures

1. Detection of signs of failure (Prediction)

Fujitsu Laboratories has developed two technologies to detect signs of failures depending on the type of failure.

(1) Detection of failures through the analysis of system messages:

This technology focuses on the specific patterns of messages that are generated just before failures occur and treats them as warning signs. By comparing the pattern of newly generated messages with message patterns from previous system failures, the technology can pick up on signs of failure. By employing Bayesian learning[1] to assign weights to example data from previously generated message patterns, the system can detect signs of failure with high accuracy (Figure 2).

Figure 2: Failure detection through analysis of system messages
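
The idea of weighting past message patterns can be illustrated with a minimal sketch. The code below is not Fujitsu's implementation; it substitutes a textbook Bernoulli naive Bayes classifier, and the class name, message IDs, and smoothing value are all assumptions made for illustration. Each training window records which message IDs appeared and whether a failure followed.

```python
import math
from collections import Counter


class FailureSignDetector:
    """Bernoulli naive Bayes over the message IDs seen in a time window.

    Illustrative only: a stand-in for the Bayesian learning described above.
    """

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing                 # Laplace smoothing (assumed value)
        self.windows = {True: 0, False: 0}         # training windows per class
        self.seen_in = {True: Counter(), False: Counter()}
        self.vocabulary = set()

    def train(self, message_ids, preceded_failure):
        """Record one window of messages and whether a failure followed it."""
        unique = set(message_ids)
        self.windows[preceded_failure] += 1
        self.seen_in[preceded_failure].update(unique)
        self.vocabulary.update(unique)

    def _log_score(self, present, label, total_windows):
        n = self.windows[label]
        # Smoothed class prior, then one Bernoulli term per known message ID.
        score = math.log((n + self.smoothing) / (total_windows + 2 * self.smoothing))
        for msg in self.vocabulary:
            p = (self.seen_in[label][msg] + self.smoothing) / (n + 2 * self.smoothing)
            score += math.log(p if msg in present else 1.0 - p)
        return score

    def failure_probability(self, message_ids):
        """Posterior probability that this window precedes a failure."""
        present = set(message_ids)
        total = sum(self.windows.values())
        fail = self._log_score(present, True, total)
        ok = self._log_score(present, False, total)
        peak = max(fail, ok)                       # normalize in log space
        fail_w, ok_w = math.exp(fail - peak), math.exp(ok - peak)
        return fail_w / (fail_w + ok_w)


# Hypothetical message IDs; real system messages would look different.
detector = FailureSignDetector()
for _ in range(3):
    detector.train(["DISK_RETRY", "IO_TIMEOUT", "LOGIN"], preceded_failure=True)
    detector.train(["LOGIN", "CRON_START"], preceded_failure=False)

risky = detector.failure_probability(["DISK_RETRY", "IO_TIMEOUT"])
quiet = detector.failure_probability(["LOGIN", "CRON_START"])
```

In this toy data set, a window containing the two disk-related messages scores far higher than a window of routine messages, which is the behavior the comparison against past failure patterns aims for.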

(2) Detection of potential failures that do not generate messages:

When equipment such as servers is configured, human error can lead to the input of incorrect settings. In this kind of situation, the server will operate according to the settings and may not generate any error messages. An effective method for detecting failures in this instance is to capture the data packets that travel across the networks linking servers and systems, and then analyze minor changes at the packet level, such as data loss, resent packets and transmission delays. To monitor the large-scale systems involved in cloud computing, Fujitsu Laboratories has developed a technology that is compatible with 10Gbps high-speed communication technology and detects network and server system failures in real time.
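
As a rough sketch of what packet-level analysis might look like, the code below summarizes captured packets per flow and flags flows showing loss, resent packets, or high delay. The `Packet` record, the flow labels, and all threshold values are hypothetical; the source does not describe the actual capture format or detection criteria, and this sketch makes no attempt at 10Gbps real-time processing.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class Packet:
    flow: str                  # hypothetical flow label, e.g. "web01->db01"
    seq: int                   # sequence number of the segment
    sent_at: float             # capture time in seconds
    acked_at: Optional[float]  # None if no acknowledgement was captured


def summarize_flows(packets):
    """Count segments, resent packets, unacknowledged packets and delays per flow."""
    stats = defaultdict(lambda: {"packets": 0, "segments": 0, "resent": 0,
                                 "unacked": 0, "delays": []})
    seen = set()
    for p in packets:
        s = stats[p.flow]
        s["packets"] += 1
        if (p.flow, p.seq) in seen:
            s["resent"] += 1           # same sequence number seen again
        else:
            seen.add((p.flow, p.seq))
            s["segments"] += 1
        if p.acked_at is None:
            s["unacked"] += 1          # treated here as lost data
        else:
            s["delays"].append(p.acked_at - p.sent_at)
    return stats


def suspicious_flows(packets, max_loss=0.01, max_resent=0.05, max_delay=0.050):
    """Return {flow: [reasons]} for flows exceeding the (assumed) thresholds."""
    flagged = {}
    for flow, s in summarize_flows(packets).items():
        reasons = []
        if s["unacked"] / s["packets"] > max_loss:
            reasons.append("loss")
        if s["resent"] / s["segments"] > max_resent:
            reasons.append("resent")
        if s["delays"] and mean(s["delays"]) > max_delay:
            reasons.append("delay")
        if reasons:
            flagged[flow] = reasons
    return flagged


# Synthetic traffic: one healthy flow and one degraded flow.
healthy = [Packet("web01->db01", i, i * 0.01, i * 0.01 + 0.002) for i in range(20)]
degraded = []
for i in range(10):
    t = i * 0.01
    if i in (3, 7):                    # first copy is never acknowledged
        degraded.append(Packet("web02->db01", i, t, None))
        t += 0.2                       # resent after a timeout
    degraded.append(Packet("web02->db01", i, t, t + 0.080))

flagged = suspicious_flows(healthy + degraded)
```

Neither server in the degraded flow needs to log an error for the flow to be flagged, which is the point of monitoring at the packet level.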

2. Narrowing down the causes of failures

The technology examines the detected signs of system failure and infers which areas are most likely to have generated them. Using each observed symptom as a point of origin, it uses network and system configuration information to trace the symptom's possible causes. It then overlays the results traced from multiple points of origin and infers the most likely causes from the areas with the most overlap or the areas where normal activity is absent.
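
The overlay step can be sketched as a simple overlap count over a dependency map. The topology, the component names, and the function names below are hypothetical, and the sketch covers only the "most overlap" criterion; it is one plausible reading of the description, not the actual inference method.

```python
from collections import Counter


def upstream(component, depends_on):
    """Return `component` plus everything it relies on, directly or indirectly."""
    seen, stack = set(), [component]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(depends_on.get(node, ()))
    return seen


def rank_causes(symptoms, depends_on):
    """Overlay the traces from every symptom and rank components by overlap."""
    overlap = Counter()
    for symptom in symptoms:
        overlap.update(upstream(symptom, depends_on))
    return overlap.most_common()


# Hypothetical configuration information: service -> VM -> host -> switch.
depends_on = {
    "shop-web": ["vm1"], "billing": ["vm2"], "mail": ["vm3"],
    "vm1": ["host-a"], "vm2": ["host-a"], "vm3": ["host-b"],
    "host-a": ["switch-1"], "host-b": ["switch-1"],
}

ranking = rank_causes(["shop-web", "billing", "mail"], depends_on)
top_component, top_overlap = ranking[0]
```

Here all three symptomatic services trace back to `switch-1`, so it ranks first, while `host-a` is shared by only two of them.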

3. Resolving the causes of failures

The system leverages past knowledge of how to deal with system failures, including system log information, and presents administrators with the most suitable methods for dealing with the determined causes. Because failures that have occurred before will often occur again, the system stores previous failure cases, together with the history of the procedures used to resolve them, in its knowledge base, so that it can quickly determine how to resolve the cause of a failure.
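
A knowledge base of this kind could be sketched as a list of past cases searched by cause and ranked by symptom similarity. Everything in the example is an assumption made for illustration: the `PastCase` fields, the Jaccard similarity measure, and the sample cases and procedures are not taken from the source.

```python
from dataclasses import dataclass


@dataclass
class PastCase:
    cause: str            # cause determined for the past failure
    symptoms: frozenset   # signs observed at the time
    procedure: list       # steps that resolved it


def jaccard(a, b):
    """Similarity of two symptom sets: shared items divided by all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_procedures(cause, symptoms, knowledge_base, top_n=3):
    """Return past cases with the same cause, most similar symptoms first."""
    observed = frozenset(symptoms)
    matches = [case for case in knowledge_base if case.cause == cause]
    matches.sort(key=lambda case: jaccard(observed, case.symptoms), reverse=True)
    return matches[:top_n]


# Hypothetical stored cases.
knowledge_base = [
    PastCase("host-nic", frozenset({"packet_loss", "resent"}),
             ["migrate VMs off host", "replace NIC"]),
    PastCase("host-nic", frozenset({"link_flap"}),
             ["reseat cable"]),
    PastCase("vlan-misconfig", frozenset({"packet_loss"}),
             ["restore previous switch settings"]),
]

suggestions = suggest_procedures(
    "host-nic", ["packet_loss", "resent", "delay"], knowledge_base)
```

The best-matching case comes first, so an administrator would be shown the procedure from the most similar past failure before the alternatives.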

Results

With this new technology, Fujitsu is able to quickly address cloud system failures, allowing the delivery of high-reliability, continuous-operation cloud system services to its customers.

In its own internal systems that employ the technology, Fujitsu has been able to detect instances of mistaken system settings before errors actually occurred. In addition, Fujitsu has reduced the average time required to resolve failures from 15 minutes to approximately one minute.

Future Developments

Fujitsu plans to gradually deploy this technology in its On-Demand Virtual System Services and LCM services, on its Trusted-Service Platform.


  • [1] Bayesian learning

    A probabilistic method for estimating the cause of an event based on evidence. Fujitsu Laboratories' application of Bayesian learning in its technology has achieved a failure detection rate of 96.2% after training on an example of a failure 10 times.

About Fujitsu Laboratories

Founded in 1968 as a wholly owned subsidiary of Fujitsu Limited, Fujitsu Laboratories Limited is one of the premier research centers in the world. With a global network of laboratories in Japan, China, the United States and Europe, the organization conducts a wide range of basic and applied research in the areas of Multimedia, Personal Systems, Networks, Peripherals, Advanced Materials and Electronic Devices. For more information, please see: http://jp.fujitsu.com/group/labs/en/

Press Contacts

Public and Investor Relations Division
Inquiries

Company: Fujitsu Limited

Technical Contacts

Cloud Computing Research Center

Phone: +81(44)754-2575
E-mail: cloud-mate@ml.labs.fujitsu.com
Company: Fujitsu Laboratories Ltd.


Company and product names mentioned herein are trademarks or registered trademarks of their respective owners. Information provided in this press release is accurate at time of publication and is subject to change without advance notice.

Date: 23 February, 2010
City: Kawasaki, Japan
Company: Fujitsu Laboratories Ltd.