Solaris Fault Manager, a function of the Solaris™ 10 Operating System
This function reduces service downtime and so improves server process availability.
Solaris Fault Manager reduces service downtime by handling hardware errors automatically, from error detection and error analysis through to isolation of the failed part. The result is a major improvement in server process availability.
This new function extends Enhanced Support Facility (ESF), the SPARC Enterprise machine management function. Combined with SPARC Enterprise's own hardware monitoring technology, it provides even higher levels of server failure management.
Hardware error diagnostic mechanism
In Solaris 10, the hardware monitoring mechanism has three main components. The Error Handler detects errors. The Fault Manager receives error information from the Error Handler. The Diagnostic Engine analyzes that error information to determine the cause of the error and its effect.
Fault Manager records the error in a log file and directs an agent to deactivate any failed component. Finally, an error message is sent to the system console, giving the administrator the details needed to take remedial action.
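The flow described above can be pictured with a small model. The following Python sketch is purely illustrative and is not how Solaris implements fault management: the class names, the component name "dimm3", and the rule that a component is treated as failed after a fixed number of errors are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ErrorReport:
    """Error information passed on by the (hypothetical) Error Handler."""
    component: str  # illustrative name such as "dimm3"
    detail: str


@dataclass
class Diagnosis:
    """Outcome of analysing an error report."""
    component: str
    cause: str
    faulty: bool


class DiagnosticEngine:
    """Toy diagnosis rule (an assumption): faulty after `threshold` errors."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.counts = {}

    def diagnose(self, report):
        count = self.counts.get(report.component, 0) + 1
        self.counts[report.component] = count
        cause = f"{count} error(s) seen: {report.detail}"
        return Diagnosis(report.component, cause, count >= self.threshold)


class FaultManager:
    """Receives reports, logs them, isolates failed parts, notifies console."""

    def __init__(self, engine):
        self.engine = engine
        self.log = []         # stands in for the log file
        self.offline = set()  # components deactivated by the agent
        self.console = []     # messages shown to the administrator

    def receive(self, report):
        diagnosis = self.engine.diagnose(report)
        self.log.append(f"{report.component}: {report.detail}")
        if diagnosis.faulty and report.component not in self.offline:
            self.offline.add(report.component)
            self.console.append(
                f"FAULT {diagnosis.component}: {diagnosis.cause}; "
                "component taken offline"
            )
        return diagnosis


if __name__ == "__main__":
    manager = FaultManager(DiagnosticEngine(threshold=2))
    for _ in range(2):
        manager.receive(ErrorReport("dimm3", "correctable memory error"))
    print(manager.console)
```

In this model every report is logged, but the component is isolated and the administrator is notified only once the diagnosis marks it as failed, which mirrors the order of steps in the description.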
Error message improvements and Knowledge Article Web
Up to and including Solaris 9, system administrators had to infer that a server event had occurred from the records in the log file (syslog) and then work out the resolution themselves.
With Solaris 10, determining the cause of an error and its resolution is much easier, because all the necessary information (Message ID, remedial action, a URL for detailed information, etc.) is provided in the message automatically displayed on the console.
The URL in the message refers to the Knowledge Article Web, which Fujitsu and Sun Microsystems provide. These Web pages give comprehensive details of errors and remedial actions.
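Because the message carries its key fields in a predictable form, an administrator's script could pull out the Message ID, remedial action and article URL. The Python sketch below shows one way this might look. The sample text, its field labels (MSG-ID, DESC, REC-ACTION), the message ID and the URL are placeholders invented for the example, not a real Solaris message or knowledge article, so the patterns would need adjusting to the actual console output.

```python
import re

# Illustrative console text only. Field labels and layout are assumptions,
# and the message ID and URL are placeholders.
SAMPLE_MESSAGE = """\
MSG-ID: EXAMPLE-0000-00, TYPE: Fault, SEVERITY: Major
DESC: A memory module reported too many correctable errors. Refer to http://example.com/msg/EXAMPLE-0000-00 for more information.
REC-ACTION: Schedule a repair procedure to replace the affected module.
"""


def parse_fault_message(text):
    """Return the message ID, remedial action and URL, or None if absent."""
    msg_id = re.search(r"MSG-ID:\s*([A-Z0-9-]+)", text)
    action = re.search(r"REC-ACTION:\s*(.+)", text)
    url = re.search(r"https?://[^\s,]+", text)
    return {
        "message_id": msg_id.group(1) if msg_id else None,
        "action": action.group(1).strip() if action else None,
        "url": url.group(0) if url else None,
    }


if __name__ == "__main__":
    for key, value in parse_fault_message(SAMPLE_MESSAGE).items():
        print(f"{key}: {value}")
```

Returning None for a missing field, rather than raising an error, lets the same function be run over console text that contains no fault message at all.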