Fault management is the part of network management that detects, isolates, and resolves faults. Implemented properly, it keeps connectivity, applications, and services running optimally, provides fault tolerance, and reduces downtime.
When a fault occurs, a network component notifies the network operator using a protocol such as SNMP. An alarm is a persistent indication of the fault, and it clears only when the triggering condition has been resolved.
Fault management systems may use complex filtering to assign alarms to severity levels. These levels can range from debug to emergency, as in the syslog protocol. Alternatively, a system can use the perceived severity field of the ITU-T X.733 Alarm Reporting Function, which takes the values cleared, indeterminate, critical, major, minor, or warning.
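As a minimal sketch of severity-based filtering, the snippet below models the eight syslog severity levels and keeps only events at or above a chosen threshold. The function name `passes_filter`, the default threshold, and the sample events are assumptions made for illustration.

```python
from enum import IntEnum


class SyslogSeverity(IntEnum):
    """Syslog severities; a lower number means a more severe event."""
    EMERGENCY = 0
    ALERT = 1
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFORMATIONAL = 6
    DEBUG = 7


def passes_filter(severity, threshold=SyslogSeverity.WARNING):
    """Return True if the event is at least as severe as the threshold."""
    return severity <= threshold


# Hypothetical events, used only to exercise the filter.
events = [
    ("link down on port 3", SyslogSeverity.CRITICAL),
    ("config saved", SyslogSeverity.INFORMATIONAL),
    ("fan speed high", SyslogSeverity.WARNING),
]

shown = [text for text, sev in events if passes_filter(sev)]
print(shown)  # ['link down on port 3', 'fan speed high']
```

Because the numeric value falls as the severity rises, the filter is a single comparison. A real system would typically combine it with other criteria such as the source device or alarm type.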
The IETF's work on the syslog protocol includes a mapping between syslog severities and the X.733 perceived severities. It is good practice to send a notification both when a problem occurs and when it has been resolved. The latter notification carries a severity of cleared.
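The raise-and-clear practice can be sketched as a small alarm table: a notification with any severity other than cleared opens or updates an alarm, and a cleared notification removes it. The class `AlarmTable`, the device and problem names, and the `TO_SYSLOG` table are illustrative assumptions; consult the relevant standard for the normative severity mapping.

```python
from enum import Enum


class PerceivedSeverity(Enum):
    """X.733-style perceived severity values."""
    CLEARED = "cleared"
    INDETERMINATE = "indeterminate"
    CRITICAL = "critical"
    MAJOR = "major"
    MINOR = "minor"
    WARNING = "warning"


# Illustrative mapping to syslog numeric severities (an assumption for this
# sketch, not a normative table).
TO_SYSLOG = {
    PerceivedSeverity.CRITICAL: 1,       # alert
    PerceivedSeverity.MAJOR: 2,          # critical
    PerceivedSeverity.MINOR: 3,          # error
    PerceivedSeverity.WARNING: 4,        # warning
    PerceivedSeverity.INDETERMINATE: 5,  # notice
    PerceivedSeverity.CLEARED: 5,        # notice
}


class AlarmTable:
    """Tracks active alarms keyed by (device, problem)."""

    def __init__(self):
        self.active = {}

    def notify(self, device, problem, severity):
        key = (device, problem)
        if severity is PerceivedSeverity.CLEARED:
            # A clear notification ends the alarm; unknown clears are ignored.
            self.active.pop(key, None)
        else:
            self.active[key] = severity


table = AlarmTable()
table.notify("router-1", "link down", PerceivedSeverity.MAJOR)
print(len(table.active))  # 1
table.notify("router-1", "link down", PerceivedSeverity.CLEARED)
print(len(table.active))  # 0
```

Keying alarms by device and problem lets the clear notification find the alarm it resolves, which is why the alarm persists until the matching clear arrives.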
A fault management console also lets a system operator monitor events from multiple systems and act on that information. Ideally, a fault management system can identify events correctly and either take corrective action automatically or activate notification software so that a human can intervene, for example by sending an SMS to a mobile phone.
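One way to picture this choice between automatic action and human notification is a small dispatcher, sketched below. The function `handle_event`, the event fields, and the event types are hypothetical, and the `outbox` list stands in for a real SMS or email gateway.

```python
def handle_event(event, auto_actions, notify):
    """Run a registered corrective action, otherwise notify a human."""
    action = auto_actions.get(event["type"])
    if action is not None:
        action(event)
        return "auto"
    notify(f"{event['device']}: {event['type']}")
    return "notified"


restarted = []  # devices touched by the hypothetical corrective action
outbox = []     # stand-in for an SMS or email gateway

# Only event types listed here are handled without human involvement.
auto_actions = {"service_down": lambda e: restarted.append(e["device"])}

r1 = handle_event({"device": "web-1", "type": "service_down"},
                  auto_actions, outbox.append)
r2 = handle_event({"device": "db-1", "type": "disk_failure"},
                  auto_actions, outbox.append)

print(r1, r2)     # auto notified
print(restarted)  # ['web-1']
print(outbox)     # ['db-1: disk_failure']
```

Passing the notifier in as a parameter keeps the decision logic separate from the delivery channel, so the same rules could drive SMS, email, or a ticketing system.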
Fault management can be performed in two ways:
Active fault management: The system actively monitors devices with tools such as ping to check whether each device is up and responding. If a device stops responding, the system raises an alarm showing that the device is unavailable, which allows the issue to be corrected proactively.
Passive fault management: The system collects alarms that devices send when something goes wrong. In this mode, it learns of a fault only if the monitored device is intelligent enough to generate an error and report it to the management tool. If the monitored device fails completely or locks up, it cannot raise an alarm, and the problem goes undetected.
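The difference between the two modes can be sketched in a few lines. The functions `ping_probe`, `active_check`, and `passive_check` and the device names are hypothetical. The probe assumes a Unix-like `ping` that accepts `-c 1` (Windows uses `-n 1`), and the demonstration substitutes a simulated probe so that it runs without a network.

```python
import subprocess


def ping_probe(host, timeout_s=2):
    """Return True if one ICMP echo request to host succeeds.

    Assumes a Unix-like `ping` that accepts `-c 1`.
    """
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def active_check(devices, probe=ping_probe):
    """Active mode: poll every device and return those that did not respond."""
    return [d for d in devices if not probe(d)]


def passive_check(devices, received_alarms):
    """Passive mode: only devices that reported an alarm are known faulty."""
    return [d for d in devices if d in received_alarms]


# Simulated state: switch-2 has locked up, so it neither answers probes
# nor sends any alarm of its own.
responding = {"router-1": True, "switch-2": False}
devices = list(responding)

print(active_check(devices, probe=lambda d: responding[d]))  # ['switch-2']
print(passive_check(devices, received_alarms=set()))         # []
```

In this simulated case the active check finds the locked-up device, while the passive check reports nothing because the device never sent an alarm.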