Best Practices for Process Alarm Management

The purpose of process control alarms is to use automation to assist human operators as they monitor and control processes, and alert them to abnormal situations. Incoming process signals are continuously monitored, and if the value of a given signal moves into an abnormal range, a visual and/or audio alarm notifies the operator of that condition.

This seems like a simple concept, almost not worthy of a second thought, and unfortunately, sometimes the configuration of alarms in a control system doesn’t get the attention it deserves. Configuring and maintaining alarms properly requires careful planning and has a significant impact on the overall effectiveness of a control system.

Early Alarm Systems

Before digital process control, each alarm indicator required a dedicated lamp and some physical wiring. This meant that:

  1. Due to the effort required, the need for a given alarm was carefully scrutinized, somewhat limiting the total number of alarms
  2. Once the alarm was in place, it had a permanent “home” where an operator could become comfortable with its location and meaning

The Introduction of Digital Alarms

As control systems became digital, the creation and presentation of alarms changed significantly. First, where a “traditional” control panel was many square feet in size, digital control system human machine interfaces (HMIs) consisted of a few computer monitors which displayed a representation of the process in an area more appropriately measured in square inches than square feet.

Second, creating an alarm event was a simple matter of reconfiguring some software. Multiple levels of alarms (hi & hi-hi, lo & lo-lo) could easily be assigned to a single process value. This led to an increase in the number of possible alarm notifications. Finally, when an alarm was activated, it was presented as an icon, or as flashing text on a process schematic screen, and then logged in a dedicated alarm list somewhere within the large collection of display screens. However when the alarm was presented, it lacked the consistency of location and intuitive meaning that the traditional physical lamp had.

The Dilemma With Digital Alarms

The digital alarm systems worked acceptably well for single alarms and minor upsets. But for major upsets the limited visual real estate and the need to read and mentally place each alarm created bottlenecks to acknowledging and properly responding to large numbers of alarms in a short interval of time.

If a critical component in a process fails, for example a lubrication pump on a large induction fan, the result can be a “flood” of alarms occurring over a short time period. The first wave of alarms is associated with the immediate failure, low lube oil pressure, low lube oil flow, and high bearing temperatures. The second wave is associated with interlocks shutting down the fan, high inlet pressure, low air flow and low downstream pressure. With no ID fan the upstream boiler will soon start to shut down and generate numerous alarms, followed most likely by problems from the process or processes which are served by the boiler.

The ASM Consortium

Analyses of a number of serious industrial accidents has shown that a major contributor to the severity of the accidents was an overwhelming number of alarms that operators were not capable of understanding and properly responding to in a timely manner. As a result of these findings, in 1992 a consortium of companies including Honeywell and several petroleum and chemical manufacturers was established to study the issue of alarm management, or more generally, abnormal situation management.

The ASM Consortium, with funding from the National Institute of Standards and Technology, researched and developed a series of documents on operator situation awareness, operator effectiveness and alarm management. Since then a number of other industry groups and professional organizations, such as the Engineering Equipment and Materials Users Association in the UK and Instrument Society of America have also examined the issue of alarm management and issued best practices papers.

Alarm Management Best Practices

The central message of these alarm management best practices documents is that the alarm portion of a digital control system should be put together with as much care and design and the rest of the control system. It is not adequate to simply assign a high and low limit to each incoming process variables and call it good. There are a number of practices which can improve the usability and effectiveness of an alarm system. Some techniques are rather simple to implement, others are more complex and require more effort

1. Planning

When designing or evaluating an existing system, start by looking at each alarm. Evaluate whether it is really needed, and is it set correctly? For example, a pump motor may have an alarm which sounds if the motor trips out. However, if there is also a flow sensor downstream of the pump which has an alarm on it, if the pump stops, two alarms will register. Since the real effect on the process is a loss of flow, it makes sense to keep that alarm and eliminate the motor-trip alarm.

2.Prioritization

Alarms should be prioritized. Some alarms are safety related and should be presented to the operator in a manner that emphasizes their importance. High priority alarms should be presented in a fixed location on a dedicated alarm display. This allows operators to immediately recognize them and react in critical situations. It is very difficult to read, understand and quickly react to an alarm which is presented only in a scrolling list of alarms which will be continuously growing during a process upset.

3. Grouping & Suppression

Correctly identifying the required alarms and prioritizing them is a help, but these techniques alone will not stop a surge of alarms during a crisis. In order to significantly reduce the number of presented crisis alarms, methods like alarm grouping and alarm suppression are needed. As mentioned in the ID fan example above, a single point of failure can lead to several abnormal process conditions and thus several alarms.

It is possible to anticipate these patterns and create control logic which handles the situation more elegantly. In the case of the ID fan, if the inlet pressure to the fan goes high and the outlet flow drops it makes sense to present the operator with virtual alarm of “Fan down” rather than a dozen individual alarms, all presented within seconds of each other, that he or she has to deal with. While the operator is trying to comprehend a cluster of individual alarms to deduce that the fan is down, the upstream boiler may trip out.

Hopefully, with a single concise alarm of a lost fan, the operator can take action at the boiler and perhaps keep that unit running at reduced rate until the fan can be restored. All alarms are still registered by the system for diagnosis and troubleshooting, but only condensed, pertinent information is presented to the operator. This type of grouping and suppression can be done manually as well. If there is a process unit that is sometimes taken offline or bypassed, it makes sense to group and suppress all of the alarms associated with that unit’s operation. An operator shouldn’t have to continuously acknowledge a low flow alarm on a line that he knows has no flow in it.

4. Human Administration

Perhaps the most important part of alarm management is the actual human administration of the system. However a system is designed, its intent and use needs to be clearly communicated to the operators which use the system. Training operators on how to use and respond to alarms is as important as good original system design. Alarm management is a dynamic endeavor, and as operators use the system they will have feedback which will lead to design improvements. The system should be periodically audited to look for points of failure and areas of improvement. As processes change, the alarm configuration will also need to be changed. This ongoing attention to the alarm system will make it more robust and yield a system which will avert serious process related incidents.