For the last few years, just before Christmas, I have compiled my reflections based on things that have cropped up in the year. I plan to continue doing this and keeping them all on this page.
Most companies have procedures and most procedures (in my experience) are not very good. Guidance is available that at face value appears to support the development of better procedures but it tends to be focussed on form over function – following the guidance can give you procedures that look good at a superficial level but do not actually perform well in practice.
One of the problems is that procedures are written without any clear idea of why they are needed or how they are going to make a contribution. There are lots of reasons to write procedures but the two that I am most interested in are:
- Supporting competent people to carry out a task;
- Supporting training and assessment of people so that they can become competent in performing the task.
In some cases a single procedure can satisfy both of these requirements. In other cases two procedures may be required for the same task.
Whilst I accept that format and presentation do have some influence on how a procedure performs, it is the content that makes the biggest difference. One thing that people are not very clear about is the level of detail to include. It is often suggested that they should have enough information for the man or woman on the street. That is fine if your intention is to give tasks to people with no training or experience. In practice this will not and should not happen. My guide is to consider someone competent to perform a similar task at another location. For example, a process operator or maintenance technician moving from another site.
Another reason to avoid unnecessary detail in procedures is that they will never be 100% correct for every potential scenario where the task is performed. By recognising that people have to be competent to perform a task using a procedure we can see that using proven skills and exercising judgement should be supported by the procedure instead of trying to replace them with additional lines of text. Emergency procedures are a good example. There is a danger that if we give people detailed procedures they will start to believe that is how every emergency will happen. Of course this is not the case.
Why do we need so much preamble to our procedures? I have been working with a client who had developed a new procedure template with input from consultants, technical authors and quality control. I am pleased to say that we have managed to reduce this to about two pages, which is probably a fair compromise.
Why do we need a date, time and signature against every step in a procedure? It is very common but does little more than cause suspicion about why it is considered necessary. It is fair to say that keeping track of progress within a procedure is important and useful to the person doing the task, but a simple tick box is more than enough. A record of when the task started and ended can be included if time and date are important, and possibly at interim stages if key hold points are identified.
Which leads me on to key hold points. In some tasks these can be critical but in most they probably are not. I need to develop some criteria for deciding when they should be used, but I find that if a procedure is well structured, especially if based on a hierarchical task analysis, it is quite easy to identify the need.
Another item that is usually covered in the guidance is warnings before critical steps. However, I am not aware of any clear guidance about which steps require a warning. I don’t think I have ever seen warnings within procedures used well. I have seen some procedures where the warnings take up more space than the main task steps. I commonly see actions included in warnings that inevitably cause confusion.
I think I understand the idea of highlighting certain steps in the procedure. But does this mean that steps that don’t have a warning are optional or can be performed carelessly? If that is the case why are those steps included in the procedure? My view is that if a task has been identified as critical enough to need a procedure then every step in that procedure has to be performed. The other issue I have is what the warning should say. Either you say that the step is important, and so imply that others are not, or you end up simply describing the step in a different way. My strong preference is to ensure each step is described clearly and concisely so that additional explanation is not required. Overall, I return to my explanation above that procedures are there to support competent people. Part of that competence is knowing the hazards and how the risks are controlled. This is something that I believe should be in the preamble to the procedure.
I had intended to write an article on procedures to address the issues I routinely encounter. Unfortunately I have not found the time. However, I did stumble across an old article I wrote in 2008 that I had completely forgotten about. Looking back it actually addresses some of my concerns and so I am sharing it here. https://abrisk.co.uk/papers/2008%20Tips%20-%20Better%20Procedures.pdf
(I have had permission from the original publisher Indicator FLM to share it – https://www.indicator-flm.co.uk/en/)
I am currently leading a review and update of EEMUA 191, the go-to guide for alarm management. I am finding it a challenging but very interesting activity that is helping me to understand some of the issues around alarm management. I think the guide is currently generally ‘correct’ and has the answers to most of the questions. But there are some gaps and contradictions that I hope we are able to resolve.
One area of confusion for me has been ‘Safety Related’ and ‘Highly Managed’ alarms. I could not decide if they were the same thing or fundamentally different, or whether one was a subset of the other. They are terms that have come from IEC 61508 and IEC 62682. The largely numerical explanation for safety related may be useful in LOPA studies but less so for general alarm management. The definition of highly managed seems to tell us more about what to do with them rather than how to identify them in the first place.
From discussion it appears that Highly Managed Alarms (HMA) is the preferred term and they should be identified for situations that rely on human actions to avoid significant consequences because there is no other control or barrier; or because the integrity of the controls or barriers is less than required. They are situations where process operators are our last line of defence.
I have tested this with a few clients recently from different sectors and have been pleasantly surprised by how useful it has been. It has shown that clarifying the nature of the event can be particularly useful so we can have HMA for safety (HMA-S), the environment (HMA-E), commercial threats (HMA-C), assets (HMA-A) and even quality (HMA-Q). Whilst it is tempting to suggest that safety has to be the main priority so the others should be disregarded, the fact is that all of these threats are important to any business and so influence the operators’ behaviours.
Similarly, and at the other end of the scale, a better understanding and use of Alerts is proving to be very useful. These are the events that do not require an operator response to avoid a consequence, so do not satisfy the definition of alarm, but may still be useful for situational awareness of operators or others. They provide a very useful solution in alarm rationalisation when people feel uncomfortable about deleting alarms. Also, they provide the opportunity to direct this information to other groups, so reducing operator workload.
Alerts can be useful and should not be ignored. The main difference is that we expect operators to monitor alarms continuously and respond immediately. But they only need to review alerts periodically (at the start and end of shift, and every couple of hours in between). Alerts that other groups can deal with can be captured in reports that may be sent at the end of each day so that work the following day can be planned.
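The distinction between alerts, alarms and HMA described above can be sketched as a simple decision rule. This is only my reading of the text expressed in Python, not wording from EEMUA 191 or any standard; the class, field names and example event are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProcessEvent:
    """Hypothetical record of an event considered during alarm rationalisation."""
    name: str
    needs_operator_response: bool   # must the operator act to avoid a consequence?
    significant_consequence: bool   # would the consequence be significant?
    other_barriers_adequate: bool   # do other controls or barriers have the integrity required?
    threat: str = ""                # optional suffix: S, E, C, A or Q


def classify(event: ProcessEvent) -> str:
    """Return 'Alert', 'Alarm' or 'HMA' (with a threat suffix when one is given)."""
    if not event.needs_operator_response:
        # No operator response needed, so it does not meet the definition of an alarm.
        return "Alert"
    if event.significant_consequence and not event.other_barriers_adequate:
        # The operator is the last line of defence.
        return f"HMA-{event.threat}" if event.threat else "HMA"
    return "Alarm"


example = ProcessEvent("Tank high level", True, True, False, threat="E")
print(classify(example))
```

The order of the checks matters: the question of whether an operator response is needed comes first, so an event cannot become an HMA unless it is already a genuine alarm.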
Hopefully the new edition of EEMUA 191 will be published in 2023. Watch this space…
I have been concerned that companies are not giving people enough opportunity to practise emergency response. They may organise one big exercise every couple of years, usually involving emergency services. But I didn’t think that was enough. It seemed that the problem was that these exercises were a drain on resources and were not leaving people enough time to do anything else. I have tried to encourage the use of regular table top exercises. These are simple activities requiring minimal resource that can give everyone who may have to deal with an emergency one day the chance to talk through what may happen and what they would need to do.
It has been great this year to talk with clients that have been supporting their operating and emergency response teams with frequent table top exercises. It appears to me that these companies stand out from the others in many aspects. I can’t prove this but my conclusion is that making sure people take time to pause their normal work to talk about what can go wrong leads them to have a better appreciation of hazards and controls. Whether this is a direct result of the exercises themselves or just that companies that can manage this are just better at managing everything is not clear and does not really matter.
It is worth noting that the group of people required to deal with the early stages of any incident is usually the process operators. They usually work shifts, so the table top exercises have to occur frequently to make sure everyone has the chance to take part. Weekly may be a bit excessive but if this is stretched to monthly it is entirely possible that individuals may not take part in one for more than a year.
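A rough calculation illustrates the point about frequency. It assumes five shift crews and that each exercise is equally likely to fall on any one crew's shift; both are my assumptions for illustration only, and real rosters, leave and sickness will change the numbers.

```python
def chance_of_missing_all(crews: int, exercises_per_year: int) -> float:
    """Probability that one particular crew is not on shift for any exercise in a year.

    Assumes each exercise lands on a single crew chosen at random,
    independently of the others (an illustrative simplification).
    """
    return (1 - 1 / crews) ** exercises_per_year


monthly = chance_of_missing_all(crews=5, exercises_per_year=12)
weekly = chance_of_missing_all(crews=5, exercises_per_year=52)

print(f"Monthly exercises: {monthly:.1%} chance a crew misses them all")
print(f"Weekly exercises:  {weekly:.4%} chance a crew misses them all")
```

Under these assumptions a crew has roughly a 7% chance of going a full year without an exercise when they are held monthly, and individuals within a crew will miss more through leave and absence. With weekly exercises the chance is negligible.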
Proof Testing Safety Instrumented Functions
Papers (co-authored with Nick Wise and Harvey Dearden) were prompted by work with various clients, where task analyses of proof testing showed that it is more complicated than it appears and that the methods being used often fall short of where they need to be.
It’s still with us and unfortunately Omicron has ramped it up again. When I wrote my reflections a year ago I really did not think it would still be top of my list of issues. Personally, it hasn’t affected me too much. I have not (knowingly at least) caught the virus. Our eldest son was supposed to be coming home from university today but got a positive lateral flow earlier in the week so has had to delay his plans. He has no symptoms and luckily should be clear early next week and will make it home for Christmas.
I have kept busy with work. Lots of Teams meetings, which are generally working very well. In some cases, remote workshops are working better than face-to-face. Task analysis is a good example because it is allowing more and different people to attend workshops and is less disruptive at the client end because it can be organised in a more flexible way. The task Walk-Through and Talk-Through on site is still very important but can be done as a separate activity when COVID restrictions allow.
The reporting on the pandemic has highlighted to me the problems we have with data in general. The problem is that as soon as numbers are presented as part of an argument or explanation we start to believe that they give us an answer. That can be the case if we are sure that the data is truly representative and applicable to the question that is posed. I don’t know about the rest of the world but in the UK the number of positive test results has been continuously reported as evidence of the pandemic getting better or worse but most of the time appears to be most influenced by the number of tests carried out. If I am generous I would say this is just lazy reporting of headline numbers with no attempt to contextualise the meaning. I guess the same often happens when we are presented with quantified safety analyses. We are given numbers that give a simple answer to what is normally a far more complex question.
Another observation that has parallels with my human factors work is not taking human behaviour into account when deciding what controls to implement. A particular example is face coverings. I have seen several lab studies showing how they can stop droplets that may be expelled when you cough, sneeze or even sing. But that seems to have very little relevance to the real world. I know I am not alone in using the same face covering all day, putting it on and taking it off multiple times, and adjusting it the whole time I am wearing it. I don’t know how these behaviours affect the effectiveness of face coverings but I would have liked to see some results from real life applications instead of just the lab. At the risk of contradicting my rant above about use of data, infection rates in Wales over the last couple of months have been consistently higher than in England, at a time when face coverings were required in more places in Wales.
Emergency response procedures (nothing to do with COVID)
Several of my clients have asked me to review their emergency response procedures this year. It has usually cropped up as part of a wider scope of task analysis work that I have been involved in.
A key point I would like to emphasise is that treating emergency response as a task is rarely appropriate. One of our main aims when we conduct task analysis is to understand how the task is or will be performed. A key feature of emergencies is that they are unpredictable. Clearly we can carry out analyses for some sample scenarios but the danger is we start to believe that the scenarios we look at will happen in reality. This can give us a very false sense of security thinking we understand something that cannot be understood.
My main interest is usability of emergency procedures. That means the procedures must be very easy to read and provide information that is useful to the people who use them. Often emergency response procedures I see are too long and wordy.
My guidance to clients is to create two documents:
- Operational guide – supporting people responding to emergencies. Develop this first.
- Management system – the arrangements needed to make sure people can respond to emergencies when they occur.
The main contents of the operational guide are a set of “Role cards,” which identify the emergency roles and list the main activities and responsibilities in an emergency, and a set of “Prompt cards” that cover specific scenario types. Each card should fit on one page if possible, but can stretch to two if that makes it more useful. The guide can include other information as appendices, but only content that would be useful to people when responding to an emergency.
There should be a role card for each emergency role and not a person’s normal role. For example, the Shift Manager may be the person who, by default, becomes the Incident Controller so it may appear the terms can be used interchangeably. But in some scenarios the Shift Manager may not be available and someone else would have to take the role. Also, there will be times when an individual has to fulfil several emergency roles. For example, in the early stages of an incident the Shift Manager may act as Incident Controller and Site Main Controller until a senior manager arrives to take on the latter role.
Prompt Cards should cover every realistic scenario but you do not need a different one for every scenario. For example, you may have a number of flammable substances so may have several fire scenarios. However, if the response is the same for each you should only have one prompt card for fire. In fact, you will probably find that the same main actions have to be considered for most incidents and it makes sense to have one main Prompt Card supplemented by scenario specific cards that list any additional actions.
The management system is important, but secondary to the Operational Guide. It should identify the resources (human, equipment etc.) required for emergency response, training and competence plans including emergency exercises, communications links and technologies, cover and call-in arrangements, audit and review, management of change etc. The format is less of a concern because it is not intended as an immediate support to people when responding to emergencies, although it may be a reference source so should still be readily available.
Human Factors Engineering
Human Factors Engineering (HFE) is the discipline for applying human factors in design projects. Whilst specialists can be brought into projects to assist, their ability to influence the basic design can be limited. Much better results are achieved if discipline engineers have some knowledge of human factors, especially at the earliest stage of a project. Unfortunately HFE is perceived as concerned only with detail and so left towards the end of a project. Opportunities to integrate human factors into the design are often missed as a result.
I find that engineers are often interested in human factors but rarely get the opportunity to develop their skill in the subject. They already have a lot to do and do not have the time or energy to look for more work. This is a shame because with a little bit of awareness they can easily incorporate human factors into their design with little or no additional effort.
To support designers to improve their awareness of HFE I have created an online course. It introduces HFE and how it should be implemented in projects, from the very earliest concept/select stages through to detail design and final execution. It should be useful to anyone involved in design of process plant and equipment including process engineers, technical safety and operations representatives.
The course is presented as a series of 2-minute videos explaining how to carry out HFE in projects. It is hosted on Thinkific and the landing page is https://lnkd.in/dPeq52RU
The full course includes 35 videos (all less than 2 minutes long), split into 7 sections, each with a quiz to check your understanding. There is also a full transcript that you can download. You have up to 6 months to work through the course.
There is a small fee (currently $30) for the full course. There is a free trial of the course, intended to give you an idea of how the course is presented. You will need to create a user account to see that.
Technology and communications
I have been working with my friends at Eschbach.com, exploring the human factors of communications and how technology can help. This started with a focus on shift handover. It is easy to focus on the 10 or 15 minutes of direct communication when teams change but to be effective it has to be a continuous process performed 24 hours per day.
One of the ironies is that the busiest and most difficult days are the ones where communication at shift handover is most important. But on those days people do not have the time or energy to prepare their handover reports or even update their logs during the shift.
In our everyday life we have become accustomed to instantaneous communication with friends and family using our mobile phones. Most of us probably use WhatsApp or Messenger etc. to send important and more trivial messages to friends and family. Often these include photos because we all know an image can convey so much more than words. Wouldn’t it be good to use some of these ideas to help communication at work?
We held a couple of really good, highly interactive webinars on the subject. You can see a recording at https://player.vimeo.com/video/539446810.
If you would like us to run another for your company, industry group etc. let me know because we are keen to continue the conversation.
I had a series of three articles published in The Chemical Engineer with co-author Nick Wise. The main theme was on deciding whether risks are As Low As Reasonably Practicable (ALARP). Our interest was whether the various safety studies we carry out can ever demonstrate ALARP. Our conclusion was the studies are our tools to help us do this but ultimately we all have to make our own judgement. The main objective should be to satisfy ourselves that we have the information we need to feel comfortable explaining or defending our judgement to others who may have an interest.
Trevor Kletz compendium
Finally, I am pleased to say the Trevor Kletz compendium was published earlier this year by the Institution of Chemical Engineers (IChemE) in partnership with publisher Elsevier. I was one of the team of authors.
Trevor Kletz had a huge impact on the way the process industry viewed accidents and safety. He was one of the first people to tackle the issues and became internationally renowned for sharing his ideas on process safety. We hope that this compendium will introduce his ideas to new audiences.
The book focuses on understanding systems and learning from past accidents. It describes approaches to safety that are practical and effective and provides an engineer’s perspective on safety.
Trevor Kletz was ahead of his time and many of his ideas and the process safety lessons he shared remain relevant today. The aim of the compendium was to share his work with a new audience and to prompt people who may have read his books in the past to have another look. Trevor did not expect everyone to agree with everything he said, but he was willing to share his opinions based on experience. We have tried to follow this spirit in compiling this compendium.
It is available from https://www.elsevier.com/books/trevor-kletz-compendium/brazier/978-0-12-8194478
Contact me for a discount code!
Let’s get this over with at the start. The pandemic has certainly caused me to think:
- Managing risks is difficult enough. Mixing in politics and public opinion magnifies this enormously;
- We have to remember what it means to be human. Restricting what we can do may reduce the immediate risk but life is for living and there has to be a balance, especially when you take into account the knock-on effects of measures taken;
- Our plans for managing crises usually assume external support and bought in services. These are far more difficult to obtain when there is a global crisis and everyone wants the same.
I have not given a great deal of thought into how these points can be applied to my work but they do give a bit of perspective and have affected many aspects of work, including management of our ‘normal’ risk.
I have shared my views on shift handover several times over the years and it is still something we need to improve significantly. However, this year’s pandemic has highlighted how much we rely on person to person communication in many different ways. Infection control and social distancing disrupted this greatly. Just look at how many Skype/Teams/Zoom calls you have had this year compared to last!
My concern is that companies are failing to recognise the importance of communication or the fact that a lot of it takes place informally. You can hold very successful meetings over the internet but with people working separately you miss all those chance encounters and opportunities to have a chat when passing. These are the times when more exploratory discussions take place. They are safe times when you can ask daft questions and throw in wild ideas. Even if most of the time is spent talking about the weather or football, they help you get to know your colleagues better; fostering teamwork.
I posted an article on LinkedIn to highlight the issues with communication and COVID-19, which you can access at https://www.linkedin.com/pulse/dont-overlook-process-safety-andy-brazier. My concern was that companies had implemented the measures they needed to handle the personal health aspects of the pandemic but were not considering the knock-on effects on communication. This is a classic management of change issue. It is easy to focus on what you want or need to do. But you always need to be aware of the unintended consequences.
Posting my article led to discussions with software company Eschbach and we wrote a whitepaper together. You can download it from my website.
An illustration of how companies do not always think about communication has arisen when looking at shift patterns. It is quite right that risks of fatigue from working shifts have to be managed, but that is not the only concern. For 12 hour shifts a fairly standard pattern is to work 2 days, 2 nights and then have 4 days off. This is simple and does well on fatigue calculations. The first day shift rotates through the days. This means that at one part of the cycle the first day is on a Saturday and the following week it is on a Sunday. The problem is that at the weekend the day staff are not present, so if important information is missed at the handover there is no one available to fill in the gaps or answer questions.
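The way the first day shift drifts through the week can be checked with a few lines of Python. This is a minimal sketch of the standard 2 days, 2 nights, 4 off pattern only; the start date is arbitrary (it happens to be a Saturday) and the function name is my own.

```python
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Standard 12 hour pattern: 2 days, 2 nights, 4 off (an 8 day cycle).
STANDARD_CYCLE = ["Day", "Day", "Night", "Night", "Off", "Off", "Off", "Off"]


def first_day_shift_weekdays(first_shift: date, cycles: int) -> list:
    """Weekday on which the first day shift falls in each successive cycle."""
    cycle_length = len(STANDARD_CYCLE)
    return [
        WEEKDAYS[(first_shift + timedelta(days=cycle_length * n)).weekday()]
        for n in range(cycles)
    ]


starts = first_day_shift_weekdays(date(2020, 1, 4), cycles=7)
weekend_starts = [day for day in starts if day in ("Saturday", "Sunday")]

print(starts)
print(f"{len(weekend_starts)} of {len(starts)} cycles start at the weekend")
```

Because the cycle is 8 days long, the first day shift moves on by one weekday each cycle, so over any 7 consecutive cycles a crew starts once on a Saturday and once on a Sunday.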
Alternative shift patterns are available that ensure the first day shift always happens on a weekday when the day workers are also present. The patterns are a bit more complicated and may involve working one or two additional shifts before having a break, so don’t score so well in the fatigue calculations. I am not saying that everyone should change to a shift pattern like that, but I am pointing out that we need to give more recognition to communication and put more effort into supporting it.
The table below shows a comparison of Standard vs an Alternative shift pattern (days of the week along the top).
Control room design
Last year’s news was the publication of the 3rd edition of EEMUA 201 “Control Rooms: A Guide to their Specification, Design, Commission and Operation.” Unfortunately, it is not available free, unless you, or your employer, are a member of EEMUA. However, a free download is now available at https://www.eemua.org/Products/Publications/Checklists/EEMUA-control-rooms-checklist.aspx that includes a high level Human Factors Integration Plan template that can be used for new or modification control room projects (actually it is not a bad template for any type of project). It also includes a checklist for evaluating control rooms either at the design or operational stages. I have used the checklist a number of times this year and am pleased to confirm that it really is very effective and useful. You should really use it with the 201 guide, but even on its own the checklist shows you what to consider and it will probably be a useful way of persuading your employer to buy a copy of EEMUA 201.
On the subject of control rooms I published another article on LinkedIn this year titled “Go and tidy your (Control) room.” I used COVID-19 to reinforce the message I often give my clients about the state of their control rooms. My opinion is that we need to make sure our control room operators are always at the top of their game and having a pleasant and healthy place to work can help this. Unfortunately the message often seems to fall on deaf ears. Access the article at https://www.linkedin.com/pulse/go-tidy-your-control-room-andy-brazier/
Internet of Things (IoT)
This is a popular buzz word at the moment. The idea is that over the decades technology has given us more and more devices. Recently they have become smarter and so perform more functions autonomously. But there is even greater potential if they can be connected to each other, especially if there is an internet or cloud based system that can perform higher level functions.
Whilst I have a passing interest in the technology I am far more interested in how people fit into this future. The normal idea seems to be that people are just one of the ‘things’ that can be connected to the devices via the cloud. I have a number of problems with this. Firstly, ergonomics and human factors has shown us that technology often fails to achieve its potential because people cannot use it effectively or simply don’t want to. Perhaps more significantly, I feel that the current focus on the technology means that the potential to harness human strengths will be missed.
It is true that simple systems can be automated reasonably easily, and if they are used widely the investment in developing the technology can be justified. But automating more complicated systems is far more difficult. The driverless or autonomous car gives us a very good example. How many billions of pounds/dollars have already been spent on developing that technology? There will probably be a good return on this investment in the end because once it is working effectively it will result in many thousands or millions of car sales. Industrial and process systems are complicated and tend to be unique. It is unthinkable that such massive investment will be made into developing automation that can handle every mode of operation and handle every conceivable event. This is why we still need people, and will do for many years to come.
Although the focus is currently on the technology, I think the greatest advances are going to come from using IoT to support people rather than replace them. By understanding what people can do better than technology we will achieve much more reliable and efficient systems.
Loss Prevention Bulletin
Good news for all members of the Institution of Chemical Engineers (IChemE) is that they will have free access to Loss Prevention Bulletin from January. I believe this is a very significant step forward, making practical and accessible information about process safety readily available to so many more people. I have had a couple of papers published in the bulletin this year. They both use slightly quirky but tragic case studies to illustrate important safety messages. You can download them at
Control room design
The great news is that the new, 3rd edition of EEMUA 201 was published this year. It has been given the title “Control Rooms: A Guide to their Specification, Design, Commission and Operation.” I was the lead author of this rewrite, and it was fascinating for me to have the opportunity to delve deeper into issues around control room design; especially where theory does not match the feedback from control room operators.
I would love to be able to send you all a copy of the updated guide but unfortunately it is a paid for publication (free for some EEMUA members). However, I have just had a paper published in The Chemical Engineer describing the guide and this is available to download free at https://www.thechemicalengineer.com/features/changing-rooms/.
Now it has been published my advice about how to use the updated guide is as follows:
- If you are planning a new control room or upgrading or significantly changing an existing one you should be using the template Human Factors Integration Plan that is included as Appendix 1. This will ensure you consider the important human factors and follow current good practice;
- If you have any form of control room and operate a major hazard facility you should conduct a review using the checklist that is included as Appendix 2. This will allow you to identify any gaps you may have between your current design and latest good practice.
If you have any comments or questions about the updated guide please let me know.
Quantifying human reliability
It has been a bit of a surprise to me that human reliability quantification has cropped up a few times this year. I had thought that there was a general consensus that it was not a very useful thing to attempt.
One of the things that has prompted discussions has come from the HSE’s guidance for assessors, which includes a short section that starts “When quantitative human reliability assessment (QHRA) is used…”. This has been interpreted by some people to mean that quantification is an expectation. My understanding is that this is not the case, but in the recognition that it still happens HSE have included this guidance to make sure any attempts to quantify human reliability are based on very solid task analyses.
My experience is that a good quality task and human error (qualitative) analysis provides all the information required to determine whether the human factors risks are As Low As Reasonably Practicable (ALARP). This means there is no added value in trying to quantify human reliability and the effort it requires can be counter-productive, particularly as applicable data is sparse (non-existent). Maybe the problem is that task analysis is not considered to be particularly exciting or sexy? Also, I think that a failure to fully grasp the concept of ALARP could be behind the problem.
My view is that demonstrating risks are ALARP requires the following two questions to be answered:
- What more can be done to reduce risks further?
- Why have these things not been done?
Maybe the simplicity of this approach is putting people off and they relish the idea of using quantification to conduct some more ‘sophisticated’ cost benefit analyses. But I really do believe that sticking to simple approaches is far more effective.
Another thing that has prompted discussions about quantification is that some process safety studies (particularly LOPA) include look-up tables of generic human reliability data. People feel compelled to use these to complete their assessment.
I see the use in other process safety studies (e.g. LOPA) as a different issue to stand alone human reliability quantification. There does seem to be some value in using some conservative figures (typically a human error rate of 0.1) to allow the human contribution to scenarios to be considered. If the results achieved do not appear sensible a higher human reliability figure can be used to determine how sensitive the system is to human actions.
It is possible to conclude that the most sensible approach to managing risks is to place higher reliance on the human contribution. If this is the case it is then necessary to conduct a formal and detailed task analysis to justify this; and to fully optimise Performance Influencing Factors (PIF) to ensure that this will be achieved in practice.
It is certainly worth looking through your LOPA studies to see what figures have been used for human reliability and whether sensible decisions have been made. You may find you have quite a lot of human factors work to do!
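The sensitivity check described above for LOPA can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scenario frequency is taken as the initiating event frequency multiplied by the probability of failure on demand of each protection layer, the operator action is treated as one of those layers, and every number other than the 0.1 human error figure is hypothetical. The function names are made up for this example.

```python
import math


def mitigated_frequency(initiating_frequency, layer_pfds):
    """Scenario frequency per year: initiating frequency times the
    probability of failure on demand (PFD) of every protection layer."""
    return initiating_frequency * math.prod(layer_pfds)


def human_sensitivity(initiating_frequency, other_layer_pfds, human_error_probs):
    """Recalculate the scenario frequency for each candidate human error
    probability, keeping all other layers fixed."""
    return {
        p: mitigated_frequency(initiating_frequency, [*other_layer_pfds, p])
        for p in human_error_probs
    }


# Hypothetical scenario: initiating event once in ten years (0.1 per year)
# and one non-human protection layer with a PFD of 0.01.
# Human error probabilities tried: 1.0 (no credit), 0.1 (conservative
# default) and 0.01 (higher reliance on the operator).
results = human_sensitivity(0.1, [0.01], [1.0, 0.1, 0.01])

for human_error, frequency in results.items():
    print(f"human error {human_error}: {frequency:.1e} per year")
```

If the conclusion of the study changes as the human figure moves from 0.1 to 0.01, the scenario is sensitive to the human contribution, and that is the point at which a formal and detailed task analysis is needed to justify the reliance being placed on people.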
Maintaining bursting discs and pressure safety valves
I am pleased to say that my paper titled “Maintenance of bursting disks and pressure safety valves – it’s more complicated than you think” was published in the Loss Prevention Bulletin in 2019. It highlights that these devices are often our last line of defence but we have minimal opportunities to test them in situ and so have to trust they will operate when required. However, there are many errors that can occur during maintenance, transport, storage and installation that can affect their reliability. Access my page on relief valves and bursting discs here.
Unfortunately I have still not written my next paper in the series, which will be on testing of Safety Instrumented Systems (SIS). It is clear to me that often the testing that takes place is not actually proving reliability of the system. Perhaps I will manage it in 2020.
However, I did have another paper published in The Chemical Engineer. It is actually a reprint of a paper published in Loss Prevention Bulletin in 2013, so many of you may have seen it before. It is about process isolations being more complicated than you think. I know this is still a very relevant subject. Access my page on process isolations here.
I have been aware of the general concept of Inherent Safety for a long time, with Trevor Kletz’s statement “what you don’t have can’t leak” explaining the main idea so clearly. However, I have looked a bit more deeply into the concept in recent months and am now realising it is not as simple as I thought.
One thing that I now understand is that an inherently safe solution is not always the safest option when all risks are taken into account. The problem is that it often results in risk being transferred rather than eliminated; resulting in arrangements that are more difficult to understand and control.
I am still sure that inherent safety is very important but maybe it is not thought about carefully enough. The problem seems to be a lack of tools and techniques. I am aware that it is often part of formal evaluations of projects at the early Concept stage (e.g. Hazard Study 0) but I see little evidence of it at later stages of projects or during operations and maintenance.
I have a couple of things going on at the moment where I am hoping we will develop the ideas about inherent safety a bit. They are:
- I am part of a small team writing a book – a Trevor Kletz compendium. We are aiming to introduce a new audience to his work and remind others who may not have looked at it for a while that much of it is still very relevant. A second, equally important aim is to review some of Trevor’s ideas in a current context (including inherent safety) and to use recent incidents to illustrate why they are still so important. We hope to publish late 2020, so watch this space.
- I am currently working on a paper for Hazards 30 with a client on quite an ambitious topic. It will be titled “Putting ‘Reasonably Practicable’ into managing process safety risks in the real world.” Inherent safety is an integral part of the approach we are working on.
We passed the 30 year anniversary of the Piper Alpha disaster. It is probably the event that has most affected my career as it highlighted so many human factors and process safety issues. I know we have a better understanding of how accidents like Piper Alpha happen and how to control the risks but it is easy for things to get forgotten over time. I wrote two papers for the anniversary edition of Loss Prevention Bulletin. One looked at the role of ‘shared isolations’ (where an isolation is used for several pieces of work). The other was concerned with shift handover, which is one area that I worry industry has still not properly woken up to.
Control room design
One of my main activities this year has been to rewrite the EEMUA 201 guidance document on design of control rooms and human machine interfaces in the process industry. I have investigated a range of aspects of design and made a point of getting input from experienced control room operators, control room designers, ergonomists and regulators. This has highlighted how important the design is for the operator to maintain the situational awareness they need to perform their job safely and efficiently; and to detect problems early to avoid escalation. This is not just about providing the right data in the right format; but also making sure the operator is healthy and alert at all times so that they can handle the data effectively. A complication is that control rooms are used by many different people who have different attributes and preferences. I found that currently available guidance did not always answer the designers’ questions or address the operators’ requirements but I hope that the new version of EEMUA 201, which will be published in 2019, will make a valuable contribution.
Arguably a bigger issue than original design is the way control rooms are maintained and modified over their lifetime. There seems to be a view that adding “just another screen” or allowing the control room to become a storage area for any paperwork and equipment that people need a home for is acceptable. The control room operator’s role is highly critical and any physical modification or change to the tasks they perform or their scope of responsibility can have a significant impact. We, quite rightly, put a lot of emphasis on designing effective control rooms and so any change needs to be assessed and managed effectively, taking into account all modes of operation including non-routine and emergency situations.
Safety critical maintenance tasks
Whilst I have carried out safety critical task analysis for many operating tasks over the years, it is only more recently that I have had the opportunity to do the same for maintenance tasks. This has proven to be very interesting. A key difference when compared to operations is that most maintenance tasks are performed without reference to detailed procedures and there can be almost total reliance on competence of the technicians. In reality only a small proportion of maintenance tasks are safety critical, but analysis of these invariably highlights a number of potentially significant issues.
I have written a paper titled “Maintenance of bursting disks and pressure safety valves – it’s more complicated than you think.” It will be published in the Loss Prevention Bulletin in 2019. This highlights that the devices are often our last line of defence but we have no way of testing them in situ and so have to trust they will operate when required. However, there are many errors that can occur during maintenance, transport, storage and installation that can affect their reliability.
Another example of a safety critical maintenance task is testing of safety instrumented systems. This is likely to be my next paper because it is clear to me that often the testing that takes place is not actually proving reliability of the system. Another task I have looked at this year was fitting small bore tubing. It was assumed that analysing this apparently simple task would throw up very little but again a number of potential pitfalls were identified that were not immediately obvious.
Safety 2/Safety Different
I am increasingly bemused by this supposedly “new” approach to safety. The advocates tell us that focussing on success is far better than the “traditional” approach to safety, which they claim is focussed mainly on failure (i.e. accidents). The idea is that there are far more successes than failures so far more can be learnt. Spending more time on finding out how work is actually done instead of assuming or imagining we know what really happens is another key feature of these approaches.
I fully agree that there are many benefits of looking at how people do their job successfully and learning from that. But I do not agree that this is new. The problem seems to be that people promoting Safety 2/Different have adopted a particular definition of safety, which is one that I do not recognise. They suggest that safety has always been about looking at accidents and deciding how to prevent them happening again. There seems to be little or no acknowledgement of the many approaches taken in practice to manage risks. I certainly feel that I have spent most of my time in my 20+ year career understanding how people do their work, understanding the risks and making practical suggestions to reduce those risks, and have observed this in nearly every place I have ever worked. As an example, permit to work systems have been an integral part of the process industry for a number of decades. They encourage people at the sharp end to understand the tasks that are being performed, assess the risks and decide how the work can be carried out successfully and safely. This seems to fulfil everything that Safety 2/Different is claiming to achieve.
My current view is that Safety 2/Different is another useful tool in our safety/risk management toolbox. We should use it when it suits, but in many instances our “traditional” approaches are more effective. Overall I think the main contribution of Safety 2/Different is that it has given a label to something that we may have done more subconsciously in the past, and by doing that it can assist by prompting us to look at things a bit differently in order to see if there are any other solutions.
Bow tie diagrams
I won’t say much about these as I covered this in last year’s Christmas email with an accompanying paper. But I am still concerned that bow tie diagrams are being oversold as an analysis technique. They offer an excellent way of visualising the way risks are managed but they are only effective if they are kept simple and focussed.
I had a paper published in Loss Prevention Bulletin explaining how human bias can result in people having a misperception about how effective procedures can be at managing risk. This bias can affect people when investigating incidents and result in inappropriate conclusions and recommendations. The paper was provided as a free download by IChemE.
I have written a few papers this year and have decided to share two:
1. Looking at the early stages of an emergency, pointing out that this is usually in the hands of your process operators, often with limited support. http://abrisk.co.uk/papers/2017%20LPB254pg09%20-%20Emergency%20Procedures.pdf
2. My views on Bowtie diagrams, which seem to be of great interest at the moment. I hope this might create a bit of debate. http://abrisk.co.uk/papers/Bowties&human_factors.pdf
My last two Christmas emails included some of my ‘reflections’ of the year. When I came to write some for 2017 I found that the same topics are being repeated. But interestingly I have had the opportunity to work on a number of these during the year with some of my clients. As always, these are in no particular order.
Alarm management is still a significant issue for industry. But it is a difficult one to address. There really is no short cut to reducing nuisance alarms during normal operations and floods of alarms during plant upsets. Adopting ‘Alerts’ (as defined in EEMUA 191) as an alternative to an alarm appears to be an effective ‘enabler’ for driving improvements. It gives people a means of dealing with something they think will be ‘interesting’ to an operator, but that is not so ‘important.’
During the year I have provided some support to a modification project. I was told the whole objective was simplification. But a lot of alarms were being proposed, with a significant proportion being given a high priority. Interestingly, no one admitted to being the person who had proposed these alarms, they had just appeared during the project, and it turned out the project did not have an alarm philosophy. We held an alarm review workshop and managed to reduce the count significantly. Some were deleted and others changed to alerts instead. The vast majority of the remaining alarms were given Low Priority.
I have had the chance to work with a couple of clients this year to review the way they implement process isolations. This has reinforced my previous observations that current guidance (HSG 253) is often not followed in practice. But having been able to examine some examples in more detail, it has become apparent that in many cases it is simply not possible to follow the guidance, and in some cases it would introduce more risk. The problem is that until we did this work people had ‘assumed’ that their methods were fully compliant both with HSG 253 and with their in-house standards, which were usually based on the same guidance.
I presented a paper at this year’s Hazards 27 on the subject of interlocks, suggesting that keeping them to the minimum and as simple as possible is usually better, whereas the current trend seems to be for more interlocks with increasing complexity. My presentation seemed to be well received, with several people speaking to me since saying they share my concerns. But, without any formal guidance on the subject it is difficult to see how a change of philosophy can be adopted in practice.
Human Factors in Projects
I presented a paper at EHF2017 on the subject of considering human factors in projects as early as possible. To do this human factors people need to be able to communicate effectively with other project personnel, most of whom will be engineers. Also, we need to overcome the widely held view that nothing useful can be done until later in a project when more detailed information is available.
I have had the opportunity to assist with several human factors reviews of projects this year. Several were conducted at what is often called the ‘Concept’ or ‘Select’ phase, which is very early. These proved to be very successful. We found plenty to discuss and were able to make a number of useful recommendations and develop plans for implementation. It is still too early to have the proof, but I am convinced this will lead to much better consideration of human factors in the design for these projects.
Shift handover has been a concern of mine for a very long time (since Piper Alpha in 1988). But I am frustrated that the process industry has done so little to improve the quality of handovers. It just seems to fall into the ‘too difficult’ category of work to do. It is a complex, safety critical activity performed at least twice per day. We need to manage all aspects of the handover process well, otherwise communication failures are inevitable, and some of these are likely to contribute to accidents.
I have worked with a couple of clients this year to review and improve their shift handover procedures. It is good to know some are starting to tackle this subject, but I am sure many more have work to do.
I hope you find some of this interesting. To finish, I would like to point you to a free paper available from Loss Prevention Bulletin, presenting Lessons from Buncefield.
Process isolation was on last year’s list, but it continues to be a hobby horse of mine. During the year I have had the opportunity to review in-house isolation standards for two companies. This work has further reinforced my view that there are many instances where following the guidance from HSE (HSG 253) is not achievable, and may often not be the least risky option when all factors are considered. The paper attached is my attempt to illustrate the real-life issues that operators and technicians have to deal with.
I am concerned that the use of interlocks is increasing dramatically with no real thought as to the benefit and potential risks. The problem is that there is no clear guidance to say what functions should be interlocked or how many interlocks should be used. And vendors are able and willing to sell ever more sophisticated and complicated interlocking solutions.
I believe that overuse of interlocks encourages, or even forces, people to stop thinking about what they are doing, and they become focussed on identifying what they need to do to get the next key. I believe at some point this risk must outweigh the benefits of having interlocks in the first place.
I have tried to encourage clients on a number of occasions to reduce the number of interlocks in their design, but with little (or no) success. I think people feel that they cannot be criticised if they include the interlocks, and may be queried if they do not adopt the most ‘complete’ solution. I have submitted a paper on this subject to the Hazards 27 Conference, which takes place in May 2017. My paper is titled “Interlocking isolation valves – less is more.”
Human Factors in Projects
Another repeat from last year. Human Factors in Projects (often known as Human Factors Engineering – HFE) is starting to become normal, which is definitely positive. I have helped two companies with generating in-house procedures for implementing HFE. In both cases the aim was to make implementation as simple as possible, whilst ensuring suitable focus was given to the most important issues.
One of the key messages is that HFE should be on the agenda as soon as possible for any project. I have had the opportunity to assist one client with two projects this year that were at a very early stage. In both cases the consensus of all participants was very positive.
I have submitted a paper titled “Human Factors Engineering at the early phases of a project” to the Ergonomics and Human Factors 2017 conference, which takes place in April.
Also, you may find this presentation on HFE interesting.
I have taken part in two investigations this year. Both highlighted human factors issues that I know crop up widely.
In one, scope creep on a maintenance task, combined with an over reliance on informal communications led to misunderstandings about plant status. The operating team, who were considered to be very competent and able, made some assumptions based on past experience, which turned out to be incorrect. The operating team were fully engaged in the investigation, and admitted that they were very disappointed with themselves for the errors they made, and wanted to understand why this had happened.
In the other, the plant was operating on the edge of its capability and multiple items of equipment were unavailable. When a problem occurred the operators perceived that their options to respond were very limited, and they reacted in a way that they thought was correct, but in hindsight simply exacerbated the problem. One thing that this investigation highlighted was how effective operators can be at ‘working around’ problems to keep the plant running. The unfortunate outcome of this is that the problems no longer appear to be so significant and so do not get resolved. However, as this incident demonstrated, this leaves the plant very vulnerable to events as there are not the safety margins available to cope.
Management of Alarms
Alarms continue to cause problems. But I am pleased to see that most companies have started to recognise the need to modify their systems to reduce the frequency of nuisance alarms during normal operations and floods of alarms when things go wrong. And it is clear that improvements are being made.
I have assisted clients with setting up their alarm rationalisation programs and procedures; and I have been teaching a one day awareness course (based on EEMUA 191). From this I have made the following observations:
- Although it makes sense to focus on alarms, having a clear definition for an ‘alert’ can be a real enabler for people to see how they can improve their system. Whether it is a fear factor or some other concern, people find it difficult to say “that alarm can be removed.” However, they are happier to say “that alarm can be converted to an alert.” Of course we need to make sure that we don’t transfer a problem with alarms to a problem with alerts. But, we have a lot more flexibility with alerts, including how they are notified. For example, we can show them on separate summary pages, direct them to non-operational teams or automatically create daily alert reports. The result is that operators are not distracted by these lesser events if they are dealing with more important situations and ‘real’ alarms.
- EEMUA 191 introduces the concept of the “safety related” alarm (ISA 18.2 refers to them as “highly managed”). I find this term a bit confusing; and I think a lot of other people have struggled to identify which of their alarms fall into this priority. The reality is that many plants/sites will not have any alarms that satisfy the EEMUA definition of “safety related” and it is not just another priority. They are alarms that fill a gap where, in an ideal world, an automated response would be provided but this is deemed inappropriate. This means that the operator response to an alarm is considered to be a layer of protection. If credit is taken for this operator response the alarm is then considered “safety related” and it needs to be handled differently from all the other process alarms. If there is an automated protection device, the associated alarms will not be “safety related” and should be prioritised in the ‘normal’ way.
- We still have some disconnect between what the guidance says about alarms and what operators want. I think this is because, over the years, we have forced people to operate on alarms because they receive so many they don’t have the time to do anything else. When we suggest that alarm rates will be significantly reduced, operators cannot imagine how they will operate the plant if the alarms are not telling them about every little event that occurs.
- A solution to the concerns about reduced alarm frequency is to improve the quality of graphics on our control systems. This would make events more visible so that the operator does not feel they need an alarm. I think we quickly need to start looking at alarms and graphics together, which makes sense as together they make up the Human Machine Interface (HMI).
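As a rough illustration, the distinction drawn above for “safety related” alarms can be written as a simple decision rule. The class, field and function names below are hypothetical, the example tags are invented, and the rule is a simplification of the description in the bullet above rather than the EEMUA 191 definition itself.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    tag: str
    # True if credit is taken for the operator's response to this alarm
    # as a layer of protection.
    operator_response_credited: bool
    # True if an automated protection device covers the same event.
    automated_protection: bool


def classify(alarm: Alarm) -> str:
    """Return how the alarm should be handled under the simplified rule."""
    if alarm.operator_response_credited and not alarm.automated_protection:
        # The operator fills the gap that automation would otherwise fill.
        return "safety related"
    # Everything else is prioritised in the 'normal' way.
    return "normal prioritisation"


# Hypothetical examples
high_level_no_trip = Alarm("LAH-101", operator_response_credited=True,
                           automated_protection=False)
high_level_with_trip = Alarm("LAH-102", operator_response_credited=False,
                             automated_protection=True)

print(classify(high_level_no_trip))    # safety related
print(classify(high_level_with_trip))  # normal prioritisation
```

Running a rule like this across an alarm database is one way to check how many alarms, if any, genuinely fall into the “safety related” group before they are handled differently from the other process alarms.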
Process isolation has been a hobby horse of mine for a while. In fact a number of people have contacted me this year having read my paper on the subject titled “process isolation – it’s more complicated than you think.”
I have had the chance to carry out task analysis for some process isolation activities during the year. This has led to some heated debate at times. Everyone is aware of the guidance from HSE (HSG 253) but is finding it difficult to apply in practice. My observations include:
- A lot of designers (and other non-operators) simply do not understand how isolations occur in practice. This was illustrated to me on a project where double block and bleed arrangements were provided. Whilst the block valves had been identified as requiring frequent access the bleeds were not and had been positioned out of reach. Clearly the designers did not recognise that the valves and bleed points were used together to form an isolation.
- It is quite common (especially on older plants) to require multiple points of isolation to perform relatively simple jobs. If every isolation needs to be proven via a bleed, there will be multiple breaks of containment to remove the blanks from each bleed point. It is not uncommon to be creating significantly more breaks of containment to prove an isolation than are involved in the job to be performed. Each break involves risk at the time of the break and on return to service. Also, it creates very high workload. Unfortunately, the guidance currently available does not provide a method of weighing up the overall risks so that a sensible strategy can be selected.
- Overall, it appears that my paper from 2013 is still very valid. The last paragraph makes the point that “companies and individuals have accepted the guidance as relevant and correct but have not checked whether they can be applied in practice and/or whether the requirements are being followed. The concern is that this creates a large disconnect between theory and practice, which could result in risks being underestimated and hence improperly controlled. The solution is not simple, but being open about when the guidance cannot be followed will at least ensure alternative methods are developed that achieve similar levels of risk control.”
Human Factors in Projects
I believe it is a very positive development that human factors are now being given more consideration during the design of new process plant. I am convinced that this will result in better designs of process plant that will be easier to operate and maintain; with reduced risk of major accidents. Having been involved in quite a number of projects over recent years my observations include:
- You cannot start to consider human factors too early. There has been a perception by some people that it can only be done once a project reaches detailed design. I have never agreed with this and have been involved in two projects this year at the “Select” phase (pre-FEED). We have been very successful at identifying human factors issues that need to be addressed during the project, and by doing this early we have been able to make sure the solution is covered by the design rather than through softer controls (procedures, training and competence), which is often your only option if you do this later on.
- On the other hand, it is never too late. Whilst the preference must always be to start human factors input as early as possible, if this has not happened it is still worth doing something. Earlier this year I had to complete a human factors study on a project where the plant had already been built, although it had not been operated. By bringing together the designers, vendors and operators to discuss the potential human factors issues we identified a significant disconnect between what the plant had been designed to do and what the operator was expecting. Whilst it was too late to change anything, at least the operator knew what they had to do before start-up instead of having to learn everything on a live plant.
- We need to be careful that human factors in projects does not become an overly bureaucratic exercise. Unfortunately, on some of my projects I seem to spend more time ‘discussing’ specific details of what a standard may require instead of working towards the optimum solution for the project. I think this occurs partly because of the way some standards are written, and partly because of a general lack of knowledge amongst project personnel about what human factors is all about. Starting early and developing integration plans that are clear, concise and focussed on developing optimum solutions are the best ways, I think, of making sure human factors makes a valuable contribution.
Task Analysis and HAZOP
I wrote a paper a little while ago saying that we needed to create better linkages between the various safety studies carried out in the process industry. My view was that we are tending to do these things in isolation and missing something as a result. As an example, I felt that there must be useful links between task analysis performed as part of human factors and HAZOP performed as part of the process safety scope. I have had the opportunity to explore this idea a number of times this year. My observations include:
- HAZOP does often identify human errors within the causes of hazardous scenarios; and also procedures or training as risk controls. As a minimum, we have to make sure that we can demonstrate that the human factors associated with these causes and controls have been addressed. Task analysis is the obvious way of doing this.
- HAZOP usually differentiates between major accidents and lesser outcomes in a systematic and defensible way. Cross referencing these with our task analyses allows us to build a stronger case for the findings of our analyses and acceptance of the subsequent recommendations. I guess it helps us change perceptions of human factors so that it is not seen as so ‘abstract’ (wishy washy) and is more rooted in ‘proper’ engineering.
- Building these links between task analysis and HAZOP requires human factors people to start reading HAZOP reports. This is quite an undertaking. In fact, the size of many HAZOP reports makes it impossible for anyone to seriously sit and read them from front to back. Careful use of the ‘find’ function on Word or PDF; and a clear understanding of how the report is structured around nodes can help enormously. It is still not something to be taken lightly, but I do think this is a big part of making human factors more relevant and valuable.
- There is a big variation in the quality of HAZOP reports. One of the main problems is when similar issues are dealt with inconsistently throughout the report. The software available to assist with HAZOP seems to be starting to help in this regard. I don’t think there is much benefit in human factors people sitting through full HAZOPs, but they can work with the HAZOP leader at the start of the study to make sure there is a common understanding of what needs to be done to improve the links between HAZOP and human factors (particularly task analysis).
I have had shift handover on my agenda for many years, ever since I studied the Piper Alpha accident as part of my PhD. I have been generally disappointed that industry has not taken the issue more seriously, especially as it has been cited as making a contribution to several other major accidents. I have suspected that it has generally fallen into the ‘too difficult’ category, largely because it is totally reliant on the behaviours of the people involved. However, I have worked with one of my clients this year to improve their procedures for shift handover and in developing a short training course and presenting it to shift teams. My observations from this include:
- Communication at shift handover is far more difficult than most people think it is; and most people are not nearly as good at communicating as they think they are. The circumstances surrounding shift handover create particular challenges. In particular, the person finishing their shift will have just finished working 8 or 12 hours and is understandably keen to get home. However, the person receiving the handover, and who needs the information, does not know what they don’t know. Neither of these is conducive to effective communication.
- Individual personalities make a big difference. The shift teams I spoke to all complained about colleagues who gave poor handovers or did not appear interested when receiving a handover. It is easy to get into a ‘why bother’ frame of mind in these circumstances. But it was clear that people were often reluctant to challenge their colleagues because they did not want to create any tension given that they had to work together. I believe in a number of these cases the individuals involved had not realised that their behaviour was so critical simply because no one had ever told them.
- Preparation for the handover is absolutely crucial, and time has to be made available to do this well. Management has a very important role in making sure they communicate very clearly that this is a critical part of a shift worker’s job. They also need to make sure they do not ask for (or expect without asking) things to be done towards the end of shift that will limit the time available to prepare.
- A well-structured log sheet and end of shift handover report can make a great difference to the quality of shift handover. I am surprised at how many companies are using blank pieces of paper for these. I am also surprised at how many only use either a chronological log or an end of shift report during handover, as these two documents serve different purposes and are both needed. There is software available that can support shift handover, but it is of no value if the appropriate systems and behaviours are not in place.
I showed a couple of videos on the shift handover courses. The look on some of the operators’ faces and the “oh sh*t – that could have been us” comments highlighted to me that we all become complacent to risk. It is a natural human reaction and coping strategy. This is why we have to keep working to reduce risks, and I am sure that human factors working more closely with other elements of process safety provides us with the means of driving improvement.