In this week’s Pipeliners Podcast episode, Russel Treat welcomes back IT and OT professional Nicholas Guinn to discuss the importance of change management in pipeline operations.
The conversation focuses on the tug-and-pull between operations and IT, especially in the areas of software implementation and updates, cybersecurity, and field communications.
Russel and Nicholas provide actionable information for pipeliners who need to think about their systems and processes to maintain the integrity of their operation. Listen to this episode to gain valuable insight.
Pipeline Management of Change: Show Notes, Links, and Insider Terms
- Nicholas Guinn is a Senior Consultant for Summit Offshore Systems, Inc. Connect with Nicholas on LinkedIn.
- Management of Change (MOC) focuses on planning the processes in a system, the type and quality of changes required, the risks associated with the plan for change, and the people that will be impacted by the change so that it is clearly communicated to stakeholders.
- Process Safety Management (PSM) is a method of managing risks associated with processes in an industrial system to ensure safety for internal workers and the external public.
- Failure Modes and Effects Analysis (FMEA) identifies the potential failures or malfunctions in a system and how to address those failures in the event of a real-life occurrence.
- Job Safety Analysis (JSA) is a written-out, step-by-step guide for performing tasks in a hazardous work environment that is used to reduce the risk of harm to workers or the environment.
- Kanban (or Agile) boards are a project management tool to visualize the workflow in a system so that teams understand their role in supporting the overall system.
- Merrick provides IT solutions for oil & gas companies. Their service offering includes field operations management, field data capture, and drilling support.
- WellView is a management system to support oil & gas companies involved in well planning, drilling, completion, and testing.
- Lockout/tagout is a procedure implemented in industrial systems to prevent machines and equipment from operating outside of their standard operations to protect the safety of workers. This is especially utilized in hazardous environments.
- E&P (exploration & production) companies are involved in the upstream oil & gas sector, exploring new areas to discover raw material and bring the material out of the ground.
- The Bellingham Pipeline Incident (Olympic Pipeline explosion) occurred in 1999 when a pipeline ruptured near a creek in Bellingham, Wash., causing deaths and injuries. According to the NTSB report, the cause of the rupture and subsequent fire was a lack of employee training, a faulty SCADA system, and damaged pipeline equipment. [Read the NTSB Pipeline Accident Report]
Pipeline Management of Change: Full Episode Transcript
Russel Treat: Welcome to the Pipeliners Podcast, episode 43.
Announcer: The Pipeliners Podcast, where professionals, Bubba geeks, and industry insiders share their knowledge and experience about technology, projects, and pipeline operations. And now, your host, Russel Treat.
Russel: Thanks for listening to the Pipeliners Podcast. We appreciate you taking the time, and to show that appreciation, we’re giving away a customized YETI tumbler to one listener each episode. This week, our winner is Matt Heft, with LOCAP. Congratulations, Matt, your YETI is on its way. To learn how you can win this signature prize pack, stick around to the end of the episode.
This week, we have Nick Guinn back with us. You may recall that Nick was with us in Episode 19 to talk about the logistics of compressor optimization. Nick returns to Episode 43 to talk about management of change and its importance. Nick, welcome back to the Pipeliners Podcast.
Nick Guinn: Thank you very much. Glad to be back.
Russel: We were sitting here talking, before we got on the microphone here, about management of change, what it is and all of that. I think the first question I’d like to ask you is, what’s your experience and track record around this whole idea of management of change? I know it’s a topic that’s near and dear to your heart.
Nick: Absolutely. In fact, I’ve worked with several companies helping to write a management of change specifically because I do, like we talked about last time, ride the line between IT and OT. Software is very much an important part of that to me. I think the management of change when it comes to software is usually an afterthought.
I feel like most of the time it’s just because software, whether it’s operational software like a SCADA system or IT software like Merrick or WellView or one of these others, that sometimes MOC is overlooked just because it’s not attached to something tangible like welding on a pipeline.
Russel: I agree with that. Let’s talk a little bit about definitions. How would you define MOC versus change management versus work planning? I think sometimes in this industry we get those things confused. I’m wondering how you would define that.
Nick: I think MOC puts a little more rigor around the planning process. It defines everything from what is it you’re doing, what’s the risk associated. If you think of it in terms of what the toolbox talks or the job safety analysis that the guys do in the field, it’s a similar type of process.
They even have a set of standards in a lot of cases to make things a little bit smoother if they’re doing a standard operation. I feel like MOC could be done much in the same way, but it does identify things, like I said, the type of change, the quality of the change, the impact or risks of that kind of thing, people involved.
It goes so far as to make sure that it’s well communicated to the people who need to know, thinking in comparison to something like a lockout/tagout solution. It makes sure that there’s an authorization by someone, depending on whether it’s a large change or a smaller change, so that the appropriate level of person or people have given it the approval that it needs to move forward.
Going back to lockout/tagout, you’re not going to want to open a valve on something that’s under pressure without making sure that everybody knows what’s going on, not even then I guess, but I think MOC gives you the opportunity to really make sure that you have all your bases covered before you start work.
Russel: We tend to be fairly comfortable in our industry around job safety or task safety, particularly if we’re doing things like working around testing a pressure relief, doing some kind of lift or something like that.
We tend to be more deliberate about safety planning around those kind of tasks than we do around automation tasks or IT tasks. They can have just the same kind of impacts from a safety or operational consideration standpoint. You were talking a little bit about, you think people get confused between change management and MOC, so maybe you could tell us what you think that distinction is.
Nick: Change management, again, goes back to more like the work planning. In my work as a project manager, change management is, a lot of times, more about the transition after the fact, that is, how is everybody made aware of the change, but it doesn’t really involve managing the actual change itself.
Maybe change management, CM, might be more around the training on a system or a tool after it’s already completed.
Russel: More about the people aspects and the ongoing support aspects versus working through the transition.
Nick: Making sure it’s communicated.
Russel: That makes sense to me actually.
Nick: Then MOC, of course, at the end of that, you even want to have an audit process that has somebody going back to make sure that everybody’s staying in compliance, that we’re being safe. I’ve seen plenty of examples, working in some shipyards in China and other places, where changes were not well communicated and ended up resulting in damage to the equipment that was just being built.
It was a prototype system. If they had gone through a proper change management process, more people would have known what was going on. The damage would have been avoided.
Russel: This subject’s actually very near and dear to my heart. I’m sure you’re familiar with Bellingham. Bellingham was a gasoline pipeline in Bellingham, Washington, that ruptured and put a lot of gasoline on the ground and then caught fire. It was a really horrendous pipeline incident.
There was an NTSB recommendation that came out, specifically, which I would call an MOC recommendation. One of the things that was a contributing factor in that incident is the computer system locked. They had made some changes to the computer system. The computer system locked up.
They lost visibility for about an hour until they got the computer system recovered. That’s just not okay if you’re running a petroleum products pipeline. Planning for making changes and in particular testing them before you put them into production and then monitoring them after they go into production and having a plan for yanking them out if something goes sideways is critical.
Nick: In fact, I think working with some of the companies that I’ve worked with on MOC, they were really good about getting that started, about the understanding of the need, especially as they start addressing some of the risks and doing a risk review on some of the things that they were wanting to change.
Even that ended up leading them to the need for better rollback plans. There’s always room for improvement. [laughs]
Russel: Of course.
Nick: Always a better tool to capture those things even. It definitely has to be an awareness first, right?
Russel: Yeah. My experience is there’s always some resistance to the paperwork aspect of all of this. Why do you think that is?
Nick: I think that that ties to the operations versus the IT mentality. In cybersecurity, you have your confidentiality, integrity, and your availability pretty much in that order as IT’s focus on how to manage the system. Whereas you have the exact opposite view from the operations side, where it’s availability first, integrity second, and confidentiality is last.
It’s not to say that it’s not still important, they’re both important. It’s just a different perspective. I think when you start talking about MOC, at least in my experience, a lot of the operations guys see it as a step in bureaucracy or red tape that they could avoid to get their work done.
Russel: I think you’re right. I think you’re exactly right. I think it’s harder in the software aspect of things, particularly on the operational technology side, because it’s more abstract.
If you’re used to working with things that are mechanical, that you can look at and you can see how they perform, like running an engine or something like that, that’s very different than having to have a mental picture in my head about what happens when I push this button or change this little piece of code. That’s a different kind of thinking. It’s a different way. It’s much more abstract.
Nick: In fact, I had a colleague say recently that — it’s nothing against the guy who’s doing the work — if you have an electrician in the field who is used to plugging in power and the power makes things happen, there’s a likelihood that that same mentality is applied to IT work.
All I have to do is plug it in, it should be fine. Unfortunately, as we know from some of your previous podcasts, especially the one with the gentleman from Kepware, it’s not quite as easy as just plugging it in.
Russel: That’s right. That’s the other thing. When I’m teaching SCADA fundamentals, I talk a lot about topology, which is how we physically connect things, just getting it wired. It’s one thing to be able to ping, it’s another thing to have the data actually move.
It’s a different level of complexity in creating the communications. I think the abstract nature of software, that “I plug it in and magic happens” kind of thing, is one of the things that’s an obstacle. It also presents risks, right?
Your comment about IT versus OT, the way I always characterize that is, if I’m a company and I’m going to update my email server, when do I do an email server update? I do that at seven o’clock on Friday evening after everybody has gone home because nobody wants to be using their email at eight o’clock on a Friday night.
If I’m making a SCADA change, when do I do that? I do that first thing on Monday morning because if anything goes wrong, I don’t want to have to call people in on the weekend to fix it.
Russel: Again, it’s your perspective. The operating perspective is different than the IT perspective. IT’s perspective is, “We need to have it available during the work day.” The operations perspective is, “We need to have it available all the time, and we’ve got to manage that risk if it’s not available.”
I also think there’s another aspect to this, which is that we tend to do fairly good planning and execution of change for things that we think are big, and then we tend to minimize the planning and implementation of change for things we think are small.
In the software world, that might be, “Well, you know, I’m just patching the software. I’m just implementing a patch. That’s not a big deal. I don’t need to test that. I’ll just implement it.”
Nick: I can think of one contract I had a few years ago where that was the mentality, “I’m just making a small change to the software. It’s not that big a deal. It’s a couple of HMIs and a little bit of the ladder logic in the background. That’s it. That’s all I’m changing.”
They had a regression, so they reintroduced a whole series of problems that they had previously fixed because they weren’t doing version control. That goes back to, part of this MOC is capturing the details, making sure that, “Hey, we fixed X, Y, and Z two weeks ago. Let’s go back and when we make a change to the same module, let’s go back and make sure that X, Y, Z is still fixed.”
That kind of regression can drag a project down forever.
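The check Nick describes, re-verifying earlier fixes whenever the same module changes, can be sketched in a few lines of Python. This is only an illustration under assumed names: `FixRegister`, the module name `hmi_station_1`, and the alarm setpoint are hypothetical and do not come from the episode or from any SCADA product.

```python
class FixRegister:
    """Records past fixes per module so they can be re-verified after a change."""

    def __init__(self):
        # module name -> list of (description, check) pairs
        self._fixes = {}

    def record_fix(self, module, description, check):
        """Store a fix plus a callable that returns True while the fix still holds."""
        self._fixes.setdefault(module, []).append((description, check))

    def regressions_after_change(self, module):
        """Re-run every recorded check for the changed module; return the failures."""
        return [desc for desc, check in self._fixes.get(module, []) if not check()]


# Hypothetical HMI setting that was corrected two weeks ago.
hmi_settings = {"high_pressure_alarm_psi": 1400}

register = FixRegister()
register.record_fix(
    "hmi_station_1",
    "X: high-pressure alarm set to 1400 psi",
    lambda: hmi_settings["high_pressure_alarm_psi"] == 1400,
)

# A later "small change" to the same module silently reverts the setting.
hmi_settings["high_pressure_alarm_psi"] = 1500
regressions = register.regressions_after_change("hmi_station_1")
print(regressions)
```

The point of the sketch is the record itself: without a written list of what was fixed, there is nothing to re-check when the module is touched again.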
Russel: That gets into, if you’re doing software development, if you’re using agile techniques, you’re using modern tools, you’re building databases and applications and all of that, there are lots of really good tools for managing your code versions, checking things out, making changes, and checking them back in. There are also really good tools for building automated testing and automated regression.
When you start getting into the SCADA world and the automation world, particularly when you start talking about PLCs and things like that, a lot of those kinds of tools don’t exist.
Nick: I think lately it’s gotten better. There’s a lot more event managers attached to SCADA systems that are capturing changes, at least at the configuration level if not at the code level. You’re seeing a lot more of that, a lot more awareness.
I was working on MOCs with companies when we were doing acceptance testing on control systems, working with companies like Saipem, Repsol, Shell, DNV, and all these other organizations. They were all eager to bring this kind of thing in. When you start looking at the contractors, of course, they’re geared to the operations mentality a little bit more, so they were not asking to bring that kind of thing in.
I think they realized the value of course, but if it slowed things down, maybe, maybe not. It did take a while to get everybody to come around and start using those tools, a little bit of education. Sometimes it was a glowing review when it was reviewed two, three months later, and sometimes it could have used a little bit more training, a little more time to get familiar with all the details.
Russel: I think the other thing that’s sometimes difficult to understand about implementing software changes is, depending on how you implement a software change, you may or may not have the ability to roll it back. That’s really critical when you start talking about critical process control.
If I put a change in, I need to be able to get that change pulled out quickly and back to the state I was before I tried to implement the change.
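The rollback idea Russel raises here can be illustrated with a minimal Python sketch: take a snapshot before the change, apply the change, verify it, and restore the snapshot if verification fails. Everything in it is an assumption made for illustration, including the function name `apply_with_rollback` and the dictionary standing in for a configuration; a real control system would roll back through its own backup and restore tooling.

```python
import copy


def apply_with_rollback(config, change, verify):
    """Apply `change` to `config` in place; restore the prior state if it fails.

    Returns True if the change was kept, False if it was rolled back.
    """
    snapshot = copy.deepcopy(config)  # the state we must be able to return to
    try:
        change(config)
        if verify(config):
            return True
    except Exception:
        # In this sketch, any error raised while changing or verifying
        # is treated the same as a failed verification.
        pass
    config.clear()
    config.update(snapshot)
    return False


# Hypothetical configuration and a bad change (a zero polling interval).
scada_config = {"poll_interval_s": 5, "tags": ["PT-101", "FT-202"]}
kept = apply_with_rollback(
    scada_config,
    change=lambda c: c.update(poll_interval_s=0),
    verify=lambda c: c["poll_interval_s"] > 0,
)
print(kept, scada_config)
```

The design choice worth noting is that the snapshot is taken before anything is touched, so the path back does not depend on the change having succeeded.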
Nick: How easy is it to do? Like in the IT world, you have your DEV and your TEST domains where you can actually put software in, that you’re making changes to, and test it. You can capacity test it. You can smoke test it. You can do all these other different kinds of tests that give you a better indication of whether or not those changes you’re making are going to be functional.
Sometimes in SCADA world, you don’t have that opportunity because it’s difficult to replicate an operational environment in a test environment, or costly. [laughs]
Russel: Costly is part of it. The best practice I think is to make a distinction between functional testing and performance testing. Functional testing is, does the automation do what it’s supposed to do? Performance testing is, will it run at scale? Those things need to be addressed separately.
I would assert that the best practice from a SCADA or pipeline standpoint is, you have a test platform that allows you to test all your key threads of execution, make sure they all work.
You have a process that’s very rigorous and controlled, where you roll it into production with a plan on how you’re going to get back to where you were before you rolled it into production if there’s a problem, and very clear monitoring.
Generally, the way I like to look at that, I think you have to stack the monitoring. I think the first level of monitoring is, you want to look at, “I put it in. Is everything continuing to work?” That period of time is like 24 to 48 hours. I’m watching it really close.
Over a longer period of time, you want to take and do some performance monitoring, disk I/O, memory utilization, those kinds of things, and compare, “How does this look and how is it trending, versus how did it look and how was it trending before I made the change?” If you have some kind of thing that’s going to cause you to crash unexpectedly, typically it’s going to show up in those metrics.
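The before-and-after comparison Russel describes can be sketched as a simple baseline check. The metric names, the sample values, and the 20 percent tolerance below are all assumptions chosen for illustration; the episode does not specify thresholds or tooling.

```python
from statistics import mean


def flag_metric_drift(before, after, tolerance=0.20):
    """Return metrics whose post-change average rose more than `tolerance`
    (as a fraction of the pre-change average), mapped to that relative rise."""
    flagged = {}
    for name, baseline_samples in before.items():
        baseline = mean(baseline_samples)
        current = mean(after[name])
        if baseline > 0:
            rise = (current - baseline) / baseline
            if rise > tolerance:
                flagged[name] = round(rise, 2)
    return flagged


# Hypothetical samples collected before and after a change.
before = {"disk_io_mb_s": [10, 12, 11], "memory_pct": [40, 42, 41]}
after = {"disk_io_mb_s": [11, 12, 10], "memory_pct": [55, 60, 65]}

flagged = flag_metric_drift(before, after)
print(flagged)
```

A comparison of averages like this only catches sustained shifts; a trend that creeps upward slowly would need a longer window or a slope-based check.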
Nick: Absolutely. Identify those metrics first. Honestly, in a lot of changes, you’re going to see something is not quite right pretty quickly, in most cases. Obviously, that’s not universal, but I feel like at least in the projects I’ve worked on, changes are pretty evident within a couple of real simple tests. It’s not like you’ve got to test every possible feature. Test the things that are impacted by the change.
Russel: That’s exactly right. That’s where the management of change thing comes in: I’ve got to define, what am I changing? What constraints can I put around that change? How do I test it? Then, how do I roll it out? The other thing we hadn’t talked about is communications. If I roll something out into a SCADA system or an automation system, who’s the best person to tell me if something is wrong?
Nick: [laughs] Probably your control room.
Russel: Exactly. The guy that’s in front of it all day long, every day.
Nick: Exactly. They’re the first ones who are going to spot something.
Russel: They’ll know intuitively before anybody else has a clue. Part of this is, you’ve got to develop the right relationships with people and communicate effectively, “Hey, we’re making this change. Here’s the behavior now. Here’s what you should see. Here’s when the change is being made,” and follow up, “Did you see anything? Did you notice anything?”
Nick: Absolutely. I’ve seen in several of the larger projects that we were talking about MOC and that kind of thing, we went through FMEA, a failure modes and effects analysis. A whole two week period, we did FMEA on the SCADA system to identify any of the failures that could possibly happen.
What were the mitigations going to be to avoid those before, more on the design side, than having to do it after the fact? There were several times where just process was identified, i.e. communication was identified as the way to mitigate the problems we were seeing because any other way, it was going to be, no, it wouldn’t address the risk well enough, it wasn’t cost effective, or whatever.
It just made more sense to have people say, “Okay. I have to be aware of this. It has to be written down on a checklist somewhere.”
Russel: When I do the SCADA fundamentals training, we have a whole segment that we do on troubleshooting, and a whole segment we do on testing, and then a whole segment we do on critical system failure, critical system failure meaning I completely lose my SCADA system. I get blind.
For us, that’s a huge deal. That should never happen, and when it does, you want to know exactly why it did, and capture your lessons learned, and make sure that gets rolled into your forward-looking policy and all of that. The other thing too that goes with this, we’re talking about some big things. I think the bigger challenge is in the smaller changes.
Nick: I would agree.
Russel: To me, this is what my experience has been. When you say MOC, people make up this big, complex, burdensome process to make a change.
Nick: I saw that recently even with the brand new document being developed, where the process had to have even the minorest of things go all the way up the authorization chain, which meant the executive level. That just didn’t make any sense.
Russel: If I’m doing something as simple as a transmitter change, that doesn’t require a full MOC, but it does require communication with the control room.
If I’m making a physical change to the configuration of the pipe, I may be going through a full PSM-style process that also requires a lot of communication. Part of this is, if you’re going to design an MOC process, it needs to be appropriate to the task being performed.
Russel: I think if you can use your MOC process as a way to streamline communications, then you can get people to use it because now it actually is providing a benefit for me.
Nick: I think that was part of the problem with some of the MOCs where it was just based on a spreadsheet. Spreadsheet is not going to communicate to anybody.
Russel: That’s right. Certainly, with where technology is going, I think there are some opportunities for looking at new and improved ways of doing that type of thing.
Nick: I do totally agree.
Russel: I’ve been kind of driving the conversation, Nick. I’m curious if you want to do a flip and maybe you can ask me some questions about what my perspective is on MOCs.
Nick: Have you seen any other hurdles aside from what we have already talked about in MOC, and trying to get that kind of a process implemented with different companies, outside of maybe technological and political resistance?
Russel: Yes. I think it would boil down to this, it’s all a challenge of finding the right model to use. What I mean by that is, I’ve seen some midstream companies and even E&P companies that are facility centric adopting PSM. PSM is great around the facilities.
I don’t know how well it works for ongoing O&M around my mechanical equipment and my automation. I think the other challenge is, people are looking for a model that’s going to address their particular needs. Does that make sense to you?
Nick: What would be an alternate model from PSM then?
Russel: Oh man! That’s a great question. I don’t know that I know the answer to that. I think that’s an interesting question. If I’m doing facility design or I’m planning facility modification, PSM makes a lot of sense. What I tend to think about is, I want to do a modification in my SCADA system.
I’m not going to change any of the field I/O, but I’m going to add some kind of screen, report, or I’m going to make a change to a graphical object in a template to propagate that through the system. Those kinds of changes, PSM doesn’t really address that kind of change.
It’s almost like it’s an unaddressed aspect of the market at present. In fact, it’s something we’ve talked about within my company about, is there a need for that? How would you implement it in a way that makes people’s job easier?
You don’t want to just say, “Oh, here’s a cool way to create more paperwork for yourself.” Sometimes automation from an IT perspective is, “We’re not going paperless. We’re just generating more paper faster.”
Nick: [laughs] Right. If you were to implement something for management of change on small changes, would you maybe identify a level of risk or impact? What would be the metric to create that dividing line between something small and large?
Russel: I don’t know. Great question. You should be a podcast host.
Nick: [laughs] I’m just happy to be on yours.
Russel: [laughs] Again, I think you’re asking some really good questions. You’re actually causing my brain to grind a little bit about the answer. You’re familiar with Agile and Kanban boards, right?
Russel: The idea of Kanban is, you take something that’s pretty difficult and you make it a lot simpler. You have a list of, “Here’s all the things that I need to do. Here’s what I’m currently doing. Here’s what I’m reviewing. Here’s what’s done.” Then you can organize those things into various swim lanes and so forth.
I almost think that an approach like that would be better, particularly if you can create some checklist or standard workflows around how I work a task given its risk profile.
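One way to picture a Kanban-style board with a checklist tied to each task’s risk profile is the small Python sketch below. The column names follow Russel’s description (to do, doing, reviewing, done); the three risk tiers and their checklist items are hypothetical and would need to be defined by each operator’s own MOC procedure.

```python
from dataclasses import dataclass, field

COLUMNS = ("to_do", "doing", "review", "done")

# Hypothetical checklists per risk tier; a real MOC would define its own.
CHECKLISTS = {
    "low": ["notify control room"],
    "medium": ["notify control room", "test on test platform", "rollback plan"],
    "high": ["notify control room", "test on test platform", "rollback plan",
             "risk review", "management approval"],
}


@dataclass
class ChangeCard:
    title: str
    risk: str
    column: str = "to_do"
    completed: set = field(default_factory=set)

    def outstanding(self):
        """Checklist items for this card's risk tier that are not yet completed."""
        return [step for step in CHECKLISTS[self.risk] if step not in self.completed]

    def advance(self):
        """Move one column to the right; entering 'done' requires a finished checklist."""
        next_index = min(COLUMNS.index(self.column) + 1, len(COLUMNS) - 1)
        next_column = COLUMNS[next_index]
        if next_column == "done" and self.outstanding():
            raise ValueError(f"checklist incomplete: {self.outstanding()}")
        self.column = next_column


# A simple change: low risk, but the control room still has to be told.
card = ChangeCard("Replace transmitter PT-101", risk="low")
card.advance()  # to_do -> doing
card.advance()  # doing -> review
card.completed.add("notify control room")
card.advance()  # review -> done, allowed because the checklist is complete
print(card.column)
```

The same board holds simple and complex tasks; only the length of the checklist changes with the risk tier, which keeps small changes from climbing the full approval chain.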
Nick: I think some of the more successful MOCs that I’ve seen, especially the ones that address that from the very small changes all the way up, that’s what’s always been one of the better points in those processes is having something just like that.
Russel: Yes. There’s a lot of tools out there that do this kind of stuff. There’s a ton of them. I don’t know that there’s one that’s really ideal for this, because most of those tools are not doing anything where you consider, “What’s the risk of executing this task?” That’s the part you have to add.
Nick: I think I’ve seen a lot of the Kanban board options out there, and almost any of them would work for that function, probably some better than others, that would just take some more research to figure out which one.
Russel: That’s right, but none of those have risk as a consideration in the way they’re doing it, so I think that why do we do MOC, right? What is the real reason for it? Why is it so important? For us that are doing critical process control, it’s our primary mechanism for managing the risk of operating, maintaining, and enhancing our systems.
You almost need some kind of mechanism to evaluate, am I increasing or minimizing my operating risk? The other part of this conversation around MOCs is this idea of something Kanban like that allows you to have, in a single system, simple tasks and complex tasks appropriately and some range of that.
There’s also another thing which is around, how is what I’m doing impacting the risk of how I’m operating? The trick is, how do you combine those two things together?
Nick: Even identifying other parties that might be impacted by this. I feel if there’s a lot of operation people listening, it goes back to the lockout/tagout. You can’t have somebody welding one side of a pipeline while the other guy on the other side of the pipeline is trying to do something entirely different. It’s not an ideal situation.
It’s the same thing here. If you’re changing software, and if somebody is in the field who’s in front of a valve, who has local access, local control, and someone who has remote control on a system that’s being changed, the last thing you want them to do is both try to operate the valve at once.
Distributed control should take care of that, but on the off chance that it doesn’t, the guy on the field is probably at risk for injury.
Russel: That’s exactly right. However, in our world, the reality is the field guys, their communication tends to be tighter than the control center communication. There’s probably a lot of people in the business that might raise their hackles a little bit at what I just said.
I’m just speaking out of my experience and perspective, but communication in the control room and communication in the field work quite differently. Nick, we’re coming to the end here. I know we’ve been kind of jumping all around. Maybe the thing we can do is, you can make a final comment about what you think is important about MOC, I’ll do the same, and we’ll wrap it up that way.
What do you think is important about MOC?
Nick: I think the most important thing about MOC is just the understanding of the purpose, that it’s to mitigate the risk and to avoid crisis before it happens. I feel like a lot of the driving force behind change in oil and gas, regardless of whether it’s midstream or upstream, is crisis.
This is a good way to avoid that, is to try to get in front of it ahead of time, to identify the risks and create a good mitigation plan, have it well communicated, and so on.
If you can do that, you can avoid a lot of the heartache, like you were talking about the pipeline that ruptured, and a lot of the problems that I’ve seen with people poorly communicating on the changes to SCADA systems, leading to equipment damage, leading to injury.
It took maybe an extra JSA or toolbox talk or an extra couple of minutes of just making sure that it was tested thoroughly before being put fully into production, or put into production and tested just to make sure that there are no big gotchas right away and that kind of thing. Identifying a good risk practice and getting your changes in there.
Russel: I think the most important thing in here is just to remember what the purpose is. That purpose is to mitigate risk in advance. I think the other thing is if you’re going to do something…This is the corollary, if you will.
That is if you’re going to do something, what you need to do is make the process as simple and straightforward and supportive of actually doing the work as possible, so that people will actually use the tool, the communication and thinking will happen. I think that’s a great place to leave it.
Russel: Thanks for joining us. Let’s have you back. I hope you enjoyed this week’s episode of the Pipeliners Podcast and our conversation with Nick Guinn. Just a reminder before you go. You should register to win our customized Pipeliners Podcast YETI tumbler. Simply visit pipelinerspodcast.com/win to enter yourself in the drawing.
If you’d like to support the show, then we would ask that you would simply submit a review on your podcast app. That’s the best way to let us know you appreciate what we’re doing.
Russel: If you have ideas, questions, or topics you’d be interested in, please let us know on the contact us page at pipelinerspodcast.com or reach out to me on LinkedIn. Thanks again for listening. I’ll talk to you next week.
Transcription by CastingWords