How does your org distinguish between change requests, and activities which pertain to a daily workflow of maintenance, response, and/or improvement?
Here's a hypothetical: You're adding a switch for a new office, but to support the addition you must perform a firmware update to all the other switches in the building. The building hosts a datacenter with mission-critical resources that must not go down. Your qualified and trained staff have done this before, and the resilience built into the network has been successfully tested in the past.
The goal of my question is to distinguish routine from risk, even though we all know any one change inherently introduces risk. I want to know how you draw the line between "Yeah, go ahead and do it," and "Have you presented your risk assessment to the change advisory board?"
So my take is that changes of this nature should be run through the Change management process.
Rationale: Even with experienced staff, things can go bump in the night.
In my experience, the Change team should comprise all disciplines (Network, Security, Platform support, Database admins, Developers and, if you have any OT in your environment, reps from them). Each of these teams knows what is happening in their environment (and if they don't, they need to do some checking, hopefully prior to the change meeting).
If an all clear is given for the change to happen and something does occur, folks are not chasing their tails trying to determine the cause of an outage; they have a place to begin investigating.
Here is what can happen (actual incident): A vendor came in and notified the department that they were upgrading the firmware on a piece of equipment. No advice was provided to anyone other than one person in the department. The CD being used for the upgrade was infected. Once deployed, computers on the campus began blue-screening (I call it a campus as it covered buildings across 45 acres). Hours were spent running from location to location trying to determine what happened. This change also affected the OT environment. Two days later, it was tracked back to the vendor's patch.
Had the change been cleared through the proper process, it could have been determined that X was happening on this day and $$$$$ could have been saved.
Just my thoughts, and my take is that ALL changes need to be reviewed by the Change team.
Many, many years ago a large organisation I worked for was starting out with ITIL and decided that "Any change to a server or system on a server needs change management". The infrastructure team was upset - "Create a group in Active Directory? Yep Change", "Create a DHCP Reservation? Yep Change".
As you can imagine that mandate did not last long.
The requirement for change management in that organisation is now more like "a non-routine alteration of production environment assets that may be either asset configuration or asset addition/removal which may have an impact on the greater organisation". Routine tasks are defined and are repeatable with a consistent, known impact and risk (like the DHCP reservations and AD group additions).
In your hypothetical, while the installation procedure for firmware may be routine, the impact and risk are variable depending on what the firmware update includes. Because of this, it would need to go through change management where I am 🙂
It depends largely on whether this is the first, or nearly the first, time a modification of this type has been made. Obviously the first time it would certainly go to CAB, and probably the next couple of times too, until it becomes something more akin to a standard work process. Even then it would still need to be advised to CAB so that it could be put into the forward schedule of change, to prevent clashing changes being scheduled at the same time.
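As a rough illustration of the clash check a forward schedule of change makes possible, here is a minimal Python sketch. The record fields, the change IDs and the rule that a "clash" means overlapping windows on the same asset are all my assumptions for the example; a real schedule would probably also look at shared sites or dependencies.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ScheduledChange:
    # All field names are hypothetical, chosen for illustration only.
    change_id: str
    asset: str
    start: datetime
    end: datetime


def find_clashes(schedule, candidate):
    """Return scheduled changes whose window overlaps the candidate on the same asset."""
    return [
        existing
        for existing in schedule
        if existing.asset == candidate.asset
        # Two windows overlap when each one starts before the other ends.
        and existing.start < candidate.end
        and candidate.start < existing.end
    ]


forward_schedule = [
    ScheduledChange("CHG-001", "core-switch-01",
                    datetime(2024, 5, 4, 22, 0), datetime(2024, 5, 5, 2, 0)),
    ScheduledChange("CHG-002", "db-cluster-01",
                    datetime(2024, 5, 4, 23, 0), datetime(2024, 5, 5, 1, 0)),
]
firmware_update = ScheduledChange("CHG-003", "core-switch-01",
                                  datetime(2024, 5, 5, 1, 0), datetime(2024, 5, 5, 3, 0))

clashes = find_clashes(forward_schedule, firmware_update)
print([c.change_id for c in clashes])  # → ['CHG-001']
```

The point is only that a standard change still has to be visible in the schedule, otherwise nobody can run a check like this against it.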
Having said that, some manufacturers' switch firmware upgrades are so flaky that they sometimes brick what are ostensibly the same make and model of switch as others that have upgraded successfully (and the upgrades can't be rolled back). For those I'd take it to CAB. I'd also have enough spares on hand to allow for something going wrong, and a standing instruction to halt the upgrade once all spares had been used.
The key question is when you stop taking things to CAB. If you get it wrong, CAB wants to scrutinise every medium and low severity patch, then starts believing that not patching is the safer option. I've even heard one CAB say creating a user account required CAB approval. I just sneered at the absurdity of that.
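The "halt once all spares are used" instruction can be written down as a simple loop. This is only a sketch: `upgrade` is a hypothetical callable that returns True on success and False if the switch was bricked, and the switch names are made up.

```python
def run_upgrade(switches, spares_on_hand, upgrade):
    """Upgrade switches one at a time, halting when no spares remain."""
    upgraded, bricked = [], []
    spares_left = spares_on_hand
    for index, switch in enumerate(switches):
        if spares_left == 0:
            # Standing instruction: nothing left to cover another failure, so stop here.
            return {"upgraded": upgraded, "bricked": bricked,
                    "not_attempted": list(switches[index:])}
        if upgrade(switch):
            upgraded.append(switch)
        else:
            # The upgrade can't be rolled back, so a bricked switch consumes a spare.
            bricked.append(switch)
            spares_left -= 1
    return {"upgraded": upgraded, "bricked": bricked, "not_attempted": []}


# Simulated run: sw2 and sw3 brick, and only two spares are on hand.
flaky = {"sw2", "sw3"}
result = run_upgrade(["sw1", "sw2", "sw3", "sw4", "sw5"], 2,
                     lambda switch: switch not in flaky)
print(result["not_attempted"])  # → ['sw4', 'sw5']
```

Note that with zero spares on hand this sketch refuses to start at all, which is the conservative reading of the instruction.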
Hi All
Does anyone use ITIL V3 or ITIL V4, or am I talking a different language?
There are standard changes which are pre-approved changes that are low impact, well known, and documented.
Normal changes, which follow the entire change process, including scheduling, after a risk assessment and approval process.
Emergency changes.
Any bypassing of such processes would immediately invoke a major investigation, and a probable root cause analysis with consequences.
Regards
Caute_Cautim
Sort of. It's ITIL with a company spin, and an occasional knee-jerk reaction to scrutinise everything if there has been any significant recent impact from a failed change.
Of everything I learned from ITIL 4, my favorite part was classifying changes as "standard", "normal", "emergency". And since you brought it up, @Caute_cautim, my goal is to ask "How do you define 'standard'? What does 'normal' look like? And could we know what 'emergency' is, if one came up and bit us on the leg?"
I'm very glad you brought this up. I need to define "standard" and "normal", more than anything.
@ericgeater I normally let my fingers do the work, when people ask these questions a) because I am curious b) I would like to know myself c) so I can help others too.
So I found a BMC Software blog on the subject, which is attached; it may provide illumination and save a lot of thinking too.
Alternatively, I should ask the nearest AI bot to hand, but who do you trust these days - your ability to type and think, or a piece of software that can do rapid searches but may return inaccurate answers, which may be biased or influenced by the very people who planted the information in the first place?
Is this paranoia?
Regards
Caute_Cautim
@ericgeater wrote: "How do you define 'standard'? What does 'normal' look like? And could we know what 'emergency' is, if one came up and bit us on the leg?"
In my world,
An emergency change is one in which the technician feels the implementation is sufficiently urgent that it must precede the paperwork/approval and has concurrence from one other person. Typically emergency changes are the result of something breaking. The control against abuse is the knowledge that they must defend their decision when the paperwork subsequently follows the "normal" process.
A standard change is one where a "template" has gone through the change control process and we have agreed that each implementation only requires "notification", not "approval". For each standard change, we have a template that must be referenced and will automatically send the notification to interested parties. So, we may have a template named "assign a switch port to a vlan". And then an implementation would be something like "assign switch #17, port #5 to vlan 201".
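To make the distinction concrete, here is a minimal Python sketch of how a request might be routed under those definitions. Everything named here (the template registry, the field names, the email address) is hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass
from typing import Optional

# Templates that have already been through change control, mapped to the
# parties who get the automatic notification (hypothetical values).
APPROVED_TEMPLATES = {
    "assign-switch-port-to-vlan": ["network-team@example.com"],
}


@dataclass
class ChangeRequest:
    summary: str
    requested_by: str
    template: Optional[str] = None      # set when a standard-change template is referenced
    urgent: bool = False                # technician judges it cannot wait for paperwork
    concurred_by: Optional[str] = None  # the one other person who agrees it is urgent


def route(change):
    """Return (change_type, next_step) for a request."""
    if change.urgent:
        if change.concurred_by and change.concurred_by != change.requested_by:
            return ("emergency", "implement now, then submit paperwork through the normal process")
        return ("emergency", "blocked: needs concurrence from one other person")
    if change.template in APPROVED_TEMPLATES:
        recipients = ", ".join(APPROVED_TEMPLATES[change.template])
        return ("standard", f"notify {recipients}")
    return ("normal", "full change process: risk assessment, approval, scheduling")


vlan_change = ChangeRequest("assign switch #17, port #5 to vlan 201", "alice",
                            template="assign-switch-port-to-vlan")
print(route(vlan_change))  # → ('standard', 'notify network-team@example.com')
```

Anything without an approved template and without urgency falls through to the normal process, which is the safe default.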
This is a big subject, and I have been a change manager, infrastructure manager and 3rd line manager, working with service delivery, project support and release management.
OK so here is the thing:
CAB. You NEED to get a CAB sorted. You also need an asset register of all devices and tangible/intangible items such as databases and, to a point, even roles. This needs to be put into a CAB, documented and backed up, and you need to ensure that you can query this data and assign tasks or work to it. ServiceNow is a good example of this, and it can interface with other tools.
To get to this point you need something on the network which can discover all of your devices and put this in a database which is secure.
Now that you have all of the information you can assign changes to, you need to formulate a process which includes the business. Remember, change control is there to limit the impact to the business. So this must include people from all departments it's going to affect (even the CEO if needed), and it must cover the initial incident or problem, the work which is going to be carried out, the rollback plan, the comms plan, the release plan (release management needs to be involved) and any other team which is needed (for example, if it's a software update, then the software packaging person or team responsible for it).
Look at the OLA or the SLA to refer to uptime needs, support needs etc. Obviously these need to be in place first.
All of this needs to be documented and the history recorded: the work which is to be done, who did it, why they did it, when they did it and what the output was (for KPIs etc).
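As a small sketch of what such a history record might hold, here is one way to capture the what/who/why/when/output in Python and pull a simple KPI from it. The field names, the sample entries and the choice of "success rate" as the KPI are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ChangeRecord:
    # Hypothetical fields mirroring: what was done, who, why, when, and the output.
    work_done: str
    done_by: str
    reason: str
    done_at: datetime
    outcome: str  # e.g. "success" or "rolled back"


def success_rate(records):
    """Fraction of recorded changes that succeeded, or None if there is no history."""
    if not records:
        return None
    return sum(r.outcome == "success" for r in records) / len(records)


history = [
    ChangeRecord("firmware update on access switches", "alice",
                 "support new office switch", datetime(2024, 5, 4, 22, 0), "success"),
    ChangeRecord("VLAN re-assignment", "bob",
                 "office move", datetime(2024, 5, 6, 18, 30), "rolled back"),
]
print(success_rate(history))  # → 0.5
```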
Next - and this is the thing which ALWAYS gets forgotten, lapses or is not updated - is DOCUMENTATION. If a change is to be made, then time needs to be accounted for to document the systems which are changed, following the document change process. If the change is to a live system, then the live system should be part of BCDR, so the change needs to feed into the BCDR recovery plan, which means you are updating:
So change control impacts a lot - if you have a switch change, for example, then this will feed into all of these processes and determine if an update or follow-up is needed.