vishybear
Newcomer I

Thoughts on Patching for Zero Day

I'm just interested in feedback from people here. My background is sysadmin, and over the last couple of decades I've had MANY people come screaming at me when a zero day comes out, expecting me to cancel all my plans and suddenly patch the entire infrastructure. 

 

I've had a few arguments with cybersecurity people who have never worked sysadmin or similar roles, who insist that the patches need to be installed STRAIGHT AWAY and scare the C-Suite about it. 

 

However, I've seen it enough times, with Microsoft especially but also VMware back in the day, where an entire infrastructure was taken down by a bad patch. And don't say "you should test it first" - there are PLENTY of clients out there with no test environments, and some of the patches that went out didn't show a problem until after a reboot, or even a couple of days later. 

 

I'm a BIG fan of letting someone else take the risk first. Definitely DON'T do it on a Friday night & leave at least 48 hours before even thinking about patching a 0 day, as the patches are usually rushed, badly written & VERY likely to be faulty. 

 

Thoughts? 

10 Replies
Brewdawg
Contributor I

For things like OS and infrastructure patches I fully agree. If you have a test environment, feel free to start testing the zero-day patch there immediately, but anything that needs to be up should wait at least a few days, and then start small with your testing. Most of these zero days would need more access to your environment to exploit and would require some level of expertise.

 

Now zero days for things like browsers are a different story. The level of exposure to things outside of your control makes them a little more critical to start updating on endpoints that are browsing the web. My feeling is that browsers and similar software should be patched as soon as possible, and in some cases the rollout should start the same day the zero day is announced. Now I would add the caveat: start patching your preplanned test users first and let the updates bake for a bit before rolling out across your entire environment, but I would not take as much time to get those patches out as I would for patches on mission-critical equipment.

JKWiniger
Community Champion

This is where update rings come into play. Whatever you are using to push the updates out should have this feature, and if it doesn't you need to find something that does. It's simply pushing the update to larger and larger groups of users on a set schedule. Your first ring would probably be a core set of users, including IT, so if there is a problem with an update they will know what's going on and you can pause updates going out to anyone else before it is addressed. With the breaches I have seen, it seems to be less about the 0 day being exploited and more about a patch that has been out for a year and was just never applied...
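As a rough sketch of the idea (the ring names, group names, and delay values are invented for illustration and not tied to any particular deployment tool):

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Ring:
    name: str
    members: list[str]   # device or user groups in this ring (hypothetical names)
    delay_days: int      # how long after patch release this ring deploys

# Illustrative ring layout: IT/security first, then a pilot group, then everyone.
RINGS = [
    Ring("ring0-it-and-security", ["it-staff", "secops"], delay_days=0),
    Ring("ring1-pilot-users", ["pilot-volunteers"], delay_days=2),
    Ring("ring2-general", ["all-staff"], delay_days=5),
    Ring("ring3-servers", ["prod-servers"], delay_days=7),
]

def deployment_date(ring: Ring, patch_released: date) -> date:
    """When this ring is scheduled to receive the patch."""
    return patch_released + timedelta(days=ring.delay_days)

def rings_to_pause(failed_ring: str) -> list[str]:
    """If an earlier ring reports a bad patch, hold every later ring."""
    names = [r.name for r in RINGS]
    return names[names.index(failed_ring) + 1:]

if __name__ == "__main__":
    released = date(2024, 7, 1)
    for ring in RINGS:
        print(ring.name, "->", deployment_date(ring, released))
    print("pause on ring1 failure:", rings_to_pause("ring1-pilot-users"))
```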

 

Thoughts?

John-

akkem
Contributor III

Not every zero-day is automatically exploitable in every environment. Many require specific conditions, privileges, or exposed services. Blindly pushing patches without assessing applicability or risk often creates more problems than it solves.
Triage:
- Is it being exploited in the wild?
- Is your stack affected (OS version, exposed service)?
- Can it be mitigated temporarily (e.g., disabling a feature)?

Consider compensating controls:
- WAF rules, network segmentation, ACL hardening—these can buy you time while patch quality matures.

The key lies in making informed decisions through collaboration, prioritizing based on the specific environment, business impact, and existing security controls.
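Purely as an illustration, the triage questions and compensating controls above could be folded into something like the following sketch; the field names and the resulting actions are assumptions, not a standard playbook.

```python
from dataclasses import dataclass

@dataclass
class ZeroDay:
    exploited_in_wild: bool      # e.g. reported as actively exploited (CISA KEV listing)
    stack_affected: bool         # our OS version / exposed service actually matches
    mitigation_available: bool   # feature can be disabled, WAF rule, ACL hardening, etc.

def triage(zd: ZeroDay) -> str:
    """Map the three triage questions to an illustrative course of action."""
    if not zd.stack_affected:
        return "monitor"                       # not applicable to our environment
    if zd.exploited_in_wild and not zd.mitigation_available:
        return "patch-expedited"               # no compensating control to buy time
    if zd.exploited_in_wild:
        return "mitigate-now, patch-on-accelerated-schedule"
    return "patch-on-normal-schedule"          # applicable but not under active attack

print(triage(ZeroDay(exploited_in_wild=True, stack_affected=True, mitigation_available=True)))
```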
denbesten
Community Champion

To an extent, they are right. One should generally keep equipment patched. But like any advice, it needs to be moderated with a dose of reality. There is risk to patching; there is risk to not patching.  

 

  • Is it worth risking a CrowdStrike event by applying patches to all devices immediately and depending solely on automated testing? 
  • Timing can matter, i.e. can it wait till lunch/morning/off-shift? Sometimes, "right now" will shut down just-in-time manufacturing, resulting in burdensome contractual penalties owed to the customer. 
  • We once had an incident that required shutting down most of our corporate network for a few days. The fix was to apply a patch that had come out a week earlier.

The goal is to find the middle ground where everyone can sleep at night and everyone is equally unhappy with the plan.

 

We have a schedule stating how many "wait days" before each wave gets updates (as @JKWiniger mentions), including escalation based on CVSS score.  And, if there is evidence we are under attack the schedule escalates further.
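A purely illustrative version of a wait-days schedule with CVSS-based escalation (the bands, wave names, and numbers are made up for the example, not anyone's actual policy):

```python
# Days to wait after patch release, per CVSS band and deployment wave.
WAIT_DAYS = {
    "critical (9.0-10.0)": {"wave1": 0, "wave2": 1, "wave3": 3},
    "high (7.0-8.9)":      {"wave1": 1, "wave2": 3, "wave3": 7},
    "medium-or-lower":     {"wave1": 3, "wave2": 7, "wave3": 14},
}

def band(cvss: float) -> str:
    if cvss >= 9.0:
        return "critical (9.0-10.0)"
    if cvss >= 7.0:
        return "high (7.0-8.9)"
    return "medium-or-lower"

def wait_days(cvss: float, wave: str, under_attack: bool = False) -> int:
    days = WAIT_DAYS[band(cvss)][wave]
    # Evidence of active attack escalates the schedule further.
    return 0 if under_attack else days

print(wait_days(9.8, "wave2"))                     # 1 day
print(wait_days(9.8, "wave2", under_attack=True))  # 0 days
```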

JKWiniger
Community Champion

@akkem But you seem to miss one thing... you don't know if a 0 day is being exploited until it's too late! So while many things are not actively exploited, you need to act as if they all are. Even if 1 in 100 gets exploited, I would much rather apply 100 patches than not patch for the 1 that matters...

 

John-

vishybear
Newcomer I

I think that my issue is that I've seen more patches go wrong than exploits.

 

As someone who's done the 2 or 3 days in a row in the office and the 6am finishes because someone else did something stupid, only to still be made redundant or have my contract not renewed...

 

I think I'm at the point in my career where I'd rather take a 99% risk than cancel a plan with friends. 

 

I do genuinely dislike being expected to cancel plans or stay late with no extra pay by people who've never done it or have never worked sysadmin. Especially if the 0 day is on a product I didn't want to buy in the first place or a PM or cybersecurity guy ignored advice I gave 6 months before. Usually a PM. ;o)

vishybear
Newcomer I

That's if you have testing environments. To be fair, if a security team came to me at 4:55 expecting me to cancel plans, I wouldn't even roll out test patches unless I could press a button, leave the office and worry about it in the morning. 

 

I asked this question on the r/sysadmin subreddit & there is definitely a split between the junior guys who haven't been mentally destroyed yet and the more senior guys like me who have been screwed over by corporate decisions repeatedly ;o) 

 

I'm now strictly no overtime without 1 month's notice and double time, regardless of risk. 

denbesten
Community Champion


@vishybear wrote:

I think that my issue is that I've seen more patches go wrong than exploits .


The missing bit is not knowing how many exploits were foiled by patching prior to exploit. Part of that may be because success does not create headlines. The classic example is fear of flying due to airplane accidents, despite the fact that all objective measurements show driving has more fatalities.

 


I do genuinely dislike being expected to cancel plans or stay late with no extra pay by people who've never done it or have never worked sysadmin.


You might check your local labor laws regarding working "off the clock".  But that is just a symptom. The true problem is not aligning staffing levels with requirements.  If the organization requires 1-hour response, the organization needs to staff for 1-hour response.    

 


@vishybear wrote:

insist that the patches need to be installed STRAIGHT AWAY ... [v.s.]... letting someone else take the risk first. 


Both of these are immature positions. The mature position realizes this is not a binary choice. Notable to me is that all those advising a layered approach are CISSPs, indicating 5+ years of demonstrable security experience, often including sysadmin work, at least for their own boxes, and an ability to understand the management perspective.

 

Back to the tiers/waves, a "test environment" is not mandatory.   One possible approach would be something like this:

 

  1. Everything "security" owns, plus all the I.T. laptops, using the "eat your own dog food" theory, coupled with not wanting devices involved in recovery to fail at the same time as production devices.
  2. All non-production devices ("test", "development", or "quality assurance").
  3. Any device identified as tolerant of 24+ hours of downtime, including the standby members of clusters.
  4. Any device "publicly exposed" (e.g. public web servers).
  5. Anything not directly involved with producing the company's product (e.g. payroll, scheduling, non-I.T. laptops)
  6. And lastly, the manufacturing equipment.
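
A hypothetical sketch of assigning devices to those waves; the tag names are invented for illustration, and a real implementation would pull classification from a CMDB or asset inventory rather than ad-hoc dictionaries:

```python
def patch_wave(device: dict) -> int:
    """Return the wave (1-6) a device falls into, per the list above."""
    tags = device.get("tags", set())
    if "security-owned" in tags or "it-laptop" in tags:
        return 1   # eat your own dog food; keep recovery devices ahead of production
    if "non-production" in tags:                       # test / dev / QA
        return 2
    if device.get("tolerates_24h_downtime") or "cluster-standby" in tags:
        return 3
    if "publicly-exposed" in tags:                     # e.g. public web servers
        return 4
    if not device.get("production_line", False):       # payroll, scheduling, non-IT laptops
        return 5
    return 6                                           # manufacturing equipment goes last

print(patch_wave({"tags": {"publicly-exposed"}}))              # 4
print(patch_wave({"tags": set(), "production_line": True}))    # 6
```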

 

Steve-Wilme
Advocate II

We take a mix-and-match approach focusing on all pertinent risks, prioritising based on CVSS score and how exposed particular systems are. You'd test in non-prod if feasible, but if not, take one of two approaches to de-risk deployment: wait a few days, or start patching less critical systems first and monitor for ill effects. Once you're happy, you can accelerate the deployment.
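One way to picture prioritising on CVSS plus exposure, as a sketch only (the weights, categories, and system names are invented for the example):

```python
# Rough ranking of systems for patch deployment: higher score = patch sooner.
EXPOSURE_WEIGHT = {"internet-facing": 3, "internal": 2, "isolated": 1}

def patch_priority(cvss: float, exposure: str) -> float:
    return cvss * EXPOSURE_WEIGHT[exposure]

systems = [
    ("public-web", 8.1, "internet-facing"),
    ("hr-portal", 9.0, "internal"),
    ("lab-box", 9.8, "isolated"),
]
for name, cvss, exposure in sorted(systems, key=lambda s: -patch_priority(s[1], s[2])):
    print(name, patch_priority(cvss, exposure))
```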

-----------------------------------------------------------
Steve Wilme CISSP-ISSAP, ISSMP MCIIS