Successfully managing updates in a 24/7 environment
For most businesses, operational hours are 8 AM to 5 PM, which makes managing Windows cumulative updates and feature updates (formerly known as build updates) a non-issue. For some sectors, such as healthcare, the task is not so easy. Releasing an update everywhere at once could cause disaster. Even when the update itself is fine, its delivery can be the problem: many departments cannot afford to have all their systems down at once while an update or upgrade installs. Emergency rooms and nurses' stations come to mind in this scenario, but there are several others as well.
So, how do we set up our environment to manage this? The answer may require some restructuring, but a little time spent now will save you headaches later. Note that this post is geared towards Configuration Manager (SCCM) managed environments, but you can apply the concept to other environments as well.
Part A: The first thing you need to do is establish groups of systems to which you will roll out updates at different times. These could be days of the week, to roll out monthly updates over a week, or different hours, if you want to roll out over one day. You will also need to establish some form of labeling for each group. This can be anything you like, but in my experience, colors are a simple way to provide grouping, as well as an easily visualized separation of your systems for update roll-outs. In this post, the "color group" example will be used, and we will go with monthly updates being rolled out over the course of a week. I will go with 6 color groups, but depending on the size of your organization, you may have more or fewer. In this example, I will use Red, Orange, Yellow, Green, Blue, and Purple.
Grouping systems is where most of the legwork should be done. These groups will have updates rolled out at different times, and the whole point of this is to keep departments operational during update roll-outs. In short, your goal is to never have two systems physically side-by-side in the same color group. Ideally, you will label each system with a visible indicator of its color group, perhaps a colored dot matching the group, or, if your company already puts a support label on each workstation, a support label printed in the group's color. This allows communication to be sent out in the form of "On Monday at 10 PM, computers with a Blue label will install monthly updates", and your end users have a simple visual way of knowing whether they can safely use a machine that night or should find another computer instead.
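One simple way to satisfy the "no two neighbors in the same color" goal is to walk the machines in their physical order and hand out colors in rotation. The following is only a minimal sketch of that idea: the function name and the workstation names are hypothetical, and in practice you would also want to confirm that the Red group ends up covering your critical applications.

```python
from itertools import cycle

COLORS = ["Red", "Orange", "Yellow", "Green", "Blue", "Purple"]

def assign_colors(machines_in_physical_order, colors=COLORS):
    """Map each machine to a color, rotating through the colors so that
    machines sitting next to each other never share a group."""
    return dict(zip(machines_in_physical_order, cycle(colors)))

# Hypothetical nurses' station with four workstations, listed left to right.
station = ["NS4-WS01", "NS4-WS02", "NS4-WS03", "NS4-WS04"]
assignment = assign_colors(station)
for name, color in assignment.items():
    print(f"{name}: {color}")
```

With six colors, a given color only repeats every sixth machine in a row, so even a long row of workstations never has adjacent machines updating on the same night.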
You can think of the Red group (or whichever group you designate first) as a typical "Pilot" group. Much importance should be placed on this group, as its scope should include all critical applications. This group should update up to a week ahead of the other groups, to allow time for testing, to make sure the installed updates did not break anything mission critical in your environment, and to handle anything that did break before updates go out everywhere else.
In my example, the roll-out schedule would look similar to below:
Red Group: The Wednesday following Patch Tuesday (Patch Tuesday is the 2nd Tuesday of the month)
Orange Group: The Monday of the week after Patch Tuesday
Yellow Group: The Tuesday of the week after Patch Tuesday
Green Group: The Wednesday of the week after Patch Tuesday
Blue Group: The Thursday of the week after Patch Tuesday
Purple Group: The Friday of the week after Patch Tuesday
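The schedule above can be worked out for any month with a few lines of code. This is a minimal sketch, assuming each group's day is expressed as a number of days after Patch Tuesday (the 2nd Tuesday of the month). Counting calendar ordinals directly is less reliable: in a month that starts on a Wednesday, for example, the 2nd Wednesday falls before Patch Tuesday. The function names and the offset table are illustrative, not part of any SCCM tooling.

```python
from datetime import date, timedelta

def patch_tuesday(year, month):
    """Return the second Tuesday of the given month."""
    first = date(year, month, 1)
    # date.weekday(): Monday is 0, Tuesday is 1.
    days_to_first_tuesday = (1 - first.weekday()) % 7
    return first + timedelta(days=days_to_first_tuesday + 7)

# Days after Patch Tuesday on which each color group installs updates.
OFFSETS = {"Red": 1, "Orange": 6, "Yellow": 7, "Green": 8, "Blue": 9, "Purple": 10}

def rollout_dates(year, month):
    """Map each color group to its roll-out date for the given month."""
    start = patch_tuesday(year, month)
    return {color: start + timedelta(days=days) for color, days in OFFSETS.items()}

for color, day in rollout_dates(2018, 8).items():
    print(f"{color}: {day:%A %Y-%m-%d}")
```

August 2018 is a useful test month because it begins on a Wednesday: Patch Tuesday is the 14th, so the Red group lands on the 15th, a week later than the calendar's 2nd Wednesday.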
You will want a collection built for each group. Use a standard naming convention for these collections (EX: "Update Group - Red"), likely with a query rule pointing to an AD group that also follows a standard naming convention (EX: "UpdateGroup-Red").
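To keep the names consistent, it can help to generate them rather than type them. The sketch below builds the collection name, the AD group name, and a WQL membership query for each color. The domain name "CONTOSO" is a placeholder, and the query is only a sketch of the usual SMS_R_System.SystemGroupName pattern, so verify it in your own console before relying on it.

```python
COLORS = ["Red", "Orange", "Yellow", "Green", "Blue", "Purple"]

def group_query(domain, ad_group):
    """Return a WQL query selecting systems that are members of an AD group."""
    # The backslash between domain and group is doubled inside the WQL string.
    return (
        "select SMS_R_System.ResourceId from SMS_R_System "
        f'where SMS_R_System.SystemGroupName = "{domain}\\\\{ad_group}"'
    )

def color_group_definitions(domain, colors=COLORS):
    """Return one (collection name, AD group name, WQL query) tuple per color."""
    definitions = []
    for color in colors:
        ad_group = f"UpdateGroup-{color}"
        definitions.append((f"Update Group - {color}", ad_group, group_query(domain, ad_group)))
    return definitions

# "CONTOSO" is a hypothetical domain name used only for illustration.
for collection, ad_group, query in color_group_definitions("CONTOSO"):
    print(collection, "|", ad_group, "|", query)
```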
Part B: The second thing you will need to do is set up Automatic Deployment Rules (ADRs) and handle the actual deployment of updates. This can be done in several ways. In this example, we will set them up very specifically to allow for expansion and handling different products with 3rd party patching, etc.
For each product you are updating, you will want to set up both an update collection and an update exclusion collection. Use a standard naming convention for the update collection (EX: "Updates - Oracle - Java", "Updates - Microsoft - DotNetFX") and for the exclusion collection (EX: "Updates Exclusion - Oracle - Java", "Updates Exclusion - Microsoft - DotNetFX"). In addition, you will likely want to set up an Active Directory group for exclusions from that product's updates. Standard naming conventions are useful here as well (EX: "Updates-Java-DenyApply", "Updates-DotNetFX-DenyApply"). The naming scheme in this example is chosen to speed up script or SQL queries, where a trailing wildcard is less intensive than a leading wildcard. In the product updates exclusion collection, add a query for System Resource/System Group Name pointing to the product's Active Directory exclusion group, and configure the collection either to use incremental updates or to update on a schedule that meets your needs. Then add an exclude rule to the product updates collection that excludes any systems in that product's updates exclusion collection.
You will need to build a separate collection for each product for each color group. Add an include rule for the main product update collection, and limit it by the color group collection. This will result in a collection containing only the machines in the specified color group that are designated to get the update, with the excluded machines filtered out. Use a standard naming convention here as well (EX: "Update Rollout - Oracle - Java - Blue"). This seems like a lot of collections, but the end result makes managing updates very modular and manageable. I have a script that I will post to the script library soon to assist with the build-out of all the collections mentioned.
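The collection logic above boils down to simple set operations: take the product's machines, remove the exclusions, and keep only those that are also in the color group. The sketch below models that with Python sets purely to illustrate the resulting membership; the machine names are made up, and this does not talk to Configuration Manager.

```python
def rollout_collection_name(vendor, product, color):
    """Build a name such as "Update Rollout - Oracle - Java - Blue"."""
    return f"Update Rollout - {vendor} - {product} - {color}"

def rollout_members(product_members, excluded, color_members):
    """Machines that get the product's updates with this color group.

    (product minus exclusions) models the exclude rule on the product
    collection; intersecting with the color group models limiting the
    rollout collection by the color group collection.
    """
    return (product_members - excluded) & color_members

# Hypothetical machine names for illustration.
java_machines = {"ER-WS01", "ER-WS02", "LAB-WS01", "LAB-WS02"}
java_deny_apply = {"LAB-WS02"}             # members of "Updates-Java-DenyApply"
blue_group = {"ER-WS02", "LAB-WS02", "ADM-WS07"}

name = rollout_collection_name("Oracle", "Java", "Blue")
members = rollout_members(java_machines, java_deny_apply, blue_group)
print(name, sorted(members))
```

Here only ER-WS02 ends up in the Blue rollout for Java: LAB-WS02 is in the Blue group but excluded from Java updates, and ADM-WS07 is in the Blue group but not in the Java product collection.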
Set up a separate ADR for each product you want to update, and schedule each ADR to run on the day and time of each month that you want your first color group (the pilot group, Red in our example) to receive updates. I find it easiest to name the ADRs after the product's update collection, using the convention "Updates - <Vendor> - <Product>" (EX: "Updates - Oracle - Java", "Updates - Microsoft - DotNetFX"). Each ADR should be configured with its own software update group/package. Either update the existing software update group or set the ADR to create a new one on each run, depending on your needs (if you want to conserve space, you may update the existing group/package and enable binary differential replication on the software update package). I have a script to assist with this (posting to the script library soon!). In this example, we will update the existing group/package. Set up the ADR to deploy to the product's specific color group collection mentioned above.
Note that you cannot simply create a separate ADR for each color group: if no changes to the software update group are detected, the ADR will not run. Each month, shortly before your ADRs run, you will need to delete the previous deployments of the update groups to avoid rolling out new updates immediately. You will then need to create a separate deployment of each product's update group to the various color group collections for that product, scheduled according to whatever you decided on in Part A. I have a script that I am working on and will post soon, which can be run via a scheduled task to do this for you in an automated fashion and email you the results of that month's update scheduling.
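Until that script is posted, the monthly bookkeeping can at least be planned out in code. The following is a rough sketch and not the author's script: it only builds the list of deployments to create, and it assumes that the ADR itself deploys to the pilot (Red) collection and that installs happen at 10 PM. The dates passed in are example values, and the function name is hypothetical.

```python
from datetime import date, datetime, time

def plan_deployments(products, color_dates, pilot="Red", install_at=time(22, 0)):
    """List the deployments to create this month for the non-pilot groups."""
    plan = []
    for vendor, product in products:
        update_group = f"Updates - {vendor} - {product}"
        for color, day in color_dates.items():
            if color == pilot:
                continue  # assumption: the ADR already deploys to the pilot group
            plan.append({
                "update_group": update_group,
                "collection": f"Update Rollout - {vendor} - {product} - {color}",
                "deadline": datetime.combine(day, install_at),
            })
    return plan

# Example inputs only; substitute your own products and roll-out dates.
products = [("Oracle", "Java"), ("Microsoft", "DotNetFX")]
color_dates = {
    "Red": date(2018, 8, 15),
    "Orange": date(2018, 8, 20),
    "Yellow": date(2018, 8, 21),
}
for item in plan_deployments(products, color_dates):
    print(item["deadline"], "|", item["update_group"], "->", item["collection"])
```

Keeping the plan as plain data makes it easy to review, or to email as a summary, before anything is actually created in Configuration Manager.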
Some notes:
This is not the only correct way to accomplish this task. Having seen this challenge come up in a few different 24/7 environments now, I have found it to be a very effective and manageable way to keep updates organized and to handle issues before they spread, instead of reacting to them afterwards. Many organizations choose to have a classification of workstations which cannot receive updates at all, when in reality it is one very specific product that cannot be updated, such as Java or .NET Framework, because a product that relies on it would stop working. In these cases, it is better to patch everything else possible. Of course, you should press the vendor to update their application to work with the latest security patch level of the prerequisite product, but we don't always have that luxury right away.
Setting up your updates in this way gives you a pilot group that catches problems early and allows applications to be tested before entire buildings or departments are stopped from working by an update that breaks something. It is easier to manage a few workstations in a department being down than to have several departments screaming for resolution at once. It also allows Feature Updates (in-place upgrades, such as Windows 10 1709 to Windows 10 1803) to be rolled out in a fashion where entire departments do not have to stop working while the build update installs. This lets 24/7 environments actually consider the 6-month update channel instead of choosing a 12-month update cycle just to reduce the number of times a year they need to deal with the time these updates take to install.
Thank you for reading, and happy SysAdmining!
Sean