So the preface of this article is that I have moved to a new company! I’m still working in ConfigMgr every day, but it’s a slightly different environment. I’ve moved from being the primary person in a 2,500-client Windows environment to being one member of a team of six managing 18,000 computers in a much more distributed environment. In other words, my new employer has many more locations.
Yesterday I ran into an issue that I haven’t had to deal with before, and if this helps someone, then great! I’m glad my embarrassment can be useful to you.
Since I’m just starting, I’m picking up some menial tasks, one of which was working with the Windows updates that came out this Patch Tuesday. No problem; I can go through those and make sure that everything looks good. The team also mentioned that the current deployments, for January through April, had not yet been condensed into one deployment. Cool, I can do that too.
So I got the Patch Tuesday deployment ready with the help of a co-worker, and then started moving the group membership of the January through March patches into the April patch collection. These groups were still deployed at the time.
Do you see the issue I had just created? I didn’t, until some of our smaller sites (with slower network connections) started complaining about network slowness. The network team also noted that the links were becoming saturated.
Investigation determined that the traffic was computers connecting to our WSUS servers to determine whether or not they needed the patches in the April patch collection. Lots of computers were checking. All the computers were checking! Each one was doing it slowly (because we have BITS throttling in place), but so many were checking at once that they still clogged the slower links and caused the problem.
The interim solution was to delete the deployment that was causing the computers to check in, and to stop the WSUS service on the WSUS servers. The effect was immediate: the networks recovered almost at once.
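If you ever need to do the same in a hurry, stopping the WSUS service on several servers can be scripted. Here’s a minimal Python sketch that shells out to the built-in `sc.exe` service-control tool; `WsusService` is the default service name for WSUS, but the server names below are placeholders for illustration, and in practice you might prefer PowerShell’s `Stop-Service` instead.

```python
import subprocess

# Hypothetical host names -- replace with your actual WSUS servers.
WSUS_SERVERS = ["wsus01", "wsus02", "wsus03"]

def build_stop_command(server):
    """Build the sc.exe command line that stops WSUS on a remote host."""
    # "WsusService" is the default Windows service name for WSUS.
    return ["sc.exe", rf"\\{server}", "stop", "WsusService"]

def stop_wsus_everywhere(servers):
    """Issue a stop command to each WSUS server, continuing past failures."""
    for server in servers:
        cmd = build_stop_command(server)
        print("Running:", " ".join(cmd))
        # check=False: keep going even if one server is unreachable.
        subprocess.run(cmd, check=False)

# Run from a management workstation with rights on the WSUS servers:
# stop_wsus_everywhere(WSUS_SERVERS)
```

Remember to start the service again once the deployment has been cleaned up, or clients will fail their scans entirely.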
There was a mitigating circumstance that the person I was working with was not aware of. Sometime in the last few months, the WSUS servers had been “right-sized” for the number of machines connecting to them. In the past, when changes like ours were made, the bottleneck was the WSUS servers themselves: they would be pegged at 100% CPU for “days.” Computers would be denied access and would check back later to see which updates they needed. Once the WSUS servers were expanded, they could process all the requests coming to them. They processed them so well, in fact, that the new bottleneck became the slower network connections.
Moral of the story: changing the content of a deployment makes the computers in the environment re-evaluate themselves against the whole group of patches in the collection, not just the new ones. In our situation, that client-to-server traffic was enough to significantly slow down the network at sites with slow links.
Second moral of the story: when you fix one thing, you sometimes shift the bottleneck and break something else. Make sure you are ready for that.
There is also an article on Microsoft’s website about a known issue that causes WSUS clients to use high network bandwidth, located here: https://support.microsoft.com/en-us/help/4163525/high-bandwidth-use-when-clients-scan-for-updates-from-local-wsus-serve. It’s possible that this added to our problems, but it certainly wasn’t our only issue.
I try to learn something new every day. This was my learning for yesterday.