Tweaking SCOM 2012 Management Servers for large environments
Comments
January 01, 2003
The comment has been removed - Anonymous
January 01, 2003
@ Ted T Hacker -
Great question. Yes, there is. "Command Timeout Seconds" applies to the regular stored procedure calls that SCOM workflows make to the DW, such as maintenance operations and aggregations. "Deployment Command Timeout Seconds" is different: it applies to the scripts that run during a major update, such as a version upgrade, service pack, or update rollup. Changing the latter is rarer; however, I have seen issues reported where these scripts got caught up in blocking and took a LONG time to complete, so rather than have them fail on a timeout, we had the customer set a very long timeout so they could finish. It isn't a common occurrence, and generally I'd only change that one under advisement from support, like you did. All good. - Anonymous
January 01, 2003
Brett - where did you get this 500-group limitation? The product group tested up to 1000 groups when performance testing SCOM 2007. It was recommended not to go over 1000 groups simply because we didn't test beyond that. However, using groups that don't roll up health state, and using simple group memberships, heavily affected this scalability concern. Now that SCOM 2012 has a distributed model for config and group population, I have not heard of any limitation like this, nor have I heard what we test up to; I'd assume it is likely the same 1000 groups. However, I have customers beyond this, and they don't have any issues with group population. - Anonymous
January 01, 2003
The comment has been removed - Anonymous
January 01, 2003
Kevin,
Is there a difference between the HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Data Warehouse
"Command Timeout Seconds"
and "Deployment Command Timeout Seconds" values? A Microsoft engineer advised creating the second value during an incident. Do these values conflict with each other, or are they complementary? We have "Deployment Command Timeout Seconds" set to 86400 (1 day). At that point we were having problems upgrading from 2012 SP1 to R2. Thanks, Ted. - Anonymous
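Both values in Ted's question are DWORDs expressed in seconds. As an illustration only, the Python sketch below renders a .reg fragment for the value he describes (86400 seconds). The key path is the Data Warehouse key from his question with the path separators restored; the helper names `reg_dword` and `render_reg` are hypothetical, and per Kevin's reply this value should only be changed under advice from Microsoft Support.

```python
# Minimal sketch: build a .reg fragment that sets DWORD values under the
# SCOM Data Warehouse key. The key path is Ted's, with backslashes restored.
DW_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft"
    r"\Microsoft Operations Manager\3.0\Data Warehouse"
)


def reg_dword(value: int) -> str:
    """Format an integer as a .reg DWORD literal: 8 hex digits, no 0x prefix."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 unsigned bits")
    return f"dword:{value:08x}"


def render_reg(key: str, values: dict[str, int]) -> str:
    """Return the text of a .reg file setting each named DWORD under `key`."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, value in values.items():
        lines.append(f'"{name}"={reg_dword(value)}')
    # .reg files conventionally use Windows (CRLF) line endings.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    # 86400 seconds = 1 day, the value Ted's support engineer had them set.
    print(render_reg(DW_KEY, {"Deployment Command Timeout Seconds": 86400}))
```

Note that .reg files store DWORDs in hex, so 86400 appears as `dword:00015180`.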
January 01, 2003
The comment has been removed - Anonymous
January 01, 2003
@Jesse - Actually - Bulk Insert Command Timeout was a new registry control available with UR1. It wasn't added in UR5. I don't have a recommendation for adjusting this - it simply opened the capability to adjust this if needed. I only recommend changing that one if directed to by Microsoft Support to resolve a problem with bulk inserts to the warehouse, which is a rare condition. I have never worked with a customer who needed this modified from the default. - Anonymous
June 27, 2014
Hi, Kevin,
I have seen a lot of blogs that recommend against setting the PoolManager key. They indicate that this was required when SCOM 2012 was first released but has since been fixed - implementing it now can actually degrade performance. Can you confirm this is still required for large environments? - Anonymous
July 21, 2014
Hey Kevin,
Good stuff, as always! :)
It would be of great value if, for each of the above registry settings, associated monitoring instrumentation could be identified in order to help administrators determine whether the corresponding registry setting update should be considered for their environment.
Examples might include:
1) evaluating a particular PerfMon counter against a specific threshold, or
2) the presence of specific event log entries
One might even wonder if/why this is not already included as monitors in the OpsMgr based MPs... - Anonymous
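The two kinds of instrumentation suggested above (a counter checked against a threshold, and the presence of specific event log entries) can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions: the counter samples, event sources, and event IDs are hypothetical placeholders, not real SCOM counters or events, and the "80% of samples" rule for a sustained breach is an arbitrary choice.

```python
# Minimal sketch of the two checks suggested above. Counter samples, event
# sources, and event IDs are hypothetical placeholders, not real SCOM values.
from dataclasses import dataclass


@dataclass
class EventRecord:
    event_id: int
    source: str


def sustained_breach(samples: list[float], threshold: float,
                     min_fraction: float = 0.8) -> bool:
    """True if at least `min_fraction` of samples exceed `threshold`.

    Requiring a sustained breach avoids reacting to a single spike.
    """
    if not samples:
        return False
    above = sum(1 for value in samples if value > threshold)
    return above / len(samples) >= min_fraction


def watched_events_seen(records: list[EventRecord], source: str,
                        watched_ids: set[int]) -> set[int]:
    """Return the watched event IDs from `source` that appear in `records`."""
    return {r.event_id for r in records
            if r.source == source and r.event_id in watched_ids}


if __name__ == "__main__":
    # Illustrative samples of a hypothetical queue-usage counter (percent).
    samples = [70.0, 85.0, 90.0, 92.0, 95.0]
    print(sustained_breach(samples, threshold=80.0))  # True: 4 of 5 above 80

    log = [EventRecord(1111, "ExampleSource"), EventRecord(2222, "OtherSource")]
    print(watched_events_seen(log, "ExampleSource", {1111, 3333}))  # {1111}
```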
August 06, 2014
Very helpful, good article. I wonder why Microsoft does not publish an article with those registry settings. - Anonymous
December 09, 2014
SCOM 2007 recommended not having more than 500 groups. Does SCOM 2012 have a recommended limit? - Anonymous
February 18, 2015
UR5 introduces a new registry value: Bulk Insert Command Timeout. http://support.microsoft.com/kb/3029227 Do you have any guidance around using this value as well? - Anonymous
April 07, 2015
You mention that these changes are recommended on management servers but make no mention if they are required on a gateway server. What is Microsoft's stance on tweaking registry settings on them? - Anonymous
April 16, 2015
Thanks for the quick response. I didn't think so either but I wanted to be 100% certain. Thanks again for taking the time to respond to everyone's questions and keep this site updated. It is appreciated! - Anonymous
August 06, 2015
@Jasper -
I don't know offhand. MOST registry changes require a service restart unless there is code to check the registry on an interval or to be notified of a reg change. I doubt we would do this and my assumption is that a restart of services is required. I'd have to tracelog to be sure. - Anonymous
September 05, 2015
Hello. I have 4 virtual (8 GB RAM) and 2 physical (24 GB RAM) SCOM servers, with:
700 Windows agents
60 Linux agents
a lot of MPs, including heavy custom ones such as Progress Sonic monitoring
349 groups
270 network devices
The Health Service on the physical servers uses ~3-4 GB.
The virtual servers sometimes report that they are dropping data.
Should I make the registry edits with those parameters? - Anonymous
October 28, 2015
Hi Kevin,
Thank you for sharing.
Are these registry keys also applicable to Gateway Servers, or only to Management Servers?
Marlon - Anonymous
October 28, 2015
Hi Kevin, Are those keys also valid for a large environment with OpsMgr 2007 R2?
thank you - Anonymous
October 28, 2015
@Marlon - SOME of these are potential candidates for gateways, but I generally don't recommend any changes on GWs unless you are specifically experiencing a problem. On management servers, I set these for all my customers, regardless of size. - Anonymous
October 28, 2015
@Jean - No, not all of these keys are valid for SCOM 2012R2, as some don't apply and some have moved.
For 2007R2 I will defer to Matt: http://blogs.technet.com/b/mgoedtel/archive/2010/08/24/performance-optimizations-for-operations-manager-2007-r2.aspx and I also strongly recommend http://blogs.technet.com/b/kevinholman/archive/2011/02/07/a-new-feature-in-r2-cu4-reconnecting-to-sql-server-after-a-sql-outage.aspx - Anonymous
October 29, 2015
Thank you! - Anonymous
November 25, 2015
Thanks Kevin for the confirmation! We are still observing the changes in our SCOM environment. - Anonymous
February 03, 2016
This is a common practice for rotating old physical servers coming off lease, or when moving VM based - Anonymous
April 01, 2016
Hi Kevin, do these settings require a server restart? Thanks, Ashish - Anonymous
April 14, 2016
I do miss the parameters Maximum Global Pending Data Count, Persistence Version Store Maximum, and Persistence Cache Maximum in this blog. With regard to the latter, this PDF ( http://download.microsoft.com/download/8/2/8/828C05A2-E6A0-436A-9AE1-704A8005E355/9780735695825.pdf ) says: "Another important setting is Persistence Cache Maximum of type DWORD. This setting controls the amount of memory in pages used by Persistence Manager for its data store on the local database. The default value for this is 262144 (decimal), which is also the recommended value. If you are running an older version of Operations Manager on management servers that manage a large number of objects, you should change the value to 262144 (decimal)." The default and recommended values are the same here. I think that is a mistake, but for now I've set it to 262144. Suggestions are welcome! - Anonymous
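The quoted passage stresses "(decimal)" because regedit's Edit DWORD dialog defaults to a Hexadecimal base, so typing the digits 262144 there stores a much larger number. The small Python sketch below shows both forms of the value and what would be stored by mistake; the helper names are hypothetical and only the 262144 figure comes from the quote.

```python
# Minimal sketch: the same DWORD written in decimal and hex, and what gets
# stored if the decimal digits are typed while regedit's Base is Hexadecimal.
PERSISTENCE_CACHE_MAXIMUM = 262144  # decimal, as quoted from the PDF


def as_hex_dword(value: int) -> str:
    """Hex form of a DWORD, in the 0x%08x style regedit shows in its list."""
    return f"0x{value:08x}"


def stored_if_typed_as_hex(digits: str) -> int:
    """Decimal value actually stored when `digits` are entered as hex."""
    return int(digits, 16)


if __name__ == "__main__":
    print(as_hex_dword(PERSISTENCE_CACHE_MAXIMUM))   # 0x00040000
    print(stored_if_typed_as_hex("262144"))          # 2498884
```

In other words, either select the Decimal base before typing 262144, or enter 40000 with the Hexadecimal base.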
November 01, 2016
I love this article: a deep explanation, followed by an exec summary, followed by an "OK, I realise you're lazy, so..." and the actual commands. Genius - Anonymous
November 01, 2016
Kevin, when I restart the Health Service on one management server, the All Management Server Pool Unavailable alert is raised and my NOC dashboard, built with widgets, turns gray. I then have to wait about 15 minutes for it to go back to normal. I think this is related to the pool process version. Is there any setting to increase the time before the pool is considered unavailable? - Anonymous
November 01, 2016
How many management servers do you have?- Anonymous
March 01, 2017
Kevin, I have 3 management servers in the pool - Anonymous
March 01, 2017
The comment has been removed
- Anonymous
January 04, 2017
Hi Kevin, do the same recommendations apply to SCOM 2016? - Anonymous
July 05, 2018
Hi Kevin, I have an environment where I am regularly seeing resource pools fail. From the SCOM console we see a heartbeat failure for the pool, and in the event logs I see that members are not acknowledging the lease request and are unloading all workflows.
As the environment isn't in live use, I've tested applying the Pool Manager fixes mentioned in the blog, and this appears to resolve or heavily reduce the issue. Could you provide more information on what these settings give the pool more time to complete? The only time-related recommendation I remember seeing in the official documentation is ~10 ms latency between servers. When my resource pools fail, is it almost certainly down to workflows taking too long to complete despite apparently low load on the servers, or could network issues also cause the pool to fail?
As a side note, when a pool is in a failed state, would you expect SNMP traps (sent to all servers in the pool) to still be processed and raise alerts within SCOM? The pools in question are only used for SNMP monitoring, so if traps are still processed and only SNMP polling is impacted, I'd be less concerned!
Many thanks.
James - Anonymous
July 05, 2018
I never recommend editing the registry for pool timeouts, because doing so often just masks the real problem and is almost never a good solution. We have also seen cases where changing them created pool instability. The most common cause of pool instability is network latency between management servers, or between a management server and the database. The second most common is an overloaded management group, where we see blocking in the SQL database, or simply too many objects hosted by the pools on the management servers.
- Anonymous
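Since network latency between management servers, and between management servers and the database, is named as the most common cause of pool instability, a quick way to review measurements is to compare them against the ~10 ms figure James recalls from the documentation. The Python sketch below does only that comparison; how the samples are collected (for example with ping) is left open, and the link names and sample values are made up for illustration.

```python
# Minimal sketch: flag links whose round-trip latency exceeds a target.
# Link names and samples are hypothetical; collect real ones however you like.
from statistics import median

LATENCY_TARGET_MS = 10.0  # the ~10 ms figure James recalls from the docs


def slow_links(samples_ms: dict[str, list[float]],
               target_ms: float = LATENCY_TARGET_MS) -> dict[str, tuple[float, float]]:
    """Map each link whose median latency exceeds target_ms to (median, max).

    The max is reported as well, since an occasional spike is hidden by the median.
    """
    flagged = {}
    for link, samples in samples_ms.items():
        if not samples:
            continue  # nothing measured for this link
        mid = median(samples)
        if mid > target_ms:
            flagged[link] = (mid, max(samples))
    return flagged


if __name__ == "__main__":
    measured = {
        "MS1 -> MS2": [1.2, 1.5, 0.9],
        "MS1 -> SQL": [14.0, 18.5, 12.2],
    }
    print(slow_links(measured))  # {'MS1 -> SQL': (14.0, 18.5)}
```

Flagging on the median keeps a single slow reply from raising a false alarm; if occasional spikes are the concern, the reported max (or a percentile) is the better number to watch.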
July 12, 2018
Hi Kevin - We updated the registry on all our management servers and encountered no issues for 2 years. Recently, when we try to create a new SCOM group via the console, it takes a long time to load and sometimes does not load at all. Can you give us insight into which settings we need to check or adjust? According to a PowerShell command, we have a total of 1442 groups. Thanks!