Tweaking SCOM 2012 Management Servers for large environments

Comments

  • Anonymous
    January 01, 2003
    @ Ted T Hacker -

    Great question. Yes, there is. "Command Timeout Seconds" applies to regular stored procedure calls from a SCOM workflow to the DW, such as maintenance operations and aggregations. "Deployment Command Timeout Seconds" is different: this value applies to scripts that are called during a major update, such as a version update, service pack, or update rollup. Changing the latter is rarer. However, I have seen issues reported where these scripts got caught up in blocking and took a LONG time to complete, so rather than let them fail due to a timeout, we had the customer set a very long timeout so they could complete. It isn't a common occurrence, and generally I'd only change that one under advisement from support, like you did. All good.
  • Anonymous
    January 01, 2003
    Brett - where did you get this 500 group limitation? The product group tested up to 1000 groups when performance testing SCOM 2007. It was recommended not to go over 1000 groups simply because we didn't test beyond that. However, using groups that don't roll up health state, and using simple group memberships, heavily affected this scalability concern. Now that SCOM 2012 has a distributed model for config and group population, I have not heard of any limitation such as this, nor have I heard what we test up to; I'd assume likely the same 1000 groups. However, I have customers beyond this and they don't have any issues with group population.
  • Anonymous
    January 01, 2003
    Kevin,

    Is there a difference between the "Command Timeout Seconds" and "Deployment Command Timeout Seconds" values under
    HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Data Warehouse?
    An MS engineer advised creating the second value during an incident. Do these values conflict with each other, or are they complementary? We have "Deployment Command Timeout Seconds" set to 86400 (1 day). At that point we were having problems upgrading from 2012 SP1 to R2. Thanks. Ted.
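A minimal sketch of how the value mentioned in this comment could be written with `reg add`. It assumes the key path is `HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Data Warehouse` (backslashes restored from the comment text) and that the value is a REG_DWORD. The helper name `reg_add_dword` is hypothetical. The script only builds and prints the command line; it does not touch the registry.

```python
# Sketch: build a `reg add` command line for a Data Warehouse timeout value.
# ASSUMPTION: key path below, with backslashes restored from the comment text.
DW_KEY = r"HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Data Warehouse"


def reg_add_dword(key: str, name: str, seconds: int) -> str:
    """Return a reg.exe command that sets a REG_DWORD value (hypothetical helper)."""
    # A DWORD is an unsigned 32-bit integer, so reject anything outside that range.
    if not 0 <= seconds <= 0xFFFFFFFF:
        raise ValueError(f"{seconds} does not fit in a REG_DWORD")
    # /v = value name, /t = type, /d = data, /f = overwrite without prompting
    return f'reg add "{key}" /v "{name}" /t REG_DWORD /d {seconds} /f'


# 86400 seconds (1 day) is the value reported in the comment above.
print(reg_add_dword(DW_KEY, "Deployment Command Timeout Seconds", 86400))
```

The same helper would work for "Command Timeout Seconds"; no value is suggested for it here because this thread does not state one.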
  • Anonymous
    January 01, 2003
    @Jesse - Actually - Bulk Insert Command Timeout was a new registry control available with UR1. It wasn't added in UR5. I don't have a recommendation for adjusting this - it simply opened the capability to adjust this if needed. I only recommend changing that one if directed to by Microsoft Support to resolve a problem with bulk inserts to the warehouse, which is a rare condition. I have never worked with a customer who needed this modified from the default.
  • Anonymous
    June 27, 2014
    Hi, Kevin,
    I have seen a lot of blogs that recommend against setting the PoolManager key. They indicate that this was required when SCOM 2012 was first released but has since been fixed - implementing it now can actually degrade performance. Can you confirm this is still required for large environments?
  • Anonymous
    July 21, 2014
    Hey Kevin,

    Good stuff, as always! :)

    It would be of great value if, for each of the above registry settings, associated monitoring instrumentation could be identified in order to help administrators determine whether the corresponding registry setting update should be considered for their environment.

    Examples might include:
    1) evaluating a particular PerfMon counter against a specific threshold, or
    2) the presence of specific event log entries

    One might even wonder if/why this is not already included as monitors in the OpsMgr based MPs...
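As an illustration of the two kinds of checks suggested in this comment, here is a hedged sketch of a decision helper. Everything in it is hypothetical: the `Evidence` type, the threshold, and the event IDs are placeholders rather than real OpsMgr counters or events, and collecting the samples (from PerfMon or the event log) is left out.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Evidence:
    """Observations gathered for one management server (hypothetical shape)."""
    counter_samples: list[float]  # samples of a single PerfMon counter
    event_ids: set[int]           # event IDs seen in the relevant event log


def should_review_setting(evidence: Evidence,
                          counter_threshold: float,
                          trigger_event_ids: set[int]) -> bool:
    """Flag a registry setting for review if either check fires."""
    # Check 1: average of the counter samples exceeds the chosen threshold.
    counter_high = (bool(evidence.counter_samples)
                    and mean(evidence.counter_samples) > counter_threshold)
    # Check 2: at least one of the trigger events was logged.
    events_seen = bool(evidence.event_ids & trigger_event_ids)
    return counter_high or events_seen


# Placeholder numbers only; 9999 is not a real OpsMgr event ID.
sample = Evidence(counter_samples=[10.0, 20.0, 30.0], event_ids=set())
print(should_review_setting(sample, counter_threshold=15.0, trigger_event_ids={9999}))
```

Which counters and events map to which registry setting is exactly the open question the comment raises; the sketch only shows the shape such a rule could take.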
  • Anonymous
    August 06, 2014
    Very helpful, good article. I wonder why Microsoft does not publish an article with those registry settings.
  • Anonymous
    December 09, 2014
    SCOM 2007 recommended not having more than 500 groups. Does SCOM 2012 have a recommended limit?
  • Anonymous
    February 18, 2015
    UR5 introduces a new registry value: Bulk Insert Command Timeout. http://support.microsoft.com/kb/3029227 Do you have any guidance around using this value as well?
  • Anonymous
    April 07, 2015
    You mention that these changes are recommended on management servers but make no mention if they are required on a gateway server. What is Microsoft's stance on tweaking registry settings on them?
  • Anonymous
    April 16, 2015
    Thanks for the quick response. I didn't think so either but I wanted to be 100% certain. Thanks again for taking the time to respond to everyone's questions and keep this site updated. It is appreciated!
  • Anonymous
    August 06, 2015
    @Jasper -
    I don't know offhand. MOST registry changes require a service restart unless there is code to check the registry on an interval or to be notified of a reg change. I doubt we would do this and my assumption is that a restart of services is required. I'd have to tracelog to be sure.
  • Anonymous
    September 05, 2015
    Hello. I have 4 VMs (8 GB RAM) and 2 physical (24 GB RAM) SCOM servers.

    700 Windows agents
    60 Linux agents
    A lot of MPs, including heavy custom ones such as Progress Sonic monitoring
    349 groups
    270 network devices

    The health service on the physical servers uses ~3-4 GB.

    The virtual servers sometimes report that they are dropping data.

    Should I make the registry edits with those parameters?
  • Anonymous
    October 28, 2015
    Hi Kevin,

    Thank you for sharing.

    Are these registry keys also applicable to Gateway Servers, or only to Management Servers?

    Marlon
  • Anonymous
    October 28, 2015
    Hi Kevin, are those keys also valid for a large environment with OpsMgr 2007 R2?
    Thank you
  • Anonymous
    October 28, 2015
    @Marlon - SOME of these are potential candidates for gateways, but I generally don't recommend any changes on GWs unless you are specifically experiencing a problem. On management servers - I set these for all my customers, regardless of size.
  • Anonymous
    October 28, 2015
    @Jean - no - not all these keys are valid for SCOM 2012R2 - as some don't apply and some have moved.

    For 2007R2 I will defer to Matt: http://blogs.technet.com/b/mgoedtel/archive/2010/08/24/performance-optimizations-for-operations-manager-2007-r2.aspx and I also strongly recommend http://blogs.technet.com/b/kevinholman/archive/2011/02/07/a-new-feature-in-r2-cu4-reconnecting-to-sql-server-after-a-sql-outage.aspx
  • Anonymous
    October 29, 2015
    Thank you!
  • Anonymous
    November 25, 2015
    Thanks Kevin for the confirmation! We are still observing the changes in our SCOM environment.
  • Anonymous
    February 03, 2016
    This is a common practice for rotating old physical servers coming off lease, or when moving VM based
  • Anonymous
    April 01, 2016
    Hi Kevin, do these settings require a server restart? Thanks, Ashish
  • Anonymous
    April 14, 2016
    I do miss the parameters Maximum Global Pending Data Count, Persistence Version Store Maximum, and Persistence Cache Maximum in this blog.

    With regard to the latter, this PDF ( http://download.microsoft.com/download/8/2/8/828C05A2-E6A0-436A-9AE1-704A8005E355/9780735695825.pdf ) says:

    "Another important setting is Persistence Cache Maximum of type DWORD. This setting controls the amount of memory in pages used by Persistence Manager for its data store on the local database. The default value for this is 262144 (decimal), which is also the recommended value. If you are running an older version of Operations Manager on management servers that manage a large number of objects, you should change the value to 262144 (decimal)."

    The default and recommended values are the same here. I think that is a mistake, but for now I've set it to 262144. Suggestions are welcome!
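Since the quoted setting is expressed in pages, a quick conversion shows what 262144 pages could mean in bytes. The page size is not stated anywhere in this thread, so the two sizes below (4 KiB and 8 KiB) are assumptions for illustration only; `cache_size_bytes` is a hypothetical helper.

```python
def cache_size_bytes(pages: int, page_size_bytes: int) -> int:
    """Convert a page count to bytes for a given page size."""
    return pages * page_size_bytes


PAGES = 262144  # value quoted from the PDF in the comment above

# ASSUMPTION: the actual page size is not given in the source; try two common sizes.
for page_size in (4096, 8192):
    gib = cache_size_bytes(PAGES, page_size) / 2**30
    print(f"{PAGES} pages x {page_size} B = {gib:g} GiB")
```

Under those assumed page sizes the setting would correspond to 1 GiB or 2 GiB respectively.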
  • Anonymous
    November 01, 2016
    I love this article: a deep explanation, followed by an exec summary, followed by an "ok, I realise you're lazy so..." with the actual commands. Genius.
  • Anonymous
    November 01, 2016
    Kevin, when I restart the health service on one management server, the All Management Server Pool Unavailable alert is raised and my NOC dashboard, built with widgets, changes its state to gray. I then have to wait 15 minutes for it to go back to normal. I think this is related to the pool process version. Is there any configuration to increase the pool unavailable value?
    • Anonymous
      November 01, 2016
      How many management servers do you have?
      • Anonymous
        March 01, 2017
        Kevin, I have 3 management servers in the pool.
  • Anonymous
    January 04, 2017
    Hi Kevin, do the same recommendations apply to SCOM 2016?
  • Anonymous
    July 05, 2018
    Hi Kevin,

    I have an environment where I am regularly seeing Resource Pools fail. In the SCOM console we see a heartbeat failure for the pool, and in the event logs I see that members are not acknowledging the lease request and are unloading all workflows.

    As the environment isn't in live use, I've tested applying the Pool Manager fixes mentioned in the blog, and this appears to resolve or heavily reduce the issue. I was wondering whether you could provide more information about what the settings grant the pool more time to complete, as the only time-related recommendation I remember seeing in the official documentation is ~10ms latency between servers. In the case of my Resource Pools failing, is it almost certainly going to come down to workflows taking too long to complete despite apparent low load on the servers, or could network issues also cause the pool to fail?

    As a side note, when a pool is in a failed state, would you expect SNMP traps (sent to all servers in a pool) to still get processed and alert within SCOM? The pools in question are only used for SNMP monitoring, so if traps are still processed and only SNMP polling is impacted, I'd be less concerned!

    Many thanks.
    James
    • Anonymous
      July 05, 2018
      I never recommend editing the registry for pool timeouts - because they often just mask the real problem and are almost never a good solution. We have also seen issues where changing them creates pool instability.

      The most common issue with pool stability is network latency between management servers or from MS to database. The second most common issue is overloaded management groups, where we see blocking in the SQL database, or just too many objects hosted by the pools on the management servers.
  • Anonymous
    July 12, 2018
    Hi Kevin - We updated the registry on all our management servers and encountered no issues for 2 years. Recently, we tried to create a new SCOM group via the console, and it takes a long time to load and sometimes does not load at all. Can you give us some insight into which settings we need to check or adjust? According to the PS command, we have a total of 1442. Thanks!