

Transport Layer Security (TLS) Handshake Failing, SChannel Error 36888


Hey guys, Keith Abluton here again. I wanted to share an interesting trend I have seen lately in some of the Schannel cases I have worked on: a large uptick in TLS handshake failures. They are usually accompanied by Schannel errors in the System event log, and in most cases (but not all) those are Event ID 36888.

Error Event ID 36888: "The following fatal alert was generated: 20. The internal error state is 960"

According to our documentation, alert code 20 indicates a “Bad record MAC” (bad_record_mac in the TLS specification) and internal error state 960 is an “Unwrap_DecryptFailure”.
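If you are triaging a lot of these events, it can help to decode them in bulk. Below is a minimal Python sketch, assuming you have already exported the event message text; the function decode_36888 and its lookup tables are hypothetical, written only for this illustration. The alert names are a subset of the TLS 1.2 AlertDescription values (RFC 5246), and the only internal error state it knows is the 960 value discussed above, so anything else comes back as "unknown".

```python
import re

# Subset of TLS AlertDescription values from the TLS 1.2 specification (RFC 5246).
TLS_ALERTS = {
    0: "close_notify",
    10: "unexpected_message",
    20: "bad_record_mac",
    40: "handshake_failure",
    42: "bad_certificate",
    48: "unknown_ca",
    70: "protocol_version",
    80: "internal_error",
}

# Hypothetical table: only the internal error state discussed in this article.
SCHANNEL_INTERNAL_STATES = {
    960: "Unwrap_DecryptFailure",
}

# Matches the wording of the Event ID 36888 message text.
_PATTERN = re.compile(
    r"fatal alert was generated:\s*(\d+)\.\s*The internal error state is\s*(\d+)"
)


def decode_36888(message: str) -> dict:
    """Extract and label the alert code and internal error state from a 36888 message."""
    match = _PATTERN.search(message)
    if match is None:
        raise ValueError("not a recognizable Event ID 36888 message")
    alert, state = int(match.group(1)), int(match.group(2))
    return {
        "alert": alert,
        "alert_name": TLS_ALERTS.get(alert, "unknown"),
        "internal_state": state,
        "internal_state_name": SCHANNEL_INTERNAL_STATES.get(state, "unknown"),
    }


if __name__ == "__main__":
    msg = "The following fatal alert was generated: 20. The internal error state is 960"
    print(decode_36888(msg))
```

Grouping events by alert_name and internal_state_name quickly shows whether you are dealing with one failure mode or several.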

Scenario

The most common scenario is a customer with a third-party device or application connecting to a Microsoft server (Windows Server 2008 R2, 2012, and 2012 R2 are the most common), where the TLS handshake fails either consistently or sporadically.

Cause

The first thing you should do is gather a network trace that lines up with the errors you are seeing, using whichever network capture tool you are most comfortable with. Drilling down in the trace to the Server Hello will show the Cipher Suite that was negotiated. In the example below, and in most of the cases I have seen, the problematic Cipher Suite is either TLS_DHE_RSA_WITH_AES_128_CBC_SHA or TLS_DHE_RSA_WITH_AES_256_CBC_SHA. There are various interoperability and deeply technical reasons for this that I will not get into in this article. Keep in mind that I have also seen cases where a Cipher Suite other than these two was the culprit, so you may still want to follow my remediation recommendations even then.

[Image: network trace drilled down to the Server Hello, showing the negotiated Cipher Suite]
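If you would rather check the negotiated suite programmatically than click through a capture tool, the sketch below reads the cipher suite field out of a raw Server Hello. It assumes you have exported the bytes of the single TLS record carrying the Server Hello (for example, copied out of your capture) and that the message is not split across records; the function server_hello_cipher is hypothetical. The code points 0x0033 and 0x0039 are the IANA values for the two DHE suites named above.

```python
import struct

# IANA code points for the cipher suites discussed in this article.
CIPHER_NAMES = {
    0x002F: "TLS_RSA_WITH_AES_128_CBC_SHA",
    0x0035: "TLS_RSA_WITH_AES_256_CBC_SHA",
    0x0033: "TLS_DHE_RSA_WITH_AES_128_CBC_SHA",
    0x0039: "TLS_DHE_RSA_WITH_AES_256_CBC_SHA",
}
# The two suites most often seen in the failing handshakes described above.
SUSPECT = {0x0033, 0x0039}


def server_hello_cipher(record: bytes) -> int:
    """Return the cipher suite code point from one TLS record holding a ServerHello."""
    if len(record) < 5 or record[0] != 0x16:  # content type 22 = handshake
        raise ValueError("not a TLS handshake record")
    body = record[5:]  # skip record header: type(1) + version(2) + length(2)
    if len(body) < 4 or body[0] != 0x02:  # handshake type 2 = ServerHello
        raise ValueError("not a ServerHello")
    pos = 4 + 2 + 32  # handshake header(4) + server_version(2) + random(32)
    if len(body) <= pos:
        raise ValueError("truncated ServerHello")
    pos += 1 + body[pos]  # session_id length byte + session_id bytes
    if len(body) < pos + 2:
        raise ValueError("truncated ServerHello")
    (suite,) = struct.unpack_from("!H", body, pos)  # 2-byte big-endian cipher_suite
    return suite


if __name__ == "__main__":
    # Synthetic ServerHello negotiating TLS_DHE_RSA_WITH_AES_128_CBC_SHA (0x0033).
    hello = b"\x03\x03" + bytes(32) + b"\x00" + b"\x00\x33" + b"\x00"
    handshake = b"\x02" + len(hello).to_bytes(3, "big") + hello
    record = b"\x16\x03\x03" + len(handshake).to_bytes(2, "big") + handshake
    suite = server_hello_cipher(record)
    print(CIPHER_NAMES.get(suite, hex(suite)), "suspect" if suite in SUSPECT else "")
```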


Resolution

The Cipher Suite order is determined by one of two registry keys:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Cryptography\Configuration\Local\SSL\00010002

or

HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Cryptography\Configuration\SSL\00010002

The second key is populated by either a Local Policy or a Group Policy, and it takes precedence over anything defined in the first. If you do not have a policy in place, it will be empty.
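The precedence rule is simple enough to express as a function. The sketch below takes the two Functions values as plain Python values rather than reading the registry, so it runs anywhere; the names as_list and effective_order are hypothetical. Accepting either a list of names or one comma-separated string is an assumption of this sketch, because a policy-written value may be stored as a single comma-separated string rather than a multi-string list.

```python
from typing import List, Optional, Sequence, Union

# A Functions value as it might be read: a list of names or one comma-separated string.
FunctionsValue = Optional[Union[str, Sequence[str]]]


def as_list(value: FunctionsValue) -> List[str]:
    """Normalize a Functions value into a clean list of cipher suite names."""
    if value is None:
        return []
    items = value.split(",") if isinstance(value, str) else value
    return [name.strip() for name in items if name.strip()]


def effective_order(policy_value: FunctionsValue, local_value: FunctionsValue) -> List[str]:
    """The policy key wins whenever it is populated; otherwise the local key applies."""
    policy = as_list(policy_value)
    return policy if policy else as_list(local_value)


if __name__ == "__main__":
    local = ["TLS_RSA_WITH_AES_128_CBC_SHA", "TLS_DHE_RSA_WITH_AES_128_CBC_SHA"]
    print(effective_order(None, local))  # no policy: the local list applies
    print(effective_order("TLS_RSA_WITH_AES_256_CBC_SHA", local))  # policy overrides
```

This is also why editing only the local key appears to do nothing on a server where a policy is already configured.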

Opening the Functions value under the 00010002 key displays a Multi-String (REG_MULTI_SZ) list like the one below:

[Image: the Functions value showing the Cipher Suite list in priority order]


Disclaimer: Please back up this key before making any changes to it so that you can revert if necessary.

If the Cipher Suite in your network trace is one of the two I outlined above, I recommend simply cutting both of them from the list and saving the change so that they are no longer on it. A reboot MUST be done after this change for it to take effect. This is a relatively quick and easy way to determine whether those Cipher Suites are the issue.
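The edit itself amounts to "remove two entries, keep everything else in its existing order." A minimal Python sketch of that list operation, working on the suite names as plain strings rather than touching the registry (the function remove_suites is hypothetical), returns a new list so the original order is still available as your backup:

```python
from typing import List, Sequence

# The two suites this article most often finds negotiated in failing handshakes.
SUSPECT_SUITES = (
    "TLS_DHE_RSA_WITH_AES_128_CBC_SHA",
    "TLS_DHE_RSA_WITH_AES_256_CBC_SHA",
)


def remove_suites(order: Sequence[str], unwanted: Sequence[str] = SUSPECT_SUITES) -> List[str]:
    """Return a copy of the order with the unwanted suites cut out; others keep their position."""
    drop = set(unwanted)
    return [suite for suite in order if suite not in drop]


if __name__ == "__main__":
    original = [
        "TLS_DHE_RSA_WITH_AES_128_CBC_SHA",
        "TLS_RSA_WITH_AES_128_CBC_SHA",
        "TLS_DHE_RSA_WITH_AES_256_CBC_SHA",
        "TLS_RSA_WITH_AES_256_CBC_SHA",
    ]
    print(remove_suites(original))  # the original list is left untouched
```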

Did that resolve your issue? If so, great. As a supported long-term solution, you really want to set the Cipher Suite order via either a Local Policy or a Group Policy so that future OS hotfixes do not add the removed Cipher Suites back and return the server to a broken state. Please refer to the following article for instructions on setting this in Group Policy: https://msdn.microsoft.com/en-us/library/windows/desktop/bb870930(v=vs.85).aspx#adding__removing__and_prioritizing_cipher_suites
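When you move to the policy, the SSL Cipher Suite Order setting (as we understand it, under Computer Configuration > Administrative Templates > Network > SSL Configuration Settings) takes the list as one comma-separated line with no spaces. The sketch below builds that line from the order you validated in the registry; the function policy_string is hypothetical, and the 1,023-character limit is an assumption based on Microsoft's cipher suite prioritization guidance, so check it for your OS version.

```python
from typing import Sequence

# Assumed maximum length of the SSL Cipher Suite Order field; verify for your OS version.
MAX_POLICY_CHARS = 1023


def policy_string(order: Sequence[str], limit: int = MAX_POLICY_CHARS) -> str:
    """Join suites into one comma-separated line with no spaces, rejecting over-long lists."""
    text = ",".join(suite.strip() for suite in order if suite.strip())
    if len(text) > limit:
        raise ValueError(f"cipher suite list is {len(text)} characters; limit is {limit}")
    return text


if __name__ == "__main__":
    validated = ["TLS_RSA_WITH_AES_128_CBC_SHA", "TLS_RSA_WITH_AES_256_CBC_SHA"]
    print(policy_string(validated))
```

Failing loudly on an over-long list is deliberate: silently truncating it could drop a suite that a client depends on.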

Additionally, it is recommended that you fully patch the OS when possible.

If this did not resolve your issue, your next step is to move the Cipher Suites highlighted below (the ones that start with TLS_RSA_WITH_AES) to the top of the order. Cut them from wherever they are in the list, paste them at the top, save the key, then reboot the server. (Again, a reboot is a MUST!)

[Image: the Cipher Suite list with the TLS_RSA_WITH_AES suites highlighted]
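The "cut and paste to the top" step is a stable partition of the list. A minimal Python sketch, again working on the names as plain strings and using a hypothetical function promote_prefix, moves every suite that starts with TLS_RSA_WITH_AES to the top while keeping both groups in their existing relative order:

```python
from typing import List, Sequence


def promote_prefix(order: Sequence[str], prefix: str = "TLS_RSA_WITH_AES") -> List[str]:
    """Move suites starting with prefix to the top; each group keeps its relative order."""
    promoted = [suite for suite in order if suite.startswith(prefix)]
    remaining = [suite for suite in order if not suite.startswith(prefix)]
    return promoted + remaining


if __name__ == "__main__":
    order = [
        "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256",
        "TLS_DHE_RSA_WITH_AES_128_CBC_SHA",
        "TLS_RSA_WITH_AES_128_CBC_SHA256",
        "TLS_RSA_WITH_AES_128_CBC_SHA",
    ]
    for suite in promote_prefix(order):
        print(suite)
```

Because the DHE suites start with TLS_DHE_RSA rather than TLS_RSA_WITH_AES, a plain prefix match leaves them where they were.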

Did that resolve the issue? If so, great. Please follow the advice above about setting the order via Group Policy as a supported long-term solution.

If this did not resolve your issue, it may be time to open a support call with us so we can help you figure it out. Microsoft Support will ask you to gather data. I wrote the following blog post to help with that; it is especially useful when the issue is sporadic and not easy to reproduce:

https://blogs.technet.microsoft.com/keithab/2016/10/30/how-to-configure-data-captures-for-intermittentsporadic-schannel-events/

Either way, it will gather the data we initially need to analyze and diagnose the issue, and you will be one step ahead of the game.

Cheers and happy hunting!


Keith A. Abluton, CISSP, MCSE +Security

Sr. Support Escalation Engineer

P.S. If this article helped you in any way, I would greatly appreciate your feedback in the comments below. It helps us gauge whether the content we are putting out is valuable and saves our customers time and money. Thanks!

Comments

  • Anonymous
    November 14, 2016
    Very helpful. Thank you!
  • Anonymous
    November 14, 2016
    Thanks Keith, Nice blog about the SChannel Errors..
  • Anonymous
    November 16, 2016
    Hello Keith, really nice post! How is it possible to determine what the “internal error state” code means? Thanks!
    • Anonymous
      November 17, 2016
      AFAIK the full list is "internal Microsoft only." I will check with the Developers to see if it can be published publicly.
  • Anonymous
    April 20, 2017
    Great article! Thanks!