Vicky Winner on Thu, 25 Feb 2010 07:00:00
Users complain that when their jobs connect to SQL Server, they get the error messages below. They connect using SSMS or client applications. If they keep retrying, the connection eventually succeeds without errors.
[SQL Native Client]TCP Provider: Timeout error.
[SQL Native Client]Login timeout expired
[SQL Native Client]Unable to complete login process due to delay in login
When I checked the error log, I found the following messages.
2010-02-24 21:32:00.670 Logon Login failed for user ''. The user is not associated with a trusted SQL Server connection. [CLIENT: 126.96.36.199]
2010-02-24 21:32:00.820 Logon Error: 17806, Severity: 20, State: 2.
2010-02-24 21:32:00.820 Logon SSPI handshake failed with error code 0x80090311 while establishing a connection with integrated security; the connection has been closed. [CLIENT: 188.8.131.52]
Could someone look into this issue and offer help or suggestions?
Thanks in advance.
Stoyko Kostov - MSFT on Fri, 26 Feb 2010 17:57:34
That error code, 0x80090311, is SEC_E_NO_AUTHENTICATING_AUTHORITY, from a Windows component called SSPI. Per their documentation here (http://msdn.microsoft.com/en-us/library/aa374705(VS.85).aspx), it means:
"The function failed. No authority could be contacted for authentication. This could be due to the following conditions:
- The domain name of the authenticating party is incorrect.
- The domain is unavailable.
- The trust relationship has failed."
If I'm understanding those explanations correctly, it probably means that either the SQL Server instance couldn't contact the domain controller (maybe there is a bad network connection between them?) or there is some problem with the domain trusts between the client machine and the server machine.
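To see where 0x80090311 comes from, the value can be split into its HRESULT fields. The sketch below is a minimal Python illustration, assuming the bit layout used by the HRESULT_SEVERITY, HRESULT_FACILITY and HRESULT_CODE macros in winerror.h. Facility 9 is the security (SSPI) facility, which is why this code points at the authentication layer rather than at SQL Server itself. The lookup tables are illustrative and name only the one code from this thread.

```python
# Minimal sketch: split a Windows HRESULT-style status code into fields.
# Bit layout mirrors the winerror.h macros:
#   HRESULT_SEVERITY(hr) = (hr >> 31) & 0x1
#   HRESULT_FACILITY(hr) = (hr >> 16) & 0x1fff
#   HRESULT_CODE(hr)     = hr & 0xFFFF

# Illustrative, deliberately incomplete lookup tables.
FACILITY_NAMES = {
    0x7: "FACILITY_WIN32",
    0x9: "FACILITY_SECURITY (SSPI)",
}
KNOWN_CODES = {
    0x80090311: "SEC_E_NO_AUTHENTICATING_AUTHORITY",
}


def decode_hresult(value: int) -> dict:
    """Return the severity, facility and code fields of an HRESULT."""
    hr = value & 0xFFFFFFFF  # normalize signed values to unsigned 32-bit
    facility = (hr >> 16) & 0x1FFF
    return {
        "failed": bool((hr >> 31) & 0x1),
        "facility": facility,
        "facility_name": FACILITY_NAMES.get(facility, "unknown"),
        "code": hr & 0xFFFF,
        "name": KNOWN_CODES.get(hr, "unknown"),
    }


if __name__ == "__main__":
    print(decode_hresult(0x80090311))
```

For 0x80090311 this reports a failure in facility 9 with code 0x311. Passing the signed form that some tools log, -2146893039, gives the same result.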
However, I'm a little surprised to see the client report a "login timeout" error when the server is throwing this error. I believe the server should close the connection immediately on an authentication failure. Are you sure that these client errors correspond one-to-one to the server error you posted? If not, you might be able to resolve that specific client error just by increasing the login timeout (this can be controlled through the connection string).
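If the client-side "Login timeout expired" errors turn out to be independent of the SSPI failures, a longer login timeout is the setting being referred to. As a hedged sketch, the Python helper below builds a connection string using the .NET SqlClient keyword names (Data Source, Initial Catalog, Integrated Security, Connect Timeout, whose default is 15 seconds). The server and database names are hypothetical, and other drivers may use different keywords or, as with ODBC's SQL_ATTR_LOGIN_TIMEOUT, a connection attribute instead of a connection-string keyword.

```python
# Hedged sketch: build a SqlClient-style connection string with an explicit
# login timeout. Keyword names follow .NET SqlClient; verify them against the
# driver your jobs actually use.


def build_connection_string(server: str, database: str,
                            login_timeout_s: int = 30) -> str:
    # SqlClient treats 0 as "wait forever", so require a positive value here.
    if login_timeout_s <= 0:
        raise ValueError("login_timeout_s must be a positive number of seconds")
    parts = {
        "Data Source": server,
        "Initial Catalog": database,
        "Integrated Security": "SSPI",  # Windows authentication, as in this thread
        "Connect Timeout": str(login_timeout_s),
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


if __name__ == "__main__":
    # Hypothetical server and database names.
    print(build_connection_string("SQLPROD01", "AppDB", login_timeout_s=45))
```

With the hypothetical names above, this prints Data Source=SQLPROD01;Initial Catalog=AppDB;Integrated Security=SSPI;Connect Timeout=45;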
Vicky Winner on Sat, 27 Feb 2010 05:58:54
Hi Stoyko. Thanks for taking this up.
I have gone through your note. I see that you think the messages in the SQL Server error log and the client timeout errors don't map to each other. Do you mean they are totally unrelated? I see these error messages on the server just before the application job fails.
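One way to check whether the server-side SSPI messages really line up with the client timeouts is to compare timestamps. The minimal Python sketch below pulls the SSPI handshake failures out of error-log lines in the format shown earlier in this thread and flags each client failure time that has a server error within a chosen window. The client failure times are assumed to come from the jobs' own logs, the 30-second window is an arbitrary assumption, and clock skew between the machines would need to be allowed for.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Matches lines such as:
# 2010-02-24 21:32:00.820 Logon SSPI handshake failed with error code ...
LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+\S+\s+(.*)$")


def sspi_failure_times(log_lines: Iterable[str]) -> List[datetime]:
    """Timestamps of 'SSPI handshake failed' entries in error-log text."""
    times = []
    for line in log_lines:
        m = LOG_LINE.match(line.strip())
        if m and "SSPI handshake failed" in m.group(2):
            times.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    return times


def correlate(client_failures: Iterable[datetime],
              server_failures: List[datetime],
              window_s: float = 30.0) -> List[Tuple[datetime, bool]]:
    """Pair each client failure with True if a server SSPI error is within window_s."""
    window = timedelta(seconds=window_s)
    return [(c, any(abs(c - s) <= window for s in server_failures))
            for c in client_failures]
```

If most client timeouts come back False, the two symptoms are probably separate problems, as suggested above.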
As I mentioned in my original question, when we retry the connection several times from SSMS, it eventually connects. (But it's hard to monitor application jobs and reconnect them every time they fail to connect.)
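Until the root cause is found, a job could retry the connection itself instead of relying on someone to reconnect it by hand. This is only a workaround that hides the underlying failures, so it is worth logging each retry. The Python sketch below assumes a hypothetical connect callable that wraps whatever driver call the job uses and raises TimeoutError on a login timeout; a real driver raises its own exception type, which would go in retry_on. The sleep parameter is injectable so the backoff can be tested without waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_retry(connect: Callable[[], T],
                       attempts: int = 3,
                       base_delay_s: float = 1.0,
                       retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call connect(), retrying with exponential backoff on the given errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the job
            # Wait 1x, 2x, 4x ... base_delay_s before the next attempt.
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")
```

With the defaults, a job whose first two attempts time out waits 1 second, then 2 seconds, before its third and final attempt.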
Recently my Wintel team looked into these issues and suspected the "DNS Resolver Cache". They cleared it, and when the application job was re-run, it completed successfully. Since then, whenever the 'timeout error' occurs, the app team asks Wintel to flush the "DNS Resolver Cache". But I still see that DNS... takes very little memory.
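Since flushing the DNS Resolver Cache (ipconfig /flushdns on Windows; ipconfig /displaydns shows its contents) appears to help, it may be worth recording what a name actually resolves to each time a job starts, so a stale answer can be spotted rather than inferred. The minimal Python sketch below resolves a name to its IPv4 addresses through the operating system's resolver and reports any address that is not in an expected set. The host names and expected addresses you would pass in are assumptions about your environment.

```python
import socket
from typing import Iterable, List


def resolve_ipv4(host: str) -> List[str]:
    """IPv4 addresses the operating system's resolver returns for host."""
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def unexpected_addresses(resolved: Iterable[str],
                         expected: Iterable[str]) -> List[str]:
    """Addresses that were resolved but are not in the expected set."""
    expected_set = set(expected)
    return sorted(a for a in set(resolved) if a not in expected_set)
```

For example, unexpected_addresses(resolve_ipv4("dc01.corp.example.com"), {"192.0.2.10"}) with a hypothetical domain controller name and address; a non-empty result logged just before a failed job would support the stale-DNS theory.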
I hope this explains the issue in a bit more detail. Please let me know if I am looking in the wrong place.
I will look further into the error code and the link you provided.
Thanks in advance.
Vicky Winner on Tue, 02 Mar 2010 04:29:42
Stoyko, I have replied to your post. Please take a look.
Dan Benediktson on Fri, 05 Mar 2010 18:01:58
If flushing the DNS cache is resolving the problem, then that fits with the explanation above: it looks like you actually have two problems:
1) Client machine occasionally has trouble making network connections to the SQL Server machine.
2) SQL Server machine occasionally has trouble making network connections to the Domain Controller.
Both of those could be explained by a single root cause, stale DNS entries, which flushing the cache would resolve. I would still work with your networking team to find out why the DNS entries are becoming stale so often, since it is causing instability and forcing you to flush the DNS cache frequently.
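To tell the two problems apart, each network hop can be tested separately: from the client to the SQL Server port, and from the SQL Server machine to the domain controller. The Python sketch below only checks that a TCP connection can be opened within a timeout. It assumes SQL Server listens on the default port 1433 and that the domain controller's Kerberos (88) and LDAP (389) ports are the ones worth probing; named instances, custom ports and UDP traffic are not covered.

```python
import socket
from typing import Iterable, List, Tuple


def tcp_reachable(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """True if a TCP connection to host:port can be opened within timeout_s."""
    try:
        # create_connection resolves the name, then connects; name-resolution
        # failures, timeouts and refusals all surface as OSError subclasses.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def run_checks(checks: Iterable[Tuple[str, int]],
               timeout_s: float = 3.0) -> List[Tuple[str, int, bool]]:
    """Run tcp_reachable for each (host, port) pair and collect the results."""
    return [(host, port, tcp_reachable(host, port, timeout_s))
            for host, port in checks]
```

On the client you might run run_checks([("sqlprod01.corp.example.com", 1433)]), and on the SQL Server machine run_checks([("dc01.corp.example.com", 88), ("dc01.corp.example.com", 389)]); these host names are hypothetical. Running the checks right after a failure shows which hop was unhealthy.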
Vicky Winner on Wed, 17 Mar 2010 03:53:28
Thanks for your reply. The resolution mentioned above was only a workaround. The issue has now been fixed permanently by modifying the hosts file on the problem server to add static entries for its domain controllers. I am not sure what the actual root cause was.
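The thread doesn't show the exact entries, but a static hosts-file entry on Windows is a line in %SystemRoot%\System32\drivers\etc\hosts with an IP address followed by one or more host names, and editing that file requires administrator rights. The Python sketch below renders such lines and parses an existing hosts file to see which names are already pinned. The domain controller names and addresses are hypothetical (192.0.2.0/24 is a documentation range). Pinned entries bypass DNS for those names, so they have to be updated by hand if a domain controller's address ever changes.

```python
import ipaddress
from typing import Dict


def render_hosts_entries(entries: Dict[str, str]) -> str:
    """Render 'IP<TAB>hostname' lines for a hostname -> IP mapping."""
    lines = ["# Static domain controller entries (workaround for stale DNS)"]
    for hostname, ip in sorted(entries.items()):
        ipaddress.ip_address(ip)  # raises ValueError for a malformed address
        lines.append(f"{ip}\t{hostname}")
    return "\n".join(lines) + "\n"


def pinned_hosts(hosts_text: str) -> Dict[str, str]:
    """Map each host name in hosts-file text to its IP, ignoring comments."""
    pinned = {}
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            pinned[name.lower()] = ip  # host names are case-insensitive
    return pinned
```

Checking pinned_hosts against the current hosts file before appending avoids adding duplicate or conflicting lines for the same domain controller.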
Armsteee on Thu, 01 May 2014 15:00:10
How did you go about it?