I have finally found the time to write this blog post in detail. The incident took place last year, somewhere around October and November, so here we go!
The other day I received some complaints about not being able to access a CIFS share on the network. Several users confirmed this: they got the Windows authentication pop-up (the clients are running Windows 7).
I related the problem to an ongoing domain controller migration, because that was the only major change in this particular domain over the previous couple of days. In the end I ran into an interesting Kerberos problem at the storage level, and that’s what this blog post is all about.
A little introduction to the existing network:
As you can see, the Active Directory trust topology isn’t optimal. In particular, the External Trust between domain A and C1 is something I need to get rid of ASAP, because we already have a transitive two-way forest trust at the top. This forest trust ensures that authentication also works from domain C1 all the way to domain A, because domains A and B are in a Tree Root configuration and share the same Active Directory schema. We inherited this configuration from the past; it’s not something I would build today.
Anyway, let’s focus on the authentication problem. The new NetApp vFiler is deployed in domain C1, and the users access its resources from domain A. Interestingly, there were no problems accessing older shares on Filers that had been deployed long ago. Another important detail was that users from the C1 domain didn’t have any authentication issues against the share.
I started by checking the event log on the domain controllers, and after some searching I found an event related to an authentication issue. In parallel with my troubleshooting, I contacted the storage engineer to check the domain controller configuration on the vFiler. It turned out that the vFiler was configured to use two new domain controllers running Windows Server 2012 R2 in the C1 domain. With this in mind, I narrowed the troubleshooting down to those new DCs.
I’ve blurred the system names for legal and security reasons.
So what do we know at this point?
- No problems accessing shares on Filers that were already deployed within the network.
- User accounts from the C1 domain didn’t experience any authentication issues against the share hosted on the newly deployed vFiler.
- It was possible to access the share from domain A by using an IP address in the UNC path. This pointed me in a good direction: Kerberos needs a host name (FQDN) to build the service principal name, so with an IP address the client by default falls back to NTLM.
- By default, Kerberos authentication is not supported over an External trust. I checked the setup and it was not enabled.
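The IP-versus-FQDN observation from the list above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not how the Windows client is implemented: the function names and the host name `vfiler01.c1.example.com` are hypothetical, and the `cifs/<host>` form is the usual SPN shape for file shares rather than something taken from the capture.

```python
import ipaddress
from typing import Optional


def unc_host(unc_path: str) -> str:
    """Return the server part of a UNC path such as \\\\server\\share."""
    if not unc_path.startswith("\\\\"):
        raise ValueError(f"not a UNC path: {unc_path!r}")
    host = unc_path[2:].split("\\", 1)[0]
    if not host:
        raise ValueError(f"UNC path has no server part: {unc_path!r}")
    return host


def is_ip_literal(host: str) -> bool:
    """True if the host is an IP address rather than a name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def cifs_spn(unc_path: str) -> Optional[str]:
    """Return the SPN a client would typically ask a ticket for.

    None means there is no host name to build an SPN from, which is the
    case where a default client would not attempt Kerberos at all.
    """
    host = unc_host(unc_path)
    if is_ip_literal(host):
        return None
    return f"cifs/{host}"


if __name__ == "__main__":
    for path in (r"\\10.101.32.12\share", r"\\vfiler01.c1.example.com\share"):
        print(path, "->", cifs_spn(path) or "no SPN, expect NTLM")
```

With the IP address the sketch yields no SPN, which matches what I saw: the share opened because Kerberos was never attempted.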
Start Some Tracing
The next step was to make some network captures using Wireshark. I installed Wireshark on my workstation and started a capture while trying to connect to the share.
This screenshot shows the unsuccessful session setup from the client to the vFiler. I’ve filtered out some extra lines that would show the DNS lookups and the Kerberos negotiations involving the domain controllers from domains A, B, C and C1. All domain controllers in the chain are contacted. The IP address of the client is 10.169.144.40; the vFiler has the address 10.101.32.12. The client tries to set up an SMB session and fails with the KRB5KRB_AP_ERR_MODIFIED error.
It’s also nice to see the authentication mechanisms offered in the Session Setup Request frame.
At least we know that Kerberos is supported at the vFiler.
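When reading a capture like this, it can help to translate the numeric Kerberos error code into its name. RFC 4120 lists the error seen here as code 41 (KRB_AP_ERR_MODIFIED). The lookup below is a small, deliberately incomplete sketch; the table and the function name are my own and cover only a handful of codes from the RFC.

```python
# A few Kerberos error codes from RFC 4120 (not the full list).
KRB_ERROR_NAMES = {
    7: "KDC_ERR_S_PRINCIPAL_UNKNOWN",
    28: "KDC_ERR_PATH_NOT_ACCEPTED",
    31: "KRB_AP_ERR_BAD_INTEGRITY",
    32: "KRB_AP_ERR_TKT_EXPIRED",
    37: "KRB_AP_ERR_SKEW",
    41: "KRB_AP_ERR_MODIFIED",
}


def describe_krb_error(code: int) -> str:
    """Map a numeric Kerberos error code to its RFC 4120 name."""
    name = KRB_ERROR_NAMES.get(code)
    return name if name else f"unknown Kerberos error {code}"


if __name__ == "__main__":
    print(describe_krb_error(41))
```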
Now, why does Kerberos fail with the KRB5KRB_AP_ERR_MODIFIED and STATUS_MORE_PROCESSING_REQUIRED? And why aren’t there any issues when accessing old shares from existing NetApp Filers? I made another network capture from my workstation and captured the traffic to the existing Filers.
This NetApp Filer falls back to NTLM authentication, that’s why there are no problems!
I contacted the storage engineer again and asked him for some debug logging. This is what I got:
[vFiler:auth.trace.authenticateUser.krbReject:info]: AUTH: Login attempt from 10.169.144.37 rejected by Kerberos.
[vFiler:cifs.trace.GSSinfo:info]: AUTH: notice- Could not authenticate user.
[vFiler:cifs.trace.GSSinfo:info]: AUTH: notice- Illegal cross-realm ticket.
[vFiler:auth.trace.authenticateUser.loginAccepted:info]: AUTH: Login by $username from 10.169.144.37 accepted.
[vFiler:cifs.trace.GSSinfo:info]: AUTH: notice- Could not authenticate user.
[vFiler:cifs.trace.GSSinfo:info]: AUTH: notice- Illegal cross-realm ticket.
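If you get a larger pile of this trace output, a small parser makes it easier to pull out the Kerberos-related entries. The sketch below assumes every entry has the `[source:event:severity]: message` shape seen in the excerpt above and that messages never contain a `[` character; the class and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

# One entry looks like "[source:event:severity]: message".
# Assumption: the message runs until the next "[" or the end of the text.
ENTRY_RE = re.compile(r"\[([^:\]]+):([^:\]]+):([^\]]+)\]:\s*([^\[]*)")


@dataclass(frozen=True)
class TraceEntry:
    source: str
    event: str
    severity: str
    message: str


def parse_auth_trace(text: str) -> List[TraceEntry]:
    """Split trace output into entries, whether or not it has line breaks."""
    return [
        TraceEntry(source, event, severity, message.strip())
        for source, event, severity, message in ENTRY_RE.findall(text)
    ]


def kerberos_failures(entries: List[TraceEntry]) -> List[TraceEntry]:
    """Keep Kerberos rejections and cross-realm complaints."""
    return [
        e for e in entries
        if e.event.endswith(".krbReject") or "cross-realm" in e.message.lower()
    ]


SAMPLE = (
    "[vFiler:auth.trace.authenticateUser.krbReject:info]: AUTH: Login attempt "
    "from 10.169.144.37 rejected by Kerberos. "
    "[vFiler:cifs.trace.GSSinfo:info]: AUTH: notice- Could not authenticate user. "
    "[vFiler:cifs.trace.GSSinfo:info]: AUTH: notice- Illegal cross-realm ticket. "
    "[vFiler:auth.trace.authenticateUser.loginAccepted:info]: AUTH: Login by "
    "$username from 10.169.144.37 accepted."
)

if __name__ == "__main__":
    for entry in kerberos_failures(parse_auth_trace(SAMPLE)):
        print(entry.event, "|", entry.message)
```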
Solving the Problem
Okay, the NetApp vFiler is giving me a “Could not authenticate user” and an “Illegal cross-realm ticket”. Does it have anything to do with the multiple Active Directory trusts? Yes it does…
The ONTAP operating system (which runs on NetApp Filers) has some Kerberos-related configuration options. The kerberos.check_transited_list.enable option seems to be the culprit. This check is enabled by default, i.e. kerberos.check_transited_list.enable is set to on.
After the storage engineer set the option to “off”, Kerberos authentication started to flow and shine between the client and the vFiler! Be sure to check out this NetApp KB article, it explains a lot!
TL;DR: the user needs to “hop” over several trusts, and the transited-list check that is enabled by default on the NetApp vFiler rejected the resulting cross-realm ticket, which caused the Kerberos authentication to fail.
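To make the idea of a transited-list check a bit more concrete, here is a toy model in Python. It is not ONTAP’s implementation and I don’t know the exact rule the vFiler applies; it only illustrates the general Kerberos concept that a cross-realm ticket records the realms it passed through and that a server may reject it if it doesn’t accept that path. The realm names, the `Ticket` class and the `accept_ticket` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Collection, Tuple


@dataclass(frozen=True)
class Ticket:
    client_realm: str
    transited: Tuple[str, ...] = ()  # realms crossed on the way to the server


def accept_ticket(
    ticket: Ticket,
    server_realm: str,
    accepted_realms: Collection[str],
    check_transited: bool = True,
) -> bool:
    """Toy version of a server-side transited-path check."""
    if ticket.client_realm == server_realm:
        return True  # same realm, nothing was transited
    if not check_transited:
        return True  # check switched off: accept whatever path was taken
    # Check switched on: every realm on the path must be one we accept.
    return all(realm in accepted_realms for realm in ticket.transited)


if __name__ == "__main__":
    # Hypothetical realm names standing in for domains A, B, C and C1.
    from_a = Ticket("A.EXAMPLE", ("B.EXAMPLE", "C.EXAMPLE"))
    from_c1 = Ticket("C1.EXAMPLE")
    accepted = {"C.EXAMPLE"}

    print(accept_ticket(from_a, "C1.EXAMPLE", accepted))                         # check on
    print(accept_ticket(from_a, "C1.EXAMPLE", accepted, check_transited=False))  # check off
    print(accept_ticket(from_c1, "C1.EXAMPLE", accepted))                        # local user
```

In this model the user from domain A is rejected while the check is on and accepted once it is off, and the user from C1 is never affected, which lines up with the three observations from the troubleshooting.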