Spend any time with SharePoint 2013 (and 2016, I dare say) and you'll eventually hit a case where the Distributed Cache Service (AppFabricCache) won't start. This week saw me troubleshooting Distributed Cache in a new customer environment, and this time around the solution was a little unusual.
Troubleshooting the Distributed Cache
Stepping through Sam Betts' great SharePoint 2013 + Distributed Cache (AppFabric) Troubleshooting article, I could see that the SharePoint Service Instance was 'Online' while Get-CacheHost reported a status of 'DOWN'. The host names reported by the SharePoint and AppFabric commandlets matched, but the AppFabric Caching Windows Service would start and then stop after a few seconds. Windows Event Viewer was littered with messages such as:
- Faulting application name: DistributedCacheService.exe, version: 1.0.4632.0
- Faulting module name: KERNELBASE.dll, version: 6.3.9600.17415
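The checks above can be sketched roughly as follows. This is a minimal sketch that assumes it is run from an elevated SharePoint 2013 Management Shell on the affected cache host; the 'Distributed Cache' type name filter and the event log filter are my assumptions and may need adjusting for your farm.

```powershell
# SharePoint's view: the Distributed Cache service instance and its status.
Get-SPServiceInstance |
    Where-Object { $_.TypeName -eq 'Distributed Cache' } |
    Select-Object Server, Status

# AppFabric's view: connect to the cache cluster, then list its hosts.
Use-CacheCluster
Get-CacheHost

# Recent Application log errors (Level 2) that mention the cache service executable.
Get-WinEvent -FilterHashtable @{ LogName = 'Application'; Level = 2 } -MaxEvents 200 |
    Where-Object { $_.Message -like '*DistributedCacheService.exe*' } |
    Select-Object TimeCreated, Id, Message
```

If the two views disagree on the host name or status, that mismatch is the first thing to chase.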
After digging a bit deeper, I found two items that were the root cause of the service not starting:
- HostID not consistent between DistributedCacheService.exe.config and exported AppFabric cluster configuration settings
- HOSTS file had entries for the hostname and FQDN of the server
HostID not consistent
As part of my troubleshooting process I opened the DistributedCacheService.exe.config file and gave it a once-over to ensure that it looked OK. Following this I exported the AppFabric Cache configuration to a file by running Use-CacheCluster followed by Export-CacheClusterConfig -Path C:\temp\AFCacheConfig.xml.
Each of these files contains a HostID parameter, and the two values should match, but in this case they didn't. A little copy & paste to bring the DistributedCacheService.exe.config file in line with the exported value and all is good; or so I thought.
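One way to compare the two HostID values without opening both files by hand is sketched below. The path to DistributedCacheService.exe.config is an assumption (the default AppFabric 1.1 install location), and the search simply looks for the text 'hostId' rather than relying on a particular XML layout, so check the output by eye.

```powershell
# Assumed paths: adjust both to match your environment.
$serviceConfig  = 'C:\Program Files\AppFabric 1.1 for Windows Server\DistributedCacheService.exe.config'
$exportedConfig = 'C:\temp\AFCacheConfig.xml'

# Export the current cluster configuration, as described above.
Use-CacheCluster
Export-CacheClusterConfig -Path $exportedConfig

# List every line that mentions hostId in either file (the match is case-insensitive).
Select-String -Path $serviceConfig, $exportedConfig -Pattern 'hostId' |
    Select-Object Filename, LineNumber, Line
```

In a multi-server cluster the exported file will list one host entry per server, so make sure you compare against the entry for the server you are working on.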
Running Remove-SPDistributedCacheServiceInstance followed by Add-SPDistributedCacheServiceInstance to have SharePoint start the Distributed Cache Service still resulted in the same behavior.
HOSTS file entries
The Windows Service continued to stop immediately after starting, but this time the error message was a little different:
- System.UriFormatException: Invalid URI: The hostname could not be parsed
The correct host names were reported by both the Get-CacheHost and the SharePoint Service Instance commandlets, and the host names were resolving without issue. It turned out the HOSTS file contained entries for both the server name and the server's fully qualified domain name (FQDN), which were unnecessary as the names were resolvable in DNS.
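A quick way to spot such entries is sketched below. It assumes the HOSTS file is in its standard location and that the server's short name matches the COMPUTERNAME environment variable (which holds the NetBIOS name, so very long host names may be truncated).

```powershell
# Standard location of the HOSTS file on Windows.
$hostsFile = Join-Path $env:SystemRoot 'System32\drivers\etc\hosts'
$shortName = $env:COMPUTERNAME

# Show non-comment lines that mention this server's name.
# An FQDN entry is caught too, as long as it contains the short name.
Get-Content $hostsFile |
    Where-Object { $_ -notmatch '^\s*#' -and $_ -match [regex]::Escape($shortName) }
```

Any lines returned are candidates for removal, provided DNS already resolves the same names correctly.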
After removing both entries and flushing the DNS cache, I ran Remove-SPDistributedCacheServiceInstance followed by Add-SPDistributedCacheServiceInstance and finally the Windows AppFabric Caching service started as expected.
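Put together, the final sequence looks roughly like this. It is a sketch that assumes an elevated SharePoint 2013 Management Shell on the cache host, with the HOSTS entries already removed; the closing status checks are my addition rather than part of the original fix.

```powershell
# Flush the local DNS resolver cache so the removed HOSTS entries are no longer served.
ipconfig /flushdns

# Re-provision the Distributed Cache service instance on this server.
Remove-SPDistributedCacheServiceInstance
Add-SPDistributedCacheServiceInstance

# Confirm the result from both the AppFabric and Windows side.
Use-CacheCluster
Get-CacheHost
Get-Service -Name AppFabricCachingService
```

Get-CacheHost should now report the host as up, and the AppFabricCachingService Windows service should stay in the Running state.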