It's possible with SRM to get one of the following errors when testing failover (and actual failover - that's why testing is important):
Now you have the above errors and you're thinking, oh s$%£, that's not good, I can't see the storage on all the hosts.
So it's either incorrect zoning or, more likely if you've already double-checked that, insufficient time for the hosts to rescan and discover all the datastores before the recovery continues.
SRM only performs a rescan once. However, it is quite possible that more than one rescan is required before all the datastores are discovered; this seems to vary depending on the array.
The fix for this is to modify the SRM advanced setting "storageProvider.hostRescanRepeatCnt" to increase the number of host rescans during testing and recovery to 2-3.
Additionally, it's possible to increase the host rescan timeout value, "storageProvider.hostRescanTimeoutSec", alongside this to ensure all datastores are discovered successfully.
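To picture why a repeat count helps, here's a minimal Python sketch of a "rescan up to N times" loop. It is not SRM's actual implementation: `rescan_until_found`, `slow_array` and the datastore names are all hypothetical, and the early exit once everything is visible is my assumption rather than documented SRM behaviour.

```python
from typing import Callable, Iterable, List, Set, Tuple


def rescan_until_found(
    rescan: Callable[[], Iterable[str]],
    expected: Set[str],
    repeat_count: int,
) -> Tuple[bool, int]:
    """Rescan up to repeat_count times (think hostRescanRepeatCnt).

    Returns (all_found, rescans_performed). Stops early once every
    expected datastore has been seen (an assumption of this sketch).
    """
    seen: Set[str] = set()
    for attempt in range(1, repeat_count + 1):
        seen.update(rescan())
        if expected <= seen:
            return True, attempt
    return False, repeat_count


def slow_array(responses: List[List[str]]) -> Callable[[], List[str]]:
    """Hypothetical stand-in for a host rescan: one canned result per call,
    repeating the last result once the list is exhausted."""
    results = iter(responses)
    last = responses[-1]
    return lambda: next(results, last)


# An array that only presents the second datastore on the second rescan.
responses = [["ds_replica_01"], ["ds_replica_01", "ds_replica_02"]]
wanted = {"ds_replica_01", "ds_replica_02"}

print(rescan_until_found(slow_array(responses), wanted, repeat_count=1))
print(rescan_until_found(slow_array(responses), wanted, repeat_count=3))
```

With a single rescan the second datastore is never seen, which is the situation that produces the errors; with a repeat count of 3 the loop finds both on the second pass.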
One thing to note: if you increase these values, your recovery time will also increase, so check that it still fits within your RTO.
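As a rough way to gauge the impact, the worst case can be estimated by multiplying the two settings together. The Python sketch below assumes the rescans run back to back and that each one can take up to the full timeout; the function name and the example values are illustrative only, not SRM defaults or measurements.

```python
def worst_case_rescan_seconds(repeat_count: int, timeout_sec: int) -> int:
    """Upper bound on time spent in a rescan step, assuming each of the
    repeat_count rescans may run for the full timeout_sec."""
    if repeat_count < 0 or timeout_sec < 0:
        raise ValueError("repeat_count and timeout_sec must be non-negative")
    return repeat_count * timeout_sec


# Illustrative values only: compare one rescan with three at the same timeout.
single = worst_case_rescan_seconds(repeat_count=1, timeout_sec=300)
triple = worst_case_rescan_seconds(repeat_count=3, timeout_sec=300)
print(f"extra worst-case time per rescan step: {triple - single} seconds")
```

In practice the rescans will usually finish well inside the timeout, so treat this as a ceiling to plan around rather than an expected figure, and time a test failover to get the real number.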
All advice, installation/configuration how-to guides, troubleshooting and other information on this website are provided as-is with no warranty or guarantee. Whilst the information provided is correct to the best of my knowledge, I am not responsible for any issues that may arise from using this information, and you do so at your own risk. As always, before performing anything: check, double check, test and always ensure you have a backup.