I noticed that a lot of Warnings from my LSI controllers are logged in the Event Log. LSI support told me that this can be due to failing disks, but in my case it could be something else, related to my SMART monitoring tool...
Concretely, I have two LSI SAS controllers: an LSI SAS 9211-8i and an LSI SAS 9201-16i. Both have been updated with firmware 19.00.00.00, and I am using driver version 18.104.22.168 for Windows x64: LSI Adapter SAS2 2008 Falcon and LSI Adapter SAS2 2116 Meteor ROC(E). I didn't upgrade the BIOS of those cards, as more recent versions are not compatible with my motherboard (See here).
I often noticed "Warnings" related to my two LSI adapters (with "LSI_SAS2" as the source) in my Event Log: either "Reset to device, \Device\RaidPort1, was issued." or "Reset to device, \Device\RaidPort2, was issued.". Those Warnings are usually followed by one or two other Warnings like "The IO operation at logical block address 0 for Disk 0 was retried." with "disk" as the source. The address and the disk number vary a lot, but the Warnings appear very precisely every 180 seconds (3 minutes).
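To check whether such Warnings really follow a fixed rhythm, a small script can help. Here is a minimal sketch in Python, assuming the timestamps of the "Reset to device" Warnings have already been exported from the Event Log. The function name, the 5-second tolerance and the sample timestamps are made up for illustration; they are not taken from my actual logs.

```python
from collections import Counter
from datetime import datetime


def dominant_interval(timestamps, tolerance=5):
    """Return (interval_in_seconds, share) for the most common gap between
    consecutive events, with gaps rounded to the nearest `tolerance` seconds.
    Returns None when there are fewer than two events."""
    ordered = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        return None
    # Bucket the gaps so that 179 s and 181 s both count as 180 s.
    buckets = Counter(int(round(g / tolerance) * tolerance) for g in gaps)
    interval, count = buckets.most_common(1)[0]
    return interval, count / len(gaps)


# Hypothetical timestamps of "Reset to device" Warnings (illustration only).
sample = [
    datetime(2014, 1, 1, 10, 0, 0),
    datetime(2014, 1, 1, 10, 3, 1),
    datetime(2014, 1, 1, 10, 6, 0),
    datetime(2014, 1, 1, 10, 9, 1),
]

if __name__ == "__main__":
    print(dominant_interval(sample))
```

If one interval accounts for nearly all the gaps, something scheduled is probably triggering the resets, which is a different story than a disk failing at random.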
I read on the web that this is usually due to a timeout when accessing the disk, which results in a reset being issued by the controller. This issue is often solved by using the "High Performance" Power Plan with the option "PCI Express" > "Link State Power Management" set to "Off". But it didn't solve my own issue.
I have therefore contacted LSI support and was told that "The resets are to the drives which are timing out. It is possible one or more of them have an issue. Replace the drive that has the highest number of resets on his port."
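To find out which disk is affected most, the "disk" Warnings can be counted per disk number. Here is a rough sketch in Python, assuming the Warning messages have been exported as plain text lines and follow the wording quoted above. The function name and the sample messages are hypothetical; only the message format comes from my Event Log. Note that it counts the retry Warnings per disk, not the resets per port that LSI mentions.

```python
import re
from collections import Counter

# Matches messages such as:
# "The IO operation at logical block address 0 for Disk 0 was retried."
RETRY_RE = re.compile(r"logical block address (\S+) for Disk (\d+) was retried")


def retries_per_disk(messages):
    """Count retry Warnings per disk number; other messages are ignored."""
    counts = Counter()
    for message in messages:
        match = RETRY_RE.search(message)
        if match:
            counts[int(match.group(2))] += 1
    return counts


# Hypothetical sample messages (illustration only).
sample_messages = [
    "The IO operation at logical block address 0 for Disk 0 was retried.",
    "Reset to device, \\Device\\RaidPort1, was issued.",
    "The IO operation at logical block address 0x1f40 for Disk 3 was retried.",
    "The IO operation at logical block address 0 for Disk 3 was retried.",
]

if __name__ == "__main__":
    for disk, count in retries_per_disk(sample_messages).most_common():
        print(f"Disk {disk}: {count} retried IO operation(s)")
```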
I have the issue with almost all my disks, so I didn't know where to start... especially as, according to their SMART status, they were all definitely perfect. Could it therefore be due to the cables? No idea yet... But looking once more at the SMART details, I noticed that the Warnings were typically logged when I was refreshing those statuses.
I am using CrystalDiskInfo, which is IMO definitely the best free SMART monitoring tool... It is e.g. configured on my PC to send emails as soon as a SMART alert occurs... and... it is configured to check the SMART status every 3 minutes! Gosh! A refresh rate of 180 seconds?! That rings a bell and even a siren! I immediately disabled the SMART monitoring and didn't get any Warnings anymore. Trying other SMART tools, I noticed the same issue...
I submitted my findings to LSI and am now waiting for their feedback: is there a conflict at the LSI SAS adapter level when data on a disk is accessed at the same time as the disk's SMART info?
That being said, I also noticed that about 99% of the Warnings are related to my 5 Seagate ST3000DM001 disks. I seldom get Warnings for my Samsung HD204UI disks (patched to avoid data corruption when accessing SMART info!).