Tips: Replacing RAID-F with T-RAID

T-RAID (for Transparent RAID) is a new product from FlexRAID. It comes as another option next to the existing product: RAID-F (RAID over File System). I migrated from RAID-F to T-RAID months ago…

Here is how I now configure it to get the best performance for my server and my own usage.


Nice Features

I love T-RAID. It shares great features with RAID-F, e.g.:

  • A software RAID array fully independent from the hardware.
    • If a physical controller dies, there is no need to replace it with an identical one.
  • Support for adding a disk with existing data into the array.
    • No need to add blank disks, as required with hardware RAID or with Windows Storage Server.
  • Survives the simultaneous failure of several drives.
  • Access to each disk through a virtual disk, or through a Pool offering a single, global view of all the virtual disks.

But it also comes with its own advantages over RAID-F:

  • It is native real-time protection without the drawbacks of “RAID-F RealTime” (e.g., RAID-F in RT mode MUST be stopped gracefully before shutting down the machine).
  • Data on failing disks remain accessible in read and write mode! There is therefore no downtime during the “disk reconstruction” (similar to hardware RAID).
  • It comes with interesting monitoring and notification tools (performance, S.M.A.R.T., …).
  • It comes with Storage Accelerations.
    • Currently, a “Landing Zone”: use of an SSD as temporary storage. Files copied into the array are dropped onto the SSD and transferred later, in the background, to the array.
    • Soon, “SSD caching”.
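As an illustration only (this is not tRAID’s actual code), the Landing Zone idea boils down to a background pass that moves parked files from the SSD to the array:

```python
import os
import shutil

def flush_landing_zone(ssd_dir, array_dir):
    """Background pass of a hypothetical landing zone: move every file
    parked on the fast SSD path into the (slower) array path."""
    moved = []
    for name in sorted(os.listdir(ssd_dir)):
        src = os.path.join(ssd_dir, name)
        if os.path.isfile(src):
            shutil.move(src, os.path.join(array_dir, name))
            moved.append(name)
    return moved
```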

RAID Options

Once the physical drives are “Registered as Pass-Through”, to be used as DRU or PPU, and added into a “RAID Configuration” (hence defining an “Array”), one can set various options on that Configuration.

Options:

  • Auto Start Array=false. Because I don’t always turn on my PC to access the data stored in the T-RAID array, but also because I often change settings in my Configuration for testing purposes, and changes often cannot be applied if the array is already running…
  • Global Hot-Spare Rebuild=false. This is the recommended value, as human interaction is preferred over an automatic rebuild in case of disk failure.
  • Read-Only Policy=Never. This is the default and authorizes writing on all disks in the array, even on failing disks.
  • Scheduled Range Operation Size (in GB)=100. I haven’t fine-tuned this default value yet (taking into account, e.g., how much data can be validated per hour when the server is on). Actually, I turn my server on only a few times per month, to do massive backups. Once the backups are completed, I start a complete Validation of the array and configure the system to shut down on completion.
  • Statistics: File=true, RAID=true. I do want to monitor my system. But File Statistics requires a Job to be scheduled for the Storage!
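To put the 100 GB default in perspective, here is a back-of-the-envelope sketch of how many scheduled passes one full validation takes (the function name is mine, not tRAID’s; sizes in decimal units):

```python
import math

def validation_passes(array_size_gb, range_size_gb=100):
    """How many scheduled range operations (of range_size_gb each)
    are needed to validate the whole array once."""
    return math.ceil(array_size_gb / range_size_gb)

# e.g. a 12 TB (12000 GB) array takes 120 scheduled 100 GB passes
```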

Performance Options:

  • Performance Profile=PERFORMANCE. Because my server is only on when I want to do backups, I don’t care about saving disks/energy. On the contrary, I care about performance, and this profile indeed provides a noticeable improvement in disk access speed.
  • Concurrency Queuing (CQ) Depth=64, Salt=16. The salt is used in the algorithm managing “concurrency” within T-RAID. The system may experience lock overrides if the salt is too high, and constant out-of-sync blocks if it is too low. The perfect values depend on the hardware… So, as long as “out of sync blocks” are reported during “Verify and Sync” tasks, increase the salt. But watch the “Lock Override” graph in the “RAID Monitoring” tab: if the value increases drastically, lower the salt!
  • OS Caching=false. I don’t use this one, as it doesn’t help keep performance high when copying files larger than the amount of RAM, which is my case. In addition, the PERFORMANCE mode is not guaranteed to be efficient with “OS Caching”=true when using multiple PPUs, which is also my case.
  • Tagged Command Queuing (TCQ)=true, Depth=32. I use this option to improve performance, as it is compatible with the PERFORMANCE mode while using multiple PPUs. It allows up to 90% of the source disk’s write speed.
  • Sequential Write Optimization (SWO)=true, Depth=8. I keep those default values.
  • Direct I/O=true. I also keep this default value.
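The salt-tuning rule of thumb above can be sketched as a feedback loop (the step size, thresholds, and function name are my own illustration, not part of T-RAID):

```python
def tune_salt(salt, out_of_sync_blocks, lock_override_delta,
              step=4, min_salt=1, max_salt=64):
    """Raise the salt while 'Verify and Sync' still reports out-of-sync
    blocks; lower it when the 'Lock Override' counter climbs drastically."""
    if out_of_sync_blocks > 0:
        return min(salt + step, max_salt)
    if lock_override_delta > 1000:  # what counts as "drastic" is a guess
        return max(salt - step, min_salt)
    return salt
```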

Storage Options:

  • Auto Storage Pooling Start=false, Delay=15. Notice that it is recommended to never access the virtual disks directly (by assigning them a drive letter). Instead, using only the Pool adds an extra virtualization layer which makes hot-unplugging much less issue-prone. But I often change settings in my Configuration for testing purposes, and changes often cannot be applied if the pool is already running…
  • Removable=false. This setting must be set to false on Windows Server 2012 Essentials.
  • Storage Pool Caching=META_DATA_ONLY, Max=310. I noticed that performance is much better with this setting than with FILE_AND_META_DATA when copying large files, which is my case.
  • Sync Folder Last Modified Date=false. I would enable this only if I used a program tracking file modification dates (e.g., a sync or backup daemon).
  • Thread Pool Size=32. I keep this default.
  • Space Management Reserve (in GB)=50. I keep this default.
  • File Management Strategy=STRICT_FOLDER_PRIORITY. I want to keep all files together even if it is not “energy optimal”. Indeed, in case of disaster, I will at least easily retrieve related files on the disks still “alive”…
  • File System=NTFS, Strict ACL Mode=false. I keep those defaults.
  • Drive Letter or Folder Mount Point=V. This is the letter assigned to the Pool. It is shared, to be accessible from other machines on my intranet.
  • Native NFS Support=false. I keep this default.
  • Volume Label: tRAID Storage Pool.

Advanced Operations

  • Storage Acceleration. I don’t use it so far, as the write performance is good enough for me and, anyway, I don’t keep my server up and running 24/7. So I want to know when I can switch it off (i.e., when the transfers are really completed). Using Storage Acceleration, the SSD used as a Landing Zone would never be flushed in my case… I only turn the server on when I want to back up huge amounts of data…

S.M.A.R.T

  • For each disk on an LSI SAS controller, I have to set an “Advanced Mapping”:
    • Device Path Mapping: /dev/pdx, where x is the disk id
    • DeviceType Mapping: sat
  • For each disk, I also enable SMART Monitoring (every 4 hours), except when the disks are in standby.
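Since the mapping is the same for every disk, it can be generated mechanically; a tiny sketch (the helper function is mine, while the /dev/pdx path and the sat device type are the values described above):

```python
def advanced_mappings(disk_ids):
    """Build the per-disk Advanced Mapping entries: device path /dev/pdx
    (x being the disk id) with 'sat' as the device type."""
    return {f"/dev/pd{i}": "sat" for i in disk_ids}
```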

Notes

  • Write performance is heavily impacted by the performance of the PPU. The best disks should be used as PPUs rather than DRUs.
  • To increase read performance, the File Management Strategy had better be ROUND_ROBIN, as it enables I/O parallelism.
  • Never defrag or chkdsk the “Pool Drive” or the “Source (physical) Drives”. Instead, defrag the “NZFS (Virtual) Drives”. That being said:
    • I really try to avoid doing a defrag as, so far, I am not yet 100% convinced that, on my system, it does not result in “blocks out of sync” (i.e., requiring a Verify&Sync). For that reason, I have disabled the automatic daily defrag; e.g., turn off the Windows Disk Defragmenter schedule (see FlexRaid’s Wiki) or uncheck the automatic optimization on the concerned drives in O&O Defrag. Pay attention: new NZFS disks appearing when the array starts can be taken into account automatically by the defrag tool.
    • A defrag, if done, should never be executed on several disks simultaneously (see FlexRaid’s wiki).
    • If you do a defrag, you had better stop the Pool, or at least imperatively disable “Storage Pool Caching”.
    • I didn’t succeed in doing a chkdsk on the “NZFS Drives” and had to bring the “Source Drives” online to repair them… Once repaired, a Verify&Sync is mandatory! (NB: one thing to try is disengaging the driver protection mode, as it blocks certain low-level operations. Unfortunately, it’s not recommended to run disk tools on the transparent disks with driver protection disengaged.)
  • When a Verify task fails, it reports the exact first and last failing byte as well as the number of 4KB blocks involved. One can then start a “Range Specific Operation” to Verify&Sync the specified zone.
    • Notice that the first/last positions of failure are in bytes, while the “Range Specific Operation” can be specified in KB, MB, etc. (1KB = 1024B).
    • Notice also that the Verify&Sync updates complete blocks (4KB) and may therefore report different addresses (the first byte of the updated block) than the Verify task!
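The byte-to-range conversion above is easy to get wrong by hand; here is a small sketch of the arithmetic (the function name is mine):

```python
BLOCK = 4 * 1024  # Verify&Sync works on 4KB blocks
KB = 1024         # 1KB = 1024B, as noted above

def failure_range_kb(first_byte, last_byte):
    """Turn the first/last failing byte reported by a Verify task into a
    4KB-block-aligned [start, end) range expressed in KB, suitable for a
    Range Specific Operation."""
    start = (first_byte // BLOCK) * BLOCK    # round down to the block start
    end = (last_byte // BLOCK + 1) * BLOCK   # round up past the last block
    return start // KB, end // KB
```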

Tips: LSI_SAS2 and Disk Warnings in the System Event Log

I noticed a lot of Warnings issued by my LSI controllers in the event logs. LSI support told me that this can be due to failing disks, but in my case it could be something else, related to my SMART monitoring tool…


Concretely, I have two LSI SAS controllers: an LSI SAS 9211-8i and an LSI SAS 9201-16i. Both have been updated with the Firmware 19.00.00.00, and I am using the drivers version 2.0.72.0 for Windows x64: LSI Adapter SAS2 2008 Falcon and LSI Adapter SAS2 2116 Meteor ROC(E). I didn’t upgrade the BIOS of those cards, as more recent versions are not compatible with my motherboard (see here).

I often noticed “Warnings” related to my two LSI adapters (with “LSI_SAS2” as the source) in my Event Log: either “Reset to device, \Device\RaidPort1, was issued.” or “Reset to device, \Device\RaidPort2, was issued.”. Those Warnings are usually followed by one or two other Warnings like “The IO operation at logical block address 0 for Disk 0 was retried.”, with “disk” as the source. The address and the disk number vary a lot, but the Warnings appear very precisely every 180 seconds (3 minutes).

I read on the web that this is usually due to a timeout on accessing the disk, resulting in the controller being reset. This issue is often solved by using the “High Performance” Power Plan with the option “PCI Express” > “Link State Power Management” set to “Off”! But it didn’t solve my own issue.

I therefore contacted LSI support and was told that “The resets are to the drives which are timing out. It is possible one or more of them have an issue. Replace the drive that has the highest number of resets on his port.”

I have the issue with almost all my disks, so I didn’t know where to start… especially as, according to their SMART status, they were all definitively perfect. Could it therefore be due to the cables? No idea yet… But looking once more at the SMART details, I noticed that the Warnings were typically logged when I was refreshing those statuses.

I am using CrystalDiskInfo, which is IMO definitively the best free SMART monitoring tool… It is, e.g., configured on my PC to send emails as soon as a SMART alert occurs… and… it is configured to check the SMART status every 3 minutes! Gosh! A refresh rate of 180 seconds?! That rings a bell and even a siren! I immediately disabled the SMART monitoring and didn’t get any Warning anymore. Trying other SMART tools, I noticed the same issue…

I submitted my findings to LSI and am now waiting for their feedback: is there a conflict at the LSI SAS adapter level when accessing data on a disk at the same time as the disk’s SMART info?

That being said, I also noticed that about 99% of the Warnings are related to my 5 Seagate ST3000DM001 drives. I seldom have Warnings for my Samsung HD204UI drives (patched to avoid data corruption when accessing SMART info!).

Synology: Send a DSM Notification to one of your Synology users

This is a simple note to remind me how easy it is to send a notification message to another user on my NAS, from a telnet prompt, using the command: synodsmnotify user/group “Title” “Message”


Ex.: synodsmnotify valery “Hello” “Don’t forget to post a note about such findings on your blog”

Where “valery” is a valid user defined on my Synology… Notice that if the user does not exist, you will get errors like these:

synodsmnotify.cpp:27 SYNOUserPreferenceDirGet(valery) fail, [0x1D00 user_db_get.c:53]

synodsmnotify.cpp:172 Fail to send notify to valery
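To avoid that error, one can check that the user exists before calling the command; a minimal sketch for the DSM shell (the /etc/passwd lookup is an assumption about the environment; synodsmnotify itself is only invoked when the user exists):

```shell
#!/bin/sh
# Send a DSM notification only if the target user actually exists.
notify() {
    user="$1"; title="$2"; message="$3"
    if grep -q "^${user}:" /etc/passwd; then
        synodsmnotify "$user" "$title" "$message"
    else
        echo "No such user: $user"
        return 1
    fi
}

result=$(notify valery "Hello" "Don't forget to post a note about such findings")
echo "$result"
```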

Synology: Update DSM 5.0 with the latest fixes

I have just applied the latest service pack for DSM 5.0. Soon after, I started to experience connection issues to my own blog from my Intranet. This was due to some (???) issues with the DNS Service running on my Synology.


Issue confirmed: executing “ping beatificabytes.be” in a CMD prompt returned the internet IP of my ADSL modem instead of the IP of my NAS.

As a reminder: I configured my router and my Synology’s DNS service to be able to access my blog on my intranet with its actual FQDN (see here). And after the upgrade from DSM 4.0 to DSM 5.0, I had to enable the “Resolution Service” in the “DNS Server”.

Now, to solve the connection issue experienced after updating DSM 5.0:

  1. On the Synology, in the “DNS Server” configuration pane, I first had to:
    1. Disable the “Resolution Service” and click Apply
    2. Re-enable the “Resolution Service” and click Apply
  2. Next, on my PC, in a CMD prompt, I executed:
    1. ipconfig /flushdns
    2. ipconfig /renew *
    3. ping beatificabytes.be

Et voilà !

Hardware: Upgrade LSI 9201-16i and 9211-8i to Firmware 19.00.00.00 on Asus Striker II Formula

I was using outdated firmware and drivers for my two LSI SAS controllers and noticed a lot of errors in the Windows event logs, like “Event 11 LSI_SAS2 The driver detected a controller error on \Device\RaidPort1”. So I decided to update them.


Notice that I didn’t update the BIOS of my cards, as recent versions are not compatible with my motherboard. See here.

First, I updated the Windows drivers. The most recent versions of those drivers are available on LSI’s download page… (search for “Component Type” = “Storage Component” with “Host Bus Adapters” as “Product Family”). E.g., for the “LSI SAS 9201-16i” see here, and for the “LSI SAS 9211-8i” see here.

The update is easy and straightforward via “Computer Management” > “Device Management” > “Storage Controllers”. Right-click the “LSI Adapter xxxx” and select “Update Driver Software” > “Browse my Computer for driver software”. Then select the folder where you unzipped the drivers.

Next, I updated the firmware. The most recent versions of the firmware are also available on the download page. As I have two cards, I wanted to update them separately, as their firmware images differed in size by 1KB…

I started a CMD prompt as Administrator and used the sas2flash.exe command from the folder “sas2flash_win_x64_rel”, available in the zip with the firmware.

First, I checked which number was assigned to each controller with the commands “sas2flash.exe -list -c 0” and “sas2flash.exe -list -c 1”:

C:\sas2flash_win_x64_rel>sas2flash.exe -list -c 0
LSI Corporation SAS2 Flash Utility
Version 19.00.00.00 (2014.03.17)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved

Adapter Selected is a LSI SAS: SAS2008(B2)

Controller Number : 0
Controller : SAS2008(B2)
PCI Address : 00:04:00:00
SAS Address : 500605b-0-01bd-bec0
NVDATA Version (Default) : 11.00.00.07
NVDATA Version (Persistent) : 11.00.00.07
Firmware Product ID : 0x2213 (IT)
Firmware Version : 17.00.01.00
NVDATA Vendor : LSI
NVDATA Product ID : SAS9211-8i
BIOS Version : 07.15.00.00
UEFI BSD Version : N/A
FCODE Version : N/A
Board Name : SAS9211-8i
Board Assembly : N/A
Board Tracer Number : N/A

Finished Processing Commands Successfully.
Exiting SAS2Flash.

C:\sas2flash_win_x64_rel>sas2flash.exe -list -c 1
LSI Corporation SAS2 Flash Utility
Version 19.00.00.00 (2014.03.17)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved

Adapter Selected is a LSI SAS: SAS2116_1(B1)

Controller Number : 1
Controller : SAS2116_1(B1)
PCI Address : 00:03:00:00
SAS Address : 5000000-0-8000-0000
NVDATA Version (Default) : 11.00.00.05
NVDATA Version (Persistent) : 11.00.00.05
Firmware Product ID : 0x2213 (IT)
Firmware Version : 17.00.01.00
NVDATA Vendor : LSI
NVDATA Product ID : SAS9201-16i
BIOS Version : 07.15.00.00
UEFI BSD Version : N/A
FCODE Version : N/A
Board Name : SAS9201-16i
Board Assembly : N/A
Board Tracer Number : N/A

Finished Processing Commands Successfully.
Exiting SAS2Flash.

So, id 0 was the LSI SAS9211-8i and id 1 was the LSI SAS9201-16i. I then updated them like this:

C:\sas2flash_win_x64_rel>sas2flash.exe -o -c 0 -f 2118it.bin
LSI Corporation SAS2 Flash Utility
Version 19.00.00.00 (2014.03.17)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved

Advanced Mode Set

Adapter Selected is a LSI SAS: SAS2008(B2)

Executing Operation: Flash Firmware Image

Firmware Image has a Valid Checksum.
Firmware Version 19.00.00.00
Firmware Image compatible with Controller.

Valid NVDATA Image found.
NVDATA Version 11.00.00.00
Checking for a compatible NVData image…

NVDATA Device ID and Chip Revision match verified.
NVDATA Versions Compatible.
Valid Initialization Image verified.
Valid BootLoader Image verified.

Beginning Firmware Download…
Firmware Download Successful.

Verifying Download…

Firmware Flash Successful.

Resetting Adapter…
Adapter Successfully Reset.

Finished Processing Commands Successfully.
Exiting SAS2Flash.

C:\sas2flash_win_x64_rel>sas2flash.exe -o -c 1 -f 9201-16i_it.bin
LSI Corporation SAS2 Flash Utility
Version 19.00.00.00 (2014.03.17)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved

Advanced Mode Set

Adapter Selected is a LSI SAS: SAS2116_1(B1)

Executing Operation: Flash Firmware Image

Firmware Image has a Valid Checksum.
Firmware Version 19.00.00.00
Firmware Image compatible with Controller.

Valid NVDATA Image found.
NVDATA Version 11.00.00.00
Checking for a compatible NVData image…

NVDATA Device ID and Chip Revision match verified.
NVDATA Versions Compatible.
Valid Initialization Image verified.
Valid BootLoader Image verified.

Beginning Firmware Download…
Firmware Download Successful.

Verifying Download…

Firmware Flash Successful.

Resetting Adapter…
Adapter Successfully Reset.

Finished Processing Commands Successfully.
Exiting SAS2Flash.

The PC rebooted without any issue and I have access to my disks… but next:

c:\sas2flash_win_x64_rel>sas2flash.exe -listallboards
LSI Corporation SAS2 Flash Utility
Version 19.00.00.00 (2014.03.17)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved

No LSI SAS adapters found! Limited Command Set Available!
ERROR: Command Not allowed without an adapter!
ERROR: Couldn’t Create Command -listallboards
Exiting Program.

I wanted to check the controllers and, unfortunately, they were not detected anymore by the tool :(

Damned… I forgot to run it in a CMD prompt run as Administrator!!! Once run in such a CMD prompt, I got the expected info:

c:\sas2flash_win_x64_rel>sas2flash.exe -listallboards
LSI Corporation SAS2 Flash Utility
Version 19.00.00.00 (2014.03.17)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved

Adapter Selected is a LSI SAS: SAS2008(B2)

Num Ctlr Board Name Serial Number
—————————————————-

0 SAS2008(B2) SAS9211-8i N/A
1 SAS2116_1(B1) SAS9201-16i N/A

Finished Processing Commands Successfully.
Exiting SAS2Flash.


c:\sas2flash_win_x64_rel>sas2flash.exe -listall
LSI Corporation SAS2 Flash Utility
Version 19.00.00.00 (2014.03.17)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved

Adapter Selected is a LSI SAS: SAS2008(B2)

Num Ctlr FW Ver NVDATA x86-BIOS PCI Addr
—————————————————————————-

0 SAS2008(B2) 19.00.00.00 11.00.00.08 07.15.00.00 00:04:00:00
1 SAS2116_1(B1) 19.00.00.00 11.00.00.06 07.15.00.00 00:03:00:00

Finished Processing Commands Successfully.
Exiting SAS2Flash.

Tips: FlexRaid PowerShell script to set disks online/offline

FlexRaid‘s products (RAID-F and tRAID) set offline all the disks used as DRUs or PPUs, to prevent direct access and mistakes. Accesses are made through the virtual disks created by FlexRaid. But if you have many disks and sometimes need to access your physical disks, you will like this script to set all those disks back online…


Create a file to store this PowerShell script (e.g., ManageDisks.ps1):


Clear-Host
$action = Read-Host 'Type "Off" to set disks offline or "On" to set them online'
Clear-Host
if ($action.ToLower() -eq 'off')
{
    echo 'Please Wait...'
    # Set offline again the disks previously brought online (listed in C:\OffLineDisks.txt)
    $lines = Get-Content C:\OffLineDisks.txt
    foreach ($line in $lines) {
        if ($line.Trim())
        {
            # Each non-empty line looks like "Number : <id>"
            $fields = $line -split ' : '
            $disk = $fields[1]
            $command = '"select disk ' + $disk + '", "offline disk" | diskpart'
            echo $command
            Invoke-Expression $command
        }
    }
}
elseif ($action.ToLower() -eq 'on') {
    echo 'Please Wait...'
    # Remember which disks are currently offline, so they can be set back offline later
    Get-Disk | ? IsOffline | Format-List Number > C:\OffLineDisks.txt
    $lines = Get-Content C:\OffLineDisks.txt
    foreach ($line in $lines) {
        if ($line.Trim()) {
            $fields = $line -split ' : '
            $disk = $fields[1]
            # Bring the disk online and grab the line describing its volume, if any
            $command = '"select disk ' + $disk + '", "online disk", "detail disk" | diskpart | Where-Object {$_ -match ".*Volume.*Partition" } 2>&1'
            echo $command
            $output = Invoke-Expression $command
            if ($null -eq $output) {
                echo "No Volume on disk $disk"
            } else {
                # Extract the volume number and assign it a drive letter
                $fields = $output.TrimStart(" Volume") -split ' '
                $volume = $fields[0]
                $command = '"select volume ' + $volume + '", "assign" | diskpart'
                echo $command
                Invoke-Expression $command
            }
        }
    }
}
else
{
    $message = $action + ' is not a supported action'
    echo $message
}

Usage: run this script “As Administrator” and:

Type ‘On’ to set online all the disks currently offline. This will:

  1. Create a file OffLineDisks.txt on the C:\ drive with the IDs of the disks currently offline. I presume that all offline disks are used by FlexRaid!
  2. Bring each of those disks online.
  3. Assign a letter to the volumes on those disks. I presume that there is only one volume per DRU and none on the PPUs.

Type ‘Off’ to set offline the disks listed in the file OffLineDisks.txt created previously.

My purpose was to scan and repair various physical disks, as this was not working when trying to do so via the NZFS Virtual Drives…


Doing a chkdsk on an NZFS Virtual Drive, I got:

Chkdsk was executed in read/write mode.

Checking file system on #:
Volume label is #.

Stage 1: Examining basic file system structure …

# file records processed. File verification completed.

  # large file records processed.

  # bad file records processed.

Stage 2: Examining file name linkage …
The data written out is different from what is being read back
at offset 0x# for 0x# bytes.
An unspecified error occurred (696e647863686b2e 1324).

So, instead, I do it on the physical drives once they are set back online. And instead of chkdsk, I use the PowerShell command “Repair-Volume”. E.g., on a disk assigned the letter X:

PS C:\> Repair-Volume X -Scan
ScanErrorsFoundNeedSpotFix

PS C:\> Repair-Volume X
ErrorsFixed

PS C:\> Repair-Volume X -Scan
NoErrorsFound