Tips Replacing RAID-F by T-RAID

T-RAID (for Transparent RAID) is a new product from FlexRAID. It is offered alongside the existing product, RAID-F (RAID over File System). I migrated from RAID-F to T-RAID months ago…

Here is how I now configure it to get the best performance for my server and my own usage.

Nice Features

I love T-RAID. It has great features similar to RAID-F, e.g.:

  • Software Raid Array fully independent from the hardware.
    • If a physical controller dies, there is no need to replace it with an identical one.
  • Support for adding a disk with existing data to the Software Raid Array.
    • No need to add a blank disk, as required with hardware RAID or with Windows Storage Server.
  • Survives the simultaneous failure of several drives.
  • Access each disk through a virtual disk or through a Pool offering a unique/global view on all the virtual disks.

But it also has its own advantages over RAID-F:

  • It’s a native Real-Time protection without any drawback compared to “RAID-F RealTime” (E.g.: RAID-F in RT mode MUST be stopped gracefully before shutting down the machine)
  • Data on failing disks are still accessible in Read and Write mode! There is therefore no downtime during the “disk reconstruction” (Similar to hardware Raid)
  • It comes with interesting monitoring and notification tools (Performances, S.M.A.R.T, …)
  • It comes with Storage Accelerations.
    • Currently, a “Landing Zone”: use of an SSD as temporary storage. Files copied into the array are dropped onto the SSD and transferred later, in the background, to the array.
    • Soon, “SSD caching”.

RAID Options

Once the Physical Drives are “Registered as Pass-Through” (to be used as DRU or PPU) and added to a “RAID Configuration” (thereby defining an “Array”), one can set various options on that “Configuration”.

Options:

  • Auto Start Array=false. Because I don’t always turn on my PC to access the data stored in the T-RAID array. But also because I often change settings in my Configuration for testing purposes, and changes are often not applied if the array is already running…
  • Global Hot-Spare Rebuild=false. This is the recommended value, as human intervention is preferred over an automatic rebuild in case of disk failure.
  • Read-Only Policy=Never. This is the default and allows writing to all disks in the array, even failing ones.
  • Scheduled Range Operation Size (in GB)=100. I haven’t fine-tuned this default value yet (taking into account, e.g., how much data can be validated per hour when the server is on). Actually, I turn my server on only a few times per month, to do massive backups. Once the backups are completed, I start a complete Validation of the array and configure the system to shut down on completion.
  • Statistics: File=true, RAID=true. I want indeed to monitor my system. But File Statistics requires a Job to be scheduled for the Storage!

Performance Options:

  • Performance Profile=PERFORMANCE. Because my server is only on when I want to do backups, I don’t care about sparing the disks or saving energy. On the contrary, I care about performance, and this profile does provide noticeable improvements in disk access speed.
  • Concurrency Queuing (CQ) Depth=64, Salt=16. Salt is used in the algorithm managing “concurrency” within T-RAID. The system could experience lock overrides if the salt is too high, and constant out-of-sync blocks if it is too low. The ideal values depend on the hardware… So, as long as “out of sync blocks” are reported during “Verify and Sync” tasks, increase the salt. But watch the “Lock Override” graph in the “RAID Monitoring” tab: if the value increases drastically, lower the salt!
  • OS Caching=false. I don’t use this one as it doesn’t help to sustain high performance when copying files larger than the amount of RAM, which is the case for me. In addition, the PERFORMANCE mode is not guaranteed to be efficient with “OS Caching”=true when using multiple PPU, which is also the case for me.
  • Tagged Command Queuing (TCQ)=true, Depth=32. I am using this option to improve performance, as it’s compatible with the PERFORMANCE mode while using multiple PPU. It allows up to 90% of the source disk write speed.
  • Sequential Write Optimization (SWO)=true, Depth=8. I keep those default values.
  • Direct I/O=true. I also keep this default value.

Storage Options:

  • Auto Storage Pooling Start=false, Delay=15. Notice that it’s recommended to never access the virtual disks directly (by assigning them a drive letter). Instead, using only the Pool adds an extra virtualization layer, which makes hot-unplugging much less error-prone. But I often change settings in my Configuration for testing purposes, and changes are often not applied if the pool is already running…
  • Removable=false. This setting must be set to false on Windows Server 2012 Essentials.
  • Storage Pool Caching=META_DATA_ONLY, Max=310. I noticed that performance is much better with this setting than with File_AND_META_DATA when copying large files, which is the case for me.
  • Sync Folder Last Modified Date=false. I would enable this only if I use a program tracking file modification date (Ex.: sync or backup daemon)
  • Thread Pool Size=32. I keep this default
  • Space Management Reserve (in GB)=50. I keep this default.
  • File Management Strategy=STRICT_FOLDER_PRIORITY. I want to keep all files together even if it’s not “energy optimal”. Indeed, in case of disaster, I will at least easily retrieve related files on disks still “alive”…
  • File System=NTFS, strict ACL Mode=false. I keep those defaults
  • Drive Letter or Folder Mount Point=V. This is the letter assigned to the Pool. It is shared so that it is accessible from other machines on my intranet.
  • Native NFS Support=false. I keep this default.
  • Volume Label: tRAID Storage Pool

Advanced Operations

  • Storage Acceleration. I don’t use it so far, as the write performance is good enough for me and, anyway, I don’t keep my server up and running 24/7. So I want to know when I can switch it off (i.e., when the transfers are really completed). With Storage Acceleration, the SSD used as Landing Zone would never be flushed in my case… I indeed only turn the server on when I want to back up huge amounts of data…

S.M.A.R.T

  • For each disk on an LSI SAS controller, I have to set an “Advanced Mapping”
    • Device Path Mapping: /dev/pdx where x is the disk id
    • DeviceType Mapping: sat
  • For each disk, I also enable SMART Monitoring (every 4 hours) except when disks are in standby.

Notes

  • Write performance depends heavily on the performance of the PPU. The best disks should be used as PPU rather than DRU.
  • To increase read performance, the File Management Strategy should be ROUND_ROBIN, as it enables I/O parallelism.
  • Never Defrag or Chkdsk the “Pool Drive” or the “Source (physical) Drives”. Defrag the “NZFS (Virtual) Drives” instead. That being said:
    • I really try to avoid doing a Defrag because, so far, I am not yet 100% convinced that, on my system, it does not result in “blocks out of sync” (i.e., requiring a Verify&Sync). For that reason, I have disabled the automatic daily defrag, e.g., by turning off the Windows Disk Defragmenter Schedule (see FlexRaid’s wiki) or unchecking the automatic optimization on the concerned drives in O&O Defrag. Pay attention: a new NZFS disk appearing when the array starts can be picked up automatically by the defrag tool.
    • Defrag, if done, should never be executed on several disks simultaneously (see FlexRaid’s wiki).
    • If you do a Defrag, you had better stop the Pool or, at the very least, disable “Storage Pool Caching”.
    • I did not succeed in running a Chkdsk on the “NZFS Drives” and had to bring the “Source Drives” online to repair them… Once repaired, a Verify&Sync is mandatory! (NB: one thing to try is disengaging the driver protection mode, as it blocks certain low-level operations. Unfortunately, it’s not recommended to run disk tools on the transparent disks with driver protection disengaged.)
  • When a Verify task fails, it reports the exact first and last byte of the failure as well as the number of 4KB blocks. One can then start a “Range Specific Operation” to Verify&Sync the specified zone.
    • Notice that the first/last position of the failure is in bytes, while the “Range Specific Operation” can be in KB, MB, etc. (1KB = 1024B).
    • Notice also that the Verify&Sync updates complete blocks (4KB) and may therefore report different addresses (first byte of the updated block) than the Verify task!
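
The byte-to-KB conversion and the 4KB block alignment mentioned in the last notes can be sketched in PowerShell. This is only an illustration: the function name ConvertTo-AlignedRange is hypothetical, the block size of 4096 bytes is taken from the "4KB" above, and how the resulting values map onto the fields of the “Range Specific Operation” dialog is not covered here.

```powershell
# Hypothetical helper: align a byte range reported by a Verify task
# to whole 4 KB blocks and express it in KB (1 KB = 1024 B).
function ConvertTo-AlignedRange {
    param(
        [Parameter(Mandatory)] [long] $FirstByte,
        [Parameter(Mandatory)] [long] $LastByte,
        [long] $BlockSize = 4096   # assumption: 4 KB blocks of 4096 bytes
    )

    # Round the first byte down to the start of its block.
    $start = $FirstByte - ($FirstByte % $BlockSize)
    # Round the last byte up to the end of its block (exclusive end).
    $end = ($LastByte - ($LastByte % $BlockSize)) + $BlockSize

    [pscustomobject]@{
        StartByte = $start
        EndByte   = $end
        StartKB   = $start / 1024
        SizeKB    = ($end - $start) / 1024
        Blocks    = ($end - $start) / $BlockSize
    }
}

# Example with made-up positions: bytes 10000 to 20000 touch 3 blocks.
ConvertTo-AlignedRange -FirstByte 10000 -LastByte 20000
```

With these made-up positions, the aligned range starts at 8 KB and spans 12 KB, which also shows why Verify&Sync can report a first address lower than the one given by the Verify task.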

Tips FlexRaid: powershell script to set disks online/offline

FlexRaid’s products (Raid-F and tRaid) set all disks used as DRU or PPU offline to prevent direct access and mistakes. Access goes through virtual disks created by FlexRaid. But if you have many disks and sometimes need to access your physical disks, you will like this script to set all those disks back online…

Create a file to store this powershell script (ex.: ManageDisks.ps1) :


Clear-Host
$action = Read-Host 'Type "Off" to set disks offline or "On" to set them online'
Clear-Host
if ($action.ToLower() -eq 'off')
{
    echo 'Please Wait...'
    # Disk numbers saved by a previous "On" run
    $lines = Get-Content C:\OffLineDisks.txt
    foreach ($line in $lines) {
        if ($line.Trim())
        {
            # Each non-empty line looks like "Number : 3"
            $fields = $line -split ' : '
            $disk = $fields[1]
            $command = '"select disk ' + $disk + '", "offline disk" | diskpart'
            echo $command
            invoke-expression $command
        }
    }
}
elseif ($action.ToLower() -eq 'on') {
    echo 'Please Wait...'
    # Remember which disks are offline now, so that "Off" can restore that state later
    Get-Disk | ? isoffline | Format-List Number > C:\OffLineDisks.txt
    $lines = Get-Content C:\OffLineDisks.txt
    foreach ($line in $lines) {
        if ($line.Trim()) {
            $fields = $line -split ' : '
            $disk = $fields[1]
            # Bring the disk online and keep only the "Volume" line of "detail disk"
            $command = '"select disk ' + $disk + '", "online disk", "detail disk" | diskpart | Where-Object {$_ -match ".*Volume.*Partition" } 2>&1'
            echo $command
            $output = invoke-expression $command
            if ($output -eq $null) {
                echo "No Volume on disk $disk"
            } else {
                # Extract the volume number and let diskpart assign a drive letter
                $fields = $output.TrimStart(" Volume") -split ' '
                $volume = $fields[0]
                $command = '"select volume ' + $volume + '", "assign" | diskpart'
                echo $command
                invoke-expression $command
            }
        }
    }
}
else
{
    $message = $action + ' is not a supported action'
    echo $message
}

Usage: run this script “As Administrator” and

Type  ‘On’ to set online all disks currently offline. This will:

  1. Create a file OffLineDisks.txt on the C:\ drive with the IDs of the disks currently offline. I presume that all offline disks are used by FlexRaid!
  2. Bring online each of those disks.
  3. Assign a letter to the volumes on those disks. I presume that there is only one volume per DRU and none on the PPU.

Type ‘Off’ to set offline the disks listed in the file OffLineDisks.txt created previously.
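
The script above trusts every non-empty line of OffLineDisks.txt. A slightly stricter way to read the disk numbers back could look like the sketch below. The function name Get-DiskNumberFromListing is hypothetical, and the sample lines assume the "Number : 3" layout that the script itself relies on.

```powershell
# Hypothetical helper: extract disk numbers from the lines written by
# "Format-List Number", ignoring anything that is not "Number : <digits>".
function Get-DiskNumberFromListing {
    param([string[]] $Lines)

    foreach ($line in $Lines) {
        if ($line -match '^\s*Number\s*:\s*(\d+)\s*$') {
            [int]$Matches[1]
        }
    }
}

# Example with made-up content; on the server one would pass
# (Get-Content C:\OffLineDisks.txt) instead.
Get-DiskNumberFromListing -Lines @('', 'Number : 3', '', 'Number : 12', '')
```

Returning integers rather than raw text also avoids passing stray whitespace on to diskpart.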

My purpose was to scan and repair various physical disks as this was not working when trying to do so via the NZFS Virtual Drives…

Doing a Chkdsk on the NZFS Virtual Drive, I got:

Chkdsk was executed in read/write mode.

Checking file system on #:
Volume label is #.

Stage 1: Examining basic file system structure …

# file records processed. File verification completed.

  # large file records processed.

  # bad file records processed.

Stage 2: Examining file name linkage …
The data written out is different from what is being read back
at offset 0x# for 0x# bytes.
An unspecified error occurred (696e647863686b2e 1324).

So, instead, I do it on the Physical Drives once they are set back online. And instead of Chkdsk, I am using the PowerShell command “Repair-Volume”. E.g., on a disk assigned the letter X:

PS C:\> Repair-Volume X -Scan
ScanErrorsFoundNeedSpotFix

PS C:\> Repair-Volume X
ErrorsFixed

PS C:\> Repair-Volume X -Scan
NoErrorsFound
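
The scan, repair, scan-again sequence shown above can be wrapped in a small function. This is a sketch only: the name Repair-VolumeIfNeeded is hypothetical, it assumes that Repair-Volume returns the status values shown in the transcript, and it has to be run from an elevated PowerShell session on the server.

```powershell
# Hypothetical wrapper around the sequence shown above:
# scan, repair only if the scan reports something, then scan again.
function Repair-VolumeIfNeeded {
    param([Parameter(Mandatory)] [char] $DriveLetter)

    $scan = "$(Repair-Volume -DriveLetter $DriveLetter -Scan)"
    if ($scan -eq 'NoErrorsFound') {
        # Nothing to repair
        return [pscustomobject]@{ Scan = $scan; Fix = $null; Recheck = $null }
    }

    # Errors were reported: repair, then scan once more to confirm
    $fix = "$(Repair-Volume -DriveLetter $DriveLetter)"
    $recheck = "$(Repair-Volume -DriveLetter $DriveLetter -Scan)"

    [pscustomobject]@{ Scan = $scan; Fix = $fix; Recheck = $recheck }
}

# Usage (elevated prompt, disk online with letter X):
# Repair-VolumeIfNeeded -DriveLetter X
```

As noted earlier for Chkdsk, a Verify&Sync remains necessary after repairing a source drive this way.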

Tips Status of FlexRAID Jobs currently running

When you manually trigger a FlexRAID job using the client FlexRAIDcmd.exe, or when a FlexRAID Scheduled Job is started, its status is not displayed automatically in a Web UI that is already open.

The Web UI needs to be reloaded in the Browser (Ctrl + F5). Doing so,

  • A status window should now be displayed for the current process and
  • The job should also now appear in the “Command Execution Center” (FlexRAID UI > Your Configuration > Tool Box).
    • In that “Command Execution Center”, the “Pause”, “Resume” and “Abort” buttons should now be accessible.
To get the status of the current job, using the FlexRAID client (FlexRAIDcmd.exe), type in a command prompt:

FlexRAIDcmd.exe localhost - - status

Tips Use FlexRAIDCmd within PowerShell scripts to gracefully stop FlexRAID

When using FlexRAID in Real-Time mode, the Pool must be stopped before stopping the service, i.e., also before shutting down the server. The best approach is to define a Shutdown Task in Windows to manage this…

The Shutdown Task will run a Powershell Script using the FlexRaidClient to query the state of FlexRAID and trigger actions…

The FlexRaidClient for windows, named FlexRaidCmd.exe, is not installed by default with the service. It must be downloaded as an Option here.

The syntax is: FlexRAIDClient Host Port Timeout Command

Once installed, one can use the Command “view” in a cmd prompt RUN AS Administrator on the server to Start/Stop the pool:

FlexRAIDCMD localhost - - view class1_0 start
FlexRAIDCMD localhost - - view class1_0 stop

The two “-” are used as “default values” for the Port and Timeout parameters.
class1_0 must be used for the new driver (=> class1) and to access the first pool (=> ID = 0).

Starting the pool takes about 50 seconds.
Stopping the pool takes less than 10 seconds.

If the command fails due to a syntax error, the error message can be found in the file log.log
If the command succeeds, nothing at all is logged in the log file, but a message is displayed in the console (cmd prompt):

Quote

{"success": true, "status": null, "commandMessages": [{"messageCode": "successStoragePoolStarted", "messageData": ["V"]}], "serverMessages": null}
=> The storage pool has been started successfully for volume: V:…
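
Since the reply is JSON followed by a human-readable “=> …” line, it can be consumed from PowerShell. The sketch below is an illustration only: the function name ConvertFrom-FlexRaidOutput is hypothetical, and the sample text simply reuses the reply quoted above (with an elided trailing message).

```powershell
# Hypothetical helper: drop the trailing "=> ..." text and parse the JSON part.
function ConvertFrom-FlexRaidOutput {
    param([Parameter(Mandatory)] [string] $Text)

    # Keep only what precedes the first "=>" separator
    $json = ($Text -split '=>', 2)[0].Trim()
    $json | ConvertFrom-Json
}

# Sample reply copied from the console output quoted above
$sample = '{"success": true, "status": null, "commandMessages": [{"messageCode": "successStoragePoolStarted", "messageData": ["V"]}], "serverMessages": null}' + "`n" + '=> ...'

$reply = ConvertFrom-FlexRaidOutput -Text $sample
$reply.success
$reply.commandMessages[0].messageCode
```

Testing messageCode rather than the text after “=>” keeps a script independent of the language in which the client displays its messages.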

Notice: In a normal cmd prompt (not run as admin), the command returns an error due to an access denied on the log file:

Quote

log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: log.log

Notice: if the Web UI was open before executing a Start/Stop, it must be reopened (or refreshed: CTRL-F5) otherwise it does not display the new state of the pool.

Regarding the log.log file, it is better to store it in a fixed location. Edit the file log4j.properties and set a path like:

log4j.appender.default.File=C:/FlexRaid/FlexRAIDCmd.log

Pay attention to the path separator! It’s not the one used by Windows but the one used in Java!
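
If the path comes from a Windows tool or variable, it can be converted before being pasted into log4j.properties. A minimal sketch, where the function name ConvertTo-Log4jPath is hypothetical; the reason for the conversion is that a backslash is an escape character in Java properties files:

```powershell
# Hypothetical helper: turn a Windows path into the forward-slash form
# used in the log4j.properties example above.
function ConvertTo-Log4jPath {
    param([Parameter(Mandatory)] [string] $WindowsPath)

    # In the regex, '\\' matches one literal backslash
    $WindowsPath -replace '\\', '/'
}

ConvertTo-Log4jPath -WindowsPath 'C:\FlexRaid\FlexRAIDCmd.log'
```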

To shut down the FlexRaid server after stopping the pool, use the command shutdown-server.

FlexRaid localhost - - shutdown-server

If the pool is still running, we get the following response:

Quote

{"success": false, "status": null, "commandMessages": [{"messageCode": "errorShutdownNotAllowedStoragePoolServiceRunning", "messageData": []}], "serverMessages": null}
=> To be able to stop the “host” service, the storage pool service must be stopped!

If the service stops successfully, we get :

Quote

{"success": true, "status": null, "commandMessages": [{"messageCode": "successServerShutingDown", "messageData": []}], "serverMessages": null}
=> Server shutdown in progress…

To restart the service, we can use:

net start "FlexRaid"

Notice: I haven’t yet found on the forum what the difference is between ‘FlexRaid localhost - - shutdown-server’ and ‘net stop "FlexRaid"’. To my knowledge, ‘net stop’ is synchronous and is therefore maybe preferable to stop the service properly before shutting down?! (The FlexRaid message seems to indicate that shutdown-server is asynchronous.)

Notice: If I run “FlexRaid localhost - - shutdown-server” while the service is not running, I obviously get an exception “Connection refused: connect : ConnectException”, but also this message in the log file (log.log):

Quote

ERROR: Unexpected character (‘A’ (code 65)): expected a valid value (number, String, array, object, ‘true’, ‘false’ or ‘null’)
at [Source: java.io.StringReader@2f3d698; line: 1, column: 2] org.codehaus.jackson.JsonParseException: Unexpected character (‘A’ (code 65)): expected a valid value (number, String, array, object, ‘true’, ‘false’ or ‘null’)
at [Source: java.io.StringReader@2f3d698; line: 1, column: 2]

Now, here is how to create a “shutdown task” in the Local Group Policies of a Windows Server 2012:

1) Enable script execution on the server
a) On the Start Screen right-click the Windows PowerShell tile and run it As Administrator
b) execute “Set-ExecutionPolicy RemoteSigned” in that shell and answer “Y”

2) Create the script
a) Create a file “StopFlexRaid.ps1” in your “FlexRaid Client” folder (e.g.).
b) Type the script found below in the file (change the path to FlexRaidCmd)

3) Use the script as Shutdown Script
a) On the Start Screen, type “gpedit.msc” and run it.
b) Go to the node “Computer Configuration\Windows Settings\Scripts (Startup/Shutdown)”.
c)  Edit “Shutdown” and in the tab “PowerShell script”, “Add” StopFlexRaid.ps1

Notice:
– Shutdown scripts are run as Local System, and they have the full rights that are associated with being able to run as Local System.
– Shutdown scripts are run synchronously. The Server should wait on the script before shutting down.

Here is the StopFlexRaid script:


$srvName = "FlexRAID"
$flexCmd = "C:\Program Files (x86)\FlexRAID 2.0 Client\FlexRAIDCMD.exe"
$servicePrior = Get-Service $srvName
#"$srvName is currently " + $servicePrior.status

function ExitWithCode
{
    param
    (
        $exitcode
    )
    "Exit with code $exitcode"
    #$host.SetShouldExit($exitcode)
    #exit
}

# Run FlexRAIDCMD with the given command and return the JSON part of its output
function ExecuteFRCmd([string]$cmd, [string]$hostname="localhost", [string]$port="-", [string]$timeout="-")
{
    $error.clear()

    $pinfo = New-Object System.Diagnostics.ProcessStartInfo
    $pinfo.FileName = $flexCmd
    $pinfo.RedirectStandardError = $true
    $pinfo.RedirectStandardOutput = $true
    $pinfo.UseShellExecute = $false
    $pinfo.Arguments = "$hostname $port $timeout $cmd"
    $p = New-Object System.Diagnostics.Process
    $p.StartInfo = $pinfo
    $p.Start() | Out-Null
    # Read the output before waiting: waiting first can block forever
    # if the child process fills the redirected output buffer
    $output = $p.StandardOutput.ReadToEnd()
    $p.WaitForExit()

    if ( $error.count -eq 0)
    {
        # Clean the JSON message (remove the trailing text (=> blabla))
        $output = $output -replace '(?<First>.*)=>.+', '${First}'
    }
    else
    {
        $output = $null
    }

    return $output
}

# Return $TRUE if any FlexRAID task started since the server startup is still active
function IsAnyTaskRunning()
{
    $running = $FALSE

    $state = ExecuteFRCmd("status")

    if ($state -eq $null)
    {
        throw "Command failed to execute"
    }
    else
    {
        #Write-Host "States: $state"

        $process = $state | ConvertFrom-Json

        $message = $process.commandMessages.messageCode
        if ($message -eq "successNoProcessSinceServerStartup")
        {
            #Write-Host "No Process started since Server startup"
        }
        else
        {
            # Start from the most recent task and walk back through the older ones
            $processID = $process.status.referenceCode

            do {
                $state = ExecuteFRCmd("status "+$processID)
                $process = $state | ConvertFrom-Json

                switch ($process.status.status)
                {
                    { @("STATUS_STARTED", "STATUS_PROCESSING", "STATUS_PAUSING", "STATUS_RESUMED", "STATUS_RESUMING") -contains $_ }
                    {
                        #Write-Host task $processID - $process.status.task - is running
                        $running = $TRUE
                    }
                    { @("STATUS_COMPLETED", "STATUS_ABORTED", "STATUS_ABORTING", "STATUS_PAUSED") -contains $_ }
                    {
                        #Write-Host task $processID - $process.status.task - is not running
                    }
                }

                $processID -=1
            } while (($processID -gt 0) -and ($running -eq $FALSE))
        }
    }

    return $running
}

Write-Eventlog -Logname 'Application' -source 'FlexRAID' -eventID 1 -EntryType Warning -Category 0 -message "Graceful FlexRAID Shutdown triggered"

# Check that the script runs with Administrator privileges
$wid = [System.Security.Principal.WindowsIdentity]::GetCurrent()
$prp = new-object System.Security.Principal.WindowsPrincipal($wid)
$adm = [System.Security.Principal.WindowsBuiltInRole]::Administrator
$IsAdmin = $prp.IsInRole($adm)
if (-not $IsAdmin) {
    write-host "Current powershell process is not running with Administrator privileges"

    $message = "Graceful FlexRAID Shutdown not running with adhoc rights..."
    Write-Eventlog -Logname 'Application' -source 'FlexRAID' -eventID 1 -EntryType Error -Category 0 -message $message
    # Abort the pending shutdown
    cmd /c shutdown -a
    ExitWithCode -exitcode 2
}
elseif ($servicePrior.status -eq "Stopped")
{
    "$srvName is already " + $servicePrior.status
}
elseif ($servicePrior.status -ne "Running")
{
    "$srvName is not Running but " + $servicePrior.status
}
else
{
    $running = IsAnyTaskRunning
    if ($running -eq $TRUE)
    {
        $message = "FlexRAID process(es) still running and preventing Server to shutdown..."
        $message
        Write-Eventlog -Logname 'Application' -source 'FlexRAID' -eventID 1 -EntryType Error -Category 0 -message $message
        cmd /c shutdown -a
        ExitWithCode -exitcode 2
    }
    else
    {
        "Wait on the Storage Pool to stop. This can take a few seconds."

        $state = ExecuteFRCmd("view class1_0 stop")

        $state
        $abort = "False"

        if ( $state -eq $null)
        {
            "Storage Pool failed to stop"
            $error[0]
            $message = "FlexRaid Storage Pool failed to stop and is preventing Server to shutdown: " + $error[0]
            $message
            Write-Eventlog -Logname 'Application' -source 'FlexRAID' -eventID 1 -EntryType Error -Category 0 -message $message
            cmd /c shutdown -a
            ExitWithCode -exitcode 2
        }
        else
        {
            $process = $state | ConvertFrom-Json

            $message = $process.commandMessages.messageCode
            if ($message -eq "successStoragePoolStopped")
            {
                "Storage Pool successfuly stopped"
                Write-Eventlog -Logname 'Application' -source 'FlexRAID' -eventID 1 -EntryType Warning -Category 0 -message "Storage Pool stopped before shutting down"
            }
            else
            {
                if ($message -eq "errorNoActiveStoratePool")
                {
                    "Storage Pool actually not started"
                }
                else
                {
                    $abort = "True"
                    $event = "FlexRaid Storage Pool failed to stop, preventing Server to shutdown: " + $state
                    $event
                    Write-Eventlog -Logname 'Application' -source 'FlexRAID' -eventID 1 -EntryType Error -Category 0 -message $event
                    cmd /c shutdown -a
                    ExitWithCode -exitcode 3
                }
            }

            if ($abort -eq "False")
            {
                # The pool is stopped (or was never started): stop the service itself
                $error.clear()
                Stop-Service $srvName
                if ( $error.count -eq 0)
                {
                    Write-Host -NoNewLine "Waiting on $srvName to stop "
                    $timeout = new-timespan -Minutes 1
                    $sw = [diagnostics.stopwatch]::StartNew()
                    while (((Get-Service $srvName).status -ne "Stopped") -and ($sw.elapsed -lt $timeout))
                    {
                        Write-Host -NoNewLine "."
                        sleep 1
                    }
                    "."
                }

                $serviceAfter = Get-Service $srvName
                if ($serviceAfter.status -eq "Stopped")
                {
                    "$srvName is now " + $serviceAfter.status
                    ExitWithCode -exitcode 0
                }
                else
                {
                    "$srvName failed to stop. It is now " + $serviceAfter.status
                    ExitWithCode -exitcode 1
                }
            }
        }
    }
}

Here is the code to start FlexRaid, useful while testing.


$srvName = "FlexRAID"
$flexCmd = "C:\Program Files (x86)\FlexRAID 2.0 Client\FlexRAIDCMD.exe"
$servicePrior = Get-Service $srvName
#"$srvName is currently " + $servicePrior.status

function ExitWithCode
{
    param
    (
        $exitcode
    )
    "Exit with code $exitcode"
    #$host.SetShouldExit($exitcode)
    #exit
}

if ( ($servicePrior.status -ne "Stopped") -and ($servicePrior.status -ne "Running"))
{
    "$srvName is not Stopped but " + $servicePrior.status
}
else
{
    if ($servicePrior.status -eq "Running")
    {
        "$srvName is already " + $servicePrior.status
    }
    else
    {
        Start-Service $srvName

        Write-Host -NoNewLine "Waiting on $srvName to start "
        $timeout = new-timespan -Minutes 1
        $sw = [diagnostics.stopwatch]::StartNew()
        while (((Get-Service $srvName).status -ne "Running") -and ($sw.elapsed -lt $timeout))
        {
            Write-Host -NoNewLine "."
            sleep 1
        }
        "."
    }

    $serviceAfter = Get-Service $srvName
    if ($serviceAfter.status -eq "Running")
    {
        "$srvName is now " + $serviceAfter.status

        # The service is up: now start the Storage Pool
        $error.clear()
        "Wait on the Storage Pool to start. This can take a while."
        $stopPool = Start-Process $flexCmd -ArgumentList "localhost - - view class1_0 start" -NoNewWindow -Wait -PassThru

        if ( $error.count -eq 0)
        {
            ExitWithCode -exitcode 0
        }
        else
        {
            "Storage Pool failed to start"
            $error[0]
            ExitWithCode -exitcode 2
        }
    }
    else
    {
        "$srvName failed to start. It is now " + $serviceAfter.status
        ExitWithCode -exitcode 1
    }
}

FlexRaid on Server 2012 instead of Storage Spaces

In the past, I used the onboard RAID controller of my Home Server’s motherboard to secure its data (a RAID 5 with 6 HDD, to be more precise). But I was in urgent need of a new solution.

Indeed, I had several concerns:

  • If the motherboard died, the data would no longer be accessible, except with a replacement motherboard with the same number of identical RAID controllers.
  • If I ran out of space, new disks could not easily be added to the existing RAID array (mainly due to lack of space in the case). Also, replacing existing disks with larger ones would be quite dangerous, as duplicating the whole array before such a risky upgrade was not possible (lack of backup storage).
  • After a power failure or a BSOD, the RAID was checked (for about 12 hours), making all read access veryyyy sloooooooow.

So, I have decided to “upgrade” my Home Server with

  • A server case able to hold up to 24 HDD (with adequate controller cards)
  • A software RAID solution, to no longer rely on hardware
  • A software Pooling solution, to be able to expand the storage space seamlessly

For the Home Server Hardware part, including the case, see here.

For the Software part, I wanted to go with a Server 2012 Essentials because I really like

  • Its centralized PC-image backup feature and
  • Its centralized File History backup functionality.
  • And because I don’t want to mix operating systems in my network

Server 2012 also supports pools of disks with data redundancy, a feature named Storage Spaces. But it has several (big or not) disadvantages IMO:

  • Drives containing data may not be added into the pool :(
  • In addition, if the server dies, a disk moved into another PC will be readable but not writable, unless this other PC is also running Storage Spaces and all disks are moved. That could be an issue if the dead server cannot be quickly replaced.

So… What else ? I started to look for solutions to manage pools of disks or to manage RAID, or – better – to do both: Greyhole, SoftRaid, mhddfs, UnRAID, FlexRAID, mdadm, SnapRaid, Amahi, FreeNAS, disParity, LVM, JBOD, MooseFS, GlusterFS, ZFS, Liquesce… And I finally decided to go with FlexRaid although it is not free (but not expensive either):

  • It runs on Windows Server 2012 x64 (On the opposite, FreeNAS or Unraid for example are integrated with their own OS – linux based).
  • It has a nice Web UI (based on extJS like the DSM of my Synology) which makes remote management easy and comfortable, although IMO there is still room for improvement.
  • It supports both software RAID and Pooling (there are separate licences if you don’t want both).
  • It supports either a RealTime protection or a Snapshot mode (i.e., the RAID is updated nightly on schedule, not slowing down read/write access during the day).
  • Disks containing data can be added to the Pool at any time.
  • Disks can be temporarily removed to be accessed from within another machine. If Snapshot mode is used, data may be modified before re-plugging the disk into FlexRaid. Notice: if data are modified, they will only be protected once the disk is re-plugged into FlexRaid and the RAID is updated.
  • Disks temporarily removed can be re-plugged anywhere in the machine. FlexRaid does not rely on the physical location but on the disk mount point which is “memorized” on the disk itself.
  • It does not store any recovery information on the disks containing data. This information is stored on dedicated disks
  • It supports multiple simultaneous disk failures (it implements several types of RAID), depending on how many disks are assigned to store the recovery information.
  • It comes with a Wizard “for dummies” to easily create a pool of disks with a few default settings. It also supports an expert mode with more flexibility – but also with more complexity;
  • Disks can be replaced with larger ones.
  • As long as the FlexRaid configuration is backed up, the OS can be re-installed from scratch; the recovery information won’t have to be recomputed and the data are safe.
  • In case of crash, RAID validation is fast (I still have to check the speed of a recovery)
  • It has support for S.M.A.R.T monitoring with email alerts (although configuring S.M.A.R.T is not integrated in a Wizard and can require some research).

To be honest, IMHO, version 2.0 of this product is not yet ready for all end-users. They will have to be comfortable with server engineering themselves to solve the various issues that may occur. Also, it seems to me that only one technical guy from FlexRaid is answering questions and offering support on the official forum. He knows his product very well, but still…

As for me, I had a lot of issues while testing the product, mainly because I did a lot of operations like creating/deleting pools, adding/removing disks, sharing/un-sharing folders, stopping/starting the service, etc. But also because Murphy was at the party: after many various issues followed by a complete re-installation of the server, one disk used to store FlexRaid’s parity data started to die. If you intend to test it, I really recommend doing it in a VM with virtual drives that can be easily re-staged. FlexRaid has a 14-day trial.

Once the dead disk was removed, FlexRaid started to run fine with the following “configuration” (as named in FlexRaid). See the Attention Points before creating a new configuration:

  • 3 data disks (named DRU in FlexRaid terminology; they contain the data), each composed of 2 HDD (2TB + 3TB)
    • HDD used in DRU are not using a proprietary format but simply NTFS
    • HDD can be added to a DRU with existing data
    • HDD can be of any size, but the DRU should preferably be of the same size.
  • One parity disk (named PPU in FlexRaid terminology; it contains the recovery information) composed of 2 HDD (2TB + 3TB).
    • PPU must be at least as large as the largest DRU
      • It should preferably be larger because, in case of a bad sector on an HDD in the PPU, FlexRaid will be out of space and fail to update the parity data.
    • It’s recommended to have one PPU per 3 DRU.
  • PPU and DRU created with the wizard (Cruise Control) instead of the expert mode.
    • Merge Mode = “Auto-Folder-Priority”.
      • To optimize power consumption (most probably only one drive accessed when fetching data)
      • To keep data grouped if by any chance the disks must be moved into another PC during DRP.
    • a Snapshot mode
      • To not slow down write operation
      • As anyway, data on the Home Server does not change often at all.
  • A login/password set in “System Control Toolbox” > Login pane
  • Automatic start-up of the FlexRaid storage pool 15 seconds after the Server boots, set in “[your configuration]” > “Preferences and settings”
  • A Scheduled Task to Update the RAID (Parity Data) every day at 23:00.
  • A Scheduled Task to Validate the RAID (Parity Data) every week.
    • The Validate task does both change detection and datarot (silent data corruption) detection through data checksum validation.
  • A Scheduled Task to Verify the RAID (Parity Data) every month.
    • The Verify task does bit for bit verification of the RAID.

To be safe however, and based on the number of disks I have, someone from FlexRaid told me I should either use two PPU of 3TB or possibly add one PPU of 5TB. Doing so, I will double the security level (the RAID could survive 2 simultaneous disk failures) while only losing ~6% of space (1TB). For sure, RAID updates will be slower, as 2 PPU will have to be updated. But based on my experience, it’s really not that much slower. It takes about 1 hour per TB of data.

Here is a benchmark of accessing data in FlexRaid’s pool configured with one 5TB PPU and 3x5TB DRU (and I got the same results for two 3TB PPU and 4x3TB DRU…). Clearly, FlexRaid with Snapshot mode does not really impact the performance… (Actually, one of the Pool features has an impact.)
FlexRaid Snapshot Performance

Here is the same benchmark for accessing data in an equivalent Pool in Real-Time mode:

FlexRaid Real-Time  Performance

Writing small files does not perform well in Real-Time mode, but that’s due to the accesses to the FlexRaid Configuration Database (on C:\). If that drive is an SSD, performance should improve noticeably (support for storing that DB on another disk is foreseen…)

For information: it took a little less than 13 hours to compute the parity for about 11TB of data in this 15TB pool.
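To turn the two figures quoted above (about 1 hour per TB, and a little under 13 hours for about 11TB) into a rough estimate for another pool, here is a minimal Python sketch. The function names and the assumption that the duration scales linearly with the amount of data are mine; FlexRaid does not document this.

```python
def observed_hours_per_tb(hours: float, data_tb: float) -> float:
    """Derive an hours-per-TB rate from one measured parity run."""
    if data_tb <= 0:
        raise ValueError("data_tb must be > 0")
    return hours / data_tb


def estimate_parity_hours(data_tb: float, hours_per_tb: float = 1.0) -> float:
    """Rough estimate of a parity run, assuming linear scaling (an assumption)."""
    if data_tb < 0 or hours_per_tb <= 0:
        raise ValueError("data_tb must be >= 0 and hours_per_tb must be > 0")
    return data_tb * hours_per_tb


if __name__ == "__main__":
    # Rate measured on my pool: a bit less than 13 hours for about 11TB.
    rate = observed_hours_per_tb(13, 11)
    print(f"measured rate: {rate:.2f} h/TB")
    print(f"estimate for 15TB of data: {estimate_parity_hours(15, rate):.1f} h")
```

The real duration will depend on the disks, the controller and the number of PPU, so treat the result as an order of magnitude only.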

ATTENTION POINTS:

Before starting:

  • Drives used by FlexRaid must not be used as target for Paging or Shadow Copy.
    • However, Shadow Copy can be used as long as the volume used as target for the Storage Location is on a dedicated drive outside of the Pool.
      • Shadow Copy must be enabled:
        1. On the Start Screen, type Computer and run Computer Management
        2. Right-click Shared Folders > All Tasks > Configure Shadow Copies…
        3. Enable for all source drives (DRU) but not for the virtual drive (Pool)
      • And the Storage Location must be configured on each disk
        • Via the Disk Management node in Computer Management: Properties > Shadow Copies tab > Settings button > Storage Area (click Details) (to be completed).
        • Configured via command lines (to be completed)
  • The Recycle Bin must not be used on drives used by FlexRaid, nor on the virtual drive (pool)
    • Instead, FlexRaid’s proprietary Recycle Bin can be enabled, in Snapshot mode only (there is no such support for Real-Time mode): Configuration > Preferences and Settings > Advanced Properties: Enable Recycle bin mode: true (and Save).
    • Also, make sure this registry key exists or you could experience “Recycle Bin on V: is corrupted” errors
      • HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\BitBucket (For 32 bit Windows)
      • HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Windows\CurrentVersion\Explorer\BitBucket (For 64 bit Windows)
    • And finally, disable the Recycle Bin on all the drives intended to be managed by FlexRaid (as well as on the virtual drive)
      1. Using Windows Explorer, navigate to each drive, create a dummy file and then delete it. This ensures that a Recycle Bin gets created.
      2. After doing the above for each drive, empty the Recycle Bin
      3. Right-click on the Windows Recycle Bin icon and choose Properties
      4. There, disable the Recycle Bin for each of your drives
      5. Reboot
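The registry check mentioned above (the BitBucket key) can be sketched in Python as follows. This is only an illustration: the helper names are hypothetical, the 32/64-bit mapping is simply the one listed in the bullets, and the script only checks that the key exists (it does not create it). It uses the standard winreg module, so the check itself runs on Windows only.

```python
import sys

# Sub-keys under HKEY_LOCAL_MACHINE, as listed in the bullets above.
BITBUCKET_SUBKEYS = {
    32: r"SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\BitBucket",
    64: r"SOFTWARE\Wow6432Node\Microsoft\Windows\CurrentVersion\Explorer\BitBucket",
}


def bitbucket_subkey(windows_bits: int) -> str:
    """Return the BitBucket sub-key to check for a 32 or 64 bit Windows."""
    if windows_bits not in BITBUCKET_SUBKEYS:
        raise ValueError("windows_bits must be 32 or 64")
    return BITBUCKET_SUBKEYS[windows_bits]


def bitbucket_key_exists(windows_bits: int) -> bool:
    """Return True if the BitBucket key exists (Windows only)."""
    if sys.platform != "win32":
        raise OSError("the Windows registry is only available on Windows")
    import winreg  # imported here so the module still loads on other systems

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, bitbucket_subkey(windows_bits)):
            return True
    except FileNotFoundError:
        return False


if __name__ == "__main__" and sys.platform == "win32":
    print("BitBucket key present:", bitbucket_key_exists(64))
```

If the function returns False, the key can be created manually with regedit before enabling FlexRaid’s own Recycle Bin.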
Also notice:
  • The FlexRaid web UI is not fully compatible with Chrome. Some expandable panes of this UI collapse and expand forever once accessed.
  • The logs are full of security errors. It’s simply due to the Web Client UI session expiring… It’s not fatal.
    • To change the log level, edit the file C:\Program Files (x86)\<FlexRAID folder>\logging.options.txt
      • FLEXRAID_LOG_LEVEL=TRACE
      • FLEXRAID_SYS_LOG_LEVEL=TRACE
    • Also change the logs location, as many files will be created at TRACE level
      • FLEXRAID_LOG_FILE_ROOT=C:\FlexRaidLogs\
  • The path of the log in Log4j.properties must be like C:/FlexRaidLogs/
  • I always experience issues when trying to remove disks from PPU or DRU. The only solution I found is to delete the configuration and create a new one.
  • Once a disk is added in a PPU or DRU, it’s mounted by FlexRaid in a hidden folder under C:\FlexRAID-Managed-Pool\…
    • This folder is only accessible by the System account which is the account configured to run the FlexRaid Service.
    • After deleting a “configuration” or uninstalling FlexRaid, this folder is still there.
    • To view it, you must
      1. Start Explorer, click on the “View” menu and select “Options” (on the extreme right).
      2. There, go to the “View” tab and tick “Show hidden files, folders and drives”
      3. Uncheck “Hide protected operating system files (recommended)”.
    • Once the folder is visible, you can change the Security settings and grant full access rights to the “Administrators” group. Possibly use this great tip to also easily take back ownership through a context menu (works only on files/folders, not on drives).
  • Hidden files are not protected because FlexRaid ignores them.
  • When deleting a “configuration”, disks that were assigned a letter before being added to a PPU or DRU get that letter back. In my case, however, the disks were mounted on folders. After deletion of the “configuration”, they should have been re-mounted on their original folders (according to someone from FlexRaid), but this didn’t occur: I had to re-mount them myself one by one. I have not received any valid explanation yet…
  • When managing Shares and Permissions via FlexRaid UI:
    • You must use a username defined in the domain (on Server 2012 Essentials, a domain is always installed by default and is mandatory for various services). However, you don’t have to prefix that username with the domain name (i.e., <domain>\<username>)
    • The latest changes made via the UI sometimes do not appear immediately in the Windows Properties (in the “Advanced Sharing” tab of the folder). E.g., if you delete permissions for a user, you must:
      • Close the folder Properties window if already open in Windows.
      • Navigate to the “Home” menu and back to “Server Shares” in FlexRaid UI.
      • Back in the folder Properties > Advanced Sharing, the changes should now be visible.
    • Don’t forget that you need to use a domain user to access the shares from a remote machine. If you try to access them from a PC not joined to the domain, you must provide a login like “<domain>\<username>”. Also remember that Server 2012 Essentials is missing “HomeGroup” support.
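Coming back to the log settings listed earlier under “Also notice”: here is a small Python sketch to apply them without editing logging.options.txt by hand. It assumes the file is a plain list of KEY=VALUE lines, which is what the entries shown above suggest but which I have not seen documented; the function name is hypothetical. Make a copy of the file before overwriting it.

```python
from typing import Dict


def set_options(text: str, updates: Dict[str, str]) -> str:
    """Return `text` with the given KEY=VALUE options updated.

    The first line defining a key is rewritten; keys that are not
    present yet are appended at the end. Other lines are kept as-is.
    """
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        key, sep, _old_value = line.partition("=")
        key = key.strip()
        if sep and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "FLEXRAID_LOG_LEVEL=INFO\n"  # hypothetical current content
    print(set_options(sample, {
        "FLEXRAID_LOG_LEVEL": "TRACE",
        "FLEXRAID_SYS_LOG_LEVEL": "TRACE",
        "FLEXRAID_LOG_FILE_ROOT": "C:\\FlexRaidLogs\\",
    }))
```

To use it on the real file, read its content, pass it through set_options and write the result back, then restart the FlexRaid service.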
Important remarks:
  • When using “Snapshot” mode, data is at risk from the moment files start being modified until the RAID is next updated. Indeed, once some data has been modified, the information on the PPU can no longer be used to restore files in the same address range (?) of a DRU that crashes. Concretely, files restored from an out-of-date PPU will be corrupted.
    • I didn’t find any information confirming (or not) that a disk is more likely to fail during write operations. But as I plan to back up data to my server often, I am afraid I wouldn’t be able to restore my files after a crash occurring during such an operation… So I will investigate the RealTime mode, which is still experimental.
  • When using a RealTime Mode
    • Only use software that preallocates files to copy data into the pool (e.g.: Windows Explorer but not TeraCopy!!!)
    • There must be at least 10GB of free space on each disk that will be added to the Pool (as DRU).
    • Check that the Reserve is at least 50GB in Configuration > Preferences and Settings > Run-Time Properties > General Properties: Reserve. This is the default when using the Cruise Control mode.
    • After a server crash, a Reconcile is required (Similar to a Windows Disk Scan).
    • Always stop the storage pool through the Web UI before restarting the FlexRAID service or your OS!!!
      • This can be automated with a Shutdown Task created in the Local Group Policy. See the script posted in the comments.
    • Increase the WaitToKillServiceTimeout Registry key value to 300000 (it’s 5000 by default; i.e. Windows will kill the service in as little as 5 seconds, which often does not leave FlexRAID enough time to properly close its resources).
      • HKEY_LOCAL_MACHINE \ System \ CurrentControlSet \ Control
    • Never write directly to the “source” drives, only through the pool. Note that Explorer silently writes data to the disks, which means that a drive may never be temporarily removed from the pool, accessed from another PC (even for “READ” only operations) and re-plugged into the pool later.
    • There is no Recycle Bin, meaning that data are deleted permanently.
      • And unfortunately, Shadow Copies can NOT be used on the Pool… so they cannot be used to restore deleted files either.
      • Actually, the Recycle Bin must be disabled on all disks participating in the pool as well as on the pool itself, and that must be done for all accounts connected to the server.
      • As a replacement, the Undeluxe Pro software can be used. It can run as a service (starting with Windows) and will move deleted files into a folder that can be configured to be located on a drive outside the FlexRAID Pool… As an alternative, FlexRAID will come with a universal recycle bin in a future version…
    • It could be advisable, for the Real-Time mode only, to disable thumbs.db file generation, even though those files are hidden and therefore not taken into account (i.e., not protected by FlexRaid). To do so:
      1. On the Start Screen, type and run gpedit.msc.
      2. Expand User Configuration – Administrative Templates – Windows Components.
      3. Click on File Explorer.
      4. Right-click the entry “Turn off the caching of thumbnails in hidden thumbs.db files” and choose Edit.
      5. Enable the setting.
      6. Log off and back on again (or reboot) after making these changes.
      7. Delete all the Thumbs.db files from your drives using this command: Del {Drive Letter}:\Thumbs.db /f/s/q/a
    • Actually the options “Restrict Thumbs.db” and “Restrict Desktop.ini” can be used in the Console for that purpose, but it is useful only when using the Real-Time RAID in Expert mode and if frequently browsing the source drives. Indeed, although we are not changing anything on the source drives, Explorer will actually update the thumbs.db and desktop.ini files as we browse. There are implications though with selecting those options as Explorer will no longer cache your thumbnails or remember certain folder view preferences.
    • Parity data are only updated when accessing the drives through the Pool. If data are modified directly on the source drives, you will have to fix that by running the Reconcile task in FlexRaid.
  • That being said, the following Windows features are safe to use with FlexRaid
    • Windows Search Service
    • Windows Backup
    • BITS (Background Intelligent Transfer Service)
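For the Thumbs.db clean-up described in the Real-Time remarks above, the Del command does the job. If you prefer to see what would be deleted first, here is a rough Python equivalent with a dry-run mode. It is a sketch only: the function name is hypothetical and it has not been tested against a FlexRaid pool.

```python
import os
import stat
from typing import List


def delete_thumbs_db(root: str, dry_run: bool = True) -> List[str]:
    """List (and, unless dry_run, delete) every Thumbs.db file under root."""
    found = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            if name.lower() == "thumbs.db":
                path = os.path.join(folder, name)
                found.append(path)
                if not dry_run:
                    # Clear a possible read-only flag first (like Del /f).
                    os.chmod(path, stat.S_IWRITE)
                    os.remove(path)
    return found


if __name__ == "__main__":
    # Dry run on the current folder: only prints what would be deleted.
    for path in delete_thumbs_db("."):
        print("would delete:", path)
```

Call it with dry_run=False once the list looks right, and run it on the source drives only while the pool is stopped.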

Site: http://www.flexraid.com/

Documentation: http://wiki.flexraid.com/

Support: http://forum.flexraid.com/ (supports Tapatalk ;))