====== NAS - Build a Linux NAS - Tune the System ======
There are many settings that can be tuned.
As there are many factors than can affect tuning performance, it is recommended to perform a number of tests using different figures to determine the optimum values:
* Create a test file that will be used for reading and writing tests.
* Perform each test using **dd**.
dd if=testfile of=/dev/null bs=8k
* Between each test, clear the OS disk cache.
sync;echo 3 > /proc/sys/vm/drop_caches
**NOTE:** Vary the size of the test file depending on how much RAM the system has.
* It is suggested to have the test file be large, such as 32G.
* Consider changing the blocksize used with dd to a larger value, say bs=16k, if there is a lot of RAM.
**WARNING:** Some changes may result in a bit of extra memory being used.
* After each command, the amount of memory being used can be checked with:
free -m
returns:
total used free shared buff/cache available
Mem: 64326 22064 5354 1366 36907 40203
Swap: 975 315 660
----
===== Tuning stripe_cache_size =====
**stripe_cache_size** affects RAM used by mdadm for writing of data.
Ubuntu default value is 256.
Verify the current value:
cat /sys/block/md0/md/stripe_cache_size
Change it:
echo *number* > /sys/block/md0/md/stripe_cache_size
or
echo *number* | sudo tee /sys/block/md0/md/stripe_cache_size
**NOTE:** Test with different sizes.
* Test with stripe_cache_size=256
* Test with stripe_cache_size=512
* Test with stripe_cache_size=1024
* Test with stripe_cache_size=2048
* Test with stripe_cache_size=4096
* Test with stripe_cache_size=8192
* Test with stripe_cache_size=16384
* Test with stripe_cache_size=32768
At each test, check the **Write** performance, for example:
dd if=testfile of=/dev/null bs=8k
361680+0 records in
361680+0 records out
2962882560 bytes (3.0 GB) copied, 0.92471 s, 3.2 GB/s
Read speed test:
dd if=testfile of=/dev/zero
Write speed test:
dd if=/dev/zero of=testfile
Also check block devices:
blockdev --report
----
===== Tuning Read Ahead =====
Changing the read ahead value should impact **read** performance.
* Default read ahead value is **1536**.
Change it::
blockdev --setra *number* /dev/md0
**NOTE:** Test with different sizes.
* Test with Read Ahead @ 1536
* Test with Read Ahead @ 4096
* Test with Read Ahead @ 32768
* Test with Read Ahead @ 262144
* Test with Read Ahead @ 524288
At each test, check the **Read** performance, for example:
dd if=testfile of=/dev/null bs=8k
361680+0 records in
361680+0 records out
2962882560 bytes (3.0 GB) copied, 0.92471 s, 3.2 GB/s
Read speed test:
dd if=testfile of=/dev/zero
Write speed test:
dd if=/dev/zero of=testfile
Also check block devices:
blockdev --report
----
===== Script to auto-determine best tuning changes =====
#!/bin/bash
#
# Please note this test requires 30 GB of free space
# in your RAID md device
#
# The aim of this script is to find the best settings for performance
# of your RAID by testing each setting separately.
# This script does make some system modification, but if you don't
# make these changes permanent (e.g. write them in /etc/rc.local)
# At the next boot all the changes will be lost,
# so fill free to play with it!
#
# developed by alfonso / Jan 2012
#
#
# this is your mount point for the RAID
# PLEASE NOTE the script will REMOVE any file called testfile*.out in this folder!!
MNT=/mnt/storage
# this is device from which to get input
# no need to change this
INPUT=/dev/zero
# test for privileges
if [ "$(whoami)" != 'root' ]
then
echo Need to be root!
echo ABORT
exit 1
fi
if ! [ -d $MNT/lost+found ]
then
echo
echo "$MNT is not a file system! Something went wrong?"
echo ABORT
exit 1
fi
# find out which one is your md
# note that the script only works for one md. If you have more than one
# just uncomment the line below and type something like MDDEV=md0
MDDEV="`cat /proc/mdstat | grep md | head -1 | awk '{print $1}'`"
# MDDEV=md0
if [ -z "$MDDEV" ]
then
echo
echo "I can\'t find any md"
echo ABORT
exit 1
fi
#
# get the letter of all devices from cat /proc/mdstat
#
# this expression takes the output of /proc/mdstat
# then takes the line of our md
# then changes spaces into new lines
# then takes only lines starting with sd
# then take the 3rd character (a for sda1, etc)
# and then remove new lines to make a single string
DEVS="`cat /proc/mdstat | grep $MDDEV | tr " " "\n" | grep '^sd' | awk '{print substr($0,3,1)}' | tr -d "\n"`"
echo "These are devices found in $MDDEV: $DEVS"
function test_write()
{
# writing tests:
echo -n .
hdparm -f /dev/sd[$DEVS] > /dev/null
WRTE1=`dd if=$INPUT of=$MNT/testfile1.out bs=100kB count=100000 2>&1 | grep copied`
WRUN1=`echo $WRTE1 | awk ' { print ( $(NF) ) }'`
WRSP1=`echo $WRTE1 | awk ' { print ( $(NF-1) ) }'`
if [ $WRUN1 != "MB/s" ];
then
echo
echo "This script was created for all speeds measured in MB/s"
echo ABORT
exit 1
fi
echo -n .
hdparm -f /dev/sd[$DEVS] > /dev/null
WRTE2=`dd if=$INPUT of=$MNT/testfile2.out bs=1MB count=10000 2>&1 | grep copied`
WRUN2=`echo $WRTE2 | awk ' { print ( $(NF) ) }'`
WRSP2=`echo $WRTE2 | awk ' { print ( $(NF-1) ) }'`
if [ $WRUN2 != "MB/s" ];
then
echo
echo "This script was created for all speeds measured in MB/s"
echo ABORT
exit 1
fi
echo -n .
hdparm -f /dev/sd[$DEVS] > /dev/null
WRTE3=`dd if=$INPUT of=$MNT/testfile3.out bs=10MB count=1000 2>&1 | grep copied`
WRUN3=`echo $WRTE3 | awk ' { print ( $(NF) ) }'`
WRSP3=`echo $WRTE3 | awk ' { print ( $(NF-1) ) }'`
if [ $WRUN3 != "MB/s" ];
then
echo
echo "This script was created for all speeds measured in MB/s"
echo ABORT
exit 1
fi
AVG_WRITE=`echo "($WRSP1+$WRSP2+$WRSP3)*100/3" | bc`
#echo " Average write is $AVG_WRITE MB/s NOTE: there should be a dot before the last 2 digits"
echo " average write is `echo "scale=2; $AVG_WRITE / 100;" | bc` MB/s"
# echo $WRTE1
# echo $WRTE2
# echo $WRTE3
}
function test_read()
{
# reading tests:
echo -n .
hdparm -f /dev/sd[$DEVS] > /dev/null
READ1=`dd if=$MNT/testfile1.out of=/dev/null bs=100kB count=100000 2>&1 | grep copied`
RDUN1=`echo $READ1 | awk ' { print ( $(NF) ) }'`
RDSP1=`echo $READ1 | awk ' { print ( $(NF-1) ) }'`
if [ $RDUN1 != "MB/s" ];
then
echo
echo "This script was created for all speeds measured in MB/s"
echo ABORT
exit 1
fi
echo -n .
hdparm -f /dev/sd[$DEVS] > /dev/null
READ2=`dd if=$MNT/testfile2.out of=/dev/null bs=1MB count=10000 2>&1 | grep copied`
RDUN2=`echo $READ2 | awk ' { print ( $(NF) ) }'`
RDSP2=`echo $READ2 | awk ' { print ( $(NF-1) ) }'`
if [ $RDUN2 != "MB/s" ];
then
echo
echo "This script was created for all speeds measured in MB/s"
echo ABORT
exit 1
fi
echo -n .
hdparm -f /dev/sd[$DEVS] > /dev/null
READ3=`dd if=$MNT/testfile3.out of=/dev/null bs=10MB count=1000 2>&1 | grep copied`
RDUN3=`echo $READ3 | awk ' { print ( $(NF) ) }'`
RDSP3=`echo $READ3 | awk ' { print ( $(NF-1) ) }'`
if [ $RDUN3 != "MB/s" ];
then
echo
echo "This script was created for all speeds measured in MB/s"
echo ABORT
exit 1
fi
AVG_READ=`echo "($RDSP1+$RDSP2+$RDSP3)*100/3" | bc`
#echo " Average read is $AVG_READ MB/s NOTE: there should be a dot before the last 2 digits"
echo " average read is `echo "scale=2; $AVG_READ / 100;" | bc` MB/s"
#echo $READ1
#echo $READ2
#echo $READ3
}
echo
echo CURRENT SYSTEM SETTINGS
echo your current value of /sys/block/$MDDEV/md/stripe_cache_size is `cat /sys/block/$MDDEV/md/stripe_cache_size`
echo your current value of disk readahead is `blockdev --getra /dev/sd[$DEVS]`
echo your current value of md readahead is `blockdev --getra /dev/$MDDEV`
DEVINDEX=0
NUMDEVS=${#DEVS}
until [ $DEVINDEX -ge $NUMDEVS ]
do
DEVLETTER=${DEVS:$DEVINDEX:1}
echo your current value of /sys/block/sd$DEVLETTER/queue/max_sectors_kb is `cat /sys/block/sd$DEVLETTER/queue/max_sectors_kb`
DEVINDEX=$[$DEVINDEX+1]
done
echo
for i in 1 2 3 4
#for i in 1
# 1 when testing /sys/block/$MDDEV/md/stripe_cache_size
# 2 when testing disk readahead
# 3 when testing md readahead
# 4 when testing /sys/block/sdX/queue/max_sectors_kb
do
BEST_WRITE=0
BEST_WRITE_ID=0
WORST_WRITE=0
WORST_WRITE_ID=0
BEST_READ=0
BEST_READ_ID=0
WORST_READ=0
WORST_READ_ID=0
for j in 64 128 256 512 1024 2048 4096 8192 16384
# for j in 64 16384
do
#echo
#echo SYSTEM SETTINGS
#echo your current value of /sys/block/$MDDEV/md/stripe_cache_size is `cat /sys/block/$MDDEV/md/stripe_cache_size`
#echo your current value of disk readahead is `blockdev --getra /dev/sd[$DEVS]`
#echo your current value of md readahead is `blockdev --getra /dev/$MDDEV`
#DEVINDEX=0
#NUMDEVS=${#DEVS}
#until [ $DEVINDEX -ge $NUMDEVS ]
#do
# DEVLETTER=${DEVS:$DEVINDEX:1}
# echo your current value of /sys/block/sd$DEVLETTER/queue/max_sectors_kb is `cat /sys/block/sd$DEVLETTER/queue/max_sectors_kb`
# DEVINDEX=$[$DEVINDEX+1]
#done
#echo
case "$i" in
1) echo "We are testing md stripe_cache_size"
echo $j > /sys/block/$MDDEV/md/stripe_cache_size
echo "step 1/4: NOW your current value of /sys/block/$MDDEV/md/stripe_cache_size is `cat /sys/block/$MDDEV/md/stripe_cache_size`"
;;
2) echo "We are testing disks readahead"
blockdev --setra $j /dev/sd[$DEVS]
echo "step 2/4: NOW your current value of disk readahead is `blockdev --getra /dev/sd[$DEVS]`"
;;
3) echo "We are testing md readahead"
blockdev --setra $j /dev/$MDDEV
echo "step 3/4 NOW your current value of md readahead is `blockdev --getra /dev/$MDDEV`"
;;
4) echo "We are testing disks max_sectors_kb"
DEVINDEX=0
NUMDEVS=${#DEVS}
until [ $DEVINDEX -ge $NUMDEVS ]
do
DEVLETTER=${DEVS:$DEVINDEX:1}
echo $j > /sys/block/sd$DEVLETTER/queue/max_sectors_kb
echo "step 4/4 NOW your current value of /sys/block/sd$DEVLETTER/queue/max_sectors_kb is `cat /sys/block/sd$DEVLETTER/queue/max_sectors_kb`"
DEVINDEX=$[$DEVINDEX+1]
done
;;
*) echo "This text should never appear"
echo ABORT
exit 1
;;
esac
rm $MNT/testfile*.out 2> /dev/null
test_write
if [ "$BEST_WRITE" -eq "0" ]
then
#echo 1st test BEST_WRITE
BEST_WRITE=$AVG_WRITE
BEST_WRITE_ID=$j
fi
if [ "$WORST_WRITE" -eq "0" ]
then
#echo 1st test WORST_WRITE
WORST_WRITE=$AVG_WRITE
WORST_WRITE_ID=$j
fi
if [ "$AVG_WRITE" -ge "$BEST_WRITE" ]
then
echo "found new best write - old: `echo "scale=2; $BEST_WRITE / 100;" | bc` new: `echo "scale=2; $AVG_WRITE / 100;" | bc`"
#echo "old: $BEST_WRITE new: $AVG_WRITE"
BEST_WRITE=$AVG_WRITE
BEST_WRITE_ID=$j
fi
if [ "$AVG_WRITE" -le "$WORST_WRITE" ]
then
echo "found new worst write - old: `echo "scale=2; $WORST_WRITE / 100;" | bc` new: `echo "scale=2; $AVG_WRITE / 100;" | bc`"
#echo old: $WORST_WRITE new: $AVG_WRITE
WORST_WRITE=$AVG_WRITE
WORST_WRITE_ID=$j
fi
test_read
if [ "$BEST_READ" -eq "0" ]
then
#echo 1st test BEST_READ
BEST_READ=$AVG_READ
BEST_READ_ID=$j
fi
if [ "$WORST_READ" -eq "0" ]
then
#echo 1st test WORST_READ
WORST_READ=$AVG_READ
WORST_READ_ID=$j
fi
if [ "$AVG_READ" -ge "$BEST_READ" ]
then
echo "found new best read - old: `echo "scale=2; $BEST_READ / 100;" | bc` new: `echo "scale=2; $AVG_READ / 100;" | bc`"
#echo old: $BEST_READ new: $AVG_READ
BEST_READ=$AVG_READ
BEST_READ_ID=$j
fi
if [ "$AVG_READ" -le "$WORST_READ" ]
then
echo "found new worst read - old: `echo "scale=2; $WORST_READ / 100;" | bc` new: `echo "scale=2; $AVG_READ / 100;" | bc`"
#echo old: $WORST_READ new: $AVG_READ
WORST_READ=$AVG_READ
WORST_READ_ID=$j
fi
rm $MNT/testfile1.out
rm $MNT/testfile2.out
rm $MNT/testfile3.out
done
echo BEST_WRITE is $BEST_WRITE
echo BEST_WRITE_ID is $BEST_WRITE_ID
echo WORST_WRITE is $WORST_WRITE
echo WORST_WRITE_ID is $WORST_WRITE_ID
echo BEST_READ is $BEST_READ
echo BEST_READ_ID is $BEST_READ_ID
echo WORST_READ is $WORST_READ
echo WORST_READ_ID is $WORST_READ_ID
# now we want to understand if this test affected more READ or WRITE performances
DIFF_WRITE=$[ BEST_WRITE - WORST_WRITE ]
DIFF_READ=$[ BEST_READ - WORST_READ ]
if [ "$DIFF_READ" -gt "$DIFF_WRITE" ]
then
echo this test affected more READ than WRITE
BEST_OVERALL_ID=$BEST_READ_ID
WORST_OVERALL_ID=$WORST_READ_ID
else
echo this test affected more WRITE than READ
BEST_OVERALL_ID=$BEST_WRITE_ID
WORST_OVERALL_ID=$WORST_WRITE_ID
fi
case "$i" in
1) echo "$BEST_OVERALL_ID is the OPTIMAL value for md stripe_cache_size"
BEST_1_ID=$BEST_OVERALL_ID
echo $BEST_OVERALL_ID > /sys/block/$MDDEV/md/stripe_cache_size
;;
2) echo "$BEST_OVERALL_ID is the OPTIMAL value for disks readahead"
BEST_2_ID=$BEST_OVERALL_ID
blockdev --setra $BEST_OVERALL_ID /dev/sd[$DEVS]
;;
3) echo "$BEST_OVERALL_ID is the OPTIMAL value for md readahead"
BEST_3_ID=$BEST_OVERALL_ID
blockdev --setra $BEST_OVERALL_ID /dev/$MDDEV
;;
4) echo "$BEST_OVERALL_ID is the OPTIMAL value for max_sectors_kb"
BEST_4_ID=$BEST_OVERALL_ID
DEVINDEX=0
NUMDEVS=${#DEVS}
until [ $DEVINDEX -ge $NUMDEVS ]
do
DEVLETTER=${DEVS:$DEVINDEX:1}
echo $BEST_OVERALL_ID > /sys/block/sd$DEVLETTER/queue/max_sectors_kb
DEVINDEX=$[$DEVINDEX+1]
done
;;
*) echo "This text should never appear"
echo ABORT
exit 1
;;
esac
done
echo the best for md stripe_cache_size is $BEST_1_ID
echo the best for disks readahead is $BEST_2_ID
echo the best for md readahead is $BEST_3_ID
echo the best for max_sectors_kb is $BEST_4_ID
echo
echo "Add the following lines to your /etc/rc.local"
echo
echo "echo $BEST_1_ID > /sys/block/$MDDEV/md/stripe_cache_size"
echo "blockdev --setra $BEST_2_ID /dev/sd[$DEVS]"
echo "blockdev --setra $BEST_3_ID /dev/$MDDEV"
DEVINDEX=0
NUMDEVS=${#DEVS}
until [ $DEVINDEX -ge $NUMDEVS ]
do
DEVLETTER=${DEVS:$DEVINDEX:1}
echo "echo $BEST_4_ID > /sys/block/sd$DEVLETTER/queue/max_sectors_kb"
DEVINDEX=$[$DEVINDEX+1]
done
exit 0
----
===== Make these tuning changes permanent =====
There are many ways to allow this.
The suggested approach is to create a file **/etc/profile.d/raid_tuning.sh**:
echo *number* | sudo tee /sys/block/md0/md/stripe_cache_size
blockdev --setra *number* /dev/md0
For example:
echo 8192 > /sys/block/md0/md/stripe_cache_size
echo 256 > /sys/block/sdb/queue/max_sectors_kb
echo 256 > /sys/block/sdc/queue/max_sectors_kb
echo 256 > /sys/block/sdd/queue/max_sectors_kb
echo 256 > /sys/block/sde/queue/max_sectors_kb
blockdev --setra 64 /dev/sd[bcde]
blockdev --setra 16384 /dev/md0
**NOTE:** The files within **/etc/profile.d/** are automatically run at boot, which will result in these tuning options being set at each boot.
----
===== References =====
https://h3x.no/2011/07/09/tuning-ubuntu-mdadm-raid56
https://ubuntuforums.org/showthread.php?t=1494846
https://ubuntuforums.org/showthread.php?p=11646960
https://askubuntu.com/questions/20852/making-stripe-cache-size-permanent