On Wed, 18 Feb 2015 15:04:53 +0000 (UTC) Chris wrote:

> Hello,
>
> by adapting what I could find, I compiled the following short snippet now.
>
> Could list members please look at this novice code and suggest a way to
> determine the containing disk device $HDD_DEV from the partition/disk,
> before I dare to test this.
>
> In udev-md-raid-assembly.rules, below LABEL="md_inc" (the section only
> handling md-supported devices) add:
>
> # fix timeouts for redundant raids, if possible
> IMPORT{program}="BINDIR/mdadm --examine --export $tempnode"
> TEST="/usr/sbin/smartctl", ENV{MD_LEVEL}=="raid[1-9]*",
> RUN+="BINDIR/mdadm-erc-timout-fix.sh $tempnode"

It might make sense to have two rules, one for partitions and one for
disks (based on ENV{DEVTYPE}). Then use $parent to get the device from
the partition, and $devnode to get the device of the disk.

>
> And in a new mdadm-erc-timout-fix.sh file implement:
>
> #! /bin/sh
>
> HDD_DEV= $1 somehow stripping off the trailing numbers?
>
> if smartctl -l scterc ${HDD_DEV} | grep -q Disabled ; then
>     /usr/sbin/smartctl -l scterc,70,70 ${HDD_DEV}
> else
>     if ! smartctl -l scterc ${HDD_DEV} | grep -q seconds ; then
>         echo 180 >/sys/block/${HDD_DEV}/device/timeout
>     fi
> fi

You should be consistent and use /usr/sbin/smartctl everywhere, or
explicitly set $PATH and just use smartctl everywhere.

>
> Correct execution during boot would seem to require that distro
> package managers hook smartctl and the script into the initramfs
> generation.
>
> Regards,
> Chris

One problem with this approach is that it assumes circumstances don't
change. If you have a working RAID1, then limiting the timeout on both
devices makes sense. If you have a degraded RAID1 with only one device
left, then you really want the drive to try as hard as it can to get
the data.

There is a "FAILFAST" mechanism in the kernel which allows the
filesystem, md, etc. to indicate that they want accesses to "fail fast",
which presumably means using a shorter timeout.
I would rather md used this flag where appropriate, and that the device
responded to it by using suitable timeouts. The problem is that
FAILFAST isn't documented usefully and it is very hard to figure out
what exactly (if anything) it does.

But until that is resolved, a fix like this is probably a good idea.

NeilBrown
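Combining the two suggestions above (one udev rule per ENV{DEVTYPE}, and a single absolute smartctl path throughout), the helper script might look like the sketch below. It is a hypothetical illustration, not something mdadm ships. It assumes $parent expands to a bare kernel name such as sda while $devnode expands to /dev/sda, so the script accepts either form and derives the /sys/block name from it. The helpers hdd_name and erc_fix, and the SMARTCTL and SYSFS overrides, are assumptions added only so the logic can be exercised without real disks. The commented rules also use TEST==, since TEST is a match key.

```shell
#!/bin/sh
# mdadm-erc-timout-fix.sh: hypothetical sketch, not part of mdadm.
#
# Hypothetical rules for udev-md-raid-assembly.rules, below LABEL="md_inc",
# split on ENV{DEVTYPE} so the script always receives the whole disk:
#
#   IMPORT{program}="BINDIR/mdadm --examine --export $tempnode"
#   ENV{DEVTYPE}=="partition", TEST=="/usr/sbin/smartctl", \
#     ENV{MD_LEVEL}=="raid[1-9]*", RUN+="BINDIR/mdadm-erc-timout-fix.sh $parent"
#   ENV{DEVTYPE}=="disk", TEST=="/usr/sbin/smartctl", \
#     ENV{MD_LEVEL}=="raid[1-9]*", RUN+="BINDIR/mdadm-erc-timout-fix.sh $devnode"

# One absolute smartctl path everywhere. The SMARTCTL and SYSFS overrides
# are an assumption added here so the script can be tested without disks.
SMARTCTL=${SMARTCTL:-/usr/sbin/smartctl}
SYSFS=${SYSFS:-/sys}

# Accept "sda" or "/dev/sda" and print the bare kernel name "sda".
hdd_name() {
    printf '%s\n' "${1#/dev/}"
}

erc_fix() {
    name=$(hdd_name "$1")
    dev=/dev/$name
    # Query the SCT Error Recovery Control setting once; smartctl may exit
    # non-zero for unsupported commands, so only its output is inspected.
    erc=$("$SMARTCTL" -l scterc "$dev" 2>/dev/null)
    if printf '%s\n' "$erc" | grep -q Disabled; then
        # ERC supported but off: cap read/write recovery at 7.0 s
        # (smartctl takes the values in units of 100 ms).
        "$SMARTCTL" -l scterc,70,70 "$dev"
    elif ! printf '%s\n' "$erc" | grep -q seconds; then
        # No usable ERC: raise the kernel's command timeout to 180 s instead.
        echo 180 > "$SYSFS/block/$name/device/timeout"
    fi
    # Otherwise ERC is already enabled and nothing is changed.
}

# Act only when udev (or a user) passes a device argument.
if [ "$#" -gt 0 ]; then
    erc_fix "$1"
fi
```

Run as root, e.g. `mdadm-erc-timout-fix.sh /dev/sda`. Like the original snippet, this applies the short timeout unconditionally, so it does not address the degraded-array case discussed above.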