All of lore.kernel.org
 help / color / mirror / Atom feed
* md device remains active even when all supporting disks have failed and been disabled by the kernel.
@ 2022-01-07 22:30 Aidan Walton
  2022-01-10  9:47 ` Mariusz Tkaczyk
  0 siblings, 1 reply; 14+ messages in thread
From: Aidan Walton @ 2022-01-07 22:30 UTC (permalink / raw)
  To: linux-raid

Hi,
I have a system running:
Ubuntu Server 20.04.3 LTS on a 5.4.0-92-generic kernel.

On the motherboard is a:
SATA controller: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev 02)

Connected to this I have:
2x Western Digital - WDC WD5001AALS-00L3B2, BIOS :01.03B01 500Gb drives

Each drive has a single partition and is part of a RAID1 array:
/dev/md90:
.....
    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       2       8       49        1      active sync   /dev/sdd1

On top of this I have a single LVM VG and LV. (Probably not important)

I have noticed some strange behaviour and upon investigation I find
the md device in the following state:
/dev/md90:
....

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       -       0        0        1      removed

       2       8       49        -      faulty   /dev/sdd1


In fact neither /dev/sdc1 or /dev/sdd1 are available. In fact neither
are /dev/sdc or /dev/sdd, the physical drives, as they both been
disconnected by the kernel:
/dev/sdc is attached to ata7:00  and  /dev/sdd is attached to ata.8:00
This is the log of the kernel events:


Jan 07 22:09:03 mx kernel: ata7.00: exception Emask 0x32 SAct 0x0 SErr
0x0 action 0xe frozen
Jan 07 22:09:03 mx kernel: ata7.00: irq_stat 0xffffffff, unknown FIS
00000000 00000000 00000000 00000000, host bus
Jan 07 22:09:03 mx kernel: ata7.00: failed command: READ DMA
Jan 07 22:09:03 mx kernel: ata7.00: cmd
c8/00:00:00:cf:26/00:00:00:00:00/e0 tag 18 dma 131072 in
Jan 07 22:09:03 mx kernel: ata7.00: status: { DRDY }
Jan 07 22:09:03 mx kernel: ata7: hard resetting link
Jan 07 22:09:03 mx kernel: ata7: SATA link up 1.5 Gbps (SStatus 113
SControl 310)
Jan 07 22:09:09 mx kernel: ata7.00: qc timeout (cmd 0xec)
Jan 07 22:09:09 mx kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 07 22:09:09 mx kernel: ata7.00: revalidation failed (errno=-5)
Jan 07 22:09:09 mx kernel: ata7: hard resetting link
Jan 07 22:09:19 mx kernel: ata7: softreset failed (1st FIS failed)
Jan 07 22:09:19 mx kernel: ata7: hard resetting link
Jan 07 22:09:29 mx kernel: ata7: softreset failed (1st FIS failed)
Jan 07 22:09:29 mx kernel: ata7: hard resetting link
Jan 07 22:09:35 mx kernel: ata8.00: exception Emask 0x40 SAct 0x0 SErr
0x800 action 0x6 frozen
Jan 07 22:09:35 mx kernel: ata8: SError: { HostInt }
Jan 07 22:09:35 mx kernel: ata8.00: failed command: READ DMA
Jan 07 22:09:35 mx kernel: ata8.00: cmd
c8/00:00:00:64:4a/00:00:00:00:00/e0 tag 2 dma 131072 in
Jan 07 22:09:35 mx kernel: ata8.00: status: { DRDY }
Jan 07 22:09:35 mx kernel: ata8: hard resetting link
Jan 07 22:09:45 mx kernel: ata8: softreset failed (1st FIS failed)
Jan 07 22:09:45 mx kernel: ata8: hard resetting link
Jan 07 22:09:55 mx kernel: ata8: softreset failed (1st FIS failed)
Jan 07 22:09:55 mx kernel: ata8: hard resetting link
Jan 07 22:10:04 mx kernel: ata7: softreset failed (1st FIS failed)
Jan 07 22:10:04 mx kernel: ata7: hard resetting link
Jan 07 22:10:09 mx kernel: ata7: softreset failed (1st FIS failed)
Jan 07 22:10:09 mx kernel: ata7: reset failed, giving up
Jan 07 22:10:09 mx kernel: ata7.00: disabled
Jan 07 22:10:09 mx kernel: ata7: EH complete
Jan 07 22:10:30 mx kernel: ata8: softreset failed (1st FIS failed)
Jan 07 22:10:30 mx kernel: ata8: hard resetting link
Jan 07 22:10:35 mx kernel: ata8: softreset failed (1st FIS failed)
Jan 07 22:10:35 mx kernel: ata8: reset failed, giving up
Jan 07 22:10:35 mx kernel: ata8.00: disabled
Jan 07 22:10:35 mx kernel: ata8: EH complete

This is happening because of some issue with the SATA controller on
the motherboard. This has not been resolved and probably never will
be, I see many others through google search complaining of similar
issues with the SATA controller.
This failure only occurs when the SATA controller is placed under very
heavy load, I have minimised the impact of the problem by not using
NCQ, this helps, but it still occurs. Ironically the biggest issue I
have is that mdadm "checkarray" is running because of a systemd
background process every week or so, and this hammers the disk into
failure. Most of the normal daily usage never generates the link
resets.
Naturally I have changed SATA cables and moved drives around onto
different controllers, but alas, it does seem to be the hardware on
the motherboard.
However as a workaround I was hoping to accept the occasional failure
and then using some scripting and 'setpci' I can get the kernel to
hard reset the chipset and attach the drives again. I have the process
working in terms of getting the kernel to re-attach the drives,
but.......

Unfortunately mdraid will not let go of them, I can not stop the
arrays, and therefore can't rebuild them. If I simply allow the kernel
to re-attach the drives the kernel names are swapped over, as
something (mdraid) is stopping the kernel re-using the same device
names. Anyway being dependent on the same kernel device names is not a
great plan anyway, so I was simply trying to get mdadm to reassemble
the array as soon as the 'workaround' script gets the drives back in
contact with libata (kernel).

Plan:
1. Detecting the problem. (mdadm state)
2. Stop the array totally (can NOT do it)
3. reset the chipset across the PCI bus.
4. allow kernel to re-attach drives.
5. re-assemble the md device with mdadm
6. restart, if necessary higher layer services...

So why is mdraid holding on to the array:

# mdadm --stop /dev/md90
mdadm: Cannot get exclusive access to /dev/md90:Perhaps a running
process, mounted filesystem or active volume group?

I can not be 100% sure that something else is using the device, but I
can't think of anything that is and I stopped every process I can
think of..... Plus why is the array still shown as 'active' when none
of its member devices even exist anymore?

What I do know is that device mapper (coming down from LVM)  still has
an entry in /dev/mapper. But then probably no surprise as /dev/md90
the failed array is still an active device node. If you attempt to
write to it, I receive I/O errors from the kernel. In fact as far as
any higher layer services are concerned md90 and the LVM LV on top of
it are still active and working when in reality, they are not. It
causes very strange NFS errors and such.

mdraid does actually attempt to iteratively remove both partitions
when the kernel signals the disable state, but only 1 of them
succeeds.
I did an strace of the same iterative 'fail:remove' process that
mdraid attempts when the kernel issues -- kernel: ata7.00: disabled

eg:
/sbin/mdadm -If sdc1 --path pci-0000:02:00.0-ata-1
mdadm: set device faulty failed for sdc1:  Device or resource busy

The only clue is perhaps this line from the strace:
openat(AT_FDCWD, "/sys/block/md90/md/dev-sdc1/block/dev", O_RDWR) = -1
EACCES (Permission denied)    What is the mdadm command doing that
results in a permission problem?

So the only way I can get rid of this md raid array is a reboot. Damn!!!


Any help is much appreciated.
Aidan





I include a truncated dump of "strace /sbin/mdadm -If sdc1 --path
pci-0000:02:00.0-ata-1"   below. Hoping not to clutter too much


strace /sbin/mdadm -If sdc1 --path pci-0000:02:00.0-ata-1
execve("/sbin/mdadm", ["/sbin/mdadm", "-If", "sdc1", "--path",
"pci-0000:02:00.0-ata-1"], 0x7ffea5512340 /* 24 vars */) = 0
brk(NULL)                               = 0x55a104817000
arch_prctl(0x3001 /* ARCH_??? */, 0x7ffd2e463570) = -1 EINVAL (Invalid argument)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=62948, ...}) = 0
mmap(NULL, 62948, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f3e1985f000
close(3)                                = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0
\22\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=18816, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x7f3e1985d000
mmap(NULL, 20752, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f3e19857000
mmap(0x7f3e19858000, 8192, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x7f3e19858000
mmap(0x7f3e1985a000, 4096, PROT_READ,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x7f3e1985a000
mmap(0x7f3e1985b000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x7f3e1985b000
close(3)                                = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360q\2\0\0\0\0\0"...,
832) = 832
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"...,
784, 64) = 784
pread64(3, "\4\0\0\0\20\0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0",
32, 848) = 32
pread64(3, "\4\0\0\0\24\0\0\0\3\0\0\0GNU\0\t\233\222%\274\260\320\31\331\326\10\204\276X>\263"...,
68, 880) = 68
fstat(3, {st_mode=S_IFREG|0755, st_size=2029224, ...}) = 0
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"...,
784, 64) = 784
pread64(3, "\4\0\0\0\20\0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0",
32, 848) = 32
pread64(3, "\4\0\0\0\24\0\0\0\3\0\0\0GNU\0\t\233\222%\274\260\320\31\331\326\10\204\276X>\263"...,
68, 880) = 68
mmap(NULL, 2036952, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f3e19665000
mprotect(0x7f3e1968a000, 1847296, PROT_NONE) = 0
mmap(0x7f3e1968a000, 1540096, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x25000) = 0x7f3e1968a000
mmap(0x7f3e19802000, 303104, PROT_READ,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x19d000) = 0x7f3e19802000
mmap(0x7f3e1984d000, 24576, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f3e1984d000
mmap(0x7f3e19853000, 13528, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f3e19853000
close(3)                                = 0
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x7f3e19662000
arch_prctl(ARCH_SET_FS, 0x7f3e19662740) = 0
mprotect(0x7f3e1984d000, 12288, PROT_READ) = 0
mprotect(0x7f3e1985b000, 4096, PROT_READ) = 0
mprotect(0x55a103364000, 4096, PROT_READ) = 0
mprotect(0x7f3e1989c000, 4096, PROT_READ) = 0
munmap(0x7f3e1985f000, 62948)           = 0
getpid()                                = 44041
uname({sysname="Linux", nodename="mx", ...}) = 0
brk(NULL)                               = 0x55a104817000
brk(0x55a104838000)                     = 0x55a104838000
openat(AT_FDCWD, "/etc/mdadm/mdadm.conf", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=949, ...}) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=949, ...}) = 0
read(3, "# mdadm.conf\n#\n# !NB! Run update"..., 4096) = 949
read(3, "", 4096)                       = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=949, ...}) = 0
close(3)                                = 0
openat(AT_FDCWD, "/etc/mdadm/mdadm.conf.d", O_RDONLY) = -1 ENOENT (No
such file or directory)
uname({sysname="Linux", nodename="mx", ...}) = 0
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=62948, ...}) = 0
mmap(NULL, 62948, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f3e1985f000
close(3)                                = 0
openat(AT_FDCWD,
"/lib/x86_64-linux-gnu/tls/x86_64/x86_64/libdlm_lt.so.3",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/lib/x86_64-linux-gnu/tls/x86_64/x86_64", 0x7ffd2e462630) = -1
ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/tls/x86_64/libdlm_lt.so.3",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/lib/x86_64-linux-gnu/tls/x86_64", 0x7ffd2e462630) = -1 ENOENT
(No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/tls/x86_64/libdlm_lt.so.3",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/lib/x86_64-linux-gnu/tls/x86_64", 0x7ffd2e462630) = -1 ENOENT
(No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/tls/libdlm_lt.so.3",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/lib/x86_64-linux-gnu/tls", 0x7ffd2e462630) = -1 ENOENT (No such
file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/x86_64/x86_64/libdlm_lt.so.3",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/lib/x86_64-linux-gnu/x86_64/x86_64", 0x7ffd2e462630) = -1
ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/x86_64/libdlm_lt.so.3",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/lib/x86_64-linux-gnu/x86_64", 0x7ffd2e462630) = -1 ENOENT (No
such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/x86_64/libdlm_lt.so.3",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/lib/x86_64-linux-gnu/x86_64", 0x7ffd2e462630) = -1 ENOENT (No
such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libdlm_lt.so.3",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/lib/x86_64-linux-gnu", {st_mode=S_IFDIR|0755, st_size=20480, ...}) = 0
openat(AT_FDCWD,
"/usr/lib/x86_64-linux-gnu/tls/x86_64/x86_64/libdlm_lt.so.3",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib/x86_64-linux-gnu/tls/x86_64/x86_64", 0x7ffd2e462630) =
-1 ENOENT (No such file or directory)
openat(AT_FDCWD,
"/usr/lib/x86_64-linux-gnu/tls/x86_64/libdlm_lt.so.3",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib/x86_64-linux-gnu/tls/x86_64", 0x7ffd2e462630) = -1
ENOENT (No such file or directory)
openat(AT_FDCWD,
"/usr/lib/x86_64-linux-gnu/tls/x86_64/libdlm_lt.so.3",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib/x86_64-linux-gnu/tls/x86_64", 0x7ffd2e462630) = -1
ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/tls/libdlm_lt.so.3",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib/x86_64-linux-gnu/tls", 0x7ffd2e462630) = -1 ENOENT (No
such file or directory)
openat(AT_FDCWD,
"/usr/lib/x86_64-linux-gnu/x86_64/x86_64/libdlm_lt.so.3",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib/x86_64-linux-gnu/x86_64/x86_64", 0x7ffd2e462630) = -1
ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/x86_64/libdlm_lt.so.3",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib/x86_64-linux-gnu/x86_64", 0x7ffd2e462630) = -1 ENOENT
(No such file or directory)
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/x86_64/libdlm_lt.so.3",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib/x86_64-linux-gnu/x86_64", 0x7ffd2e462630) = -1 ENOENT
(No such file or directory)
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libdlm_lt.so.3",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib/x86_64-linux-gnu", {st_mode=S_IFDIR|0755,
st_size=65536, ...}) = 0
openat(AT_FDCWD, "/lib/tls/x86_64/x86_64/libdlm_lt.so.3",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/lib/tls/x86_64/x86_64", 0x7ffd2e462630) = -1 ENOENT (No such
file or directory)
openat(AT_FDCWD, "/lib/tls/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC)
= -1 ENOENT (No such file or directory)
stat("/lib/tls/x86_64", 0x7ffd2e462630) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/tls/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC)
= -1 ENOENT (No such file or directory)
stat("/lib/tls/x86_64", 0x7ffd2e462630) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/tls/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1
ENOENT (No such file or directory)
stat("/lib/tls", 0x7ffd2e462630)        = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64/x86_64/libdlm_lt.so.3",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/lib/x86_64/x86_64", 0x7ffd2e462630) = -1 ENOENT (No such file
or directory)
openat(AT_FDCWD, "/lib/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) =
-1 ENOENT (No such file or directory)
stat("/lib/x86_64", 0x7ffd2e462630)     = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) =
-1 ENOENT (No such file or directory)
stat("/lib/x86_64", 0x7ffd2e462630)     = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1
ENOENT (No such file or directory)
stat("/lib", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
openat(AT_FDCWD, "/usr/lib/tls/x86_64/x86_64/libdlm_lt.so.3",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib/tls/x86_64/x86_64", 0x7ffd2e462630) = -1 ENOENT (No
such file or directory)
openat(AT_FDCWD, "/usr/lib/tls/x86_64/libdlm_lt.so.3",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib/tls/x86_64", 0x7ffd2e462630) = -1 ENOENT (No such file
or directory)
openat(AT_FDCWD, "/usr/lib/tls/x86_64/libdlm_lt.so.3",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib/tls/x86_64", 0x7ffd2e462630) = -1 ENOENT (No such file
or directory)
openat(AT_FDCWD, "/usr/lib/tls/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) =
-1 ENOENT (No such file or directory)
stat("/usr/lib/tls", 0x7ffd2e462630)    = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/x86_64/x86_64/libdlm_lt.so.3",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib/x86_64/x86_64", 0x7ffd2e462630) = -1 ENOENT (No such
file or directory)
openat(AT_FDCWD, "/usr/lib/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC)
= -1 ENOENT (No such file or directory)
stat("/usr/lib/x86_64", 0x7ffd2e462630) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC)
= -1 ENOENT (No such file or directory)
stat("/usr/lib/x86_64", 0x7ffd2e462630) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1
ENOENT (No such file or directory)
stat("/usr/lib", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
munmap(0x7f3e1985f000, 62948)           = 0
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=62948, ...}) = 0
mmap(NULL, 62948, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f3e1985f000
close(3)                                = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libcmap.so.4",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libcmap.so.4",
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/libcmap.so.4", O_RDONLY|O_CLOEXEC) = -1 ENOENT
(No such file or directory)
openat(AT_FDCWD, "/usr/lib/libcmap.so.4", O_RDONLY|O_CLOEXEC) = -1
ENOENT (No such file or directory)
munmap(0x7f3e1985f000, 62948)           = 0
geteuid()                               = 0
ioctl(-1, GET_ARRAY_INFO, 0x7ffd2e463050) = -1 EBADF (Bad file descriptor)
openat(AT_FDCWD, "/proc/mdstat", O_RDONLY) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(3, "Personalities : [raid1] [linear]"..., 1024) = 449
read(3, "", 1024)                       = 0
close(3)                                = 0
stat("/sys/block/md90/md", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
openat(AT_FDCWD, "/sys/block/md90/dev", O_RDONLY) = 3
read(3, "9:90\n", 20)                   = 5
close(3)                                = 0
getpid()                                = 44041
mknod("/dev/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/dev/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/dev/.tmp.md.44041:9:90")       = 0
getpid()                                = 44041
mknod("/tmp/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/tmp/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/tmp/.tmp.md.44041:9:90")       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
getpid()                                = 44041
mknod("/dev/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/dev/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/dev/.tmp.md.44041:9:90")       = 0
getpid()                                = 44041
mknod("/tmp/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/tmp/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/tmp/.tmp.md.44041:9:90")       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=2000000}, NULL) = 0
getpid()                                = 44041
mknod("/dev/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/dev/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/dev/.tmp.md.44041:9:90")       = 0
getpid()                                = 44041
mknod("/tmp/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/tmp/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/tmp/.tmp.md.44041:9:90")       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=4000000}, NULL) = 0
getpid()                                = 44041
mknod("/dev/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/dev/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/dev/.tmp.md.44041:9:90")       = 0
getpid()                                = 44041
mknod("/tmp/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/tmp/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/tmp/.tmp.md.44041:9:90")       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=8000000}, NULL) = 0
getpid()                                = 44041
mknod("/dev/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/dev/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/dev/.tmp.md.44041:9:90")       = 0
getpid()                                = 44041
mknod("/tmp/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/tmp/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/tmp/.tmp.md.44041:9:90")       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=16000000}, NULL) = 0
getpid()                                = 44041
mknod("/dev/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/dev/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/dev/.tmp.md.44041:9:90")       = 0
getpid()                                = 44041
mknod("/tmp/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/tmp/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/tmp/.tmp.md.44041:9:90")       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=32000000}, NULL) = 0
getpid()                                = 44041
mknod("/dev/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/dev/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/dev/.tmp.md.44041:9:90")       = 0
getpid()                                = 44041
mknod("/tmp/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/tmp/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/tmp/.tmp.md.44041:9:90")       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=64000000}, NULL) = 0
getpid()                                = 44041
mknod("/dev/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/dev/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/dev/.tmp.md.44041:9:90")       = 0
getpid()                                = 44041
mknod("/tmp/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/tmp/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/tmp/.tmp.md.44041:9:90")       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=128000000}, NULL) = 0
getpid()                                = 44041
mknod("/dev/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/dev/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/dev/.tmp.md.44041:9:90")       = 0
getpid()                                = 44041
mknod("/tmp/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/tmp/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/tmp/.tmp.md.44041:9:90")       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=256000000}, NULL) = 0
getpid()                                = 44041
mknod("/dev/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/dev/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/dev/.tmp.md.44041:9:90")       = 0
getpid()                                = 44041
mknod("/tmp/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/tmp/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/tmp/.tmp.md.44041:9:90")       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=256000000}, NULL) = 0
getpid()                                = 44041
mknod("/dev/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/dev/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/dev/.tmp.md.44041:9:90")       = 0
getpid()                                = 44041
mknod("/tmp/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/tmp/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/tmp/.tmp.md.44041:9:90")       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=256000000}, NULL) = 0
getpid()                                = 44041
mknod("/dev/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/dev/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/dev/.tmp.md.44041:9:90")       = 0
getpid()                                = 44041
mknod("/tmp/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/tmp/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/tmp/.tmp.md.44041:9:90")       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=256000000}, NULL) = 0
getpid()                                = 44041
mknod("/dev/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/dev/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/dev/.tmp.md.44041:9:90")       = 0
getpid()                                = 44041
mknod("/tmp/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/tmp/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/tmp/.tmp.md.44041:9:90")       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=256000000}, NULL) = 0
getpid()                                = 44041
mknod("/dev/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/dev/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/dev/.tmp.md.44041:9:90")       = 0
getpid()                                = 44041
mknod("/tmp/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/tmp/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/tmp/.tmp.md.44041:9:90")       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=256000000}, NULL) = 0
getpid()                                = 44041
mknod("/dev/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/dev/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/dev/.tmp.md.44041:9:90")       = 0
getpid()                                = 44041
mknod("/tmp/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/tmp/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/tmp/.tmp.md.44041:9:90")       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=256000000}, NULL) = 0
getpid()                                = 44041
mknod("/dev/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/dev/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/dev/.tmp.md.44041:9:90")       = 0
getpid()                                = 44041
mknod("/tmp/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/tmp/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/tmp/.tmp.md.44041:9:90")       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=256000000}, NULL) = 0
getpid()                                = 44041
mknod("/dev/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/dev/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/dev/.tmp.md.44041:9:90")       = 0
getpid()                                = 44041
mknod("/tmp/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/tmp/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/tmp/.tmp.md.44041:9:90")       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=256000000}, NULL) = 0
getpid()                                = 44041
mknod("/dev/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/dev/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/dev/.tmp.md.44041:9:90")       = 0
getpid()                                = 44041
mknod("/tmp/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/tmp/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/tmp/.tmp.md.44041:9:90")       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=256000000}, NULL) = 0
getpid()                                = 44041
mknod("/dev/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/dev/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/dev/.tmp.md.44041:9:90")       = 0
getpid()                                = 44041
mknod("/tmp/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/tmp/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/tmp/.tmp.md.44041:9:90")       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=256000000}, NULL) = 0
getpid()                                = 44041
mknod("/dev/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/dev/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/dev/.tmp.md.44041:9:90")       = 0
getpid()                                = 44041
mknod("/tmp/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/tmp/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/tmp/.tmp.md.44041:9:90")       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=256000000}, NULL) = 0
getpid()                                = 44041
mknod("/dev/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/dev/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/dev/.tmp.md.44041:9:90")       = 0
getpid()                                = 44041
mknod("/tmp/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/tmp/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/tmp/.tmp.md.44041:9:90")       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=256000000}, NULL) = 0
getpid()                                = 44041
mknod("/dev/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/dev/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/dev/.tmp.md.44041:9:90")       = 0
getpid()                                = 44041
mknod("/tmp/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/tmp/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/tmp/.tmp.md.44041:9:90")       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=256000000}, NULL) = 0
getpid()                                = 44041
mknod("/dev/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/dev/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/dev/.tmp.md.44041:9:90")       = 0
getpid()                                = 44041
mknod("/tmp/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/tmp/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/tmp/.tmp.md.44041:9:90")       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=256000000}, NULL) = 0
getpid()                                = 44041
mknod("/dev/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/dev/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/dev/.tmp.md.44041:9:90")       = 0
getpid()                                = 44041
mknod("/tmp/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/tmp/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/tmp/.tmp.md.44041:9:90")       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=256000000}, NULL) = 0
getpid()                                = 44041
mknod("/dev/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/dev/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/dev/.tmp.md.44041:9:90")       = 0
getpid()                                = 44041
mknod("/tmp/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/tmp/.tmp.md.44041:9:90", O_RDWR|O_EXCL|O_DIRECT) =
-1 EBUSY (Device or resource busy)
unlink("/tmp/.tmp.md.44041:9:90")       = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=256000000}, NULL) = 0
openat(AT_FDCWD, "/sys/block/md90/dev", O_RDONLY) = 3
read(3, "9:90\n", 20)                   = 5
close(3)                                = 0
getpid()                                = 44041
mknod("/dev/.tmp.md.44041:9:90", S_IFBLK|0600, makedev(0x9, 0x5a)) = 0
openat(AT_FDCWD, "/dev/.tmp.md.44041:9:90", O_RDONLY|O_DIRECT) = 3
unlink("/dev/.tmp.md.44041:9:90")       = 0
openat(AT_FDCWD, "/run/mdadm/map", O_RDONLY) = 4
fcntl(4, F_GETFL)                       = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fstat(4, {st_mode=S_IFREG|0600, st_size=164, ...}) = 0
read(4, "md90 1.2 36ddfdb1:8625d559:73510"..., 4096) = 164
read(4, "", 4096)                       = 0
close(4)                                = 0
openat(AT_FDCWD, "/proc/mdstat", O_RDONLY) = 4
fcntl(4, F_SETFD, FD_CLOEXEC)           = 0
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(4, "Personalities : [raid1] [linear]"..., 1024) = 449
read(4, "", 1024)                       = 0
close(4)                                = 0
mkdir("/run/mdadm/failed-slots", 0700)  = 0
openat(AT_FDCWD, "/run/mdadm/failed-slots/pci-0000:02:00.0-ata-1",
O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
write(4, "1.2 36ddfdb1:8625d559:73510620:8"..., 40) = 40
close(4)                                = 0
fstat(3, {st_mode=S_IFBLK|0600, st_rdev=makedev(0x9, 0x5a), ...}) = 0
readlink("/sys/dev/block/9:90", "../../devices/virtual/block/md90", 199) = 32
stat("/sys/block/md90/md", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
ioctl(3, GET_ARRAY_INFO, 0x7ffd2e462600) = 0
fstat(3, {st_mode=S_IFBLK|0600, st_rdev=makedev(0x9, 0x5a), ...}) = 0
readlink("/sys/dev/block/9:90", "../../devices/virtual/block/md90", 199) = 32
openat(AT_FDCWD, "/sys/block/md90/md/component_size", O_RDONLY) = 4
read(4, "488253440\n", 120)             = 10
close(4)                                = 0
fstat(3, {st_mode=S_IFBLK|0600, st_rdev=makedev(0x9, 0x5a), ...}) = 0
readlink("/sys/dev/block/9:90", "../../devices/virtual/block/md90", 199) = 32
stat("/sys/block/md90/md", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
openat(AT_FDCWD, "/sys/block/md90/md/metadata_version", O_RDONLY) = 4
read(4, "1.2\n", 4096)                  = 4
close(4)                                = 0
fstat(3, {st_mode=S_IFBLK|0600, st_rdev=makedev(0x9, 0x5a), ...}) = 0
readlink("/sys/dev/block/9:90", "../../devices/virtual/block/md90", 199) = 32
fstat(3, {st_mode=S_IFBLK|0600, st_rdev=makedev(0x9, 0x5a), ...}) = 0
readlink("/sys/dev/block/9:90", "../../devices/virtual/block/md90", 199) = 32
openat(AT_FDCWD, "/sys/block/md90/md/dev-sdc1/block/dev", O_RDWR) = -1
EACCES (Permission denied)
openat(AT_FDCWD, "/sys/block/md90/md/dev-sdc1/block/dev", O_RDONLY) = 4
lseek(4, 0, SEEK_SET)                   = 0
read(4, "8:33\n", 20)                   = 5
close(4)                                = 0
ioctl(3, SET_DISK_FAULTY, 0x821)        = -1 EBUSY (Device or resource busy)
write(2, "mdadm: set device faulty failed "..., 67mdadm: set device
faulty failed for sdc1:  Device or resource busy
) = 67
openat(AT_FDCWD, "/sys/block/md90/dev", O_RDONLY) = 4
read(4, "9:90\n", 20)                   = 5
close(4)                                = 0

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: md device remains active even when all supporting disks have failed and been disabled by the kernel.
  2022-01-07 22:30 md device remains active even when all supporting disks have failed and been disabled by the kernel Aidan Walton
@ 2022-01-10  9:47 ` Mariusz Tkaczyk
  2022-01-12  1:29   ` Aidan Walton
  0 siblings, 1 reply; 14+ messages in thread
From: Mariusz Tkaczyk @ 2022-01-10  9:47 UTC (permalink / raw)
  To: Aidan Walton; +Cc: linux-raid

On Fri, 7 Jan 2022 23:30:31 +0100
Aidan Walton <aidan.walton@gmail.com> wrote:

> Hi,
> I have a system running:
> Ubuntu Server 20.04.3 LTS on a 5.4.0-92-generic kernel.
> 
> On the motherboard is a:
> SATA controller: JMicron Technology Corp. JMB363 SATA/IDE Controller
> (rev 02)
> 
> Connected to this I have:
> 2x Western Digital - WDC WD5001AALS-00L3B2, BIOS :01.03B01 500Gb
> drives
> 
> Each drive has a single partition and is part of a RAID1 array:
> /dev/md90:
> .....
>     Number   Major   Minor   RaidDevice State
>        0       8       33        0      active sync   /dev/sdc1
>        2       8       49        1      active sync   /dev/sdd1
> 
> On top of this I have a single LVM VG and LV. (Probably not important)
> 
> I have noticed some strange behaviour and upon investigation I find
> the md device in the following state:
> /dev/md90:
> ....
> 
>     Number   Major   Minor   RaidDevice State
>        0       8       33        0      active sync   /dev/sdc1
>        -       0        0        1      removed
> 
>        2       8       49        -      faulty   /dev/sdd1
> 
> 
> In fact neither /dev/sdc1 or /dev/sdd1 are available. In fact neither
> are /dev/sdc or /dev/sdd, the physical drives, as they both been
> disconnected by the kernel:
> /dev/sdc is attached to ata7:00  and  /dev/sdd is attached to ata.8:00
> This is the log of the kernel events:
> 
> 
> Jan 07 22:09:03 mx kernel: ata7.00: exception Emask 0x32 SAct 0x0 SErr
> 0x0 action 0xe frozen
> Jan 07 22:09:03 mx kernel: ata7.00: irq_stat 0xffffffff, unknown FIS
> 00000000 00000000 00000000 00000000, host bus
> Jan 07 22:09:03 mx kernel: ata7.00: failed command: READ DMA
> Jan 07 22:09:03 mx kernel: ata7.00: cmd
> c8/00:00:00:cf:26/00:00:00:00:00/e0 tag 18 dma 131072 in
> Jan 07 22:09:03 mx kernel: ata7.00: status: { DRDY }
> Jan 07 22:09:03 mx kernel: ata7: hard resetting link
> Jan 07 22:09:03 mx kernel: ata7: SATA link up 1.5 Gbps (SStatus 113
> SControl 310)
> Jan 07 22:09:09 mx kernel: ata7.00: qc timeout (cmd 0xec)
> Jan 07 22:09:09 mx kernel: ata7.00: failed to IDENTIFY (I/O error,
> err_mask=0x4) Jan 07 22:09:09 mx kernel: ata7.00: revalidation failed
> (errno=-5) Jan 07 22:09:09 mx kernel: ata7: hard resetting link
> Jan 07 22:09:19 mx kernel: ata7: softreset failed (1st FIS failed)
> Jan 07 22:09:19 mx kernel: ata7: hard resetting link
> Jan 07 22:09:29 mx kernel: ata7: softreset failed (1st FIS failed)
> Jan 07 22:09:29 mx kernel: ata7: hard resetting link
> Jan 07 22:09:35 mx kernel: ata8.00: exception Emask 0x40 SAct 0x0 SErr
> 0x800 action 0x6 frozen
> Jan 07 22:09:35 mx kernel: ata8: SError: { HostInt }
> Jan 07 22:09:35 mx kernel: ata8.00: failed command: READ DMA
> Jan 07 22:09:35 mx kernel: ata8.00: cmd
> c8/00:00:00:64:4a/00:00:00:00:00/e0 tag 2 dma 131072 in
> Jan 07 22:09:35 mx kernel: ata8.00: status: { DRDY }
> Jan 07 22:09:35 mx kernel: ata8: hard resetting link
> Jan 07 22:09:45 mx kernel: ata8: softreset failed (1st FIS failed)
> Jan 07 22:09:45 mx kernel: ata8: hard resetting link
> Jan 07 22:09:55 mx kernel: ata8: softreset failed (1st FIS failed)
> Jan 07 22:09:55 mx kernel: ata8: hard resetting link
> Jan 07 22:10:04 mx kernel: ata7: softreset failed (1st FIS failed)
> Jan 07 22:10:04 mx kernel: ata7: hard resetting link
> Jan 07 22:10:09 mx kernel: ata7: softreset failed (1st FIS failed)
> Jan 07 22:10:09 mx kernel: ata7: reset failed, giving up
> Jan 07 22:10:09 mx kernel: ata7.00: disabled
> Jan 07 22:10:09 mx kernel: ata7: EH complete
> Jan 07 22:10:30 mx kernel: ata8: softreset failed (1st FIS failed)
> Jan 07 22:10:30 mx kernel: ata8: hard resetting link
> Jan 07 22:10:35 mx kernel: ata8: softreset failed (1st FIS failed)
> Jan 07 22:10:35 mx kernel: ata8: reset failed, giving up
> Jan 07 22:10:35 mx kernel: ata8.00: disabled
> Jan 07 22:10:35 mx kernel: ata8: EH complete
> 
> This is happening because of some issue with the SATA controller on
> the motherboard. This has not been resolved and probably never will
> be, I see many others through google search complaining of similar
> issues with the SATA controller.
> This failure only occurs when the SATA controller is placed under very
> heavy load, I have minimised the impact of the problem by not using
> NCQ, this helps, but it still occurs. Ironically the biggest issue I
> have is that mdadm "checkarray" is running because of a systemd
> background process every week or so, and this hammers the disk into
> failure. Most of the normal daily usage never generates the link
> resets.
> Naturally I have changed SATA cables and moved drives around onto
> different controllers, but alas, it does seem to be the hardware on
> the motherboard.
> However as a workaround I was hoping to accept the occasional failure
> and then using some scripting and 'setpci' I can get the kernel to
> hard reset the chipset and attach the drives again. I have the process
> working in terms of getting the kernel to re-attach the drives,
> but.......
> 
> Unfortunately mdraid will not let go of them, I can not stop the
> arrays, and therefore can't rebuild them. If I simply allow the kernel
> to re-attach the drives the kernel names are swapped over, as
> something (mdraid) is stopping the kernel re-using the same device
> names. Anyway being dependent on the same kernel device names is not a
> great plan anyway, so I was simply trying to get mdadm to reassemble
> the array as soon as the 'workaround' script gets the drives back in
> contact with libata (kernel).
> 
> Plan:
> 1. Detecting the problem. (mdadm state)
> 2. Stop the array totally (can NOT do it)
> 3. reset the chipset across the PCI bus.
> 4. allow kernel to re-attach drives.
> 5. re-assemble the md device with mdadm
> 6. restart, if necessary higher layer services...
> 
> So why is mdraid holding on to the array:
> 
> # mdadm --stop /dev/md90
> mdadm: Cannot get exclusive access to /dev/md90:Perhaps a running
> process, mounted filesystem or active volume group?
> 
> I can not be 100% sure that something else is using the device, but I
> can't think of anything that is and I stopped every process I can
> think of..... Plus why is the array still shown as 'active' when none
> of its member devices even exist anymore?
> 
> What I do know is that device mapper (coming down from LVM)  still has
> an entry in /dev/mapper. But then probably no surprise as /dev/md90
> the failed array is still an active device node. If you attempt to
> write to it, I receive I/O errors from the kernel. In fact as far as
> any higher layer services are concerned md90 and the LVM LV on top of
> it are still active and working when in reality, they are not. It
> causes very strange NFS errors and such.
> 
> mdraid does actually attempt to iteratively remove both partitions
> when the kernel signals the disable state, but only 1 of them
> succeeds.
> I did an strace of the same iterative 'fail:remove' process that
> mdraid attempts when the kernel issues -- kernel: ata7.00: disabled
> 
> eg:
> /sbin/mdadm -If sdc1 --path pci-0000:02:00.0-ata-1
> mdadm: set device faulty failed for sdc1:  Device or resource busy
> 
> The only clue is perhaps this line from the strace:
> openat(AT_FDCWD, "/sys/block/md90/md/dev-sdc1/block/dev", O_RDWR) = -1
> EACCES (Permission denied)    What is the mdadm command doing that
> results in a permission problem?
> 
> So the only way I can get rid of this md raid array is a reboot.
> Damn!!!
> 
> 
> Any help is much appreciated.
> Aidan
> 
> 
> 
Hi Aidan,
This is how it is implemented. Drive is not removed if array failure
will cause array failed. Please see:
https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?id=9a567843f7ce0037bfd4d5fdc58a09d0a527b28b

For RAID1 you can use solution proposed in patch below but IMO it is
not your problem. Please stop LVM and then try to stop array. To stop
array it needs to be "free" (all upper handlers are down).

Thanks,
Mariusz

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: md device remains active even when all supporting disks have failed and been disabled by the kernel.
  2022-01-10  9:47 ` Mariusz Tkaczyk
@ 2022-01-12  1:29   ` Aidan Walton
  2022-01-12 22:54     ` Roger Heflin
  2022-01-13 14:46     ` Mariusz Tkaczyk
  0 siblings, 2 replies; 14+ messages in thread
From: Aidan Walton @ 2022-01-12  1:29 UTC (permalink / raw)
  To: Mariusz Tkaczyk; +Cc: linux-raid

Hi Mariusz,
In my case, the fact that mdraid does not show a 'total failure' is
not the root of the problem. However in my opinion I would say that
not having mdraid more accurately reflect the state of the underlying
hardware can be mis-leading. Initially when I looked at this issue, I
was convinced that only one disk had failed and I was scratching my
head about firstly why I still could not R/W the array while it
appeared to have an active member. Secondly, when I rebooted I noticed
that the array became instantly synchronised with both members active.
This was not what I expected, as normally an array that has had a
single failed disk would require a ra-add and resync.  Then when the
problem re-occured I noticed that it was not the same disk that was
flagged faulty, next reboot; the faulty disk flipped back the other
way... and so on. This was what prompted me to look closer at the
kernel. Here I found my answer at the SATA controller. Therefore
although mdraids design approach did not cause me any data loss, it
did have me looking in the wrong direction for the fault, assuming a
disk problem.

I have still not been able to successfully --stop the array. I think
the issue sits in the LVM domain. Although I can not be 100% sure.
What I have achieved is some level of understanding that some process
that starts a boot time is in some unknown manner holding a lock on
the mdarid  - devmapper - LVM combination. I have unmounted the file
system, but LVM refuses to let go of the logical volume. Therefore so
does dev-mapper and of course mdraid. I have systematically stopped or
killed almost every single running process on the system, taking it
back to a skeleton, with not much more than init running, it still
refuses to let go.

However, when I prevent auto-mounting of the raid array at boot, and
then manually assemble the raid array, LVM finds its meta data, builds
the VG and LV and mounts. If I then manually force the exact same SATA
controller failure, which results in the exact same mdraid behaviour,
I am then able to unmount the filesystem and ...... hey presto
deactivate the LVM LV. Which then allows me to --stop the mdraid. Just
as I want. Again it does not solve my SATA hardware issues, but being
at this point does give me options to restart the hardware etc, and
probably, though very messily, get the filesystem up again without a
reboot. The problem being I can not achieve this behaviour without
manually assembling the array after boot. If you have any idea what
could possibly be holding this lock I would be glad to hear.

At this point I'm going to have to try and systematically step through
the boot process and try re-arranging, when the array gets assembled.
My first attempts at this have been to <ignore> the raid array in
mdadm.conf and comment the array out of /etc/fstab. In this way mdraid
inside initramfs does not auto-assemble and LVM does not auto scan for
the VG. Once I am in the real boot sequence, I have created a systemd
mount unit that I can pull in from other systemd units, to change the
point in the boot process when the array is assembled. In this way
hopefully I can influence when other services are interacting with the
array in some way and perhaps find the root cause ......   Work in
progress..but slowly as the fault occurs only very occasionally and I
still need a working server.
All the best.. Aidan

On Mon, 10 Jan 2022 at 10:47, Mariusz Tkaczyk
<mariusz.tkaczyk@linux.intel.com> wrote:
>
> On Fri, 7 Jan 2022 23:30:31 +0100
> Aidan Walton <aidan.walton@gmail.com> wrote:
>
> > Hi,
> > I have a system running:
> > Ubuntu Server 20.04.3 LTS on a 5.4.0-92-generic kernel.
> >
> > On the motherboard is a:
> > SATA controller: JMicron Technology Corp. JMB363 SATA/IDE Controller
> > (rev 02)
> >
> > Connected to this I have:
> > 2x Western Digital - WDC WD5001AALS-00L3B2, BIOS :01.03B01 500Gb
> > drives
> >
> > Each drive has a single partition and is part of a RAID1 array:
> > /dev/md90:
> > .....
> >     Number   Major   Minor   RaidDevice State
> >        0       8       33        0      active sync   /dev/sdc1
> >        2       8       49        1      active sync   /dev/sdd1
> >
> > On top of this I have a single LVM VG and LV. (Probably not important)
> >
> > I have noticed some strange behaviour and upon investigation I find
> > the md device in the following state:
> > /dev/md90:
> > ....
> >
> >     Number   Major   Minor   RaidDevice State
> >        0       8       33        0      active sync   /dev/sdc1
> >        -       0        0        1      removed
> >
> >        2       8       49        -      faulty   /dev/sdd1
> >
> >
> > In fact neither /dev/sdc1 or /dev/sdd1 are available. In fact neither
> > are /dev/sdc or /dev/sdd, the physical drives, as they both been
> > disconnected by the kernel:
> > /dev/sdc is attached to ata7:00  and  /dev/sdd is attached to ata.8:00
> > This is the log of the kernel events:
> >
> >
> > Jan 07 22:09:03 mx kernel: ata7.00: exception Emask 0x32 SAct 0x0 SErr
> > 0x0 action 0xe frozen
> > Jan 07 22:09:03 mx kernel: ata7.00: irq_stat 0xffffffff, unknown FIS
> > 00000000 00000000 00000000 00000000, host bus
> > Jan 07 22:09:03 mx kernel: ata7.00: failed command: READ DMA
> > Jan 07 22:09:03 mx kernel: ata7.00: cmd
> > c8/00:00:00:cf:26/00:00:00:00:00/e0 tag 18 dma 131072 in
> > Jan 07 22:09:03 mx kernel: ata7.00: status: { DRDY }
> > Jan 07 22:09:03 mx kernel: ata7: hard resetting link
> > Jan 07 22:09:03 mx kernel: ata7: SATA link up 1.5 Gbps (SStatus 113
> > SControl 310)
> > Jan 07 22:09:09 mx kernel: ata7.00: qc timeout (cmd 0xec)
> > Jan 07 22:09:09 mx kernel: ata7.00: failed to IDENTIFY (I/O error,
> > err_mask=0x4) Jan 07 22:09:09 mx kernel: ata7.00: revalidation failed
> > (errno=-5) Jan 07 22:09:09 mx kernel: ata7: hard resetting link
> > Jan 07 22:09:19 mx kernel: ata7: softreset failed (1st FIS failed)
> > Jan 07 22:09:19 mx kernel: ata7: hard resetting link
> > Jan 07 22:09:29 mx kernel: ata7: softreset failed (1st FIS failed)
> > Jan 07 22:09:29 mx kernel: ata7: hard resetting link
> > Jan 07 22:09:35 mx kernel: ata8.00: exception Emask 0x40 SAct 0x0 SErr
> > 0x800 action 0x6 frozen
> > Jan 07 22:09:35 mx kernel: ata8: SError: { HostInt }
> > Jan 07 22:09:35 mx kernel: ata8.00: failed command: READ DMA
> > Jan 07 22:09:35 mx kernel: ata8.00: cmd
> > c8/00:00:00:64:4a/00:00:00:00:00/e0 tag 2 dma 131072 in
> > Jan 07 22:09:35 mx kernel: ata8.00: status: { DRDY }
> > Jan 07 22:09:35 mx kernel: ata8: hard resetting link
> > Jan 07 22:09:45 mx kernel: ata8: softreset failed (1st FIS failed)
> > Jan 07 22:09:45 mx kernel: ata8: hard resetting link
> > Jan 07 22:09:55 mx kernel: ata8: softreset failed (1st FIS failed)
> > Jan 07 22:09:55 mx kernel: ata8: hard resetting link
> > Jan 07 22:10:04 mx kernel: ata7: softreset failed (1st FIS failed)
> > Jan 07 22:10:04 mx kernel: ata7: hard resetting link
> > Jan 07 22:10:09 mx kernel: ata7: softreset failed (1st FIS failed)
> > Jan 07 22:10:09 mx kernel: ata7: reset failed, giving up
> > Jan 07 22:10:09 mx kernel: ata7.00: disabled
> > Jan 07 22:10:09 mx kernel: ata7: EH complete
> > Jan 07 22:10:30 mx kernel: ata8: softreset failed (1st FIS failed)
> > Jan 07 22:10:30 mx kernel: ata8: hard resetting link
> > Jan 07 22:10:35 mx kernel: ata8: softreset failed (1st FIS failed)
> > Jan 07 22:10:35 mx kernel: ata8: reset failed, giving up
> > Jan 07 22:10:35 mx kernel: ata8.00: disabled
> > Jan 07 22:10:35 mx kernel: ata8: EH complete
> >
> > This is happening because of some issue with the SATA controller on
> > the motherboard. This has not been resolved and probably never will
> > be, I see many others through google search complaining of similar
> > issues with the SATA controller.
> > This failure only occurs when the SATA controller is placed under very
> > heavy load, I have minimised the impact of the problem by not using
> > NCQ, this helps, but it still occurs. Ironically the biggest issue I
> > have is that mdadm "checkarray" is running because of a systemd
> > background process every week or so, and this hammers the disk into
> > failure. Most of the normal daily usage never generates the link
> > resets.
> > Naturally I have changed SATA cables and moved drives around onto
> > different controllers, but alas, it does seem to be the hardware on
> > the motherboard.
> > However as a workaround I was hoping to accept the occasional failure
> > and then using some scripting and 'setpci' I can get the kernel to
> > hard reset the chipset and attach the drives again. I have the process
> > working in terms of getting the kernel to re-attach the drives,
> > but.......
> >
> > Unfortunately mdraid will not let go of them, I can not stop the
> > arrays, and therefore can't rebuild them. If I simply allow the kernel
> > to re-attach the drives the kernel names are swapped over, as
> > something (mdraid) is stopping the kernel re-using the same device
> > names. Anyway being dependent on the same kernel device names is not a
> > great plan anyway, so I was simply trying to get mdadm to reassemble
> > the array as soon as the 'workaround' script gets the drives back in
> > contact with libata (kernel).
> >
> > Plan:
> > 1. Detecting the problem. (mdadm state)
> > 2. Stop the array totally (can NOT do it)
> > 3. reset the chipset across the PCI bus.
> > 4. allow kernel to re-attach drives.
> > 5. re-assemble the md device with mdadm
> > 6. restart, if necessary higher layer services...
> >
> > So why is mdraid holding on to the array:
> >
> > # mdadm --stop /dev/md90
> > mdadm: Cannot get exclusive access to /dev/md90:Perhaps a running
> > process, mounted filesystem or active volume group?
> >
> > I can not be 100% sure that something else is using the device, but I
> > can't think of anything that is and I stopped every process I can
> > think of..... Plus why is the array still shown as 'active' when none
> > of its member devices even exist anymore?
> >
> > What I do know is that device mapper (coming down from LVM)  still has
> > an entry in /dev/mapper. But then probably no surprise as /dev/md90
> > the failed array is still an active device node. If you attempt to
> > write to it, I receive I/O errors from the kernel. In fact as far as
> > any higher layer services are concerned md90 and the LVM LV on top of
> > it are still active and working when in reality, they are not. It
> > causes very strange NFS errors and such.
> >
> > mdraid does actually attempt to iteratively remove both partitions
> > when the kernel signals the disable state, but only 1 of them
> > succeeds.
> > I did an strace of the same iterative 'fail:remove' process that
> > mdraid attempts when the kernel issues -- kernel: ata7.00: disabled
> >
> > eg:
> > /sbin/mdadm -If sdc1 --path pci-0000:02:00.0-ata-1
> > mdadm: set device faulty failed for sdc1:  Device or resource busy
> >
> > The only clue is perhaps this line from the strace:
> > openat(AT_FDCWD, "/sys/block/md90/md/dev-sdc1/block/dev", O_RDWR) = -1
> > EACCES (Permission denied)    What is the mdadm command doing that
> > results in a permission problem?
> >
> > So the only way I can get rid of this md raid array is a reboot.
> > Damn!!!
> >
> >
> > Any help is much appreciated.
> > Aidan
> >
> >
> >
> Hi Aidan,
> This is how it is implemented. Drive is not removed if array failure
> will cause array failed. Please see:
> https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?id=9a567843f7ce0037bfd4d5fdc58a09d0a527b28b
>
> For RAID1 you can use solution proposed in patch below but IMO it is
> not your problem. Please stop LVM and then try to stop array. To stop
> array it needs to be "free" (all upper handlers are down).
>
> Thanks,
> Mariusz

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: md device remains active even when all supporting disks have failed and been disabled by the kernel.
  2022-01-12  1:29   ` Aidan Walton
@ 2022-01-12 22:54     ` Roger Heflin
  2022-01-13  0:03       ` Aidan Walton
  2022-01-13 14:46     ` Mariusz Tkaczyk
  1 sibling, 1 reply; 14+ messages in thread
From: Roger Heflin @ 2022-01-12 22:54 UTC (permalink / raw)
  To: Aidan Walton; +Cc: Mariusz Tkaczyk, Linux RAID

What kind of sata controller do you have?   I know of some 4pt marvell
ones that if you execute smart commands or too many commands against
it will "fault" and drop all of the disks going via that controller.

On Wed, Jan 12, 2022 at 4:10 PM Aidan Walton <aidan.walton@gmail.com> wrote:
>
> Hi Mariusz,
> In my case, the fact that mdraid does not show a 'total failure' is
> not the root of the problem. However in my opinion I would say that
> not having mdraid more accurately reflect the state of the underlying
> hardware can be mis-leading. Initially when I looked at this issue, I
> was convinced that only one disk had failed and I was scratching my
> head about firstly why I still could not R/W the array while it
> appeared to have an active member. Secondly, when I rebooted I noticed
> that the array became instantly synchronised with both members active.
> This was not what I expected, as normally an array that has had a
> single failed disk would require a ra-add and resync.  Then when the
> problem re-occured I noticed that it was not the same disk that was
> flagged faulty, next reboot; the faulty disk flipped back the other
> way... and so on. This was what prompted me to look closer at the
> kernel. Here I found my answer at the SATA controller. Therefore
> although mdraids design approach did not cause me any data loss, it
> did have me looking in the wrong direction for the fault, assuming a
> disk problem.
>
> I have still not been able to successfully --stop the array. I think
> the issue sits in the LVM domain. Although I can not be 100% sure.
> What I have achieved is some level of understanding that some process
> that starts a boot time is in some unknown manner holding a lock on
> the mdarid  - devmapper - LVM combination. I have unmounted the file
> system, but LVM refuses to let go of the logical volume. Therefore so
> does dev-mapper and of course mdraid. I have systematically stopped or
> killed almost every single running process on the system, taking it
> back to a skeleton, with not much more than init running, it still
> refuses to let go.
>
> However, when I prevent auto-mounting of the raid array at boot, and
> then manually assemble the raid array, LVM finds its meta data, builds
> the VG and LV and mounts. If I then manually force the exact same SATA
> controller failure, which results in the exact same mdraid behaviour,
> I am then able to unmount the filesystem and ...... hey presto
> deactivate the LVM LV. Which then allows me to --stop the mdraid. Just
> as I want. Again it does not solve my SATA hardware issues, but being
> at this point does give me options to restart the hardware etc, and
> probably, though very messily, get the filesystem up again without a
> reboot. The problem being I can not achieve this behaviour without
> manually assembling the array after boot. If you have any idea what
> could possibly be holding this lock I would be glad to hear.
>
> At this point I'm going to have to try and systematically step through
> the boot process and try re-arranging, when the array gets assembled.
> My first attempts at this have been to <ignore> the raid array in
> mdadm.conf and comment the array out of /etc/fstab. In this way mdraid
> inside initramfs does not auto-assemble and LVM does not auto scan for
> the VG. Once I am in the real boot sequence, I have created a systemd
> mount unit that I can pull in from other systemd units, to change the
> point in the boot process when the array is assembled. In this way
> hopefully I can influence when other services are interacting with the
> array in some way and perhaps find the root cause ......   Work in
> progress..but slowly as the fault occurs only very occasionally and I
> still need a working server.
> All the best.. Aidan
>
> On Mon, 10 Jan 2022 at 10:47, Mariusz Tkaczyk
> <mariusz.tkaczyk@linux.intel.com> wrote:
> >
> > On Fri, 7 Jan 2022 23:30:31 +0100
> > Aidan Walton <aidan.walton@gmail.com> wrote:
> >
> > > Hi,
> > > I have a system running:
> > > Ubuntu Server 20.04.3 LTS on a 5.4.0-92-generic kernel.
> > >
> > > On the motherboard is a:
> > > SATA controller: JMicron Technology Corp. JMB363 SATA/IDE Controller
> > > (rev 02)
> > >
> > > Connected to this I have:
> > > 2x Western Digital - WDC WD5001AALS-00L3B2, BIOS :01.03B01 500Gb
> > > drives
> > >
> > > Each drive has a single partition and is part of a RAID1 array:
> > > /dev/md90:
> > > .....
> > >     Number   Major   Minor   RaidDevice State
> > >        0       8       33        0      active sync   /dev/sdc1
> > >        2       8       49        1      active sync   /dev/sdd1
> > >
> > > On top of this I have a single LVM VG and LV. (Probably not important)
> > >
> > > I have noticed some strange behaviour and upon investigation I find
> > > the md device in the following state:
> > > /dev/md90:
> > > ....
> > >
> > >     Number   Major   Minor   RaidDevice State
> > >        0       8       33        0      active sync   /dev/sdc1
> > >        -       0        0        1      removed
> > >
> > >        2       8       49        -      faulty   /dev/sdd1
> > >
> > >
> > > In fact neither /dev/sdc1 or /dev/sdd1 are available. In fact neither
> > > are /dev/sdc or /dev/sdd, the physical drives, as they both been
> > > disconnected by the kernel:
> > > /dev/sdc is attached to ata7:00  and  /dev/sdd is attached to ata.8:00
> > > This is the log of the kernel events:
> > >
> > >
> > > Jan 07 22:09:03 mx kernel: ata7.00: exception Emask 0x32 SAct 0x0 SErr
> > > 0x0 action 0xe frozen
> > > Jan 07 22:09:03 mx kernel: ata7.00: irq_stat 0xffffffff, unknown FIS
> > > 00000000 00000000 00000000 00000000, host bus
> > > Jan 07 22:09:03 mx kernel: ata7.00: failed command: READ DMA
> > > Jan 07 22:09:03 mx kernel: ata7.00: cmd
> > > c8/00:00:00:cf:26/00:00:00:00:00/e0 tag 18 dma 131072 in
> > > Jan 07 22:09:03 mx kernel: ata7.00: status: { DRDY }
> > > Jan 07 22:09:03 mx kernel: ata7: hard resetting link
> > > Jan 07 22:09:03 mx kernel: ata7: SATA link up 1.5 Gbps (SStatus 113
> > > SControl 310)
> > > Jan 07 22:09:09 mx kernel: ata7.00: qc timeout (cmd 0xec)
> > > Jan 07 22:09:09 mx kernel: ata7.00: failed to IDENTIFY (I/O error,
> > > err_mask=0x4) Jan 07 22:09:09 mx kernel: ata7.00: revalidation failed
> > > (errno=-5) Jan 07 22:09:09 mx kernel: ata7: hard resetting link
> > > Jan 07 22:09:19 mx kernel: ata7: softreset failed (1st FIS failed)
> > > Jan 07 22:09:19 mx kernel: ata7: hard resetting link
> > > Jan 07 22:09:29 mx kernel: ata7: softreset failed (1st FIS failed)
> > > Jan 07 22:09:29 mx kernel: ata7: hard resetting link
> > > Jan 07 22:09:35 mx kernel: ata8.00: exception Emask 0x40 SAct 0x0 SErr
> > > 0x800 action 0x6 frozen
> > > Jan 07 22:09:35 mx kernel: ata8: SError: { HostInt }
> > > Jan 07 22:09:35 mx kernel: ata8.00: failed command: READ DMA
> > > Jan 07 22:09:35 mx kernel: ata8.00: cmd
> > > c8/00:00:00:64:4a/00:00:00:00:00/e0 tag 2 dma 131072 in
> > > Jan 07 22:09:35 mx kernel: ata8.00: status: { DRDY }
> > > Jan 07 22:09:35 mx kernel: ata8: hard resetting link
> > > Jan 07 22:09:45 mx kernel: ata8: softreset failed (1st FIS failed)
> > > Jan 07 22:09:45 mx kernel: ata8: hard resetting link
> > > Jan 07 22:09:55 mx kernel: ata8: softreset failed (1st FIS failed)
> > > Jan 07 22:09:55 mx kernel: ata8: hard resetting link
> > > Jan 07 22:10:04 mx kernel: ata7: softreset failed (1st FIS failed)
> > > Jan 07 22:10:04 mx kernel: ata7: hard resetting link
> > > Jan 07 22:10:09 mx kernel: ata7: softreset failed (1st FIS failed)
> > > Jan 07 22:10:09 mx kernel: ata7: reset failed, giving up
> > > Jan 07 22:10:09 mx kernel: ata7.00: disabled
> > > Jan 07 22:10:09 mx kernel: ata7: EH complete
> > > Jan 07 22:10:30 mx kernel: ata8: softreset failed (1st FIS failed)
> > > Jan 07 22:10:30 mx kernel: ata8: hard resetting link
> > > Jan 07 22:10:35 mx kernel: ata8: softreset failed (1st FIS failed)
> > > Jan 07 22:10:35 mx kernel: ata8: reset failed, giving up
> > > Jan 07 22:10:35 mx kernel: ata8.00: disabled
> > > Jan 07 22:10:35 mx kernel: ata8: EH complete
> > >
> > > This is happening because of some issue with the SATA controller on
> > > the motherboard. This has not been resolved and probably never will
> > > be, I see many others through google search complaining of similar
> > > issues with the SATA controller.
> > > This failure only occurs when the SATA controller is placed under very
> > > heavy load, I have minimised the impact of the problem by not using
> > > NCQ, this helps, but it still occurs. Ironically the biggest issue I
> > > have is that mdadm "checkarray" is running because of a systemd
> > > background process every week or so, and this hammers the disk into
> > > failure. Most of the normal daily usage never generates the link
> > > resets.
> > > Naturally I have changed SATA cables and moved drives around onto
> > > different controllers, but alas, it does seem to be the hardware on
> > > the motherboard.
> > > However as a workaround I was hoping to accept the occasional failure
> > > and then using some scripting and 'setpci' I can get the kernel to
> > > hard reset the chipset and attach the drives again. I have the process
> > > working in terms of getting the kernel to re-attach the drives,
> > > but.......
> > >
> > > Unfortunately mdraid will not let go of them, I can not stop the
> > > arrays, and therefore can't rebuild them. If I simply allow the kernel
> > > to re-attach the drives the kernel names are swapped over, as
> > > something (mdraid) is stopping the kernel re-using the same device
> > > names. Anyway being dependent on the same kernel device names is not a
> > > great plan anyway, so I was simply trying to get mdadm to reassemble
> > > the array as soon as the 'workaround' script gets the drives back in
> > > contact with libata (kernel).
> > >
> > > Plan:
> > > 1. Detecting the problem. (mdadm state)
> > > 2. Stop the array totally (can NOT do it)
> > > 3. reset the chipset across the PCI bus.
> > > 4. allow kernel to re-attach drives.
> > > 5. re-assemble the md device with mdadm
> > > 6. restart, if necessary higher layer services...
> > >
> > > So why is mdraid holding on to the array:
> > >
> > > # mdadm --stop /dev/md90
> > > mdadm: Cannot get exclusive access to /dev/md90:Perhaps a running
> > > process, mounted filesystem or active volume group?
> > >
> > > I can not be 100% sure that something else is using the device, but I
> > > can't think of anything that is and I stopped every process I can
> > > think of..... Plus why is the array still shown as 'active' when none
> > > of its member devices even exist anymore?
> > >
> > > What I do know is that device mapper (coming down from LVM)  still has
> > > an entry in /dev/mapper. But then probably no surprise as /dev/md90
> > > the failed array is still an active device node. If you attempt to
> > > write to it, I receive I/O errors from the kernel. In fact as far as
> > > any higher layer services are concerned md90 and the LVM LV on top of
> > > it are still active and working when in reality, they are not. It
> > > causes very strange NFS errors and such.
> > >
> > > mdraid does actually attempt to iteratively remove both partitions
> > > when the kernel signals the disable state, but only 1 of them
> > > succeeds.
> > > I did an strace of the same iterative 'fail:remove' process that
> > > mdraid attempts when the kernel issues -- kernel: ata7.00: disabled
> > >
> > > eg:
> > > /sbin/mdadm -If sdc1 --path pci-0000:02:00.0-ata-1
> > > mdadm: set device faulty failed for sdc1:  Device or resource busy
> > >
> > > The only clue is perhaps this line from the strace:
> > > openat(AT_FDCWD, "/sys/block/md90/md/dev-sdc1/block/dev", O_RDWR) = -1
> > > EACCES (Permission denied)    What is the mdadm command doing that
> > > results in a permission problem?
> > >
> > > So the only way I can get rid of this md raid array is a reboot.
> > > Damn!!!
> > >
> > >
> > > Any help is much appreciated.
> > > Aidan
> > >
> > >
> > >
> > Hi Aidan,
> > This is how it is implemented. Drive is not removed if array failure
> > will cause array failed. Please see:
> > https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?id=9a567843f7ce0037bfd4d5fdc58a09d0a527b28b
> >
> > For RAID1 you can use solution proposed in patch below but IMO it is
> > not your problem. Please stop LVM and then try to stop array. To stop
> > array it needs to be "free" (all upper handlers are down).
> >
> > Thanks,
> > Mariusz

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: md device remains active even when all supporting disks have failed and been disabled by the kernel.
  2022-01-12 22:54     ` Roger Heflin
@ 2022-01-13  0:03       ` Aidan Walton
  2022-01-13  0:23         ` Jani Partanen
  2022-01-14 17:05         ` John Stoffel
  0 siblings, 2 replies; 14+ messages in thread
From: Aidan Walton @ 2022-01-13  0:03 UTC (permalink / raw)
  To: Roger Heflin; +Cc: Mariusz Tkaczyk, Linux RAID

Hi Roger,
As I mentioned, it is a:
JMicron Technology Corp. JMB363

The reason for it failing is not entirely clear, but I'm fairly sure
it's not related to your suggestions. I can make it fail at will by
using:
 dd if=/dev/sdc of=/dev/null bs=64k count=10000   and  dd if=/dev/sdd
of=/dev/null bs=64k count=10000

Now as I increase the block size of dd, the rate of errors and resets
reduces, in what seems to be a rather linear manner. (Not in any way
scientifically proven) but there is a clear relationship between
commands/sec and resets.

If I use dd on only one disk, I get no errors and no resets, and this
applies to either disk.

Thanks for the interest, however I was not trying to digress off topic
too much. I don't think the linux-raid list is responsible for all
possible controllers out there. However the interactions between
mdraid and LVM is important and this is actually why I posted
initially.

Essentially I'm hoping someone comes back and says....hey did you
check that such a file under sys/.... is clear or not etc.
i.e. My instinct tells me somehow the kernel is holding a lock on LVM
and the mount/umount commands don't seem to care. They do what is
expected and /proc/mounts is updated accordingly. That is to say, file
locking is not the problem, but something is that I have missed and I
dont have the necessary knowledge to focus down onto it.

On Wed, 12 Jan 2022 at 23:54, Roger Heflin <rogerheflin@gmail.com> wrote:
>
> What kind of sata controller do you have?   I know of some 4pt marvell
> ones that if you execute smart commands or too many commands against
> it will "fault" and drop all of the disks going via that controller.
>
> On Wed, Jan 12, 2022 at 4:10 PM Aidan Walton <aidan.walton@gmail.com> wrote:
> >
> > Hi Mariusz,
> > In my case, the fact that mdraid does not show a 'total failure' is
> > not the root of the problem. However in my opinion I would say that
> > not having mdraid more accurately reflect the state of the underlying
> > hardware can be mis-leading. Initially when I looked at this issue, I
> > was convinced that only one disk had failed and I was scratching my
> > head about firstly why I still could not R/W the array while it
> > appeared to have an active member. Secondly, when I rebooted I noticed
> > that the array became instantly synchronised with both members active.
> > This was not what I expected, as normally an array that has had a
> > single failed disk would require a ra-add and resync.  Then when the
> > problem re-occured I noticed that it was not the same disk that was
> > flagged faulty, next reboot; the faulty disk flipped back the other
> > way... and so on. This was what prompted me to look closer at the
> > kernel. Here I found my answer at the SATA controller. Therefore
> > although mdraids design approach did not cause me any data loss, it
> > did have me looking in the wrong direction for the fault, assuming a
> > disk problem.
> >
> > I have still not been able to successfully --stop the array. I think
> > the issue sits in the LVM domain. Although I can not be 100% sure.
> > What I have achieved is some level of understanding that some process
> > that starts a boot time is in some unknown manner holding a lock on
> > the mdarid  - devmapper - LVM combination. I have unmounted the file
> > system, but LVM refuses to let go of the logical volume. Therefore so
> > does dev-mapper and of course mdraid. I have systematically stopped or
> > killed almost every single running process on the system, taking it
> > back to a skeleton, with not much more than init running, it still
> > refuses to let go.
> >
> > However, when I prevent auto-mounting of the raid array at boot, and
> > then manually assemble the raid array, LVM finds its meta data, builds
> > the VG and LV and mounts. If I then manually force the exact same SATA
> > controller failure, which results in the exact same mdraid behaviour,
> > I am then able to unmount the filesystem and ...... hey presto
> > deactivate the LVM LV. Which then allows me to --stop the mdraid. Just
> > as I want. Again it does not solve my SATA hardware issues, but being
> > at this point does give me options to restart the hardware etc, and
> > probably, though very messily, get the filesystem up again without a
> > reboot. The problem being I can not achieve this behaviour without
> > manually assembling the array after boot. If you have any idea what
> > could possibly be holding this lock I would be glad to hear.
> >
> > At this point I'm going to have to try and systematically step through
> > the boot process and try re-arranging, when the array gets assembled.
> > My first attempts at this have been to <ignore> the raid array in
> > mdadm.conf and comment the array out of /etc/fstab. In this way mdraid
> > inside initramfs does not auto-assemble and LVM does not auto scan for
> > the VG. Once I am in the real boot sequence, I have created a systemd
> > mount unit that I can pull in from other systemd units, to change the
> > point in the boot process when the array is assembled. In this way
> > hopefully I can influence when other services are interacting with the
> > array in some way and perhaps find the root cause ......   Work in
> > progress..but slowly as the fault occurs only very occasionally and I
> > still need a working server.
> > All the best.. Aidan
> >
> > On Mon, 10 Jan 2022 at 10:47, Mariusz Tkaczyk
> > <mariusz.tkaczyk@linux.intel.com> wrote:
> > >
> > > On Fri, 7 Jan 2022 23:30:31 +0100
> > > Aidan Walton <aidan.walton@gmail.com> wrote:
> > >
> > > > Hi,
> > > > I have a system running:
> > > > Ubuntu Server 20.04.3 LTS on a 5.4.0-92-generic kernel.
> > > >
> > > > On the motherboard is a:
> > > > SATA controller: JMicron Technology Corp. JMB363 SATA/IDE Controller
> > > > (rev 02)
> > > >
> > > > Connected to this I have:
> > > > 2x Western Digital - WDC WD5001AALS-00L3B2, BIOS :01.03B01 500Gb
> > > > drives
> > > >
> > > > Each drive has a single partition and is part of a RAID1 array:
> > > > /dev/md90:
> > > > .....
> > > >     Number   Major   Minor   RaidDevice State
> > > >        0       8       33        0      active sync   /dev/sdc1
> > > >        2       8       49        1      active sync   /dev/sdd1
> > > >
> > > > On top of this I have a single LVM VG and LV. (Probably not important)
> > > >
> > > > I have noticed some strange behaviour and upon investigation I find
> > > > the md device in the following state:
> > > > /dev/md90:
> > > > ....
> > > >
> > > >     Number   Major   Minor   RaidDevice State
> > > >        0       8       33        0      active sync   /dev/sdc1
> > > >        -       0        0        1      removed
> > > >
> > > >        2       8       49        -      faulty   /dev/sdd1
> > > >
> > > >
> > > > In fact neither /dev/sdc1 or /dev/sdd1 are available. In fact neither
> > > > are /dev/sdc or /dev/sdd, the physical drives, as they both been
> > > > disconnected by the kernel:
> > > > /dev/sdc is attached to ata7:00  and  /dev/sdd is attached to ata.8:00
> > > > This is the log of the kernel events:
> > > >
> > > >
> > > > Jan 07 22:09:03 mx kernel: ata7.00: exception Emask 0x32 SAct 0x0 SErr
> > > > 0x0 action 0xe frozen
> > > > Jan 07 22:09:03 mx kernel: ata7.00: irq_stat 0xffffffff, unknown FIS
> > > > 00000000 00000000 00000000 00000000, host bus
> > > > Jan 07 22:09:03 mx kernel: ata7.00: failed command: READ DMA
> > > > Jan 07 22:09:03 mx kernel: ata7.00: cmd
> > > > c8/00:00:00:cf:26/00:00:00:00:00/e0 tag 18 dma 131072 in
> > > > Jan 07 22:09:03 mx kernel: ata7.00: status: { DRDY }
> > > > Jan 07 22:09:03 mx kernel: ata7: hard resetting link
> > > > Jan 07 22:09:03 mx kernel: ata7: SATA link up 1.5 Gbps (SStatus 113
> > > > SControl 310)
> > > > Jan 07 22:09:09 mx kernel: ata7.00: qc timeout (cmd 0xec)
> > > > Jan 07 22:09:09 mx kernel: ata7.00: failed to IDENTIFY (I/O error,
> > > > err_mask=0x4) Jan 07 22:09:09 mx kernel: ata7.00: revalidation failed
> > > > (errno=-5) Jan 07 22:09:09 mx kernel: ata7: hard resetting link
> > > > Jan 07 22:09:19 mx kernel: ata7: softreset failed (1st FIS failed)
> > > > Jan 07 22:09:19 mx kernel: ata7: hard resetting link
> > > > Jan 07 22:09:29 mx kernel: ata7: softreset failed (1st FIS failed)
> > > > Jan 07 22:09:29 mx kernel: ata7: hard resetting link
> > > > Jan 07 22:09:35 mx kernel: ata8.00: exception Emask 0x40 SAct 0x0 SErr
> > > > 0x800 action 0x6 frozen
> > > > Jan 07 22:09:35 mx kernel: ata8: SError: { HostInt }
> > > > Jan 07 22:09:35 mx kernel: ata8.00: failed command: READ DMA
> > > > Jan 07 22:09:35 mx kernel: ata8.00: cmd
> > > > c8/00:00:00:64:4a/00:00:00:00:00/e0 tag 2 dma 131072 in
> > > > Jan 07 22:09:35 mx kernel: ata8.00: status: { DRDY }
> > > > Jan 07 22:09:35 mx kernel: ata8: hard resetting link
> > > > Jan 07 22:09:45 mx kernel: ata8: softreset failed (1st FIS failed)
> > > > Jan 07 22:09:45 mx kernel: ata8: hard resetting link
> > > > Jan 07 22:09:55 mx kernel: ata8: softreset failed (1st FIS failed)
> > > > Jan 07 22:09:55 mx kernel: ata8: hard resetting link
> > > > Jan 07 22:10:04 mx kernel: ata7: softreset failed (1st FIS failed)
> > > > Jan 07 22:10:04 mx kernel: ata7: hard resetting link
> > > > Jan 07 22:10:09 mx kernel: ata7: softreset failed (1st FIS failed)
> > > > Jan 07 22:10:09 mx kernel: ata7: reset failed, giving up
> > > > Jan 07 22:10:09 mx kernel: ata7.00: disabled
> > > > Jan 07 22:10:09 mx kernel: ata7: EH complete
> > > > Jan 07 22:10:30 mx kernel: ata8: softreset failed (1st FIS failed)
> > > > Jan 07 22:10:30 mx kernel: ata8: hard resetting link
> > > > Jan 07 22:10:35 mx kernel: ata8: softreset failed (1st FIS failed)
> > > > Jan 07 22:10:35 mx kernel: ata8: reset failed, giving up
> > > > Jan 07 22:10:35 mx kernel: ata8.00: disabled
> > > > Jan 07 22:10:35 mx kernel: ata8: EH complete
> > > >
> > > > This is happening because of some issue with the SATA controller on
> > > > the motherboard. This has not been resolved and probably never will
> > > > be, I see many others through google search complaining of similar
> > > > issues with the SATA controller.
> > > > This failure only occurs when the SATA controller is placed under very
> > > > heavy load, I have minimised the impact of the problem by not using
> > > > NCQ, this helps, but it still occurs. Ironically the biggest issue I
> > > > have is that mdadm "checkarray" is running because of a systemd
> > > > background process every week or so, and this hammers the disk into
> > > > failure. Most of the normal daily usage never generates the link
> > > > resets.
> > > > Naturally I have changed SATA cables and moved drives around onto
> > > > different controllers, but alas, it does seem to be the hardware on
> > > > the motherboard.
> > > > However as a workaround I was hoping to accept the occasional failure
> > > > and then using some scripting and 'setpci' I can get the kernel to
> > > > hard reset the chipset and attach the drives again. I have the process
> > > > working in terms of getting the kernel to re-attach the drives,
> > > > but.......
> > > >
> > > > Unfortunately mdraid will not let go of them, I can not stop the
> > > > arrays, and therefore can't rebuild them. If I simply allow the kernel
> > > > to re-attach the drives the kernel names are swapped over, as
> > > > something (mdraid) is stopping the kernel re-using the same device
> > > > names. Anyway being dependent on the same kernel device names is not a
> > > > great plan anyway, so I was simply trying to get mdadm to reassemble
> > > > the array as soon as the 'workaround' script gets the drives back in
> > > > contact with libata (kernel).
> > > >
> > > > Plan:
> > > > 1. Detecting the problem. (mdadm state)
> > > > 2. Stop the array totally (can NOT do it)
> > > > 3. reset the chipset across the PCI bus.
> > > > 4. allow kernel to re-attach drives.
> > > > 5. re-assemble the md device with mdadm
> > > > 6. restart, if necessary higher layer services...
> > > >
> > > > So why is mdraid holding on to the array:
> > > >
> > > > # mdadm --stop /dev/md90
> > > > mdadm: Cannot get exclusive access to /dev/md90:Perhaps a running
> > > > process, mounted filesystem or active volume group?
> > > >
> > > > I can not be 100% sure that something else is using the device, but I
> > > > can't think of anything that is and I stopped every process I can
> > > > think of..... Plus why is the array still shown as 'active' when none
> > > > of its member devices even exist anymore?
> > > >
> > > > What I do know is that device mapper (coming down from LVM)  still has
> > > > an entry in /dev/mapper. But then probably no surprise as /dev/md90
> > > > the failed array is still an active device node. If you attempt to
> > > > write to it, I receive I/O errors from the kernel. In fact as far as
> > > > any higher layer services are concerned md90 and the LVM LV on top of
> > > > it are still active and working when in reality, they are not. It
> > > > causes very strange NFS errors and such.
> > > >
> > > > mdraid does actually attempt to iteratively remove both partitions
> > > > when the kernel signals the disable state, but only 1 of them
> > > > succeeds.
> > > > I did an strace of the same iterative 'fail:remove' process that
> > > > mdraid attempts when the kernel issues -- kernel: ata7.00: disabled
> > > >
> > > > eg:
> > > > /sbin/mdadm -If sdc1 --path pci-0000:02:00.0-ata-1
> > > > mdadm: set device faulty failed for sdc1:  Device or resource busy
> > > >
> > > > The only clue is perhaps this line from the strace:
> > > > openat(AT_FDCWD, "/sys/block/md90/md/dev-sdc1/block/dev", O_RDWR) = -1
> > > > EACCES (Permission denied)    What is the mdadm command doing that
> > > > results in a permission problem?
> > > >
> > > > So the only way I can get rid of this md raid array is a reboot.
> > > > Damn!!!
> > > >
> > > >
> > > > Any help is much appreciated.
> > > > Aidan
> > > >
> > > >
> > > >
> > > Hi Aidan,
> > > This is how it is implemented. Drive is not removed if array failure
> > > will cause array failed. Please see:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?id=9a567843f7ce0037bfd4d5fdc58a09d0a527b28b
> > >
> > > For RAID1 you can use solution proposed in patch below but IMO it is
> > > not your problem. Please stop LVM and then try to stop array. To stop
> > > array it needs to be "free" (all upper handlers are down).
> > >
> > > Thanks,
> > > Mariusz

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: md device remains active even when all supporting disks have failed and been disabled by the kernel.
  2022-01-13  0:03       ` Aidan Walton
@ 2022-01-13  0:23         ` Jani Partanen
  2022-01-13  0:45           ` Aidan Walton
  2022-01-14 17:05         ` John Stoffel
  1 sibling, 1 reply; 14+ messages in thread
From: Jani Partanen @ 2022-01-13  0:23 UTC (permalink / raw)
  To: Aidan Walton, Roger Heflin; +Cc: Mariusz Tkaczyk, Linux RAID

Aidan Walton kirjoitti 13/01/2022 klo 2.03:
> Hi Roger,
> As I mentioned, it is a:
> JMicron Technology Corp. JMB363

If my memory is correct. This chip is trash. If I am right, I have same 
chip on cheap controller what I bought to get IDE controller mainly, it 
also has 2xSATA ports and I had only troubles with that controller in linux.
If I had no drives on SATA ports, dmesg got spammed port errors. If I 
had drives installed, they was randomly dropping in and out. Chip itself 
was running hot as hell, I mean so hot that you could not keep finger on 
chip. I did put some small finns to it and I think it did help little 
for drives dropping in and out, but didn't solve it.
If you want a cheap controller, get this chip:
02:00.0 SATA controller: ASMedia Technology Inc. Device 1166 (rev 02)

It's been quite stable for me and didn't cost much. Only issue I have 
that I cannot force it's ports transfer speed to example 3.0G. Spinnin 
rust do not need 6.0G speed what in my experience increase chance for 
dropouts if you cables aren't 6.0G ready and how do you tell if they are 
when cable doesn't have any mention about it?
I think it's something to do that it's PMP card and I don't understand 
how you have to format kernel parameter when you have PMP situation. I 
can tune internal controller all ports just fine, but not this expansion 
card. Kernel tell you it's forcing them to 3.0G, but when you check 
later what transfer speed is, it's still 6.0G



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: md device remains active even when all supporting disks have failed and been disabled by the kernel.
  2022-01-13  0:23         ` Jani Partanen
@ 2022-01-13  0:45           ` Aidan Walton
  2022-01-13 18:26             ` Roger Heflin
  0 siblings, 1 reply; 14+ messages in thread
From: Aidan Walton @ 2022-01-13  0:45 UTC (permalink / raw)
  To: Jani Partanen; +Cc: Roger Heflin, Mariusz Tkaczyk, Linux RAID

Thanks Jani,
Gold info. I have been looking at 2 cheap Mini-PCIe adapters. One uses
the JMicron and would you believe it the other is using ASMedia.
Now I have a solid reason to choose the second rather than the first.

In this case it will NOT be,  "Better the devil you know than the
devil you don't"
and anyway I don't work hard enough to need 6Gb/s :)
ATB
Aidan

On Thu, 13 Jan 2022 at 01:24, Jani Partanen <jiipee@sotapeli.fi> wrote:
>
> Aidan Walton kirjoitti 13/01/2022 klo 2.03:
> > Hi Roger,
> > As I mentioned, it is a:
> > JMicron Technology Corp. JMB363
>
> If my memory is correct. This chip is trash. If I am right, I have same
> chip on cheap controller what I bought to get IDE controller mainly, it
> also has 2xSATA ports and I had only troubles with that controller in linux.
> If I had no drives on SATA ports, dmesg got spammed port errors. If I
> had drives installed, they was randomly dropping in and out. Chip itself
> was running hot as hell, I mean so hot that you could not keep finger on
> chip. I did put some small finns to it and I think it did help little
> for drives dropping in and out, but didn't solve it.
> If you want a cheap controller, get this chip:
> 02:00.0 SATA controller: ASMedia Technology Inc. Device 1166 (rev 02)
>
> It's been quite stable for me and didn't cost much. Only issue I have
> that I cannot force it's ports transfer speed to example 3.0G. Spinnin
> rust do not need 6.0G speed what in my experience increase chance for
> dropouts if you cables aren't 6.0G ready and how do you tell if they are
> when cable doesn't have any mention about it?
> I think it's something to do that it's PMP card and I don't understand
> how you have to format kernel parameter when you have PMP situation. I
> can tune internal controller all ports just fine, but not this expansion
> card. Kernel tell you it's forcing them to 3.0G, but when you check
> later what transfer speed is, it's still 6.0G
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: md device remains active even when all supporting disks have failed and been disabled by the kernel.
  2022-01-12  1:29   ` Aidan Walton
  2022-01-12 22:54     ` Roger Heflin
@ 2022-01-13 14:46     ` Mariusz Tkaczyk
  2022-01-13 16:35       ` Aidan Walton
  1 sibling, 1 reply; 14+ messages in thread
From: Mariusz Tkaczyk @ 2022-01-13 14:46 UTC (permalink / raw)
  To: Aidan Walton; +Cc: linux-raid

Hi Aidan,

On Wed, 12 Jan 2022 02:29:47 +0100
Aidan Walton <aidan.walton@gmail.com> wrote:

> Hi Mariusz,
> In my case, the fact that mdraid does not show a 'total failure' is
> not the root of the problem. However in my opinion I would say that
> not having mdraid more accurately reflect the state of the underlying
> hardware can be mis-leading. Initially when I looked at this issue, I
> was convinced that only one disk had failed and I was scratching my
> head about firstly why I still could not R/W the array while it
> appeared to have an active member. Secondly, when I rebooted I noticed
> that the array became instantly synchronised with both members active.

We have raid1 so first fail should be recorded in metadata. From your
description, I understand that nothing like this happened. For me, it
seems that the controller lost both drives in the same moment and as a
result nothing was saved. After reboot raid is assembled without
rebuild because metadata on both members is valid.

> This was not what I expected, as normally an array that has had a
> single failed disk would require a ra-add and resync.  Then when the
> problem re-occured I noticed that it was not the same disk that was
> flagged faulty, next reboot; the faulty disk flipped back the other
> way... and so on. This was what prompted me to look closer at the
> kernel. Here I found my answer at the SATA controller. Therefore
> although mdraids design approach did not cause me any data loss, it
> did have me looking in the wrong direction for the fault, assuming a
> disk problem.
> 
> I have still not been able to successfully --stop the array. I think
> the issue sits in the LVM domain. Although I can not be 100% sure.
> What I have achieved is some level of understanding that some process
> that starts a boot time is in some unknown manner holding a lock on
> the mdarid  - devmapper - LVM combination. I have unmounted the file
> system, but LVM refuses to let go of the logical volume. Therefore so
> does dev-mapper and of course mdraid. I have systematically stopped or
> killed almost every single running process on the system, taking it
> back to a skeleton, with not much more than init running, it still
> refuses to let go
> 
> However, when I prevent auto-mounting of the raid array at boot, and
> then manually assemble the raid array, LVM finds its meta data, builds
> the VG and LV and mounts. If I then manually force the exact same SATA
> controller failure, which results in the exact same mdraid behaviour,
> I am then able to unmount the filesystem and ...... hey presto
> deactivate the LVM LV. Which then allows me to --stop the mdraid. Just
> as I want. Again it does not solve my SATA hardware issues, but being
> at this point does give me options to restart the hardware etc, and
> probably, though very messily, get the filesystem up again without a
> reboot. The problem being I can not achieve this behaviour without
> manually assembling the array after boot. If you have any idea what
> could possibly be holding this lock I would be glad to hear.
> 
Could you connect to the udev monitor and analyze events triggered in
both cases? This is the one idea I have.

Thanks,
Mariusz

> At this point I'm going to have to try and systematically step through
> the boot process and try re-arranging, when the array gets assembled.
> My first attempts at this have been to <ignore> the raid array in
> mdadm.conf and comment the array out of /etc/fstab. In this way mdraid
> inside initramfs does not auto-assemble and LVM does not auto scan for
> the VG. Once I am in the real boot sequence, I have created a systemd
> mount unit that I can pull in from other systemd units, to change the
> point in the boot process when the array is assembled. In this way
> hopefully I can influence when other services are interacting with the
> array in some way and perhaps find the root cause ......   Work in
> progress..but slowly as the fault occurs only very occasionally and I
> still need a working server.
> All the best.. Aidan
> 
> On Mon, 10 Jan 2022 at 10:47, Mariusz Tkaczyk
> <mariusz.tkaczyk@linux.intel.com> wrote:
> >
> > On Fri, 7 Jan 2022 23:30:31 +0100
> > Aidan Walton <aidan.walton@gmail.com> wrote:
> >  
> > > Hi,
> > > I have a system running:
> > > Ubuntu Server 20.04.3 LTS on a 5.4.0-92-generic kernel.
> > >
> > > On the motherboard is a:
> > > SATA controller: JMicron Technology Corp. JMB363 SATA/IDE
> > > Controller (rev 02)
> > >
> > > Connected to this I have:
> > > 2x Western Digital - WDC WD5001AALS-00L3B2, BIOS :01.03B01 500Gb
> > > drives
> > >
> > > Each drive has a single partition and is part of a RAID1 array:
> > > /dev/md90:
> > > .....
> > >     Number   Major   Minor   RaidDevice State
> > >        0       8       33        0      active sync   /dev/sdc1
> > >        2       8       49        1      active sync   /dev/sdd1
> > >
> > > On top of this I have a single LVM VG and LV. (Probably not
> > > important)
> > >
> > > I have noticed some strange behaviour and upon investigation I
> > > find the md device in the following state:
> > > /dev/md90:
> > > ....
> > >
> > >     Number   Major   Minor   RaidDevice State
> > >        0       8       33        0      active sync   /dev/sdc1
> > >        -       0        0        1      removed
> > >
> > >        2       8       49        -      faulty   /dev/sdd1
> > >
> > >
> > > In fact neither /dev/sdc1 or /dev/sdd1 are available. In fact
> > > neither are /dev/sdc or /dev/sdd, the physical drives, as they
> > > both been disconnected by the kernel:
> > > /dev/sdc is attached to ata7:00  and  /dev/sdd is attached to
> > > ata.8:00 This is the log of the kernel events:
> > >
> > >
> > > Jan 07 22:09:03 mx kernel: ata7.00: exception Emask 0x32 SAct 0x0
> > > SErr 0x0 action 0xe frozen
> > > Jan 07 22:09:03 mx kernel: ata7.00: irq_stat 0xffffffff, unknown
> > > FIS 00000000 00000000 00000000 00000000, host bus
> > > Jan 07 22:09:03 mx kernel: ata7.00: failed command: READ DMA
> > > Jan 07 22:09:03 mx kernel: ata7.00: cmd
> > > c8/00:00:00:cf:26/00:00:00:00:00/e0 tag 18 dma 131072 in
> > > Jan 07 22:09:03 mx kernel: ata7.00: status: { DRDY }
> > > Jan 07 22:09:03 mx kernel: ata7: hard resetting link
> > > Jan 07 22:09:03 mx kernel: ata7: SATA link up 1.5 Gbps (SStatus
> > > 113 SControl 310)
> > > Jan 07 22:09:09 mx kernel: ata7.00: qc timeout (cmd 0xec)
> > > Jan 07 22:09:09 mx kernel: ata7.00: failed to IDENTIFY (I/O error,
> > > err_mask=0x4) Jan 07 22:09:09 mx kernel: ata7.00: revalidation
> > > failed (errno=-5) Jan 07 22:09:09 mx kernel: ata7: hard resetting
> > > link Jan 07 22:09:19 mx kernel: ata7: softreset failed (1st FIS
> > > failed) Jan 07 22:09:19 mx kernel: ata7: hard resetting link
> > > Jan 07 22:09:29 mx kernel: ata7: softreset failed (1st FIS failed)
> > > Jan 07 22:09:29 mx kernel: ata7: hard resetting link
> > > Jan 07 22:09:35 mx kernel: ata8.00: exception Emask 0x40 SAct 0x0
> > > SErr 0x800 action 0x6 frozen
> > > Jan 07 22:09:35 mx kernel: ata8: SError: { HostInt }
> > > Jan 07 22:09:35 mx kernel: ata8.00: failed command: READ DMA
> > > Jan 07 22:09:35 mx kernel: ata8.00: cmd
> > > c8/00:00:00:64:4a/00:00:00:00:00/e0 tag 2 dma 131072 in
> > > Jan 07 22:09:35 mx kernel: ata8.00: status: { DRDY }
> > > Jan 07 22:09:35 mx kernel: ata8: hard resetting link
> > > Jan 07 22:09:45 mx kernel: ata8: softreset failed (1st FIS failed)
> > > Jan 07 22:09:45 mx kernel: ata8: hard resetting link
> > > Jan 07 22:09:55 mx kernel: ata8: softreset failed (1st FIS failed)
> > > Jan 07 22:09:55 mx kernel: ata8: hard resetting link
> > > Jan 07 22:10:04 mx kernel: ata7: softreset failed (1st FIS failed)
> > > Jan 07 22:10:04 mx kernel: ata7: hard resetting link
> > > Jan 07 22:10:09 mx kernel: ata7: softreset failed (1st FIS failed)
> > > Jan 07 22:10:09 mx kernel: ata7: reset failed, giving up
> > > Jan 07 22:10:09 mx kernel: ata7.00: disabled
> > > Jan 07 22:10:09 mx kernel: ata7: EH complete
> > > Jan 07 22:10:30 mx kernel: ata8: softreset failed (1st FIS failed)
> > > Jan 07 22:10:30 mx kernel: ata8: hard resetting link
> > > Jan 07 22:10:35 mx kernel: ata8: softreset failed (1st FIS failed)
> > > Jan 07 22:10:35 mx kernel: ata8: reset failed, giving up
> > > Jan 07 22:10:35 mx kernel: ata8.00: disabled
> > > Jan 07 22:10:35 mx kernel: ata8: EH complete
> > >
> > > This is happening because of some issue with the SATA controller
> > > on the motherboard. This has not been resolved and probably never
> > > will be, I see many others through google search complaining of
> > > similar issues with the SATA controller.
> > > This failure only occurs when the SATA controller is placed under
> > > very heavy load, I have minimised the impact of the problem by
> > > not using NCQ, this helps, but it still occurs. Ironically the
> > > biggest issue I have is that mdadm "checkarray" is running
> > > because of a systemd background process every week or so, and
> > > this hammers the disk into failure. Most of the normal daily
> > > usage never generates the link resets.
> > > Naturally I have changed SATA cables and moved drives around onto
> > > different controllers, but alas, it does seem to be the hardware
> > > on the motherboard.
> > > However as a workaround I was hoping to accept the occasional
> > > failure and then using some scripting and 'setpci' I can get the
> > > kernel to hard reset the chipset and attach the drives again. I
> > > have the process working in terms of getting the kernel to
> > > re-attach the drives, but.......
> > >
> > > Unfortunately mdraid will not let go of them, I can not stop the
> > > arrays, and therefore can't rebuild them. If I simply allow the
> > > kernel to re-attach the drives the kernel names are swapped over,
> > > as something (mdraid) is stopping the kernel re-using the same
> > > device names. Anyway being dependent on the same kernel device
> > > names is not a great plan anyway, so I was simply trying to get
> > > mdadm to reassemble the array as soon as the 'workaround' script
> > > gets the drives back in contact with libata (kernel).
> > >
> > > Plan:
> > > 1. Detecting the problem. (mdadm state)
> > > 2. Stop the array totally (can NOT do it)
> > > 3. reset the chipset across the PCI bus.
> > > 4. allow kernel to re-attach drives.
> > > 5. re-assemble the md device with mdadm
> > > 6. restart, if necessary higher layer services...
> > >
> > > So why is mdraid holding on to the array:
> > >
> > > # mdadm --stop /dev/md90
> > > mdadm: Cannot get exclusive access to /dev/md90:Perhaps a running
> > > process, mounted filesystem or active volume group?
> > >
> > > I can not be 100% sure that something else is using the device,
> > > but I can't think of anything that is and I stopped every process
> > > I can think of..... Plus why is the array still shown as 'active'
> > > when none of its member devices even exist anymore?
> > >
> > > What I do know is that device mapper (coming down from LVM)
> > > still has an entry in /dev/mapper. But then probably no surprise
> > > as /dev/md90 the failed array is still an active device node. If
> > > you attempt to write to it, I receive I/O errors from the kernel.
> > > In fact as far as any higher layer services are concerned md90
> > > and the LVM LV on top of it are still active and working when in
> > > reality, they are not. It causes very strange NFS errors and such.
> > >
> > > mdraid does actually attempt to iteratively remove both partitions
> > > when the kernel signals the disable state, but only 1 of them
> > > succeeds.
> > > I did an strace of the same iterative 'fail:remove' process that
> > > mdraid attempts when the kernel issues -- kernel: ata7.00:
> > > disabled
> > >
> > > eg:
> > > /sbin/mdadm -If sdc1 --path pci-0000:02:00.0-ata-1
> > > mdadm: set device faulty failed for sdc1:  Device or resource busy
> > >
> > > The only clue is perhaps this line from the strace:
> > > openat(AT_FDCWD, "/sys/block/md90/md/dev-sdc1/block/dev", O_RDWR)
> > > = -1 EACCES (Permission denied)    What is the mdadm command
> > > doing that results in a permission problem?
> > >
> > > So the only way I can get rid of this md raid array is a reboot.
> > > Damn!!!
> > >
> > >
> > > Any help is much appreciated.
> > > Aidan
> > >
> > >
> > >  
> > Hi Aidan,
> > This is how it is implemented. Drive is not removed if array failure
> > will cause array failed. Please see:
> > https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?id=9a567843f7ce0037bfd4d5fdc58a09d0a527b28b
> >
> > For RAID1 you can use solution proposed in patch below but IMO it is
> > not your problem. Please stop LVM and then try to stop array. To
> > stop array it needs to be "free" (all upper handlers are down).
> >
> > Thanks,
> > Mariusz  


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: md device remains active even when all supporting disks have failed and been disabled by the kernel.
  2022-01-13 14:46     ` Mariusz Tkaczyk
@ 2022-01-13 16:35       ` Aidan Walton
  2022-01-13 16:48         ` Geoff Back
  2022-01-13 17:12         ` Aidan Walton
  0 siblings, 2 replies; 14+ messages in thread
From: Aidan Walton @ 2022-01-13 16:35 UTC (permalink / raw)
  To: Mariusz Tkaczyk; +Cc: Linux RAID

Progress of sorts:
I have tried to get the results as requested. However I was
experimenting with the patch that you referenced initially and before
I did this test, I forgot that I had actually run:
echo 1 > /sys/block/md90/md/fail_last_dev

So the outcome was interesting as the result proves this feature does
work as expected and the md90 array showed both devices flagged as
down. Better!
/dev/md90:
           Version : 1.2
     Creation Time : Sat Nov  3 03:09:16 2018
        Raid Level : raid1
        Array Size : 488253440 (465.63 GiB 499.97 GB)
     Used Dev Size : 488253440 (465.63 GiB 499.97 GB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Thu Jan 13 17:08:38 2022
             State : clean, FAILED
    Active Devices : 0
    Failed Devices : 2
     Spare Devices : 0

Consistency Policy : bitmap

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       -       0        0        1      removed

       0       8       33        -      faulty   /dev/sdc1
       2       8       49        -      faulty   /dev/sdd1

The process that led to this is below:
from journalctl:
Jan 13 17:07:05 mx kernel: ata7.00: exception Emask 0x32 SAct 0x0 SErr
0x0 action 0xe frozen
Jan 13 17:07:05 mx kernel: ata7.00: irq_stat 0xffffffff, unknown FIS
00000000 00000000 00000000 00000000, host bus
Jan 13 17:07:05 mx kernel: ata7.00: failed command: READ DMA
Jan 13 17:07:05 mx kernel: ata7.00: cmd
c8/00:00:18:2d:ec/00:00:00:00:00/e0 tag 22 dma 131072 in
Jan 13 17:07:05 mx kernel: ata7.00: status: { DRDY }
Jan 13 17:07:05 mx kernel: ata7: hard resetting link
Jan 13 17:07:15 mx kernel: ata7: softreset failed (1st FIS failed)
Jan 13 17:07:15 mx kernel: ata7: hard resetting link
Jan 13 17:07:25 mx kernel: ata7: softreset failed (1st FIS failed)
Jan 13 17:07:25 mx kernel: ata7: hard resetting link
Jan 13 17:07:36 mx kernel: ata8.00: exception Emask 0x40 SAct 0x0 SErr
0x800 action 0x6 frozen
Jan 13 17:07:36 mx kernel: ata8: SError: { HostInt }
Jan 13 17:07:36 mx kernel: ata8.00: failed command: READ DMA
Jan 13 17:07:36 mx kernel: ata8.00: cmd
c8/00:00:78:53:d9/00:00:00:00:00/e0 tag 10 dma 131072 in
Jan 13 17:07:36 mx kernel: ata8.00: status: { DRDY }
Jan 13 17:07:36 mx kernel: ata8: hard resetting link
Jan 13 17:07:46 mx kernel: ata8: softreset failed (1st FIS failed)
Jan 13 17:07:46 mx kernel: ata8: hard resetting link
Jan 13 17:07:56 mx kernel: ata8: softreset failed (1st FIS failed)
Jan 13 17:07:56 mx kernel: ata8: hard resetting link
Jan 13 17:08:00 mx kernel: ata7: softreset failed (1st FIS failed)
Jan 13 17:08:00 mx kernel: ata7: hard resetting link
Jan 13 17:08:05 mx kernel: ata7: softreset failed (1st FIS failed)
Jan 13 17:08:05 mx kernel: ata7: reset failed, giving up
Jan 13 17:08:05 mx kernel: ata7.00: disabled
Jan 13 17:08:05 mx kernel: ata7: EH complete
Jan 13 17:08:31 mx kernel: ata8: softreset failed (1st FIS failed)
Jan 13 17:08:31 mx kernel: ata8: hard resetting link
Jan 13 17:08:36 mx kernel: ata8: softreset failed (1st FIS failed)
Jan 13 17:08:36 mx kernel: ata8: reset failed, giving up
Jan 13 17:08:36 mx kernel: ata8.00: disabled
Jan 13 17:08:36 mx kernel: ata8: EH complete


from udevadm monitor, NOTE, I noticed that udev monitor did not seem
to spit anything out at the point in time when the first SATA device
ata7.00: was disabled. The messages below only appeared 30secs later
when ata8.00: went down: Sorry no timestamping on udevadm monitor, but
read below from exactly when this occurred: Jan 13 17:08:36 mx kernel:
ata8.00: disabled

KERNEL[226088.463136] add
/kernel/slab/kmalloc-192/cgroup/kmalloc-192(549:mdmonitor.service)
(cgroup)
KERNEL[226088.463335] add
/kernel/slab/kmalloc-1k/cgroup/kmalloc-1k(549:mdmonitor.service)
(cgroup)
KERNEL[226088.464229] add
/kernel/slab/task_struct/cgroup/task_struct(549:mdmonitor.service)
(cgroup)
KERNEL[226088.464374] add
/kernel/slab/:A-0000080/cgroup/task_delay_info(549:mdmonitor.service)
(cgroup)
KERNEL[226088.464467] add
/kernel/slab/:A-0000704/cgroup/files_cache(549:mdmonitor.service)
(cgroup)
KERNEL[226088.464718] add
/kernel/slab/sighand_cache/cgroup/sighand_cache(549:mdmonitor.service)
(cgroup)
KERNEL[226088.465312] add
/kernel/slab/:A-0001152/cgroup/signal_cache(549:mdmonitor.service)
(cgroup)
UDEV  [226088.472483] add
/kernel/slab/kmalloc-192/cgroup/kmalloc-192(549:mdmonitor.service)
(cgroup)
UDEV  [226088.473360] add
/kernel/slab/kmalloc-1k/cgroup/kmalloc-1k(549:mdmonitor.service)
(cgroup)
UDEV  [226088.480328] add
/kernel/slab/task_struct/cgroup/task_struct(549:mdmonitor.service)
(cgroup)
UDEV  [226088.486830] add
/kernel/slab/:A-0000080/cgroup/task_delay_info(549:mdmonitor.service)
(cgroup)
UDEV  [226088.487077] add
/kernel/slab/sighand_cache/cgroup/sighand_cache(549:mdmonitor.service)
(cgroup)
UDEV  [226088.488851] add
/kernel/slab/:A-0000704/cgroup/files_cache(549:mdmonitor.service)
(cgroup)
UDEV  [226088.493066] add
/kernel/slab/:A-0001152/cgroup/signal_cache(549:mdmonitor.service)
(cgroup)
KERNEL[226089.014423] add
/kernel/slab/:a-0000104/cgroup/buffer_head(549:mdmonitor.service)
(cgroup)
UDEV  [226089.017837] add
/kernel/slab/:a-0000104/cgroup/buffer_head(549:mdmonitor.service)
(cgroup)
KERNEL[226089.188538] add
/kernel/slab/radix_tree_node/cgroup/radix_tree_node(549:mdmonitor.service)
(cgroup)
KERNEL[226089.189366] add
/kernel/slab/ext4_inode_cache/cgroup/ext4_inode_cache(549:mdmonitor.service)
(cgroup)
UDEV  [226089.193043] add
/kernel/slab/radix_tree_node/cgroup/radix_tree_node(549:mdmonitor.service)
(cgroup)
UDEV  [226089.194414] add
/kernel/slab/ext4_inode_cache/cgroup/ext4_inode_cache(549:mdmonitor.service)
(cgroup)
KERNEL[226090.421762] add
/kernel/slab/kmalloc-8/cgroup/kmalloc-8(549:mdmonitor.service)
(cgroup)
UDEV  [226090.425465] add
/kernel/slab/kmalloc-8/cgroup/kmalloc-8(549:mdmonitor.service)
(cgroup)
KERNEL[226090.442542] add
/kernel/slab/proc_inode_cache/cgroup/proc_inode_cache(549:mdmonitor.service)
(cgroup)
UDEV  [226090.445999] add
/kernel/slab/proc_inode_cache/cgroup/proc_inode_cache(549:mdmonitor.service)
(cgroup)
KERNEL[226090.458800] add
/kernel/slab/:A-0000072/cgroup/eventpoll_pwq(549:mdmonitor.service)
(cgroup)
KERNEL[226090.460227] add
/kernel/slab/kmalloc-16/cgroup/kmalloc-16(549:mdmonitor.service)
(cgroup)
UDEV  [226090.463193] add
/kernel/slab/:A-0000072/cgroup/eventpoll_pwq(549:mdmonitor.service)
(cgroup)
UDEV  [226090.467271] add
/kernel/slab/kmalloc-16/cgroup/kmalloc-16(549:mdmonitor.service)
(cgroup)
KERNEL[226090.880178] add
/kernel/slab/kmalloc-rcl-64/cgroup/kmalloc-rcl-64(549:mdmonitor.service)
(cgroup)
UDEV  [226090.883794] add
/kernel/slab/kmalloc-rcl-64/cgroup/kmalloc-rcl-64(549:mdmonitor.service)
(cgroup)



dmsetup info -c
Name                                Maj Min Stat Open Targ Event  UUID
storage.mx.vg2-shared_sun_NAS.lv1   253   0 L--w    1    1      0
LVM-Ud9pj6QE4hK1K3xiAFMVCnno3SrXaRyTXJLtTGDOPjBUppJgzr4t0jJowixEOtx7
storage.mx.vg1-shared_sun_users.lv1 253   2 L--w    1    1      0
LVM-ypcHlbNXu36FLRgU0EcUiXBSIvcOlHEP3MHkBKsBeHf6Q68TIuGA9hd5UfCpvOeo
ubuntu_server--vg-ubuntu_server--lv 253   1 L--w    1    1      0
LVM-eGBUJxP1vlW3MfNNeC2r5JfQUiKKWZ73t3U3Jji3lggXe8LPrUf0xRE0YyPzSorO

NOTE the 'open' status on the NAS.lv1 device. In fact the device is not mounted:

cat /proc/mounts | grep mapper
/dev/mapper/ubuntu_server--vg-ubuntu_server--lv / ext4
rw,relatime,errors=remount-ro 0 0
/dev/mapper/storage.mx.vg1-shared_sun_users.lv1 /mnt/home ext4 rw,relatime 0 0

pvdisplay
  --- Physical volume ---
  PV Name               /dev/md1
  VG Name               storage.mx.vg1
  PV Size               111.73 GiB / not usable 3.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              28603
  Free PE               0
  Allocated PE          28603
  PV UUID               4yDnuz-PEHg-uZqd-djWS-DNnp-Qzuf-fYvGZJ

  --- Physical volume ---
  PV Name               /dev/md0
  VG Name               ubuntu_server-vg
  PV Size               <37.22 GiB / not usable 0
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              9528
  Free PE               0
  Allocated PE          9528
  PV UUID               G0bNbO-DOz4-I2nN-rEQq-X00m-PG3a-fPAP3I

So in this case LVM seems to recognise that the md90 device is gone.

Before I changed the 'fail_last_dev' option, When I ran these LVM
commands, I was experiencing a short delay and then a report in place
of the failed device saying that LVM had given up waiting for a udev
entry to become available after 10000mS. Sorry I didn't catch this,
for the log, but it is from memory.

But now this delay is not happening and LVM seems to have a correct
and consistent view of the failed mount point. Clearly mdraid has sent
the failure up the stack.


However mdraid will still NOT stop the md90 device:

mdadm --stop /dev/md90
mdadm: Cannot get exclusive access to /dev/md90:Perhaps a running
process, mounted filesystem or active volume group?

ATB
Aidan

On Thu, 13 Jan 2022 at 15:46, Mariusz Tkaczyk
<mariusz.tkaczyk@linux.intel.com> wrote:
>
> Hi Aidan,
>
> On Wed, 12 Jan 2022 02:29:47 +0100
> Aidan Walton <aidan.walton@gmail.com> wrote:
>
> > Hi Mariusz,
> > In my case, the fact that mdraid does not show a 'total failure' is
> > not the root of the problem. However in my opinion I would say that
> > not having mdraid more accurately reflect the state of the underlying
> > hardware can be mis-leading. Initially when I looked at this issue, I
> > was convinced that only one disk had failed and I was scratching my
> > head about firstly why I still could not R/W the array while it
> > appeared to have an active member. Secondly, when I rebooted I noticed
> > that the array became instantly synchronised with both members active.
>
> We have raid1 so first fail should be recorded in metadata. From your
> description, I understand that nothing like this happened. For me, it
> seems that the controller lost both drives in the same moment and as a
> result nothing was saved. After reboot raid is assembled without
> rebuild because metadata on both members is valid.
>
> > This was not what I expected, as normally an array that has had a
> > single failed disk would require a ra-add and resync.  Then when the
> > problem re-occured I noticed that it was not the same disk that was
> > flagged faulty, next reboot; the faulty disk flipped back the other
> > way... and so on. This was what prompted me to look closer at the
> > kernel. Here I found my answer at the SATA controller. Therefore
> > although mdraids design approach did not cause me any data loss, it
> > did have me looking in the wrong direction for the fault, assuming a
> > disk problem.
> >
> > I have still not been able to successfully --stop the array. I think
> > the issue sits in the LVM domain. Although I can not be 100% sure.
> > What I have achieved is some level of understanding that some process
> > that starts a boot time is in some unknown manner holding a lock on
> > the mdarid  - devmapper - LVM combination. I have unmounted the file
> > system, but LVM refuses to let go of the logical volume. Therefore so
> > does dev-mapper and of course mdraid. I have systematically stopped or
> > killed almost every single running process on the system, taking it
> > back to a skeleton, with not much more than init running, it still
> > refuses to let go
> >
> > However, when I prevent auto-mounting of the raid array at boot, and
> > then manually assemble the raid array, LVM finds its meta data, builds
> > the VG and LV and mounts. If I then manually force the exact same SATA
> > controller failure, which results in the exact same mdraid behaviour,
> > I am then able to unmount the filesystem and ...... hey presto
> > deactivate the LVM LV. Which then allows me to --stop the mdraid. Just
> > as I want. Again it does not solve my SATA hardware issues, but being
> > at this point does give me options to restart the hardware etc, and
> > probably, though very messily, get the filesystem up again without a
> > reboot. The problem being I can not achieve this behaviour without
> > manually assembling the array after boot. If you have any idea what
> > could possibly be holding this lock I would be glad to hear.
> >
> Could you connect to the udev monitor and analyze events triggered in
> both cases? This is the one idea I have.
>
> Thanks,
> Mariusz
>
> > At this point I'm going to have to try and systematically step through
> > the boot process and try re-arranging, when the array gets assembled.
> > My first attempts at this have been to <ignore> the raid array in
> > mdadm.conf and comment the array out of /etc/fstab. In this way mdraid
> > inside initramfs does not auto-assemble and LVM does not auto scan for
> > the VG. Once I am in the real boot sequence, I have created a systemd
> > mount unit that I can pull in from other systemd units, to change the
> > point in the boot process when the array is assembled. In this way
> > hopefully I can influence when other services are interacting with the
> > array in some way and perhaps find the root cause ......   Work in
> > progress..but slowly as the fault occurs only very occasionally and I
> > still need a working server.
> > All the best.. Aidan
> >
> > On Mon, 10 Jan 2022 at 10:47, Mariusz Tkaczyk
> > <mariusz.tkaczyk@linux.intel.com> wrote:
> > >
> > > On Fri, 7 Jan 2022 23:30:31 +0100
> > > Aidan Walton <aidan.walton@gmail.com> wrote:
> > >
> > > > Hi,
> > > > I have a system running:
> > > > Ubuntu Server 20.04.3 LTS on a 5.4.0-92-generic kernel.
> > > >
> > > > On the motherboard is a:
> > > > SATA controller: JMicron Technology Corp. JMB363 SATA/IDE
> > > > Controller (rev 02)
> > > >
> > > > Connected to this I have:
> > > > 2x Western Digital - WDC WD5001AALS-00L3B2, BIOS :01.03B01 500Gb
> > > > drives
> > > >
> > > > Each drive has a single partition and is part of a RAID1 array:
> > > > /dev/md90:
> > > > .....
> > > >     Number   Major   Minor   RaidDevice State
> > > >        0       8       33        0      active sync   /dev/sdc1
> > > >        2       8       49        1      active sync   /dev/sdd1
> > > >
> > > > On top of this I have a single LVM VG and LV. (Probably not
> > > > important)
> > > >
> > > > I have noticed some strange behaviour and upon investigation I
> > > > find the md device in the following state:
> > > > /dev/md90:
> > > > ....
> > > >
> > > >     Number   Major   Minor   RaidDevice State
> > > >        0       8       33        0      active sync   /dev/sdc1
> > > >        -       0        0        1      removed
> > > >
> > > >        2       8       49        -      faulty   /dev/sdd1
> > > >
> > > >
> > > > In fact neither /dev/sdc1 or /dev/sdd1 are available. In fact
> > > > neither are /dev/sdc or /dev/sdd, the physical drives, as they
> > > > both been disconnected by the kernel:
> > > > /dev/sdc is attached to ata7:00  and  /dev/sdd is attached to
> > > > ata.8:00 This is the log of the kernel events:
> > > >
> > > >
> > > > Jan 07 22:09:03 mx kernel: ata7.00: exception Emask 0x32 SAct 0x0
> > > > SErr 0x0 action 0xe frozen
> > > > Jan 07 22:09:03 mx kernel: ata7.00: irq_stat 0xffffffff, unknown
> > > > FIS 00000000 00000000 00000000 00000000, host bus
> > > > Jan 07 22:09:03 mx kernel: ata7.00: failed command: READ DMA
> > > > Jan 07 22:09:03 mx kernel: ata7.00: cmd
> > > > c8/00:00:00:cf:26/00:00:00:00:00/e0 tag 18 dma 131072 in
> > > > Jan 07 22:09:03 mx kernel: ata7.00: status: { DRDY }
> > > > Jan 07 22:09:03 mx kernel: ata7: hard resetting link
> > > > Jan 07 22:09:03 mx kernel: ata7: SATA link up 1.5 Gbps (SStatus
> > > > 113 SControl 310)
> > > > Jan 07 22:09:09 mx kernel: ata7.00: qc timeout (cmd 0xec)
> > > > Jan 07 22:09:09 mx kernel: ata7.00: failed to IDENTIFY (I/O error,
> > > > err_mask=0x4) Jan 07 22:09:09 mx kernel: ata7.00: revalidation
> > > > failed (errno=-5) Jan 07 22:09:09 mx kernel: ata7: hard resetting
> > > > link Jan 07 22:09:19 mx kernel: ata7: softreset failed (1st FIS
> > > > failed) Jan 07 22:09:19 mx kernel: ata7: hard resetting link
> > > > Jan 07 22:09:29 mx kernel: ata7: softreset failed (1st FIS failed)
> > > > Jan 07 22:09:29 mx kernel: ata7: hard resetting link
> > > > Jan 07 22:09:35 mx kernel: ata8.00: exception Emask 0x40 SAct 0x0
> > > > SErr 0x800 action 0x6 frozen
> > > > Jan 07 22:09:35 mx kernel: ata8: SError: { HostInt }
> > > > Jan 07 22:09:35 mx kernel: ata8.00: failed command: READ DMA
> > > > Jan 07 22:09:35 mx kernel: ata8.00: cmd
> > > > c8/00:00:00:64:4a/00:00:00:00:00/e0 tag 2 dma 131072 in
> > > > Jan 07 22:09:35 mx kernel: ata8.00: status: { DRDY }
> > > > Jan 07 22:09:35 mx kernel: ata8: hard resetting link
> > > > Jan 07 22:09:45 mx kernel: ata8: softreset failed (1st FIS failed)
> > > > Jan 07 22:09:45 mx kernel: ata8: hard resetting link
> > > > Jan 07 22:09:55 mx kernel: ata8: softreset failed (1st FIS failed)
> > > > Jan 07 22:09:55 mx kernel: ata8: hard resetting link
> > > > Jan 07 22:10:04 mx kernel: ata7: softreset failed (1st FIS failed)
> > > > Jan 07 22:10:04 mx kernel: ata7: hard resetting link
> > > > Jan 07 22:10:09 mx kernel: ata7: softreset failed (1st FIS failed)
> > > > Jan 07 22:10:09 mx kernel: ata7: reset failed, giving up
> > > > Jan 07 22:10:09 mx kernel: ata7.00: disabled
> > > > Jan 07 22:10:09 mx kernel: ata7: EH complete
> > > > Jan 07 22:10:30 mx kernel: ata8: softreset failed (1st FIS failed)
> > > > Jan 07 22:10:30 mx kernel: ata8: hard resetting link
> > > > Jan 07 22:10:35 mx kernel: ata8: softreset failed (1st FIS failed)
> > > > Jan 07 22:10:35 mx kernel: ata8: reset failed, giving up
> > > > Jan 07 22:10:35 mx kernel: ata8.00: disabled
> > > > Jan 07 22:10:35 mx kernel: ata8: EH complete
> > > >
> > > > This is happening because of some issue with the SATA controller
> > > > on the motherboard. This has not been resolved and probably never
> > > > will be, I see many others through google search complaining of
> > > > similar issues with the SATA controller.
> > > > This failure only occurs when the SATA controller is placed under
> > > > very heavy load, I have minimised the impact of the problem by
> > > > not using NCQ, this helps, but it still occurs. Ironically the
> > > > biggest issue I have is that mdadm "checkarray" is running
> > > > because of a systemd background process every week or so, and
> > > > this hammers the disk into failure. Most of the normal daily
> > > > usage never generates the link resets.
> > > > Naturally I have changed SATA cables and moved drives around onto
> > > > different controllers, but alas, it does seem to be the hardware
> > > > on the motherboard.
> > > > However as a workaround I was hoping to accept the occasional
> > > > failure and then using some scripting and 'setpci' I can get the
> > > > kernel to hard reset the chipset and attach the drives again. I
> > > > have the process working in terms of getting the kernel to
> > > > re-attach the drives, but.......
> > > >
> > > > Unfortunately mdraid will not let go of them, I can not stop the
> > > > arrays, and therefore can't rebuild them. If I simply allow the
> > > > kernel to re-attach the drives the kernel names are swapped over,
> > > > as something (mdraid) is stopping the kernel re-using the same
> > > > device names. Anyway being dependent on the same kernel device
> > > > names is not a great plan anyway, so I was simply trying to get
> > > > mdadm to reassemble the array as soon as the 'workaround' script
> > > > gets the drives back in contact with libata (kernel).
> > > >
> > > > Plan:
> > > > 1. Detecting the problem. (mdadm state)
> > > > 2. Stop the array totally (can NOT do it)
> > > > 3. reset the chipset across the PCI bus.
> > > > 4. allow kernel to re-attach drives.
> > > > 5. re-assemble the md device with mdadm
> > > > 6. restart, if necessary higher layer services...
> > > >
> > > > So why is mdraid holding on to the array:
> > > >
> > > > # mdadm --stop /dev/md90
> > > > mdadm: Cannot get exclusive access to /dev/md90:Perhaps a running
> > > > process, mounted filesystem or active volume group?
> > > >
> > > > I can not be 100% sure that something else is using the device,
> > > > but I can't think of anything that is and I stopped every process
> > > > I can think of..... Plus why is the array still shown as 'active'
> > > > when none of its member devices even exist anymore?
> > > >
> > > > What I do know is that device mapper (coming down from LVM)
> > > > still has an entry in /dev/mapper. But then probably no surprise
> > > > as /dev/md90 the failed array is still an active device node. If
> > > > you attempt to write to it, I receive I/O errors from the kernel.
> > > > In fact as far as any higher layer services are concerned md90
> > > > and the LVM LV on top of it are still active and working when in
> > > > reality, they are not. It causes very strange NFS errors and such.
> > > >
> > > > mdraid does actually attempt to iteratively remove both partitions
> > > > when the kernel signals the disable state, but only 1 of them
> > > > succeeds.
> > > > I did an strace of the same iterative 'fail:remove' process that
> > > > mdraid attempts when the kernel issues -- kernel: ata7.00:
> > > > disabled
> > > >
> > > > eg:
> > > > /sbin/mdadm -If sdc1 --path pci-0000:02:00.0-ata-1
> > > > mdadm: set device faulty failed for sdc1:  Device or resource busy
> > > >
> > > > The only clue is perhaps this line from the strace:
> > > > openat(AT_FDCWD, "/sys/block/md90/md/dev-sdc1/block/dev", O_RDWR)
> > > > = -1 EACCES (Permission denied)    What is the mdadm command
> > > > doing that results in a permission problem?
> > > >
> > > > So the only way I can get rid of this md raid array is a reboot.
> > > > Damn!!!
> > > >
> > > >
> > > > Any help is much appreciated.
> > > > Aidan
> > > >
> > > >
> > > >
> > > Hi Aidan,
> > > This is how it is implemented. Drive is not removed if array failure
> > > will cause array failed. Please see:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?id=9a567843f7ce0037bfd4d5fdc58a09d0a527b28b
> > >
> > > For RAID1 you can use solution proposed in patch below but IMO it is
> > > not your problem. Please stop LVM and then try to stop array. To
> > > stop array it needs to be "free" (all upper handlers are down).
> > >
> > > Thanks,
> > > Mariusz
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: md device remains active even when all supporting disks have failed and been disabled by the kernel.
  2022-01-13 16:35       ` Aidan Walton
@ 2022-01-13 16:48         ` Geoff Back
  2022-01-13 17:21           ` Aidan Walton
  2022-01-13 17:12         ` Aidan Walton
  1 sibling, 1 reply; 14+ messages in thread
From: Geoff Back @ 2022-01-13 16:48 UTC (permalink / raw)
  To: Aidan Walton, Mariusz Tkaczyk; +Cc: Linux RAID

Hi,

I believe md90 will remain locked all the time that any LVM-created
device mapping exists that references the md90 PV, regardless of whether
that mapping is in use.

You could try using "lvchange -an <vgname>/<volume>" to deactivate the
LV.  That should remove the DM mapping, which should in turn unlock md90.

Cheers,

Geoff.


On 13/01/2022 16:35, Aidan Walton wrote:
> Progress of sorts:
> I have tried to get the results as requested. However I was
> experimenting with the patch that you referenced initially and before
> I did this test, I forgot that I had actually run:
> echo 1 > /sys/block/md90/md/fail_last_dev
>
> So the outcome was interesting as the result proves this feature does
> work as expected and the md90 array showed both devices flagged as
> down. Better!
> /dev/md90:
>            Version : 1.2
>      Creation Time : Sat Nov  3 03:09:16 2018
>         Raid Level : raid1
>         Array Size : 488253440 (465.63 GiB 499.97 GB)
>      Used Dev Size : 488253440 (465.63 GiB 499.97 GB)
>       Raid Devices : 2
>      Total Devices : 2
>        Persistence : Superblock is persistent
>
>      Intent Bitmap : Internal
>
>        Update Time : Thu Jan 13 17:08:38 2022
>              State : clean, FAILED
>     Active Devices : 0
>     Failed Devices : 2
>      Spare Devices : 0
>
> Consistency Policy : bitmap
>
>     Number   Major   Minor   RaidDevice State
>        -       0        0        0      removed
>        -       0        0        1      removed
>
>        0       8       33        -      faulty   /dev/sdc1
>        2       8       49        -      faulty   /dev/sdd1
>
> The process that led to this is below:
> from journalctl:
> Jan 13 17:07:05 mx kernel: ata7.00: exception Emask 0x32 SAct 0x0 SErr
> 0x0 action 0xe frozen
> Jan 13 17:07:05 mx kernel: ata7.00: irq_stat 0xffffffff, unknown FIS
> 00000000 00000000 00000000 00000000, host bus
> Jan 13 17:07:05 mx kernel: ata7.00: failed command: READ DMA
> Jan 13 17:07:05 mx kernel: ata7.00: cmd
> c8/00:00:18:2d:ec/00:00:00:00:00/e0 tag 22 dma 131072 in
> Jan 13 17:07:05 mx kernel: ata7.00: status: { DRDY }
> Jan 13 17:07:05 mx kernel: ata7: hard resetting link
> Jan 13 17:07:15 mx kernel: ata7: softreset failed (1st FIS failed)
> Jan 13 17:07:15 mx kernel: ata7: hard resetting link
> Jan 13 17:07:25 mx kernel: ata7: softreset failed (1st FIS failed)
> Jan 13 17:07:25 mx kernel: ata7: hard resetting link
> Jan 13 17:07:36 mx kernel: ata8.00: exception Emask 0x40 SAct 0x0 SErr
> 0x800 action 0x6 frozen
> Jan 13 17:07:36 mx kernel: ata8: SError: { HostInt }
> Jan 13 17:07:36 mx kernel: ata8.00: failed command: READ DMA
> Jan 13 17:07:36 mx kernel: ata8.00: cmd
> c8/00:00:78:53:d9/00:00:00:00:00/e0 tag 10 dma 131072 in
> Jan 13 17:07:36 mx kernel: ata8.00: status: { DRDY }
> Jan 13 17:07:36 mx kernel: ata8: hard resetting link
> Jan 13 17:07:46 mx kernel: ata8: softreset failed (1st FIS failed)
> Jan 13 17:07:46 mx kernel: ata8: hard resetting link
> Jan 13 17:07:56 mx kernel: ata8: softreset failed (1st FIS failed)
> Jan 13 17:07:56 mx kernel: ata8: hard resetting link
> Jan 13 17:08:00 mx kernel: ata7: softreset failed (1st FIS failed)
> Jan 13 17:08:00 mx kernel: ata7: hard resetting link
> Jan 13 17:08:05 mx kernel: ata7: softreset failed (1st FIS failed)
> Jan 13 17:08:05 mx kernel: ata7: reset failed, giving up
> Jan 13 17:08:05 mx kernel: ata7.00: disabled
> Jan 13 17:08:05 mx kernel: ata7: EH complete
> Jan 13 17:08:31 mx kernel: ata8: softreset failed (1st FIS failed)
> Jan 13 17:08:31 mx kernel: ata8: hard resetting link
> Jan 13 17:08:36 mx kernel: ata8: softreset failed (1st FIS failed)
> Jan 13 17:08:36 mx kernel: ata8: reset failed, giving up
> Jan 13 17:08:36 mx kernel: ata8.00: disabled
> Jan 13 17:08:36 mx kernel: ata8: EH complete
>
>
> from udevadm monitor, NOTE, I noticed that udev monitor did not seem
> to spit anything out at the point in time when the first SATA device
> ata7.00: was disabled. The messages below only appeared 30secs later
> when ata8.00: went down: Sorry no timestamping on udevadm monitor, but
> read below from exactly when this occurred: Jan 13 17:08:36 mx kernel:
> ata8.00: disabled
>
> KERNEL[226088.463136] add
> /kernel/slab/kmalloc-192/cgroup/kmalloc-192(549:mdmonitor.service)
> (cgroup)
> KERNEL[226088.463335] add
> /kernel/slab/kmalloc-1k/cgroup/kmalloc-1k(549:mdmonitor.service)
> (cgroup)
> KERNEL[226088.464229] add
> /kernel/slab/task_struct/cgroup/task_struct(549:mdmonitor.service)
> (cgroup)
> KERNEL[226088.464374] add
> /kernel/slab/:A-0000080/cgroup/task_delay_info(549:mdmonitor.service)
> (cgroup)
> KERNEL[226088.464467] add
> /kernel/slab/:A-0000704/cgroup/files_cache(549:mdmonitor.service)
> (cgroup)
> KERNEL[226088.464718] add
> /kernel/slab/sighand_cache/cgroup/sighand_cache(549:mdmonitor.service)
> (cgroup)
> KERNEL[226088.465312] add
> /kernel/slab/:A-0001152/cgroup/signal_cache(549:mdmonitor.service)
> (cgroup)
> UDEV  [226088.472483] add
> /kernel/slab/kmalloc-192/cgroup/kmalloc-192(549:mdmonitor.service)
> (cgroup)
> UDEV  [226088.473360] add
> /kernel/slab/kmalloc-1k/cgroup/kmalloc-1k(549:mdmonitor.service)
> (cgroup)
> UDEV  [226088.480328] add
> /kernel/slab/task_struct/cgroup/task_struct(549:mdmonitor.service)
> (cgroup)
> UDEV  [226088.486830] add
> /kernel/slab/:A-0000080/cgroup/task_delay_info(549:mdmonitor.service)
> (cgroup)
> UDEV  [226088.487077] add
> /kernel/slab/sighand_cache/cgroup/sighand_cache(549:mdmonitor.service)
> (cgroup)
> UDEV  [226088.488851] add
> /kernel/slab/:A-0000704/cgroup/files_cache(549:mdmonitor.service)
> (cgroup)
> UDEV  [226088.493066] add
> /kernel/slab/:A-0001152/cgroup/signal_cache(549:mdmonitor.service)
> (cgroup)
> KERNEL[226089.014423] add
> /kernel/slab/:a-0000104/cgroup/buffer_head(549:mdmonitor.service)
> (cgroup)
> UDEV  [226089.017837] add
> /kernel/slab/:a-0000104/cgroup/buffer_head(549:mdmonitor.service)
> (cgroup)
> KERNEL[226089.188538] add
> /kernel/slab/radix_tree_node/cgroup/radix_tree_node(549:mdmonitor.service)
> (cgroup)
> KERNEL[226089.189366] add
> /kernel/slab/ext4_inode_cache/cgroup/ext4_inode_cache(549:mdmonitor.service)
> (cgroup)
> UDEV  [226089.193043] add
> /kernel/slab/radix_tree_node/cgroup/radix_tree_node(549:mdmonitor.service)
> (cgroup)
> UDEV  [226089.194414] add
> /kernel/slab/ext4_inode_cache/cgroup/ext4_inode_cache(549:mdmonitor.service)
> (cgroup)
> KERNEL[226090.421762] add
> /kernel/slab/kmalloc-8/cgroup/kmalloc-8(549:mdmonitor.service)
> (cgroup)
> UDEV  [226090.425465] add
> /kernel/slab/kmalloc-8/cgroup/kmalloc-8(549:mdmonitor.service)
> (cgroup)
> KERNEL[226090.442542] add
> /kernel/slab/proc_inode_cache/cgroup/proc_inode_cache(549:mdmonitor.service)
> (cgroup)
> UDEV  [226090.445999] add
> /kernel/slab/proc_inode_cache/cgroup/proc_inode_cache(549:mdmonitor.service)
> (cgroup)
> KERNEL[226090.458800] add
> /kernel/slab/:A-0000072/cgroup/eventpoll_pwq(549:mdmonitor.service)
> (cgroup)
> KERNEL[226090.460227] add
> /kernel/slab/kmalloc-16/cgroup/kmalloc-16(549:mdmonitor.service)
> (cgroup)
> UDEV  [226090.463193] add
> /kernel/slab/:A-0000072/cgroup/eventpoll_pwq(549:mdmonitor.service)
> (cgroup)
> UDEV  [226090.467271] add
> /kernel/slab/kmalloc-16/cgroup/kmalloc-16(549:mdmonitor.service)
> (cgroup)
> KERNEL[226090.880178] add
> /kernel/slab/kmalloc-rcl-64/cgroup/kmalloc-rcl-64(549:mdmonitor.service)
> (cgroup)
> UDEV  [226090.883794] add
> /kernel/slab/kmalloc-rcl-64/cgroup/kmalloc-rcl-64(549:mdmonitor.service)
> (cgroup)
>
>
>
> dmsetup info -c
> Name                                Maj Min Stat Open Targ Event  UUID
> storage.mx.vg2-shared_sun_NAS.lv1   253   0 L--w    1    1      0
> LVM-Ud9pj6QE4hK1K3xiAFMVCnno3SrXaRyTXJLtTGDOPjBUppJgzr4t0jJowixEOtx7
> storage.mx.vg1-shared_sun_users.lv1 253   2 L--w    1    1      0
> LVM-ypcHlbNXu36FLRgU0EcUiXBSIvcOlHEP3MHkBKsBeHf6Q68TIuGA9hd5UfCpvOeo
> ubuntu_server--vg-ubuntu_server--lv 253   1 L--w    1    1      0
> LVM-eGBUJxP1vlW3MfNNeC2r5JfQUiKKWZ73t3U3Jji3lggXe8LPrUf0xRE0YyPzSorO
>
> NOTE the 'open' status on the NAS.lv1 device. In fact the device is not mounted:
>
> cat /proc/mounts | grep mapper
> /dev/mapper/ubuntu_server--vg-ubuntu_server--lv / ext4
> rw,relatime,errors=remount-ro 0 0
> /dev/mapper/storage.mx.vg1-shared_sun_users.lv1 /mnt/home ext4 rw,relatime 0 0
>
> pvdisplay
>   --- Physical volume ---
>   PV Name               /dev/md1
>   VG Name               storage.mx.vg1
>   PV Size               111.73 GiB / not usable 3.00 MiB
>   Allocatable           yes (but full)
>   PE Size               4.00 MiB
>   Total PE              28603
>   Free PE               0
>   Allocated PE          28603
>   PV UUID               4yDnuz-PEHg-uZqd-djWS-DNnp-Qzuf-fYvGZJ
>
>   --- Physical volume ---
>   PV Name               /dev/md0
>   VG Name               ubuntu_server-vg
>   PV Size               <37.22 GiB / not usable 0
>   Allocatable           yes (but full)
>   PE Size               4.00 MiB
>   Total PE              9528
>   Free PE               0
>   Allocated PE          9528
>   PV UUID               G0bNbO-DOz4-I2nN-rEQq-X00m-PG3a-fPAP3I
>
> So in this case LVM seems to recognise that the md90 device is gone.
>
> Before I changed the 'fail_last_dev' option, When I ran these LVM
> commands, I was experiencing a short delay and then a report in place
> of the failed device saying that LVM had given up waiting for a udev
> entry to become available after 10000mS. Sorry I didn't catch this,
> for the log, but it is from memory.
>
> But now this delay is not happening and LVM seems to have a correct
> and consistent view of the failed mount point. Clearly mdraid has sent
> the failure up the stack.
>
>
> However mdraid will still NOT stop the md90 device:
>
> mdadm --stop /dev/md90
> mdadm: Cannot get exclusive access to /dev/md90:Perhaps a running
> process, mounted filesystem or active volume group?
>
> ATB
> Aidan
>
> On Thu, 13 Jan 2022 at 15:46, Mariusz Tkaczyk
> <mariusz.tkaczyk@linux.intel.com> wrote:
>> Hi Aidan,
>>
>> On Wed, 12 Jan 2022 02:29:47 +0100
>> Aidan Walton <aidan.walton@gmail.com> wrote:
>>
>>> Hi Mariusz,
>>> In my case, the fact that mdraid does not show a 'total failure' is
>>> not the root of the problem. However in my opinion I would say that
>>> not having mdraid more accurately reflect the state of the underlying
>>> hardware can be mis-leading. Initially when I looked at this issue, I
>>> was convinced that only one disk had failed and I was scratching my
>>> head about firstly why I still could not R/W the array while it
>>> appeared to have an active member. Secondly, when I rebooted I noticed
>>> that the array became instantly synchronised with both members active.
>> We have raid1 so first fail should be recorded in metadata. From your
>> description, I understand that nothing like this happened. For me, it
>> seems that the controller lost both drives in the same moment and as a
>> result nothing was saved. After reboot raid is assembled without
>> rebuild because metadata on both members is valid.
>>
>>> This was not what I expected, as normally an array that has had a
>>> single failed disk would require a ra-add and resync.  Then when the
>>> problem re-occured I noticed that it was not the same disk that was
>>> flagged faulty, next reboot; the faulty disk flipped back the other
>>> way... and so on. This was what prompted me to look closer at the
>>> kernel. Here I found my answer at the SATA controller. Therefore
>>> although mdraids design approach did not cause me any data loss, it
>>> did have me looking in the wrong direction for the fault, assuming a
>>> disk problem.
>>>
>>> I have still not been able to successfully --stop the array. I think
>>> the issue sits in the LVM domain. Although I can not be 100% sure.
>>> What I have achieved is some level of understanding that some process
>>> that starts a boot time is in some unknown manner holding a lock on
>>> the mdarid  - devmapper - LVM combination. I have unmounted the file
>>> system, but LVM refuses to let go of the logical volume. Therefore so
>>> does dev-mapper and of course mdraid. I have systematically stopped or
>>> killed almost every single running process on the system, taking it
>>> back to a skeleton, with not much more than init running, it still
>>> refuses to let go
>>>
>>> However, when I prevent auto-mounting of the raid array at boot, and
>>> then manually assemble the raid array, LVM finds its meta data, builds
>>> the VG and LV and mounts. If I then manually force the exact same SATA
>>> controller failure, which results in the exact same mdraid behaviour,
>>> I am then able to unmount the filesystem and ...... hey presto
>>> deactivate the LVM LV. Which then allows me to --stop the mdraid. Just
>>> as I want. Again it does not solve my SATA hardware issues, but being
>>> at this point does give me options to restart the hardware etc, and
>>> probably, though very messily, get the filesystem up again without a
>>> reboot. The problem being I can not achieve this behaviour without
>>> manually assembling the array after boot. If you have any idea what
>>> could possibly be holding this lock I would be glad to hear.
>>>
>> Could you connect to the udev monitor and analyze events triggered in
>> both cases? This is the one idea I have.
>>
>> Thanks,
>> Mariusz
>>
>>> At this point I'm going to have to try and systematically step through
>>> the boot process and try re-arranging, when the array gets assembled.
>>> My first attempts at this have been to <ignore> the raid array in
>>> mdadm.conf and comment the array out of /etc/fstab. In this way mdraid
>>> inside initramfs does not auto-assemble and LVM does not auto scan for
>>> the VG. Once I am in the real boot sequence, I have created a systemd
>>> mount unit that I can pull in from other systemd units, to change the
>>> point in the boot process when the array is assembled. In this way
>>> hopefully I can influence when other services are interacting with the
>>> array in some way and perhaps find the root cause ......   Work in
>>> progress..but slowly as the fault occurs only very occasionally and I
>>> still need a working server.
>>> All the best.. Aidan
>>>
>>> On Mon, 10 Jan 2022 at 10:47, Mariusz Tkaczyk
>>> <mariusz.tkaczyk@linux.intel.com> wrote:
>>>> On Fri, 7 Jan 2022 23:30:31 +0100
>>>> Aidan Walton <aidan.walton@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>> I have a system running:
>>>>> Ubuntu Server 20.04.3 LTS on a 5.4.0-92-generic kernel.
>>>>>
>>>>> On the motherboard is a:
>>>>> SATA controller: JMicron Technology Corp. JMB363 SATA/IDE
>>>>> Controller (rev 02)
>>>>>
>>>>> Connected to this I have:
>>>>> 2x Western Digital - WDC WD5001AALS-00L3B2, BIOS :01.03B01 500Gb
>>>>> drives
>>>>>
>>>>> Each drive has a single partition and is part of a RAID1 array:
>>>>> /dev/md90:
>>>>> .....
>>>>>     Number   Major   Minor   RaidDevice State
>>>>>        0       8       33        0      active sync   /dev/sdc1
>>>>>        2       8       49        1      active sync   /dev/sdd1
>>>>>
>>>>> On top of this I have a single LVM VG and LV. (Probably not
>>>>> important)
>>>>>
>>>>> I have noticed some strange behaviour and upon investigation I
>>>>> find the md device in the following state:
>>>>> /dev/md90:
>>>>> ....
>>>>>
>>>>>     Number   Major   Minor   RaidDevice State
>>>>>        0       8       33        0      active sync   /dev/sdc1
>>>>>        -       0        0        1      removed
>>>>>
>>>>>        2       8       49        -      faulty   /dev/sdd1
>>>>>
>>>>>
>>>>> In fact neither /dev/sdc1 or /dev/sdd1 are available. In fact
>>>>> neither are /dev/sdc or /dev/sdd, the physical drives, as they
>>>>> both been disconnected by the kernel:
>>>>> /dev/sdc is attached to ata7:00  and  /dev/sdd is attached to
>>>>> ata.8:00 This is the log of the kernel events:
>>>>>
>>>>>
>>>>> Jan 07 22:09:03 mx kernel: ata7.00: exception Emask 0x32 SAct 0x0
>>>>> SErr 0x0 action 0xe frozen
>>>>> Jan 07 22:09:03 mx kernel: ata7.00: irq_stat 0xffffffff, unknown
>>>>> FIS 00000000 00000000 00000000 00000000, host bus
>>>>> Jan 07 22:09:03 mx kernel: ata7.00: failed command: READ DMA
>>>>> Jan 07 22:09:03 mx kernel: ata7.00: cmd
>>>>> c8/00:00:00:cf:26/00:00:00:00:00/e0 tag 18 dma 131072 in
>>>>> Jan 07 22:09:03 mx kernel: ata7.00: status: { DRDY }
>>>>> Jan 07 22:09:03 mx kernel: ata7: hard resetting link
>>>>> Jan 07 22:09:03 mx kernel: ata7: SATA link up 1.5 Gbps (SStatus
>>>>> 113 SControl 310)
>>>>> Jan 07 22:09:09 mx kernel: ata7.00: qc timeout (cmd 0xec)
>>>>> Jan 07 22:09:09 mx kernel: ata7.00: failed to IDENTIFY (I/O error,
>>>>> err_mask=0x4) Jan 07 22:09:09 mx kernel: ata7.00: revalidation
>>>>> failed (errno=-5) Jan 07 22:09:09 mx kernel: ata7: hard resetting
>>>>> link Jan 07 22:09:19 mx kernel: ata7: softreset failed (1st FIS
>>>>> failed) Jan 07 22:09:19 mx kernel: ata7: hard resetting link
>>>>> Jan 07 22:09:29 mx kernel: ata7: softreset failed (1st FIS failed)
>>>>> Jan 07 22:09:29 mx kernel: ata7: hard resetting link
>>>>> Jan 07 22:09:35 mx kernel: ata8.00: exception Emask 0x40 SAct 0x0
>>>>> SErr 0x800 action 0x6 frozen
>>>>> Jan 07 22:09:35 mx kernel: ata8: SError: { HostInt }
>>>>> Jan 07 22:09:35 mx kernel: ata8.00: failed command: READ DMA
>>>>> Jan 07 22:09:35 mx kernel: ata8.00: cmd
>>>>> c8/00:00:00:64:4a/00:00:00:00:00/e0 tag 2 dma 131072 in
>>>>> Jan 07 22:09:35 mx kernel: ata8.00: status: { DRDY }
>>>>> Jan 07 22:09:35 mx kernel: ata8: hard resetting link
>>>>> Jan 07 22:09:45 mx kernel: ata8: softreset failed (1st FIS failed)
>>>>> Jan 07 22:09:45 mx kernel: ata8: hard resetting link
>>>>> Jan 07 22:09:55 mx kernel: ata8: softreset failed (1st FIS failed)
>>>>> Jan 07 22:09:55 mx kernel: ata8: hard resetting link
>>>>> Jan 07 22:10:04 mx kernel: ata7: softreset failed (1st FIS failed)
>>>>> Jan 07 22:10:04 mx kernel: ata7: hard resetting link
>>>>> Jan 07 22:10:09 mx kernel: ata7: softreset failed (1st FIS failed)
>>>>> Jan 07 22:10:09 mx kernel: ata7: reset failed, giving up
>>>>> Jan 07 22:10:09 mx kernel: ata7.00: disabled
>>>>> Jan 07 22:10:09 mx kernel: ata7: EH complete
>>>>> Jan 07 22:10:30 mx kernel: ata8: softreset failed (1st FIS failed)
>>>>> Jan 07 22:10:30 mx kernel: ata8: hard resetting link
>>>>> Jan 07 22:10:35 mx kernel: ata8: softreset failed (1st FIS failed)
>>>>> Jan 07 22:10:35 mx kernel: ata8: reset failed, giving up
>>>>> Jan 07 22:10:35 mx kernel: ata8.00: disabled
>>>>> Jan 07 22:10:35 mx kernel: ata8: EH complete
>>>>>
>>>>> This is happening because of some issue with the SATA controller
>>>>> on the motherboard. This has not been resolved and probably never
>>>>> will be, I see many others through google search complaining of
>>>>> similar issues with the SATA controller.
>>>>> This failure only occurs when the SATA controller is placed under
>>>>> very heavy load, I have minimised the impact of the problem by
>>>>> not using NCQ, this helps, but it still occurs. Ironically the
>>>>> biggest issue I have is that mdadm "checkarray" is running
>>>>> because of a systemd background process every week or so, and
>>>>> this hammers the disk into failure. Most of the normal daily
>>>>> usage never generates the link resets.
>>>>> Naturally I have changed SATA cables and moved drives around onto
>>>>> different controllers, but alas, it does seem to be the hardware
>>>>> on the motherboard.
>>>>> However as a workaround I was hoping to accept the occasional
>>>>> failure and then using some scripting and 'setpci' I can get the
>>>>> kernel to hard reset the chipset and attach the drives again. I
>>>>> have the process working in terms of getting the kernel to
>>>>> re-attach the drives, but.......
>>>>>
>>>>> Unfortunately mdraid will not let go of them, I can not stop the
>>>>> arrays, and therefore can't rebuild them. If I simply allow the
>>>>> kernel to re-attach the drives the kernel names are swapped over,
>>>>> as something (mdraid) is stopping the kernel re-using the same
>>>>> device names. Anyway being dependent on the same kernel device
>>>>> names is not a great plan anyway, so I was simply trying to get
>>>>> mdadm to reassemble the array as soon as the 'workaround' script
>>>>> gets the drives back in contact with libata (kernel).
>>>>>
>>>>> Plan:
>>>>> 1. Detecting the problem. (mdadm state)
>>>>> 2. Stop the array totally (can NOT do it)
>>>>> 3. reset the chipset across the PCI bus.
>>>>> 4. allow kernel to re-attach drives.
>>>>> 5. re-assemble the md device with mdadm
>>>>> 6. restart, if necessary higher layer services...
>>>>>
>>>>> So why is mdraid holding on to the array:
>>>>>
>>>>> # mdadm --stop /dev/md90
>>>>> mdadm: Cannot get exclusive access to /dev/md90:Perhaps a running
>>>>> process, mounted filesystem or active volume group?
>>>>>
>>>>> I can not be 100% sure that something else is using the device,
>>>>> but I can't think of anything that is and I stopped every process
>>>>> I can think of..... Plus why is the array still shown as 'active'
>>>>> when none of its member devices even exist anymore?
>>>>>
>>>>> What I do know is that device mapper (coming down from LVM)
>>>>> still has an entry in /dev/mapper. But then probably no surprise
>>>>> as /dev/md90 the failed array is still an active device node. If
>>>>> you attempt to write to it, I receive I/O errors from the kernel.
>>>>> In fact as far as any higher layer services are concerned md90
>>>>> and the LVM LV on top of it are still active and working when in
>>>>> reality, they are not. It causes very strange NFS errors and such.
>>>>>
>>>>> mdraid does actually attempt to iteratively remove both partitions
>>>>> when the kernel signals the disable state, but only 1 of them
>>>>> succeeds.
>>>>> I did an strace of the same iterative 'fail:remove' process that
>>>>> mdraid attempts when the kernel issues -- kernel: ata7.00:
>>>>> disabled
>>>>>
>>>>> eg:
>>>>> /sbin/mdadm -If sdc1 --path pci-0000:02:00.0-ata-1
>>>>> mdadm: set device faulty failed for sdc1:  Device or resource busy
>>>>>
>>>>> The only clue is perhaps this line from the strace:
>>>>> openat(AT_FDCWD, "/sys/block/md90/md/dev-sdc1/block/dev", O_RDWR)
>>>>> = -1 EACCES (Permission denied)    What is the mdadm command
>>>>> doing that results in a permission problem?
>>>>>
>>>>> So the only way I can get rid of this md raid array is a reboot.
>>>>> Damn!!!
>>>>>
>>>>>
>>>>> Any help is much appreciated.
>>>>> Aidan
>>>>>
>>>>>
>>>>>
>>>> Hi Aidan,
>>>> This is how it is implemented. Drive is not removed if array failure
>>>> will cause array failed. Please see:
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?id=9a567843f7ce0037bfd4d5fdc58a09d0a527b28b
>>>>
>>>> For RAID1 you can use solution proposed in patch below but IMO it is
>>>> not your problem. Please stop LVM and then try to stop array. To
>>>> stop array it needs to be "free" (all upper handlers are down).
>>>>
>>>> Thanks,
>>>> Mariusz

-- 
Geoff Back
What if we're all just characters in someone's nightmares?


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: md device remains active even when all supporting disks have failed and been disabled by the kernel.
  2022-01-13 16:35       ` Aidan Walton
  2022-01-13 16:48         ` Geoff Back
@ 2022-01-13 17:12         ` Aidan Walton
  1 sibling, 0 replies; 14+ messages in thread
From: Aidan Walton @ 2022-01-13 17:12 UTC (permalink / raw)
  To: Mariusz Tkaczyk; +Cc: Linux RAID

So, now after manually assembling, mounting and starting nfs-kernel
server on the array:

The following is a verbatim command sequence.
From my thoughts, firstly, note that this time I did not have
'fail_last_dev' set. You see the effect in LVM, Error reading device
...... This is not exactly the same error I remember as it does not
involve udev, but it is different than when the entire array has
failed at the mdraid level.
Further note that the device mapper still sees an 'open' on the
device, which is still visible. I unmount...successfully no problem,
dev mapper shows the 'open = 0'
Then we see no further errors from LVM. That seems kinda odd, as they
are not in that order in the stack.

If I then manually remove the device mapper entry, I can stop the
array. SUCCESS.

Thoughts?
Aidan




pvdisplay
  Error reading device /dev/sdc at 0 length 512.
  Error reading device /dev/sdc at 0 length 4096.
  Error reading device /dev/sdd at 0 length 512.
  Error reading device /dev/sdd at 0 length 4096.
  Error reading device /dev/md90 at 0 length 512.
  Error reading device /dev/md90 at 0 length 4096.
  --- Physical volume ---
  PV Name               /dev/md1
  VG Name               storage.mx.vg1
  PV Size               111.73 GiB / not usable 3.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              28603
  Free PE               0
  Allocated PE          28603
  PV UUID               4yDnuz-PEHg-uZqd-djWS-DNnp-Qzuf-fYvGZJ

  --- Physical volume ---
  PV Name               /dev/md0
  VG Name               ubuntu_server-vg
  PV Size               <37.22 GiB / not usable 0
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              9528
  Free PE               0
  Allocated PE          9528
  PV UUID               G0bNbO-DOz4-I2nN-rEQq-X00m-PG3a-fPAP3I


dmsetup info -c
Name                                Maj Min Stat Open Targ Event  UUID
storage.mx.vg2-shared_sun_NAS.lv1   253   2 L--w    1    1      0
LVM-Ud9pj6QE4hK1K3xiAFMVCnno3SrXaRyTXJLtTGDOPjBUppJgzr4t0jJowixEOtx7
storage.mx.vg1-shared_sun_users.lv1 253   1 L--w    1    1      0
LVM-ypcHlbNXu36FLRgU0EcUiXBSIvcOlHEP3MHkBKsBeHf6Q68TIuGA9hd5UfCpvOeo
ubuntu_server--vg-ubuntu_server--lv 253   0 L--w    1    1      0
LVM-eGBUJxP1vlW3MfNNeC2r5JfQUiKKWZ73t3U3Jji3lggXe8LPrUf0xRE0YyPzSorO


cat /proc/mounts | grep mapper
/dev/mapper/ubuntu_server--vg-ubuntu_server--lv / ext4
rw,relatime,errors=remount-ro 0 0
/dev/mapper/storage.mx.vg1-shared_sun_users.lv1 /mnt/home ext4 rw,relatime 0 0
/dev/mapper/storage.mx.vg2-shared_sun_NAS.lv1 /mnt/shared_sun_NAS ext4
rw,relatime 0 0

umount /mnt/shared_sun_NAS
cat /proc/mounts | grep mapper
/dev/mapper/ubuntu_server--vg-ubuntu_server--lv / ext4
rw,relatime,errors=remount-ro 0 0
/dev/mapper/storage.mx.vg1-shared_sun_users.lv1 /mnt/home ext4 rw,relatime 0 0

dmsetup info -c
Name                                Maj Min Stat Open Targ Event  UUID
storage.mx.vg2-shared_sun_NAS.lv1   253   2 L--w    0    1      0
LVM-Ud9pj6QE4hK1K3xiAFMVCnno3SrXaRyTXJLtTGDOPjBUppJgzr4t0jJowixEOtx7
storage.mx.vg1-shared_sun_users.lv1 253   1 L--w    1    1      0
LVM-ypcHlbNXu36FLRgU0EcUiXBSIvcOlHEP3MHkBKsBeHf6Q68TIuGA9hd5UfCpvOeo
ubuntu_server--vg-ubuntu_server--lv 253   0 L--w    1    1      0
LVM-eGBUJxP1vlW3MfNNeC2r5JfQUiKKWZ73t3U3Jji3lggXe8LPrUf0xRE0YyPzSorO


pvdisplay
  --- Physical volume ---
  PV Name               /dev/md1
  VG Name               storage.mx.vg1
  PV Size               111.73 GiB / not usable 3.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              28603
  Free PE               0
  Allocated PE          28603
  PV UUID               4yDnuz-PEHg-uZqd-djWS-DNnp-Qzuf-fYvGZJ

  --- Physical volume ---
  PV Name               /dev/md0
  VG Name               ubuntu_server-vg
  PV Size               <37.22 GiB / not usable 0
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              9528
  Free PE               0
  Allocated PE          9528
  PV UUID               G0bNbO-DOz4-I2nN-rEQq-X00m-PG3a-fPAP3I

mdadm -D /dev/md90
/dev/md90:
           Version : 1.2
     Creation Time : Sat Nov  3 03:09:16 2018
        Raid Level : raid1
        Array Size : 488253440 (465.63 GiB 499.97 GB)
     Used Dev Size : 488253440 (465.63 GiB 499.97 GB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Thu Jan 13 17:59:38 2022
             State : clean, degraded
    Active Devices : 1
   Working Devices : 1
    Failed Devices : 1
     Spare Devices : 0

Consistency Policy : bitmap

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       2       8       49        1      active sync   /dev/sdd1

       0       8       33        -      faulty   /dev/sdc1
root@mx:~# mdadm --stop /dev/md90
mdadm: Cannot get exclusive access to /dev/md90:Perhaps a running
process, mounted filesystem or active volume group?

dmsetup remove /dev/mapper/storage.mx.vg2-shared_sun_NAS.lv1
root@mx:~# dmsetup info -c
Name                                Maj Min Stat Open Targ Event  UUID
storage.mx.vg1-shared_sun_users.lv1 253   1 L--w    1    1      0
LVM-ypcHlbNXu36FLRgU0EcUiXBSIvcOlHEP3MHkBKsBeHf6Q68TIuGA9hd5UfCpvOeo
ubuntu_server--vg-ubuntu_server--lv 253   0 L--w    1    1      0
LVM-eGBUJxP1vlW3MfNNeC2r5JfQUiKKWZ73t3U3Jji3lggXe8LPrUf0xRE0YyPzSorO

mdadm --stop /dev/md90
mdadm: stopped /dev/md90



On Thu, 13 Jan 2022 at 17:35, Aidan Walton <aidan.walton@gmail.com> wrote:
>
> Progress of sorts:
> I have tried to get the results as requested. However I was
> experimenting with the patch that you referenced initially and before
> I did this test, I forgot that I had actually run:
> echo 1 > /sys/block/md90/md/fail_last_dev
>
> So the outcome was interesting as the result proves this feature does
> work as expected and the md90 array showed both devices flagged as
> down. Better!
> /dev/md90:
>            Version : 1.2
>      Creation Time : Sat Nov  3 03:09:16 2018
>         Raid Level : raid1
>         Array Size : 488253440 (465.63 GiB 499.97 GB)
>      Used Dev Size : 488253440 (465.63 GiB 499.97 GB)
>       Raid Devices : 2
>      Total Devices : 2
>        Persistence : Superblock is persistent
>
>      Intent Bitmap : Internal
>
>        Update Time : Thu Jan 13 17:08:38 2022
>              State : clean, FAILED
>     Active Devices : 0
>     Failed Devices : 2
>      Spare Devices : 0
>
> Consistency Policy : bitmap
>
>     Number   Major   Minor   RaidDevice State
>        -       0        0        0      removed
>        -       0        0        1      removed
>
>        0       8       33        -      faulty   /dev/sdc1
>        2       8       49        -      faulty   /dev/sdd1
>
> The process that led to this is below:
> from journalctl:
> Jan 13 17:07:05 mx kernel: ata7.00: exception Emask 0x32 SAct 0x0 SErr
> 0x0 action 0xe frozen
> Jan 13 17:07:05 mx kernel: ata7.00: irq_stat 0xffffffff, unknown FIS
> 00000000 00000000 00000000 00000000, host bus
> Jan 13 17:07:05 mx kernel: ata7.00: failed command: READ DMA
> Jan 13 17:07:05 mx kernel: ata7.00: cmd
> c8/00:00:18:2d:ec/00:00:00:00:00/e0 tag 22 dma 131072 in
> Jan 13 17:07:05 mx kernel: ata7.00: status: { DRDY }
> Jan 13 17:07:05 mx kernel: ata7: hard resetting link
> Jan 13 17:07:15 mx kernel: ata7: softreset failed (1st FIS failed)
> Jan 13 17:07:15 mx kernel: ata7: hard resetting link
> Jan 13 17:07:25 mx kernel: ata7: softreset failed (1st FIS failed)
> Jan 13 17:07:25 mx kernel: ata7: hard resetting link
> Jan 13 17:07:36 mx kernel: ata8.00: exception Emask 0x40 SAct 0x0 SErr
> 0x800 action 0x6 frozen
> Jan 13 17:07:36 mx kernel: ata8: SError: { HostInt }
> Jan 13 17:07:36 mx kernel: ata8.00: failed command: READ DMA
> Jan 13 17:07:36 mx kernel: ata8.00: cmd
> c8/00:00:78:53:d9/00:00:00:00:00/e0 tag 10 dma 131072 in
> Jan 13 17:07:36 mx kernel: ata8.00: status: { DRDY }
> Jan 13 17:07:36 mx kernel: ata8: hard resetting link
> Jan 13 17:07:46 mx kernel: ata8: softreset failed (1st FIS failed)
> Jan 13 17:07:46 mx kernel: ata8: hard resetting link
> Jan 13 17:07:56 mx kernel: ata8: softreset failed (1st FIS failed)
> Jan 13 17:07:56 mx kernel: ata8: hard resetting link
> Jan 13 17:08:00 mx kernel: ata7: softreset failed (1st FIS failed)
> Jan 13 17:08:00 mx kernel: ata7: hard resetting link
> Jan 13 17:08:05 mx kernel: ata7: softreset failed (1st FIS failed)
> Jan 13 17:08:05 mx kernel: ata7: reset failed, giving up
> Jan 13 17:08:05 mx kernel: ata7.00: disabled
> Jan 13 17:08:05 mx kernel: ata7: EH complete
> Jan 13 17:08:31 mx kernel: ata8: softreset failed (1st FIS failed)
> Jan 13 17:08:31 mx kernel: ata8: hard resetting link
> Jan 13 17:08:36 mx kernel: ata8: softreset failed (1st FIS failed)
> Jan 13 17:08:36 mx kernel: ata8: reset failed, giving up
> Jan 13 17:08:36 mx kernel: ata8.00: disabled
> Jan 13 17:08:36 mx kernel: ata8: EH complete
>
>
> from udevadm monitor, NOTE, I noticed that udev monitor did not seem
> to spit anything out at the point in time when the first SATA device
> ata7.00: was disabled. The messages below only appeared 30secs later
> when ata8.00: went down: Sorry no timestamping on udevadm monitor, but
> read below from exactly when this occurred: Jan 13 17:08:36 mx kernel:
> ata8.00: disabled
>
> KERNEL[226088.463136] add
> /kernel/slab/kmalloc-192/cgroup/kmalloc-192(549:mdmonitor.service)
> (cgroup)
> KERNEL[226088.463335] add
> /kernel/slab/kmalloc-1k/cgroup/kmalloc-1k(549:mdmonitor.service)
> (cgroup)
> KERNEL[226088.464229] add
> /kernel/slab/task_struct/cgroup/task_struct(549:mdmonitor.service)
> (cgroup)
> KERNEL[226088.464374] add
> /kernel/slab/:A-0000080/cgroup/task_delay_info(549:mdmonitor.service)
> (cgroup)
> KERNEL[226088.464467] add
> /kernel/slab/:A-0000704/cgroup/files_cache(549:mdmonitor.service)
> (cgroup)
> KERNEL[226088.464718] add
> /kernel/slab/sighand_cache/cgroup/sighand_cache(549:mdmonitor.service)
> (cgroup)
> KERNEL[226088.465312] add
> /kernel/slab/:A-0001152/cgroup/signal_cache(549:mdmonitor.service)
> (cgroup)
> UDEV  [226088.472483] add
> /kernel/slab/kmalloc-192/cgroup/kmalloc-192(549:mdmonitor.service)
> (cgroup)
> UDEV  [226088.473360] add
> /kernel/slab/kmalloc-1k/cgroup/kmalloc-1k(549:mdmonitor.service)
> (cgroup)
> UDEV  [226088.480328] add
> /kernel/slab/task_struct/cgroup/task_struct(549:mdmonitor.service)
> (cgroup)
> UDEV  [226088.486830] add
> /kernel/slab/:A-0000080/cgroup/task_delay_info(549:mdmonitor.service)
> (cgroup)
> UDEV  [226088.487077] add
> /kernel/slab/sighand_cache/cgroup/sighand_cache(549:mdmonitor.service)
> (cgroup)
> UDEV  [226088.488851] add
> /kernel/slab/:A-0000704/cgroup/files_cache(549:mdmonitor.service)
> (cgroup)
> UDEV  [226088.493066] add
> /kernel/slab/:A-0001152/cgroup/signal_cache(549:mdmonitor.service)
> (cgroup)
> KERNEL[226089.014423] add
> /kernel/slab/:a-0000104/cgroup/buffer_head(549:mdmonitor.service)
> (cgroup)
> UDEV  [226089.017837] add
> /kernel/slab/:a-0000104/cgroup/buffer_head(549:mdmonitor.service)
> (cgroup)
> KERNEL[226089.188538] add
> /kernel/slab/radix_tree_node/cgroup/radix_tree_node(549:mdmonitor.service)
> (cgroup)
> KERNEL[226089.189366] add
> /kernel/slab/ext4_inode_cache/cgroup/ext4_inode_cache(549:mdmonitor.service)
> (cgroup)
> UDEV  [226089.193043] add
> /kernel/slab/radix_tree_node/cgroup/radix_tree_node(549:mdmonitor.service)
> (cgroup)
> UDEV  [226089.194414] add
> /kernel/slab/ext4_inode_cache/cgroup/ext4_inode_cache(549:mdmonitor.service)
> (cgroup)
> KERNEL[226090.421762] add
> /kernel/slab/kmalloc-8/cgroup/kmalloc-8(549:mdmonitor.service)
> (cgroup)
> UDEV  [226090.425465] add
> /kernel/slab/kmalloc-8/cgroup/kmalloc-8(549:mdmonitor.service)
> (cgroup)
> KERNEL[226090.442542] add
> /kernel/slab/proc_inode_cache/cgroup/proc_inode_cache(549:mdmonitor.service)
> (cgroup)
> UDEV  [226090.445999] add
> /kernel/slab/proc_inode_cache/cgroup/proc_inode_cache(549:mdmonitor.service)
> (cgroup)
> KERNEL[226090.458800] add
> /kernel/slab/:A-0000072/cgroup/eventpoll_pwq(549:mdmonitor.service)
> (cgroup)
> KERNEL[226090.460227] add
> /kernel/slab/kmalloc-16/cgroup/kmalloc-16(549:mdmonitor.service)
> (cgroup)
> UDEV  [226090.463193] add
> /kernel/slab/:A-0000072/cgroup/eventpoll_pwq(549:mdmonitor.service)
> (cgroup)
> UDEV  [226090.467271] add
> /kernel/slab/kmalloc-16/cgroup/kmalloc-16(549:mdmonitor.service)
> (cgroup)
> KERNEL[226090.880178] add
> /kernel/slab/kmalloc-rcl-64/cgroup/kmalloc-rcl-64(549:mdmonitor.service)
> (cgroup)
> UDEV  [226090.883794] add
> /kernel/slab/kmalloc-rcl-64/cgroup/kmalloc-rcl-64(549:mdmonitor.service)
> (cgroup)
>
>
>
> dmsetup info -c
> Name                                Maj Min Stat Open Targ Event  UUID
> storage.mx.vg2-shared_sun_NAS.lv1   253   0 L--w    1    1      0
> LVM-Ud9pj6QE4hK1K3xiAFMVCnno3SrXaRyTXJLtTGDOPjBUppJgzr4t0jJowixEOtx7
> storage.mx.vg1-shared_sun_users.lv1 253   2 L--w    1    1      0
> LVM-ypcHlbNXu36FLRgU0EcUiXBSIvcOlHEP3MHkBKsBeHf6Q68TIuGA9hd5UfCpvOeo
> ubuntu_server--vg-ubuntu_server--lv 253   1 L--w    1    1      0
> LVM-eGBUJxP1vlW3MfNNeC2r5JfQUiKKWZ73t3U3Jji3lggXe8LPrUf0xRE0YyPzSorO
>
> NOTE the 'open' status on the NAS.lv1 device. In fact the device is not mounted:
>
> cat /proc/mounts | grep mapper
> /dev/mapper/ubuntu_server--vg-ubuntu_server--lv / ext4
> rw,relatime,errors=remount-ro 0 0
> /dev/mapper/storage.mx.vg1-shared_sun_users.lv1 /mnt/home ext4 rw,relatime 0 0
>
> pvdisplay
>   --- Physical volume ---
>   PV Name               /dev/md1
>   VG Name               storage.mx.vg1
>   PV Size               111.73 GiB / not usable 3.00 MiB
>   Allocatable           yes (but full)
>   PE Size               4.00 MiB
>   Total PE              28603
>   Free PE               0
>   Allocated PE          28603
>   PV UUID               4yDnuz-PEHg-uZqd-djWS-DNnp-Qzuf-fYvGZJ
>
>   --- Physical volume ---
>   PV Name               /dev/md0
>   VG Name               ubuntu_server-vg
>   PV Size               <37.22 GiB / not usable 0
>   Allocatable           yes (but full)
>   PE Size               4.00 MiB
>   Total PE              9528
>   Free PE               0
>   Allocated PE          9528
>   PV UUID               G0bNbO-DOz4-I2nN-rEQq-X00m-PG3a-fPAP3I
>
> So in this case LVM seems to recognise that the md90 device is gone.
>
> Before I changed the 'fail_last_dev' option, When I ran these LVM
> commands, I was experiencing a short delay and then a report in place
> of the failed device saying that LVM had given up waiting for a udev
> entry to become available after 10000mS. Sorry I didn't catch this,
> for the log, but it is from memory.
>
> But now this delay is not happening and LVM seems to have a correct
> and consistent view of the failed mount point. Clearly mdraid has sent
> the failure up the stack.
>
>
> However mdraid will still NOT stop the md90 device:
>
> mdadm --stop /dev/md90
> mdadm: Cannot get exclusive access to /dev/md90:Perhaps a running
> process, mounted filesystem or active volume group?
>
> ATB
> Aidan
>
> On Thu, 13 Jan 2022 at 15:46, Mariusz Tkaczyk
> <mariusz.tkaczyk@linux.intel.com> wrote:
> >
> > Hi Aidan,
> >
> > On Wed, 12 Jan 2022 02:29:47 +0100
> > Aidan Walton <aidan.walton@gmail.com> wrote:
> >
> > > Hi Mariusz,
> > > In my case, the fact that mdraid does not show a 'total failure' is
> > > not the root of the problem. However in my opinion I would say that
> > > not having mdraid more accurately reflect the state of the underlying
> > > hardware can be mis-leading. Initially when I looked at this issue, I
> > > was convinced that only one disk had failed and I was scratching my
> > > head about firstly why I still could not R/W the array while it
> > > appeared to have an active member. Secondly, when I rebooted I noticed
> > > that the array became instantly synchronised with both members active.
> >
> > We have raid1 so first fail should be recorded in metadata. From your
> > description, I understand that nothing like this happened. For me, it
> > seems that the controller lost both drives in the same moment and as a
> > result nothing was saved. After reboot raid is assembled without
> > rebuild because metadata on both members is valid.
> >
> > > This was not what I expected, as normally an array that has had a
> > > single failed disk would require a ra-add and resync.  Then when the
> > > problem re-occured I noticed that it was not the same disk that was
> > > flagged faulty, next reboot; the faulty disk flipped back the other
> > > way... and so on. This was what prompted me to look closer at the
> > > kernel. Here I found my answer at the SATA controller. Therefore
> > > although mdraids design approach did not cause me any data loss, it
> > > did have me looking in the wrong direction for the fault, assuming a
> > > disk problem.
> > >
> > > I have still not been able to successfully --stop the array. I think
> > > the issue sits in the LVM domain. Although I can not be 100% sure.
> > > What I have achieved is some level of understanding that some process
> > > that starts a boot time is in some unknown manner holding a lock on
> > > the mdarid  - devmapper - LVM combination. I have unmounted the file
> > > system, but LVM refuses to let go of the logical volume. Therefore so
> > > does dev-mapper and of course mdraid. I have systematically stopped or
> > > killed almost every single running process on the system, taking it
> > > back to a skeleton, with not much more than init running, it still
> > > refuses to let go
> > >
> > > However, when I prevent auto-mounting of the raid array at boot, and
> > > then manually assemble the raid array, LVM finds its meta data, builds
> > > the VG and LV and mounts. If I then manually force the exact same SATA
> > > controller failure, which results in the exact same mdraid behaviour,
> > > I am then able to unmount the filesystem and ...... hey presto
> > > deactivate the LVM LV. Which then allows me to --stop the mdraid. Just
> > > as I want. Again it does not solve my SATA hardware issues, but being
> > > at this point does give me options to restart the hardware etc, and
> > > probably, though very messily, get the filesystem up again without a
> > > reboot. The problem being I can not achieve this behaviour without
> > > manually assembling the array after boot. If you have any idea what
> > > could possibly be holding this lock I would be glad to hear.
> > >
> > Could you connect to the udev monitor and analyze events triggered in
> > both cases? This is the one idea I have.
> >
> > Thanks,
> > Mariusz
> >
> > > At this point I'm going to have to try and systematically step through
> > > the boot process and try re-arranging, when the array gets assembled.
> > > My first attempts at this have been to <ignore> the raid array in
> > > mdadm.conf and comment the array out of /etc/fstab. In this way mdraid
> > > inside initramfs does not auto-assemble and LVM does not auto scan for
> > > the VG. Once I am in the real boot sequence, I have created a systemd
> > > mount unit that I can pull in from other systemd units, to change the
> > > point in the boot process when the array is assembled. In this way
> > > hopefully I can influence when other services are interacting with the
> > > array in some way and perhaps find the root cause ......   Work in
> > > progress..but slowly as the fault occurs only very occasionally and I
> > > still need a working server.
> > > All the best.. Aidan
> > >
> > > On Mon, 10 Jan 2022 at 10:47, Mariusz Tkaczyk
> > > <mariusz.tkaczyk@linux.intel.com> wrote:
> > > >
> > > > On Fri, 7 Jan 2022 23:30:31 +0100
> > > > Aidan Walton <aidan.walton@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > > I have a system running:
> > > > > Ubuntu Server 20.04.3 LTS on a 5.4.0-92-generic kernel.
> > > > >
> > > > > On the motherboard is a:
> > > > > SATA controller: JMicron Technology Corp. JMB363 SATA/IDE
> > > > > Controller (rev 02)
> > > > >
> > > > > Connected to this I have:
> > > > > 2x Western Digital - WDC WD5001AALS-00L3B2, BIOS :01.03B01 500Gb
> > > > > drives
> > > > >
> > > > > Each drive has a single partition and is part of a RAID1 array:
> > > > > /dev/md90:
> > > > > .....
> > > > >     Number   Major   Minor   RaidDevice State
> > > > >        0       8       33        0      active sync   /dev/sdc1
> > > > >        2       8       49        1      active sync   /dev/sdd1
> > > > >
> > > > > On top of this I have a single LVM VG and LV. (Probably not
> > > > > important)
> > > > >
> > > > > I have noticed some strange behaviour and upon investigation I
> > > > > find the md device in the following state:
> > > > > /dev/md90:
> > > > > ....
> > > > >
> > > > >     Number   Major   Minor   RaidDevice State
> > > > >        0       8       33        0      active sync   /dev/sdc1
> > > > >        -       0        0        1      removed
> > > > >
> > > > >        2       8       49        -      faulty   /dev/sdd1
> > > > >
> > > > >
> > > > > In fact neither /dev/sdc1 or /dev/sdd1 are available. In fact
> > > > > neither are /dev/sdc or /dev/sdd, the physical drives, as they
> > > > > both been disconnected by the kernel:
> > > > > /dev/sdc is attached to ata7:00  and  /dev/sdd is attached to
> > > > > ata.8:00 This is the log of the kernel events:
> > > > >
> > > > >
> > > > > Jan 07 22:09:03 mx kernel: ata7.00: exception Emask 0x32 SAct 0x0
> > > > > SErr 0x0 action 0xe frozen
> > > > > Jan 07 22:09:03 mx kernel: ata7.00: irq_stat 0xffffffff, unknown
> > > > > FIS 00000000 00000000 00000000 00000000, host bus
> > > > > Jan 07 22:09:03 mx kernel: ata7.00: failed command: READ DMA
> > > > > Jan 07 22:09:03 mx kernel: ata7.00: cmd
> > > > > c8/00:00:00:cf:26/00:00:00:00:00/e0 tag 18 dma 131072 in
> > > > > Jan 07 22:09:03 mx kernel: ata7.00: status: { DRDY }
> > > > > Jan 07 22:09:03 mx kernel: ata7: hard resetting link
> > > > > Jan 07 22:09:03 mx kernel: ata7: SATA link up 1.5 Gbps (SStatus
> > > > > 113 SControl 310)
> > > > > Jan 07 22:09:09 mx kernel: ata7.00: qc timeout (cmd 0xec)
> > > > > Jan 07 22:09:09 mx kernel: ata7.00: failed to IDENTIFY (I/O error,
> > > > > err_mask=0x4) Jan 07 22:09:09 mx kernel: ata7.00: revalidation
> > > > > failed (errno=-5) Jan 07 22:09:09 mx kernel: ata7: hard resetting
> > > > > link Jan 07 22:09:19 mx kernel: ata7: softreset failed (1st FIS
> > > > > failed) Jan 07 22:09:19 mx kernel: ata7: hard resetting link
> > > > > Jan 07 22:09:29 mx kernel: ata7: softreset failed (1st FIS failed)
> > > > > Jan 07 22:09:29 mx kernel: ata7: hard resetting link
> > > > > Jan 07 22:09:35 mx kernel: ata8.00: exception Emask 0x40 SAct 0x0
> > > > > SErr 0x800 action 0x6 frozen
> > > > > Jan 07 22:09:35 mx kernel: ata8: SError: { HostInt }
> > > > > Jan 07 22:09:35 mx kernel: ata8.00: failed command: READ DMA
> > > > > Jan 07 22:09:35 mx kernel: ata8.00: cmd
> > > > > c8/00:00:00:64:4a/00:00:00:00:00/e0 tag 2 dma 131072 in
> > > > > Jan 07 22:09:35 mx kernel: ata8.00: status: { DRDY }
> > > > > Jan 07 22:09:35 mx kernel: ata8: hard resetting link
> > > > > Jan 07 22:09:45 mx kernel: ata8: softreset failed (1st FIS failed)
> > > > > Jan 07 22:09:45 mx kernel: ata8: hard resetting link
> > > > > Jan 07 22:09:55 mx kernel: ata8: softreset failed (1st FIS failed)
> > > > > Jan 07 22:09:55 mx kernel: ata8: hard resetting link
> > > > > Jan 07 22:10:04 mx kernel: ata7: softreset failed (1st FIS failed)
> > > > > Jan 07 22:10:04 mx kernel: ata7: hard resetting link
> > > > > Jan 07 22:10:09 mx kernel: ata7: softreset failed (1st FIS failed)
> > > > > Jan 07 22:10:09 mx kernel: ata7: reset failed, giving up
> > > > > Jan 07 22:10:09 mx kernel: ata7.00: disabled
> > > > > Jan 07 22:10:09 mx kernel: ata7: EH complete
> > > > > Jan 07 22:10:30 mx kernel: ata8: softreset failed (1st FIS failed)
> > > > > Jan 07 22:10:30 mx kernel: ata8: hard resetting link
> > > > > Jan 07 22:10:35 mx kernel: ata8: softreset failed (1st FIS failed)
> > > > > Jan 07 22:10:35 mx kernel: ata8: reset failed, giving up
> > > > > Jan 07 22:10:35 mx kernel: ata8.00: disabled
> > > > > Jan 07 22:10:35 mx kernel: ata8: EH complete
> > > > >
> > > > > This is happening because of some issue with the SATA controller
> > > > > on the motherboard. This has not been resolved and probably never
> > > > > will be, I see many others through google search complaining of
> > > > > similar issues with the SATA controller.
> > > > > This failure only occurs when the SATA controller is placed under
> > > > > very heavy load, I have minimised the impact of the problem by
> > > > > not using NCQ, this helps, but it still occurs. Ironically the
> > > > > biggest issue I have is that mdadm "checkarray" is running
> > > > > because of a systemd background process every week or so, and
> > > > > this hammers the disk into failure. Most of the normal daily
> > > > > usage never generates the link resets.
> > > > > Naturally I have changed SATA cables and moved drives around onto
> > > > > different controllers, but alas, it does seem to be the hardware
> > > > > on the motherboard.
> > > > > However as a workaround I was hoping to accept the occasional
> > > > > failure and then using some scripting and 'setpci' I can get the
> > > > > kernel to hard reset the chipset and attach the drives again. I
> > > > > have the process working in terms of getting the kernel to
> > > > > re-attach the drives, but.......
> > > > >
> > > > > Unfortunately mdraid will not let go of them, I can not stop the
> > > > > arrays, and therefore can't rebuild them. If I simply allow the
> > > > > kernel to re-attach the drives the kernel names are swapped over,
> > > > > as something (mdraid) is stopping the kernel re-using the same
> > > > > device names. Anyway being dependent on the same kernel device
> > > > > names is not a great plan anyway, so I was simply trying to get
> > > > > mdadm to reassemble the array as soon as the 'workaround' script
> > > > > gets the drives back in contact with libata (kernel).
> > > > >
> > > > > Plan:
> > > > > 1. Detecting the problem. (mdadm state)
> > > > > 2. Stop the array totally (can NOT do it)
> > > > > 3. reset the chipset across the PCI bus.
> > > > > 4. allow kernel to re-attach drives.
> > > > > 5. re-assemble the md device with mdadm
> > > > > 6. restart, if necessary higher layer services...
> > > > >
> > > > > So why is mdraid holding on to the array:
> > > > >
> > > > > # mdadm --stop /dev/md90
> > > > > mdadm: Cannot get exclusive access to /dev/md90:Perhaps a running
> > > > > process, mounted filesystem or active volume group?
> > > > >
> > > > > I can not be 100% sure that something else is using the device,
> > > > > but I can't think of anything that is and I stopped every process
> > > > > I can think of..... Plus why is the array still shown as 'active'
> > > > > when none of its member devices even exist anymore?
> > > > >
> > > > > What I do know is that device mapper (coming down from LVM)
> > > > > still has an entry in /dev/mapper. But then probably no surprise
> > > > > as /dev/md90 the failed array is still an active device node. If
> > > > > you attempt to write to it, I receive I/O errors from the kernel.
> > > > > In fact as far as any higher layer services are concerned md90
> > > > > and the LVM LV on top of it are still active and working when in
> > > > > reality, they are not. It causes very strange NFS errors and such.
> > > > >
> > > > > mdraid does actually attempt to iteratively remove both partitions
> > > > > when the kernel signals the disable state, but only 1 of them
> > > > > succeeds.
> > > > > I did an strace of the same iterative 'fail:remove' process that
> > > > > mdraid attempts when the kernel issues -- kernel: ata7.00:
> > > > > disabled
> > > > >
> > > > > eg:
> > > > > /sbin/mdadm -If sdc1 --path pci-0000:02:00.0-ata-1
> > > > > mdadm: set device faulty failed for sdc1:  Device or resource busy
> > > > >
> > > > > The only clue is perhaps this line from the strace:
> > > > > openat(AT_FDCWD, "/sys/block/md90/md/dev-sdc1/block/dev", O_RDWR)
> > > > > = -1 EACCES (Permission denied)    What is the mdadm command
> > > > > doing that results in a permission problem?
> > > > >
> > > > > So the only way I can get rid of this md raid array is a reboot.
> > > > > Damn!!!
> > > > >
> > > > >
> > > > > Any help is much appreciated.
> > > > > Aidan
> > > > >
> > > > >
> > > > >
> > > > Hi Aidan,
> > > > This is how it is implemented. Drive is not removed if array failure
> > > > will cause array failed. Please see:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?id=9a567843f7ce0037bfd4d5fdc58a09d0a527b28b
> > > >
> > > > For RAID1 you can use solution proposed in patch below but IMO it is
> > > > not your problem. Please stop LVM and then try to stop array. To
> > > > stop array it needs to be "free" (all upper handlers are down).
> > > >
> > > > Thanks,
> > > > Mariusz
> >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: md device remains active even when all supporting disks have failed and been disabled by the kernel.
  2022-01-13 16:48         ` Geoff Back
@ 2022-01-13 17:21           ` Aidan Walton
  0 siblings, 0 replies; 14+ messages in thread
From: Aidan Walton @ 2022-01-13 17:21 UTC (permalink / raw)
  To: Geoff Back; +Cc: Mariusz Tkaczyk, Linux RAID

This is the problem Geoff, it is not possible to do this in either direction.
When the system fails as described initially without 'fail_last_dev'
set. Although the device is NOT mounted, lvchange will not allow the
LV to be deactivated, dmsetup will not allow the mapping to be removed
and mdraid will not --stop the array.
They seem to be locked in a ring.

However as you see using 'fail_last_dev' set. Then the behaviour of
LVM and dev mapper changes due to mdraid signalling the device failure
up the stack. BUT mdraid itself will still NOT stop the array, even
though there is nothing mapped to it, and nothing using it at a
logical level.

When I assemble the array in user space and not at boot time, you see
the outcome in my last post. It then is finally possible to stop the
array.


I noticed that during the manually assembled array failure procedure,
the udevadm monitor stays 100% silent when the kernel disables both
devices.
Whereas, when the system is allowed to auto-assemble the array and LVM
devices, then udevadm monitor references the mdmonitor service.

Hmmm
Aidan

On Thu, 13 Jan 2022 at 17:48, Geoff Back <geoff@demonlair.co.uk> wrote:
>
> Hi,
>
> I believe md90 will remain locked all the time that any LVM-created
> device mapping exists that references the md90 PV, regardless of whether
> that mapping is in use.
>
> You could try using "lvchange -an <vgname>/<volume>" to deactivate the
> LV.  That should remove the DM mapping, which should in turn unlock md90.
>
> Cheers,
>
> Geoff.
>
>
> On 13/01/2022 16:35, Aidan Walton wrote:
> > Progress of sorts:
> > I have tried to get the results as requested. However I was
> > experimenting with the patch that you referenced initially and before
> > I did this test, I forgot that I had actually run:
> > echo 1 > /sys/block/md90/md/fail_last_dev
> >
> > So the outcome was interesting as the result proves this feature does
> > work as expected and the md90 array showed both devices flagged as
> > down. Better!
> > /dev/md90:
> >            Version : 1.2
> >      Creation Time : Sat Nov  3 03:09:16 2018
> >         Raid Level : raid1
> >         Array Size : 488253440 (465.63 GiB 499.97 GB)
> >      Used Dev Size : 488253440 (465.63 GiB 499.97 GB)
> >       Raid Devices : 2
> >      Total Devices : 2
> >        Persistence : Superblock is persistent
> >
> >      Intent Bitmap : Internal
> >
> >        Update Time : Thu Jan 13 17:08:38 2022
> >              State : clean, FAILED
> >     Active Devices : 0
> >     Failed Devices : 2
> >      Spare Devices : 0
> >
> > Consistency Policy : bitmap
> >
> >     Number   Major   Minor   RaidDevice State
> >        -       0        0        0      removed
> >        -       0        0        1      removed
> >
> >        0       8       33        -      faulty   /dev/sdc1
> >        2       8       49        -      faulty   /dev/sdd1
> >
> > The process that led to this is below:
> > from journalctl:
> > Jan 13 17:07:05 mx kernel: ata7.00: exception Emask 0x32 SAct 0x0 SErr
> > 0x0 action 0xe frozen
> > Jan 13 17:07:05 mx kernel: ata7.00: irq_stat 0xffffffff, unknown FIS
> > 00000000 00000000 00000000 00000000, host bus
> > Jan 13 17:07:05 mx kernel: ata7.00: failed command: READ DMA
> > Jan 13 17:07:05 mx kernel: ata7.00: cmd
> > c8/00:00:18:2d:ec/00:00:00:00:00/e0 tag 22 dma 131072 in
> > Jan 13 17:07:05 mx kernel: ata7.00: status: { DRDY }
> > Jan 13 17:07:05 mx kernel: ata7: hard resetting link
> > Jan 13 17:07:15 mx kernel: ata7: softreset failed (1st FIS failed)
> > Jan 13 17:07:15 mx kernel: ata7: hard resetting link
> > Jan 13 17:07:25 mx kernel: ata7: softreset failed (1st FIS failed)
> > Jan 13 17:07:25 mx kernel: ata7: hard resetting link
> > Jan 13 17:07:36 mx kernel: ata8.00: exception Emask 0x40 SAct 0x0 SErr
> > 0x800 action 0x6 frozen
> > Jan 13 17:07:36 mx kernel: ata8: SError: { HostInt }
> > Jan 13 17:07:36 mx kernel: ata8.00: failed command: READ DMA
> > Jan 13 17:07:36 mx kernel: ata8.00: cmd
> > c8/00:00:78:53:d9/00:00:00:00:00/e0 tag 10 dma 131072 in
> > Jan 13 17:07:36 mx kernel: ata8.00: status: { DRDY }
> > Jan 13 17:07:36 mx kernel: ata8: hard resetting link
> > Jan 13 17:07:46 mx kernel: ata8: softreset failed (1st FIS failed)
> > Jan 13 17:07:46 mx kernel: ata8: hard resetting link
> > Jan 13 17:07:56 mx kernel: ata8: softreset failed (1st FIS failed)
> > Jan 13 17:07:56 mx kernel: ata8: hard resetting link
> > Jan 13 17:08:00 mx kernel: ata7: softreset failed (1st FIS failed)
> > Jan 13 17:08:00 mx kernel: ata7: hard resetting link
> > Jan 13 17:08:05 mx kernel: ata7: softreset failed (1st FIS failed)
> > Jan 13 17:08:05 mx kernel: ata7: reset failed, giving up
> > Jan 13 17:08:05 mx kernel: ata7.00: disabled
> > Jan 13 17:08:05 mx kernel: ata7: EH complete
> > Jan 13 17:08:31 mx kernel: ata8: softreset failed (1st FIS failed)
> > Jan 13 17:08:31 mx kernel: ata8: hard resetting link
> > Jan 13 17:08:36 mx kernel: ata8: softreset failed (1st FIS failed)
> > Jan 13 17:08:36 mx kernel: ata8: reset failed, giving up
> > Jan 13 17:08:36 mx kernel: ata8.00: disabled
> > Jan 13 17:08:36 mx kernel: ata8: EH complete
> >
> >
> > from udevadm monitor, NOTE, I noticed that udev monitor did not seem
> > to spit anything out at the point in time when the first SATA device
> > ata7.00: was disabled. The messages below only appeared 30secs later
> > when ata8.00: went down: Sorry no timestamping on udevadm monitor, but
> > read below from exactly when this occurred: Jan 13 17:08:36 mx kernel:
> > ata8.00: disabled
> >
> > KERNEL[226088.463136] add
> > /kernel/slab/kmalloc-192/cgroup/kmalloc-192(549:mdmonitor.service)
> > (cgroup)
> > KERNEL[226088.463335] add
> > /kernel/slab/kmalloc-1k/cgroup/kmalloc-1k(549:mdmonitor.service)
> > (cgroup)
> > KERNEL[226088.464229] add
> > /kernel/slab/task_struct/cgroup/task_struct(549:mdmonitor.service)
> > (cgroup)
> > KERNEL[226088.464374] add
> > /kernel/slab/:A-0000080/cgroup/task_delay_info(549:mdmonitor.service)
> > (cgroup)
> > KERNEL[226088.464467] add
> > /kernel/slab/:A-0000704/cgroup/files_cache(549:mdmonitor.service)
> > (cgroup)
> > KERNEL[226088.464718] add
> > /kernel/slab/sighand_cache/cgroup/sighand_cache(549:mdmonitor.service)
> > (cgroup)
> > KERNEL[226088.465312] add
> > /kernel/slab/:A-0001152/cgroup/signal_cache(549:mdmonitor.service)
> > (cgroup)
> > UDEV  [226088.472483] add
> > /kernel/slab/kmalloc-192/cgroup/kmalloc-192(549:mdmonitor.service)
> > (cgroup)
> > UDEV  [226088.473360] add
> > /kernel/slab/kmalloc-1k/cgroup/kmalloc-1k(549:mdmonitor.service)
> > (cgroup)
> > UDEV  [226088.480328] add
> > /kernel/slab/task_struct/cgroup/task_struct(549:mdmonitor.service)
> > (cgroup)
> > UDEV  [226088.486830] add
> > /kernel/slab/:A-0000080/cgroup/task_delay_info(549:mdmonitor.service)
> > (cgroup)
> > UDEV  [226088.487077] add
> > /kernel/slab/sighand_cache/cgroup/sighand_cache(549:mdmonitor.service)
> > (cgroup)
> > UDEV  [226088.488851] add
> > /kernel/slab/:A-0000704/cgroup/files_cache(549:mdmonitor.service)
> > (cgroup)
> > UDEV  [226088.493066] add
> > /kernel/slab/:A-0001152/cgroup/signal_cache(549:mdmonitor.service)
> > (cgroup)
> > KERNEL[226089.014423] add
> > /kernel/slab/:a-0000104/cgroup/buffer_head(549:mdmonitor.service)
> > (cgroup)
> > UDEV  [226089.017837] add
> > /kernel/slab/:a-0000104/cgroup/buffer_head(549:mdmonitor.service)
> > (cgroup)
> > KERNEL[226089.188538] add
> > /kernel/slab/radix_tree_node/cgroup/radix_tree_node(549:mdmonitor.service)
> > (cgroup)
> > KERNEL[226089.189366] add
> > /kernel/slab/ext4_inode_cache/cgroup/ext4_inode_cache(549:mdmonitor.service)
> > (cgroup)
> > UDEV  [226089.193043] add
> > /kernel/slab/radix_tree_node/cgroup/radix_tree_node(549:mdmonitor.service)
> > (cgroup)
> > UDEV  [226089.194414] add
> > /kernel/slab/ext4_inode_cache/cgroup/ext4_inode_cache(549:mdmonitor.service)
> > (cgroup)
> > KERNEL[226090.421762] add
> > /kernel/slab/kmalloc-8/cgroup/kmalloc-8(549:mdmonitor.service)
> > (cgroup)
> > UDEV  [226090.425465] add
> > /kernel/slab/kmalloc-8/cgroup/kmalloc-8(549:mdmonitor.service)
> > (cgroup)
> > KERNEL[226090.442542] add
> > /kernel/slab/proc_inode_cache/cgroup/proc_inode_cache(549:mdmonitor.service)
> > (cgroup)
> > UDEV  [226090.445999] add
> > /kernel/slab/proc_inode_cache/cgroup/proc_inode_cache(549:mdmonitor.service)
> > (cgroup)
> > KERNEL[226090.458800] add
> > /kernel/slab/:A-0000072/cgroup/eventpoll_pwq(549:mdmonitor.service)
> > (cgroup)
> > KERNEL[226090.460227] add
> > /kernel/slab/kmalloc-16/cgroup/kmalloc-16(549:mdmonitor.service)
> > (cgroup)
> > UDEV  [226090.463193] add
> > /kernel/slab/:A-0000072/cgroup/eventpoll_pwq(549:mdmonitor.service)
> > (cgroup)
> > UDEV  [226090.467271] add
> > /kernel/slab/kmalloc-16/cgroup/kmalloc-16(549:mdmonitor.service)
> > (cgroup)
> > KERNEL[226090.880178] add
> > /kernel/slab/kmalloc-rcl-64/cgroup/kmalloc-rcl-64(549:mdmonitor.service)
> > (cgroup)
> > UDEV  [226090.883794] add
> > /kernel/slab/kmalloc-rcl-64/cgroup/kmalloc-rcl-64(549:mdmonitor.service)
> > (cgroup)
> >
> >
> >
> > dmsetup info -c
> > Name                                Maj Min Stat Open Targ Event  UUID
> > storage.mx.vg2-shared_sun_NAS.lv1   253   0 L--w    1    1      0
> > LVM-Ud9pj6QE4hK1K3xiAFMVCnno3SrXaRyTXJLtTGDOPjBUppJgzr4t0jJowixEOtx7
> > storage.mx.vg1-shared_sun_users.lv1 253   2 L--w    1    1      0
> > LVM-ypcHlbNXu36FLRgU0EcUiXBSIvcOlHEP3MHkBKsBeHf6Q68TIuGA9hd5UfCpvOeo
> > ubuntu_server--vg-ubuntu_server--lv 253   1 L--w    1    1      0
> > LVM-eGBUJxP1vlW3MfNNeC2r5JfQUiKKWZ73t3U3Jji3lggXe8LPrUf0xRE0YyPzSorO
> >
> > NOTE the 'open' status on the NAS.lv1 device. In fact the device is not mounted:
> >
> > cat /proc/mounts | grep mapper
> > /dev/mapper/ubuntu_server--vg-ubuntu_server--lv / ext4
> > rw,relatime,errors=remount-ro 0 0
> > /dev/mapper/storage.mx.vg1-shared_sun_users.lv1 /mnt/home ext4 rw,relatime 0 0
> >
> > pvdisplay
> >   --- Physical volume ---
> >   PV Name               /dev/md1
> >   VG Name               storage.mx.vg1
> >   PV Size               111.73 GiB / not usable 3.00 MiB
> >   Allocatable           yes (but full)
> >   PE Size               4.00 MiB
> >   Total PE              28603
> >   Free PE               0
> >   Allocated PE          28603
> >   PV UUID               4yDnuz-PEHg-uZqd-djWS-DNnp-Qzuf-fYvGZJ
> >
> >   --- Physical volume ---
> >   PV Name               /dev/md0
> >   VG Name               ubuntu_server-vg
> >   PV Size               <37.22 GiB / not usable 0
> >   Allocatable           yes (but full)
> >   PE Size               4.00 MiB
> >   Total PE              9528
> >   Free PE               0
> >   Allocated PE          9528
> >   PV UUID               G0bNbO-DOz4-I2nN-rEQq-X00m-PG3a-fPAP3I
> >
> > So in this case LVM seems to recognise that the md90 device is gone.
> >
> > Before I changed the 'fail_last_dev' option, When I ran these LVM
> > commands, I was experiencing a short delay and then a report in place
> > of the failed device saying that LVM had given up waiting for a udev
> > entry to become available after 10000mS. Sorry I didn't catch this,
> > for the log, but it is from memory.
> >
> > But now this delay is not happening and LVM seems to have a correct
> > and consistent view of the failed mount point. Clearly mdraid has sent
> > the failure up the stack.
> >
> >
> > However mdraid will still NOT stop the md90 device:
> >
> > mdadm --stop /dev/md90
> > mdadm: Cannot get exclusive access to /dev/md90:Perhaps a running
> > process, mounted filesystem or active volume group?
> >
> > ATB
> > Aidan
> >
> > On Thu, 13 Jan 2022 at 15:46, Mariusz Tkaczyk
> > <mariusz.tkaczyk@linux.intel.com> wrote:
> >> Hi Aidan,
> >>
> >> On Wed, 12 Jan 2022 02:29:47 +0100
> >> Aidan Walton <aidan.walton@gmail.com> wrote:
> >>
> >>> Hi Mariusz,
> >>> In my case, the fact that mdraid does not show a 'total failure' is
> >>> not the root of the problem. However in my opinion I would say that
> >>> not having mdraid more accurately reflect the state of the underlying
> >>> hardware can be mis-leading. Initially when I looked at this issue, I
> >>> was convinced that only one disk had failed and I was scratching my
> >>> head about firstly why I still could not R/W the array while it
> >>> appeared to have an active member. Secondly, when I rebooted I noticed
> >>> that the array became instantly synchronised with both members active.
> >> We have raid1 so first fail should be recorded in metadata. From your
> >> description, I understand that nothing like this happened. For me, it
> >> seems that the controller lost both drives in the same moment and as a
> >> result nothing was saved. After reboot raid is assembled without
> >> rebuild because metadata on both members is valid.
> >>
> >>> This was not what I expected, as normally an array that has had a
> >>> single failed disk would require a ra-add and resync.  Then when the
> >>> problem re-occured I noticed that it was not the same disk that was
> >>> flagged faulty, next reboot; the faulty disk flipped back the other
> >>> way... and so on. This was what prompted me to look closer at the
> >>> kernel. Here I found my answer at the SATA controller. Therefore
> >>> although mdraids design approach did not cause me any data loss, it
> >>> did have me looking in the wrong direction for the fault, assuming a
> >>> disk problem.
> >>>
> >>> I have still not been able to successfully --stop the array. I think
> >>> the issue sits in the LVM domain. Although I can not be 100% sure.
> >>> What I have achieved is some level of understanding that some process
> >>> that starts a boot time is in some unknown manner holding a lock on
> >>> the mdarid  - devmapper - LVM combination. I have unmounted the file
> >>> system, but LVM refuses to let go of the logical volume. Therefore so
> >>> does dev-mapper and of course mdraid. I have systematically stopped or
> >>> killed almost every single running process on the system, taking it
> >>> back to a skeleton, with not much more than init running, it still
> >>> refuses to let go
> >>>
> >>> However, when I prevent auto-mounting of the raid array at boot, and
> >>> then manually assemble the raid array, LVM finds its meta data, builds
> >>> the VG and LV and mounts. If I then manually force the exact same SATA
> >>> controller failure, which results in the exact same mdraid behaviour,
> >>> I am then able to unmount the filesystem and ...... hey presto
> >>> deactivate the LVM LV. Which then allows me to --stop the mdraid. Just
> >>> as I want. Again it does not solve my SATA hardware issues, but being
> >>> at this point does give me options to restart the hardware etc, and
> >>> probably, though very messily, get the filesystem up again without a
> >>> reboot. The problem being I can not achieve this behaviour without
> >>> manually assembling the array after boot. If you have any idea what
> >>> could possibly be holding this lock I would be glad to hear.
> >>>
> >> Could you connect to the udev monitor and analyze events triggered in
> >> both cases? This is the one idea I have.
> >>
> >> Thanks,
> >> Mariusz
> >>
> >>> At this point I'm going to have to try and systematically step through
> >>> the boot process and try re-arranging, when the array gets assembled.
> >>> My first attempts at this have been to <ignore> the raid array in
> >>> mdadm.conf and comment the array out of /etc/fstab. In this way mdraid
> >>> inside initramfs does not auto-assemble and LVM does not auto scan for
> >>> the VG. Once I am in the real boot sequence, I have created a systemd
> >>> mount unit that I can pull in from other systemd units, to change the
> >>> point in the boot process when the array is assembled. In this way
> >>> hopefully I can influence when other services are interacting with the
> >>> array in some way and perhaps find the root cause ......   Work in
> >>> progress..but slowly as the fault occurs only very occasionally and I
> >>> still need a working server.
> >>> All the best.. Aidan
> >>>
> >>> On Mon, 10 Jan 2022 at 10:47, Mariusz Tkaczyk
> >>> <mariusz.tkaczyk@linux.intel.com> wrote:
> >>>> On Fri, 7 Jan 2022 23:30:31 +0100
> >>>> Aidan Walton <aidan.walton@gmail.com> wrote:
> >>>>
> >>>>> Hi,
> >>>>> I have a system running:
> >>>>> Ubuntu Server 20.04.3 LTS on a 5.4.0-92-generic kernel.
> >>>>>
> >>>>> On the motherboard is a:
> >>>>> SATA controller: JMicron Technology Corp. JMB363 SATA/IDE
> >>>>> Controller (rev 02)
> >>>>>
> >>>>> Connected to this I have:
> >>>>> 2x Western Digital - WDC WD5001AALS-00L3B2, BIOS :01.03B01 500Gb
> >>>>> drives
> >>>>>
> >>>>> Each drive has a single partition and is part of a RAID1 array:
> >>>>> /dev/md90:
> >>>>> .....
> >>>>>     Number   Major   Minor   RaidDevice State
> >>>>>        0       8       33        0      active sync   /dev/sdc1
> >>>>>        2       8       49        1      active sync   /dev/sdd1
> >>>>>
> >>>>> On top of this I have a single LVM VG and LV. (Probably not
> >>>>> important)
> >>>>>
> >>>>> I have noticed some strange behaviour and upon investigation I
> >>>>> find the md device in the following state:
> >>>>> /dev/md90:
> >>>>> ....
> >>>>>
> >>>>>     Number   Major   Minor   RaidDevice State
> >>>>>        0       8       33        0      active sync   /dev/sdc1
> >>>>>        -       0        0        1      removed
> >>>>>
> >>>>>        2       8       49        -      faulty   /dev/sdd1
> >>>>>
> >>>>>
> >>>>> In fact neither /dev/sdc1 or /dev/sdd1 are available. In fact
> >>>>> neither are /dev/sdc or /dev/sdd, the physical drives, as they
> >>>>> both been disconnected by the kernel:
> >>>>> /dev/sdc is attached to ata7:00  and  /dev/sdd is attached to
> >>>>> ata.8:00 This is the log of the kernel events:
> >>>>>
> >>>>>
> >>>>> Jan 07 22:09:03 mx kernel: ata7.00: exception Emask 0x32 SAct 0x0
> >>>>> SErr 0x0 action 0xe frozen
> >>>>> Jan 07 22:09:03 mx kernel: ata7.00: irq_stat 0xffffffff, unknown
> >>>>> FIS 00000000 00000000 00000000 00000000, host bus
> >>>>> Jan 07 22:09:03 mx kernel: ata7.00: failed command: READ DMA
> >>>>> Jan 07 22:09:03 mx kernel: ata7.00: cmd
> >>>>> c8/00:00:00:cf:26/00:00:00:00:00/e0 tag 18 dma 131072 in
> >>>>> Jan 07 22:09:03 mx kernel: ata7.00: status: { DRDY }
> >>>>> Jan 07 22:09:03 mx kernel: ata7: hard resetting link
> >>>>> Jan 07 22:09:03 mx kernel: ata7: SATA link up 1.5 Gbps (SStatus
> >>>>> 113 SControl 310)
> >>>>> Jan 07 22:09:09 mx kernel: ata7.00: qc timeout (cmd 0xec)
> >>>>> Jan 07 22:09:09 mx kernel: ata7.00: failed to IDENTIFY (I/O error,
> >>>>> err_mask=0x4) Jan 07 22:09:09 mx kernel: ata7.00: revalidation
> >>>>> failed (errno=-5) Jan 07 22:09:09 mx kernel: ata7: hard resetting
> >>>>> link Jan 07 22:09:19 mx kernel: ata7: softreset failed (1st FIS
> >>>>> failed) Jan 07 22:09:19 mx kernel: ata7: hard resetting link
> >>>>> Jan 07 22:09:29 mx kernel: ata7: softreset failed (1st FIS failed)
> >>>>> Jan 07 22:09:29 mx kernel: ata7: hard resetting link
> >>>>> Jan 07 22:09:35 mx kernel: ata8.00: exception Emask 0x40 SAct 0x0
> >>>>> SErr 0x800 action 0x6 frozen
> >>>>> Jan 07 22:09:35 mx kernel: ata8: SError: { HostInt }
> >>>>> Jan 07 22:09:35 mx kernel: ata8.00: failed command: READ DMA
> >>>>> Jan 07 22:09:35 mx kernel: ata8.00: cmd
> >>>>> c8/00:00:00:64:4a/00:00:00:00:00/e0 tag 2 dma 131072 in
> >>>>> Jan 07 22:09:35 mx kernel: ata8.00: status: { DRDY }
> >>>>> Jan 07 22:09:35 mx kernel: ata8: hard resetting link
> >>>>> Jan 07 22:09:45 mx kernel: ata8: softreset failed (1st FIS failed)
> >>>>> Jan 07 22:09:45 mx kernel: ata8: hard resetting link
> >>>>> Jan 07 22:09:55 mx kernel: ata8: softreset failed (1st FIS failed)
> >>>>> Jan 07 22:09:55 mx kernel: ata8: hard resetting link
> >>>>> Jan 07 22:10:04 mx kernel: ata7: softreset failed (1st FIS failed)
> >>>>> Jan 07 22:10:04 mx kernel: ata7: hard resetting link
> >>>>> Jan 07 22:10:09 mx kernel: ata7: softreset failed (1st FIS failed)
> >>>>> Jan 07 22:10:09 mx kernel: ata7: reset failed, giving up
> >>>>> Jan 07 22:10:09 mx kernel: ata7.00: disabled
> >>>>> Jan 07 22:10:09 mx kernel: ata7: EH complete
> >>>>> Jan 07 22:10:30 mx kernel: ata8: softreset failed (1st FIS failed)
> >>>>> Jan 07 22:10:30 mx kernel: ata8: hard resetting link
> >>>>> Jan 07 22:10:35 mx kernel: ata8: softreset failed (1st FIS failed)
> >>>>> Jan 07 22:10:35 mx kernel: ata8: reset failed, giving up
> >>>>> Jan 07 22:10:35 mx kernel: ata8.00: disabled
> >>>>> Jan 07 22:10:35 mx kernel: ata8: EH complete
> >>>>>
> >>>>> This is happening because of some issue with the SATA controller
> >>>>> on the motherboard. This has not been resolved and probably never
> >>>>> will be, I see many others through google search complaining of
> >>>>> similar issues with the SATA controller.
> >>>>> This failure only occurs when the SATA controller is placed under
> >>>>> very heavy load, I have minimised the impact of the problem by
> >>>>> not using NCQ, this helps, but it still occurs. Ironically the
> >>>>> biggest issue I have is that mdadm "checkarray" is running
> >>>>> because of a systemd background process every week or so, and
> >>>>> this hammers the disk into failure. Most of the normal daily
> >>>>> usage never generates the link resets.
> >>>>> Naturally I have changed SATA cables and moved drives around onto
> >>>>> different controllers, but alas, it does seem to be the hardware
> >>>>> on the motherboard.
> >>>>> However as a workaround I was hoping to accept the occasional
> >>>>> failure and then using some scripting and 'setpci' I can get the
> >>>>> kernel to hard reset the chipset and attach the drives again. I
> >>>>> have the process working in terms of getting the kernel to
> >>>>> re-attach the drives, but.......
> >>>>>
> >>>>> Unfortunately mdraid will not let go of them, I can not stop the
> >>>>> arrays, and therefore can't rebuild them. If I simply allow the
> >>>>> kernel to re-attach the drives the kernel names are swapped over,
> >>>>> as something (mdraid) is stopping the kernel re-using the same
> >>>>> device names. Anyway being dependent on the same kernel device
> >>>>> names is not a great plan anyway, so I was simply trying to get
> >>>>> mdadm to reassemble the array as soon as the 'workaround' script
> >>>>> gets the drives back in contact with libata (kernel).
> >>>>>
> >>>>> Plan:
> >>>>> 1. Detecting the problem. (mdadm state)
> >>>>> 2. Stop the array totally (can NOT do it)
> >>>>> 3. reset the chipset across the PCI bus.
> >>>>> 4. allow kernel to re-attach drives.
> >>>>> 5. re-assemble the md device with mdadm
> >>>>> 6. restart, if necessary higher layer services...
> >>>>>
> >>>>> So why is mdraid holding on to the array:
> >>>>>
> >>>>> # mdadm --stop /dev/md90
> >>>>> mdadm: Cannot get exclusive access to /dev/md90:Perhaps a running
> >>>>> process, mounted filesystem or active volume group?
> >>>>>
> >>>>> I can not be 100% sure that something else is using the device,
> >>>>> but I can't think of anything that is and I stopped every process
> >>>>> I can think of..... Plus why is the array still shown as 'active'
> >>>>> when none of its member devices even exist anymore?
> >>>>>
> >>>>> What I do know is that device mapper (coming down from LVM)
> >>>>> still has an entry in /dev/mapper. But then probably no surprise
> >>>>> as /dev/md90 the failed array is still an active device node. If
> >>>>> you attempt to write to it, I receive I/O errors from the kernel.
> >>>>> In fact as far as any higher layer services are concerned md90
> >>>>> and the LVM LV on top of it are still active and working when in
> >>>>> reality, they are not. It causes very strange NFS errors and such.
> >>>>>
> >>>>> mdraid does actually attempt to iteratively remove both partitions
> >>>>> when the kernel signals the disable state, but only 1 of them
> >>>>> succeeds.
> >>>>> I did an strace of the same iterative 'fail:remove' process that
> >>>>> mdraid attempts when the kernel issues -- kernel: ata7.00:
> >>>>> disabled
> >>>>>
> >>>>> eg:
> >>>>> /sbin/mdadm -If sdc1 --path pci-0000:02:00.0-ata-1
> >>>>> mdadm: set device faulty failed for sdc1:  Device or resource busy
> >>>>>
> >>>>> The only clue is perhaps this line from the strace:
> >>>>> openat(AT_FDCWD, "/sys/block/md90/md/dev-sdc1/block/dev", O_RDWR)
> >>>>> = -1 EACCES (Permission denied)    What is the mdadm command
> >>>>> doing that results in a permission problem?
> >>>>>
> >>>>> So the only way I can get rid of this md raid array is a reboot.
> >>>>> Damn!!!
> >>>>>
> >>>>>
> >>>>> Any help is much appreciated.
> >>>>> Aidan
> >>>>>
> >>>>>
> >>>>>
> >>>> Hi Aidan,
> >>>> This is how it is implemented. Drive is not removed if array failure
> >>>> will cause array failed. Please see:
> >>>> https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?id=9a567843f7ce0037bfd4d5fdc58a09d0a527b28b
> >>>>
> >>>> For RAID1 you can use solution proposed in patch below but IMO it is
> >>>> not your problem. Please stop LVM and then try to stop array. To
> >>>> stop array it needs to be "free" (all upper handlers are down).
> >>>>
> >>>> Thanks,
> >>>> Mariusz
>
> --
> Geoff Back
> What if we're all just characters in someone's nightmares?
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: md device remains active even when all supporting disks have failed and been disabled by the kernel.
  2022-01-13  0:45           ` Aidan Walton
@ 2022-01-13 18:26             ` Roger Heflin
  0 siblings, 0 replies; 14+ messages in thread
From: Roger Heflin @ 2022-01-13 18:26 UTC (permalink / raw)
  To: Aidan Walton; +Cc: Jani Partanen, Mariusz Tkaczyk, Linux RAID

My experience is that if a sata controller works fine with one disk,
but not 2 then there is a significant bug in the chip attempting to
correctly handle 2 streams and there is likely no way to fix it
because it is a hw issue.  I have seen way too many controllers that
must have passed a limited worthless test process with only one disk,
but won't work reliably with more than one.

My past testing on cheap PCI-e sata controllers was pretty ugly.
They were a pretty unreliable bunch, they would generally work just
good enough to make you think they work, and then fail under load.   I
gave up and spent a bit more to get a used HBA mode LSI SAS2008
controller but that is a pcie-8 controller and can be had for about
$50.

You cannot umount the filesystem once the underlying disk is gone.
step #1 in umounting is to sync what data there is in ram, and with
the device being gone that will never finish and hence will never
umount it, and there will always be access time and/or other data to
be synced from what I have seen.  I have not heard of a way to tell
the kernel that a device/fs is really gone and clean up the dirty
writes related to that device so that it could be umounted.  You can
lazy umount it and that will make the mount point go away from showing
up in the OS, but it really won't be umounted if there is dirty cache
for the device, and/or programs with file open on the disk.

On Wed, Jan 12, 2022 at 6:46 PM Aidan Walton <aidan.walton@gmail.com> wrote:
>
> Thanks Jani,
> Gold info. I have been looking at 2 cheap Mini-PCIe adapters. One uses
> the JMicron and would you believe it the other is using ASMedia.
> Now I have a solid reason to choose the second rather than the first.
>
> In this case it will NOT be,  "Better the devil you know than the
> devil you don't"
> and anyway I don't work hard enough to need 6Gb/s :)
> ATB
> Aidan
>
> On Thu, 13 Jan 2022 at 01:24, Jani Partanen <jiipee@sotapeli.fi> wrote:
> >
> > Aidan Walton kirjoitti 13/01/2022 klo 2.03:
> > > Hi Roger,
> > > As I mentioned, it is a:
> > > JMicron Technology Corp. JMB363
> >
> > If my memory is correct. This chip is trash. If I am right, I have same
> > chip on cheap controller what I bought to get IDE controller mainly, it
> > also has 2xSATA ports and I had only troubles with that controller in linux.
> > If I had no drives on SATA ports, dmesg got spammed port errors. If I
> > had drives installed, they was randomly dropping in and out. Chip itself
> > was running hot as hell, I mean so hot that you could not keep finger on
> > chip. I did put some small finns to it and I think it did help little
> > for drives dropping in and out, but didn't solve it.
> > If you want a cheap controller, get this chip:
> > 02:00.0 SATA controller: ASMedia Technology Inc. Device 1166 (rev 02)
> >
> > It's been quite stable for me and didn't cost much. Only issue I have
> > that I cannot force it's ports transfer speed to example 3.0G. Spinnin
> > rust do not need 6.0G speed what in my experience increase chance for
> > dropouts if you cables aren't 6.0G ready and how do you tell if they are
> > when cable doesn't have any mention about it?
> > I think it's something to do that it's PMP card and I don't understand
> > how you have to format kernel parameter when you have PMP situation. I
> > can tune internal controller all ports just fine, but not this expansion
> > card. Kernel tell you it's forcing them to 3.0G, but when you check
> > later what transfer speed is, it's still 6.0G
> >
> >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: md device remains active even when all supporting disks have failed and been disabled by the kernel.
  2022-01-13  0:03       ` Aidan Walton
  2022-01-13  0:23         ` Jani Partanen
@ 2022-01-14 17:05         ` John Stoffel
  1 sibling, 0 replies; 14+ messages in thread
From: John Stoffel @ 2022-01-14 17:05 UTC (permalink / raw)
  To: Aidan Walton; +Cc: Roger Heflin, Mariusz Tkaczyk, Linux RAID

>>>>> "Aidan" == Aidan Walton <aidan.walton@gmail.com> writes:

Aidan> Hi Roger,
Aidan> As I mentioned, it is a:
Aidan> JMicron Technology Corp. JMB363

Aidan> The reason for it failing is not entirely clear, but I'm fairly sure
Aidan> it's not related to your suggestions. I can make it fail at will by
Aidan> using:
Aidan>  dd if=/dev/sdc of=/dev/null bs=64k count=10000   and  dd if=/dev/sdd
Aidan> of=/dev/null bs=64k count=10000

Aidan> Now as I increase the block size of dd, the rate of errors and resets
Aidan> reduces, in what seems to be a rather linear manner. (Not in any way
Aidan> scientifically proven) but there is a clear relationship between
Aidan> commands/sec and resets.

If you can make the controller failover that quickly and easily, then
it's a crap controller and should be ripped out.  Go with anything
LSI, they're well supported in the Linux kernel and rock solid.  The
the JBOD firmware version as well, not the RAID firmware. 


Aidan> If I use dd on only one disk, I get no errors and no resets, and this
Aidan> applies to either disk.

Aidan> Thanks for the interest, however I was not trying to digress off topic
Aidan> too much. I don't think the linux-raid list is responsible for all
Aidan> possible controllers out there. However the interactions between
Aidan> mdraid and LVM is important and this is actually why I posted
Aidan> initially.

Aidan> Essentially I'm hoping someone comes back and says....hey did you
Aidan> check that such a file under sys/.... is clear or not etc.
Aidan> i.e. My instinct tells me somehow the kernel is holding a lock on LVM
Aidan> and the mount/umount commands don't seem to care. They do what is
Aidan> expected and /proc/mounts is updated accordingly. That is to say, file
Aidan> locking is not the problem, but something is that I have missed and I
Aidan> dont have the necessary knowledge to focus down onto it.

Aidan> On Wed, 12 Jan 2022 at 23:54, Roger Heflin <rogerheflin@gmail.com> wrote:
>> 
>> What kind of sata controller do you have?   I know of some 4pt marvell
>> ones that if you execute smart commands or too many commands against
>> it will "fault" and drop all of the disks going via that controller.
>> 
>> On Wed, Jan 12, 2022 at 4:10 PM Aidan Walton <aidan.walton@gmail.com> wrote:
>> >
>> > Hi Mariusz,
>> > In my case, the fact that mdraid does not show a 'total failure' is
>> > not the root of the problem. However in my opinion I would say that
>> > not having mdraid more accurately reflect the state of the underlying
>> > hardware can be mis-leading. Initially when I looked at this issue, I
>> > was convinced that only one disk had failed and I was scratching my
>> > head about firstly why I still could not R/W the array while it
>> > appeared to have an active member. Secondly, when I rebooted I noticed
>> > that the array became instantly synchronised with both members active.
>> > This was not what I expected, as normally an array that has had a
>> > single failed disk would require a ra-add and resync.  Then when the
>> > problem re-occured I noticed that it was not the same disk that was
>> > flagged faulty, next reboot; the faulty disk flipped back the other
>> > way... and so on. This was what prompted me to look closer at the
>> > kernel. Here I found my answer at the SATA controller. Therefore
>> > although mdraids design approach did not cause me any data loss, it
>> > did have me looking in the wrong direction for the fault, assuming a
>> > disk problem.
>> >
>> > I have still not been able to successfully --stop the array. I think
>> > the issue sits in the LVM domain. Although I can not be 100% sure.
>> > What I have achieved is some level of understanding that some process
>> > that starts a boot time is in some unknown manner holding a lock on
>> > the mdarid  - devmapper - LVM combination. I have unmounted the file
>> > system, but LVM refuses to let go of the logical volume. Therefore so
>> > does dev-mapper and of course mdraid. I have systematically stopped or
>> > killed almost every single running process on the system, taking it
>> > back to a skeleton, with not much more than init running, it still
>> > refuses to let go.
>> >
>> > However, when I prevent auto-mounting of the raid array at boot, and
>> > then manually assemble the raid array, LVM finds its meta data, builds
>> > the VG and LV and mounts. If I then manually force the exact same SATA
>> > controller failure, which results in the exact same mdraid behaviour,
>> > I am then able to unmount the filesystem and ...... hey presto
>> > deactivate the LVM LV. Which then allows me to --stop the mdraid. Just
>> > as I want. Again it does not solve my SATA hardware issues, but being
>> > at this point does give me options to restart the hardware etc, and
>> > probably, though very messily, get the filesystem up again without a
>> > reboot. The problem being I can not achieve this behaviour without
>> > manually assembling the array after boot. If you have any idea what
>> > could possibly be holding this lock I would be glad to hear.
>> >
>> > At this point I'm going to have to try and systematically step through
>> > the boot process and try re-arranging, when the array gets assembled.
>> > My first attempts at this have been to <ignore> the raid array in
>> > mdadm.conf and comment the array out of /etc/fstab. In this way mdraid
>> > inside initramfs does not auto-assemble and LVM does not auto scan for
>> > the VG. Once I am in the real boot sequence, I have created a systemd
>> > mount unit that I can pull in from other systemd units, to change the
>> > point in the boot process when the array is assembled. In this way
>> > hopefully I can influence when other services are interacting with the
>> > array in some way and perhaps find the root cause ......   Work in
>> > progress..but slowly as the fault occurs only very occasionally and I
>> > still need a working server.
>> > All the best.. Aidan
>> >
>> > On Mon, 10 Jan 2022 at 10:47, Mariusz Tkaczyk
>> > <mariusz.tkaczyk@linux.intel.com> wrote:
>> > >
>> > > On Fri, 7 Jan 2022 23:30:31 +0100
>> > > Aidan Walton <aidan.walton@gmail.com> wrote:
>> > >
>> > > > Hi,
>> > > > I have a system running:
>> > > > Ubuntu Server 20.04.3 LTS on a 5.4.0-92-generic kernel.
>> > > >
>> > > > On the motherboard is a:
>> > > > SATA controller: JMicron Technology Corp. JMB363 SATA/IDE Controller
>> > > > (rev 02)
>> > > >
>> > > > Connected to this I have:
>> > > > 2x Western Digital - WDC WD5001AALS-00L3B2, BIOS :01.03B01 500Gb
>> > > > drives
>> > > >
>> > > > Each drive has a single partition and is part of a RAID1 array:
>> > > > /dev/md90:
>> > > > .....
>> > > >     Number   Major   Minor   RaidDevice State
>> > > >        0       8       33        0      active sync   /dev/sdc1
>> > > >        2       8       49        1      active sync   /dev/sdd1
>> > > >
>> > > > On top of this I have a single LVM VG and LV. (Probably not important)
>> > > >
>> > > > I have noticed some strange behaviour and upon investigation I find
>> > > > the md device in the following state:
>> > > > /dev/md90:
>> > > > ....
>> > > >
>> > > >     Number   Major   Minor   RaidDevice State
>> > > >        0       8       33        0      active sync   /dev/sdc1
>> > > >        -       0        0        1      removed
>> > > >
>> > > >        2       8       49        -      faulty   /dev/sdd1
>> > > >
>> > > >
>> > > > In fact neither /dev/sdc1 or /dev/sdd1 are available. In fact neither
>> > > > are /dev/sdc or /dev/sdd, the physical drives, as they both been
>> > > > disconnected by the kernel:
>> > > > /dev/sdc is attached to ata7:00  and  /dev/sdd is attached to ata.8:00
>> > > > This is the log of the kernel events:
>> > > >
>> > > >
>> > > > Jan 07 22:09:03 mx kernel: ata7.00: exception Emask 0x32 SAct 0x0 SErr
>> > > > 0x0 action 0xe frozen
>> > > > Jan 07 22:09:03 mx kernel: ata7.00: irq_stat 0xffffffff, unknown FIS
>> > > > 00000000 00000000 00000000 00000000, host bus
>> > > > Jan 07 22:09:03 mx kernel: ata7.00: failed command: READ DMA
>> > > > Jan 07 22:09:03 mx kernel: ata7.00: cmd
>> > > > c8/00:00:00:cf:26/00:00:00:00:00/e0 tag 18 dma 131072 in
>> > > > Jan 07 22:09:03 mx kernel: ata7.00: status: { DRDY }
>> > > > Jan 07 22:09:03 mx kernel: ata7: hard resetting link
>> > > > Jan 07 22:09:03 mx kernel: ata7: SATA link up 1.5 Gbps (SStatus 113
>> > > > SControl 310)
>> > > > Jan 07 22:09:09 mx kernel: ata7.00: qc timeout (cmd 0xec)
>> > > > Jan 07 22:09:09 mx kernel: ata7.00: failed to IDENTIFY (I/O error,
>> > > > err_mask=0x4) Jan 07 22:09:09 mx kernel: ata7.00: revalidation failed
>> > > > (errno=-5) Jan 07 22:09:09 mx kernel: ata7: hard resetting link
>> > > > Jan 07 22:09:19 mx kernel: ata7: softreset failed (1st FIS failed)
>> > > > Jan 07 22:09:19 mx kernel: ata7: hard resetting link
>> > > > Jan 07 22:09:29 mx kernel: ata7: softreset failed (1st FIS failed)
>> > > > Jan 07 22:09:29 mx kernel: ata7: hard resetting link
>> > > > Jan 07 22:09:35 mx kernel: ata8.00: exception Emask 0x40 SAct 0x0 SErr
>> > > > 0x800 action 0x6 frozen
>> > > > Jan 07 22:09:35 mx kernel: ata8: SError: { HostInt }
>> > > > Jan 07 22:09:35 mx kernel: ata8.00: failed command: READ DMA
>> > > > Jan 07 22:09:35 mx kernel: ata8.00: cmd
>> > > > c8/00:00:00:64:4a/00:00:00:00:00/e0 tag 2 dma 131072 in
>> > > > Jan 07 22:09:35 mx kernel: ata8.00: status: { DRDY }
>> > > > Jan 07 22:09:35 mx kernel: ata8: hard resetting link
>> > > > Jan 07 22:09:45 mx kernel: ata8: softreset failed (1st FIS failed)
>> > > > Jan 07 22:09:45 mx kernel: ata8: hard resetting link
>> > > > Jan 07 22:09:55 mx kernel: ata8: softreset failed (1st FIS failed)
>> > > > Jan 07 22:09:55 mx kernel: ata8: hard resetting link
>> > > > Jan 07 22:10:04 mx kernel: ata7: softreset failed (1st FIS failed)
>> > > > Jan 07 22:10:04 mx kernel: ata7: hard resetting link
>> > > > Jan 07 22:10:09 mx kernel: ata7: softreset failed (1st FIS failed)
>> > > > Jan 07 22:10:09 mx kernel: ata7: reset failed, giving up
>> > > > Jan 07 22:10:09 mx kernel: ata7.00: disabled
>> > > > Jan 07 22:10:09 mx kernel: ata7: EH complete
>> > > > Jan 07 22:10:30 mx kernel: ata8: softreset failed (1st FIS failed)
>> > > > Jan 07 22:10:30 mx kernel: ata8: hard resetting link
>> > > > Jan 07 22:10:35 mx kernel: ata8: softreset failed (1st FIS failed)
>> > > > Jan 07 22:10:35 mx kernel: ata8: reset failed, giving up
>> > > > Jan 07 22:10:35 mx kernel: ata8.00: disabled
>> > > > Jan 07 22:10:35 mx kernel: ata8: EH complete
>> > > >
>> > > > This is happening because of some issue with the SATA controller on
>> > > > the motherboard. This has not been resolved and probably never will
>> > > > be, I see many others through google search complaining of similar
>> > > > issues with the SATA controller.
>> > > > This failure only occurs when the SATA controller is placed under very
>> > > > heavy load, I have minimised the impact of the problem by not using
>> > > > NCQ, this helps, but it still occurs. Ironically the biggest issue I
>> > > > have is that mdadm "checkarray" is running because of a systemd
>> > > > background process every week or so, and this hammers the disk into
>> > > > failure. Most of the normal daily usage never generates the link
>> > > > resets.
>> > > > Naturally I have changed SATA cables and moved drives around onto
>> > > > different controllers, but alas, it does seem to be the hardware on
>> > > > the motherboard.
>> > > > However as a workaround I was hoping to accept the occasional failure
>> > > > and then using some scripting and 'setpci' I can get the kernel to
>> > > > hard reset the chipset and attach the drives again. I have the process
>> > > > working in terms of getting the kernel to re-attach the drives,
>> > > > but.......
>> > > >
>> > > > Unfortunately mdraid will not let go of them, I can not stop the
>> > > > arrays, and therefore can't rebuild them. If I simply allow the kernel
>> > > > to re-attach the drives the kernel names are swapped over, as
>> > > > something (mdraid) is stopping the kernel re-using the same device
>> > > > names. Anyway being dependent on the same kernel device names is not a
>> > > > great plan anyway, so I was simply trying to get mdadm to reassemble
>> > > > the array as soon as the 'workaround' script gets the drives back in
>> > > > contact with libata (kernel).
>> > > >
>> > > > Plan:
>> > > > 1. Detecting the problem. (mdadm state)
>> > > > 2. Stop the array totally (can NOT do it)
>> > > > 3. reset the chipset across the PCI bus.
>> > > > 4. allow kernel to re-attach drives.
>> > > > 5. re-assemble the md device with mdadm
>> > > > 6. restart, if necessary higher layer services...
>> > > >
>> > > > So why is mdraid holding on to the array:
>> > > >
>> > > > # mdadm --stop /dev/md90
>> > > > mdadm: Cannot get exclusive access to /dev/md90:Perhaps a running
>> > > > process, mounted filesystem or active volume group?
>> > > >
>> > > > I can not be 100% sure that something else is using the device, but I
>> > > > can't think of anything that is and I stopped every process I can
>> > > > think of..... Plus why is the array still shown as 'active' when none
>> > > > of its member devices even exist anymore?
>> > > >
>> > > > What I do know is that device mapper (coming down from LVM)  still has
>> > > > an entry in /dev/mapper. But then probably no surprise as /dev/md90
>> > > > the failed array is still an active device node. If you attempt to
>> > > > write to it, I receive I/O errors from the kernel. In fact as far as
>> > > > any higher layer services are concerned md90 and the LVM LV on top of
>> > > > it are still active and working when in reality, they are not. It
>> > > > causes very strange NFS errors and such.
>> > > >
>> > > > mdraid does actually attempt to iteratively remove both partitions
>> > > > when the kernel signals the disable state, but only 1 of them
>> > > > succeeds.
>> > > > I did an strace of the same iterative 'fail:remove' process that
>> > > > mdraid attempts when the kernel issues -- kernel: ata7.00: disabled
>> > > >
>> > > > eg:
>> > > > /sbin/mdadm -If sdc1 --path pci-0000:02:00.0-ata-1
>> > > > mdadm: set device faulty failed for sdc1:  Device or resource busy
>> > > >
>> > > > The only clue is perhaps this line from the strace:
>> > > > openat(AT_FDCWD, "/sys/block/md90/md/dev-sdc1/block/dev", O_RDWR) = -1
>> > > > EACCES (Permission denied)    What is the mdadm command doing that
>> > > > results in a permission problem?
>> > > >
>> > > > So the only way I can get rid of this md raid array is a reboot.
>> > > > Damn!!!
>> > > >
>> > > >
>> > > > Any help is much appreciated.
>> > > > Aidan
>> > > >
>> > > >
>> > > >
>> > > Hi Aidan,
>> > > This is how it is implemented. Drive is not removed if array failure
>> > > will cause array failed. Please see:
>> > > https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?id=9a567843f7ce0037bfd4d5fdc58a09d0a527b28b
>> > >
>> > > For RAID1 you can use solution proposed in patch below but IMO it is
>> > > not your problem. Please stop LVM and then try to stop array. To stop
>> > > array it needs to be "free" (all upper handlers are down).
>> > >
>> > > Thanks,
>> > > Mariusz

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-01-14 17:05 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-07 22:30 md device remains active even when all supporting disks have failed and been disabled by the kernel Aidan Walton
2022-01-10  9:47 ` Mariusz Tkaczyk
2022-01-12  1:29   ` Aidan Walton
2022-01-12 22:54     ` Roger Heflin
2022-01-13  0:03       ` Aidan Walton
2022-01-13  0:23         ` Jani Partanen
2022-01-13  0:45           ` Aidan Walton
2022-01-13 18:26             ` Roger Heflin
2022-01-14 17:05         ` John Stoffel
2022-01-13 14:46     ` Mariusz Tkaczyk
2022-01-13 16:35       ` Aidan Walton
2022-01-13 16:48         ` Geoff Back
2022-01-13 17:21           ` Aidan Walton
2022-01-13 17:12         ` Aidan Walton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.