From: Donald Buczek <buczek@molgen.mpg.de>
To: Song Liu <song@kernel.org>,
linux-raid@vger.kernel.org,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
it+raid@molgen.mpg.de
Subject: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
Date: Sat, 28 Nov 2020 13:25:22 +0100
Message-ID: <aa9567fd-38e1-7b9c-b3e1-dc2fdc055da5@molgen.mpg.de>
Dear Linux mdraid people,
we are using raid6 on several servers. Occasionally we have had failures where an mdX_raid6 process seems to go into a busy loop and all I/O to the md device blocks. We've seen this on various kernel versions.
The last time this happened (in this case with Linux 5.10.0-rc4), I collected some data.
The triggering event seems to be the md_check cron job trying to pause the ongoing check operation in the morning with
echo idle > /sys/devices/virtual/block/md1/md/sync_action
This write doesn't complete. Here's /proc/<pid>/stack of the mdcheck process doing it:
root@done:~/linux_problems/mdX_raid6_looping/2020-11-27# ps -fp 23333
UID PID PPID C STIME TTY TIME CMD
root 23333 23331 0 02:00 ? 00:00:00 /bin/bash /usr/bin/mdcheck --continue --duration 06:00
root@done:~/linux_problems/mdX_raid6_looping/2020-11-27# cat /proc/23333/stack
[<0>] kthread_stop+0x6e/0x150
[<0>] md_unregister_thread+0x3e/0x70
[<0>] md_reap_sync_thread+0x1f/0x1e0
[<0>] action_store+0x141/0x2b0
[<0>] md_attr_store+0x71/0xb0
[<0>] kernfs_fop_write+0x113/0x1a0
[<0>] vfs_write+0xbc/0x250
[<0>] ksys_write+0xa1/0xe0
[<0>] do_syscall_64+0x33/0x40
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
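The stack above shows the writer sleeping in kthread_stop() underneath md_reap_sync_thread(), i.e. waiting for the sync thread to exit. A minimal sketch, assuming Python is available on the affected host, of how such a /proc/<pid>/stack dump could be checked for this pattern automatically. The names parse_stack and waits_for_sync_thread are hypothetical helpers, not part of any md tooling:

```python
import re

# One frame of /proc/<pid>/stack looks like "[<0>] kthread_stop+0x6e/0x150".
FRAME_RE = re.compile(r"^\[<[0-9a-fx]+>\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def parse_stack(text):
    """Return the function names of a kernel stack dump, innermost frame first."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append(match.group(1))
    return frames


def waits_for_sync_thread(frames):
    """True if the task sleeps in kthread_stop() reached via md_reap_sync_thread()."""
    return (
        bool(frames)
        and frames[0] == "kthread_stop"
        and "md_reap_sync_thread" in frames
    )


# First frames of the stack quoted above, used as sample input.
SAMPLE = """\
[<0>] kthread_stop+0x6e/0x150
[<0>] md_unregister_thread+0x3e/0x70
[<0>] md_reap_sync_thread+0x1f/0x1e0
[<0>] action_store+0x141/0x2b0
"""

if __name__ == "__main__":
    print(waits_for_sync_thread(parse_stack(SAMPLE)))
```

On a live system the input text would come from reading /proc/<pid>/stack as root instead of the SAMPLE string.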
Note that md0 had been paused successfully just before:
2020-11-27T02:00:01+01:00 done CROND[23333]: (root) CMD (/usr/bin/mdcheck --continue --duration "06:00")
2020-11-27T02:00:01+01:00 done root: mdcheck continue checking /dev/md0 from 10623180920
2020-11-27T02:00:01.382994+01:00 done kernel: [378596.606381] md: data-check of RAID array md0
2020-11-27T02:00:01+01:00 done root: mdcheck continue checking /dev/md1 from 11582849320
2020-11-27T02:00:01.437999+01:00 done kernel: [378596.661559] md: data-check of RAID array md1
2020-11-27T06:00:01.842003+01:00 done kernel: [392996.625147] md: md0: data-check interrupted.
2020-11-27T06:00:02+01:00 done root: pause checking /dev/md0 at 13351127680
2020-11-27T06:00:02.338989+01:00 done kernel: [392997.122520] md: md1: data-check interrupted.
[ nothing related following.... ]
After that, we see md1_raid6 in a busy loop:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2376 root 20 0 0 0 0 R 100.0 0.0 1387:38 md1_raid6
Also, all processes doing I/O to the md device block.
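The top output shows the thread runnable at 100% CPU. A minimal sketch of how the same figure could be sampled directly from /proc/<pid>/stat, for example from a monitoring script. The names cpu_ticks and cpu_percent are hypothetical helpers; the field positions (utime as field 14, stime as field 15) are taken from proc(5):

```python
import os
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The comm field is wrapped in parentheses and may itself contain spaces,
    # so only the part after the last ')' is split on whitespace.
    fields = stat_line[stat_line.rindex(")") + 2:].split()
    # fields[0] is field 3 (state), so utime and stime are fields[11] and [12].
    return int(fields[11]) + int(fields[12])


def cpu_percent(pid, interval=1.0):
    """Sample /proc/<pid>/stat twice; return usage in percent of one CPU."""
    def sample():
        with open(f"/proc/{pid}/stat") as stat_file:
            return cpu_ticks(stat_file.read())

    ticks_per_second = os.sysconf("SC_CLK_TCK")
    before = sample()
    time.sleep(interval)
    after = sample()
    return 100.0 * (after - before) / ticks_per_second / interval
```

Calling cpu_percent(2376) on the affected host would be expected to return a value close to 100 for the looping md1_raid6 thread.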
This is /proc/mdstat:
Personalities : [linear] [raid0] [raid1] [raid6] [raid5] [raid4] [multipath]
md1 : active raid6 sdk[0] sdj[15] sdi[14] sdh[13] sdg[12] sdf[11] sde[10] sdd[9] sdc[8] sdr[7] sdq[6] sdp[5] sdo[4] sdn[3] sdm[2] sdl[1]
109394518016 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
[==================>..] check = 94.0% (7350290348/7813894144) finish=57189.3min speed=135K/sec
bitmap: 0/59 pages [0KB], 65536KB chunk
md0 : active raid6 sds[0] sdah[15] sdag[16] sdaf[13] sdae[12] sdad[11] sdac[10] sdab[9] sdaa[8] sdz[7] sdy[6] sdx[17] sdw[4] sdv[3] sdu[2] sdt[1]
109394518016 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
bitmap: 0/59 pages [0KB], 65536KB chunk
There doesn't seem to be any further progress.
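To watch for a stalled check without reading mdstat by eye, the progress line can be parsed and compared between two samples. A minimal sketch, assuming the position counters and the speed in that line use the same 1K unit; parse_progress is a hypothetical helper name:

```python
import re

# Matches e.g. "check = 94.0% (7350290348/7813894144) finish=57189.3min speed=135K/sec"
PROGRESS_RE = re.compile(
    r"(?P<action>check|resync|recovery|reshape)\s*=\s*(?P<percent>[\d.]+)%\s*"
    r"\((?P<done>\d+)/(?P<total>\d+)\)\s*"
    r"finish=(?P<finish>[\d.]+)min\s*speed=(?P<speed>\d+)K/sec"
)


def parse_progress(line):
    """Extract sync progress from one /proc/mdstat line, or None if absent."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    done = int(match.group("done"))
    total = int(match.group("total"))
    speed = int(match.group("speed"))
    remaining = total - done
    return {
        "action": match.group("action"),
        "percent": 100.0 * done / total,
        "done": done,
        "remaining": remaining,
        "speed": speed,
        # Recomputed estimate; None when the reported speed is zero.
        "eta_minutes": remaining / speed / 60 if speed else None,
    }


# Progress line of md1 as quoted above.
LINE = ("[==================>..] check = 94.0% (7350290348/7813894144) "
        "finish=57189.3min speed=135K/sec")

if __name__ == "__main__":
    print(parse_progress(LINE))
```

For the md1 line above this gives 463603796 blocks remaining and an estimate on the order of 57,000 minutes (roughly 40 days), consistent with the reported finish= value. If the "done" counter is identical in two samples taken some minutes apart, the check is not advancing.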
I've taken a function_graph trace of the looping md1_raid6 process: https://owww.molgen.mpg.de/~buczek/2020-11-27_trace.txt (30 MB)
Maybe this helps to get an idea of what might be going on?
Best
Donald
--
Donald Buczek
buczek@molgen.mpg.de
Tel: +49 30 8413 1433