Linux-Raid Archives on lore.kernel.org
From: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
To: Donald Buczek <buczek@molgen.mpg.de>, Song Liu <song@kernel.org>,
	linux-raid@vger.kernel.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	it+raid@molgen.mpg.de
Subject: Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
Date: Mon, 30 Nov 2020 03:06:19 +0100
Message-ID: <95fbd558-5e46-7a6a-43ac-bcc5ae8581db@cloud.ionos.com>
In-Reply-To: <aa9567fd-38e1-7b9c-b3e1-dc2fdc055da5@molgen.mpg.de>



On 11/28/20 13:25, Donald Buczek wrote:
> Dear Linux mdraid people,
> 
> We are using raid6 on several servers. Occasionally we have had failures
> where an mdX_raid6 process seems to go into a busy loop and all I/O to
> the md device blocks. We've seen this on various kernel versions.
> 
> The last time this happened (in this case with Linux 5.10.0-rc4), I
> collected some data.
> 
> The triggering event seems to be the mdcheck cron job trying to pause
> the ongoing check operation in the morning with
> 
>      echo idle > /sys/devices/virtual/block/md1/md/sync_action
> 
> This doesn't complete. Here's /proc/<pid>/stack of this process:
> 
>      root@done:~/linux_problems/mdX_raid6_looping/2020-11-27# ps -fp 23333
>      UID        PID  PPID  C STIME TTY          TIME CMD
>      root     23333 23331  0 02:00 ?        00:00:00 /bin/bash /usr/bin/mdcheck --continue --duration 06:00
>      root@done:~/linux_problems/mdX_raid6_looping/2020-11-27# cat /proc/23333/stack
>      [<0>] kthread_stop+0x6e/0x150
>      [<0>] md_unregister_thread+0x3e/0x70
>      [<0>] md_reap_sync_thread+0x1f/0x1e0
>      [<0>] action_store+0x141/0x2b0
>      [<0>] md_attr_store+0x71/0xb0
>      [<0>] kernfs_fop_write+0x113/0x1a0
>      [<0>] vfs_write+0xbc/0x250
>      [<0>] ksys_write+0xa1/0xe0
>      [<0>] do_syscall_64+0x33/0x40
>      [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Note that md0 had been paused successfully just before.

What is the personality of md0? Is it also raid6?

> 
>      2020-11-27T02:00:01+01:00 done CROND[23333]: (root) CMD (/usr/bin/mdcheck --continue --duration "06:00")
>      2020-11-27T02:00:01+01:00 done root: mdcheck continue checking /dev/md0 from 10623180920
>      2020-11-27T02:00:01.382994+01:00 done kernel: [378596.606381] md: data-check of RAID array md0
>      2020-11-27T02:00:01+01:00 done root: mdcheck continue checking /dev/md1 from 11582849320
>      2020-11-27T02:00:01.437999+01:00 done kernel: [378596.661559] md: data-check of RAID array md1
>      2020-11-27T06:00:01.842003+01:00 done kernel: [392996.625147] md: md0: data-check interrupted.
>      2020-11-27T06:00:02+01:00 done root: pause checking /dev/md0 at 13351127680
>      2020-11-27T06:00:02.338989+01:00 done kernel: [392997.122520] md: md1: data-check interrupted.
>      [ nothing related following.... ]
> 
> After that, we see md1_raid6 in a busy loop:
> 
>      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>      2376 root     20   0       0      0      0 R 100.0  0.0   1387:38 md1_raid6

It seems md_reap_sync_thread() was trying to stop md1_raid6 while md1_raid6
kept being triggered again and again.
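
For context, here is a simplified sketch of the two functions involved in
that interaction, paraphrased from drivers/md/md.c (not the verbatim 5.10
source, so details may differ slightly):

#include <linux/kthread.h>
#include <linux/wait.h>
#include "md.h"	/* struct md_thread, THREAD_WAKEUP */

/*
 * md_thread() is the common loop behind both the personality thread
 * (md1_raid6, where run == raid5d) and the sync thread (run == md_do_sync).
 */
static int md_thread(void *arg)
{
	struct md_thread *thread = arg;

	while (!kthread_should_stop()) {
		/* Sleep until woken explicitly or until the timeout expires. */
		wait_event_interruptible_timeout(thread->wqueue,
			test_bit(THREAD_WAKEUP, &thread->flags) ||
			kthread_should_stop(),
			thread->timeout);

		clear_bit(THREAD_WAKEUP, &thread->flags);
		if (!kthread_should_stop())
			thread->run(thread);
	}
	return 0;
}

/* Called from md_reap_sync_thread(), as in the stack trace above. */
void md_unregister_thread(struct md_thread **threadp)
{
	struct md_thread *thread = *threadp;

	if (!thread)
		return;
	*threadp = NULL;
	/*
	 * kthread_stop() waits for md_thread() to return.  If the thread
	 * being stopped cannot make progress, the task that wrote "idle"
	 * to sync_action stays blocked right here, which matches the
	 * /proc/<pid>/stack output above.
	 */
	kthread_stop(thread->tsk);
	kfree(thread);
}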

> 
> Also, all processes doing I/O to the md device block.
> 
> This is /proc/mdstat:
> 
>      Personalities : [linear] [raid0] [raid1] [raid6] [raid5] [raid4] [multipath]
>      md1 : active raid6 sdk[0] sdj[15] sdi[14] sdh[13] sdg[12] sdf[11] sde[10] sdd[9] sdc[8] sdr[7] sdq[6] sdp[5] sdo[4] sdn[3] sdm[2] sdl[1]
>            109394518016 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
>            [==================>..]  check = 94.0% (7350290348/7813894144) finish=57189.3min speed=135K/sec
>            bitmap: 0/59 pages [0KB], 65536KB chunk
>      md0 : active raid6 sds[0] sdah[15] sdag[16] sdaf[13] sdae[12] sdad[11] sdac[10] sdab[9] sdaa[8] sdz[7] sdy[6] sdx[17] sdw[4] sdv[3] sdu[2] sdt[1]
>            109394518016 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
>            bitmap: 0/59 pages [0KB], 65536KB chunk
> 

So the MD_RECOVERY_CHECK flag should be set. I'm not sure whether the simple
change below helps, but you may give it a try.

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 98bac4f..e2697d0 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9300,7 +9300,8 @@ void md_check_recovery(struct mddev *mddev)
                         md_update_sb(mddev, 0);

                 if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
-                   !test_bit(MD_RECOVERY_DONE, &mddev->recovery)) {
+                   (!test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
+                    test_bit(MD_RECOVERY_CHECK, &mddev->recovery))) {
                         /* resync/recovery still happening */
                         clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
                         goto unlock;
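
For readers without the tree at hand, the surrounding context of that hunk
looks roughly like the following (paraphrased from memory, not verbatim 5.10
code). As far as this hunk goes, the effect of the change is that a check
with MD_RECOVERY_CHECK still set no longer falls through to the
md_reap_sync_thread() call just below:

	/* Paraphrased context from md_check_recovery(), change applied. */
	if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
	    (!test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
	     test_bit(MD_RECOVERY_CHECK, &mddev->recovery))) {
		/* resync/recovery (or a check) still happening */
		clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
		goto unlock;
	}
	if (mddev->sync_thread) {
		/*
		 * Without the change, a check that has already set
		 * MD_RECOVERY_DONE would also be reaped from here.
		 */
		md_reap_sync_thread(mddev);
		goto unlock;
	}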

Thanks,
Guoqing
