From: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
To: Donald Buczek <buczek@molgen.mpg.de>, Song Liu <song@kernel.org>,
	linux-raid@vger.kernel.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	it+raid@molgen.mpg.de
Subject: Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
Date: Thu, 3 Dec 2020 02:55:15 +0100
Message-ID: <b289ae15-ff82-b36e-4be4-a1c8bbdbacd7@cloud.ionos.com>
In-Reply-To: <7c5438c7-2324-cc50-db4d-512587cb0ec9@molgen.mpg.de>

Hi Donald,

On 12/2/20 18:28, Donald Buczek wrote:
> Dear Guoqing,
> 
> unfortunately the patch didn't fix the problem (unless I messed it up 
> with my logging). This is what I used:
> 
>      --- a/drivers/md/md.c
>      +++ b/drivers/md/md.c
>      @@ -9305,6 +9305,14 @@ void md_check_recovery(struct mddev *mddev)
>                              clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>                              goto unlock;
>                      }

I think you can add the MD_RECOVERY_CHECK test to the existing condition 
above instead of adding a new block.
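
As a rough illustration of that suggestion, here is a minimal userspace 
sketch of the combined condition, taken from the test in the quoted patch. 
The bit numbers and every sketch_* name are hypothetical and exist only 
for this example; the real flags are the MD_RECOVERY_* bits in 
mddev->recovery.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit positions, for illustration only (not the kernel's values). */
enum {
	SKETCH_RECOVERY_RUNNING = 0,
	SKETCH_RECOVERY_DONE    = 1,
	SKETCH_RECOVERY_CHECK   = 2,
};

/* Userspace stand-in for the kernel's test_bit(). */
bool sketch_test_bit(int nr, unsigned long flags)
{
	return (flags >> nr) & 1UL;
}

/*
 * Single merged condition: the sync thread is left alone when it is
 * RUNNING and either not yet DONE or a CHECK is in progress.
 */
bool sketch_sync_still_running(unsigned long recovery)
{
	return sketch_test_bit(SKETCH_RECOVERY_RUNNING, recovery) &&
	       (!sketch_test_bit(SKETCH_RECOVERY_DONE, recovery) ||
		sketch_test_bit(SKETCH_RECOVERY_CHECK, recovery));
}

/* Small usage example covering the interesting flag combinations. */
void sketch_self_check(void)
{
	unsigned long running = 1UL << SKETCH_RECOVERY_RUNNING;
	unsigned long done    = 1UL << SKETCH_RECOVERY_DONE;
	unsigned long check   = 1UL << SKETCH_RECOVERY_CHECK;

	assert(sketch_sync_still_running(running));
	assert(!sketch_sync_still_running(running | done));
	assert(sketch_sync_still_running(running | done | check));
	assert(!sketch_sync_still_running(check));
}
```

With one merged test there is only a single early "goto unlock" path to 
reason about, rather than two nearly identical blocks.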

>      +               if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
>      +                   (!test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
>      +                    test_bit(MD_RECOVERY_CHECK, &mddev->recovery))) {
>      +                       /* resync/recovery still happening */
>      +                       pr_info("md: XXX BUGFIX applied\n");
>      +                       clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>      +                       goto unlock;
>      +               }
>                      if (mddev->sync_thread) {
>                              md_reap_sync_thread(mddev);
>                              goto unlock;


> 
> With pausing and continuing the check four times an hour, I could 
> trigger the problem after about 48 hours. This time, the other device 
> (md0) has locked up on
> `echo idle > /sys/devices/virtual/block/md0/md/sync_action`, while the
> check of md1 is still ongoing:

Without the patch, md0 was fine while md1 locked up. So the patch 
swaps the state of the two arrays, which is a little weird ...

What is the stack of the process? I guess it is the same as the stack of 
23333 in your previous mail, but I'd like to confirm.

> 
>      Personalities : [linear] [raid0] [raid1] [raid6] [raid5] [raid4] [multipath]
>      md1 : active raid6 sdk[0] sdj[15] sdi[14] sdh[13] sdg[12] sdf[11] sde[10] sdd[9] sdc[8] sdr[7] sdq[6] sdp[5] sdo[4] sdn[3] sdm[2] sdl[1]
>            109394518016 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
>            [=>...................]  check =  8.5% (666852112/7813894144) finish=1271.2min speed=93701K/sec
>            bitmap: 0/59 pages [0KB], 65536KB chunk
>      md0 : active raid6 sds[0] sdah[15] sdag[16] sdaf[13] sdae[12] sdad[11] sdac[10] sdab[9] sdaa[8] sdz[7] sdy[6] sdx[17] sdw[4] sdv[3] sdu[2] sdt[1]
>            109394518016 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
>            [>....................]  check =  0.2% (19510348/7813894144) finish=253779.6min speed=511K/sec
>            bitmap: 0/59 pages [0KB], 65536KB chunk
> 
> after 1 minute:
> 
>      Personalities : [linear] [raid0] [raid1] [raid6] [raid5] [raid4] [multipath]
>      md1 : active raid6 sdk[0] sdj[15] sdi[14] sdh[13] sdg[12] sdf[11] sde[10] sdd[9] sdc[8] sdr[7] sdq[6] sdp[5] sdo[4] sdn[3] sdm[2] sdl[1]
>            109394518016 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
>            [=>...................]  check =  8.6% (674914560/7813894144) finish=941.1min speed=126418K/sec
>            bitmap: 0/59 pages [0KB], 65536KB chunk
>      md0 : active raid6 sds[0] sdah[15] sdag[16] sdaf[13] sdae[12] sdad[11] sdac[10] sdab[9] sdaa[8] sdz[7] sdy[6] sdx[17] sdw[4] sdv[3] sdu[2] sdt[1]
>            109394518016 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
>            [>....................]  check =  0.2% (19510348/7813894144) finish=256805.0min speed=505K/sec
>            bitmap: 0/59 pages [0KB], 65536KB chunk
> 
> A data point, I didn't mention in my previous mail, is that the 
> mdX_resync thread is not gone when the problem occurs:
> 
>      buczek@done:/scratch/local/linux (v5.10-rc6-mpi)$ ps -Af|fgrep [md
>      root       134     2  0 Nov30 ?        00:00:00 [md]
>      root      1321     2 27 Nov30 ?        12:57:14 [md0_raid6]
>      root      1454     2 26 Nov30 ?        12:37:23 [md1_raid6]
>      root      5845     2  0 16:20 ?        00:00:30 [md0_resync]
>      root      5855     2 13 16:20 ?        00:14:11 [md1_resync]
>      buczek    9880  9072  0 18:05 pts/0    00:00:00 grep -F [md
>      buczek@done:/scratch/local/linux (v5.10-rc6-mpi)$ sudo cat /proc/5845/stack
>      [<0>] md_bitmap_cond_end_sync+0x12d/0x170
>      [<0>] raid5_sync_request+0x24b/0x390
>      [<0>] md_do_sync+0xb41/0x1030
>      [<0>] md_thread+0x122/0x160
>      [<0>] kthread+0x118/0x130
>      [<0>] ret_from_fork+0x1f/0x30
> 
> I guess, md_bitmap_cond_end_sync+0x12d is the 
> `wait_event(bitmap->mddev->recovery_wait, atomic_read(&bitmap->mddev->recovery_active) == 0);`
> in md-bitmap.c.
> 

Could be. If so, I think md_done_sync was not triggered via the path 
md0_raid6 -> ... -> handle_stripe.
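
To make the reasoning concrete, here is a minimal userspace model of that 
wait condition. It is only a sketch under stated assumptions: the 
sketch_* names are hypothetical, the counter stands in for 
mddev->recovery_active, and the wake-up of recovery_wait is reduced to a 
comment. The point is that the waiter can only proceed once every issued 
sync request has been completed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the single mddev field this sketch models. */
struct sketch_mddev {
	atomic_int recovery_active; /* sync sectors issued but not yet completed */
};

/* Models the resync thread accounting for sectors it has handed out. */
void sketch_issue_sync(struct sketch_mddev *m, int sectors)
{
	atomic_fetch_add(&m->recovery_active, sectors);
}

/*
 * Models the completion side (the role md_done_sync plays): the count
 * drops, and in the kernel the waiters on recovery_wait would be woken.
 */
void sketch_done_sync(struct sketch_mddev *m, int sectors)
{
	atomic_fetch_sub(&m->recovery_active, sectors);
}

/* The wait_event() condition: proceed only when nothing is in flight. */
bool sketch_waiter_may_proceed(struct sketch_mddev *m)
{
	return atomic_load(&m->recovery_active) == 0;
}

/* Usage example: a missing completion leaves the waiter blocked. */
void sketch_self_check(void)
{
	struct sketch_mddev m;

	atomic_init(&m.recovery_active, 0);
	sketch_issue_sync(&m, 8);
	sketch_issue_sync(&m, 8);
	sketch_done_sync(&m, 8);               /* only one completion arrives */
	assert(!sketch_waiter_may_proceed(&m)); /* waiter would stay asleep */
	sketch_done_sync(&m, 8);
	assert(sketch_waiter_may_proceed(&m));
}
```

If a completion is never delivered, the count never reaches zero, which 
would match a resync thread sitting in md_bitmap_cond_end_sync.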

I'd suggest comparing the stacks of md0 and md1 to find the 
difference.

Thanks,
Guoqing
