From: NeilBrown <neilb@suse.com>
To: Xiao Ni <xni@redhat.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
Date: Wed, 11 Oct 2017 08:20:56 +1100
Message-ID: <87vajmwvgn.fsf@notabene.neil.brown.name>
In-Reply-To: <ebf97c38-c8e0-aa87-be84-efc8d56802f0@redhat.com>

On Tue, Oct 10 2017, Xiao Ni wrote:

> On 10/09/2017 01:52 PM, NeilBrown wrote:
>> On Mon, Oct 09 2017, Xiao Ni wrote:
>>
>>> On 10/09/2017 12:57 PM, NeilBrown wrote:
>>>> It would if you had applied
>>>>      [PATCH 3/4] md: use mddev_suspend/resume instead of ->quiesce()
>>>>
>>>> Did you apply all 4 patches?
>>> Sorry, it's my mistake. I loaded the wrong module with insmod. I'll apply
>>> the four patches and test again.
>>>> Thanks.  It looks like suspend_lo_store() is calling raid5_quiesce() directly
>>>> as you say - so a patch is missing.
>>> Yes, thanks for pointing this out.
>
> Hi Neil
>
> I applied the four patches plus one more patch, "md: fix deadlock error in
> recent patch."
> There is a new hang. This time it is stuck in suspend_hi_store. I have
> attached the call trace.
>
> I added some printk to print some information.
>
> [12695.993329] mddev suspend : 1
> [12695.996270] mddev ro : 0
> [12695.998790] mddev insync : 0
> [12696.001641] mddev active io: 1

You didn't tell me where (in the code) you printed this information.
That makes it hard to interpret.

If mddev->active_io is 1, then some thread must be in this range
of code:

	atomic_inc(&mddev->active_io);
	rcu_read_unlock();

	if (!mddev->pers->make_request(mddev, bio)) {
		atomic_dec(&mddev->active_io);
		wake_up(&mddev->sb_wait);
		goto check_suspended;
	}

	if (atomic_dec_and_test(&mddev->active_io) && mddev->suspended)
		wake_up(&mddev->sb_wait);

If that thread is blocked (which appears to be the case) it must be in
->make_request() because nothing else there blocks.
None of the threads you showed are in that code.
But you didn't report all the threads - only those which had printed
warnings.

  echo t > /proc/sysrq-trigger

will produce the stack traces of *all* threads.  That would be more
useful.

>
> Can it be:
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index b6b7a28..55e9280 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -7777,7 +7777,7 @@ void md_check_recovery(struct mddev *mddev)
>          if (mddev->ro && !test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
>                  return;
>          if ( ! (
> -               (mddev->flags & ~ (1<<MD_CHANGE_PENDING)) ||
> +               (mddev->flags & (mddev->external == 1 &&  ~ (1<<MD_CHANGE_PENDING))) ||

Please read that code again and see how it doesn't make any sense at
all.

Thanks,
NeilBrown

