From: Yu Kuai <yukuai1@huaweicloud.com>
To: Guoqing Jiang <guoqing.jiang@linux.dev>,
	Yu Kuai <yukuai1@huaweicloud.com>,
	logang@deltatee.com, pmenzel@molgen.mpg.de, agk@redhat.com,
	snitzer@kernel.org, song@kernel.org
Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
	yi.zhang@huawei.com, yangerkun@huawei.com,
	Marc Smith <msmith626@gmail.com>,
	"yukuai (C)" <yukuai3@huawei.com>
Subject: Re: [PATCH -next 1/6] Revert "md: unlock mddev before reap sync_thread in action_store"
Date: Thu, 23 Mar 2023 14:32:50 +0800
Message-ID: <3aa073e9-5145-aae2-2201-5ba48c09c693@huaweicloud.com>
In-Reply-To: <3fc2a539-e4cc-e057-6cf0-da7b3953be6e@linux.dev>

Hi,

On 2023/03/23 11:50, Guoqing Jiang wrote:

> Combined your debug patch with above steps. Seems you are
> 
> 1. add delay to action_store, so it can't get lock in time.
> 2. echo "want_replacement" triggers md_check_recovery which can grab lock
>      to start sync thread.
> 3. action_store finally hold lock to clear RECOVERY_RUNNING in reap sync
>      thread.
> 4. Then the new added BUG_ON is invoked since RECOVERY_RUNNING is cleared
>      in step 3.

Yes, this is exactly what I did.
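To make the interleaving above concrete, here is a minimal single-threaded C sketch under a simplified model. The SIM_* flags, struct sim_mddev and the step functions are hypothetical stand-ins, not the real md structures; the model only assumes what is stated in this thread: md_check_recovery() can take the lock first, clear MD_RECOVERY_INTR and start a new sync thread, and action_store() later clears RUNNING while reaping.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for MD_RECOVERY_RUNNING / MD_RECOVERY_INTR. */
#define SIM_RECOVERY_RUNNING (1u << 0)
#define SIM_RECOVERY_INTR    (1u << 1)

/* Simplified model of the state the race touches (not struct mddev). */
struct sim_mddev {
	unsigned int recovery;   /* models mddev->recovery */
	bool new_thread_alive;   /* the sync thread started in step 2 */
};

/* Step 1: action_store() interrupts the old sync thread, then is delayed
 * before it can take the lock again to reap it. */
static void step1_action_store_interrupt(struct sim_mddev *m)
{
	m->recovery |= SIM_RECOVERY_INTR;
}

/* Step 2: md_check_recovery() grabs the lock first, clears INTR and
 * starts a new sync thread, so recovery is marked as running. */
static void step2_check_recovery_restart(struct sim_mddev *m)
{
	m->recovery &= ~SIM_RECOVERY_INTR;
	m->recovery |= SIM_RECOVERY_RUNNING;
	m->new_thread_alive = true;
}

/* Step 3: action_store() finally gets the lock and, while reaping the
 * thread it interrupted, clears RUNNING although a new thread is alive. */
static void step3_action_store_reap(struct sim_mddev *m)
{
	m->recovery &= ~SIM_RECOVERY_RUNNING;
}

/* Step 4: what the new sync thread observes; true means the invariant
 * "a live sync thread implies RUNNING is set" is broken. */
static bool step4_invariant_broken(const struct sim_mddev *m)
{
	return m->new_thread_alive && !(m->recovery & SIM_RECOVERY_RUNNING);
}

/* Replay the four steps in the order described above. */
static bool replay_race(void)
{
	struct sim_mddev m = { .recovery = SIM_RECOVERY_RUNNING };

	step1_action_store_interrupt(&m);
	step2_check_recovery_restart(&m);
	step3_action_store_reap(&m);
	return step4_invariant_broken(&m);
}
```

In this model, skipping step 3 (not clearing RUNNING after the lock is retaken) keeps the invariant intact, which matches the argument that md_do_sync should never see MD_RECOVERY_RUNNING cleared.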

> sync_thread can be interrupted once MD_RECOVERY_INTR is set which means
> the RUNNING can be cleared, so I am not sure the added BUG_ON is
> reasonable. And change BUG_ON

I think BUG_ON() is reasonable because only md_reap_sync_thread can
clear it. md_do_sync will exit quickly if MD_RECOVERY_INTR is set, but
md_do_sync should never see MD_RECOVERY_RUNNING cleared; otherwise
there is no guarantee that only one sync_thread is in progress.

> like this makes more sense to me.
> 
> +	BUG_ON(!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
> +	       !test_bit(MD_RECOVERY_INTR, &mddev->recovery));

I think this can be reproduced likewise: md_check_recovery clears
MD_RECOVERY_INTR, and the new sync_thread triggered by echo
"want_replacement" won't set this bit.
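A hedged sketch comparing the two checks on plain flag values. The SIM_* names and helper functions are hypothetical; original_check_fires() models a check that fires whenever RUNNING is clear, as described above (the exact debug patch is not shown here), and relaxed_check_fires() mirrors the boolean structure of the BUG_ON() proposed in the quoted reply.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for MD_RECOVERY_RUNNING / MD_RECOVERY_INTR. */
#define SIM_RECOVERY_RUNNING (1u << 0)
#define SIM_RECOVERY_INTR    (1u << 1)

/* Fires whenever RUNNING is clear, like the check added for debugging. */
static bool original_check_fires(unsigned int recovery)
{
	return !(recovery & SIM_RECOVERY_RUNNING);
}

/* Mirrors the proposal: fire only if RUNNING is clear AND the thread
 * was not interrupted (INTR clear). */
static bool relaxed_check_fires(unsigned int recovery)
{
	return !(recovery & SIM_RECOVERY_RUNNING) &&
	       !(recovery & SIM_RECOVERY_INTR);
}
```

After the race, the new sync thread sees both RUNNING and INTR clear (md_check_recovery cleared INTR and the new thread never sets it), so the relaxed condition is still true. It only tolerates the case where INTR is still set.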

> 
> I think there might be racy window like you described but it should be
> really small, I prefer to just add a few lines like this instead of
> revert and introduce new lock to resolve the same issue (if it is).

The new lock added in this patchset only tries to synchronize 'idle'
and 'frozen' in action_store() (patch 3); I can drop it if you think it
is not necessary.
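The idea behind patch 3, serializing 'idle' and 'frozen' requests with an extra mutex, can be illustrated in user space with POSIX threads. This is only a sketch under stated assumptions: sync_request_mutex, handle_sync_request() and the in_progress counters are hypothetical names, not the code in the patch. Each request holds the extra mutex across its whole interrupt/wait/reap sequence, so two concurrent writers cannot interleave.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical lock that serializes 'idle'/'frozen' requests end to end. */
static pthread_mutex_t sync_request_mutex = PTHREAD_MUTEX_INITIALIZER;

static atomic_int in_progress;      /* requests currently inside the section */
static atomic_int max_in_progress;  /* highest overlap ever observed */

/* One 'idle' or 'frozen' request; the real work (interrupt the sync
 * thread, wait for it, reap it) is elided, only overlap is recorded. */
static void handle_sync_request(void)
{
	pthread_mutex_lock(&sync_request_mutex);

	int now = atomic_fetch_add(&in_progress, 1) + 1;
	int seen = atomic_load(&max_in_progress);
	while (now > seen &&
	       !atomic_compare_exchange_weak(&max_in_progress, &seen, now))
		;  /* on failure 'seen' is refreshed; retry */

	/* ... stop the sync thread and wait for it here ... */

	atomic_fetch_sub(&in_progress, 1);
	pthread_mutex_unlock(&sync_request_mutex);
}

static void *writer(void *arg)
{
	int iterations = *(const int *)arg;

	for (int i = 0; i < iterations; i++)
		handle_sync_request();
	return NULL;
}

/* Run two concurrent writers and return the maximum overlap seen. */
static int run_two_writers(int iterations)
{
	pthread_t a, b;

	pthread_create(&a, NULL, writer, &iterations);
	pthread_create(&b, NULL, writer, &iterations);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return atomic_load(&max_in_progress);
}
```

With the mutex held across the whole request, the observed overlap stays at 1.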

The main change is patch 4, and it does not add many new lines. I
really don't like to add new flags unless we have to; the current code
is already hard to understand...

By the way, I'm concerned that dropping the mutex to unregister
sync_thread might not be safe, since the mutex protects lots of stuff,
and there might be other implicit dependencies.

> 
> TBH, I am reluctant to see the changes in the series, it can only be
> considered acceptable with conditions:
>
> 1. the previous raid456 bug can be fixed in this way too, hopefully Marc
>      or others can verify it.
> 2. pass all the tests in mdadm

I have already tested this patchset with mdadm. If there is a
reproducer for the raid456 bug, I can try to verify it myself.

Thanks,
Kuai
> 
> Thanks,
> Guoqing
> .
> 


Thread overview: 16+ messages
2023-03-22  6:41 [PATCH -next 0/6] md: fix that MD_RECOVERY_RUNNING can be cleared while sync_thread is still running Yu Kuai
2023-03-22  6:41 ` [PATCH -next 1/6] Revert "md: unlock mddev before reap sync_thread in action_store" Yu Kuai
2023-03-22  7:19   ` Guoqing Jiang
2023-03-22  9:00     ` Yu Kuai
2023-03-22 14:32       ` Guoqing Jiang
2023-03-23  1:36         ` Yu Kuai
2023-03-23  3:50           ` Guoqing Jiang
2023-03-23  6:32             ` Yu Kuai [this message]
2023-03-28 23:58               ` Song Liu
2023-04-06  8:53                 ` Yu Kuai
2023-05-05  9:05                   ` Yu Kuai
2023-03-22  6:41 ` [PATCH -next 2/6] md: refactor action_store() for 'idle' and 'frozen' Yu Kuai
2023-03-22  6:41 ` [PATCH -next 3/6] md: add a mutex to synchronize idle and frozen in action_store() Yu Kuai
2023-03-22  6:41 ` [PATCH -next 4/6] md: refactor idle/frozen_sync_thread() Yu Kuai
2023-03-22  6:41 ` [PATCH -next 5/6] md: wake up 'resync_wait' at last in md_reap_sync_thread() Yu Kuai
2023-03-22  6:41 ` [PATCH -next 6/6] md: enhance checking in md_check_recovery() Yu Kuai
