From: Yu Kuai <yukuai1@huaweicloud.com>
To: Guoqing Jiang <guoqing.jiang@linux.dev>,
Yu Kuai <yukuai1@huaweicloud.com>,
logang@deltatee.com, pmenzel@molgen.mpg.de, agk@redhat.com,
snitzer@kernel.org, song@kernel.org
Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
yi.zhang@huawei.com, yangerkun@huawei.com,
Marc Smith <msmith626@gmail.com>,
"yukuai (C)" <yukuai3@huawei.com>
Subject: Re: [PATCH -next 1/6] Revert "md: unlock mddev before reap sync_thread in action_store"
Date: Thu, 23 Mar 2023 09:36:27 +0800 [thread overview]
Message-ID: <31e7f59e-579a-7812-632d-059ed0a6d441@huaweicloud.com> (raw)
In-Reply-To: <b91ae03a-14d5-11eb-8ec7-3ed91ff2c59e@linux.dev>
Hi,
On 2023/03/22 22:32, Guoqing Jiang wrote:
>>> Could you explain how the same work can be re-queued? Isn't the
>>> PENDING_BIT already set in t3? I believe queue_work() shouldn't do
>>> that per the comment, but I am not an expert ...
>>
>> This is not related to the workqueue; it is just because raid10
>> reinitializes a work that is already queued,
>
> I am trying to understand the possibility.
>
>> like I described later in t3:
>>
>> t2:
>> md_check_recovery:
>> INIT_WORK -> clear pending
>> queue_work -> set pending
>> list_add_tail
>> ...
>>
>> t3: -> work is still pending
>> md_check_recovery:
>> INIT_WORK -> clear pending
>> queue_work -> set pending
>> list_add_tail -> list is corrupted
>
> First, t2 and t3 can't run in parallel since reconfig_mutex must be
> held. And if a sync_thread exists, the second process would unregister
> and reap it, which means the second process would call INIT_WORK and
> queue_work again.
>
> Maybe your description is valid; I would prefer to call work_pending()
> and flush_workqueue() instead of INIT_WORK and queue_work.
That is not enough. It is true this would avoid the list corruption, but
the worker function md_start_sync() just registers a sync_thread, and
md_do_sync() can still be in progress; hence this can't prevent a new
sync_thread from starting while the old one is not done, and other
problems like deadlock can still be triggered.
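To make the t2/t3 race above concrete, here is a minimal sketch
following the md_check_recovery() pattern (simplified, not the literal
kernel source; the comments describe the workqueue internals as I
understand them):

/* t2: md_check_recovery() */
INIT_WORK(&mddev->del_work, md_start_sync); /* resets work->data, which
                                             * clears PENDING_BIT, and
                                             * re-inits work->entry */
queue_work(md_misc_wq, &mddev->del_work);   /* PENDING was clear, so the
                                             * work is list_add_tail()'ed
                                             * onto a pool list */

/* the work has not run yet, so work->entry is still on the pool list */

/* t3: md_check_recovery() runs again */
INIT_WORK(&mddev->del_work, md_start_sync); /* clears PENDING again and
                                             * wipes work->entry while it
                                             * is still linked in */
queue_work(md_misc_wq, &mddev->del_work);   /* test_and_set_bit(PENDING)
                                             * succeeds, so list_add_tail()
                                             * runs on an already-queued
                                             * entry -> list corruption */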
>> Of course, our 5.10 kernel and mainline are the same in this respect,
>>
>> there are some tests:
>>
>> First, the deadlock can be reproduced reliably; the test script is
>> simple:
>>
>> mdadm -Cv /dev/md0 -n 4 -l10 /dev/sd[abcd]
>
> So this is raid10 while the previous problem appeared in raid456; I am
> not sure it is the same issue, but let's see.
Ok, I'm not quite familiar with raid456 yet; however, the problem is
still related to the fact that action_store() holds the mutex to
unregister the sync_thread, right?
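For reference, one flavor of that deadlock as I understand the history
(a simplified sketch, not the literal code paths):

/* action_store("idle"), with reconfig_mutex held */
md_unregister_thread(&mddev->sync_thread); /* kthread_stop(): blocks
                                            * until md_do_sync() exits */

/* meanwhile, in md_do_sync() */
set_mask_bits(&mddev->sb_flags, 0, BIT(MD_SB_CHANGE_PENDING));
wait_event(mddev->sb_wait,
           !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags));
           /* MD_SB_CHANGE_PENDING is only cleared after md_update_sb(),
            * which runs under reconfig_mutex -> neither side can make
            * progress */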
>> Then, the problem that MD_RECOVERY_RUNNING can be cleared can't be
>> reproduced reliably; usually it takes 2+ days to trigger, and the
>> symptoms are different each time. I hacked the kernel and added some
>> BUG_ON()s to test MD_RECOVERY_RUNNING in the attached patch; the
>> following test can trigger the BUG_ON:
>
> Also, your debug patch obviously adds large delays which make the
> calltrace happen; I doubt a user can hit it in real life. Anyway, I
> will try the below test from my side.
>
>> mdadm -Cv /dev/md0 -e1.0 -n 4 -l 10 /dev/sd{a..d} --run
>> sleep 5
>> echo 1 > /sys/module/md_mod/parameters/set_delay
>> echo idle > /sys/block/md0/md/sync_action &
>> sleep 5
>> echo "want_replacement" > /sys/block/md0/md/dev-sdd/state
>>
>> test result:
>>
>> [ 228.390237] md_check_recovery: running is set
>> [ 228.391376] md_check_recovery: queue new sync thread
>> [ 233.671041] action_store unregister success! delay 10s
>> [ 233.689276] md_check_recovery: running is set
>> [ 238.722448] md_check_recovery: running is set
>> [ 238.723328] md_check_recovery: queue new sync thread
>> [ 238.724851] md_do_sync: before new wor, sleep 10s
>> [ 239.725818] md_do_sync: delay done
>> [ 243.674828] action_store delay done
>> [ 243.700102] md_reap_sync_thread: running is cleared!
>> [ 243.748703] ------------[ cut here ]------------
>> [ 243.749656] kernel BUG at drivers/md/md.c:9084!
>
> After your debug patch is applied, does L9084 point to below?
>
> 9084 mddev->curr_resync = MaxSector;
In my environment, it's a BUG_ON() that I added in md_do_sync():

9080  skip:
9081          /* set CHANGE_PENDING here since maybe another update is needed,
9082           * so other nodes are informed. It should be harmless for normal
9083           * raid */
9084          BUG_ON(!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
9085          set_mask_bits(&mddev->sb_flags, 0,
9086                        BIT(MD_SB_CHANGE_PENDING) | BIT(MD_SB_CHANGE_DEVS));
>
> I don't understand how it triggers the below calltrace, and it has
> nothing to do with the list corruption, right?
Yes, this is just an early BUG_ON() to detect whether MD_RECOVERY_RUNNING
is cleared while the sync_thread is still in progress.
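For what it's worth, my reading of the log above as a timeline (the
delays are injected by the debug patch via set_delay):

t1: echo idle (action_store)       t2: md_check_recovery / md_do_sync
------------------------------     ----------------------------------
unregister old sync_thread
(debug delay before reaping)
                                   want_replacement triggers recovery,
                                   a new sync_thread is queued and
                                   md_do_sync() starts running
md_reap_sync_thread() clears
MD_RECOVERY_RUNNING
                                   BUG_ON(!test_bit(MD_RECOVERY_RUNNING,
                                          &mddev->recovery)) fires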
Thanks,
Kuai