From: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> To: song@kernel.org Cc: agk@redhat.com, snitzer@redhat.com, linux-raid@vger.kernel.org, dm-devel@redhat.com, Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Subject: [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held Date: Sat, 13 Feb 2021 01:49:59 +0100 [thread overview] Message-ID: <1613177399-22024-1-git-send-email-guoqing.jiang@cloud.ionos.com> (raw) Unregister sync_thread doesn't need to hold reconfig_mutex since it doesn't reconfigure array. And it could cause deadlock problem for raid5 as follows: 1. process A tried to reap sync thread with reconfig_mutex held after echo idle to sync_action. 2. raid5 sync thread was blocked if there were too many active stripes. 3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer) which causes the number of active stripes can't be decreased. 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able to hold reconfig_mutex. More details in the link: https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t And add one parameter to md_reap_sync_thread since it could be called by dm-raid which doesn't hold reconfig_mutex. Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> --- V2: 1. add one parameter to md_reap_sync_thread per Jack's suggestion. drivers/md/dm-raid.c | 2 +- drivers/md/md.c | 14 +++++++++----- drivers/md/md.h | 2 +- 3 files changed, 11 insertions(+), 7 deletions(-) diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c index cab12b2..0c4cbba 100644 --- a/drivers/md/dm-raid.c +++ b/drivers/md/dm-raid.c @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv, if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) { if (mddev->sync_thread) { set_bit(MD_RECOVERY_INTR, &mddev->recovery); - md_reap_sync_thread(mddev); + md_reap_sync_thread(mddev, false); } } else if (decipher_sync_action(mddev, mddev->recovery) != st_idle) return -EBUSY; diff --git a/drivers/md/md.c b/drivers/md/md.c index ca40942..0c12b7f 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -4857,7 +4857,7 @@ action_store(struct mddev *mddev, const char *page, size_t len) flush_workqueue(md_misc_wq); if (mddev->sync_thread) { set_bit(MD_RECOVERY_INTR, &mddev->recovery); - md_reap_sync_thread(mddev); + md_reap_sync_thread(mddev, true); } mddev_unlock(mddev); } @@ -6234,7 +6234,7 @@ static void __md_stop_writes(struct mddev *mddev) flush_workqueue(md_misc_wq); if (mddev->sync_thread) { set_bit(MD_RECOVERY_INTR, &mddev->recovery); - md_reap_sync_thread(mddev); + md_reap_sync_thread(mddev, true); } del_timer_sync(&mddev->safemode_timer); @@ -9256,7 +9256,7 @@ void md_check_recovery(struct mddev *mddev) * ->spare_active and clear saved_raid_disk */ set_bit(MD_RECOVERY_INTR, &mddev->recovery); - md_reap_sync_thread(mddev); + md_reap_sync_thread(mddev, true); clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery); clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery); clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); @@ -9291,7 +9291,7 @@ void md_check_recovery(struct mddev *mddev) goto unlock; } if (mddev->sync_thread) { - md_reap_sync_thread(mddev); + md_reap_sync_thread(mddev, true); goto unlock; } /* Set RUNNING before clearing NEEDED to avoid @@ -9364,14 +9364,18 @@ void md_check_recovery(struct mddev *mddev) } EXPORT_SYMBOL(md_check_recovery); -void md_reap_sync_thread(struct mddev *mddev) +void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held) { struct md_rdev *rdev; sector_t old_dev_sectors = mddev->dev_sectors; bool is_reshaped = false; /* resync has finished, collect result */ + if (reconfig_mutex_held) + mddev_unlock(mddev); md_unregister_thread(&mddev->sync_thread); + if (reconfig_mutex_held) + mddev_lock_nointr(mddev); if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) && !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) && mddev->degraded != mddev->raid_disks) { diff --git a/drivers/md/md.h b/drivers/md/md.h index 34070ab..7ae3d98 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -705,7 +705,7 @@ extern struct md_thread *md_register_thread( extern void md_unregister_thread(struct md_thread **threadp); extern void md_wakeup_thread(struct md_thread *thread); extern void md_check_recovery(struct mddev *mddev); -extern void md_reap_sync_thread(struct mddev *mddev); +extern void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held); extern int mddev_init_writes_pending(struct mddev *mddev); extern bool md_write_start(struct mddev *mddev, struct bio *bi); extern void md_write_inc(struct mddev *mddev, struct bio *bi); -- 2.7.4
WARNING: multiple messages have this Message-ID (diff)
From: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> To: song@kernel.org Cc: linux-raid@vger.kernel.org, Guoqing Jiang <guoqing.jiang@cloud.ionos.com>, dm-devel@redhat.com, snitzer@redhat.com, agk@redhat.com Subject: [dm-devel] [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held Date: Sat, 13 Feb 2021 01:49:59 +0100 [thread overview] Message-ID: <1613177399-22024-1-git-send-email-guoqing.jiang@cloud.ionos.com> (raw) Unregister sync_thread doesn't need to hold reconfig_mutex since it doesn't reconfigure array. And it could cause deadlock problem for raid5 as follows: 1. process A tried to reap sync thread with reconfig_mutex held after echo idle to sync_action. 2. raid5 sync thread was blocked if there were too many active stripes. 3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer) which causes the number of active stripes can't be decreased. 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able to hold reconfig_mutex. More details in the link: https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t And add one parameter to md_reap_sync_thread since it could be called by dm-raid which doesn't hold reconfig_mutex. Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> --- V2: 1. add one parameter to md_reap_sync_thread per Jack's suggestion. drivers/md/dm-raid.c | 2 +- drivers/md/md.c | 14 +++++++++----- drivers/md/md.h | 2 +- 3 files changed, 11 insertions(+), 7 deletions(-) diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c index cab12b2..0c4cbba 100644 --- a/drivers/md/dm-raid.c +++ b/drivers/md/dm-raid.c @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv, if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) { if (mddev->sync_thread) { set_bit(MD_RECOVERY_INTR, &mddev->recovery); - md_reap_sync_thread(mddev); + md_reap_sync_thread(mddev, false); } } else if (decipher_sync_action(mddev, mddev->recovery) != st_idle) return -EBUSY; diff --git a/drivers/md/md.c b/drivers/md/md.c index ca40942..0c12b7f 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -4857,7 +4857,7 @@ action_store(struct mddev *mddev, const char *page, size_t len) flush_workqueue(md_misc_wq); if (mddev->sync_thread) { set_bit(MD_RECOVERY_INTR, &mddev->recovery); - md_reap_sync_thread(mddev); + md_reap_sync_thread(mddev, true); } mddev_unlock(mddev); } @@ -6234,7 +6234,7 @@ static void __md_stop_writes(struct mddev *mddev) flush_workqueue(md_misc_wq); if (mddev->sync_thread) { set_bit(MD_RECOVERY_INTR, &mddev->recovery); - md_reap_sync_thread(mddev); + md_reap_sync_thread(mddev, true); } del_timer_sync(&mddev->safemode_timer); @@ -9256,7 +9256,7 @@ void md_check_recovery(struct mddev *mddev) * ->spare_active and clear saved_raid_disk */ set_bit(MD_RECOVERY_INTR, &mddev->recovery); - md_reap_sync_thread(mddev); + md_reap_sync_thread(mddev, true); clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery); clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery); clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); @@ -9291,7 +9291,7 @@ void md_check_recovery(struct mddev *mddev) goto unlock; } if (mddev->sync_thread) { - md_reap_sync_thread(mddev); + md_reap_sync_thread(mddev, true); goto unlock; } /* Set RUNNING before clearing NEEDED to avoid @@ -9364,14 +9364,18 @@ void md_check_recovery(struct mddev *mddev) } EXPORT_SYMBOL(md_check_recovery); -void md_reap_sync_thread(struct mddev *mddev) +void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held) { struct md_rdev *rdev; sector_t old_dev_sectors = mddev->dev_sectors; bool is_reshaped = false; /* resync has finished, collect result */ + if (reconfig_mutex_held) + mddev_unlock(mddev); md_unregister_thread(&mddev->sync_thread); + if (reconfig_mutex_held) + mddev_lock_nointr(mddev); if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) && !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) && mddev->degraded != mddev->raid_disks) { diff --git a/drivers/md/md.h b/drivers/md/md.h index 34070ab..7ae3d98 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -705,7 +705,7 @@ extern struct md_thread *md_register_thread( extern void md_unregister_thread(struct md_thread **threadp); extern void md_wakeup_thread(struct md_thread *thread); extern void md_check_recovery(struct mddev *mddev); -extern void md_reap_sync_thread(struct mddev *mddev); +extern void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held); extern int mddev_init_writes_pending(struct mddev *mddev); extern bool md_write_start(struct mddev *mddev, struct bio *bi); extern void md_write_inc(struct mddev *mddev, struct bio *bi); -- 2.7.4 -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel
next reply other threads:[~2021-02-13 0:51 UTC|newest] Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-02-13 0:49 Guoqing Jiang [this message] 2021-02-13 0:49 ` [dm-devel] [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held Guoqing Jiang 2021-02-15 11:07 ` Paul Menzel 2021-02-15 11:07 ` [dm-devel] " Paul Menzel 2021-02-24 9:09 ` Song Liu 2021-02-24 9:09 ` [dm-devel] " Song Liu 2021-02-24 9:25 ` Guoqing Jiang 2021-02-24 9:25 ` [dm-devel] " Guoqing Jiang 2021-03-19 23:00 ` Song Liu 2021-03-19 23:00 ` [dm-devel] " Song Liu 2021-11-30 17:25 ` Paul Menzel 2021-11-30 17:25 ` [dm-devel] " Paul Menzel 2021-11-30 17:27 ` Paul Menzel 2021-11-30 17:27 ` [dm-devel] " Paul Menzel 2021-12-08 14:16 ` Guoqing Jiang 2021-12-08 14:16 ` [dm-devel] " Guoqing Jiang 2021-12-08 16:35 ` Heinz Mauelshagen 2021-12-09 0:47 ` Guoqing Jiang 2021-12-09 0:47 ` [dm-devel] " Guoqing Jiang 2021-12-09 12:54 ` Donald Buczek 2021-12-09 12:54 ` [dm-devel] " Donald Buczek 2021-12-09 12:57 ` Donald Buczek 2021-12-09 12:57 ` [dm-devel] " Donald Buczek 2021-12-10 1:06 ` Guoqing Jiang 2021-12-10 1:06 ` [dm-devel] " Guoqing Jiang 2021-12-10 14:16 ` Donald Buczek 2021-12-10 14:16 ` [dm-devel] " Donald Buczek 2021-12-14 2:34 ` Guoqing Jiang 2021-12-14 2:34 ` [dm-devel] " Guoqing Jiang 2021-12-14 9:31 ` Donald Buczek 2021-12-14 9:31 ` [dm-devel] " Donald Buczek 2021-12-14 10:03 ` Guoqing Jiang 2021-12-14 10:03 ` [dm-devel] " Guoqing Jiang 2021-12-14 12:21 ` Donald Buczek 2021-12-14 12:21 ` [dm-devel] " Donald Buczek 2022-01-11 12:25 ` Mikko Rantalainen 2022-01-11 12:25 ` [dm-devel] " Mikko Rantalainen
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=1613177399-22024-1-git-send-email-guoqing.jiang@cloud.ionos.com \ --to=guoqing.jiang@cloud.ionos.com \ --cc=agk@redhat.com \ --cc=dm-devel@redhat.com \ --cc=linux-raid@vger.kernel.org \ --cc=snitzer@redhat.com \ --cc=song@kernel.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.