All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
@ 2021-02-13  0:49 ` Guoqing Jiang
  0 siblings, 0 replies; 37+ messages in thread
From: Guoqing Jiang @ 2021-02-13  0:49 UTC (permalink / raw)
  To: song; +Cc: agk, snitzer, linux-raid, dm-devel, Guoqing Jiang

Unregister sync_thread doesn't need to hold reconfig_mutex since it
doesn't reconfigure array.

And it could cause deadlock problem for raid5 as follows:

1. process A tried to reap sync thread with reconfig_mutex held after echo
   idle to sync_action.
2. raid5 sync thread was blocked if there were too many active stripes.
3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer)
   which causes the number of active stripes can't be decreased.
4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
   to hold reconfig_mutex.

More details in the link:
https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t

And add one parameter to md_reap_sync_thread since it could be called by
dm-raid which doesn't hold reconfig_mutex.

Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
---
V2:
1. add one parameter to md_reap_sync_thread per Jack's suggestion.

 drivers/md/dm-raid.c |  2 +-
 drivers/md/md.c      | 14 +++++++++-----
 drivers/md/md.h      |  2 +-
 3 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index cab12b2..0c4cbba 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
 	if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
 		if (mddev->sync_thread) {
 			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-			md_reap_sync_thread(mddev);
+			md_reap_sync_thread(mddev, false);
 		}
 	} else if (decipher_sync_action(mddev, mddev->recovery) != st_idle)
 		return -EBUSY;
diff --git a/drivers/md/md.c b/drivers/md/md.c
index ca40942..0c12b7f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4857,7 +4857,7 @@ action_store(struct mddev *mddev, const char *page, size_t len)
 				flush_workqueue(md_misc_wq);
 			if (mddev->sync_thread) {
 				set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-				md_reap_sync_thread(mddev);
+				md_reap_sync_thread(mddev, true);
 			}
 			mddev_unlock(mddev);
 		}
@@ -6234,7 +6234,7 @@ static void __md_stop_writes(struct mddev *mddev)
 		flush_workqueue(md_misc_wq);
 	if (mddev->sync_thread) {
 		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-		md_reap_sync_thread(mddev);
+		md_reap_sync_thread(mddev, true);
 	}
 
 	del_timer_sync(&mddev->safemode_timer);
@@ -9256,7 +9256,7 @@ void md_check_recovery(struct mddev *mddev)
 			 * ->spare_active and clear saved_raid_disk
 			 */
 			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-			md_reap_sync_thread(mddev);
+			md_reap_sync_thread(mddev, true);
 			clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
 			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 			clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
@@ -9291,7 +9291,7 @@ void md_check_recovery(struct mddev *mddev)
 			goto unlock;
 		}
 		if (mddev->sync_thread) {
-			md_reap_sync_thread(mddev);
+			md_reap_sync_thread(mddev, true);
 			goto unlock;
 		}
 		/* Set RUNNING before clearing NEEDED to avoid
@@ -9364,14 +9364,18 @@ void md_check_recovery(struct mddev *mddev)
 }
 EXPORT_SYMBOL(md_check_recovery);
 
-void md_reap_sync_thread(struct mddev *mddev)
+void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held)
 {
 	struct md_rdev *rdev;
 	sector_t old_dev_sectors = mddev->dev_sectors;
 	bool is_reshaped = false;
 
 	/* resync has finished, collect result */
+	if (reconfig_mutex_held)
+		mddev_unlock(mddev);
 	md_unregister_thread(&mddev->sync_thread);
+	if (reconfig_mutex_held)
+		mddev_lock_nointr(mddev);
 	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
 	    !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
 	    mddev->degraded != mddev->raid_disks) {
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 34070ab..7ae3d98 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -705,7 +705,7 @@ extern struct md_thread *md_register_thread(
 extern void md_unregister_thread(struct md_thread **threadp);
 extern void md_wakeup_thread(struct md_thread *thread);
 extern void md_check_recovery(struct mddev *mddev);
-extern void md_reap_sync_thread(struct mddev *mddev);
+extern void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held);
 extern int mddev_init_writes_pending(struct mddev *mddev);
 extern bool md_write_start(struct mddev *mddev, struct bio *bi);
 extern void md_write_inc(struct mddev *mddev, struct bio *bi);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [dm-devel] [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
@ 2021-02-13  0:49 ` Guoqing Jiang
  0 siblings, 0 replies; 37+ messages in thread
From: Guoqing Jiang @ 2021-02-13  0:49 UTC (permalink / raw)
  To: song; +Cc: linux-raid, Guoqing Jiang, dm-devel, snitzer, agk

Unregister sync_thread doesn't need to hold reconfig_mutex since it
doesn't reconfigure array.

And it could cause deadlock problem for raid5 as follows:

1. process A tried to reap sync thread with reconfig_mutex held after echo
   idle to sync_action.
2. raid5 sync thread was blocked if there were too many active stripes.
3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer)
   which causes the number of active stripes can't be decreased.
4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
   to hold reconfig_mutex.

More details in the link:
https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t

And add one parameter to md_reap_sync_thread since it could be called by
dm-raid which doesn't hold reconfig_mutex.

Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
---
V2:
1. add one parameter to md_reap_sync_thread per Jack's suggestion.

 drivers/md/dm-raid.c |  2 +-
 drivers/md/md.c      | 14 +++++++++-----
 drivers/md/md.h      |  2 +-
 3 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index cab12b2..0c4cbba 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
 	if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
 		if (mddev->sync_thread) {
 			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-			md_reap_sync_thread(mddev);
+			md_reap_sync_thread(mddev, false);
 		}
 	} else if (decipher_sync_action(mddev, mddev->recovery) != st_idle)
 		return -EBUSY;
diff --git a/drivers/md/md.c b/drivers/md/md.c
index ca40942..0c12b7f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4857,7 +4857,7 @@ action_store(struct mddev *mddev, const char *page, size_t len)
 				flush_workqueue(md_misc_wq);
 			if (mddev->sync_thread) {
 				set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-				md_reap_sync_thread(mddev);
+				md_reap_sync_thread(mddev, true);
 			}
 			mddev_unlock(mddev);
 		}
@@ -6234,7 +6234,7 @@ static void __md_stop_writes(struct mddev *mddev)
 		flush_workqueue(md_misc_wq);
 	if (mddev->sync_thread) {
 		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-		md_reap_sync_thread(mddev);
+		md_reap_sync_thread(mddev, true);
 	}
 
 	del_timer_sync(&mddev->safemode_timer);
@@ -9256,7 +9256,7 @@ void md_check_recovery(struct mddev *mddev)
 			 * ->spare_active and clear saved_raid_disk
 			 */
 			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-			md_reap_sync_thread(mddev);
+			md_reap_sync_thread(mddev, true);
 			clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
 			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 			clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
@@ -9291,7 +9291,7 @@ void md_check_recovery(struct mddev *mddev)
 			goto unlock;
 		}
 		if (mddev->sync_thread) {
-			md_reap_sync_thread(mddev);
+			md_reap_sync_thread(mddev, true);
 			goto unlock;
 		}
 		/* Set RUNNING before clearing NEEDED to avoid
@@ -9364,14 +9364,18 @@ void md_check_recovery(struct mddev *mddev)
 }
 EXPORT_SYMBOL(md_check_recovery);
 
-void md_reap_sync_thread(struct mddev *mddev)
+void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held)
 {
 	struct md_rdev *rdev;
 	sector_t old_dev_sectors = mddev->dev_sectors;
 	bool is_reshaped = false;
 
 	/* resync has finished, collect result */
+	if (reconfig_mutex_held)
+		mddev_unlock(mddev);
 	md_unregister_thread(&mddev->sync_thread);
+	if (reconfig_mutex_held)
+		mddev_lock_nointr(mddev);
 	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
 	    !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
 	    mddev->degraded != mddev->raid_disks) {
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 34070ab..7ae3d98 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -705,7 +705,7 @@ extern struct md_thread *md_register_thread(
 extern void md_unregister_thread(struct md_thread **threadp);
 extern void md_wakeup_thread(struct md_thread *thread);
 extern void md_check_recovery(struct mddev *mddev);
-extern void md_reap_sync_thread(struct mddev *mddev);
+extern void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held);
 extern int mddev_init_writes_pending(struct mddev *mddev);
 extern bool md_write_start(struct mddev *mddev, struct bio *bi);
 extern void md_write_inc(struct mddev *mddev, struct bio *bi);
-- 
2.7.4

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
  2021-02-13  0:49 ` [dm-devel] " Guoqing Jiang
@ 2021-02-15 11:07   ` Paul Menzel
  -1 siblings, 0 replies; 37+ messages in thread
From: Paul Menzel @ 2021-02-15 11:07 UTC (permalink / raw)
  To: Guoqing Jiang, song
  Cc: agk, snitzer, linux-raid, dm-devel, Donald Buczek, it+raid

[+cc Donald]

Am 13.02.21 um 01:49 schrieb Guoqing Jiang:
> Unregister sync_thread doesn't need to hold reconfig_mutex since it
> doesn't reconfigure array.
> 
> And it could cause deadlock problem for raid5 as follows:
> 
> 1. process A tried to reap sync thread with reconfig_mutex held after echo
>     idle to sync_action.
> 2. raid5 sync thread was blocked if there were too many active stripes.
> 3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer)
>     which causes the number of active stripes can't be decreased.
> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
>     to hold reconfig_mutex.
> 
> More details in the link:
> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
> 
> And add one parameter to md_reap_sync_thread since it could be called by
> dm-raid which doesn't hold reconfig_mutex.
> 
> Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
> ---
> V2:
> 1. add one parameter to md_reap_sync_thread per Jack's suggestion.
> 
>   drivers/md/dm-raid.c |  2 +-
>   drivers/md/md.c      | 14 +++++++++-----
>   drivers/md/md.h      |  2 +-
>   3 files changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> index cab12b2..0c4cbba 100644
> --- a/drivers/md/dm-raid.c
> +++ b/drivers/md/dm-raid.c
> @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
>   	if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
>   		if (mddev->sync_thread) {
>   			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> -			md_reap_sync_thread(mddev);
> +			md_reap_sync_thread(mddev, false);
>   		}
>   	} else if (decipher_sync_action(mddev, mddev->recovery) != st_idle)
>   		return -EBUSY;
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index ca40942..0c12b7f 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -4857,7 +4857,7 @@ action_store(struct mddev *mddev, const char *page, size_t len)
>   				flush_workqueue(md_misc_wq);
>   			if (mddev->sync_thread) {
>   				set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> -				md_reap_sync_thread(mddev);
> +				md_reap_sync_thread(mddev, true);
>   			}
>   			mddev_unlock(mddev);
>   		}
> @@ -6234,7 +6234,7 @@ static void __md_stop_writes(struct mddev *mddev)
>   		flush_workqueue(md_misc_wq);
>   	if (mddev->sync_thread) {
>   		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> -		md_reap_sync_thread(mddev);
> +		md_reap_sync_thread(mddev, true);
>   	}
>   
>   	del_timer_sync(&mddev->safemode_timer);
> @@ -9256,7 +9256,7 @@ void md_check_recovery(struct mddev *mddev)
>   			 * ->spare_active and clear saved_raid_disk
>   			 */
>   			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> -			md_reap_sync_thread(mddev);
> +			md_reap_sync_thread(mddev, true);
>   			clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>   			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>   			clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
> @@ -9291,7 +9291,7 @@ void md_check_recovery(struct mddev *mddev)
>   			goto unlock;
>   		}
>   		if (mddev->sync_thread) {
> -			md_reap_sync_thread(mddev);
> +			md_reap_sync_thread(mddev, true);
>   			goto unlock;
>   		}
>   		/* Set RUNNING before clearing NEEDED to avoid
> @@ -9364,14 +9364,18 @@ void md_check_recovery(struct mddev *mddev)
>   }
>   EXPORT_SYMBOL(md_check_recovery);
>   
> -void md_reap_sync_thread(struct mddev *mddev)
> +void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held)
>   {
>   	struct md_rdev *rdev;
>   	sector_t old_dev_sectors = mddev->dev_sectors;
>   	bool is_reshaped = false;
>   
>   	/* resync has finished, collect result */
> +	if (reconfig_mutex_held)
> +		mddev_unlock(mddev);
>   	md_unregister_thread(&mddev->sync_thread);
> +	if (reconfig_mutex_held)
> +		mddev_lock_nointr(mddev);
>   	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
>   	    !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
>   	    mddev->degraded != mddev->raid_disks) {
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 34070ab..7ae3d98 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -705,7 +705,7 @@ extern struct md_thread *md_register_thread(
>   extern void md_unregister_thread(struct md_thread **threadp);
>   extern void md_wakeup_thread(struct md_thread *thread);
>   extern void md_check_recovery(struct mddev *mddev);
> -extern void md_reap_sync_thread(struct mddev *mddev);
> +extern void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held);
>   extern int mddev_init_writes_pending(struct mddev *mddev);
>   extern bool md_write_start(struct mddev *mddev, struct bio *bi);
>   extern void md_write_inc(struct mddev *mddev, struct bio *bi);
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
@ 2021-02-15 11:07   ` Paul Menzel
  0 siblings, 0 replies; 37+ messages in thread
From: Paul Menzel @ 2021-02-15 11:07 UTC (permalink / raw)
  To: Guoqing Jiang, song
  Cc: snitzer, linux-raid, dm-devel, agk, Donald Buczek, it+raid

[+cc Donald]

Am 13.02.21 um 01:49 schrieb Guoqing Jiang:
> Unregister sync_thread doesn't need to hold reconfig_mutex since it
> doesn't reconfigure array.
> 
> And it could cause deadlock problem for raid5 as follows:
> 
> 1. process A tried to reap sync thread with reconfig_mutex held after echo
>     idle to sync_action.
> 2. raid5 sync thread was blocked if there were too many active stripes.
> 3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer)
>     which causes the number of active stripes can't be decreased.
> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
>     to hold reconfig_mutex.
> 
> More details in the link:
> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
> 
> And add one parameter to md_reap_sync_thread since it could be called by
> dm-raid which doesn't hold reconfig_mutex.
> 
> Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
> ---
> V2:
> 1. add one parameter to md_reap_sync_thread per Jack's suggestion.
> 
>   drivers/md/dm-raid.c |  2 +-
>   drivers/md/md.c      | 14 +++++++++-----
>   drivers/md/md.h      |  2 +-
>   3 files changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> index cab12b2..0c4cbba 100644
> --- a/drivers/md/dm-raid.c
> +++ b/drivers/md/dm-raid.c
> @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
>   	if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
>   		if (mddev->sync_thread) {
>   			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> -			md_reap_sync_thread(mddev);
> +			md_reap_sync_thread(mddev, false);
>   		}
>   	} else if (decipher_sync_action(mddev, mddev->recovery) != st_idle)
>   		return -EBUSY;
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index ca40942..0c12b7f 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -4857,7 +4857,7 @@ action_store(struct mddev *mddev, const char *page, size_t len)
>   				flush_workqueue(md_misc_wq);
>   			if (mddev->sync_thread) {
>   				set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> -				md_reap_sync_thread(mddev);
> +				md_reap_sync_thread(mddev, true);
>   			}
>   			mddev_unlock(mddev);
>   		}
> @@ -6234,7 +6234,7 @@ static void __md_stop_writes(struct mddev *mddev)
>   		flush_workqueue(md_misc_wq);
>   	if (mddev->sync_thread) {
>   		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> -		md_reap_sync_thread(mddev);
> +		md_reap_sync_thread(mddev, true);
>   	}
>   
>   	del_timer_sync(&mddev->safemode_timer);
> @@ -9256,7 +9256,7 @@ void md_check_recovery(struct mddev *mddev)
>   			 * ->spare_active and clear saved_raid_disk
>   			 */
>   			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> -			md_reap_sync_thread(mddev);
> +			md_reap_sync_thread(mddev, true);
>   			clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>   			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>   			clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
> @@ -9291,7 +9291,7 @@ void md_check_recovery(struct mddev *mddev)
>   			goto unlock;
>   		}
>   		if (mddev->sync_thread) {
> -			md_reap_sync_thread(mddev);
> +			md_reap_sync_thread(mddev, true);
>   			goto unlock;
>   		}
>   		/* Set RUNNING before clearing NEEDED to avoid
> @@ -9364,14 +9364,18 @@ void md_check_recovery(struct mddev *mddev)
>   }
>   EXPORT_SYMBOL(md_check_recovery);
>   
> -void md_reap_sync_thread(struct mddev *mddev)
> +void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held)
>   {
>   	struct md_rdev *rdev;
>   	sector_t old_dev_sectors = mddev->dev_sectors;
>   	bool is_reshaped = false;
>   
>   	/* resync has finished, collect result */
> +	if (reconfig_mutex_held)
> +		mddev_unlock(mddev);
>   	md_unregister_thread(&mddev->sync_thread);
> +	if (reconfig_mutex_held)
> +		mddev_lock_nointr(mddev);
>   	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
>   	    !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
>   	    mddev->degraded != mddev->raid_disks) {
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 34070ab..7ae3d98 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -705,7 +705,7 @@ extern struct md_thread *md_register_thread(
>   extern void md_unregister_thread(struct md_thread **threadp);
>   extern void md_wakeup_thread(struct md_thread *thread);
>   extern void md_check_recovery(struct mddev *mddev);
> -extern void md_reap_sync_thread(struct mddev *mddev);
> +extern void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held);
>   extern int mddev_init_writes_pending(struct mddev *mddev);
>   extern bool md_write_start(struct mddev *mddev, struct bio *bi);
>   extern void md_write_inc(struct mddev *mddev, struct bio *bi);
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
  2021-02-15 11:07   ` [dm-devel] " Paul Menzel
@ 2021-02-24  9:09     ` Song Liu
  -1 siblings, 0 replies; 37+ messages in thread
From: Song Liu @ 2021-02-24  9:09 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Guoqing Jiang, Alasdair Kergon, Mike Snitzer, linux-raid,
	dm-devel, Donald Buczek, it+raid

On Mon, Feb 15, 2021 at 3:08 AM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>
> [+cc Donald]
>
> Am 13.02.21 um 01:49 schrieb Guoqing Jiang:
> > Unregister sync_thread doesn't need to hold reconfig_mutex since it
> > doesn't reconfigure array.
> >
> > And it could cause deadlock problem for raid5 as follows:
> >
> > 1. process A tried to reap sync thread with reconfig_mutex held after echo
> >     idle to sync_action.
> > 2. raid5 sync thread was blocked if there were too many active stripes.
> > 3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer)
> >     which causes the number of active stripes can't be decreased.
> > 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
> >     to hold reconfig_mutex.
> >
> > More details in the link:
> > https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
> >
> > And add one parameter to md_reap_sync_thread since it could be called by
> > dm-raid which doesn't hold reconfig_mutex.
> >
> > Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
> > Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>

I don't really like this fix. But I haven't got a better (and not too
complicated)
alternative.

> > ---
> > V2:
> > 1. add one parameter to md_reap_sync_thread per Jack's suggestion.
> >
> >   drivers/md/dm-raid.c |  2 +-
> >   drivers/md/md.c      | 14 +++++++++-----
> >   drivers/md/md.h      |  2 +-
> >   3 files changed, 11 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> > index cab12b2..0c4cbba 100644
> > --- a/drivers/md/dm-raid.c
> > +++ b/drivers/md/dm-raid.c
> > @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
> >       if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
> >               if (mddev->sync_thread) {
> >                       set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> > -                     md_reap_sync_thread(mddev);
> > +                     md_reap_sync_thread(mddev, false);

I think we can add mddev_lock() and mddev_unlock() here and then we don't
need the extra parameter?

Thanks,
Song

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
@ 2021-02-24  9:09     ` Song Liu
  0 siblings, 0 replies; 37+ messages in thread
From: Song Liu @ 2021-02-24  9:09 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Guoqing Jiang, Mike Snitzer, linux-raid, dm-devel, it+raid,
	Donald Buczek, Alasdair Kergon

On Mon, Feb 15, 2021 at 3:08 AM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>
> [+cc Donald]
>
> Am 13.02.21 um 01:49 schrieb Guoqing Jiang:
> > Unregister sync_thread doesn't need to hold reconfig_mutex since it
> > doesn't reconfigure array.
> >
> > And it could cause deadlock problem for raid5 as follows:
> >
> > 1. process A tried to reap sync thread with reconfig_mutex held after echo
> >     idle to sync_action.
> > 2. raid5 sync thread was blocked if there were too many active stripes.
> > 3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer)
> >     which causes the number of active stripes can't be decreased.
> > 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
> >     to hold reconfig_mutex.
> >
> > More details in the link:
> > https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
> >
> > And add one parameter to md_reap_sync_thread since it could be called by
> > dm-raid which doesn't hold reconfig_mutex.
> >
> > Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
> > Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>

I don't really like this fix. But I haven't got a better (and not too
complicated)
alternative.

> > ---
> > V2:
> > 1. add one parameter to md_reap_sync_thread per Jack's suggestion.
> >
> >   drivers/md/dm-raid.c |  2 +-
> >   drivers/md/md.c      | 14 +++++++++-----
> >   drivers/md/md.h      |  2 +-
> >   3 files changed, 11 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> > index cab12b2..0c4cbba 100644
> > --- a/drivers/md/dm-raid.c
> > +++ b/drivers/md/dm-raid.c
> > @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
> >       if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
> >               if (mddev->sync_thread) {
> >                       set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> > -                     md_reap_sync_thread(mddev);
> > +                     md_reap_sync_thread(mddev, false);

I think we can add mddev_lock() and mddev_unlock() here and then we don't
need the extra parameter?

Thanks,
Song

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
  2021-02-24  9:09     ` [dm-devel] " Song Liu
@ 2021-02-24  9:25       ` Guoqing Jiang
  -1 siblings, 0 replies; 37+ messages in thread
From: Guoqing Jiang @ 2021-02-24  9:25 UTC (permalink / raw)
  To: Song Liu, Paul Menzel
  Cc: Alasdair Kergon, Mike Snitzer, linux-raid, dm-devel,
	Donald Buczek, it+raid



On 2/24/21 10:09, Song Liu wrote:
> On Mon, Feb 15, 2021 at 3:08 AM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>>
>> [+cc Donald]
>>
>> Am 13.02.21 um 01:49 schrieb Guoqing Jiang:
>>> Unregister sync_thread doesn't need to hold reconfig_mutex since it
>>> doesn't reconfigure array.
>>>
>>> And it could cause deadlock problem for raid5 as follows:
>>>
>>> 1. process A tried to reap sync thread with reconfig_mutex held after echo
>>>      idle to sync_action.
>>> 2. raid5 sync thread was blocked if there were too many active stripes.
>>> 3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer)
>>>      which causes the number of active stripes can't be decreased.
>>> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
>>>      to hold reconfig_mutex.
>>>
>>> More details in the link:
>>> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
>>>
>>> And add one parameter to md_reap_sync_thread since it could be called by
>>> dm-raid which doesn't hold reconfig_mutex.
>>>
>>> Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
>>> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
> 
> I don't really like this fix. But I haven't got a better (and not too
> complicated)
> alternative.
> 
>>> ---
>>> V2:
>>> 1. add one parameter to md_reap_sync_thread per Jack's suggestion.
>>>
>>>    drivers/md/dm-raid.c |  2 +-
>>>    drivers/md/md.c      | 14 +++++++++-----
>>>    drivers/md/md.h      |  2 +-
>>>    3 files changed, 11 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
>>> index cab12b2..0c4cbba 100644
>>> --- a/drivers/md/dm-raid.c
>>> +++ b/drivers/md/dm-raid.c
>>> @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
>>>        if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
>>>                if (mddev->sync_thread) {
>>>                        set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>>> -                     md_reap_sync_thread(mddev);
>>> +                     md_reap_sync_thread(mddev, false);
> 
> I think we can add mddev_lock() and mddev_unlock() here and then we don't
> need the extra parameter?
> 

I thought it too, but I would prefer get the input from DM people first.

@ Mike or Alasdair


Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
@ 2021-02-24  9:25       ` Guoqing Jiang
  0 siblings, 0 replies; 37+ messages in thread
From: Guoqing Jiang @ 2021-02-24  9:25 UTC (permalink / raw)
  To: Song Liu, Paul Menzel
  Cc: Mike Snitzer, linux-raid, dm-devel, Alasdair Kergon,
	Donald Buczek, it+raid



On 2/24/21 10:09, Song Liu wrote:
> On Mon, Feb 15, 2021 at 3:08 AM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>>
>> [+cc Donald]
>>
>> Am 13.02.21 um 01:49 schrieb Guoqing Jiang:
>>> Unregister sync_thread doesn't need to hold reconfig_mutex since it
>>> doesn't reconfigure array.
>>>
>>> And it could cause deadlock problem for raid5 as follows:
>>>
>>> 1. process A tried to reap sync thread with reconfig_mutex held after echo
>>>      idle to sync_action.
>>> 2. raid5 sync thread was blocked if there were too many active stripes.
>>> 3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer)
>>>      which causes the number of active stripes can't be decreased.
>>> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
>>>      to hold reconfig_mutex.
>>>
>>> More details in the link:
>>> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
>>>
>>> And add one parameter to md_reap_sync_thread since it could be called by
>>> dm-raid which doesn't hold reconfig_mutex.
>>>
>>> Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
>>> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
> 
> I don't really like this fix. But I haven't got a better (and not too
> complicated)
> alternative.
> 
>>> ---
>>> V2:
>>> 1. add one parameter to md_reap_sync_thread per Jack's suggestion.
>>>
>>>    drivers/md/dm-raid.c |  2 +-
>>>    drivers/md/md.c      | 14 +++++++++-----
>>>    drivers/md/md.h      |  2 +-
>>>    3 files changed, 11 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
>>> index cab12b2..0c4cbba 100644
>>> --- a/drivers/md/dm-raid.c
>>> +++ b/drivers/md/dm-raid.c
>>> @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
>>>        if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
>>>                if (mddev->sync_thread) {
>>>                        set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>>> -                     md_reap_sync_thread(mddev);
>>> +                     md_reap_sync_thread(mddev, false);
> 
> I think we can add mddev_lock() and mddev_unlock() here and then we don't
> need the extra parameter?
> 

I thought it too, but I would prefer get the input from DM people first.

@ Mike or Alasdair


Thanks,
Guoqing

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
  2021-02-24  9:25       ` [dm-devel] " Guoqing Jiang
@ 2021-03-19 23:00         ` Song Liu
  -1 siblings, 0 replies; 37+ messages in thread
From: Song Liu @ 2021-03-19 23:00 UTC (permalink / raw)
  To: Guoqing Jiang
  Cc: Paul Menzel, Alasdair Kergon, Mike Snitzer, linux-raid, dm-devel,
	Donald Buczek, it+raid

On Wed, Feb 24, 2021 at 1:26 AM Guoqing Jiang
<guoqing.jiang@cloud.ionos.com> wrote:
>
>
>
> On 2/24/21 10:09, Song Liu wrote:
> > On Mon, Feb 15, 2021 at 3:08 AM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
> >>
> >> [+cc Donald]
> >>
> >> Am 13.02.21 um 01:49 schrieb Guoqing Jiang:
> >>> Unregister sync_thread doesn't need to hold reconfig_mutex since it
> >>> doesn't reconfigure array.
> >>>
> >>> And it could cause deadlock problem for raid5 as follows:
> >>>
> >>> 1. process A tried to reap sync thread with reconfig_mutex held after echo
> >>>      idle to sync_action.
> >>> 2. raid5 sync thread was blocked if there were too many active stripes.
> >>> 3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer)
> >>>      which causes the number of active stripes can't be decreased.
> >>> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
> >>>      to hold reconfig_mutex.
> >>>
> >>> More details in the link:
> >>> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
> >>>
> >>> And add one parameter to md_reap_sync_thread since it could be called by
> >>> dm-raid which doesn't hold reconfig_mutex.
> >>>
> >>> Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
> >>> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
> >
> > I don't really like this fix. But I haven't got a better (and not too
> > complicated)
> > alternative.
> >
> >>> ---
> >>> V2:
> >>> 1. add one parameter to md_reap_sync_thread per Jack's suggestion.
> >>>
> >>>    drivers/md/dm-raid.c |  2 +-
> >>>    drivers/md/md.c      | 14 +++++++++-----
> >>>    drivers/md/md.h      |  2 +-
> >>>    3 files changed, 11 insertions(+), 7 deletions(-)
> >>>
> >>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> >>> index cab12b2..0c4cbba 100644
> >>> --- a/drivers/md/dm-raid.c
> >>> +++ b/drivers/md/dm-raid.c
> >>> @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
> >>>        if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
> >>>                if (mddev->sync_thread) {
> >>>                        set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> >>> -                     md_reap_sync_thread(mddev);
> >>> +                     md_reap_sync_thread(mddev, false);
> >
> > I think we can add mddev_lock() and mddev_unlock() here and then we don't
> > need the extra parameter?
> >
>
> I thought it too, but I would prefer get the input from DM people first.
>
> @ Mike or Alasdair

Hi Mike and Alasdair,

Could you please comment on this option: adding mddev_lock() and mddev_unlock()
to raid_message() around md_reap_sync_thread()?

Thanks,
Song

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
@ 2021-03-19 23:00         ` Song Liu
  0 siblings, 0 replies; 37+ messages in thread
From: Song Liu @ 2021-03-19 23:00 UTC (permalink / raw)
  To: Guoqing Jiang
  Cc: Paul Menzel, Mike Snitzer, linux-raid, dm-devel, it+raid,
	Donald Buczek, Alasdair Kergon

On Wed, Feb 24, 2021 at 1:26 AM Guoqing Jiang
<guoqing.jiang@cloud.ionos.com> wrote:
>
>
>
> On 2/24/21 10:09, Song Liu wrote:
> > On Mon, Feb 15, 2021 at 3:08 AM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
> >>
> >> [+cc Donald]
> >>
> >> Am 13.02.21 um 01:49 schrieb Guoqing Jiang:
> >>> Unregister sync_thread doesn't need to hold reconfig_mutex since it
> >>> doesn't reconfigure array.
> >>>
> >>> And it could cause deadlock problem for raid5 as follows:
> >>>
> >>> 1. process A tried to reap sync thread with reconfig_mutex held after echo
> >>>      idle to sync_action.
> >>> 2. raid5 sync thread was blocked if there were too many active stripes.
> >>> 3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer)
> >>>      which causes the number of active stripes can't be decreased.
> >>> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
> >>>      to hold reconfig_mutex.
> >>>
> >>> More details in the link:
> >>> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
> >>>
> >>> And add one parameter to md_reap_sync_thread since it could be called by
> >>> dm-raid which doesn't hold reconfig_mutex.
> >>>
> >>> Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
> >>> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
> >
> > I don't really like this fix. But I haven't got a better (and not too
> > complicated)
> > alternative.
> >
> >>> ---
> >>> V2:
> >>> 1. add one parameter to md_reap_sync_thread per Jack's suggestion.
> >>>
> >>>    drivers/md/dm-raid.c |  2 +-
> >>>    drivers/md/md.c      | 14 +++++++++-----
> >>>    drivers/md/md.h      |  2 +-
> >>>    3 files changed, 11 insertions(+), 7 deletions(-)
> >>>
> >>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> >>> index cab12b2..0c4cbba 100644
> >>> --- a/drivers/md/dm-raid.c
> >>> +++ b/drivers/md/dm-raid.c
> >>> @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
> >>>        if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
> >>>                if (mddev->sync_thread) {
> >>>                        set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> >>> -                     md_reap_sync_thread(mddev);
> >>> +                     md_reap_sync_thread(mddev, false);
> >
> > I think we can add mddev_lock() and mddev_unlock() here and then we don't
> > need the extra parameter?
> >
>
> I thought it too, but I would prefer get the input from DM people first.
>
> @ Mike or Alasdair

Hi Mike and Alasdair,

Could you please comment on this option: adding mddev_lock() and mddev_unlock()
to raid_message() around md_reap_sync_thread()?

Thanks,
Song

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
  2021-03-19 23:00         ` [dm-devel] " Song Liu
@ 2021-11-30 17:25           ` Paul Menzel
  -1 siblings, 0 replies; 37+ messages in thread
From: Paul Menzel @ 2021-11-30 17:25 UTC (permalink / raw)
  To: Song Liu, Guoqing Jiang, Alasdair Kergon, Mike Snitzer
  Cc: linux-raid, dm-devel, Donald Buczek, it+raid

Dear Linux folks,


Am 20.03.21 um 00:00 schrieb Song Liu:
> On Wed, Feb 24, 2021 at 1:26 AM Guoqing Jiang wrote:

>> On 2/24/21 10:09, Song Liu wrote:
>>> On Mon, Feb 15, 2021 at 3:08 AM Paul Menzel wrote:
>>>>
>>>> [+cc Donald]
>>>>
>>>> Am 13.02.21 um 01:49 schrieb Guoqing Jiang:
>>>>> Unregister sync_thread doesn't need to hold reconfig_mutex since it
>>>>> doesn't reconfigure array.
>>>>>
>>>>> And it could cause deadlock problem for raid5 as follows:
>>>>>
>>>>> 1. process A tried to reap sync thread with reconfig_mutex held after echo
>>>>>       idle to sync_action.
>>>>> 2. raid5 sync thread was blocked if there were too many active stripes.
>>>>> 3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer)
>>>>>       which causes the number of active stripes can't be decreased.
>>>>> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
>>>>>       to hold reconfig_mutex.
>>>>>
>>>>> More details in the link:
>>>>> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
>>>>>
>>>>> And add one parameter to md_reap_sync_thread since it could be called by
>>>>> dm-raid which doesn't hold reconfig_mutex.
>>>>>
>>>>> Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
>>>>> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
>>>
>>> I don't really like this fix. But I haven't got a better (and not too
>>> complicated)
>>> alternative.
>>>
>>>>> ---
>>>>> V2:
>>>>> 1. add one parameter to md_reap_sync_thread per Jack's suggestion.
>>>>>
>>>>>     drivers/md/dm-raid.c |  2 +-
>>>>>     drivers/md/md.c      | 14 +++++++++-----
>>>>>     drivers/md/md.h      |  2 +-
>>>>>     3 files changed, 11 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
>>>>> index cab12b2..0c4cbba 100644
>>>>> --- a/drivers/md/dm-raid.c
>>>>> +++ b/drivers/md/dm-raid.c
>>>>> @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
>>>>>         if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
>>>>>                 if (mddev->sync_thread) {
>>>>>                         set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>>>>> -                     md_reap_sync_thread(mddev);
>>>>> +                     md_reap_sync_thread(mddev, false);
>>>
>>> I think we can add mddev_lock() and mddev_unlock() here and then we don't
>>> need the extra parameter?
>>
>> I thought it too, but I would prefer get the input from DM people first.
>>
>> @ Mike or Alasdair
> 
> Hi Mike and Alasdair,
> 
> Could you please comment on this option: adding mddev_lock() and mddev_unlock()
> to raid_message() around md_reap_sync_thread()?

The issue is unfortunately still unresolved (at least Linux 5.10.82). 
How can we move forward?


Kind regards,

Paul

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
@ 2021-11-30 17:25           ` Paul Menzel
  0 siblings, 0 replies; 37+ messages in thread
From: Paul Menzel @ 2021-11-30 17:25 UTC (permalink / raw)
  To: Song Liu, Guoqing Jiang, Alasdair Kergon, Mike Snitzer
  Cc: linux-raid, dm-devel, Donald Buczek, it+raid

Dear Linux folks,


Am 20.03.21 um 00:00 schrieb Song Liu:
> On Wed, Feb 24, 2021 at 1:26 AM Guoqing Jiang wrote:

>> On 2/24/21 10:09, Song Liu wrote:
>>> On Mon, Feb 15, 2021 at 3:08 AM Paul Menzel wrote:
>>>>
>>>> [+cc Donald]
>>>>
>>>> Am 13.02.21 um 01:49 schrieb Guoqing Jiang:
>>>>> Unregister sync_thread doesn't need to hold reconfig_mutex since it
>>>>> doesn't reconfigure array.
>>>>>
>>>>> And it could cause deadlock problem for raid5 as follows:
>>>>>
>>>>> 1. process A tried to reap sync thread with reconfig_mutex held after echo
>>>>>       idle to sync_action.
>>>>> 2. raid5 sync thread was blocked if there were too many active stripes.
>>>>> 3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer)
>>>>>       which causes the number of active stripes can't be decreased.
>>>>> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
>>>>>       to hold reconfig_mutex.
>>>>>
>>>>> More details in the link:
>>>>> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
>>>>>
>>>>> And add one parameter to md_reap_sync_thread since it could be called by
>>>>> dm-raid which doesn't hold reconfig_mutex.
>>>>>
>>>>> Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
>>>>> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
>>>
>>> I don't really like this fix. But I haven't got a better (and not too
>>> complicated)
>>> alternative.
>>>
>>>>> ---
>>>>> V2:
>>>>> 1. add one parameter to md_reap_sync_thread per Jack's suggestion.
>>>>>
>>>>>     drivers/md/dm-raid.c |  2 +-
>>>>>     drivers/md/md.c      | 14 +++++++++-----
>>>>>     drivers/md/md.h      |  2 +-
>>>>>     3 files changed, 11 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
>>>>> index cab12b2..0c4cbba 100644
>>>>> --- a/drivers/md/dm-raid.c
>>>>> +++ b/drivers/md/dm-raid.c
>>>>> @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
>>>>>         if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
>>>>>                 if (mddev->sync_thread) {
>>>>>                         set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>>>>> -                     md_reap_sync_thread(mddev);
>>>>> +                     md_reap_sync_thread(mddev, false);
>>>
>>> I think we can add mddev_lock() and mddev_unlock() here and then we don't
>>> need the extra parameter?
>>
>> I thought it too, but I would prefer get the input from DM people first.
>>
>> @ Mike or Alasdair
> 
> Hi Mike and Alasdair,
> 
> Could you please comment on this option: adding mddev_lock() and mddev_unlock()
> to raid_message() around md_reap_sync_thread()?

The issue is unfortunately still unresolved (at least Linux 5.10.82). 
How can we move forward?


Kind regards,

Paul

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
  2021-11-30 17:25           ` [dm-devel] " Paul Menzel
@ 2021-11-30 17:27             ` Paul Menzel
  -1 siblings, 0 replies; 37+ messages in thread
From: Paul Menzel @ 2021-11-30 17:27 UTC (permalink / raw)
  To: Song Liu, Guoqing Jiang, Alasdair Kergon, Mike Snitzer
  Cc: linux-raid, dm-devel, Donald Buczek, it+raid

[Update Guoqing’s email address]

Am 30.11.21 um 18:25 schrieb Paul Menzel:
> Dear Linux folks,
> 
> 
> Am 20.03.21 um 00:00 schrieb Song Liu:
>> On Wed, Feb 24, 2021 at 1:26 AM Guoqing Jiang wrote:
> 
>>> On 2/24/21 10:09, Song Liu wrote:
>>>> On Mon, Feb 15, 2021 at 3:08 AM Paul Menzel wrote:
>>>>>
>>>>> [+cc Donald]
>>>>>
>>>>> Am 13.02.21 um 01:49 schrieb Guoqing Jiang:
>>>>>> Unregister sync_thread doesn't need to hold reconfig_mutex since it
>>>>>> doesn't reconfigure array.
>>>>>>
>>>>>> And it could cause deadlock problem for raid5 as follows:
>>>>>>
>>>>>> 1. process A tried to reap sync thread with reconfig_mutex held 
>>>>>> after echo
>>>>>>       idle to sync_action.
>>>>>> 2. raid5 sync thread was blocked if there were too many active 
>>>>>> stripes.
>>>>>> 3. SB_CHANGE_PENDING was set (because of write IO comes from upper 
>>>>>> layer)
>>>>>>       which causes the number of active stripes can't be decreased.
>>>>>> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was 
>>>>>> not able
>>>>>>       to hold reconfig_mutex.
>>>>>>
>>>>>> More details in the link:
>>>>>> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t 
>>>>>>
>>>>>>
>>>>>> And add one parameter to md_reap_sync_thread since it could be 
>>>>>> called by
>>>>>> dm-raid which doesn't hold reconfig_mutex.
>>>>>>
>>>>>> Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
>>>>>> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
>>>>
>>>> I don't really like this fix. But I haven't got a better (and not too
>>>> complicated)
>>>> alternative.
>>>>
>>>>>> ---
>>>>>> V2:
>>>>>> 1. add one parameter to md_reap_sync_thread per Jack's suggestion.
>>>>>>
>>>>>>     drivers/md/dm-raid.c |  2 +-
>>>>>>     drivers/md/md.c      | 14 +++++++++-----
>>>>>>     drivers/md/md.h      |  2 +-
>>>>>>     3 files changed, 11 insertions(+), 7 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
>>>>>> index cab12b2..0c4cbba 100644
>>>>>> --- a/drivers/md/dm-raid.c
>>>>>> +++ b/drivers/md/dm-raid.c
>>>>>> @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target 
>>>>>> *ti, unsigned int argc, char **argv,
>>>>>>         if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], 
>>>>>> "frozen")) {
>>>>>>                 if (mddev->sync_thread) {
>>>>>>                         set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>>>>>> -                     md_reap_sync_thread(mddev);
>>>>>> +                     md_reap_sync_thread(mddev, false);
>>>>
>>>> I think we can add mddev_lock() and mddev_unlock() here and then we 
>>>> don't
>>>> need the extra parameter?
>>>
>>> I thought it too, but I would prefer get the input from DM people first.
>>>
>>> @ Mike or Alasdair
>>
>> Hi Mike and Alasdair,
>>
>> Could you please comment on this option: adding mddev_lock() and 
>> mddev_unlock()
>> to raid_message() around md_reap_sync_thread()?
> 
> The issue is unfortunately still unresolved (at least Linux 5.10.82). 
> How can we move forward?
> 
> 
> Kind regards,
> 
> Paul

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
@ 2021-11-30 17:27             ` Paul Menzel
  0 siblings, 0 replies; 37+ messages in thread
From: Paul Menzel @ 2021-11-30 17:27 UTC (permalink / raw)
  To: Song Liu, Guoqing Jiang, Alasdair Kergon, Mike Snitzer
  Cc: linux-raid, dm-devel, Donald Buczek, it+raid

[Update Guoqing’s email address]

Am 30.11.21 um 18:25 schrieb Paul Menzel:
> Dear Linux folks,
> 
> 
> Am 20.03.21 um 00:00 schrieb Song Liu:
>> On Wed, Feb 24, 2021 at 1:26 AM Guoqing Jiang wrote:
> 
>>> On 2/24/21 10:09, Song Liu wrote:
>>>> On Mon, Feb 15, 2021 at 3:08 AM Paul Menzel wrote:
>>>>>
>>>>> [+cc Donald]
>>>>>
>>>>> Am 13.02.21 um 01:49 schrieb Guoqing Jiang:
>>>>>> Unregister sync_thread doesn't need to hold reconfig_mutex since it
>>>>>> doesn't reconfigure array.
>>>>>>
>>>>>> And it could cause deadlock problem for raid5 as follows:
>>>>>>
>>>>>> 1. process A tried to reap sync thread with reconfig_mutex held 
>>>>>> after echo
>>>>>>       idle to sync_action.
>>>>>> 2. raid5 sync thread was blocked if there were too many active 
>>>>>> stripes.
>>>>>> 3. SB_CHANGE_PENDING was set (because of write IO comes from upper 
>>>>>> layer)
>>>>>>       which causes the number of active stripes can't be decreased.
>>>>>> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was 
>>>>>> not able
>>>>>>       to hold reconfig_mutex.
>>>>>>
>>>>>> More details in the link:
>>>>>> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t 
>>>>>>
>>>>>>
>>>>>> And add one parameter to md_reap_sync_thread since it could be 
>>>>>> called by
>>>>>> dm-raid which doesn't hold reconfig_mutex.
>>>>>>
>>>>>> Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
>>>>>> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
>>>>
>>>> I don't really like this fix. But I haven't got a better (and not too
>>>> complicated)
>>>> alternative.
>>>>
>>>>>> ---
>>>>>> V2:
>>>>>> 1. add one parameter to md_reap_sync_thread per Jack's suggestion.
>>>>>>
>>>>>>     drivers/md/dm-raid.c |  2 +-
>>>>>>     drivers/md/md.c      | 14 +++++++++-----
>>>>>>     drivers/md/md.h      |  2 +-
>>>>>>     3 files changed, 11 insertions(+), 7 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
>>>>>> index cab12b2..0c4cbba 100644
>>>>>> --- a/drivers/md/dm-raid.c
>>>>>> +++ b/drivers/md/dm-raid.c
>>>>>> @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target 
>>>>>> *ti, unsigned int argc, char **argv,
>>>>>>         if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], 
>>>>>> "frozen")) {
>>>>>>                 if (mddev->sync_thread) {
>>>>>>                         set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>>>>>> -                     md_reap_sync_thread(mddev);
>>>>>> +                     md_reap_sync_thread(mddev, false);
>>>>
>>>> I think we can add mddev_lock() and mddev_unlock() here and then we 
>>>> don't
>>>> need the extra parameter?
>>>
>>> I thought it too, but I would prefer get the input from DM people first.
>>>
>>> @ Mike or Alasdair
>>
>> Hi Mike and Alasdair,
>>
>> Could you please comment on this option: adding mddev_lock() and 
>> mddev_unlock()
>> to raid_message() around md_reap_sync_thread()?
> 
> The issue is unfortunately still unresolved (at least Linux 5.10.82). 
> How can we move forward?
> 
> 
> Kind regards,
> 
> Paul


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
  2021-11-30 17:27             ` [dm-devel] " Paul Menzel
@ 2021-12-08 14:16               ` Guoqing Jiang
  -1 siblings, 0 replies; 37+ messages in thread
From: Guoqing Jiang @ 2021-12-08 14:16 UTC (permalink / raw)
  To: Paul Menzel, Song Liu, Alasdair Kergon, Mike Snitzer, heinzm, jbrassow
  Cc: linux-raid, dm-devel, Donald Buczek, it+raid



On 12/1/21 1:27 AM, Paul Menzel wrote:
>
>>>>>>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
>>>>>>> index cab12b2..0c4cbba 100644
>>>>>>> --- a/drivers/md/dm-raid.c
>>>>>>> +++ b/drivers/md/dm-raid.c
>>>>>>> @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target 
>>>>>>> *ti, unsigned int argc, char **argv,
>>>>>>>         if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], 
>>>>>>> "frozen")) {
>>>>>>>                 if (mddev->sync_thread) {
>>>>>>>                         set_bit(MD_RECOVERY_INTR, 
>>>>>>> &mddev->recovery);
>>>>>>> -                     md_reap_sync_thread(mddev);
>>>>>>> +                     md_reap_sync_thread(mddev, false);
>>>>>
>>>>> I think we can add mddev_lock() and mddev_unlock() here and then 
>>>>> we don't
>>>>> need the extra parameter?
>>>>
>>>> I thought it too, but I would prefer get the input from DM people 
>>>> first.
>>>>
>>>> @ Mike or Alasdair
>>>
>>> Hi Mike and Alasdair,
>>>
>>> Could you please comment on this option: adding mddev_lock() and 
>>> mddev_unlock()
>>> to raid_message() around md_reap_sync_thread()?

Add Heinz and Jonathan, could you comment about this? Thanks.

>>
>> The issue is unfortunately still unresolved (at least Linux 5.10.82). 
>> How can we move forward?

If it is not applicable to change dm-raid, another alternative could be 
like this

--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9409,8 +9409,12 @@ void md_reap_sync_thread(struct mdev *mddev)
         sector_t old_dev_sectors = mddev->dev_sectors;
         bool is_reshaped = false;

+       if (mddev_is_locked(mddev))
+               mddev_unlock(mddev);
         /* resync has finished, collect result */
         md_unregister_thread(&mddev->sync_thread);
+       if (mddev_is_locked(mddev))
+               mddev_lock(mddev);
         if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
             !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
             mddev->degraded != mddev->raid_disks) {
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 53ea7a6961de..96a88b7681d6 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -549,6 +549,11 @@ static inline int mddev_trylock(struct mddev *mddev)
  }
  extern void mddev_unlock(struct mddev *mddev);

+static inline int mddev_is_locked(struct mddev *mddev)
+{
+       return mutex_is_locked(&mddev->reconfig_mutex);
+}
+

BTW, it is holiday season,  so people are probably on vacation.

Thanks,
Guoqing

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
@ 2021-12-08 14:16               ` Guoqing Jiang
  0 siblings, 0 replies; 37+ messages in thread
From: Guoqing Jiang @ 2021-12-08 14:16 UTC (permalink / raw)
  To: Paul Menzel, Song Liu, Alasdair Kergon, Mike Snitzer, heinzm, jbrassow
  Cc: linux-raid, dm-devel, Donald Buczek, it+raid



On 12/1/21 1:27 AM, Paul Menzel wrote:
>
>>>>>>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
>>>>>>> index cab12b2..0c4cbba 100644
>>>>>>> --- a/drivers/md/dm-raid.c
>>>>>>> +++ b/drivers/md/dm-raid.c
>>>>>>> @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target 
>>>>>>> *ti, unsigned int argc, char **argv,
>>>>>>>         if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], 
>>>>>>> "frozen")) {
>>>>>>>                 if (mddev->sync_thread) {
>>>>>>>                         set_bit(MD_RECOVERY_INTR, 
>>>>>>> &mddev->recovery);
>>>>>>> -                     md_reap_sync_thread(mddev);
>>>>>>> +                     md_reap_sync_thread(mddev, false);
>>>>>
>>>>> I think we can add mddev_lock() and mddev_unlock() here and then 
>>>>> we don't
>>>>> need the extra parameter?
>>>>
>>>> I thought it too, but I would prefer get the input from DM people 
>>>> first.
>>>>
>>>> @ Mike or Alasdair
>>>
>>> Hi Mike and Alasdair,
>>>
>>> Could you please comment on this option: adding mddev_lock() and 
>>> mddev_unlock()
>>> to raid_message() around md_reap_sync_thread()?

Add Heinz and Jonathan, could you comment about this? Thanks.

>>
>> The issue is unfortunately still unresolved (at least Linux 5.10.82). 
>> How can we move forward?

If it is not applicable to change dm-raid, another alternative could be 
like this

--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9409,8 +9409,12 @@ void md_reap_sync_thread(struct mdev *mddev)
         sector_t old_dev_sectors = mddev->dev_sectors;
         bool is_reshaped = false;

+       if (mddev_is_locked(mddev))
+               mddev_unlock(mddev);
         /* resync has finished, collect result */
         md_unregister_thread(&mddev->sync_thread);
+       if (mddev_is_locked(mddev))
+               mddev_lock(mddev);
         if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
             !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
             mddev->degraded != mddev->raid_disks) {
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 53ea7a6961de..96a88b7681d6 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -549,6 +549,11 @@ static inline int mddev_trylock(struct mddev *mddev)
  }
  extern void mddev_unlock(struct mddev *mddev);

+static inline int mddev_is_locked(struct mddev *mddev)
+{
+       return mutex_is_locked(&mddev->reconfig_mutex);
+}
+

BTW, it is holiday season,  so people are probably on vacation.

Thanks,
Guoqing


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
  2021-12-08 14:16               ` [dm-devel] " Guoqing Jiang
  (?)
@ 2021-12-08 16:35               ` Heinz Mauelshagen
  2021-12-09  0:47                   ` [dm-devel] " Guoqing Jiang
  -1 siblings, 1 reply; 37+ messages in thread
From: Heinz Mauelshagen @ 2021-12-08 16:35 UTC (permalink / raw)
  To: Guoqing Jiang
  Cc: Paul Menzel, Mike Snitzer, linux-raid, Song Liu, dm-devel,
	it+raid, Donald Buczek, Alasdair Kergon


[-- Attachment #1.1: Type: text/plain, Size: 3146 bytes --]

NACK, see details below.

On Wed, Dec 8, 2021 at 3:24 PM Guoqing Jiang <guoqing.jiang@linux.dev>
wrote:

>
>
> On 12/1/21 1:27 AM, Paul Menzel wrote:
> >
> >>>>>>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> >>>>>>> index cab12b2..0c4cbba 100644
> >>>>>>> --- a/drivers/md/dm-raid.c
> >>>>>>> +++ b/drivers/md/dm-raid.c
> >>>>>>> @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target
> >>>>>>> *ti, unsigned int argc, char **argv,
> >>>>>>>         if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0],
> >>>>>>> "frozen")) {
> >>>>>>>                 if (mddev->sync_thread) {
> >>>>>>>                         set_bit(MD_RECOVERY_INTR,
> >>>>>>> &mddev->recovery);
> >>>>>>> -                     md_reap_sync_thread(mddev);
> >>>>>>> +                     md_reap_sync_thread(mddev, false);
> >>>>>
> >>>>> I think we can add mddev_lock() and mddev_unlock() here and then
> >>>>> we don't
> >>>>> need the extra parameter?
> >>>>
> >>>> I thought it too, but I would prefer get the input from DM people
> >>>> first.
> >>>>
> >>>> @ Mike or Alasdair
> >>>
> >>> Hi Mike and Alasdair,
> >>>
> >>> Could you please comment on this option: adding mddev_lock() and
> >>> mddev_unlock()
> >>> to raid_message() around md_reap_sync_thread()?
>
> Add Heinz and Jonathan, could you comment about this? Thanks.
>
> >>
> >> The issue is unfortunately still unresolved (at least Linux 5.10.82).
> >> How can we move forward?
>
> If it is not applicable to change dm-raid, another alternative could be
> like this
>
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -9409,8 +9409,12 @@ void md_reap_sync_thread(struct mdev *mddev)
>          sector_t old_dev_sectors = mddev->dev_sectors;
>          bool is_reshaped = false;
>
> +       if (mddev_is_locked(mddev))
> +               mddev_unlock(mddev);
>          /* resync has finished, collect result */
>          md_unregister_thread(&mddev->sync_thread);
> +       if (mddev_is_locked(mddev))
> +               mddev_lock(mddev);
>          if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
>              !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
>              mddev->degraded != mddev->raid_disks) {
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 53ea7a6961de..96a88b7681d6 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -549,6 +549,11 @@ static inline int mddev_trylock(struct mddev *mddev)
>   }
>   extern void mddev_unlock(struct mddev *mddev);
>
> +static inline int mddev_is_locked(struct mddev *mddev)
> +{
> +       return mutex_is_locked(&mddev->reconfig_mutex);
> +}
> +
>
>
Patch is bogus relative to the proposed mddev_unlock/mddev_lock logic in
md.c around the
md_unregister_thread() as it's failing to lock again if it was holding the
mutex before as it again
calls mddev_locked() after having the mutex unlocked just before the
md_unregister_thread() call.

If that patch to md.c holds up in further analysis, it has to keep the
mddev_is_locked() result
and unlock/lock conditionally based on its result.

Thanks,
Heinz


BTW, it is holiday season,  so people are probably on vacation.
>
> Thanks,
> Guoqing
>
>

[-- Attachment #1.2: Type: text/html, Size: 4703 bytes --]

[-- Attachment #2: Type: text/plain, Size: 97 bytes --]

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
  2021-12-08 16:35               ` Heinz Mauelshagen
@ 2021-12-09  0:47                   ` Guoqing Jiang
  0 siblings, 0 replies; 37+ messages in thread
From: Guoqing Jiang @ 2021-12-09  0:47 UTC (permalink / raw)
  To: Heinz Mauelshagen
  Cc: Paul Menzel, Song Liu, Alasdair Kergon, Mike Snitzer, Brassow,
	Jonathan, linux-raid, dm-devel, Donald Buczek, it+raid



On 12/9/21 12:35 AM, Heinz Mauelshagen wrote:
> NACK, see details below.
>
> On Wed, Dec 8, 2021 at 3:24 PM Guoqing Jiang <guoqing.jiang@linux.dev 
> <mailto:guoqing.jiang@linux.dev>> wrote:
>
>
>
>     On 12/1/21 1:27 AM, Paul Menzel wrote:
>     >
>     >>>>>>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
>     >>>>>>> index cab12b2..0c4cbba 100644
>     >>>>>>> --- a/drivers/md/dm-raid.c
>     >>>>>>> +++ b/drivers/md/dm-raid.c
>     >>>>>>> @@ -3668,7 +3668,7 @@ static int raid_message(struct
>     dm_target
>     >>>>>>> *ti, unsigned int argc, char **argv,
>     >>>>>>>         if (!strcasecmp(argv[0], "idle") ||
>     !strcasecmp(argv[0],
>     >>>>>>> "frozen")) {
>     >>>>>>>                 if (mddev->sync_thread) {
>     >>>>>>> set_bit(MD_RECOVERY_INTR,
>     >>>>>>> &mddev->recovery);
>     >>>>>>> - md_reap_sync_thread(mddev);
>     >>>>>>> + md_reap_sync_thread(mddev, false);
>     >>>>>
>     >>>>> I think we can add mddev_lock() and mddev_unlock() here and
>     then
>     >>>>> we don't
>     >>>>> need the extra parameter?
>     >>>>
>     >>>> I thought it too, but I would prefer get the input from DM
>     people
>     >>>> first.
>     >>>>
>     >>>> @ Mike or Alasdair
>     >>>
>     >>> Hi Mike and Alasdair,
>     >>>
>     >>> Could you please comment on this option: adding mddev_lock() and
>     >>> mddev_unlock()
>     >>> to raid_message() around md_reap_sync_thread()?
>
>     Add Heinz and Jonathan, could you comment about this? Thanks.
>
>     >>
>     >> The issue is unfortunately still unresolved (at least Linux
>     5.10.82).
>     >> How can we move forward?
>
>     If it is not applicable to change dm-raid, another alternative
>     could be
>     like this
>
>     --- a/drivers/md/md.c
>     +++ b/drivers/md/md.c
>     @@ -9409,8 +9409,12 @@ void md_reap_sync_thread(struct mdev *mddev)
>              sector_t old_dev_sectors = mddev->dev_sectors;
>              bool is_reshaped = false;
>
>     +       if (mddev_is_locked(mddev))
>     +               mddev_unlock(mddev);
>              /* resync has finished, collect result */
>              md_unregister_thread(&mddev->sync_thread);
>     +       if (mddev_is_locked(mddev))
>     +               mddev_lock(mddev);
>              if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
>                  !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
>                  mddev->degraded != mddev->raid_disks) {
>     diff --git a/drivers/md/md.h b/drivers/md/md.h
>     index 53ea7a6961de..96a88b7681d6 100644
>     --- a/drivers/md/md.h
>     +++ b/drivers/md/md.h
>     @@ -549,6 +549,11 @@ static inline int mddev_trylock(struct mddev
>     *mddev)
>       }
>       extern void mddev_unlock(struct mddev *mddev);
>
>     +static inline int mddev_is_locked(struct mddev *mddev)
>     +{
>     +       return mutex_is_locked(&mddev->reconfig_mutex);
>     +}
>     +
>
>
> Patch is bogus relative to the proposed mddev_unlock/mddev_lock logic 
> in md.c around the
> md_unregister_thread() as it's failing to lock again if it was holding 
> the mutex before as it again
> calls mddev_locked() after having the mutex unlocked just before the 
> md_unregister_thread() call.
>
> If that patch to md.c holds up in further analysis, it has to keep the 
> mddev_is_locked() result
> and unlock/lock conditionally based on its result.
>

Yes, that was my intention too, thanks for pointing it out.

@@ -9407,10 +9407,16 @@ void md_reap_sync_thread(struct mddev *mddev)
  {
         struct md_rdev *rdev;
         sector_t old_dev_sectors = mddev->dev_sectors;
-       bool is_reshaped = false;
+       bool is_reshaped = false, is_locked = false;

         /* resync has finished, collect result */
+       if (mddev_is_locked(mddev)) {
+               is_locked = true;
+               mddev_unlock(mddev);
+       }
         md_unregister_thread(&mddev->sync_thread);
+       if (is_locked)
+               mddev_lock(mddev);

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
@ 2021-12-09  0:47                   ` Guoqing Jiang
  0 siblings, 0 replies; 37+ messages in thread
From: Guoqing Jiang @ 2021-12-09  0:47 UTC (permalink / raw)
  To: Heinz Mauelshagen
  Cc: Paul Menzel, Mike Snitzer, linux-raid, Song Liu, dm-devel,
	it+raid, Donald Buczek, Alasdair Kergon



On 12/9/21 12:35 AM, Heinz Mauelshagen wrote:
> NACK, see details below.
>
> On Wed, Dec 8, 2021 at 3:24 PM Guoqing Jiang <guoqing.jiang@linux.dev 
> <mailto:guoqing.jiang@linux.dev>> wrote:
>
>
>
>     On 12/1/21 1:27 AM, Paul Menzel wrote:
>     >
>     >>>>>>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
>     >>>>>>> index cab12b2..0c4cbba 100644
>     >>>>>>> --- a/drivers/md/dm-raid.c
>     >>>>>>> +++ b/drivers/md/dm-raid.c
>     >>>>>>> @@ -3668,7 +3668,7 @@ static int raid_message(struct
>     dm_target
>     >>>>>>> *ti, unsigned int argc, char **argv,
>     >>>>>>>         if (!strcasecmp(argv[0], "idle") ||
>     !strcasecmp(argv[0],
>     >>>>>>> "frozen")) {
>     >>>>>>>                 if (mddev->sync_thread) {
>     >>>>>>> set_bit(MD_RECOVERY_INTR,
>     >>>>>>> &mddev->recovery);
>     >>>>>>> - md_reap_sync_thread(mddev);
>     >>>>>>> + md_reap_sync_thread(mddev, false);
>     >>>>>
>     >>>>> I think we can add mddev_lock() and mddev_unlock() here and
>     then
>     >>>>> we don't
>     >>>>> need the extra parameter?
>     >>>>
>     >>>> I thought it too, but I would prefer get the input from DM
>     people
>     >>>> first.
>     >>>>
>     >>>> @ Mike or Alasdair
>     >>>
>     >>> Hi Mike and Alasdair,
>     >>>
>     >>> Could you please comment on this option: adding mddev_lock() and
>     >>> mddev_unlock()
>     >>> to raid_message() around md_reap_sync_thread()?
>
>     Add Heinz and Jonathan, could you comment about this? Thanks.
>
>     >>
>     >> The issue is unfortunately still unresolved (at least Linux
>     5.10.82).
>     >> How can we move forward?
>
>     If it is not applicable to change dm-raid, another alternative
>     could be
>     like this
>
>     --- a/drivers/md/md.c
>     +++ b/drivers/md/md.c
>     @@ -9409,8 +9409,12 @@ void md_reap_sync_thread(struct mdev *mddev)
>              sector_t old_dev_sectors = mddev->dev_sectors;
>              bool is_reshaped = false;
>
>     +       if (mddev_is_locked(mddev))
>     +               mddev_unlock(mddev);
>              /* resync has finished, collect result */
>              md_unregister_thread(&mddev->sync_thread);
>     +       if (mddev_is_locked(mddev))
>     +               mddev_lock(mddev);
>              if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
>                  !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
>                  mddev->degraded != mddev->raid_disks) {
>     diff --git a/drivers/md/md.h b/drivers/md/md.h
>     index 53ea7a6961de..96a88b7681d6 100644
>     --- a/drivers/md/md.h
>     +++ b/drivers/md/md.h
>     @@ -549,6 +549,11 @@ static inline int mddev_trylock(struct mddev
>     *mddev)
>       }
>       extern void mddev_unlock(struct mddev *mddev);
>
>     +static inline int mddev_is_locked(struct mddev *mddev)
>     +{
>     +       return mutex_is_locked(&mddev->reconfig_mutex);
>     +}
>     +
>
>
> Patch is bogus relative to the proposed mddev_unlock/mddev_lock logic 
> in md.c around the
> md_unregister_thread() as it's failing to lock again if it was holding 
> the mutex before as it again
> calls mddev_locked() after having the mutex unlocked just before the 
> md_unregister_thread() call.
>
> If that patch to md.c holds up in further analysis, it has to keep the 
> mddev_is_locked() result
> and unlock/lock conditionally based on its result.
>

Yes, that was my intention too, thanks for pointing it out.

@@ -9407,10 +9407,16 @@ void md_reap_sync_thread(struct mddev *mddev)
  {
         struct md_rdev *rdev;
         sector_t old_dev_sectors = mddev->dev_sectors;
-       bool is_reshaped = false;
+       bool is_reshaped = false, is_locked = false;

         /* resync has finished, collect result */
+       if (mddev_is_locked(mddev)) {
+               is_locked = true;
+               mddev_unlock(mddev);
+       }
         md_unregister_thread(&mddev->sync_thread);
+       if (is_locked)
+               mddev_lock(mddev);

Thanks,
Guoqing


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
  2021-02-15 11:07   ` [dm-devel] " Paul Menzel
@ 2021-12-09 12:54     ` Donald Buczek
  -1 siblings, 0 replies; 37+ messages in thread
From: Donald Buczek @ 2021-12-09 12:54 UTC (permalink / raw)
  To: Paul Menzel, Guoqing Jiang, song
  Cc: agk, snitzer, linux-raid, dm-devel, it+raid

On 15.02.21 12:07, Paul Menzel wrote:
> [+cc Donald]
> 
> Am 13.02.21 um 01:49 schrieb Guoqing Jiang:
>> Unregister sync_thread doesn't need to hold reconfig_mutex since it
>> doesn't reconfigure array.
>>
>> And it could cause deadlock problem for raid5 as follows:
>>
>> 1. process A tried to reap sync thread with reconfig_mutex held after echo
>>     idle to sync_action.
>> 2. raid5 sync thread was blocked if there were too many active stripes.
>> 3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer)
>>     which causes the number of active stripes can't be decreased.
>> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
>>     to hold reconfig_mutex.
>>
>> More details in the link:
>> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
>>
>> And add one parameter to md_reap_sync_thread since it could be called by
>> dm-raid which doesn't hold reconfig_mutex.
>>
>> Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>

Thanks, Paul, for putting me into the cc.

Guoqing, I don't think, I've tested this patch. Please remove the tested-by.

btw: We have the fix I suggested [1] running on 59 production raid6 sets with 16 disk each with various loads and with monthly mdcheck (paused during daytime, so a few transitions each month) on several kernel versions running for nearly a year now. Many more transitions during testing. That doesn't mean the fix is correct, of course. The configurations of our systems are almost identical and we don't do suspend or anything. But maybe you might want to reconsider.

[1]: https://lore.kernel.org/linux-raid/bc342de0-98d2-1733-39cd-cc1999777ff3@molgen.mpg.de/

If you want me to test V3 of your patch, please put me in the cc.

Best
   Donald

>> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
>> ---
>> V2:
>> 1. add one parameter to md_reap_sync_thread per Jack's suggestion.
>>
>>   drivers/md/dm-raid.c |  2 +-
>>   drivers/md/md.c      | 14 +++++++++-----
>>   drivers/md/md.h      |  2 +-
>>   3 files changed, 11 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
>> index cab12b2..0c4cbba 100644
>> --- a/drivers/md/dm-raid.c
>> +++ b/drivers/md/dm-raid.c
>> @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
>>       if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
>>           if (mddev->sync_thread) {
>>               set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -            md_reap_sync_thread(mddev);
>> +            md_reap_sync_thread(mddev, false);
>>           }
>>       } else if (decipher_sync_action(mddev, mddev->recovery) != st_idle)
>>           return -EBUSY;
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index ca40942..0c12b7f 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -4857,7 +4857,7 @@ action_store(struct mddev *mddev, const char *page, size_t len)
>>                   flush_workqueue(md_misc_wq);
>>               if (mddev->sync_thread) {
>>                   set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -                md_reap_sync_thread(mddev);
>> +                md_reap_sync_thread(mddev, true);
>>               }
>>               mddev_unlock(mddev);
>>           }
>> @@ -6234,7 +6234,7 @@ static void __md_stop_writes(struct mddev *mddev)
>>           flush_workqueue(md_misc_wq);
>>       if (mddev->sync_thread) {
>>           set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -        md_reap_sync_thread(mddev);
>> +        md_reap_sync_thread(mddev, true);
>>       }
>>       del_timer_sync(&mddev->safemode_timer);
>> @@ -9256,7 +9256,7 @@ void md_check_recovery(struct mddev *mddev)
>>                * ->spare_active and clear saved_raid_disk
>>                */
>>               set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -            md_reap_sync_thread(mddev);
>> +            md_reap_sync_thread(mddev, true);
>>               clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>>               clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>               clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
>> @@ -9291,7 +9291,7 @@ void md_check_recovery(struct mddev *mddev)
>>               goto unlock;
>>           }
>>           if (mddev->sync_thread) {
>> -            md_reap_sync_thread(mddev);
>> +            md_reap_sync_thread(mddev, true);
>>               goto unlock;
>>           }
>>           /* Set RUNNING before clearing NEEDED to avoid
>> @@ -9364,14 +9364,18 @@ void md_check_recovery(struct mddev *mddev)
>>   }
>>   EXPORT_SYMBOL(md_check_recovery);
>> -void md_reap_sync_thread(struct mddev *mddev)
>> +void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held)
>>   {
>>       struct md_rdev *rdev;
>>       sector_t old_dev_sectors = mddev->dev_sectors;
>>       bool is_reshaped = false;
>>       /* resync has finished, collect result */
>> +    if (reconfig_mutex_held)
>> +        mddev_unlock(mddev);
>>       md_unregister_thread(&mddev->sync_thread);
>> +    if (reconfig_mutex_held)
>> +        mddev_lock_nointr(mddev);
>>       if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
>>           !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
>>           mddev->degraded != mddev->raid_disks) {
>> diff --git a/drivers/md/md.h b/drivers/md/md.h
>> index 34070ab..7ae3d98 100644
>> --- a/drivers/md/md.h
>> +++ b/drivers/md/md.h
>> @@ -705,7 +705,7 @@ extern struct md_thread *md_register_thread(
>>   extern void md_unregister_thread(struct md_thread **threadp);
>>   extern void md_wakeup_thread(struct md_thread *thread);
>>   extern void md_check_recovery(struct mddev *mddev);
>> -extern void md_reap_sync_thread(struct mddev *mddev);
>> +extern void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held);
>>   extern int mddev_init_writes_pending(struct mddev *mddev);
>>   extern bool md_write_start(struct mddev *mddev, struct bio *bi);
>>   extern void md_write_inc(struct mddev *mddev, struct bio *bi);
>>

-- 
Donald Buczek
buczek@molgen.mpg.de
Tel: +49 30 8413 1433

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
@ 2021-12-09 12:54     ` Donald Buczek
  0 siblings, 0 replies; 37+ messages in thread
From: Donald Buczek @ 2021-12-09 12:54 UTC (permalink / raw)
  To: Paul Menzel, Guoqing Jiang, song
  Cc: linux-raid, dm-devel, it+raid, snitzer, agk

On 15.02.21 12:07, Paul Menzel wrote:
> [+cc Donald]
> 
> Am 13.02.21 um 01:49 schrieb Guoqing Jiang:
>> Unregister sync_thread doesn't need to hold reconfig_mutex since it
>> doesn't reconfigure array.
>>
>> And it could cause deadlock problem for raid5 as follows:
>>
>> 1. process A tried to reap sync thread with reconfig_mutex held after echo
>>     idle to sync_action.
>> 2. raid5 sync thread was blocked if there were too many active stripes.
>> 3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer)
>>     which causes the number of active stripes can't be decreased.
>> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
>>     to hold reconfig_mutex.
>>
>> More details in the link:
>> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
>>
>> And add one parameter to md_reap_sync_thread since it could be called by
>> dm-raid which doesn't hold reconfig_mutex.
>>
>> Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>

Thanks, Paul, for putting me into the cc.

Guoqing, I don't think, I've tested this patch. Please remove the tested-by.

btw: We have the fix I suggested [1] running on 59 production raid6 sets with 16 disk each with various loads and with monthly mdcheck (paused during daytime, so a few transitions each month) on several kernel versions running for nearly a year now. Many more transitions during testing. That doesn't mean the fix is correct, of course. The configurations of our systems are almost identical and we don't do suspend or anything. But maybe you might want to reconsider.

[1]: https://lore.kernel.org/linux-raid/bc342de0-98d2-1733-39cd-cc1999777ff3@molgen.mpg.de/

If you want me to test V3 of your patch, please put me in the cc.

Best
   Donald

>> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
>> ---
>> V2:
>> 1. add one parameter to md_reap_sync_thread per Jack's suggestion.
>>
>>   drivers/md/dm-raid.c |  2 +-
>>   drivers/md/md.c      | 14 +++++++++-----
>>   drivers/md/md.h      |  2 +-
>>   3 files changed, 11 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
>> index cab12b2..0c4cbba 100644
>> --- a/drivers/md/dm-raid.c
>> +++ b/drivers/md/dm-raid.c
>> @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
>>       if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
>>           if (mddev->sync_thread) {
>>               set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -            md_reap_sync_thread(mddev);
>> +            md_reap_sync_thread(mddev, false);
>>           }
>>       } else if (decipher_sync_action(mddev, mddev->recovery) != st_idle)
>>           return -EBUSY;
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index ca40942..0c12b7f 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -4857,7 +4857,7 @@ action_store(struct mddev *mddev, const char *page, size_t len)
>>                   flush_workqueue(md_misc_wq);
>>               if (mddev->sync_thread) {
>>                   set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -                md_reap_sync_thread(mddev);
>> +                md_reap_sync_thread(mddev, true);
>>               }
>>               mddev_unlock(mddev);
>>           }
>> @@ -6234,7 +6234,7 @@ static void __md_stop_writes(struct mddev *mddev)
>>           flush_workqueue(md_misc_wq);
>>       if (mddev->sync_thread) {
>>           set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -        md_reap_sync_thread(mddev);
>> +        md_reap_sync_thread(mddev, true);
>>       }
>>       del_timer_sync(&mddev->safemode_timer);
>> @@ -9256,7 +9256,7 @@ void md_check_recovery(struct mddev *mddev)
>>                * ->spare_active and clear saved_raid_disk
>>                */
>>               set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -            md_reap_sync_thread(mddev);
>> +            md_reap_sync_thread(mddev, true);
>>               clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>>               clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>               clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
>> @@ -9291,7 +9291,7 @@ void md_check_recovery(struct mddev *mddev)
>>               goto unlock;
>>           }
>>           if (mddev->sync_thread) {
>> -            md_reap_sync_thread(mddev);
>> +            md_reap_sync_thread(mddev, true);
>>               goto unlock;
>>           }
>>           /* Set RUNNING before clearing NEEDED to avoid
>> @@ -9364,14 +9364,18 @@ void md_check_recovery(struct mddev *mddev)
>>   }
>>   EXPORT_SYMBOL(md_check_recovery);
>> -void md_reap_sync_thread(struct mddev *mddev)
>> +void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held)
>>   {
>>       struct md_rdev *rdev;
>>       sector_t old_dev_sectors = mddev->dev_sectors;
>>       bool is_reshaped = false;
>>       /* resync has finished, collect result */
>> +    if (reconfig_mutex_held)
>> +        mddev_unlock(mddev);
>>       md_unregister_thread(&mddev->sync_thread);
>> +    if (reconfig_mutex_held)
>> +        mddev_lock_nointr(mddev);
>>       if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
>>           !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
>>           mddev->degraded != mddev->raid_disks) {
>> diff --git a/drivers/md/md.h b/drivers/md/md.h
>> index 34070ab..7ae3d98 100644
>> --- a/drivers/md/md.h
>> +++ b/drivers/md/md.h
>> @@ -705,7 +705,7 @@ extern struct md_thread *md_register_thread(
>>   extern void md_unregister_thread(struct md_thread **threadp);
>>   extern void md_wakeup_thread(struct md_thread *thread);
>>   extern void md_check_recovery(struct mddev *mddev);
>> -extern void md_reap_sync_thread(struct mddev *mddev);
>> +extern void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held);
>>   extern int mddev_init_writes_pending(struct mddev *mddev);
>>   extern bool md_write_start(struct mddev *mddev, struct bio *bi);
>>   extern void md_write_inc(struct mddev *mddev, struct bio *bi);
>>

-- 
Donald Buczek
buczek@molgen.mpg.de
Tel: +49 30 8413 1433


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
  2021-02-15 11:07   ` [dm-devel] " Paul Menzel
@ 2021-12-09 12:57     ` Donald Buczek
  -1 siblings, 0 replies; 37+ messages in thread
From: Donald Buczek @ 2021-12-09 12:57 UTC (permalink / raw)
  To: Paul Menzel, song, Guoqing Jiang
  Cc: agk, snitzer, linux-raid, dm-devel, it+raid

[Update Guoqing’s email address]

On 15.02.21 12:07, Paul Menzel wrote:
> [+cc Donald]
> 
> Am 13.02.21 um 01:49 schrieb Guoqing Jiang:
>> Unregister sync_thread doesn't need to hold reconfig_mutex since it
>> doesn't reconfigure array.
>>
>> And it could cause deadlock problem for raid5 as follows:
>>
>> 1. process A tried to reap sync thread with reconfig_mutex held after echo
>>     idle to sync_action.
>> 2. raid5 sync thread was blocked if there were too many active stripes.
>> 3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer)
>>     which causes the number of active stripes can't be decreased.
>> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
>>     to hold reconfig_mutex.
>>
>> More details in the link:
>> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
>>
>> And add one parameter to md_reap_sync_thread since it could be called by
>> dm-raid which doesn't hold reconfig_mutex.
>>
>> Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>

Thanks, Paul, for putting me into the cc.

Guoqing, I don't think, I've tested this patch. Please remove the tested-by.

btw: We have the fix I suggested [1] running on 59 production raid6 sets with 16 disk each with various loads and with monthly mdcheck (paused during daytime, so a few transitions each month) on several kernel versions running for nearly a year now. Many more transitions during testing. That doesn't mean the fix is correct, of course. The configurations of our systems are almost identical and we don't do suspend or anything. But maybe you might want to reconsider.

[1]: https://lore.kernel.org/linux-raid/bc342de0-98d2-1733-39cd-cc1999777ff3@molgen.mpg.de/

If you want me to test V3 of your patch, please put me in the cc.

Best
   Donald

>> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
>> ---
>> V2:
>> 1. add one parameter to md_reap_sync_thread per Jack's suggestion.
>>
>>   drivers/md/dm-raid.c |  2 +-
>>   drivers/md/md.c      | 14 +++++++++-----
>>   drivers/md/md.h      |  2 +-
>>   3 files changed, 11 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
>> index cab12b2..0c4cbba 100644
>> --- a/drivers/md/dm-raid.c
>> +++ b/drivers/md/dm-raid.c
>> @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
>>       if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
>>           if (mddev->sync_thread) {
>>               set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -            md_reap_sync_thread(mddev);
>> +            md_reap_sync_thread(mddev, false);
>>           }
>>       } else if (decipher_sync_action(mddev, mddev->recovery) != st_idle)
>>           return -EBUSY;
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index ca40942..0c12b7f 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -4857,7 +4857,7 @@ action_store(struct mddev *mddev, const char *page, size_t len)
>>                   flush_workqueue(md_misc_wq);
>>               if (mddev->sync_thread) {
>>                   set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -                md_reap_sync_thread(mddev);
>> +                md_reap_sync_thread(mddev, true);
>>               }
>>               mddev_unlock(mddev);
>>           }
>> @@ -6234,7 +6234,7 @@ static void __md_stop_writes(struct mddev *mddev)
>>           flush_workqueue(md_misc_wq);
>>       if (mddev->sync_thread) {
>>           set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -        md_reap_sync_thread(mddev);
>> +        md_reap_sync_thread(mddev, true);
>>       }
>>       del_timer_sync(&mddev->safemode_timer);
>> @@ -9256,7 +9256,7 @@ void md_check_recovery(struct mddev *mddev)
>>                * ->spare_active and clear saved_raid_disk
>>                */
>>               set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -            md_reap_sync_thread(mddev);
>> +            md_reap_sync_thread(mddev, true);
>>               clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>>               clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>               clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
>> @@ -9291,7 +9291,7 @@ void md_check_recovery(struct mddev *mddev)
>>               goto unlock;
>>           }
>>           if (mddev->sync_thread) {
>> -            md_reap_sync_thread(mddev);
>> +            md_reap_sync_thread(mddev, true);
>>               goto unlock;
>>           }
>>           /* Set RUNNING before clearing NEEDED to avoid
>> @@ -9364,14 +9364,18 @@ void md_check_recovery(struct mddev *mddev)
>>   }
>>   EXPORT_SYMBOL(md_check_recovery);
>> -void md_reap_sync_thread(struct mddev *mddev)
>> +void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held)
>>   {
>>       struct md_rdev *rdev;
>>       sector_t old_dev_sectors = mddev->dev_sectors;
>>       bool is_reshaped = false;
>>       /* resync has finished, collect result */
>> +    if (reconfig_mutex_held)
>> +        mddev_unlock(mddev);
>>       md_unregister_thread(&mddev->sync_thread);
>> +    if (reconfig_mutex_held)
>> +        mddev_lock_nointr(mddev);
>>       if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
>>           !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
>>           mddev->degraded != mddev->raid_disks) {
>> diff --git a/drivers/md/md.h b/drivers/md/md.h
>> index 34070ab..7ae3d98 100644
>> --- a/drivers/md/md.h
>> +++ b/drivers/md/md.h
>> @@ -705,7 +705,7 @@ extern struct md_thread *md_register_thread(
>>   extern void md_unregister_thread(struct md_thread **threadp);
>>   extern void md_wakeup_thread(struct md_thread *thread);
>>   extern void md_check_recovery(struct mddev *mddev);
>> -extern void md_reap_sync_thread(struct mddev *mddev);
>> +extern void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held);
>>   extern int mddev_init_writes_pending(struct mddev *mddev);
>>   extern bool md_write_start(struct mddev *mddev, struct bio *bi);
>>   extern void md_write_inc(struct mddev *mddev, struct bio *bi);
>>

-- 
Donald Buczek
buczek@molgen.mpg.de
Tel: +49 30 8413 1433

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
@ 2021-12-09 12:57     ` Donald Buczek
  0 siblings, 0 replies; 37+ messages in thread
From: Donald Buczek @ 2021-12-09 12:57 UTC (permalink / raw)
  To: Paul Menzel, song, Guoqing Jiang
  Cc: linux-raid, dm-devel, it+raid, snitzer, agk

[Update Guoqing’s email address]

On 15.02.21 12:07, Paul Menzel wrote:
> [+cc Donald]
> 
> Am 13.02.21 um 01:49 schrieb Guoqing Jiang:
>> Unregister sync_thread doesn't need to hold reconfig_mutex since it
>> doesn't reconfigure array.
>>
>> And it could cause deadlock problem for raid5 as follows:
>>
>> 1. process A tried to reap sync thread with reconfig_mutex held after echo
>>     idle to sync_action.
>> 2. raid5 sync thread was blocked if there were too many active stripes.
>> 3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer)
>>     which causes the number of active stripes can't be decreased.
>> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
>>     to hold reconfig_mutex.
>>
>> More details in the link:
>> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
>>
>> And add one parameter to md_reap_sync_thread since it could be called by
>> dm-raid which doesn't hold reconfig_mutex.
>>
>> Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>

Thanks, Paul, for putting me into the cc.

Guoqing, I don't think, I've tested this patch. Please remove the tested-by.

btw: We have the fix I suggested [1] running on 59 production raid6 sets with 16 disk each with various loads and with monthly mdcheck (paused during daytime, so a few transitions each month) on several kernel versions running for nearly a year now. Many more transitions during testing. That doesn't mean the fix is correct, of course. The configurations of our systems are almost identical and we don't do suspend or anything. But maybe you might want to reconsider.

[1]: https://lore.kernel.org/linux-raid/bc342de0-98d2-1733-39cd-cc1999777ff3@molgen.mpg.de/

If you want me to test V3 of your patch, please put me in the cc.

Best
   Donald

>> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
>> ---
>> V2:
>> 1. add one parameter to md_reap_sync_thread per Jack's suggestion.
>>
>>   drivers/md/dm-raid.c |  2 +-
>>   drivers/md/md.c      | 14 +++++++++-----
>>   drivers/md/md.h      |  2 +-
>>   3 files changed, 11 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
>> index cab12b2..0c4cbba 100644
>> --- a/drivers/md/dm-raid.c
>> +++ b/drivers/md/dm-raid.c
>> @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
>>       if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
>>           if (mddev->sync_thread) {
>>               set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -            md_reap_sync_thread(mddev);
>> +            md_reap_sync_thread(mddev, false);
>>           }
>>       } else if (decipher_sync_action(mddev, mddev->recovery) != st_idle)
>>           return -EBUSY;
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index ca40942..0c12b7f 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -4857,7 +4857,7 @@ action_store(struct mddev *mddev, const char *page, size_t len)
>>                   flush_workqueue(md_misc_wq);
>>               if (mddev->sync_thread) {
>>                   set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -                md_reap_sync_thread(mddev);
>> +                md_reap_sync_thread(mddev, true);
>>               }
>>               mddev_unlock(mddev);
>>           }
>> @@ -6234,7 +6234,7 @@ static void __md_stop_writes(struct mddev *mddev)
>>           flush_workqueue(md_misc_wq);
>>       if (mddev->sync_thread) {
>>           set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -        md_reap_sync_thread(mddev);
>> +        md_reap_sync_thread(mddev, true);
>>       }
>>       del_timer_sync(&mddev->safemode_timer);
>> @@ -9256,7 +9256,7 @@ void md_check_recovery(struct mddev *mddev)
>>                * ->spare_active and clear saved_raid_disk
>>                */
>>               set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -            md_reap_sync_thread(mddev);
>> +            md_reap_sync_thread(mddev, true);
>>               clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>>               clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>               clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
>> @@ -9291,7 +9291,7 @@ void md_check_recovery(struct mddev *mddev)
>>               goto unlock;
>>           }
>>           if (mddev->sync_thread) {
>> -            md_reap_sync_thread(mddev);
>> +            md_reap_sync_thread(mddev, true);
>>               goto unlock;
>>           }
>>           /* Set RUNNING before clearing NEEDED to avoid
>> @@ -9364,14 +9364,18 @@ void md_check_recovery(struct mddev *mddev)
>>   }
>>   EXPORT_SYMBOL(md_check_recovery);
>> -void md_reap_sync_thread(struct mddev *mddev)
>> +void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held)
>>   {
>>       struct md_rdev *rdev;
>>       sector_t old_dev_sectors = mddev->dev_sectors;
>>       bool is_reshaped = false;
>>       /* resync has finished, collect result */
>> +    if (reconfig_mutex_held)
>> +        mddev_unlock(mddev);
>>       md_unregister_thread(&mddev->sync_thread);
>> +    if (reconfig_mutex_held)
>> +        mddev_lock_nointr(mddev);
>>       if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
>>           !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
>>           mddev->degraded != mddev->raid_disks) {
>> diff --git a/drivers/md/md.h b/drivers/md/md.h
>> index 34070ab..7ae3d98 100644
>> --- a/drivers/md/md.h
>> +++ b/drivers/md/md.h
>> @@ -705,7 +705,7 @@ extern struct md_thread *md_register_thread(
>>   extern void md_unregister_thread(struct md_thread **threadp);
>>   extern void md_wakeup_thread(struct md_thread *thread);
>>   extern void md_check_recovery(struct mddev *mddev);
>> -extern void md_reap_sync_thread(struct mddev *mddev);
>> +extern void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held);
>>   extern int mddev_init_writes_pending(struct mddev *mddev);
>>   extern bool md_write_start(struct mddev *mddev, struct bio *bi);
>>   extern void md_write_inc(struct mddev *mddev, struct bio *bi);
>>

-- 
Donald Buczek
buczek@molgen.mpg.de
Tel: +49 30 8413 1433


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
  2021-12-09 12:57     ` [dm-devel] " Donald Buczek
@ 2021-12-10  1:06       ` Guoqing Jiang
  -1 siblings, 0 replies; 37+ messages in thread
From: Guoqing Jiang @ 2021-12-10  1:06 UTC (permalink / raw)
  To: Donald Buczek, Paul Menzel, song
  Cc: agk, snitzer, linux-raid, dm-devel, it+raid



On 12/9/21 8:57 PM, Donald Buczek wrote:
> [Update Guoqing’s email address]
>
> On 15.02.21 12:07, Paul Menzel wrote:
>> [+cc Donald]
>>
>> Am 13.02.21 um 01:49 schrieb Guoqing Jiang:
>>> Unregister sync_thread doesn't need to hold reconfig_mutex since it
>>> doesn't reconfigure array.
>>>
>>> And it could cause deadlock problem for raid5 as follows:
>>>
>>> 1. process A tried to reap sync thread with reconfig_mutex held 
>>> after echo
>>>     idle to sync_action.
>>> 2. raid5 sync thread was blocked if there were too many active stripes.
>>> 3. SB_CHANGE_PENDING was set (because of write IO comes from upper 
>>> layer)
>>>     which causes the number of active stripes can't be decreased.
>>> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was 
>>> not able
>>>     to hold reconfig_mutex.
>>>
>>> More details in the link:
>>> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t 
>>>
>>>
>>> And add one parameter to md_reap_sync_thread since it could be 
>>> called by
>>> dm-raid which doesn't hold reconfig_mutex.
>>>
>>> Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
>
> Thanks, Paul, for putting me into the cc.
>
> Guoqing, I don't think, I've tested this patch. Please remove the 
> tested-by.

This version is basically the similar as the change in the thread.

https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#m546d8c55a42f308985b9d31d4be85832edcd15ab

Anyway, I will remove your tested-by per the request if I will update 
the patch.

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
@ 2021-12-10  1:06       ` Guoqing Jiang
  0 siblings, 0 replies; 37+ messages in thread
From: Guoqing Jiang @ 2021-12-10  1:06 UTC (permalink / raw)
  To: Donald Buczek, Paul Menzel, song
  Cc: linux-raid, dm-devel, it+raid, snitzer, agk



On 12/9/21 8:57 PM, Donald Buczek wrote:
> [Update Guoqing’s email address]
>
> On 15.02.21 12:07, Paul Menzel wrote:
>> [+cc Donald]
>>
>> Am 13.02.21 um 01:49 schrieb Guoqing Jiang:
>>> Unregister sync_thread doesn't need to hold reconfig_mutex since it
>>> doesn't reconfigure array.
>>>
>>> And it could cause deadlock problem for raid5 as follows:
>>>
>>> 1. process A tried to reap sync thread with reconfig_mutex held 
>>> after echo
>>>     idle to sync_action.
>>> 2. raid5 sync thread was blocked if there were too many active stripes.
>>> 3. SB_CHANGE_PENDING was set (because of write IO comes from upper 
>>> layer)
>>>     which causes the number of active stripes can't be decreased.
>>> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was 
>>> not able
>>>     to hold reconfig_mutex.
>>>
>>> More details in the link:
>>> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t 
>>>
>>>
>>> And add one parameter to md_reap_sync_thread since it could be 
>>> called by
>>> dm-raid which doesn't hold reconfig_mutex.
>>>
>>> Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
>
> Thanks, Paul, for putting me into the cc.
>
> Guoqing, I don't think, I've tested this patch. Please remove the 
> tested-by.

This version is basically the similar as the change in the thread.

https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#m546d8c55a42f308985b9d31d4be85832edcd15ab

Anyway, I will remove your tested-by per the request if I will update 
the patch.

Thanks,
Guoqing


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
  2021-02-13  0:49 ` [dm-devel] " Guoqing Jiang
@ 2021-12-10 14:16   ` Donald Buczek
  -1 siblings, 0 replies; 37+ messages in thread
From: Donald Buczek @ 2021-12-10 14:16 UTC (permalink / raw)
  To: song, Guoqing Jiang; +Cc: agk, snitzer, linux-raid, dm-devel, Paul Menzel

Dear Guoqing,

On 13.02.21 01:49, Guoqing Jiang wrote:
> Unregister sync_thread doesn't need to hold reconfig_mutex since it
> doesn't reconfigure array.
> 
> And it could cause deadlock problem for raid5 as follows:
> 
> 1. process A tried to reap sync thread with reconfig_mutex held after echo
>     idle to sync_action.
> 2. raid5 sync thread was blocked if there were too many active stripes.
> 3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer)
>     which causes the number of active stripes can't be decreased.
> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
>     to hold reconfig_mutex.
> 
> More details in the link:
> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
> 
> And add one parameter to md_reap_sync_thread since it could be called by
> dm-raid which doesn't hold reconfig_mutex.
> 
> Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
> ---
> V2:
> 1. add one parameter to md_reap_sync_thread per Jack's suggestion.
> 
>   drivers/md/dm-raid.c |  2 +-
>   drivers/md/md.c      | 14 +++++++++-----
>   drivers/md/md.h      |  2 +-
>   3 files changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> index cab12b2..0c4cbba 100644
> --- a/drivers/md/dm-raid.c
> +++ b/drivers/md/dm-raid.c
> @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
>   	if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
>   		if (mddev->sync_thread) {
>   			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> -			md_reap_sync_thread(mddev);
> +			md_reap_sync_thread(mddev, false);
>   		}
>   	} else if (decipher_sync_action(mddev, mddev->recovery) != st_idle)
>   		return -EBUSY;
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index ca40942..0c12b7f 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -4857,7 +4857,7 @@ action_store(struct mddev *mddev, const char *page, size_t len)
>   				flush_workqueue(md_misc_wq);
>   			if (mddev->sync_thread) {
>   				set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> -				md_reap_sync_thread(mddev);
> +				md_reap_sync_thread(mddev, true);
>   			}
>   			mddev_unlock(mddev);
>   		}
> @@ -6234,7 +6234,7 @@ static void __md_stop_writes(struct mddev *mddev)
>   		flush_workqueue(md_misc_wq);
>   	if (mddev->sync_thread) {
>   		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> -		md_reap_sync_thread(mddev);
> +		md_reap_sync_thread(mddev, true);
>   	}
>   
>   	del_timer_sync(&mddev->safemode_timer);
> @@ -9256,7 +9256,7 @@ void md_check_recovery(struct mddev *mddev)
>   			 * ->spare_active and clear saved_raid_disk
>   			 */
>   			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> -			md_reap_sync_thread(mddev);
> +			md_reap_sync_thread(mddev, true);
>   			clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>   			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>   			clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
> @@ -9291,7 +9291,7 @@ void md_check_recovery(struct mddev *mddev)
>   			goto unlock;
>   		}
>   		if (mddev->sync_thread) {
> -			md_reap_sync_thread(mddev);
> +			md_reap_sync_thread(mddev, true);
>   			goto unlock;
>   		}
>   		/* Set RUNNING before clearing NEEDED to avoid
> @@ -9364,14 +9364,18 @@ void md_check_recovery(struct mddev *mddev)
>   }
>   EXPORT_SYMBOL(md_check_recovery);
>   
> -void md_reap_sync_thread(struct mddev *mddev)
> +void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held)
>   {
>   	struct md_rdev *rdev;
>   	sector_t old_dev_sectors = mddev->dev_sectors;
>   	bool is_reshaped = false;
>   
>   	/* resync has finished, collect result */
> +	if (reconfig_mutex_held)
> +		mddev_unlock(mddev);


If one thread got here, e.g. via action_store( /* "idle" */ ), now that the mutex is unlocked, is there anything which would prevent another thread getting  here as well, e.g. via the same path?

If not, they both might call

>   	md_unregister_thread(&mddev->sync_thread);

Which is not reentrant:

void md_unregister_thread(struct md_thread **threadp)
{
	struct md_thread *thread = *threadp;
	if (!thread)
		return;
	pr_debug("interrupting MD-thread pid %d\n", task_pid_nr(thread->tsk));
	/* Locking ensures that mddev_unlock does not wake_up a
	 * non-existent thread
	 */
	spin_lock(&pers_lock);
	*threadp = NULL;
	spin_unlock(&pers_lock);

	kthread_stop(thread->tsk);
	kfree(thread);
}

This might be a preexisting problem, because the call site in dm-raid.c, which you updated to `md_reap_sync_thread(mddev, false);`, didn't hold the mutex anyway.

Am I missing something? Probably, I do.

Otherwise: Move the deref of threadp in md_unregister_thread() into the spinlock scope?

Best
   Donald

> +	if (reconfig_mutex_held)
> +		mddev_lock_nointr(mddev);
>   	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
>   	    !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
>   	    mddev->degraded != mddev->raid_disks) {
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 34070ab..7ae3d98 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -705,7 +705,7 @@ extern struct md_thread *md_register_thread(
>   extern void md_unregister_thread(struct md_thread **threadp);
>   extern void md_wakeup_thread(struct md_thread *thread);
>   extern void md_check_recovery(struct mddev *mddev);
> -extern void md_reap_sync_thread(struct mddev *mddev);
> +extern void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held);
>   extern int mddev_init_writes_pending(struct mddev *mddev);
>   extern bool md_write_start(struct mddev *mddev, struct bio *bi);
>   extern void md_write_inc(struct mddev *mddev, struct bio *bi);
> 

-- 
Donald Buczek
buczek@molgen.mpg.de
Tel: +49 30 8413 1433

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
@ 2021-12-10 14:16   ` Donald Buczek
  0 siblings, 0 replies; 37+ messages in thread
From: Donald Buczek @ 2021-12-10 14:16 UTC (permalink / raw)
  To: song, Guoqing Jiang; +Cc: linux-raid, Paul Menzel, dm-devel, snitzer, agk

Dear Guoqing,

On 13.02.21 01:49, Guoqing Jiang wrote:
> Unregister sync_thread doesn't need to hold reconfig_mutex since it
> doesn't reconfigure array.
> 
> And it could cause deadlock problem for raid5 as follows:
> 
> 1. process A tried to reap sync thread with reconfig_mutex held after echo
>     idle to sync_action.
> 2. raid5 sync thread was blocked if there were too many active stripes.
> 3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer)
>     which causes the number of active stripes can't be decreased.
> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
>     to hold reconfig_mutex.
> 
> More details in the link:
> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
> 
> And add one parameter to md_reap_sync_thread since it could be called by
> dm-raid which doesn't hold reconfig_mutex.
> 
> Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
> ---
> V2:
> 1. add one parameter to md_reap_sync_thread per Jack's suggestion.
> 
>   drivers/md/dm-raid.c |  2 +-
>   drivers/md/md.c      | 14 +++++++++-----
>   drivers/md/md.h      |  2 +-
>   3 files changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> index cab12b2..0c4cbba 100644
> --- a/drivers/md/dm-raid.c
> +++ b/drivers/md/dm-raid.c
> @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
>   	if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
>   		if (mddev->sync_thread) {
>   			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> -			md_reap_sync_thread(mddev);
> +			md_reap_sync_thread(mddev, false);
>   		}
>   	} else if (decipher_sync_action(mddev, mddev->recovery) != st_idle)
>   		return -EBUSY;
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index ca40942..0c12b7f 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -4857,7 +4857,7 @@ action_store(struct mddev *mddev, const char *page, size_t len)
>   				flush_workqueue(md_misc_wq);
>   			if (mddev->sync_thread) {
>   				set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> -				md_reap_sync_thread(mddev);
> +				md_reap_sync_thread(mddev, true);
>   			}
>   			mddev_unlock(mddev);
>   		}
> @@ -6234,7 +6234,7 @@ static void __md_stop_writes(struct mddev *mddev)
>   		flush_workqueue(md_misc_wq);
>   	if (mddev->sync_thread) {
>   		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> -		md_reap_sync_thread(mddev);
> +		md_reap_sync_thread(mddev, true);
>   	}
>   
>   	del_timer_sync(&mddev->safemode_timer);
> @@ -9256,7 +9256,7 @@ void md_check_recovery(struct mddev *mddev)
>   			 * ->spare_active and clear saved_raid_disk
>   			 */
>   			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> -			md_reap_sync_thread(mddev);
> +			md_reap_sync_thread(mddev, true);
>   			clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>   			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>   			clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
> @@ -9291,7 +9291,7 @@ void md_check_recovery(struct mddev *mddev)
>   			goto unlock;
>   		}
>   		if (mddev->sync_thread) {
> -			md_reap_sync_thread(mddev);
> +			md_reap_sync_thread(mddev, true);
>   			goto unlock;
>   		}
>   		/* Set RUNNING before clearing NEEDED to avoid
> @@ -9364,14 +9364,18 @@ void md_check_recovery(struct mddev *mddev)
>   }
>   EXPORT_SYMBOL(md_check_recovery);
>   
> -void md_reap_sync_thread(struct mddev *mddev)
> +void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held)
>   {
>   	struct md_rdev *rdev;
>   	sector_t old_dev_sectors = mddev->dev_sectors;
>   	bool is_reshaped = false;
>   
>   	/* resync has finished, collect result */
> +	if (reconfig_mutex_held)
> +		mddev_unlock(mddev);


If one thread got here, e.g. via action_store( /* "idle" */ ), now that the mutex is unlocked, is there anything which would prevent another thread getting  here as well, e.g. via the same path?

If not, they both might call

>   	md_unregister_thread(&mddev->sync_thread);

Which is not reentrant:

void md_unregister_thread(struct md_thread **threadp)
{
	struct md_thread *thread = *threadp;
	if (!thread)
		return;
	pr_debug("interrupting MD-thread pid %d\n", task_pid_nr(thread->tsk));
	/* Locking ensures that mddev_unlock does not wake_up a
	 * non-existent thread
	 */
	spin_lock(&pers_lock);
	*threadp = NULL;
	spin_unlock(&pers_lock);

	kthread_stop(thread->tsk);
	kfree(thread);
}

This might be a preexisting problem, because the call site in dm-raid.c, which you updated to `md_reap_sync_thread(mddev, false);`, didn't hold the mutex anyway.

Am I missing something? Probably, I do.

Otherwise: Move the deref of threadp in md_unregister_thread() into the spinlock scope?

Best
   Donald

> +	if (reconfig_mutex_held)
> +		mddev_lock_nointr(mddev);
>   	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
>   	    !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
>   	    mddev->degraded != mddev->raid_disks) {
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 34070ab..7ae3d98 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -705,7 +705,7 @@ extern struct md_thread *md_register_thread(
>   extern void md_unregister_thread(struct md_thread **threadp);
>   extern void md_wakeup_thread(struct md_thread *thread);
>   extern void md_check_recovery(struct mddev *mddev);
> -extern void md_reap_sync_thread(struct mddev *mddev);
> +extern void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held);
>   extern int mddev_init_writes_pending(struct mddev *mddev);
>   extern bool md_write_start(struct mddev *mddev, struct bio *bi);
>   extern void md_write_inc(struct mddev *mddev, struct bio *bi);
> 

-- 
Donald Buczek
buczek@molgen.mpg.de
Tel: +49 30 8413 1433

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
  2021-12-10 14:16   ` [dm-devel] " Donald Buczek
@ 2021-12-14  2:34     ` Guoqing Jiang
  -1 siblings, 0 replies; 37+ messages in thread
From: Guoqing Jiang @ 2021-12-14  2:34 UTC (permalink / raw)
  To: Donald Buczek, song; +Cc: agk, snitzer, linux-raid, dm-devel, Paul Menzel



On 12/10/21 10:16 PM, Donald Buczek wrote:
> Dear Guoqing,
>
> On 13.02.21 01:49, Guoqing Jiang wrote:
>> Unregister sync_thread doesn't need to hold reconfig_mutex since it
>> doesn't reconfigure array.
>>
>> And it could cause deadlock problem for raid5 as follows:
>>
>> 1. process A tried to reap sync thread with reconfig_mutex held after 
>> echo
>>     idle to sync_action.
>> 2. raid5 sync thread was blocked if there were too many active stripes.
>> 3. SB_CHANGE_PENDING was set (because of write IO comes from upper 
>> layer)
>>     which causes the number of active stripes can't be decreased.
>> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not 
>> able
>>     to hold reconfig_mutex.
>>
>> More details in the link:
>> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t 
>>
>>
>> And add one parameter to md_reap_sync_thread since it could be called by
>> dm-raid which doesn't hold reconfig_mutex.
>>
>> Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
>> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
>> ---
>> V2:
>> 1. add one parameter to md_reap_sync_thread per Jack's suggestion.
>>
>>   drivers/md/dm-raid.c |  2 +-
>>   drivers/md/md.c      | 14 +++++++++-----
>>   drivers/md/md.h      |  2 +-
>>   3 files changed, 11 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
>> index cab12b2..0c4cbba 100644
>> --- a/drivers/md/dm-raid.c
>> +++ b/drivers/md/dm-raid.c
>> @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, 
>> unsigned int argc, char **argv,
>>       if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], 
>> "frozen")) {
>>           if (mddev->sync_thread) {
>>               set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -            md_reap_sync_thread(mddev);
>> +            md_reap_sync_thread(mddev, false);
>>           }
>>       } else if (decipher_sync_action(mddev, mddev->recovery) != 
>> st_idle)
>>           return -EBUSY;
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index ca40942..0c12b7f 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -4857,7 +4857,7 @@ action_store(struct mddev *mddev, const char 
>> *page, size_t len)
>>                   flush_workqueue(md_misc_wq);
>>               if (mddev->sync_thread) {
>>                   set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -                md_reap_sync_thread(mddev);
>> +                md_reap_sync_thread(mddev, true);
>>               }
>>               mddev_unlock(mddev);
>>           }
>> @@ -6234,7 +6234,7 @@ static void __md_stop_writes(struct mddev *mddev)
>>           flush_workqueue(md_misc_wq);
>>       if (mddev->sync_thread) {
>>           set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -        md_reap_sync_thread(mddev);
>> +        md_reap_sync_thread(mddev, true);
>>       }
>>         del_timer_sync(&mddev->safemode_timer);
>> @@ -9256,7 +9256,7 @@ void md_check_recovery(struct mddev *mddev)
>>                * ->spare_active and clear saved_raid_disk
>>                */
>>               set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -            md_reap_sync_thread(mddev);
>> +            md_reap_sync_thread(mddev, true);
>>               clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>>               clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>               clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
>> @@ -9291,7 +9291,7 @@ void md_check_recovery(struct mddev *mddev)
>>               goto unlock;
>>           }
>>           if (mddev->sync_thread) {
>> -            md_reap_sync_thread(mddev);
>> +            md_reap_sync_thread(mddev, true);
>>               goto unlock;
>>           }
>>           /* Set RUNNING before clearing NEEDED to avoid
>> @@ -9364,14 +9364,18 @@ void md_check_recovery(struct mddev *mddev)
>>   }
>>   EXPORT_SYMBOL(md_check_recovery);
>>   -void md_reap_sync_thread(struct mddev *mddev)
>> +void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held)
>>   {
>>       struct md_rdev *rdev;
>>       sector_t old_dev_sectors = mddev->dev_sectors;
>>       bool is_reshaped = false;
>>         /* resync has finished, collect result */
>> +    if (reconfig_mutex_held)
>> +        mddev_unlock(mddev);
>
>
> If one thread got here, e.g. via action_store( /* "idle" */ ), now 
> that the mutex is unlocked, is there anything which would prevent 
> another thread getting  here as well, e.g. via the same path?
>
> If not, they both might call
>
>> md_unregister_thread(&mddev->sync_thread);
>
> Which is not reentrant:
>
> void md_unregister_thread(struct md_thread **threadp)
> {
>     struct md_thread *thread = *threadp;
>     if (!thread)
>         return;
>     pr_debug("interrupting MD-thread pid %d\n", 
> task_pid_nr(thread->tsk));
>     /* Locking ensures that mddev_unlock does not wake_up a
>      * non-existent thread
>      */
>     spin_lock(&pers_lock);
>     *threadp = NULL;
>     spin_unlock(&pers_lock);
>
>     kthread_stop(thread->tsk);
>     kfree(thread);
> }
>
> This might be a preexisting problem, because the call site in 
> dm-raid.c, which you updated to `md_reap_sync_thread(mddev, false);`, 
> didn't hold the mutex anyway.
>
> Am I missing something? Probably, I do.
>
> Otherwise: Move the deref of threadp in md_unregister_thread() into 
> the spinlock scope?

Good point, I think you are right.

And actually pers_lock does extra service to protect accesses to 
mddev->thread (I think it
also suitable for mddev->sync_thread ) when the mutex can't be held. 
Care to send a patch
for it?

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
@ 2021-12-14  2:34     ` Guoqing Jiang
  0 siblings, 0 replies; 37+ messages in thread
From: Guoqing Jiang @ 2021-12-14  2:34 UTC (permalink / raw)
  To: Donald Buczek, song; +Cc: linux-raid, Paul Menzel, dm-devel, snitzer, agk



On 12/10/21 10:16 PM, Donald Buczek wrote:
> Dear Guoqing,
>
> On 13.02.21 01:49, Guoqing Jiang wrote:
>> Unregister sync_thread doesn't need to hold reconfig_mutex since it
>> doesn't reconfigure array.
>>
>> And it could cause deadlock problem for raid5 as follows:
>>
>> 1. process A tried to reap sync thread with reconfig_mutex held after 
>> echo
>>     idle to sync_action.
>> 2. raid5 sync thread was blocked if there were too many active stripes.
>> 3. SB_CHANGE_PENDING was set (because of write IO comes from upper 
>> layer)
>>     which causes the number of active stripes can't be decreased.
>> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not 
>> able
>>     to hold reconfig_mutex.
>>
>> More details in the link:
>> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t 
>>
>>
>> And add one parameter to md_reap_sync_thread since it could be called by
>> dm-raid which doesn't hold reconfig_mutex.
>>
>> Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
>> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
>> ---
>> V2:
>> 1. add one parameter to md_reap_sync_thread per Jack's suggestion.
>>
>>   drivers/md/dm-raid.c |  2 +-
>>   drivers/md/md.c      | 14 +++++++++-----
>>   drivers/md/md.h      |  2 +-
>>   3 files changed, 11 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
>> index cab12b2..0c4cbba 100644
>> --- a/drivers/md/dm-raid.c
>> +++ b/drivers/md/dm-raid.c
>> @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, 
>> unsigned int argc, char **argv,
>>       if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], 
>> "frozen")) {
>>           if (mddev->sync_thread) {
>>               set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -            md_reap_sync_thread(mddev);
>> +            md_reap_sync_thread(mddev, false);
>>           }
>>       } else if (decipher_sync_action(mddev, mddev->recovery) != 
>> st_idle)
>>           return -EBUSY;
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index ca40942..0c12b7f 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -4857,7 +4857,7 @@ action_store(struct mddev *mddev, const char 
>> *page, size_t len)
>>                   flush_workqueue(md_misc_wq);
>>               if (mddev->sync_thread) {
>>                   set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -                md_reap_sync_thread(mddev);
>> +                md_reap_sync_thread(mddev, true);
>>               }
>>               mddev_unlock(mddev);
>>           }
>> @@ -6234,7 +6234,7 @@ static void __md_stop_writes(struct mddev *mddev)
>>           flush_workqueue(md_misc_wq);
>>       if (mddev->sync_thread) {
>>           set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -        md_reap_sync_thread(mddev);
>> +        md_reap_sync_thread(mddev, true);
>>       }
>>         del_timer_sync(&mddev->safemode_timer);
>> @@ -9256,7 +9256,7 @@ void md_check_recovery(struct mddev *mddev)
>>                * ->spare_active and clear saved_raid_disk
>>                */
>>               set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -            md_reap_sync_thread(mddev);
>> +            md_reap_sync_thread(mddev, true);
>>               clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>>               clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>               clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
>> @@ -9291,7 +9291,7 @@ void md_check_recovery(struct mddev *mddev)
>>               goto unlock;
>>           }
>>           if (mddev->sync_thread) {
>> -            md_reap_sync_thread(mddev);
>> +            md_reap_sync_thread(mddev, true);
>>               goto unlock;
>>           }
>>           /* Set RUNNING before clearing NEEDED to avoid
>> @@ -9364,14 +9364,18 @@ void md_check_recovery(struct mddev *mddev)
>>   }
>>   EXPORT_SYMBOL(md_check_recovery);
>>   -void md_reap_sync_thread(struct mddev *mddev)
>> +void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held)
>>   {
>>       struct md_rdev *rdev;
>>       sector_t old_dev_sectors = mddev->dev_sectors;
>>       bool is_reshaped = false;
>>         /* resync has finished, collect result */
>> +    if (reconfig_mutex_held)
>> +        mddev_unlock(mddev);
>
>
> If one thread got here, e.g. via action_store( /* "idle" */ ), now 
> that the mutex is unlocked, is there anything which would prevent 
> another thread getting  here as well, e.g. via the same path?
>
> If not, they both might call
>
>> md_unregister_thread(&mddev->sync_thread);
>
> Which is not reentrant:
>
> void md_unregister_thread(struct md_thread **threadp)
> {
>     struct md_thread *thread = *threadp;
>     if (!thread)
>         return;
>     pr_debug("interrupting MD-thread pid %d\n", 
> task_pid_nr(thread->tsk));
>     /* Locking ensures that mddev_unlock does not wake_up a
>      * non-existent thread
>      */
>     spin_lock(&pers_lock);
>     *threadp = NULL;
>     spin_unlock(&pers_lock);
>
>     kthread_stop(thread->tsk);
>     kfree(thread);
> }
>
> This might be a preexisting problem, because the call site in 
> dm-raid.c, which you updated to `md_reap_sync_thread(mddev, false);`, 
> didn't hold the mutex anyway.
>
> Am I missing something? Probably, I do.
>
> Otherwise: Move the deref of threadp in md_unregister_thread() into 
> the spinlock scope?

Good point, I think you are right.

And actually pers_lock does extra service to protect accesses to 
mddev->thread (I think it
also suitable for mddev->sync_thread ) when the mutex can't be held. 
Care to send a patch
for it?

Thanks,
Guoqing


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
  2021-12-14  2:34     ` [dm-devel] " Guoqing Jiang
@ 2021-12-14  9:31       ` Donald Buczek
  -1 siblings, 0 replies; 37+ messages in thread
From: Donald Buczek @ 2021-12-14  9:31 UTC (permalink / raw)
  To: Guoqing Jiang, song; +Cc: agk, snitzer, linux-raid, dm-devel, Paul Menzel

On 14.12.21 03:34, Guoqing Jiang wrote:
> 
> 
> On 12/10/21 10:16 PM, Donald Buczek wrote:
>> Dear Guoqing,
>>
>> On 13.02.21 01:49, Guoqing Jiang wrote:
>>> Unregister sync_thread doesn't need to hold reconfig_mutex since it
>>> doesn't reconfigure array.
>>>
>>> And it could cause deadlock problem for raid5 as follows:
>>>
>>> 1. process A tried to reap sync thread with reconfig_mutex held after echo
>>>     idle to sync_action.
>>> 2. raid5 sync thread was blocked if there were too many active stripes.
>>> 3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer)
>>>     which causes the number of active stripes can't be decreased.
>>> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
>>>     to hold reconfig_mutex.
>>>
>>> More details in the link:
>>> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
>>>
>>> And add one parameter to md_reap_sync_thread since it could be called by
>>> dm-raid which doesn't hold reconfig_mutex.
>>>
>>> Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
>>> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
>>> ---
>>> V2:
>>> 1. add one parameter to md_reap_sync_thread per Jack's suggestion.
>>>
>>>   drivers/md/dm-raid.c |  2 +-
>>>   drivers/md/md.c      | 14 +++++++++-----
>>>   drivers/md/md.h      |  2 +-
>>>   3 files changed, 11 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
>>> index cab12b2..0c4cbba 100644
>>> --- a/drivers/md/dm-raid.c
>>> +++ b/drivers/md/dm-raid.c
>>> @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
>>>       if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
>>>           if (mddev->sync_thread) {
>>>               set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>>> -            md_reap_sync_thread(mddev);
>>> +            md_reap_sync_thread(mddev, false);
>>>           }
>>>       } else if (decipher_sync_action(mddev, mddev->recovery) != st_idle)
>>>           return -EBUSY;
>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>> index ca40942..0c12b7f 100644
>>> --- a/drivers/md/md.c
>>> +++ b/drivers/md/md.c
>>> @@ -4857,7 +4857,7 @@ action_store(struct mddev *mddev, const char *page, size_t len)
>>>                   flush_workqueue(md_misc_wq);
>>>               if (mddev->sync_thread) {
>>>                   set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>>> -                md_reap_sync_thread(mddev);
>>> +                md_reap_sync_thread(mddev, true);
>>>               }
>>>               mddev_unlock(mddev);
>>>           }
>>> @@ -6234,7 +6234,7 @@ static void __md_stop_writes(struct mddev *mddev)
>>>           flush_workqueue(md_misc_wq);
>>>       if (mddev->sync_thread) {
>>>           set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>>> -        md_reap_sync_thread(mddev);
>>> +        md_reap_sync_thread(mddev, true);
>>>       }
>>>         del_timer_sync(&mddev->safemode_timer);
>>> @@ -9256,7 +9256,7 @@ void md_check_recovery(struct mddev *mddev)
>>>                * ->spare_active and clear saved_raid_disk
>>>                */
>>>               set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>>> -            md_reap_sync_thread(mddev);
>>> +            md_reap_sync_thread(mddev, true);
>>>               clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>>>               clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>>               clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
>>> @@ -9291,7 +9291,7 @@ void md_check_recovery(struct mddev *mddev)
>>>               goto unlock;
>>>           }
>>>           if (mddev->sync_thread) {
>>> -            md_reap_sync_thread(mddev);
>>> +            md_reap_sync_thread(mddev, true);
>>>               goto unlock;
>>>           }
>>>           /* Set RUNNING before clearing NEEDED to avoid
>>> @@ -9364,14 +9364,18 @@ void md_check_recovery(struct mddev *mddev)
>>>   }
>>>   EXPORT_SYMBOL(md_check_recovery);
>>>   -void md_reap_sync_thread(struct mddev *mddev)
>>> +void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held)
>>>   {
>>>       struct md_rdev *rdev;
>>>       sector_t old_dev_sectors = mddev->dev_sectors;
>>>       bool is_reshaped = false;
>>>         /* resync has finished, collect result */
>>> +    if (reconfig_mutex_held)
>>> +        mddev_unlock(mddev);
>>
>>
>> If one thread got here, e.g. via action_store( /* "idle" */ ), now that the mutex is unlocked, is there anything which would prevent another thread getting  here as well, e.g. via the same path?
>>
>> If not, they both might call
>>
>>> md_unregister_thread(&mddev->sync_thread);
>>
>> Which is not reentrant:
>>
>> void md_unregister_thread(struct md_thread **threadp)
>> {
>>     struct md_thread *thread = *threadp;
>>     if (!thread)
>>         return;
>>     pr_debug("interrupting MD-thread pid %d\n", task_pid_nr(thread->tsk));
>>     /* Locking ensures that mddev_unlock does not wake_up a
>>      * non-existent thread
>>      */
>>     spin_lock(&pers_lock);
>>     *threadp = NULL;
>>     spin_unlock(&pers_lock);
>>
>>     kthread_stop(thread->tsk);
>>     kfree(thread);
>> }
>>
>> This might be a preexisting problem, because the call site in dm-raid.c, which you updated to `md_reap_sync_thread(mddev, false);`, didn't hold the mutex anyway.
>>
>> Am I missing something? Probably, I do.
>>
>> Otherwise: Move the deref of threadp in md_unregister_thread() into the spinlock scope?
> 
> Good point, I think you are right.
> 
> And actually pers_lock does extra service to protect accesses to mddev->thread (I think it
> also suitable for mddev->sync_thread ) when the mutex can't be held. Care to send a patch
> for it?

I'm really sorry, but it's one thing to point to a possible problem and another thing to come up with a correct solution.

While I think it would be easy to avoid the double free with the spinlock (or maybe atomic RMW) , we surely don't want to hold the spinlock while we are sleeping in kthread_stop(). If we don't hold some kind of lock, what are the side effects of another sync thread being started or any other reconfiguration? Are the existing flags enough to protect us from this? If we do want to hold the lock while waiting for the thread to terminate, should it be made into a mutex? If so, it probably shouldn't be static but moved into the mddev structure. I'd need weeks if not month to figure that out and to feel bold enough to post it.

I don't want to push work to others, but my own my understanding of md is to narrow.

Best

   Donald

> 
> Thanks,
> Guoqing

-- 
Donald Buczek
buczek@molgen.mpg.de
Tel: +49 30 8413 1433

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
@ 2021-12-14  9:31       ` Donald Buczek
  0 siblings, 0 replies; 37+ messages in thread
From: Donald Buczek @ 2021-12-14  9:31 UTC (permalink / raw)
  To: Guoqing Jiang, song; +Cc: linux-raid, Paul Menzel, dm-devel, snitzer, agk

On 14.12.21 03:34, Guoqing Jiang wrote:
> 
> 
> On 12/10/21 10:16 PM, Donald Buczek wrote:
>> Dear Guoqing,
>>
>> On 13.02.21 01:49, Guoqing Jiang wrote:
>>> Unregister sync_thread doesn't need to hold reconfig_mutex since it
>>> doesn't reconfigure array.
>>>
>>> And it could cause deadlock problem for raid5 as follows:
>>>
>>> 1. process A tried to reap sync thread with reconfig_mutex held after echo
>>>     idle to sync_action.
>>> 2. raid5 sync thread was blocked if there were too many active stripes.
>>> 3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer)
>>>     which causes the number of active stripes can't be decreased.
>>> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
>>>     to hold reconfig_mutex.
>>>
>>> More details in the link:
>>> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
>>>
>>> And add one parameter to md_reap_sync_thread since it could be called by
>>> dm-raid which doesn't hold reconfig_mutex.
>>>
>>> Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
>>> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
>>> ---
>>> V2:
>>> 1. add one parameter to md_reap_sync_thread per Jack's suggestion.
>>>
>>>   drivers/md/dm-raid.c |  2 +-
>>>   drivers/md/md.c      | 14 +++++++++-----
>>>   drivers/md/md.h      |  2 +-
>>>   3 files changed, 11 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
>>> index cab12b2..0c4cbba 100644
>>> --- a/drivers/md/dm-raid.c
>>> +++ b/drivers/md/dm-raid.c
>>> @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
>>>       if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
>>>           if (mddev->sync_thread) {
>>>               set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>>> -            md_reap_sync_thread(mddev);
>>> +            md_reap_sync_thread(mddev, false);
>>>           }
>>>       } else if (decipher_sync_action(mddev, mddev->recovery) != st_idle)
>>>           return -EBUSY;
>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>> index ca40942..0c12b7f 100644
>>> --- a/drivers/md/md.c
>>> +++ b/drivers/md/md.c
>>> @@ -4857,7 +4857,7 @@ action_store(struct mddev *mddev, const char *page, size_t len)
>>>                   flush_workqueue(md_misc_wq);
>>>               if (mddev->sync_thread) {
>>>                   set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>>> -                md_reap_sync_thread(mddev);
>>> +                md_reap_sync_thread(mddev, true);
>>>               }
>>>               mddev_unlock(mddev);
>>>           }
>>> @@ -6234,7 +6234,7 @@ static void __md_stop_writes(struct mddev *mddev)
>>>           flush_workqueue(md_misc_wq);
>>>       if (mddev->sync_thread) {
>>>           set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>>> -        md_reap_sync_thread(mddev);
>>> +        md_reap_sync_thread(mddev, true);
>>>       }
>>>         del_timer_sync(&mddev->safemode_timer);
>>> @@ -9256,7 +9256,7 @@ void md_check_recovery(struct mddev *mddev)
>>>                * ->spare_active and clear saved_raid_disk
>>>                */
>>>               set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>>> -            md_reap_sync_thread(mddev);
>>> +            md_reap_sync_thread(mddev, true);
>>>               clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>>>               clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>>               clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
>>> @@ -9291,7 +9291,7 @@ void md_check_recovery(struct mddev *mddev)
>>>               goto unlock;
>>>           }
>>>           if (mddev->sync_thread) {
>>> -            md_reap_sync_thread(mddev);
>>> +            md_reap_sync_thread(mddev, true);
>>>               goto unlock;
>>>           }
>>>           /* Set RUNNING before clearing NEEDED to avoid
>>> @@ -9364,14 +9364,18 @@ void md_check_recovery(struct mddev *mddev)
>>>   }
>>>   EXPORT_SYMBOL(md_check_recovery);
>>>   -void md_reap_sync_thread(struct mddev *mddev)
>>> +void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held)
>>>   {
>>>       struct md_rdev *rdev;
>>>       sector_t old_dev_sectors = mddev->dev_sectors;
>>>       bool is_reshaped = false;
>>>         /* resync has finished, collect result */
>>> +    if (reconfig_mutex_held)
>>> +        mddev_unlock(mddev);
>>
>>
>> If one thread got here, e.g. via action_store( /* "idle" */ ), now that the mutex is unlocked, is there anything which would prevent another thread getting  here as well, e.g. via the same path?
>>
>> If not, they both might call
>>
>>> md_unregister_thread(&mddev->sync_thread);
>>
>> Which is not reentrant:
>>
>> void md_unregister_thread(struct md_thread **threadp)
>> {
>>     struct md_thread *thread = *threadp;
>>     if (!thread)
>>         return;
>>     pr_debug("interrupting MD-thread pid %d\n", task_pid_nr(thread->tsk));
>>     /* Locking ensures that mddev_unlock does not wake_up a
>>      * non-existent thread
>>      */
>>     spin_lock(&pers_lock);
>>     *threadp = NULL;
>>     spin_unlock(&pers_lock);
>>
>>     kthread_stop(thread->tsk);
>>     kfree(thread);
>> }
>>
>> This might be a preexisting problem, because the call site in dm-raid.c, which you updated to `md_reap_sync_thread(mddev, false);`, didn't hold the mutex anyway.
>>
>> Am I missing something? Probably, I do.
>>
>> Otherwise: Move the deref of threadp in md_unregister_thread() into the spinlock scope?
> 
> Good point, I think you are right.
> 
> And actually pers_lock does extra service to protect accesses to mddev->thread (I think it
> also suitable for mddev->sync_thread ) when the mutex can't be held. Care to send a patch
> for it?

I'm really sorry, but it's one thing to point to a possible problem and another thing to come up with a correct solution.

While I think it would be easy to avoid the double free with the spinlock (or maybe atomic RMW) , we surely don't want to hold the spinlock while we are sleeping in kthread_stop(). If we don't hold some kind of lock, what are the side effects of another sync thread being started or any other reconfiguration? Are the existing flags enough to protect us from this? If we do want to hold the lock while waiting for the thread to terminate, should it be made into a mutex? If so, it probably shouldn't be static but moved into the mddev structure. I'd need weeks if not month to figure that out and to feel bold enough to post it.

I don't want to push work to others, but my own my understanding of md is to narrow.

Best

   Donald

> 
> Thanks,
> Guoqing

-- 
Donald Buczek
buczek@molgen.mpg.de
Tel: +49 30 8413 1433


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
  2021-12-14  9:31       ` [dm-devel] " Donald Buczek
@ 2021-12-14 10:03         ` Guoqing Jiang
  -1 siblings, 0 replies; 37+ messages in thread
From: Guoqing Jiang @ 2021-12-14 10:03 UTC (permalink / raw)
  To: Donald Buczek, song; +Cc: agk, snitzer, linux-raid, dm-devel, Paul Menzel



On 12/14/21 5:31 PM, Donald Buczek wrote:

>>>>   -void md_reap_sync_thread(struct mddev *mddev)
>>>> +void md_reap_sync_thread(struct mddev *mddev, bool 
>>>> reconfig_mutex_held)
>>>>   {
>>>>       struct md_rdev *rdev;
>>>>       sector_t old_dev_sectors = mddev->dev_sectors;
>>>>       bool is_reshaped = false;
>>>>         /* resync has finished, collect result */
>>>> +    if (reconfig_mutex_held)
>>>> +        mddev_unlock(mddev);
>>>
>>>
>>> If one thread got here, e.g. via action_store( /* "idle" */ ), now 
>>> that the mutex is unlocked, is there anything which would prevent 
>>> another thread getting  here as well, e.g. via the same path?
>>>
>>> If not, they both might call
>>>
>>>> md_unregister_thread(&mddev->sync_thread);
>>>
>>> Which is not reentrant:
>>>
>>> void md_unregister_thread(struct md_thread **threadp)
>>> {
>>>     struct md_thread *thread = *threadp;
>>>     if (!thread)
>>>         return;
>>>     pr_debug("interrupting MD-thread pid %d\n", 
>>> task_pid_nr(thread->tsk));
>>>     /* Locking ensures that mddev_unlock does not wake_up a
>>>      * non-existent thread
>>>      */
>>>     spin_lock(&pers_lock);
>>>     *threadp = NULL;
>>>     spin_unlock(&pers_lock);
>>>
>>>     kthread_stop(thread->tsk);
>>>     kfree(thread);
>>> }
>>>
>>> This might be a preexisting problem, because the call site in 
>>> dm-raid.c, which you updated to `md_reap_sync_thread(mddev, 
>>> false);`, didn't hold the mutex anyway.
>>>
>>> Am I missing something? Probably, I do.
>>>
>>> Otherwise: Move the deref of threadp in md_unregister_thread() into 
>>> the spinlock scope?
>>
>> Good point, I think you are right.
>>
>> And actually pers_lock does extra service to protect accesses to 
>> mddev->thread (I think it
>> also suitable for mddev->sync_thread ) when the mutex can't be held. 
>> Care to send a patch
>> for it?
>
> I'm really sorry, but it's one thing to point to a possible problem 
> and another thing to come up with a correct solution.

Yes, it is often the reality of life, and we can always correct 
ourselves if there is problem 😎.

> While I think it would be easy to avoid the double free with the 
> spinlock (or maybe atomic RMW) , we surely don't want to hold the 
> spinlock while we are sleeping in kthread_stop(). If we don't hold 
> some kind of lock, what are the side effects of another sync thread 
> being started or any other reconfiguration? Are the existing flags 
> enough to protect us from this? If we do want to hold the lock while 
> waiting for the thread to terminate, should it be made into a mutex? 
> If so, it probably shouldn't be static but moved into the mddev 
> structure. I'd need weeks if not month to figure that out and to feel 
> bold enough to post it.

Thanks for deep thinking about it, I think we can avoid to call 
kthread_stop with spinlock
held. Maybe something like this, but just my raw idea, please have a 
thorough review.

  void md_unregister_thread(struct md_thread **threadp)
  {
-       struct md_thread *thread = *threadp;
-       if (!thread)
+       struct md_thread *thread = READ_ONCE(*threadp);
+
+       spin_lock(&pers_lock);
+       if (!thread) {
+               spin_unlock(&pers_lock);
                 return;
+       }
         pr_debug("interrupting MD-thread pid %d\n", 
task_pid_nr(thread->tsk));
         /* Locking ensures that mddev_unlock does not wake_up a
          * non-existent thread
          */
-       spin_lock(&pers_lock);
         *threadp = NULL;
         spin_unlock(&pers_lock);

-       kthread_stop(thread->tsk);
+       if (IS_ERR_OR_NULL(thread->tsk)) {
+               kthread_stop(thread->tsk);
+               thread->tsk = NULL;
+       }
         kfree(thread);
  }
  EXPORT_SYMBOL(md_unregister_thread);

> I don't want to push work to others, but my own my understanding of md 
> is to narrow.

Me either 😉

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
@ 2021-12-14 10:03         ` Guoqing Jiang
  0 siblings, 0 replies; 37+ messages in thread
From: Guoqing Jiang @ 2021-12-14 10:03 UTC (permalink / raw)
  To: Donald Buczek, song; +Cc: linux-raid, Paul Menzel, dm-devel, snitzer, agk



On 12/14/21 5:31 PM, Donald Buczek wrote:

>>>>   -void md_reap_sync_thread(struct mddev *mddev)
>>>> +void md_reap_sync_thread(struct mddev *mddev, bool 
>>>> reconfig_mutex_held)
>>>>   {
>>>>       struct md_rdev *rdev;
>>>>       sector_t old_dev_sectors = mddev->dev_sectors;
>>>>       bool is_reshaped = false;
>>>>         /* resync has finished, collect result */
>>>> +    if (reconfig_mutex_held)
>>>> +        mddev_unlock(mddev);
>>>
>>>
>>> If one thread got here, e.g. via action_store( /* "idle" */ ), now 
>>> that the mutex is unlocked, is there anything which would prevent 
>>> another thread getting  here as well, e.g. via the same path?
>>>
>>> If not, they both might call
>>>
>>>> md_unregister_thread(&mddev->sync_thread);
>>>
>>> Which is not reentrant:
>>>
>>> void md_unregister_thread(struct md_thread **threadp)
>>> {
>>>     struct md_thread *thread = *threadp;
>>>     if (!thread)
>>>         return;
>>>     pr_debug("interrupting MD-thread pid %d\n", 
>>> task_pid_nr(thread->tsk));
>>>     /* Locking ensures that mddev_unlock does not wake_up a
>>>      * non-existent thread
>>>      */
>>>     spin_lock(&pers_lock);
>>>     *threadp = NULL;
>>>     spin_unlock(&pers_lock);
>>>
>>>     kthread_stop(thread->tsk);
>>>     kfree(thread);
>>> }
>>>
>>> This might be a preexisting problem, because the call site in 
>>> dm-raid.c, which you updated to `md_reap_sync_thread(mddev, 
>>> false);`, didn't hold the mutex anyway.
>>>
>>> Am I missing something? Probably, I do.
>>>
>>> Otherwise: Move the deref of threadp in md_unregister_thread() into 
>>> the spinlock scope?
>>
>> Good point, I think you are right.
>>
>> And actually pers_lock does extra service to protect accesses to 
>> mddev->thread (I think it
>> also suitable for mddev->sync_thread ) when the mutex can't be held. 
>> Care to send a patch
>> for it?
>
> I'm really sorry, but it's one thing to point to a possible problem 
> and another thing to come up with a correct solution.

Yes, it is often the reality of life, and we can always correct 
ourselves if there is problem 😎.

> While I think it would be easy to avoid the double free with the 
> spinlock (or maybe atomic RMW) , we surely don't want to hold the 
> spinlock while we are sleeping in kthread_stop(). If we don't hold 
> some kind of lock, what are the side effects of another sync thread 
> being started or any other reconfiguration? Are the existing flags 
> enough to protect us from this? If we do want to hold the lock while 
> waiting for the thread to terminate, should it be made into a mutex? 
> If so, it probably shouldn't be static but moved into the mddev 
> structure. I'd need weeks if not month to figure that out and to feel 
> bold enough to post it.

Thanks for deep thinking about it, I think we can avoid to call 
kthread_stop with spinlock
held. Maybe something like this, but just my raw idea, please have a 
thorough review.

  void md_unregister_thread(struct md_thread **threadp)
  {
-       struct md_thread *thread = *threadp;
-       if (!thread)
+       struct md_thread *thread = READ_ONCE(*threadp);
+
+       spin_lock(&pers_lock);
+       if (!thread) {
+               spin_unlock(&pers_lock);
                 return;
+       }
         pr_debug("interrupting MD-thread pid %d\n", 
task_pid_nr(thread->tsk));
         /* Locking ensures that mddev_unlock does not wake_up a
          * non-existent thread
          */
-       spin_lock(&pers_lock);
         *threadp = NULL;
         spin_unlock(&pers_lock);

-       kthread_stop(thread->tsk);
+       if (IS_ERR_OR_NULL(thread->tsk)) {
+               kthread_stop(thread->tsk);
+               thread->tsk = NULL;
+       }
         kfree(thread);
  }
  EXPORT_SYMBOL(md_unregister_thread);

> I don't want to push work to others, but my own my understanding of md 
> is to narrow.

Me either 😉

Thanks,
Guoqing


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
  2021-12-14 10:03         ` [dm-devel] " Guoqing Jiang
@ 2021-12-14 12:21           ` Donald Buczek
  -1 siblings, 0 replies; 37+ messages in thread
From: Donald Buczek @ 2021-12-14 12:21 UTC (permalink / raw)
  To: Guoqing Jiang, song; +Cc: agk, snitzer, linux-raid, dm-devel, Paul Menzel

On 14.12.21 11:03, Guoqing Jiang wrote:
> 
> 
> On 12/14/21 5:31 PM, Donald Buczek wrote:
> 
>>>>>   -void md_reap_sync_thread(struct mddev *mddev)
>>>>> +void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held)
>>>>>   {
>>>>>       struct md_rdev *rdev;
>>>>>       sector_t old_dev_sectors = mddev->dev_sectors;
>>>>>       bool is_reshaped = false;
>>>>>         /* resync has finished, collect result */
>>>>> +    if (reconfig_mutex_held)
>>>>> +        mddev_unlock(mddev);
>>>>
>>>>
>>>> If one thread got here, e.g. via action_store( /* "idle" */ ), now that the mutex is unlocked, is there anything which would prevent another thread getting  here as well, e.g. via the same path?
>>>>
>>>> If not, they both might call
>>>>
>>>>> md_unregister_thread(&mddev->sync_thread);
>>>>
>>>> Which is not reentrant:
>>>>
>>>> void md_unregister_thread(struct md_thread **threadp)
>>>> {
>>>>     struct md_thread *thread = *threadp;
>>>>     if (!thread)
>>>>         return;
>>>>     pr_debug("interrupting MD-thread pid %d\n", task_pid_nr(thread->tsk));
>>>>     /* Locking ensures that mddev_unlock does not wake_up a
>>>>      * non-existent thread
>>>>      */
>>>>     spin_lock(&pers_lock);
>>>>     *threadp = NULL;
>>>>     spin_unlock(&pers_lock);
>>>>
>>>>     kthread_stop(thread->tsk);
>>>>     kfree(thread);
>>>> }
>>>>
>>>> This might be a preexisting problem, because the call site in dm-raid.c, which you updated to `md_reap_sync_thread(mddev, false);`, didn't hold the mutex anyway.
>>>>
>>>> Am I missing something? Probably, I do.
>>>>
>>>> Otherwise: Move the deref of threadp in md_unregister_thread() into the spinlock scope?
>>>
>>> Good point, I think you are right.
>>>
>>> And actually pers_lock does extra service to protect accesses to mddev->thread (I think it
>>> also suitable for mddev->sync_thread ) when the mutex can't be held. Care to send a patch
>>> for it?
>>
>> I'm really sorry, but it's one thing to point to a possible problem and another thing to come up with a correct solution.
> 
> Yes, it is often the reality of life, and we can always correct ourselves if there is problem 😎.
> 
>> While I think it would be easy to avoid the double free with the spinlock (or maybe atomic RMW) , we surely don't want to hold the spinlock while we are sleeping in kthread_stop(). If we don't hold some kind of lock, what are the side effects of another sync thread being started or any other reconfiguration? Are the existing flags enough to protect us from this? If we do want to hold the lock while waiting for the thread to terminate, should it be made into a mutex? If so, it probably shouldn't be static but moved into the mddev structure. I'd need weeks if not month to figure that out and to feel bold enough to post it.
> 
> Thanks for deep thinking about it, I think we can avoid to call kthread_stop with spinlock
> held. Maybe something like this, but just my raw idea, please have a thorough review.
> 
>   void md_unregister_thread(struct md_thread **threadp)
>   {
> -       struct md_thread *thread = *threadp;
> -       if (!thread)
> +       struct md_thread *thread = READ_ONCE(*threadp);

- The access to *threadp needs to be after the spin_lock(). Otherwise two CPUs might read non-NULL here.

- If it was after spin_lock(), I think (!= I know), we don't need the READ_ONCE, because spin_lock() implies a compiler barrier.

> +
> +       spin_lock(&pers_lock);
> +       if (!thread) {
> +               spin_unlock(&pers_lock);
>                  return;
> +       }
>          pr_debug("interrupting MD-thread pid %d\n", task_pid_nr(thread->tsk));
>          /* Locking ensures that mddev_unlock does not wake_up a
>           * non-existent thread
>           */
> -       spin_lock(&pers_lock);
>          *threadp = NULL;
>          spin_unlock(&pers_lock);
> 
> -       kthread_stop(thread->tsk);
> +       if (IS_ERR_OR_NULL(thread->tsk)) {

- Test accidentally negated? This test is new. Is this an unrelated change? Anyway, I don't get it.

> +               kthread_stop(thread->tsk);
> +               thread->tsk = NULL;

- This assignment can't be required, because we are going to kfree(thread) in the next line.

> +       }
>          kfree(thread);
>   }
>   EXPORT_SYMBOL(md_unregister_thread);
> 
>> I don't want to push work to others, but my own my understanding of md is to narrow.
> 
> Me either 😉
> 
> Thanks,
> Guoqing

Best

   Donald

-- 
Donald Buczek
buczek@molgen.mpg.de
Tel: +49 30 8413 1433

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
@ 2021-12-14 12:21           ` Donald Buczek
  0 siblings, 0 replies; 37+ messages in thread
From: Donald Buczek @ 2021-12-14 12:21 UTC (permalink / raw)
  To: Guoqing Jiang, song; +Cc: linux-raid, Paul Menzel, dm-devel, snitzer, agk

On 14.12.21 11:03, Guoqing Jiang wrote:
> 
> 
> On 12/14/21 5:31 PM, Donald Buczek wrote:
> 
>>>>>   -void md_reap_sync_thread(struct mddev *mddev)
>>>>> +void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held)
>>>>>   {
>>>>>       struct md_rdev *rdev;
>>>>>       sector_t old_dev_sectors = mddev->dev_sectors;
>>>>>       bool is_reshaped = false;
>>>>>         /* resync has finished, collect result */
>>>>> +    if (reconfig_mutex_held)
>>>>> +        mddev_unlock(mddev);
>>>>
>>>>
>>>> If one thread got here, e.g. via action_store( /* "idle" */ ), now that the mutex is unlocked, is there anything which would prevent another thread getting  here as well, e.g. via the same path?
>>>>
>>>> If not, they both might call
>>>>
>>>>> md_unregister_thread(&mddev->sync_thread);
>>>>
>>>> Which is not reentrant:
>>>>
>>>> void md_unregister_thread(struct md_thread **threadp)
>>>> {
>>>>     struct md_thread *thread = *threadp;
>>>>     if (!thread)
>>>>         return;
>>>>     pr_debug("interrupting MD-thread pid %d\n", task_pid_nr(thread->tsk));
>>>>     /* Locking ensures that mddev_unlock does not wake_up a
>>>>      * non-existent thread
>>>>      */
>>>>     spin_lock(&pers_lock);
>>>>     *threadp = NULL;
>>>>     spin_unlock(&pers_lock);
>>>>
>>>>     kthread_stop(thread->tsk);
>>>>     kfree(thread);
>>>> }
>>>>
>>>> This might be a preexisting problem, because the call site in dm-raid.c, which you updated to `md_reap_sync_thread(mddev, false);`, didn't hold the mutex anyway.
>>>>
>>>> Am I missing something? Probably, I do.
>>>>
>>>> Otherwise: Move the deref of threadp in md_unregister_thread() into the spinlock scope?
>>>
>>> Good point, I think you are right.
>>>
>>> And actually pers_lock does extra service to protect accesses to mddev->thread (I think it
>>> also suitable for mddev->sync_thread ) when the mutex can't be held. Care to send a patch
>>> for it?
>>
>> I'm really sorry, but it's one thing to point to a possible problem and another thing to come up with a correct solution.
> 
> Yes, it is often the reality of life, and we can always correct ourselves if there is problem 😎.
> 
>> While I think it would be easy to avoid the double free with the spinlock (or maybe atomic RMW) , we surely don't want to hold the spinlock while we are sleeping in kthread_stop(). If we don't hold some kind of lock, what are the side effects of another sync thread being started or any other reconfiguration? Are the existing flags enough to protect us from this? If we do want to hold the lock while waiting for the thread to terminate, should it be made into a mutex? If so, it probably shouldn't be static but moved into the mddev structure. I'd need weeks if not month to figure that out and to feel bold enough to post it.
> 
> Thanks for deep thinking about it, I think we can avoid to call kthread_stop with spinlock
> held. Maybe something like this, but just my raw idea, please have a thorough review.
> 
>   void md_unregister_thread(struct md_thread **threadp)
>   {
> -       struct md_thread *thread = *threadp;
> -       if (!thread)
> +       struct md_thread *thread = READ_ONCE(*threadp);

- The access to *threadp needs to be after the spin_lock(). Otherwise two CPUs might read non-NULL here.

- If it was after spin_lock(), I think (!= I know), we don't need the READ_ONCE, because spin_lock() implies a compiler barrier.

> +
> +       spin_lock(&pers_lock);
> +       if (!thread) {
> +               spin_unlock(&pers_lock);
>                  return;
> +       }
>          pr_debug("interrupting MD-thread pid %d\n", task_pid_nr(thread->tsk));
>          /* Locking ensures that mddev_unlock does not wake_up a
>           * non-existent thread
>           */
> -       spin_lock(&pers_lock);
>          *threadp = NULL;
>          spin_unlock(&pers_lock);
> 
> -       kthread_stop(thread->tsk);
> +       if (IS_ERR_OR_NULL(thread->tsk)) {

- Test accidentally negated? This test is new. Is this an unrelated change? Anyway, I don't get it.

> +               kthread_stop(thread->tsk);
> +               thread->tsk = NULL;

- This assignment can't be required, because we are going to kfree(thread) in the next line.

> +       }
>          kfree(thread);
>   }
>   EXPORT_SYMBOL(md_unregister_thread);
> 
>> I don't want to push work to others, but my own my understanding of md is to narrow.
> 
> Me either 😉
> 
> Thanks,
> Guoqing

Best

   Donald

-- 
Donald Buczek
buczek@molgen.mpg.de
Tel: +49 30 8413 1433


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
  2021-02-13  0:49 ` [dm-devel] " Guoqing Jiang
@ 2022-01-11 12:25   ` Mikko Rantalainen
  -1 siblings, 0 replies; 37+ messages in thread
From: Mikko Rantalainen @ 2022-01-11 12:25 UTC (permalink / raw)
  To: Guoqing Jiang, song; +Cc: agk, snitzer, linux-raid, dm-devel

Guoqing Jiang (2021-02-13 02:49 Europe/Helsinki):
> Unregister sync_thread doesn't need to hold reconfig_mutex since it
> doesn't reconfigure array.
> 
> And it could cause deadlock problem for raid5 as follows:
> 
> 1. process A tried to reap sync thread with reconfig_mutex held after echo
>    idle to sync_action.
> 2. raid5 sync thread was blocked if there were too many active stripes.
> 3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer)
>    which causes the number of active stripes can't be decreased.
> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
>    to hold reconfig_mutex.
> 
> More details in the link:
> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t

I don't understand md well enough to suggest a patch but isn't this
logically a classic two lock deadlock problem where

thread 1:
- lock reconfig_mutex
- blocked for sync that requires SB_CHANGE_PENDING

thread 2:
- (logically) acquire lock SB_CHANGE_PENDING
- blocked for reconfig_mutex before SB_CHANGE_PENDING can be released

?

The classic fix for this kind of deadlock is to require these locks to
be always acquired in constant order and released in reverse order.

If you had a rule that SB_CHANGE_PENDING cannot be set or cleared
without already having reconfig_mutex, wouldn't that prevent this
deadlock? (If I understood the issue correctly, it's currently possible
to set but not clear the SB_CHANGE_PENDING without having reconfig_mutex.)

Another possibility is to expect SB_CHANGE_PENDING to be set as part of
sync process required change to "idle" (write to sync_action). In that
case the logic would be you cannot have reconfig_mutex already locked
before setting (logically acquiring lock) SB_CHANGE_PENDING. So the
transfer from active to idle would require first setting
SB_CHANGE_PENDING, doing the required processing (getting and freeing
reconfig_mutex in process) and then clearing SB_CHANGE_PENDING.
Basically the rule would be you must lock SB_CHANGE_PENDING before you
can lock reconfig_mutex.

If I've understood correctly SB_CHANGE_PENDING is not technically a lock
but it's logically used like it were a lock.

Would either of these make sense for the overall design?

Obviously, if it doesn't hurt the performance too much, using a single
lock for everything that needs to be serialized would be much easier.

-- 
Mikko

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [dm-devel] [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
@ 2022-01-11 12:25   ` Mikko Rantalainen
  0 siblings, 0 replies; 37+ messages in thread
From: Mikko Rantalainen @ 2022-01-11 12:25 UTC (permalink / raw)
  To: Guoqing Jiang, song; +Cc: linux-raid, dm-devel, snitzer, agk

Guoqing Jiang (2021-02-13 02:49 Europe/Helsinki):
> Unregister sync_thread doesn't need to hold reconfig_mutex since it
> doesn't reconfigure array.
> 
> And it could cause deadlock problem for raid5 as follows:
> 
> 1. process A tried to reap sync thread with reconfig_mutex held after echo
>    idle to sync_action.
> 2. raid5 sync thread was blocked if there were too many active stripes.
> 3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer)
>    which causes the number of active stripes can't be decreased.
> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
>    to hold reconfig_mutex.
> 
> More details in the link:
> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t

I don't understand md well enough to suggest a patch but isn't this
logically a classic two lock deadlock problem where

thread 1:
- lock reconfig_mutex
- blocked for sync that requires SB_CHANGE_PENDING

thread 2:
- (logically) acquire lock SB_CHANGE_PENDING
- blocked for reconfig_mutex before SB_CHANGE_PENDING can be released

?

The classic fix for this kind of deadlock is to require these locks to
be always acquired in constant order and released in reverse order.

If you had a rule that SB_CHANGE_PENDING cannot be set or cleared
without already having reconfig_mutex, wouldn't that prevent this
deadlock? (If I understood the issue correctly, it's currently possible
to set but not clear the SB_CHANGE_PENDING without having reconfig_mutex.)

Another possibility is to expect SB_CHANGE_PENDING to be set as part of
sync process required change to "idle" (write to sync_action). In that
case the logic would be you cannot have reconfig_mutex already locked
before setting (logically acquiring lock) SB_CHANGE_PENDING. So the
transfer from active to idle would require first setting
SB_CHANGE_PENDING, doing the required processing (getting and freeing
reconfig_mutex in process) and then clearing SB_CHANGE_PENDING.
Basically the rule would be you must lock SB_CHANGE_PENDING before you
can lock reconfig_mutex.

If I've understood correctly SB_CHANGE_PENDING is not technically a lock
but it's logically used like it were a lock.

Would either of these make sense for the overall design?

Obviously, if it doesn't hurt the performance too much, using a single
lock for everything that needs to be serialized would be much easier.

-- 
Mikko

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2022-01-12  6:24 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-13  0:49 [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held Guoqing Jiang
2021-02-13  0:49 ` [dm-devel] " Guoqing Jiang
2021-02-15 11:07 ` Paul Menzel
2021-02-15 11:07   ` [dm-devel] " Paul Menzel
2021-02-24  9:09   ` Song Liu
2021-02-24  9:09     ` [dm-devel] " Song Liu
2021-02-24  9:25     ` Guoqing Jiang
2021-02-24  9:25       ` [dm-devel] " Guoqing Jiang
2021-03-19 23:00       ` Song Liu
2021-03-19 23:00         ` [dm-devel] " Song Liu
2021-11-30 17:25         ` Paul Menzel
2021-11-30 17:25           ` [dm-devel] " Paul Menzel
2021-11-30 17:27           ` Paul Menzel
2021-11-30 17:27             ` [dm-devel] " Paul Menzel
2021-12-08 14:16             ` Guoqing Jiang
2021-12-08 14:16               ` [dm-devel] " Guoqing Jiang
2021-12-08 16:35               ` Heinz Mauelshagen
2021-12-09  0:47                 ` Guoqing Jiang
2021-12-09  0:47                   ` [dm-devel] " Guoqing Jiang
2021-12-09 12:54   ` Donald Buczek
2021-12-09 12:54     ` [dm-devel] " Donald Buczek
2021-12-09 12:57   ` Donald Buczek
2021-12-09 12:57     ` [dm-devel] " Donald Buczek
2021-12-10  1:06     ` Guoqing Jiang
2021-12-10  1:06       ` [dm-devel] " Guoqing Jiang
2021-12-10 14:16 ` Donald Buczek
2021-12-10 14:16   ` [dm-devel] " Donald Buczek
2021-12-14  2:34   ` Guoqing Jiang
2021-12-14  2:34     ` [dm-devel] " Guoqing Jiang
2021-12-14  9:31     ` Donald Buczek
2021-12-14  9:31       ` [dm-devel] " Donald Buczek
2021-12-14 10:03       ` Guoqing Jiang
2021-12-14 10:03         ` [dm-devel] " Guoqing Jiang
2021-12-14 12:21         ` Donald Buczek
2021-12-14 12:21           ` [dm-devel] " Donald Buczek
2022-01-11 12:25 ` Mikko Rantalainen
2022-01-11 12:25   ` [dm-devel] " Mikko Rantalainen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.