All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] md: only unlock mddev from action_store
@ 2022-06-02 13:45 Guoqing Jiang
  2022-06-02 18:03 ` Logan Gunthorpe
  2022-06-03  6:24 ` Donald Buczek
  0 siblings, 2 replies; 12+ messages in thread
From: Guoqing Jiang @ 2022-06-02 13:45 UTC (permalink / raw)
  To: song; +Cc: linux-raid, buczek, logang

The 07reshape5intr test is broke because of below path.

    md_reap_sync_thread
            -> mddev_unlock
            -> md_unregister_thread(&mddev->sync_thread)

And md_check_recovery is triggered by,

mddev_unlock -> md_wakeup_thread(mddev->thread)

then mddev->reshape_position is set to MaxSector in raid5_finish_reshape
since MD_RECOVERY_INTR is cleared in md_check_recovery, which means
feature_map is not set with MD_FEATURE_RESHAPE_ACTIVE and superblock's
reshape_position can't be updated accordingly.

Since the bug which commit 8b48ec23cc51a ("md: don't unregister sync_thread
with reconfig_mutex held") fixed is related with action_store path, other
callers which reap sync_thread didn't need to be changed, let's

1. only unlock mddev in md_reap_sync_thread if caller is action_store,
   so the parameter is renamed to reflect the change.
2. save some contexts (MD_RECOVERY_INTR and reshape_position) since they
   could be changed by other processes, then restore them after get lock
   again.

Fixes: 8b48ec23cc51a ("md: don't unregister sync_thread with reconfig_mutex held")
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
---
I suppose the previous bug still can be fixed with the change, but it is
better to verify it. Donald, could you help to test the new code?

Thanks,
Guoqing

 drivers/md/md.c | 24 ++++++++++++++++++------
 drivers/md/md.h |  2 +-
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 5c8efef13881..3387260dd55b 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6197,7 +6197,7 @@ static void __md_stop_writes(struct mddev *mddev)
 		flush_workqueue(md_misc_wq);
 	if (mddev->sync_thread) {
 		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-		md_reap_sync_thread(mddev, true);
+		md_reap_sync_thread(mddev, false);
 	}
 
 	del_timer_sync(&mddev->safemode_timer);
@@ -9303,7 +9303,7 @@ void md_check_recovery(struct mddev *mddev)
 			 * ->spare_active and clear saved_raid_disk
 			 */
 			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-			md_reap_sync_thread(mddev, true);
+			md_reap_sync_thread(mddev, false);
 			clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
 			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 			clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
@@ -9338,7 +9338,7 @@ void md_check_recovery(struct mddev *mddev)
 			goto unlock;
 		}
 		if (mddev->sync_thread) {
-			md_reap_sync_thread(mddev, true);
+			md_reap_sync_thread(mddev, false);
 			goto unlock;
 		}
 		/* Set RUNNING before clearing NEEDED to avoid
@@ -9411,18 +9411,30 @@ void md_check_recovery(struct mddev *mddev)
 }
 EXPORT_SYMBOL(md_check_recovery);
 
-void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held)
+void md_reap_sync_thread(struct mddev *mddev, bool unlock_mddev)
 {
 	struct md_rdev *rdev;
 	sector_t old_dev_sectors = mddev->dev_sectors;
+	sector_t old_reshape_position;
 	bool is_reshaped = false;
+	bool is_interrupted = false;
 
-	if (reconfig_mutex_held)
+	if (unlock_mddev) {
+		old_reshape_position = mddev->reshape_position;
+		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
+			is_interrupted = true;
 		mddev_unlock(mddev);
+	}
 	/* resync has finished, collect result */
 	md_unregister_thread(&mddev->sync_thread);
-	if (reconfig_mutex_held)
+	if (unlock_mddev) {
 		mddev_lock_nointr(mddev);
+		/* restore the previous flag and position */
+		mddev->reshape_position = old_reshape_position;
+		if (is_interrupted)
+			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
+	}
+
 	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
 	    !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
 	    mddev->degraded != mddev->raid_disks) {
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 82465a4b1588..3e3ff3b20e81 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -719,7 +719,7 @@ extern struct md_thread *md_register_thread(
 extern void md_unregister_thread(struct md_thread **threadp);
 extern void md_wakeup_thread(struct md_thread *thread);
 extern void md_check_recovery(struct mddev *mddev);
-extern void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held);
+extern void md_reap_sync_thread(struct mddev *mddev, bool unlock_mddev);
 extern int mddev_init_writes_pending(struct mddev *mddev);
 extern bool md_write_start(struct mddev *mddev, struct bio *bi);
 extern void md_write_inc(struct mddev *mddev, struct bio *bi);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] md: only unlock mddev from action_store
  2022-06-02 13:45 [PATCH] md: only unlock mddev from action_store Guoqing Jiang
@ 2022-06-02 18:03 ` Logan Gunthorpe
  2022-06-06  3:08   ` Guoqing Jiang
  2022-06-06  3:29   ` Guoqing Jiang
  2022-06-03  6:24 ` Donald Buczek
  1 sibling, 2 replies; 12+ messages in thread
From: Logan Gunthorpe @ 2022-06-02 18:03 UTC (permalink / raw)
  To: Guoqing Jiang, song; +Cc: linux-raid, buczek




On 2022-06-02 07:45, Guoqing Jiang wrote:
> The 07reshape5intr test is broke because of below path.
> 
>     md_reap_sync_thread
>             -> mddev_unlock
>             -> md_unregister_thread(&mddev->sync_thread)
> 
> And md_check_recovery is triggered by,
> 
> mddev_unlock -> md_wakeup_thread(mddev->thread)
> 
> then mddev->reshape_position is set to MaxSector in raid5_finish_reshape
> since MD_RECOVERY_INTR is cleared in md_check_recovery, which means
> feature_map is not set with MD_FEATURE_RESHAPE_ACTIVE and superblock's
> reshape_position can't be updated accordingly.
> 
> Since the bug which commit 8b48ec23cc51a ("md: don't unregister sync_thread
> with reconfig_mutex held") fixed is related with action_store path, other
> callers which reap sync_thread didn't need to be changed, let's
> 
> 1. only unlock mddev in md_reap_sync_thread if caller is action_store,
>    so the parameter is renamed to reflect the change.
> 2. save some contexts (MD_RECOVERY_INTR and reshape_position) since they
>    could be changed by other processes, then restore them after get lock
>    again.
> 
> Fixes: 8b48ec23cc51a ("md: don't unregister sync_thread with reconfig_mutex held")
> Reported-by: Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
> ---
> I suppose the previous bug still can be fixed with the change, but it is
> better to verify it. Donald, could you help to test the new code?
> 
> Thanks,
> Guoqing
> 
>  drivers/md/md.c | 24 ++++++++++++++++++------
>  drivers/md/md.h |  2 +-
>  2 files changed, 19 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 5c8efef13881..3387260dd55b 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -6197,7 +6197,7 @@ static void __md_stop_writes(struct mddev *mddev)
>  		flush_workqueue(md_misc_wq);
>  	if (mddev->sync_thread) {
>  		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> -		md_reap_sync_thread(mddev, true);
> +		md_reap_sync_thread(mddev, false);
>  	}
>  
>  	del_timer_sync(&mddev->safemode_timer);
> @@ -9303,7 +9303,7 @@ void md_check_recovery(struct mddev *mddev)
>  			 * ->spare_active and clear saved_raid_disk
>  			 */
>  			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> -			md_reap_sync_thread(mddev, true);
> +			md_reap_sync_thread(mddev, false);
>  			clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>  			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>  			clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
> @@ -9338,7 +9338,7 @@ void md_check_recovery(struct mddev *mddev)
>  			goto unlock;
>  		}
>  		if (mddev->sync_thread) {
> -			md_reap_sync_thread(mddev, true);
> +			md_reap_sync_thread(mddev, false);
>  			goto unlock;
>  		}
>  		/* Set RUNNING before clearing NEEDED to avoid
> @@ -9411,18 +9411,30 @@ void md_check_recovery(struct mddev *mddev)
>  }
>  EXPORT_SYMBOL(md_check_recovery);
>  
> -void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held)
> +void md_reap_sync_thread(struct mddev *mddev, bool unlock_mddev)
>  {
>  	struct md_rdev *rdev;
>  	sector_t old_dev_sectors = mddev->dev_sectors;
> +	sector_t old_reshape_position;
>  	bool is_reshaped = false;
> +	bool is_interrupted = false;
>  
> -	if (reconfig_mutex_held)
> +	if (unlock_mddev) {
> +		old_reshape_position = mddev->reshape_position;
> +		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
> +			is_interrupted = true;
>  		mddev_unlock(mddev);
> +	}
>  	/* resync has finished, collect result */
>  	md_unregister_thread(&mddev->sync_thread);
> -	if (reconfig_mutex_held)
> +	if (unlock_mddev) {
>  		mddev_lock_nointr(mddev);
> +		/* restore the previous flag and position */
> +		mddev->reshape_position = old_reshape_position;
> +		if (is_interrupted)
> +			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> +	}

Maybe instead of the ugly boolean argument we could pull
md_unregister_thread() into all the callers and explicitly unlock in the
single call site that needs it?

Logan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] md: only unlock mddev from action_store
  2022-06-02 13:45 [PATCH] md: only unlock mddev from action_store Guoqing Jiang
  2022-06-02 18:03 ` Logan Gunthorpe
@ 2022-06-03  6:24 ` Donald Buczek
  1 sibling, 0 replies; 12+ messages in thread
From: Donald Buczek @ 2022-06-03  6:24 UTC (permalink / raw)
  To: Guoqing Jiang, song; +Cc: linux-raid, logang


On 6/2/22 3:45 PM, Guoqing Jiang wrote:
> The 07reshape5intr test is broke because of below path.
> 
>      md_reap_sync_thread
>              -> mddev_unlock
>              -> md_unregister_thread(&mddev->sync_thread)
> 
> And md_check_recovery is triggered by,
> 
> mddev_unlock -> md_wakeup_thread(mddev->thread)
> 
> then mddev->reshape_position is set to MaxSector in raid5_finish_reshape
> since MD_RECOVERY_INTR is cleared in md_check_recovery, which means
> feature_map is not set with MD_FEATURE_RESHAPE_ACTIVE and superblock's
> reshape_position can't be updated accordingly.
> 
> Since the bug which commit 8b48ec23cc51a ("md: don't unregister sync_thread
> with reconfig_mutex held") fixed is related with action_store path, other
> callers which reap sync_thread didn't need to be changed, let's
> 
> 1. only unlock mddev in md_reap_sync_thread if caller is action_store,
>     so the parameter is renamed to reflect the change.
> 2. save some contexts (MD_RECOVERY_INTR and reshape_position) since they
>     could be changed by other processes, then restore them after get lock
>     again.
> 
> Fixes: 8b48ec23cc51a ("md: don't unregister sync_thread with reconfig_mutex held")
> Reported-by: Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
> ---
> I suppose the previous bug still can be fixed with the change, but it is
> better to verify it. Donald, could you help to test the new code?

Sure. But Probably not before next week and there is a holiday in Germany on Monday.

Best
   Donald

> 
> Thanks,
> Guoqing
> 
>   drivers/md/md.c | 24 ++++++++++++++++++------
>   drivers/md/md.h |  2 +-
>   2 files changed, 19 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 5c8efef13881..3387260dd55b 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -6197,7 +6197,7 @@ static void __md_stop_writes(struct mddev *mddev)
>   		flush_workqueue(md_misc_wq);
>   	if (mddev->sync_thread) {
>   		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> -		md_reap_sync_thread(mddev, true);
> +		md_reap_sync_thread(mddev, false);
>   	}
>   
>   	del_timer_sync(&mddev->safemode_timer);
> @@ -9303,7 +9303,7 @@ void md_check_recovery(struct mddev *mddev)
>   			 * ->spare_active and clear saved_raid_disk
>   			 */
>   			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> -			md_reap_sync_thread(mddev, true);
> +			md_reap_sync_thread(mddev, false);
>   			clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>   			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>   			clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
> @@ -9338,7 +9338,7 @@ void md_check_recovery(struct mddev *mddev)
>   			goto unlock;
>   		}
>   		if (mddev->sync_thread) {
> -			md_reap_sync_thread(mddev, true);
> +			md_reap_sync_thread(mddev, false);
>   			goto unlock;
>   		}
>   		/* Set RUNNING before clearing NEEDED to avoid
> @@ -9411,18 +9411,30 @@ void md_check_recovery(struct mddev *mddev)
>   }
>   EXPORT_SYMBOL(md_check_recovery);
>   
> -void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held)
> +void md_reap_sync_thread(struct mddev *mddev, bool unlock_mddev)
>   {
>   	struct md_rdev *rdev;
>   	sector_t old_dev_sectors = mddev->dev_sectors;
> +	sector_t old_reshape_position;
>   	bool is_reshaped = false;
> +	bool is_interrupted = false;
>   
> -	if (reconfig_mutex_held)
> +	if (unlock_mddev) {
> +		old_reshape_position = mddev->reshape_position;
> +		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
> +			is_interrupted = true;
>   		mddev_unlock(mddev);
> +	}
>   	/* resync has finished, collect result */
>   	md_unregister_thread(&mddev->sync_thread);
> -	if (reconfig_mutex_held)
> +	if (unlock_mddev) {
>   		mddev_lock_nointr(mddev);
> +		/* restore the previous flag and position */
> +		mddev->reshape_position = old_reshape_position;
> +		if (is_interrupted)
> +			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> +	}
> +
>   	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
>   	    !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
>   	    mddev->degraded != mddev->raid_disks) {
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 82465a4b1588..3e3ff3b20e81 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -719,7 +719,7 @@ extern struct md_thread *md_register_thread(
>   extern void md_unregister_thread(struct md_thread **threadp);
>   extern void md_wakeup_thread(struct md_thread *thread);
>   extern void md_check_recovery(struct mddev *mddev);
> -extern void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held);
> +extern void md_reap_sync_thread(struct mddev *mddev, bool unlock_mddev);
>   extern int mddev_init_writes_pending(struct mddev *mddev);
>   extern bool md_write_start(struct mddev *mddev, struct bio *bi);
>   extern void md_write_inc(struct mddev *mddev, struct bio *bi);
> 


-- 
Donald Buczek
buczek@molgen.mpg.de
Tel: +49 30 8413 1433

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] md: only unlock mddev from action_store
  2022-06-02 18:03 ` Logan Gunthorpe
@ 2022-06-06  3:08   ` Guoqing Jiang
  2022-06-06  8:39     ` Xiao Ni
  2022-06-06  3:29   ` Guoqing Jiang
  1 sibling, 1 reply; 12+ messages in thread
From: Guoqing Jiang @ 2022-06-06  3:08 UTC (permalink / raw)
  To: Logan Gunthorpe, song; +Cc: linux-raid, buczek



On 6/3/22 2:03 AM, Logan Gunthorpe wrote:
>
>
> On 2022-06-02 07:45, Guoqing Jiang wrote:
>> The 07reshape5intr test is broke because of below path.
>>
>>      md_reap_sync_thread
>>              -> mddev_unlock
>>              -> md_unregister_thread(&mddev->sync_thread)
>>
>> And md_check_recovery is triggered by,
>>
>> mddev_unlock -> md_wakeup_thread(mddev->thread)
>>
>> then mddev->reshape_position is set to MaxSector in raid5_finish_reshape
>> since MD_RECOVERY_INTR is cleared in md_check_recovery, which means
>> feature_map is not set with MD_FEATURE_RESHAPE_ACTIVE and superblock's
>> reshape_position can't be updated accordingly.
>>
>> Since the bug which commit 8b48ec23cc51a ("md: don't unregister sync_thread
>> with reconfig_mutex held") fixed is related with action_store path, other
>> callers which reap sync_thread didn't need to be changed, let's
>>
>> 1. only unlock mddev in md_reap_sync_thread if caller is action_store,
>>     so the parameter is renamed to reflect the change.
>> 2. save some contexts (MD_RECOVERY_INTR and reshape_position) since they
>>     could be changed by other processes, then restore them after get lock
>>     again.
>>
>> Fixes: 8b48ec23cc51a ("md: don't unregister sync_thread with reconfig_mutex held")
>> Reported-by: Logan Gunthorpe <logang@deltatee.com>
>> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
>> ---
>> I suppose the previous bug still can be fixed with the change, but it is
>> better to verify it. Donald, could you help to test the new code?
>>
>> Thanks,
>> Guoqing
>>
>>   drivers/md/md.c | 24 ++++++++++++++++++------
>>   drivers/md/md.h |  2 +-
>>   2 files changed, 19 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 5c8efef13881..3387260dd55b 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -6197,7 +6197,7 @@ static void __md_stop_writes(struct mddev *mddev)
>>   		flush_workqueue(md_misc_wq);
>>   	if (mddev->sync_thread) {
>>   		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -		md_reap_sync_thread(mddev, true);
>> +		md_reap_sync_thread(mddev, false);
>>   	}
>>   
>>   	del_timer_sync(&mddev->safemode_timer);
>> @@ -9303,7 +9303,7 @@ void md_check_recovery(struct mddev *mddev)
>>   			 * ->spare_active and clear saved_raid_disk
>>   			 */
>>   			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -			md_reap_sync_thread(mddev, true);
>> +			md_reap_sync_thread(mddev, false);
>>   			clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>>   			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>   			clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
>> @@ -9338,7 +9338,7 @@ void md_check_recovery(struct mddev *mddev)
>>   			goto unlock;
>>   		}
>>   		if (mddev->sync_thread) {
>> -			md_reap_sync_thread(mddev, true);
>> +			md_reap_sync_thread(mddev, false);
>>   			goto unlock;
>>   		}
>>   		/* Set RUNNING before clearing NEEDED to avoid
>> @@ -9411,18 +9411,30 @@ void md_check_recovery(struct mddev *mddev)
>>   }
>>   EXPORT_SYMBOL(md_check_recovery);
>>   
>> -void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held)
>> +void md_reap_sync_thread(struct mddev *mddev, bool unlock_mddev)
>>   {
>>   	struct md_rdev *rdev;
>>   	sector_t old_dev_sectors = mddev->dev_sectors;
>> +	sector_t old_reshape_position;
>>   	bool is_reshaped = false;
>> +	bool is_interrupted = false;
>>   
>> -	if (reconfig_mutex_held)
>> +	if (unlock_mddev) {
>> +		old_reshape_position = mddev->reshape_position;
>> +		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
>> +			is_interrupted = true;

Hmm, this checking is not needed as it is always true from action_store.

>>   		mddev_unlock(mddev);
>> +	}
>>   	/* resync has finished, collect result */
>>   	md_unregister_thread(&mddev->sync_thread);
>> -	if (reconfig_mutex_held)
>> +	if (unlock_mddev) {
>>   		mddev_lock_nointr(mddev);
>> +		/* restore the previous flag and position */
>> +		mddev->reshape_position = old_reshape_position;
>> +		if (is_interrupted)
>> +			set_bit(MD_RECOVERY_INTR, &mddev->recovery);

So is this one.

>> +	}
> Maybe instead of the ugly boolean argument we could pull
> md_unregister_thread() into all the callers and explicitly unlock in the
> single call site that needs it?

Sounds good, let me revert previous commit then pull it.

Thanks,
Guoqing


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] md: only unlock mddev from action_store
  2022-06-02 18:03 ` Logan Gunthorpe
  2022-06-06  3:08   ` Guoqing Jiang
@ 2022-06-06  3:29   ` Guoqing Jiang
  2022-06-06 15:48     ` Logan Gunthorpe
  1 sibling, 1 reply; 12+ messages in thread
From: Guoqing Jiang @ 2022-06-06  3:29 UTC (permalink / raw)
  To: Logan Gunthorpe, song; +Cc: linux-raid, buczek



On 6/3/22 2:03 AM, Logan Gunthorpe wrote:
>
>
> On 2022-06-02 07:45, Guoqing Jiang wrote:
>> The 07reshape5intr test is broke because of below path.
>>
>>      md_reap_sync_thread
>>              -> mddev_unlock
>>              -> md_unregister_thread(&mddev->sync_thread)
>>
>> And md_check_recovery is triggered by,
>>
>> mddev_unlock -> md_wakeup_thread(mddev->thread)
>>
>> then mddev->reshape_position is set to MaxSector in raid5_finish_reshape
>> since MD_RECOVERY_INTR is cleared in md_check_recovery, which means
>> feature_map is not set with MD_FEATURE_RESHAPE_ACTIVE and superblock's
>> reshape_position can't be updated accordingly.
>>
>> Since the bug which commit 8b48ec23cc51a ("md: don't unregister sync_thread
>> with reconfig_mutex held") fixed is related with action_store path, other
>> callers which reap sync_thread didn't need to be changed, let's
>>
>> 1. only unlock mddev in md_reap_sync_thread if caller is action_store,
>>     so the parameter is renamed to reflect the change.
>> 2. save some contexts (MD_RECOVERY_INTR and reshape_position) since they
>>     could be changed by other processes, then restore them after get lock
>>     again.
>>
>> Fixes: 8b48ec23cc51a ("md: don't unregister sync_thread with reconfig_mutex held")
>> Reported-by: Logan Gunthorpe <logang@deltatee.com>
>> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
>> ---
>> I suppose the previous bug still can be fixed with the change, but it is
>> better to verify it. Donald, could you help to test the new code?
>>
>> Thanks,
>> Guoqing
>>
>>   drivers/md/md.c | 24 ++++++++++++++++++------
>>   drivers/md/md.h |  2 +-
>>   2 files changed, 19 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 5c8efef13881..3387260dd55b 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -6197,7 +6197,7 @@ static void __md_stop_writes(struct mddev *mddev)
>>   		flush_workqueue(md_misc_wq);
>>   	if (mddev->sync_thread) {
>>   		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -		md_reap_sync_thread(mddev, true);
>> +		md_reap_sync_thread(mddev, false);
>>   	}
>>   
>>   	del_timer_sync(&mddev->safemode_timer);
>> @@ -9303,7 +9303,7 @@ void md_check_recovery(struct mddev *mddev)
>>   			 * ->spare_active and clear saved_raid_disk
>>   			 */
>>   			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> -			md_reap_sync_thread(mddev, true);
>> +			md_reap_sync_thread(mddev, false);
>>   			clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>>   			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>   			clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
>> @@ -9338,7 +9338,7 @@ void md_check_recovery(struct mddev *mddev)
>>   			goto unlock;
>>   		}
>>   		if (mddev->sync_thread) {
>> -			md_reap_sync_thread(mddev, true);
>> +			md_reap_sync_thread(mddev, false);
>>   			goto unlock;
>>   		}
>>   		/* Set RUNNING before clearing NEEDED to avoid
>> @@ -9411,18 +9411,30 @@ void md_check_recovery(struct mddev *mddev)
>>   }
>>   EXPORT_SYMBOL(md_check_recovery);
>>   
>> -void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held)
>> +void md_reap_sync_thread(struct mddev *mddev, bool unlock_mddev)
>>   {
>>   	struct md_rdev *rdev;
>>   	sector_t old_dev_sectors = mddev->dev_sectors;
>> +	sector_t old_reshape_position;
>>   	bool is_reshaped = false;
>> +	bool is_interrupted = false;
>>   
>> -	if (reconfig_mutex_held)
>> +	if (unlock_mddev) {
>> +		old_reshape_position = mddev->reshape_position;
>> +		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
>> +			is_interrupted = true;
>>   		mddev_unlock(mddev);
>> +	}
>>   	/* resync has finished, collect result */
>>   	md_unregister_thread(&mddev->sync_thread);
>> -	if (reconfig_mutex_held)
>> +	if (unlock_mddev) {
>>   		mddev_lock_nointr(mddev);
>> +		/* restore the previous flag and position */
>> +		mddev->reshape_position = old_reshape_position;
>> +		if (is_interrupted)
>> +			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>> +	}
> Maybe instead of the ugly boolean argument we could pull
> md_unregister_thread() into all the callers and explicitly unlock in the
> single call site that needs it?

After move "md_unregister_thread(&mddev->sync_thread)", then we need to
rename md_reap_sync_thread given it doesn't unregister sync_thread, any
suggestion about the new name? md_behind_reap_sync_thread?

Thanks,
Guoqing




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] md: only unlock mddev from action_store
  2022-06-06  3:08   ` Guoqing Jiang
@ 2022-06-06  8:39     ` Xiao Ni
  2022-06-06  9:00       ` Guoqing Jiang
  0 siblings, 1 reply; 12+ messages in thread
From: Xiao Ni @ 2022-06-06  8:39 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: Logan Gunthorpe, Song Liu, linux-raid, Donald Buczek

On Mon, Jun 6, 2022 at 11:08 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>
>
>
> >> +    }
> > Maybe instead of the ugly boolean argument we could pull
> > md_unregister_thread() into all the callers and explicitly unlock in the
> > single call site that needs it?
>
> Sounds good, let me revert previous commit then pull it.
>

Hi all

Now md_reap_sync_thread is called by __md_stop_writes, action_store
and md_check_recovery.
If we move md_unregister_thread out of md_reap_sync_thread and unlock
reconfig_mutex before
calling md_unregister_thread, it means we break the atomic action for
these three functions mentioned
above. Not sure if it can introduce other problems, especially for the
change of md_check_recovery.

How about suspend I/O when changing the sync action? It should not
hurt the performance, because
it only interrupts the sync action and flush the in memory stripes.
For this way, it needs to revert
7e6ba434cc60 (md: don't unregister sync_thread with reconfig_mutex
held) and changes like this:

diff --git a/drivers/md/md.c b/drivers/md/md.c
index ce9d2845d3ac..af28cdeaf7e1 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4827,12 +4827,14 @@ action_store(struct mddev *mddev, const char
*page, size_t len)
                        clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
                if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
                    mddev_lock(mddev) == 0) {
+                       mddev_suspend(mddev);
                        if (work_pending(&mddev->del_work))
                                flush_workqueue(md_misc_wq);
                        if (mddev->sync_thread) {
                                set_bit(MD_RECOVERY_INTR, &mddev->recovery);
                                md_reap_sync_thread(mddev);
                        }
+                       mddev_resume(mddev);
                        mddev_unlock(mddev);
                }
        } else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))

mddev_suspend can stop the I/O and the superblock can be updated too. Because
MD_ALLOW_SB_UPDATE is set. So raid5d can go on handling stripes.

Best Regards
Xiao


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] md: only unlock mddev from action_store
  2022-06-06  8:39     ` Xiao Ni
@ 2022-06-06  9:00       ` Guoqing Jiang
  2022-06-06  9:36         ` Xiao Ni
  0 siblings, 1 reply; 12+ messages in thread
From: Guoqing Jiang @ 2022-06-06  9:00 UTC (permalink / raw)
  To: Xiao Ni; +Cc: Logan Gunthorpe, Song Liu, linux-raid, Donald Buczek



On 6/6/22 4:39 PM, Xiao Ni wrote:
> On Mon, Jun 6, 2022 at 11:08 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>>
>>
>>>> +    }
>>> Maybe instead of the ugly boolean argument we could pull
>>> md_unregister_thread() into all the callers and explicitly unlock in the
>>> single call site that needs it?
>> Sounds good, let me revert previous commit then pull it.
>>
> Hi all
>
> Now md_reap_sync_thread is called by __md_stop_writes, action_store
> and md_check_recovery.
> If we move md_unregister_thread out of md_reap_sync_thread and unlock
> reconfig_mutex before
> calling md_unregister_thread, it means we break the atomic action for
> these three functions mentioned
> above.

No, only action_store need to be changed, other callers keep the original
behavior.

>   Not sure if it can introduce other problems, especially for the
> change of md_check_recovery.
>
> How about suspend I/O when changing the sync action? It should not
> hurt the performance, because
> it only interrupts the sync action and flush the in memory stripes.
> For this way, it needs to revert
> 7e6ba434cc60 (md: don't unregister sync_thread with reconfig_mutex
> held) and changes like this:
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index ce9d2845d3ac..af28cdeaf7e1 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -4827,12 +4827,14 @@ action_store(struct mddev *mddev, const char
> *page, size_t len)
>                          clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>                  if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
>                      mddev_lock(mddev) == 0) {
> +                       mddev_suspend(mddev);
>                          if (work_pending(&mddev->del_work))
>                                  flush_workqueue(md_misc_wq);
>                          if (mddev->sync_thread) {
>                                  set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>                                  md_reap_sync_thread(mddev);
>                          }
> +                       mddev_resume(mddev);
>                          mddev_unlock(mddev);
>                  }
>          } else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))

The action actually means sync action which are for internal IO instead
of external IO, I suppose the semantic is different with above change.

And there are lots of sync/locking actions in suspend path, I am not
sure it will cause other locking issues, you may need to investigate it.

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] md: only unlock mddev from action_store
  2022-06-06  9:00       ` Guoqing Jiang
@ 2022-06-06  9:36         ` Xiao Ni
  2022-06-06  9:55           ` Guoqing Jiang
  0 siblings, 1 reply; 12+ messages in thread
From: Xiao Ni @ 2022-06-06  9:36 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: Logan Gunthorpe, Song Liu, linux-raid, Donald Buczek

On Mon, Jun 6, 2022 at 5:00 PM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>
>
>
> On 6/6/22 4:39 PM, Xiao Ni wrote:
> > On Mon, Jun 6, 2022 at 11:08 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
> >>
> >>
> >>>> +    }
> >>> Maybe instead of the ugly boolean argument we could pull
> >>> md_unregister_thread() into all the callers and explicitly unlock in the
> >>> single call site that needs it?
> >> Sounds good, let me revert previous commit then pull it.
> >>
> > Hi all
> >
> > Now md_reap_sync_thread is called by __md_stop_writes, action_store
> > and md_check_recovery.
> > If we move md_unregister_thread out of md_reap_sync_thread and unlock
> > reconfig_mutex before
> > calling md_unregister_thread, it means we break the atomic action for
> > these three functions mentioned
> > above.
>
> No, only action_store need to be changed, other callers keep the original
> behavior.

Hi Guoqing

Thanks for pointing out this.

>
> >   Not sure if it can introduce other problems, especially for the
> > change of md_check_recovery.
> >
> > How about suspend I/O when changing the sync action? It should not
> > hurt the performance, because
> > it only interrupts the sync action and flush the in memory stripes.
> > For this way, it needs to revert
> > 7e6ba434cc60 (md: don't unregister sync_thread with reconfig_mutex
> > held) and changes like this:
> >
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index ce9d2845d3ac..af28cdeaf7e1 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -4827,12 +4827,14 @@ action_store(struct mddev *mddev, const char
> > *page, size_t len)
> >                          clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
> >                  if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
> >                      mddev_lock(mddev) == 0) {
> > +                       mddev_suspend(mddev);
> >                          if (work_pending(&mddev->del_work))
> >                                  flush_workqueue(md_misc_wq);
> >                          if (mddev->sync_thread) {
> >                                  set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> >                                  md_reap_sync_thread(mddev);
> >                          }
> > +                       mddev_resume(mddev);
> >                          mddev_unlock(mddev);
> >                  }
> >          } else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
>
> The action actually means sync action which are for internal IO instead
> of external IO, I suppose the semantic is different with above change.

The original problem should be i/o happen when interrupting the sync thread?
So the external i/o calls md_write_start and sets MD_SB_CHANGE_PENDING.
Then raid5d->md_check_recovery can't update superblock and handle internal
stripes. So the `echo idle` action is stuck.

>
> And there are lots of sync/locking actions in suspend path, I am not
> sure it will cause other locking issues, you may need to investigate it.

Yes, mddev_supsend needs those sync methods to keep the raid in a quiescent
state. First, it needs to get reconfig_mutex before calling mddev_supsend.
So it can avoid racing with all the actions that want to change the raid. Then
it waits mddev->active_io to 0 which means all submit bio processes stop
submitting io. Then it waits until all internal i/os finish by
pers->quiesce. Last it waits
until the superblock is updated.

From the logic, it looks safe. By the way, the patch 35bfc52187f
(md: allow metadata update while suspending) which introduces
MD_UPDATING_SB and MD_UPDATING_SB fixes the similar deadlock
problem. So in this problem, it looks like mddev_suspend is a good choice.

As you mentioned, it needs to consider and do tests more because of
the complex sync/locking actions.

Best Regards
Xiao


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] md: only unlock mddev from action_store
  2022-06-06  9:36         ` Xiao Ni
@ 2022-06-06  9:55           ` Guoqing Jiang
  2022-06-06 11:54             ` Xiao Ni
  0 siblings, 1 reply; 12+ messages in thread
From: Guoqing Jiang @ 2022-06-06  9:55 UTC (permalink / raw)
  To: Xiao Ni; +Cc: Logan Gunthorpe, Song Liu, linux-raid, Donald Buczek



On 6/6/22 5:36 PM, Xiao Ni wrote:
>>> @@ -4827,12 +4827,14 @@ action_store(struct mddev *mddev, const char
>>> *page, size_t len)
>>>                           clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>>>                   if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
>>>                       mddev_lock(mddev) == 0) {
>>> +                       mddev_suspend(mddev);
>>>                           if (work_pending(&mddev->del_work))
>>>                                   flush_workqueue(md_misc_wq);
>>>                           if (mddev->sync_thread) {
>>>                                   set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>>>                                   md_reap_sync_thread(mddev);
>>>                           }
>>> +                       mddev_resume(mddev);
>>>                           mddev_unlock(mddev);
>>>                   }
>>>           } else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
>> The action actually means sync action which are for internal IO instead
>> of external IO, I suppose the semantic is different with above change.
> The original problem should be i/o happen when interrupting the sync thread?
> So the external i/o calls md_write_start and sets MD_SB_CHANGE_PENDING.
> Then raid5d->md_check_recovery can't update superblock and handle internal
> stripes. So the `echo idle` action is stuck.

My point is action_store shouldn't disturb external IO, it should only 
deal with
sync IO and relevant stuffs as the function is for sync_action node, no?

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] md: only unlock mddev from action_store
  2022-06-06  9:55           ` Guoqing Jiang
@ 2022-06-06 11:54             ` Xiao Ni
  0 siblings, 0 replies; 12+ messages in thread
From: Xiao Ni @ 2022-06-06 11:54 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: Logan Gunthorpe, Song Liu, linux-raid, Donald Buczek

On Mon, Jun 6, 2022 at 5:55 PM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>
>
>
> On 6/6/22 5:36 PM, Xiao Ni wrote:
> >>> @@ -4827,12 +4827,14 @@ action_store(struct mddev *mddev, const char
> >>> *page, size_t len)
> >>>                           clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
> >>>                   if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
> >>>                       mddev_lock(mddev) == 0) {
> >>> +                       mddev_suspend(mddev);
> >>>                           if (work_pending(&mddev->del_work))
> >>>                                   flush_workqueue(md_misc_wq);
> >>>                           if (mddev->sync_thread) {
> >>>                                   set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> >>>                                   md_reap_sync_thread(mddev);
> >>>                           }
> >>> +                       mddev_resume(mddev);
> >>>                           mddev_unlock(mddev);
> >>>                   }
> >>>           } else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
> >> The action actually means sync action which are for internal IO instead
> >> of external IO, I suppose the semantic is different with above change.
> > The original problem should be i/o happen when interrupting the sync thread?
> > So the external i/o calls md_write_start and sets MD_SB_CHANGE_PENDING.
> > Then raid5d->md_check_recovery can't update superblock and handle internal
> > stripes. So the `echo idle` action is stuck.
>
> My point is action_store shouldn't disturb external IO, it should only
> deal with
> sync IO and relevant stuffs as the function is for sync_action node, no?

Yes, it's a change that was mentioned above. It depends on which way
we want to go.
In some sysfs file store functions, it also calls mddev_suspend. For
some situations,
it needs to choose if we need to stop the i/o temporarily.

Anyway, it's only a possible method I want to discuss. It's very good
for me to dig into this problem.
Now I have a better understanding of those codes :)

Best Regards
Xiao


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] md: only unlock mddev from action_store
  2022-06-06  3:29   ` Guoqing Jiang
@ 2022-06-06 15:48     ` Logan Gunthorpe
  2022-06-07  1:20       ` Guoqing Jiang
  0 siblings, 1 reply; 12+ messages in thread
From: Logan Gunthorpe @ 2022-06-06 15:48 UTC (permalink / raw)
  To: Guoqing Jiang, song; +Cc: linux-raid, buczek



On 2022-06-05 21:29, Guoqing Jiang wrote:
> 
> 
> On 6/3/22 2:03 AM, Logan Gunthorpe wrote:
>>
>>
>> On 2022-06-02 07:45, Guoqing Jiang wrote:
>>> The 07reshape5intr test is broke because of below path.
>>>
>>>      md_reap_sync_thread
>>>              -> mddev_unlock
>>>              -> md_unregister_thread(&mddev->sync_thread)
>>>
>>> And md_check_recovery is triggered by,
>>>
>>> mddev_unlock -> md_wakeup_thread(mddev->thread)
>>>
>>> then mddev->reshape_position is set to MaxSector in raid5_finish_reshape
>>> since MD_RECOVERY_INTR is cleared in md_check_recovery, which means
>>> feature_map is not set with MD_FEATURE_RESHAPE_ACTIVE and superblock's
>>> reshape_position can't be updated accordingly.
>>>
>>> Since the bug which commit 8b48ec23cc51a ("md: don't unregister sync_thread
>>> with reconfig_mutex held") fixed is related with action_store path, other
>>> callers which reap sync_thread didn't need to be changed, let's
>>>
>>> 1. only unlock mddev in md_reap_sync_thread if caller is action_store,
>>>     so the parameter is renamed to reflect the change.
>>> 2. save some contexts (MD_RECOVERY_INTR and reshape_position) since they
>>>     could be changed by other processes, then restore them after get lock
>>>     again.
>>>
>>> Fixes: 8b48ec23cc51a ("md: don't unregister sync_thread with reconfig_mutex held")
>>> Reported-by: Logan Gunthorpe <logang@deltatee.com>
>>> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
>>> ---
>>> I suppose the previous bug still can be fixed with the change, but it is
>>> better to verify it. Donald, could you help to test the new code?
>>>
>>> Thanks,
>>> Guoqing
>>>
>>>   drivers/md/md.c | 24 ++++++++++++++++++------
>>>   drivers/md/md.h |  2 +-
>>>   2 files changed, 19 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>> index 5c8efef13881..3387260dd55b 100644
>>> --- a/drivers/md/md.c
>>> +++ b/drivers/md/md.c
>>> @@ -6197,7 +6197,7 @@ static void __md_stop_writes(struct mddev *mddev)
>>>   		flush_workqueue(md_misc_wq);
>>>   	if (mddev->sync_thread) {
>>>   		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>>> -		md_reap_sync_thread(mddev, true);
>>> +		md_reap_sync_thread(mddev, false);
>>>   	}
>>>   
>>>   	del_timer_sync(&mddev->safemode_timer);
>>> @@ -9303,7 +9303,7 @@ void md_check_recovery(struct mddev *mddev)
>>>   			 * ->spare_active and clear saved_raid_disk
>>>   			 */
>>>   			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>>> -			md_reap_sync_thread(mddev, true);
>>> +			md_reap_sync_thread(mddev, false);
>>>   			clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
>>>   			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>>>   			clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags);
>>> @@ -9338,7 +9338,7 @@ void md_check_recovery(struct mddev *mddev)
>>>   			goto unlock;
>>>   		}
>>>   		if (mddev->sync_thread) {
>>> -			md_reap_sync_thread(mddev, true);
>>> +			md_reap_sync_thread(mddev, false);
>>>   			goto unlock;
>>>   		}
>>>   		/* Set RUNNING before clearing NEEDED to avoid
>>> @@ -9411,18 +9411,30 @@ void md_check_recovery(struct mddev *mddev)
>>>   }
>>>   EXPORT_SYMBOL(md_check_recovery);
>>>   
>>> -void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held)
>>> +void md_reap_sync_thread(struct mddev *mddev, bool unlock_mddev)
>>>   {
>>>   	struct md_rdev *rdev;
>>>   	sector_t old_dev_sectors = mddev->dev_sectors;
>>> +	sector_t old_reshape_position;
>>>   	bool is_reshaped = false;
>>> +	bool is_interrupted = false;
>>>   
>>> -	if (reconfig_mutex_held)
>>> +	if (unlock_mddev) {
>>> +		old_reshape_position = mddev->reshape_position;
>>> +		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
>>> +			is_interrupted = true;
>>>   		mddev_unlock(mddev);
>>> +	}
>>>   	/* resync has finished, collect result */
>>>   	md_unregister_thread(&mddev->sync_thread);
>>> -	if (reconfig_mutex_held)
>>> +	if (unlock_mddev) {
>>>   		mddev_lock_nointr(mddev);
>>> +		/* restore the previous flag and position */
>>> +		mddev->reshape_position = old_reshape_position;
>>> +		if (is_interrupted)
>>> +			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
>>> +	}
>> Maybe instead of the ugly boolean argument we could pull
>> md_unregister_thread() into all the callers and explicitly unlock in the
>> single call site that needs it?
> 
> After move "md_unregister_thread(&mddev->sync_thread)", then we need to
> rename md_reap_sync_thread given it doesn't unregister sync_thread, any
> suggestion about the new name? md_behind_reap_sync_thread?

I don't like the "behind"... Which would be a name suggesting when the
function should be called, not what the function does.

I'd maybe go with something like md_cleanup_sync_thread()

Logan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] md: only unlock mddev from action_store
  2022-06-06 15:48     ` Logan Gunthorpe
@ 2022-06-07  1:20       ` Guoqing Jiang
  0 siblings, 0 replies; 12+ messages in thread
From: Guoqing Jiang @ 2022-06-07  1:20 UTC (permalink / raw)
  To: Logan Gunthorpe, song; +Cc: linux-raid, buczek



On 6/6/22 11:48 PM, Logan Gunthorpe wrote:
>> After move "md_unregister_thread(&mddev->sync_thread)", then we need to
>> rename md_reap_sync_thread given it doesn't unregister sync_thread, any
>> suggestion about the new name? md_behind_reap_sync_thread?
> I don't like the "behind"... Which would be a name suggesting when the
> function should be called, not what the function does.

Right, naming is hard.

> I'd maybe go with something like md_cleanup_sync_thread()

It is confusing since it doesn't related with sync_thread anymore after 
the pull.
I will keep the name unchanged for now.

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-06-07  1:20 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-02 13:45 [PATCH] md: only unlock mddev from action_store Guoqing Jiang
2022-06-02 18:03 ` Logan Gunthorpe
2022-06-06  3:08   ` Guoqing Jiang
2022-06-06  8:39     ` Xiao Ni
2022-06-06  9:00       ` Guoqing Jiang
2022-06-06  9:36         ` Xiao Ni
2022-06-06  9:55           ` Guoqing Jiang
2022-06-06 11:54             ` Xiao Ni
2022-06-06  3:29   ` Guoqing Jiang
2022-06-06 15:48     ` Logan Gunthorpe
2022-06-07  1:20       ` Guoqing Jiang
2022-06-03  6:24 ` Donald Buczek

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.