All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH -next v2 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log()
@ 2023-06-28  1:07 Yu Kuai
  2023-06-28  1:07 ` [PATCH -next v2 1/2] md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after finishing work" Yu Kuai
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Yu Kuai @ 2023-06-28  1:07 UTC (permalink / raw)
  To: xni, logang, hch, song, shli
  Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Changes in v2:
 - remove a now unused local variable in patch 2;

Commit b13015af94cf ("md/raid5-cache: Clear conf->log after finishing
work") introduce a new problem:

// caller hold reconfig_mutex
r5l_exit_log
 flush_work(&log->disable_writeback_work)
			r5c_disable_writeback_async
			 wait_event
			  /*
			   * conf->log is not NULL, and mddev_trylock()
			   * will fail, wait_event() can never pass.
			   */
 conf->log = NULL

patch 1 revert this patch, an patch 2 fix the original problem in a
different way.

Noted this problem is just found by code review, and I think this is
probably the reason that some mdadm tests is broken.

Yu Kuai (2):
  md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after
    finishing work"
  md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread()

 drivers/md/raid5-cache.c | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH -next v2 1/2] md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after finishing work"
  2023-06-28  1:07 [PATCH -next v2 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log() Yu Kuai
@ 2023-06-28  1:07 ` Yu Kuai
  2023-06-28  1:07 ` [PATCH -next v2 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() Yu Kuai
  2023-07-07  1:24 ` [PATCH -next v2 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log() Yu Kuai
  2 siblings, 0 replies; 10+ messages in thread
From: Yu Kuai @ 2023-06-28  1:07 UTC (permalink / raw)
  To: xni, logang, hch, song, shli
  Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

This reverts commit b13015af94cf405f73ff64ce0797269554020c37.

Because this will cause that r5c_disable_writeback_async() to wait
forever, since caller hold reconfig_mutex and conf->log is not NULL:

wait_event
 conf->log == NULL ||
 (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) &&
  (locked = mddev_trylock(mddev)))

This problem is found by code review, and the null-ptr-deref this patch
fixed will be fixed by another approch in the next patch.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid5-cache.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 47ba7d9e81e1..083288e36949 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -3168,13 +3168,12 @@ void r5l_exit_log(struct r5conf *conf)
 {
 	struct r5l_log *log = conf->log;
 
+	conf->log = NULL;
+
 	/* Ensure disable_writeback_work wakes up and exits */
 	wake_up(&conf->mddev->sb_wait);
 	flush_work(&log->disable_writeback_work);
 	md_unregister_thread(&log->reclaim_thread);
-
-	conf->log = NULL;
-
 	mempool_exit(&log->meta_pool);
 	bioset_exit(&log->bs);
 	mempool_exit(&log->io_pool);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH -next v2 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread()
  2023-06-28  1:07 [PATCH -next v2 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log() Yu Kuai
  2023-06-28  1:07 ` [PATCH -next v2 1/2] md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after finishing work" Yu Kuai
@ 2023-06-28  1:07 ` Yu Kuai
  2023-07-07  8:52   ` Song Liu
  2023-07-07  1:24 ` [PATCH -next v2 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log() Yu Kuai
  2 siblings, 1 reply; 10+ messages in thread
From: Yu Kuai @ 2023-06-28  1:07 UTC (permalink / raw)
  To: xni, logang, hch, song, shli
  Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

r5l_reclaim_thread() already check that 'conf->log' is not NULL in the
beginning, however, r5c_do_reclaim() and r5l_do_reclaim() will
dereference 'conf->log' again, which will cause null-ptr-deref if
'conf->log' is set to NULL from r5l_exit_log() concurrently.

Fix this problem by don't dereference 'conf->log' again in
r5c_do_reclaim() and r5c_do_reclaim().

Fixes: a39f7afde358 ("md/r5cache: write-out phase and reclaim support")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid5-cache.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 083288e36949..ba6fc146d265 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -1148,10 +1148,9 @@ static void r5l_run_no_space_stripes(struct r5l_log *log)
  * for write through mode, returns log->next_checkpoint
  * for write back, returns log_start of first sh in stripe_in_journal_list
  */
-static sector_t r5c_calculate_new_cp(struct r5conf *conf)
+static sector_t r5c_calculate_new_cp(struct r5l_log *log)
 {
 	struct stripe_head *sh;
-	struct r5l_log *log = conf->log;
 	sector_t new_cp;
 	unsigned long flags;
 
@@ -1159,12 +1158,12 @@ static sector_t r5c_calculate_new_cp(struct r5conf *conf)
 		return log->next_checkpoint;
 
 	spin_lock_irqsave(&log->stripe_in_journal_lock, flags);
-	if (list_empty(&conf->log->stripe_in_journal_list)) {
+	if (list_empty(&log->stripe_in_journal_list)) {
 		/* all stripes flushed */
 		spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags);
 		return log->next_checkpoint;
 	}
-	sh = list_first_entry(&conf->log->stripe_in_journal_list,
+	sh = list_first_entry(&log->stripe_in_journal_list,
 			      struct stripe_head, r5c);
 	new_cp = sh->log_start;
 	spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags);
@@ -1173,10 +1172,8 @@ static sector_t r5c_calculate_new_cp(struct r5conf *conf)
 
 static sector_t r5l_reclaimable_space(struct r5l_log *log)
 {
-	struct r5conf *conf = log->rdev->mddev->private;
-
 	return r5l_ring_distance(log, log->last_checkpoint,
-				 r5c_calculate_new_cp(conf));
+				 r5c_calculate_new_cp(log));
 }
 
 static void r5l_run_no_mem_stripe(struct r5l_log *log)
@@ -1419,9 +1416,9 @@ void r5c_flush_cache(struct r5conf *conf, int num)
 	}
 }
 
-static void r5c_do_reclaim(struct r5conf *conf)
+static void r5c_do_reclaim(struct r5l_log *log)
 {
-	struct r5l_log *log = conf->log;
+	struct r5conf *conf = log->rdev->mddev->private;
 	struct stripe_head *sh;
 	int count = 0;
 	unsigned long flags;
@@ -1496,7 +1493,6 @@ static void r5c_do_reclaim(struct r5conf *conf)
 
 static void r5l_do_reclaim(struct r5l_log *log)
 {
-	struct r5conf *conf = log->rdev->mddev->private;
 	sector_t reclaim_target = xchg(&log->reclaim_target, 0);
 	sector_t reclaimable;
 	sector_t next_checkpoint;
@@ -1525,7 +1521,7 @@ static void r5l_do_reclaim(struct r5l_log *log)
 				    log->io_list_lock);
 	}
 
-	next_checkpoint = r5c_calculate_new_cp(conf);
+	next_checkpoint = r5c_calculate_new_cp(log);
 	spin_unlock_irq(&log->io_list_lock);
 
 	if (reclaimable == 0 || !write_super)
@@ -1554,7 +1550,7 @@ static void r5l_reclaim_thread(struct md_thread *thread)
 
 	if (!log)
 		return;
-	r5c_do_reclaim(conf);
+	r5c_do_reclaim(log);
 	r5l_do_reclaim(log);
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH -next v2 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log()
  2023-06-28  1:07 [PATCH -next v2 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log() Yu Kuai
  2023-06-28  1:07 ` [PATCH -next v2 1/2] md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after finishing work" Yu Kuai
  2023-06-28  1:07 ` [PATCH -next v2 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() Yu Kuai
@ 2023-07-07  1:24 ` Yu Kuai
  2 siblings, 0 replies; 10+ messages in thread
From: Yu Kuai @ 2023-07-07  1:24 UTC (permalink / raw)
  To: Yu Kuai, xni, logang, hch, song, shli
  Cc: linux-raid, linux-kernel, yi.zhang, yangerkun, yukuai (C)

Hi,

在 2023/06/28 9:07, Yu Kuai 写道:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Changes in v2:
>   - remove a now unused local variable in patch 2;
> 
> Commit b13015af94cf ("md/raid5-cache: Clear conf->log after finishing
> work") introduce a new problem:
> 
> // caller hold reconfig_mutex
> r5l_exit_log
>   flush_work(&log->disable_writeback_work)
> 			r5c_disable_writeback_async
> 			 wait_event
> 			  /*
> 			   * conf->log is not NULL, and mddev_trylock()
> 			   * will fail, wait_event() can never pass.
> 			   */
>   conf->log = NULL
> 
> patch 1 revert this patch, an patch 2 fix the original problem in a
> different way.
> 
> Noted this problem is just found by code review, and I think this is
> probably the reason that some mdadm tests is broken.

Any suggestions?

By the way, while taking another look at this problem, I think probably
read and write 'conf->log' should use READ_ONCE and WRITE_ONCE.

Thanks,
Kuai
> 
> Yu Kuai (2):
>    md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after
>      finishing work"
>    md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread()
> 
>   drivers/md/raid5-cache.c | 25 ++++++++++---------------
>   1 file changed, 10 insertions(+), 15 deletions(-)
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH -next v2 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread()
  2023-06-28  1:07 ` [PATCH -next v2 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() Yu Kuai
@ 2023-07-07  8:52   ` Song Liu
  2023-07-07  9:06     ` Yu Kuai
  0 siblings, 1 reply; 10+ messages in thread
From: Song Liu @ 2023-07-07  8:52 UTC (permalink / raw)
  To: Yu Kuai
  Cc: xni, logang, hch, shli, linux-raid, linux-kernel, yukuai3,
	yi.zhang, yangerkun

On Wed, Jun 28, 2023 at 9:08 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@huawei.com>
>
> r5l_reclaim_thread() already check that 'conf->log' is not NULL in the
> beginning, however, r5c_do_reclaim() and r5l_do_reclaim() will
> dereference 'conf->log' again, which will cause null-ptr-deref if
> 'conf->log' is set to NULL from r5l_exit_log() concurrently.

r5l_exit_log() will wait until reclaim_thread() finishes, and then set
conf->log to NULL. So this is not a problem, no?

Thanks,
Song

>
> Fix this problem by don't dereference 'conf->log' again in
> r5c_do_reclaim() and r5c_do_reclaim().
>
> Fixes: a39f7afde358 ("md/r5cache: write-out phase and reclaim support")
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  drivers/md/raid5-cache.c | 20 ++++++++------------
>  1 file changed, 8 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
> index 083288e36949..ba6fc146d265 100644
> --- a/drivers/md/raid5-cache.c
> +++ b/drivers/md/raid5-cache.c
> @@ -1148,10 +1148,9 @@ static void r5l_run_no_space_stripes(struct r5l_log *log)
>   * for write through mode, returns log->next_checkpoint
>   * for write back, returns log_start of first sh in stripe_in_journal_list
>   */
> -static sector_t r5c_calculate_new_cp(struct r5conf *conf)
> +static sector_t r5c_calculate_new_cp(struct r5l_log *log)
>  {
>         struct stripe_head *sh;
> -       struct r5l_log *log = conf->log;
>         sector_t new_cp;
>         unsigned long flags;
>
> @@ -1159,12 +1158,12 @@ static sector_t r5c_calculate_new_cp(struct r5conf *conf)
>                 return log->next_checkpoint;
>
>         spin_lock_irqsave(&log->stripe_in_journal_lock, flags);
> -       if (list_empty(&conf->log->stripe_in_journal_list)) {
> +       if (list_empty(&log->stripe_in_journal_list)) {
>                 /* all stripes flushed */
>                 spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags);
>                 return log->next_checkpoint;
>         }
> -       sh = list_first_entry(&conf->log->stripe_in_journal_list,
> +       sh = list_first_entry(&log->stripe_in_journal_list,
>                               struct stripe_head, r5c);
>         new_cp = sh->log_start;
>         spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags);
> @@ -1173,10 +1172,8 @@ static sector_t r5c_calculate_new_cp(struct r5conf *conf)
>
>  static sector_t r5l_reclaimable_space(struct r5l_log *log)
>  {
> -       struct r5conf *conf = log->rdev->mddev->private;
> -
>         return r5l_ring_distance(log, log->last_checkpoint,
> -                                r5c_calculate_new_cp(conf));
> +                                r5c_calculate_new_cp(log));
>  }
>
>  static void r5l_run_no_mem_stripe(struct r5l_log *log)
> @@ -1419,9 +1416,9 @@ void r5c_flush_cache(struct r5conf *conf, int num)
>         }
>  }
>
> -static void r5c_do_reclaim(struct r5conf *conf)
> +static void r5c_do_reclaim(struct r5l_log *log)
>  {
> -       struct r5l_log *log = conf->log;
> +       struct r5conf *conf = log->rdev->mddev->private;
>         struct stripe_head *sh;
>         int count = 0;
>         unsigned long flags;
> @@ -1496,7 +1493,6 @@ static void r5c_do_reclaim(struct r5conf *conf)
>
>  static void r5l_do_reclaim(struct r5l_log *log)
>  {
> -       struct r5conf *conf = log->rdev->mddev->private;
>         sector_t reclaim_target = xchg(&log->reclaim_target, 0);
>         sector_t reclaimable;
>         sector_t next_checkpoint;
> @@ -1525,7 +1521,7 @@ static void r5l_do_reclaim(struct r5l_log *log)
>                                     log->io_list_lock);
>         }
>
> -       next_checkpoint = r5c_calculate_new_cp(conf);
> +       next_checkpoint = r5c_calculate_new_cp(log);
>         spin_unlock_irq(&log->io_list_lock);
>
>         if (reclaimable == 0 || !write_super)
> @@ -1554,7 +1550,7 @@ static void r5l_reclaim_thread(struct md_thread *thread)
>
>         if (!log)
>                 return;
> -       r5c_do_reclaim(conf);
> +       r5c_do_reclaim(log);
>         r5l_do_reclaim(log);
>  }
>
> --
> 2.39.2
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH -next v2 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread()
  2023-07-07  8:52   ` Song Liu
@ 2023-07-07  9:06     ` Yu Kuai
  2023-07-07  9:16       ` Yu Kuai
  0 siblings, 1 reply; 10+ messages in thread
From: Yu Kuai @ 2023-07-07  9:06 UTC (permalink / raw)
  To: Song Liu, Yu Kuai
  Cc: xni, logang, hch, shli, linux-raid, linux-kernel, yi.zhang,
	yangerkun, yukuai (C)

Hi,

在 2023/07/07 16:52, Song Liu 写道:
> On Wed, Jun 28, 2023 at 9:08 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> r5l_reclaim_thread() already check that 'conf->log' is not NULL in the
>> beginning, however, r5c_do_reclaim() and r5l_do_reclaim() will
>> dereference 'conf->log' again, which will cause null-ptr-deref if
>> 'conf->log' is set to NULL from r5l_exit_log() concurrently.
> 
> r5l_exit_log() will wait until reclaim_thread() finishes, and then set
> conf->log to NULL. So this is not a problem, no?

Patch one just revert this, wait until reclaim_thread() then set
conf->log to NULL will cause deadlock, as I sescribled in patch 0.

Thanks,
Kuai
> 
> Thanks,
> Song
> 
>>
>> Fix this problem by don't dereference 'conf->log' again in
>> r5c_do_reclaim() and r5c_do_reclaim().
>>
>> Fixes: a39f7afde358 ("md/r5cache: write-out phase and reclaim support")
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>> ---
>>   drivers/md/raid5-cache.c | 20 ++++++++------------
>>   1 file changed, 8 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
>> index 083288e36949..ba6fc146d265 100644
>> --- a/drivers/md/raid5-cache.c
>> +++ b/drivers/md/raid5-cache.c
>> @@ -1148,10 +1148,9 @@ static void r5l_run_no_space_stripes(struct r5l_log *log)
>>    * for write through mode, returns log->next_checkpoint
>>    * for write back, returns log_start of first sh in stripe_in_journal_list
>>    */
>> -static sector_t r5c_calculate_new_cp(struct r5conf *conf)
>> +static sector_t r5c_calculate_new_cp(struct r5l_log *log)
>>   {
>>          struct stripe_head *sh;
>> -       struct r5l_log *log = conf->log;
>>          sector_t new_cp;
>>          unsigned long flags;
>>
>> @@ -1159,12 +1158,12 @@ static sector_t r5c_calculate_new_cp(struct r5conf *conf)
>>                  return log->next_checkpoint;
>>
>>          spin_lock_irqsave(&log->stripe_in_journal_lock, flags);
>> -       if (list_empty(&conf->log->stripe_in_journal_list)) {
>> +       if (list_empty(&log->stripe_in_journal_list)) {
>>                  /* all stripes flushed */
>>                  spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags);
>>                  return log->next_checkpoint;
>>          }
>> -       sh = list_first_entry(&conf->log->stripe_in_journal_list,
>> +       sh = list_first_entry(&log->stripe_in_journal_list,
>>                                struct stripe_head, r5c);
>>          new_cp = sh->log_start;
>>          spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags);
>> @@ -1173,10 +1172,8 @@ static sector_t r5c_calculate_new_cp(struct r5conf *conf)
>>
>>   static sector_t r5l_reclaimable_space(struct r5l_log *log)
>>   {
>> -       struct r5conf *conf = log->rdev->mddev->private;
>> -
>>          return r5l_ring_distance(log, log->last_checkpoint,
>> -                                r5c_calculate_new_cp(conf));
>> +                                r5c_calculate_new_cp(log));
>>   }
>>
>>   static void r5l_run_no_mem_stripe(struct r5l_log *log)
>> @@ -1419,9 +1416,9 @@ void r5c_flush_cache(struct r5conf *conf, int num)
>>          }
>>   }
>>
>> -static void r5c_do_reclaim(struct r5conf *conf)
>> +static void r5c_do_reclaim(struct r5l_log *log)
>>   {
>> -       struct r5l_log *log = conf->log;
>> +       struct r5conf *conf = log->rdev->mddev->private;
>>          struct stripe_head *sh;
>>          int count = 0;
>>          unsigned long flags;
>> @@ -1496,7 +1493,6 @@ static void r5c_do_reclaim(struct r5conf *conf)
>>
>>   static void r5l_do_reclaim(struct r5l_log *log)
>>   {
>> -       struct r5conf *conf = log->rdev->mddev->private;
>>          sector_t reclaim_target = xchg(&log->reclaim_target, 0);
>>          sector_t reclaimable;
>>          sector_t next_checkpoint;
>> @@ -1525,7 +1521,7 @@ static void r5l_do_reclaim(struct r5l_log *log)
>>                                      log->io_list_lock);
>>          }
>>
>> -       next_checkpoint = r5c_calculate_new_cp(conf);
>> +       next_checkpoint = r5c_calculate_new_cp(log);
>>          spin_unlock_irq(&log->io_list_lock);
>>
>>          if (reclaimable == 0 || !write_super)
>> @@ -1554,7 +1550,7 @@ static void r5l_reclaim_thread(struct md_thread *thread)
>>
>>          if (!log)
>>                  return;
>> -       r5c_do_reclaim(conf);
>> +       r5c_do_reclaim(log);
>>          r5l_do_reclaim(log);
>>   }
>>
>> --
>> 2.39.2
>>
> .
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH -next v2 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread()
  2023-07-07  9:06     ` Yu Kuai
@ 2023-07-07  9:16       ` Yu Kuai
  2023-07-07  9:19         ` Yu Kuai
  0 siblings, 1 reply; 10+ messages in thread
From: Yu Kuai @ 2023-07-07  9:16 UTC (permalink / raw)
  To: Yu Kuai, Song Liu
  Cc: xni, logang, hch, shli, linux-raid, linux-kernel, yi.zhang,
	yangerkun, yukuai (C)

Hi,

在 2023/07/07 17:06, Yu Kuai 写道:
> Hi,
> 
> 在 2023/07/07 16:52, Song Liu 写道:
>> On Wed, Jun 28, 2023 at 9:08 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>
>>> From: Yu Kuai <yukuai3@huawei.com>
>>>
>>> r5l_reclaim_thread() already check that 'conf->log' is not NULL in the
>>> beginning, however, r5c_do_reclaim() and r5l_do_reclaim() will
>>> dereference 'conf->log' again, which will cause null-ptr-deref if
>>> 'conf->log' is set to NULL from r5l_exit_log() concurrently.
>>
>> r5l_exit_log() will wait until reclaim_thread() finishes, and then set
>> conf->log to NULL. So this is not a problem, no?

Perhaps you means this order?

r5l_exit_log
  flush_work(&log->disable_writeback_work)
  conf->log = NULL
  md_unregister_thread(&log->reclaim_thread)

I think this is better indeed.

Thanks,
Kuai
> 
> Patch one just revert this, wait until reclaim_thread() then set
> conf->log to NULL will cause deadlock, as I sescribled in patch 0.
> 
> Thanks,
> Kuai
>>
>> Thanks,
>> Song
>>
>>>
>>> Fix this problem by don't dereference 'conf->log' again in
>>> r5c_do_reclaim() and r5c_do_reclaim().
>>>
>>> Fixes: a39f7afde358 ("md/r5cache: write-out phase and reclaim support")
>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>>> ---
>>>   drivers/md/raid5-cache.c | 20 ++++++++------------
>>>   1 file changed, 8 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
>>> index 083288e36949..ba6fc146d265 100644
>>> --- a/drivers/md/raid5-cache.c
>>> +++ b/drivers/md/raid5-cache.c
>>> @@ -1148,10 +1148,9 @@ static void r5l_run_no_space_stripes(struct 
>>> r5l_log *log)
>>>    * for write through mode, returns log->next_checkpoint
>>>    * for write back, returns log_start of first sh in 
>>> stripe_in_journal_list
>>>    */
>>> -static sector_t r5c_calculate_new_cp(struct r5conf *conf)
>>> +static sector_t r5c_calculate_new_cp(struct r5l_log *log)
>>>   {
>>>          struct stripe_head *sh;
>>> -       struct r5l_log *log = conf->log;
>>>          sector_t new_cp;
>>>          unsigned long flags;
>>>
>>> @@ -1159,12 +1158,12 @@ static sector_t r5c_calculate_new_cp(struct 
>>> r5conf *conf)
>>>                  return log->next_checkpoint;
>>>
>>>          spin_lock_irqsave(&log->stripe_in_journal_lock, flags);
>>> -       if (list_empty(&conf->log->stripe_in_journal_list)) {
>>> +       if (list_empty(&log->stripe_in_journal_list)) {
>>>                  /* all stripes flushed */
>>>                  spin_unlock_irqrestore(&log->stripe_in_journal_lock, 
>>> flags);
>>>                  return log->next_checkpoint;
>>>          }
>>> -       sh = list_first_entry(&conf->log->stripe_in_journal_list,
>>> +       sh = list_first_entry(&log->stripe_in_journal_list,
>>>                                struct stripe_head, r5c);
>>>          new_cp = sh->log_start;
>>>          spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags);
>>> @@ -1173,10 +1172,8 @@ static sector_t r5c_calculate_new_cp(struct 
>>> r5conf *conf)
>>>
>>>   static sector_t r5l_reclaimable_space(struct r5l_log *log)
>>>   {
>>> -       struct r5conf *conf = log->rdev->mddev->private;
>>> -
>>>          return r5l_ring_distance(log, log->last_checkpoint,
>>> -                                r5c_calculate_new_cp(conf));
>>> +                                r5c_calculate_new_cp(log));
>>>   }
>>>
>>>   static void r5l_run_no_mem_stripe(struct r5l_log *log)
>>> @@ -1419,9 +1416,9 @@ void r5c_flush_cache(struct r5conf *conf, int num)
>>>          }
>>>   }
>>>
>>> -static void r5c_do_reclaim(struct r5conf *conf)
>>> +static void r5c_do_reclaim(struct r5l_log *log)
>>>   {
>>> -       struct r5l_log *log = conf->log;
>>> +       struct r5conf *conf = log->rdev->mddev->private;
>>>          struct stripe_head *sh;
>>>          int count = 0;
>>>          unsigned long flags;
>>> @@ -1496,7 +1493,6 @@ static void r5c_do_reclaim(struct r5conf *conf)
>>>
>>>   static void r5l_do_reclaim(struct r5l_log *log)
>>>   {
>>> -       struct r5conf *conf = log->rdev->mddev->private;
>>>          sector_t reclaim_target = xchg(&log->reclaim_target, 0);
>>>          sector_t reclaimable;
>>>          sector_t next_checkpoint;
>>> @@ -1525,7 +1521,7 @@ static void r5l_do_reclaim(struct r5l_log *log)
>>>                                      log->io_list_lock);
>>>          }
>>>
>>> -       next_checkpoint = r5c_calculate_new_cp(conf);
>>> +       next_checkpoint = r5c_calculate_new_cp(log);
>>>          spin_unlock_irq(&log->io_list_lock);
>>>
>>>          if (reclaimable == 0 || !write_super)
>>> @@ -1554,7 +1550,7 @@ static void r5l_reclaim_thread(struct md_thread 
>>> *thread)
>>>
>>>          if (!log)
>>>                  return;
>>> -       r5c_do_reclaim(conf);
>>> +       r5c_do_reclaim(log);
>>>          r5l_do_reclaim(log);
>>>   }
>>>
>>> -- 
>>> 2.39.2
>>>
>> .
>>
> 
> .
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH -next v2 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread()
  2023-07-07  9:16       ` Yu Kuai
@ 2023-07-07  9:19         ` Yu Kuai
  2023-07-07  9:36           ` Song Liu
  0 siblings, 1 reply; 10+ messages in thread
From: Yu Kuai @ 2023-07-07  9:19 UTC (permalink / raw)
  To: Yu Kuai, Song Liu
  Cc: xni, logang, hch, shli, linux-raid, linux-kernel, yi.zhang,
	yangerkun, yukuai (C)

Hi,

在 2023/07/07 17:16, Yu Kuai 写道:
> Perhaps you means this order?
> 
> r5l_exit_log
>   flush_work(&log->disable_writeback_work)
>   conf->log = NULL
>   md_unregister_thread(&log->reclaim_thread)
> 
> I think this is better indeed.
Never mind, this is wrong, I got confused...

Please ignore this and take a look at my original fix.

Thanks,
Kuai


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH -next v2 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread()
  2023-07-07  9:19         ` Yu Kuai
@ 2023-07-07  9:36           ` Song Liu
  2023-07-08  2:41             ` Yu Kuai
  0 siblings, 1 reply; 10+ messages in thread
From: Song Liu @ 2023-07-07  9:36 UTC (permalink / raw)
  To: Yu Kuai
  Cc: xni, logang, hch, shli, linux-raid, linux-kernel, yi.zhang,
	yangerkun, yukuai (C)

On Fri, Jul 7, 2023 at 5:19 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> 在 2023/07/07 17:16, Yu Kuai 写道:
> > Perhaps you means this order?
> >
> > r5l_exit_log
> >   flush_work(&log->disable_writeback_work)
> >   conf->log = NULL
> >   md_unregister_thread(&log->reclaim_thread)
> >
> > I think this is better indeed.
> Never mind, this is wrong, I got confused...
>
> Please ignore this and take a look at my original fix.

How about

r5l_exit_log
  md_unregister_thread(&log->reclaim_thread)
  conf->log = NULL
  flush_work(&log->disable_writeback_work)

?

Thanks,
Song

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH -next v2 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread()
  2023-07-07  9:36           ` Song Liu
@ 2023-07-08  2:41             ` Yu Kuai
  0 siblings, 0 replies; 10+ messages in thread
From: Yu Kuai @ 2023-07-08  2:41 UTC (permalink / raw)
  To: Song Liu, Yu Kuai
  Cc: xni, logang, hch, shli, linux-raid, linux-kernel, yi.zhang,
	yangerkun, yukuai (C)

Hi,

在 2023/07/07 17:36, Song Liu 写道:
> On Fri, Jul 7, 2023 at 5:19 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> 在 2023/07/07 17:16, Yu Kuai 写道:
>>> Perhaps you means this order?
>>>
>>> r5l_exit_log
>>>    flush_work(&log->disable_writeback_work)
>>>    conf->log = NULL
>>>    md_unregister_thread(&log->reclaim_thread)
>>>
>>> I think this is better indeed.
>> Never mind, this is wrong, I got confused...
>>
>> Please ignore this and take a look at my original fix.
> 
> How about
> 
> r5l_exit_log
>    md_unregister_thread(&log->reclaim_thread)
>    conf->log = NULL
>    flush_work(&log->disable_writeback_work)
> 
> ?

This looks correct, expect that wake_up() should be moved together.

I'll send a v2.

Thanks,
Kuai
> 
> Thanks,
> Song
> .
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-07-08  2:42 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-28  1:07 [PATCH -next v2 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log() Yu Kuai
2023-06-28  1:07 ` [PATCH -next v2 1/2] md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after finishing work" Yu Kuai
2023-06-28  1:07 ` [PATCH -next v2 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() Yu Kuai
2023-07-07  8:52   ` Song Liu
2023-07-07  9:06     ` Yu Kuai
2023-07-07  9:16       ` Yu Kuai
2023-07-07  9:19         ` Yu Kuai
2023-07-07  9:36           ` Song Liu
2023-07-08  2:41             ` Yu Kuai
2023-07-07  1:24 ` [PATCH -next v2 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log() Yu Kuai

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.