* [PATCH -next v2 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log() @ 2023-06-28 1:07 Yu Kuai 2023-06-28 1:07 ` [PATCH -next v2 1/2] md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after finishing work" Yu Kuai ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: Yu Kuai @ 2023-06-28 1:07 UTC (permalink / raw) To: xni, logang, hch, song, shli Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun From: Yu Kuai <yukuai3@huawei.com> Changes in v2: - remove a now unused local variable in patch 2; Commit b13015af94cf ("md/raid5-cache: Clear conf->log after finishing work") introduce a new problem: // caller hold reconfig_mutex r5l_exit_log flush_work(&log->disable_writeback_work) r5c_disable_writeback_async wait_event /* * conf->log is not NULL, and mddev_trylock() * will fail, wait_event() can never pass. */ conf->log = NULL patch 1 revert this patch, an patch 2 fix the original problem in a different way. Noted this problem is just found by code review, and I think this is probably the reason that some mdadm tests is broken. Yu Kuai (2): md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after finishing work" md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() drivers/md/raid5-cache.c | 25 ++++++++++--------------- 1 file changed, 10 insertions(+), 15 deletions(-) -- 2.39.2 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH -next v2 1/2] md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after finishing work" 2023-06-28 1:07 [PATCH -next v2 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log() Yu Kuai @ 2023-06-28 1:07 ` Yu Kuai 2023-06-28 1:07 ` [PATCH -next v2 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() Yu Kuai 2023-07-07 1:24 ` [PATCH -next v2 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log() Yu Kuai 2 siblings, 0 replies; 10+ messages in thread From: Yu Kuai @ 2023-06-28 1:07 UTC (permalink / raw) To: xni, logang, hch, song, shli Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun From: Yu Kuai <yukuai3@huawei.com> This reverts commit b13015af94cf405f73ff64ce0797269554020c37. Because this will cause that r5c_disable_writeback_async() to wait forever, since caller hold reconfig_mutex and conf->log is not NULL: wait_event conf->log == NULL || (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) && (locked = mddev_trylock(mddev))) This problem is found by code review, and the null-ptr-deref this patch fixed will be fixed by another approch in the next patch. Signed-off-by: Yu Kuai <yukuai3@huawei.com> --- drivers/md/raid5-cache.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c index 47ba7d9e81e1..083288e36949 100644 --- a/drivers/md/raid5-cache.c +++ b/drivers/md/raid5-cache.c @@ -3168,13 +3168,12 @@ void r5l_exit_log(struct r5conf *conf) { struct r5l_log *log = conf->log; + conf->log = NULL; + /* Ensure disable_writeback_work wakes up and exits */ wake_up(&conf->mddev->sb_wait); flush_work(&log->disable_writeback_work); md_unregister_thread(&log->reclaim_thread); - - conf->log = NULL; - mempool_exit(&log->meta_pool); bioset_exit(&log->bs); mempool_exit(&log->io_pool); -- 2.39.2 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH -next v2 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() 2023-06-28 1:07 [PATCH -next v2 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log() Yu Kuai 2023-06-28 1:07 ` [PATCH -next v2 1/2] md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after finishing work" Yu Kuai @ 2023-06-28 1:07 ` Yu Kuai 2023-07-07 8:52 ` Song Liu 2023-07-07 1:24 ` [PATCH -next v2 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log() Yu Kuai 2 siblings, 1 reply; 10+ messages in thread From: Yu Kuai @ 2023-06-28 1:07 UTC (permalink / raw) To: xni, logang, hch, song, shli Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun From: Yu Kuai <yukuai3@huawei.com> r5l_reclaim_thread() already check that 'conf->log' is not NULL in the beginning, however, r5c_do_reclaim() and r5l_do_reclaim() will dereference 'conf->log' again, which will cause null-ptr-deref if 'conf->log' is set to NULL from r5l_exit_log() concurrently. Fix this problem by don't dereference 'conf->log' again in r5c_do_reclaim() and r5c_do_reclaim(). Fixes: a39f7afde358 ("md/r5cache: write-out phase and reclaim support") Signed-off-by: Yu Kuai <yukuai3@huawei.com> --- drivers/md/raid5-cache.c | 20 ++++++++------------ 1 file changed, 8 insertions(+), 12 deletions(-) diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c index 083288e36949..ba6fc146d265 100644 --- a/drivers/md/raid5-cache.c +++ b/drivers/md/raid5-cache.c @@ -1148,10 +1148,9 @@ static void r5l_run_no_space_stripes(struct r5l_log *log) * for write through mode, returns log->next_checkpoint * for write back, returns log_start of first sh in stripe_in_journal_list */ -static sector_t r5c_calculate_new_cp(struct r5conf *conf) +static sector_t r5c_calculate_new_cp(struct r5l_log *log) { struct stripe_head *sh; - struct r5l_log *log = conf->log; sector_t new_cp; unsigned long flags; @@ -1159,12 +1158,12 @@ static sector_t r5c_calculate_new_cp(struct r5conf *conf) return log->next_checkpoint; spin_lock_irqsave(&log->stripe_in_journal_lock, flags); - if (list_empty(&conf->log->stripe_in_journal_list)) { + if (list_empty(&log->stripe_in_journal_list)) { /* all stripes flushed */ spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags); return log->next_checkpoint; } - sh = list_first_entry(&conf->log->stripe_in_journal_list, + sh = list_first_entry(&log->stripe_in_journal_list, struct stripe_head, r5c); new_cp = sh->log_start; spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags); @@ -1173,10 +1172,8 @@ static sector_t r5c_calculate_new_cp(struct r5conf *conf) static sector_t r5l_reclaimable_space(struct r5l_log *log) { - struct r5conf *conf = log->rdev->mddev->private; - return r5l_ring_distance(log, log->last_checkpoint, - r5c_calculate_new_cp(conf)); + r5c_calculate_new_cp(log)); } static void r5l_run_no_mem_stripe(struct r5l_log *log) @@ -1419,9 +1416,9 @@ void r5c_flush_cache(struct r5conf *conf, int num) } } -static void r5c_do_reclaim(struct r5conf *conf) +static void r5c_do_reclaim(struct r5l_log *log) { - struct r5l_log *log = conf->log; + struct r5conf *conf = log->rdev->mddev->private; struct stripe_head *sh; int count = 0; unsigned long flags; @@ -1496,7 +1493,6 @@ static void r5c_do_reclaim(struct r5conf *conf) static void r5l_do_reclaim(struct r5l_log *log) { - struct r5conf *conf = log->rdev->mddev->private; sector_t reclaim_target = xchg(&log->reclaim_target, 0); sector_t reclaimable; sector_t next_checkpoint; @@ -1525,7 +1521,7 @@ static void r5l_do_reclaim(struct r5l_log *log) log->io_list_lock); } - next_checkpoint = r5c_calculate_new_cp(conf); + next_checkpoint = r5c_calculate_new_cp(log); spin_unlock_irq(&log->io_list_lock); if (reclaimable == 0 || !write_super) @@ -1554,7 +1550,7 @@ static void r5l_reclaim_thread(struct md_thread *thread) if (!log) return; - r5c_do_reclaim(conf); + r5c_do_reclaim(log); r5l_do_reclaim(log); } -- 2.39.2 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH -next v2 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() 2023-06-28 1:07 ` [PATCH -next v2 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() Yu Kuai @ 2023-07-07 8:52 ` Song Liu 2023-07-07 9:06 ` Yu Kuai 0 siblings, 1 reply; 10+ messages in thread From: Song Liu @ 2023-07-07 8:52 UTC (permalink / raw) To: Yu Kuai Cc: xni, logang, hch, shli, linux-raid, linux-kernel, yukuai3, yi.zhang, yangerkun On Wed, Jun 28, 2023 at 9:08 AM Yu Kuai <yukuai1@huaweicloud.com> wrote: > > From: Yu Kuai <yukuai3@huawei.com> > > r5l_reclaim_thread() already check that 'conf->log' is not NULL in the > beginning, however, r5c_do_reclaim() and r5l_do_reclaim() will > dereference 'conf->log' again, which will cause null-ptr-deref if > 'conf->log' is set to NULL from r5l_exit_log() concurrently. r5l_exit_log() will wait until reclaim_thread() finishes, and then set conf->log to NULL. So this is not a problem, no? Thanks, Song > > Fix this problem by don't dereference 'conf->log' again in > r5c_do_reclaim() and r5c_do_reclaim(). > > Fixes: a39f7afde358 ("md/r5cache: write-out phase and reclaim support") > Signed-off-by: Yu Kuai <yukuai3@huawei.com> > --- > drivers/md/raid5-cache.c | 20 ++++++++------------ > 1 file changed, 8 insertions(+), 12 deletions(-) > > diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c > index 083288e36949..ba6fc146d265 100644 > --- a/drivers/md/raid5-cache.c > +++ b/drivers/md/raid5-cache.c > @@ -1148,10 +1148,9 @@ static void r5l_run_no_space_stripes(struct r5l_log *log) > * for write through mode, returns log->next_checkpoint > * for write back, returns log_start of first sh in stripe_in_journal_list > */ > -static sector_t r5c_calculate_new_cp(struct r5conf *conf) > +static sector_t r5c_calculate_new_cp(struct r5l_log *log) > { > struct stripe_head *sh; > - struct r5l_log *log = conf->log; > sector_t new_cp; > unsigned long flags; > > @@ -1159,12 +1158,12 @@ static sector_t r5c_calculate_new_cp(struct r5conf *conf) > return log->next_checkpoint; > > spin_lock_irqsave(&log->stripe_in_journal_lock, flags); > - if (list_empty(&conf->log->stripe_in_journal_list)) { > + if (list_empty(&log->stripe_in_journal_list)) { > /* all stripes flushed */ > spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags); > return log->next_checkpoint; > } > - sh = list_first_entry(&conf->log->stripe_in_journal_list, > + sh = list_first_entry(&log->stripe_in_journal_list, > struct stripe_head, r5c); > new_cp = sh->log_start; > spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags); > @@ -1173,10 +1172,8 @@ static sector_t r5c_calculate_new_cp(struct r5conf *conf) > > static sector_t r5l_reclaimable_space(struct r5l_log *log) > { > - struct r5conf *conf = log->rdev->mddev->private; > - > return r5l_ring_distance(log, log->last_checkpoint, > - r5c_calculate_new_cp(conf)); > + r5c_calculate_new_cp(log)); > } > > static void r5l_run_no_mem_stripe(struct r5l_log *log) > @@ -1419,9 +1416,9 @@ void r5c_flush_cache(struct r5conf *conf, int num) > } > } > > -static void r5c_do_reclaim(struct r5conf *conf) > +static void r5c_do_reclaim(struct r5l_log *log) > { > - struct r5l_log *log = conf->log; > + struct r5conf *conf = log->rdev->mddev->private; > struct stripe_head *sh; > int count = 0; > unsigned long flags; > @@ -1496,7 +1493,6 @@ static void r5c_do_reclaim(struct r5conf *conf) > > static void r5l_do_reclaim(struct r5l_log *log) > { > - struct r5conf *conf = log->rdev->mddev->private; > sector_t reclaim_target = xchg(&log->reclaim_target, 0); > sector_t reclaimable; > sector_t next_checkpoint; > @@ -1525,7 +1521,7 @@ static void r5l_do_reclaim(struct r5l_log *log) > log->io_list_lock); > } > > - next_checkpoint = r5c_calculate_new_cp(conf); > + next_checkpoint = r5c_calculate_new_cp(log); > spin_unlock_irq(&log->io_list_lock); > > if (reclaimable == 0 || !write_super) > @@ -1554,7 +1550,7 @@ static void r5l_reclaim_thread(struct md_thread *thread) > > if (!log) > return; > - r5c_do_reclaim(conf); > + r5c_do_reclaim(log); > r5l_do_reclaim(log); > } > > -- > 2.39.2 > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH -next v2 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() 2023-07-07 8:52 ` Song Liu @ 2023-07-07 9:06 ` Yu Kuai 2023-07-07 9:16 ` Yu Kuai 0 siblings, 1 reply; 10+ messages in thread From: Yu Kuai @ 2023-07-07 9:06 UTC (permalink / raw) To: Song Liu, Yu Kuai Cc: xni, logang, hch, shli, linux-raid, linux-kernel, yi.zhang, yangerkun, yukuai (C) Hi, 在 2023/07/07 16:52, Song Liu 写道: > On Wed, Jun 28, 2023 at 9:08 AM Yu Kuai <yukuai1@huaweicloud.com> wrote: >> >> From: Yu Kuai <yukuai3@huawei.com> >> >> r5l_reclaim_thread() already check that 'conf->log' is not NULL in the >> beginning, however, r5c_do_reclaim() and r5l_do_reclaim() will >> dereference 'conf->log' again, which will cause null-ptr-deref if >> 'conf->log' is set to NULL from r5l_exit_log() concurrently. > > r5l_exit_log() will wait until reclaim_thread() finishes, and then set > conf->log to NULL. So this is not a problem, no? Patch one just revert this, wait until reclaim_thread() then set conf->log to NULL will cause deadlock, as I sescribled in patch 0. Thanks, Kuai > > Thanks, > Song > >> >> Fix this problem by don't dereference 'conf->log' again in >> r5c_do_reclaim() and r5c_do_reclaim(). >> >> Fixes: a39f7afde358 ("md/r5cache: write-out phase and reclaim support") >> Signed-off-by: Yu Kuai <yukuai3@huawei.com> >> --- >> drivers/md/raid5-cache.c | 20 ++++++++------------ >> 1 file changed, 8 insertions(+), 12 deletions(-) >> >> diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c >> index 083288e36949..ba6fc146d265 100644 >> --- a/drivers/md/raid5-cache.c >> +++ b/drivers/md/raid5-cache.c >> @@ -1148,10 +1148,9 @@ static void r5l_run_no_space_stripes(struct r5l_log *log) >> * for write through mode, returns log->next_checkpoint >> * for write back, returns log_start of first sh in stripe_in_journal_list >> */ >> -static sector_t r5c_calculate_new_cp(struct r5conf *conf) >> +static sector_t r5c_calculate_new_cp(struct r5l_log *log) >> { >> struct stripe_head *sh; >> - struct r5l_log *log = conf->log; >> sector_t new_cp; >> unsigned long flags; >> >> @@ -1159,12 +1158,12 @@ static sector_t r5c_calculate_new_cp(struct r5conf *conf) >> return log->next_checkpoint; >> >> spin_lock_irqsave(&log->stripe_in_journal_lock, flags); >> - if (list_empty(&conf->log->stripe_in_journal_list)) { >> + if (list_empty(&log->stripe_in_journal_list)) { >> /* all stripes flushed */ >> spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags); >> return log->next_checkpoint; >> } >> - sh = list_first_entry(&conf->log->stripe_in_journal_list, >> + sh = list_first_entry(&log->stripe_in_journal_list, >> struct stripe_head, r5c); >> new_cp = sh->log_start; >> spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags); >> @@ -1173,10 +1172,8 @@ static sector_t r5c_calculate_new_cp(struct r5conf *conf) >> >> static sector_t r5l_reclaimable_space(struct r5l_log *log) >> { >> - struct r5conf *conf = log->rdev->mddev->private; >> - >> return r5l_ring_distance(log, log->last_checkpoint, >> - r5c_calculate_new_cp(conf)); >> + r5c_calculate_new_cp(log)); >> } >> >> static void r5l_run_no_mem_stripe(struct r5l_log *log) >> @@ -1419,9 +1416,9 @@ void r5c_flush_cache(struct r5conf *conf, int num) >> } >> } >> >> -static void r5c_do_reclaim(struct r5conf *conf) >> +static void r5c_do_reclaim(struct r5l_log *log) >> { >> - struct r5l_log *log = conf->log; >> + struct r5conf *conf = log->rdev->mddev->private; >> struct stripe_head *sh; >> int count = 0; >> unsigned long flags; >> @@ -1496,7 +1493,6 @@ static void r5c_do_reclaim(struct r5conf *conf) >> >> static void r5l_do_reclaim(struct r5l_log *log) >> { >> - struct r5conf *conf = log->rdev->mddev->private; >> sector_t reclaim_target = xchg(&log->reclaim_target, 0); >> sector_t reclaimable; >> sector_t next_checkpoint; >> @@ -1525,7 +1521,7 @@ static void r5l_do_reclaim(struct r5l_log *log) >> log->io_list_lock); >> } >> >> - next_checkpoint = r5c_calculate_new_cp(conf); >> + next_checkpoint = r5c_calculate_new_cp(log); >> spin_unlock_irq(&log->io_list_lock); >> >> if (reclaimable == 0 || !write_super) >> @@ -1554,7 +1550,7 @@ static void r5l_reclaim_thread(struct md_thread *thread) >> >> if (!log) >> return; >> - r5c_do_reclaim(conf); >> + r5c_do_reclaim(log); >> r5l_do_reclaim(log); >> } >> >> -- >> 2.39.2 >> > . > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH -next v2 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() 2023-07-07 9:06 ` Yu Kuai @ 2023-07-07 9:16 ` Yu Kuai 2023-07-07 9:19 ` Yu Kuai 0 siblings, 1 reply; 10+ messages in thread From: Yu Kuai @ 2023-07-07 9:16 UTC (permalink / raw) To: Yu Kuai, Song Liu Cc: xni, logang, hch, shli, linux-raid, linux-kernel, yi.zhang, yangerkun, yukuai (C) Hi, 在 2023/07/07 17:06, Yu Kuai 写道: > Hi, > > 在 2023/07/07 16:52, Song Liu 写道: >> On Wed, Jun 28, 2023 at 9:08 AM Yu Kuai <yukuai1@huaweicloud.com> wrote: >>> >>> From: Yu Kuai <yukuai3@huawei.com> >>> >>> r5l_reclaim_thread() already check that 'conf->log' is not NULL in the >>> beginning, however, r5c_do_reclaim() and r5l_do_reclaim() will >>> dereference 'conf->log' again, which will cause null-ptr-deref if >>> 'conf->log' is set to NULL from r5l_exit_log() concurrently. >> >> r5l_exit_log() will wait until reclaim_thread() finishes, and then set >> conf->log to NULL. So this is not a problem, no? Perhaps you means this order? r5l_exit_log flush_work(&log->disable_writeback_work) conf->log = NULL md_unregister_thread(&log->reclaim_thread) I think this is better indeed. Thanks, Kuai > > Patch one just revert this, wait until reclaim_thread() then set > conf->log to NULL will cause deadlock, as I sescribled in patch 0. > > Thanks, > Kuai >> >> Thanks, >> Song >> >>> >>> Fix this problem by don't dereference 'conf->log' again in >>> r5c_do_reclaim() and r5c_do_reclaim(). >>> >>> Fixes: a39f7afde358 ("md/r5cache: write-out phase and reclaim support") >>> Signed-off-by: Yu Kuai <yukuai3@huawei.com> >>> --- >>> drivers/md/raid5-cache.c | 20 ++++++++------------ >>> 1 file changed, 8 insertions(+), 12 deletions(-) >>> >>> diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c >>> index 083288e36949..ba6fc146d265 100644 >>> --- a/drivers/md/raid5-cache.c >>> +++ b/drivers/md/raid5-cache.c >>> @@ -1148,10 +1148,9 @@ static void r5l_run_no_space_stripes(struct >>> r5l_log *log) >>> * for write through mode, returns log->next_checkpoint >>> * for write back, returns log_start of first sh in >>> stripe_in_journal_list >>> */ >>> -static sector_t r5c_calculate_new_cp(struct r5conf *conf) >>> +static sector_t r5c_calculate_new_cp(struct r5l_log *log) >>> { >>> struct stripe_head *sh; >>> - struct r5l_log *log = conf->log; >>> sector_t new_cp; >>> unsigned long flags; >>> >>> @@ -1159,12 +1158,12 @@ static sector_t r5c_calculate_new_cp(struct >>> r5conf *conf) >>> return log->next_checkpoint; >>> >>> spin_lock_irqsave(&log->stripe_in_journal_lock, flags); >>> - if (list_empty(&conf->log->stripe_in_journal_list)) { >>> + if (list_empty(&log->stripe_in_journal_list)) { >>> /* all stripes flushed */ >>> spin_unlock_irqrestore(&log->stripe_in_journal_lock, >>> flags); >>> return log->next_checkpoint; >>> } >>> - sh = list_first_entry(&conf->log->stripe_in_journal_list, >>> + sh = list_first_entry(&log->stripe_in_journal_list, >>> struct stripe_head, r5c); >>> new_cp = sh->log_start; >>> spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags); >>> @@ -1173,10 +1172,8 @@ static sector_t r5c_calculate_new_cp(struct >>> r5conf *conf) >>> >>> static sector_t r5l_reclaimable_space(struct r5l_log *log) >>> { >>> - struct r5conf *conf = log->rdev->mddev->private; >>> - >>> return r5l_ring_distance(log, log->last_checkpoint, >>> - r5c_calculate_new_cp(conf)); >>> + r5c_calculate_new_cp(log)); >>> } >>> >>> static void r5l_run_no_mem_stripe(struct r5l_log *log) >>> @@ -1419,9 +1416,9 @@ void r5c_flush_cache(struct r5conf *conf, int num) >>> } >>> } >>> >>> -static void r5c_do_reclaim(struct r5conf *conf) >>> +static void r5c_do_reclaim(struct r5l_log *log) >>> { >>> - struct r5l_log *log = conf->log; >>> + struct r5conf *conf = log->rdev->mddev->private; >>> struct stripe_head *sh; >>> int count = 0; >>> unsigned long flags; >>> @@ -1496,7 +1493,6 @@ static void r5c_do_reclaim(struct r5conf *conf) >>> >>> static void r5l_do_reclaim(struct r5l_log *log) >>> { >>> - struct r5conf *conf = log->rdev->mddev->private; >>> sector_t reclaim_target = xchg(&log->reclaim_target, 0); >>> sector_t reclaimable; >>> sector_t next_checkpoint; >>> @@ -1525,7 +1521,7 @@ static void r5l_do_reclaim(struct r5l_log *log) >>> log->io_list_lock); >>> } >>> >>> - next_checkpoint = r5c_calculate_new_cp(conf); >>> + next_checkpoint = r5c_calculate_new_cp(log); >>> spin_unlock_irq(&log->io_list_lock); >>> >>> if (reclaimable == 0 || !write_super) >>> @@ -1554,7 +1550,7 @@ static void r5l_reclaim_thread(struct md_thread >>> *thread) >>> >>> if (!log) >>> return; >>> - r5c_do_reclaim(conf); >>> + r5c_do_reclaim(log); >>> r5l_do_reclaim(log); >>> } >>> >>> -- >>> 2.39.2 >>> >> . >> > > . > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH -next v2 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() 2023-07-07 9:16 ` Yu Kuai @ 2023-07-07 9:19 ` Yu Kuai 2023-07-07 9:36 ` Song Liu 0 siblings, 1 reply; 10+ messages in thread From: Yu Kuai @ 2023-07-07 9:19 UTC (permalink / raw) To: Yu Kuai, Song Liu Cc: xni, logang, hch, shli, linux-raid, linux-kernel, yi.zhang, yangerkun, yukuai (C) Hi, 在 2023/07/07 17:16, Yu Kuai 写道: > Perhaps you means this order? > > r5l_exit_log > flush_work(&log->disable_writeback_work) > conf->log = NULL > md_unregister_thread(&log->reclaim_thread) > > I think this is better indeed. Never mind, this is wrong, I got confused... Please ignore this and take a look at my original fix. Thanks, Kuai ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH -next v2 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() 2023-07-07 9:19 ` Yu Kuai @ 2023-07-07 9:36 ` Song Liu 2023-07-08 2:41 ` Yu Kuai 0 siblings, 1 reply; 10+ messages in thread From: Song Liu @ 2023-07-07 9:36 UTC (permalink / raw) To: Yu Kuai Cc: xni, logang, hch, shli, linux-raid, linux-kernel, yi.zhang, yangerkun, yukuai (C) On Fri, Jul 7, 2023 at 5:19 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: > > Hi, > > 在 2023/07/07 17:16, Yu Kuai 写道: > > Perhaps you means this order? > > > > r5l_exit_log > > flush_work(&log->disable_writeback_work) > > conf->log = NULL > > md_unregister_thread(&log->reclaim_thread) > > > > I think this is better indeed. > Never mind, this is wrong, I got confused... > > Please ignore this and take a look at my original fix. How about r5l_exit_log md_unregister_thread(&log->reclaim_thread) conf->log = NULL flush_work(&log->disable_writeback_work) ? Thanks, Song ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH -next v2 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() 2023-07-07 9:36 ` Song Liu @ 2023-07-08 2:41 ` Yu Kuai 0 siblings, 0 replies; 10+ messages in thread From: Yu Kuai @ 2023-07-08 2:41 UTC (permalink / raw) To: Song Liu, Yu Kuai Cc: xni, logang, hch, shli, linux-raid, linux-kernel, yi.zhang, yangerkun, yukuai (C) Hi, 在 2023/07/07 17:36, Song Liu 写道: > On Fri, Jul 7, 2023 at 5:19 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: >> >> Hi, >> >> 在 2023/07/07 17:16, Yu Kuai 写道: >>> Perhaps you means this order? >>> >>> r5l_exit_log >>> flush_work(&log->disable_writeback_work) >>> conf->log = NULL >>> md_unregister_thread(&log->reclaim_thread) >>> >>> I think this is better indeed. >> Never mind, this is wrong, I got confused... >> >> Please ignore this and take a look at my original fix. > > How about > > r5l_exit_log > md_unregister_thread(&log->reclaim_thread) > conf->log = NULL > flush_work(&log->disable_writeback_work) > > ? This looks correct, expect that wake_up() should be moved together. I'll send a v2. Thanks, Kuai > > Thanks, > Song > . > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH -next v2 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log() 2023-06-28 1:07 [PATCH -next v2 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log() Yu Kuai 2023-06-28 1:07 ` [PATCH -next v2 1/2] md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after finishing work" Yu Kuai 2023-06-28 1:07 ` [PATCH -next v2 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() Yu Kuai @ 2023-07-07 1:24 ` Yu Kuai 2 siblings, 0 replies; 10+ messages in thread From: Yu Kuai @ 2023-07-07 1:24 UTC (permalink / raw) To: Yu Kuai, xni, logang, hch, song, shli Cc: linux-raid, linux-kernel, yi.zhang, yangerkun, yukuai (C) Hi, 在 2023/06/28 9:07, Yu Kuai 写道: > From: Yu Kuai <yukuai3@huawei.com> > > Changes in v2: > - remove a now unused local variable in patch 2; > > Commit b13015af94cf ("md/raid5-cache: Clear conf->log after finishing > work") introduce a new problem: > > // caller hold reconfig_mutex > r5l_exit_log > flush_work(&log->disable_writeback_work) > r5c_disable_writeback_async > wait_event > /* > * conf->log is not NULL, and mddev_trylock() > * will fail, wait_event() can never pass. > */ > conf->log = NULL > > patch 1 revert this patch, an patch 2 fix the original problem in a > different way. > > Noted this problem is just found by code review, and I think this is > probably the reason that some mdadm tests is broken. Any suggestions? By the way, while taking another look at this problem, I think probably read and write 'conf->log' should use READ_ONCE and WRITE_ONCE. Thanks, Kuai > > Yu Kuai (2): > md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after > finishing work" > md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() > > drivers/md/raid5-cache.c | 25 ++++++++++--------------- > 1 file changed, 10 insertions(+), 15 deletions(-) > ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2023-07-08 2:42 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-06-28 1:07 [PATCH -next v2 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log() Yu Kuai 2023-06-28 1:07 ` [PATCH -next v2 1/2] md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after finishing work" Yu Kuai 2023-06-28 1:07 ` [PATCH -next v2 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() Yu Kuai 2023-07-07 8:52 ` Song Liu 2023-07-07 9:06 ` Yu Kuai 2023-07-07 9:16 ` Yu Kuai 2023-07-07 9:19 ` Yu Kuai 2023-07-07 9:36 ` Song Liu 2023-07-08 2:41 ` Yu Kuai 2023-07-07 1:24 ` [PATCH -next v2 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log() Yu Kuai
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.