All of lore.kernel.org
 help / color / mirror / Atom feed
From: tang.junhui@zte.com.cn
To: colyli@suse.de
Cc: mlyle@lyle.org, linux-bcache@vger.kernel.org,
	linux-block@vger.kernel.org, tang.junhui@zte.com.cn
Subject: Re: [PATCH v1 05/10] bcache: stop dc->writeback_rate_update if cache set is stopping
Date: Wed, 3 Jan 2018 21:15:59 +0800	[thread overview]
Message-ID: <1514985359-3210-1-git-send-email-tang.junhui@zte.com.cn> (raw)

From: Tang Junhui <tang.junhui@zte.com.cn>

Hello Coly,

Thanks for this serials.

>struct delayed_work writeback_rate_update in struct cache_dev is a delayed
>worker to call function update_writeback_rate() in period (the interval is
>defined by dc->writeback_rate_update_seconds).
>
>When a metadate I/O error happens on cache device, bcache error handling
>routine bch_cache_set_error() will call bch_cache_set_unregister() to
>retire whole cache set. On the unregister code path, cached_dev_free()
>calls cancel_delayed_work_sync(&dc->writeback_rate_update) to stop this
>delayed work.
>
>dc->writeback_rate_update is a special delayed work from others in bcache.
>In its routine update_writeback_rate(), this delayed work is re-armed
>after a piece of time. That means when cancel_delayed_work_sync() returns,
>this delayed work can still be executed after several seconds defined by
>dc->writeback_rate_update_seconds.
>
>The problem is, after cancel_delayed_work_sync() returns, the cache set
>unregister code path will eventually release memory of struct cache set.
>Then the delayed work is scheduled to run, and inside its routine
>update_writeback_rate() that already released cache set NULL pointer will
>be accessed. Now a NULL pointer deference panic is triggered.
>
As I known that, after calling cancel_delayed_work_sync(), if the work queue
is running, cancel_delayed_work_sync() would return only AFTER the work queue
runs over, else if the work queue does not run yet, cancel_delayed_work_sync()
will cancel the work queue imediately, so I think it is safe in 
update_writeback_rate() to access the cache set resource. Point me out if I
am wrong.

>In order to avoid the above problem, this patch checks cache set flags in
>delayed work routine update_writeback_rate(). If flag CACHE_SET_STOPPING
>is set, this routine will quit without re-arm the delayed work. Then the
>NULL pointer deference panic won't happen after cache set is released.
>
>Signed-off-by: Coly Li <colyli@suse.de>
>---
> drivers/md/bcache/writeback.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
>diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
>index 0789a9e18337..745d9b2a326f 100644
>--- a/drivers/md/bcache/writeback.c
>+++ b/drivers/md/bcache/writeback.c
>@@ -91,6 +91,11 @@ static void update_writeback_rate(struct work_struct *work)
>     struct cached_dev *dc = container_of(to_delayed_work(work),
>                          struct cached_dev,
>                          writeback_rate_update);
>+    struct cache_set *c = dc->disk.c;
>+
>+    /* quit directly if cache set is stopping */
>+    if (test_bit(CACHE_SET_STOPPING, &c->flags))
>+        return;
> 
>     down_read(&dc->writeback_lock);
> 
>@@ -100,6 +105,10 @@ static void update_writeback_rate(struct work_struct *work)
> 
>     up_read(&dc->writeback_lock);
> 
>+    /* do not schedule delayed work if cache set is stopping */
>+    if (test_bit(CACHE_SET_STOPPING, &c->flags))
>+        return;
>+
>     schedule_delayed_work(&dc->writeback_rate_update,
>                   dc->writeback_rate_update_seconds * HZ);
> }
>-- 
>2.15.1                                                             

Thanks,
Tang Junhui

             reply	other threads:[~2018-01-04  2:49 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-03 13:15 tang.junhui [this message]
2018-01-04  3:32 ` [PATCH v1 05/10] bcache: stop dc->writeback_rate_update if cache set is stopping Coly Li
2018-01-03 14:03 [PATCH v1 00/10] cache device failure handling improvement Coly Li
2018-01-03 14:03 ` [PATCH v1 05/10] bcache: stop dc->writeback_rate_update if cache set is stopping Coly Li
2018-01-08  7:22   ` Hannes Reinecke
2018-01-08 16:01     ` Coly Li
2018-01-03 16:47 tang.junhui
2018-01-04  8:06 ` Coly Li
2018-01-03 18:54 tang.junhui
2018-01-04  9:05 ` Coly Li
2018-01-04 12:41 tang.junhui
2018-01-05  4:01 ` Coly Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1514985359-3210-1-git-send-email-tang.junhui@zte.com.cn \
    --to=tang.junhui@zte.com.cn \
    --cc=colyli@suse.de \
    --cc=linux-bcache@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=mlyle@lyle.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.