All of lore.kernel.org
 help / color / mirror / Atom feed
From: colyli <colyli@suse.de>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org
Subject: Re: [PATCH 2/3] bcache: avoid unnecessary soft lockup in kworker update_writeback_rate()
Date: Sat, 28 May 2022 10:20:26 +0800	[thread overview]
Message-ID: <49c0852df72f978241fd14a029c91ea9@suse.de> (raw)
In-Reply-To: <1a3a151e-a8f9-7f53-f9fc-e7eaea42a462@kernel.dk>

在 2022-05-28 08:00,Jens Axboe 写道:
> On 5/27/22 11:05 AM, colyli wrote:
>> ? 2022-05-27 23:49?Jens Axboe ???
>>> On 5/27/22 9:28 AM, Coly Li wrote:
>>>> diff --git a/drivers/md/bcache/writeback.c 
>>>> b/drivers/md/bcache/writeback.c
>>>> index d138a2d73240..c51671abe74e 100644
>>>> --- a/drivers/md/bcache/writeback.c
>>>> +++ b/drivers/md/bcache/writeback.c
>>>> @@ -214,6 +214,7 @@ static void update_writeback_rate(struct 
>>>> work_struct *work)
>>>>                           struct cached_dev,
>>>>                           writeback_rate_update);
>>>>      struct cache_set *c = dc->disk.c;
>>>> +    bool contention = false;
>>>> 
>>>>      /*
>>>>       * should check BCACHE_DEV_RATE_DW_RUNNING before calling
>>>> @@ -243,13 +244,41 @@ static void update_writeback_rate(struct 
>>>> work_struct *work)
>>>>           * in maximum writeback rate number(s).
>>>>           */
>>>>          if (!set_at_max_writeback_rate(c, dc)) {
>>>> -            down_read(&dc->writeback_lock);
>>>> -            __update_writeback_rate(dc);
>>>> -            update_gc_after_writeback(c);
>>>> -            up_read(&dc->writeback_lock);
>>>> +            /*
>>>> +             * When contention happens on dc->writeback_lock with
>>>> +             * the writeback thread, this kwork may be blocked for
>>>> +             * very long time if there are too many dirty data to
>>>> +             * writeback, and kerne message will complain a (bogus)
>>>> +             * software lockup kernel message. To avoid potential
>>>> +             * starving, if down_read_trylock() fails, writeback
>>>> +             * rate updating will be skipped for dc->retry_max 
>>>> times
>>>> +             * at most while delay this worker a bit longer time.
>>>> +             * If dc->retry_max times are tried and the trylock
>>>> +             * still fails, then call down_read() to wait for
>>>> +             * dc->writeback_lock.
>>>> +             */
>>>> +            if (!down_read_trylock((&dc->writeback_lock))) {
>>>> +                contention = true;
>>>> +                dc->retry_nr++;
>>>> +                if (dc->retry_nr > dc->retry_max)
>>>> +                    down_read(&dc->writeback_lock);
>>>> +            }
>>>> +
>>>> +            if (!contention || dc->retry_nr > dc->retry_max) {
>>>> +                __update_writeback_rate(dc);
>>>> +                update_gc_after_writeback(c);
>>>> +                up_read(&dc->writeback_lock);
>>>> +                dc->retry_nr = 0;
>>>> +            }
>>>>          }
>>>>      }
>>> 
>> 
>> Hi Jens,
>> 
>> Thanks for looking into this :-)
>> 
>>> This is really not very pretty. First of all, why bother with storing 
>>> a
>>> max retry value in there? Doesn't seem like it'd ever be different 
>>> per
>> 
>> It is because the probability of the lock contention on
>> dc->writeback_lock depends on the I/O speed backing device. From my
>> observation during the tests, for fast backing device with larger
>> cache device, its writeback thread may work harder to flush more dirty
>> data to backing device, the lock contention happens more and longer,
>> so the writeback rate update kworker has to wait longer time before
>> acquires dc->writeback_lock. So its dc->retry_max should be larger
>> then slow backing device.
>> 
>> Therefore I'd like to have a tunable per-backing-device retry_max. And
>> the syses interface will be added when users/customers want it. The
>> use case is from SAP HANA users, I have report that they observe the
>> soft lockup warning for dc->writeback_lock contention and worry about
>> whether data is corrupted (indeed, of course not).
> 
> The initial patch has 5 as the default, and there are no sysfs knobs. 
> If
> you ever need a sysfs knob, by all means make it a variable and make it
> configurable too. But don't do it upfront where the '5' suitabled named
> would do the job.
> 

Copied.

>>> diff --git a/drivers/md/bcache/writeback.c 
>>> b/drivers/md/bcache/writeback.c
>>> index 9ee0005874cd..cbc01372c7a1 100644
>>> --- a/drivers/md/bcache/writeback.c
>>> +++ b/drivers/md/bcache/writeback.c
>>> @@ -235,19 +235,27 @@ static void update_writeback_rate(struct
>>> work_struct *work)
>>>          return;
>>>      }
>>> 
>>> -    if (atomic_read(&dc->has_dirty) && dc->writeback_percent) {
>>> +    if (atomic_read(&dc->has_dirty) && dc->writeback_percent &&
>>> +        !set_at_max_writeback_rate(c, dc)) {
>>>          /*
>>>           * If the whole cache set is idle, 
>>> set_at_max_writeback_rate()
>>>           * will set writeback rate to a max number. Then it is
>>>           * unncessary to update writeback rate for an idle cache set
>>>           * in maximum writeback rate number(s).
>>>           */
>>> -        if (!set_at_max_writeback_rate(c, dc)) {
>> 
>> The reason I didn't place '!set_at_max_writeback_rate' with other 
>> items in
>> previous if() was for the above code comment. If I moved it to 
>> previous
>> if() without other items, I was not comfortable to place the code 
>> comments
>> neither before or after the if() check. So I used a separated if() 
>> check for
>> '!set_at_max_writeback_rate'.
>> 
>> From your change, it seems placing the code comments behind is fine 
>> (or
>> better), can I understand in this way? I try to learn and follow your 
>> way
>> to handle such code comments situation.
> 
> Just put it higher up if you want, it doesn't really matter, or leave 
> it
> where it is.
> 

Copied.

>>>              __update_writeback_rate(dc);
>>>              update_gc_after_writeback(c);
>>>              up_read(&dc->writeback_lock);
>>> -        }
>>> +        } while (0);
>> 
>> Aha, this is cool! I never though of using do{}while(0) and break in
>> such a genius way! Sure I will use this, thanks for the hint :-)
>> 
>> After you reply my defense of dc->retry_max, and the question of code
>> comments location, I will update and test the patch again, and
>> re-sbumit to you.
>> 
>> Thanks for your constructive suggestion, especially the do{}while(0)
>> part!
> 
> I would do something similar to my change and drop the 'dc' addition 
> for
> the max retries as it, by definition, can only be one value right now.
> For all I know, you'll never need to change it again, and then you're
> just wasting memory and making the code harder to read by putting it in
> dc instead of just having this define.

Copied. I will do the change and repost it again. Thank you for the 
review and comments.

Coly Li

  reply	other threads:[~2022-05-28  2:20 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-27 15:28 [PATCH 0/3] bcache fixes for Linux v5.19 (2nd wave) Coly Li
2022-05-27 15:28 ` [PATCH 1/3] bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init() Coly Li
2022-05-27 15:28 ` [PATCH 2/3] bcache: avoid unnecessary soft lockup in kworker update_writeback_rate() Coly Li
2022-05-27 15:49   ` Jens Axboe
2022-05-27 17:05     ` colyli
2022-05-28  0:00       ` Jens Axboe
2022-05-28  2:20         ` colyli [this message]
2022-05-27 15:28 ` [PATCH 3/3] md: bcache: check the return value of kzalloc() in detached_dev_do_request() Coly Li
2022-05-27 15:50 ` (subset) [PATCH 0/3] bcache fixes for Linux v5.19 (2nd wave) Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49c0852df72f978241fd14a029c91ea9@suse.de \
    --to=colyli@suse.de \
    --cc=axboe@kernel.dk \
    --cc=linux-bcache@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.