* [PATCH 0/3] bcache fixes for Linux v5.19 (2nd wave)
@ 2022-05-27 15:28 Coly Li
  2022-05-27 15:28 ` [PATCH 1/3] bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init() Coly Li
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Coly Li @ 2022-05-27 15:28 UTC (permalink / raw)
  To: axboe; +Cc: linux-bcache, linux-block, Coly Li

Hi Jens,

Here are the 2nd wave of bcache fixes for Linux v5.19; they just
survived my I/O pressure testing and look fine.

The patch from Jia-Ju Bai has been in my testing queue for a while; it
handles a memory allocation failure in the I/O path on a backing device
when it is not attached to a cache device.

My first patch adds memset() to zero the on-stack local variables, to
keep the code logic consistent with before, when they were dynamically
allocated by kzalloc().

The second patch from me is an effort to avoid a bogus soft lockup
warning in the kernel log. Indeed, a busy writeback thread starving the
writeback I/O rate calculation kworker doesn't hurt anything, but the
kernel message with trace information scares users from time to time and
makes them worry that something is wrong with bcache. This patch permits
the writeback rate update kworker to retry more times before finally
contending for the writeback lock with the writeback thread, to avoid
the unnecessary soft lockup warning.

There are no more patches in my plan for Linux v5.19. Please consider
taking them, and thank you in advance.

Coly Li (2):
  bcache: memset on stack variables in bch_btree_check() and
    bch_sectors_dirty_init()
  bcache: avoid unnecessary soft lockup in kworker
    update_writeback_rate()

Jia-Ju Bai (1):
  md: bcache: check the return value of kzalloc() in
    detached_dev_do_request()

 drivers/md/bcache/bcache.h    |  7 ++++++
 drivers/md/bcache/btree.c     |  1 +
 drivers/md/bcache/request.c   |  6 +++++
 drivers/md/bcache/writeback.c | 46 +++++++++++++++++++++++++++++++----
 4 files changed, 55 insertions(+), 5 deletions(-)

-- 
2.35.3



* [PATCH 1/3] bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init()
  2022-05-27 15:28 [PATCH 0/3] bcache fixes for Linux v5.19 (2nd wave) Coly Li
@ 2022-05-27 15:28 ` Coly Li
  2022-05-27 15:28 ` [PATCH 2/3] bcache: avoid unnecessary soft lockup in kworker update_writeback_rate() Coly Li
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Coly Li @ 2022-05-27 15:28 UTC (permalink / raw)
  To: axboe; +Cc: linux-bcache, linux-block, Coly Li

The local variables check_state (in bch_btree_check()) and state (in
bch_sectors_dirty_init()) should be fully zeroed, because before they
were moved onto the stack they were dynamically allocated with
kzalloc(), which returns zeroed memory.
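
As an illustration (hypothetical struct and names, not the actual bcache
types), an on-stack variable keeps whatever bytes are in the stack
frame, while kzalloc() returns zeroed memory; the added memset()
restores the zeroed starting state the code used to rely on:

```c
#include <string.h>

/* Hypothetical stand-in for struct btree_check_state; the real bcache
 * structs have more fields, but the zeroing concern is the same. */
struct demo_check_state {
	int total_threads;
	int key_idx;
	char pad[32];
};

/* Mirrors the patch: zero the whole struct first, then fill in only
 * the fields that need non-zero values. */
static void demo_init(struct demo_check_state *state)
{
	memset(state, 0, sizeof(*state));
	state->total_threads = 4;
}
```

Without the memset(), every field not explicitly assigned (key_idx, pad)
would hold stack garbage.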

Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/btree.c     | 1 +
 drivers/md/bcache/writeback.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 2362bb8ef6d1..e136d6edc1ed 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -2017,6 +2017,7 @@ int bch_btree_check(struct cache_set *c)
 	if (c->root->level == 0)
 		return 0;
 
+	memset(&check_state, 0, sizeof(struct btree_check_state));
 	check_state.c = c;
 	check_state.total_threads = bch_btree_chkthread_nr();
 	check_state.key_idx = 0;
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 75b71199800d..d138a2d73240 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -950,6 +950,7 @@ void bch_sectors_dirty_init(struct bcache_device *d)
 		return;
 	}
 
+	memset(&state, 0, sizeof(struct bch_dirty_init_state));
 	state.c = c;
 	state.d = d;
 	state.total_threads = bch_btre_dirty_init_thread_nr();
-- 
2.35.3



* [PATCH 2/3] bcache: avoid unnecessary soft lockup in kworker update_writeback_rate()
  2022-05-27 15:28 [PATCH 0/3] bcache fixes for Linux v5.19 (2nd wave) Coly Li
  2022-05-27 15:28 ` [PATCH 1/3] bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init() Coly Li
@ 2022-05-27 15:28 ` Coly Li
  2022-05-27 15:49   ` Jens Axboe
  2022-05-27 15:28 ` [PATCH 3/3] md: bcache: check the return value of kzalloc() in detached_dev_do_request() Coly Li
  2022-05-27 15:50 ` (subset) [PATCH 0/3] bcache fixes for Linux v5.19 (2nd wave) Jens Axboe
  3 siblings, 1 reply; 9+ messages in thread
From: Coly Li @ 2022-05-27 15:28 UTC (permalink / raw)
  To: axboe; +Cc: linux-bcache, linux-block, Coly Li

The kworker routine update_writeback_rate() is scheduled to update the
writeback rate every 5 seconds by default. Before calling
__update_writeback_rate() to do the real job, the kworker routine should
hold the semaphore dc->writeback_lock.

At the same time, the bcache writeback thread routine
bch_writeback_thread() also needs to hold dc->writeback_lock before
flushing dirty data back to the backing device. If the dirty data set is
large, it might take a very long time for bch_writeback_thread() to scan
all dirty buckets and release dc->writeback_lock. In such a case
update_writeback_rate() can be starved long enough that the kernel
reports a soft lockup warning starting like:
  watchdog: BUG: soft lockup - CPU#246 stuck for 23s! [kworker/246:31:179713]

Such a soft lockup condition is unnecessary, because after the writeback
thread finishes its job and releases dc->writeback_lock, the kworker
update_writeback_rate() may continue to work and everything is indeed
fine.

This patch avoids the unnecessary soft lockup with the following method:
- Add new members to struct cached_dev
  - dc->retry_nr (0 by default)
  - dc->retry_max (5 by default)
- In update_writeback_rate(), call down_read_trylock(&dc->writeback_lock)
  first; if it fails, lock contention has happened. Then increase
  dc->retry_nr by 1 and mark the contention as true.
- If dc->retry_nr <= dc->retry_max, give up updating the writeback rate
  and reschedule the kworker to retry after a longer time.
- If dc->retry_nr > dc->retry_max, do not retry anymore and call
  down_read(&dc->writeback_lock) to wait for the lock.

With the above method, in the worst case update_writeback_rate() may
retry for 1+ minutes before blocking on dc->writeback_lock by calling
down_read(). For a 4TB cache device with 1TB dirty data, 90%+ of the
unnecessary soft lockup warning messages can be avoided.
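
A back-of-the-envelope check of that "1+ minutes" figure, assuming the
defaults in this patch (5 s base update interval, retry_max = 5, and the
reschedule delay scaled by 1 + retry_nr as in the hunk below):

```c
/* After the retry_nr-th failed trylock the kworker is rescheduled with
 * delay update_seconds * (1 + retry_nr), so the total retry window
 * before the blocking down_read() is the sum over retry_nr = 1..max. */
static unsigned int worst_case_retry_seconds(unsigned int update_seconds,
					     unsigned int retry_max)
{
	unsigned int retry_nr, total = 0;

	for (retry_nr = 1; retry_nr <= retry_max; retry_nr++)
		total += update_seconds * (1 + retry_nr);
	return total;	/* 5 * (2+3+4+5+6) = 100 s with the defaults */
}
```

With update_seconds = 5 and retry_max = 5 this gives 100 seconds, i.e. a
bit over 1.5 minutes, consistent with the figure above.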

While update_writeback_rate() is retrying to acquire
dc->writeback_lock, the writeback rate of course cannot be updated. This
is fair, because when the kworker is blocked on the lock contention of
dc->writeback_lock, the writeback rate cannot be updated either.

Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/bcache.h    |  7 ++++++
 drivers/md/bcache/writeback.c | 45 +++++++++++++++++++++++++++++++----
 2 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 9ed9c955add7..82b86b874294 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -395,6 +395,13 @@ struct cached_dev {
 	atomic_t		io_errors;
 	unsigned int		error_limit;
 	unsigned int		offline_seconds;
+
+	/*
+	 * Retry to update writeback_rate if contention happens for
+	 * down_read(dc->writeback_lock) in update_writeback_rate()
+	 */
+	unsigned int		retry_nr;
+	unsigned int		retry_max;
 };
 
 enum alloc_reserve {
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index d138a2d73240..c51671abe74e 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -214,6 +214,7 @@ static void update_writeback_rate(struct work_struct *work)
 					     struct cached_dev,
 					     writeback_rate_update);
 	struct cache_set *c = dc->disk.c;
+	bool contention = false;
 
 	/*
 	 * should check BCACHE_DEV_RATE_DW_RUNNING before calling
@@ -243,13 +244,41 @@ static void update_writeback_rate(struct work_struct *work)
 		 * in maximum writeback rate number(s).
 		 */
 		if (!set_at_max_writeback_rate(c, dc)) {
-			down_read(&dc->writeback_lock);
-			__update_writeback_rate(dc);
-			update_gc_after_writeback(c);
-			up_read(&dc->writeback_lock);
+			/*
+			 * When contention happens on dc->writeback_lock with
+			 * the writeback thread, this kworker may be blocked
+			 * for a very long time if there is too much dirty
+			 * data to write back, and the kernel will report a
+			 * (bogus) soft lockup warning. To avoid potential
+			 * starving, if down_read_trylock() fails, skip the
+			 * writeback rate update for at most dc->retry_max
+			 * times while delaying this worker a bit longer.
+			 * If the trylock still fails after dc->retry_max
+			 * attempts, then call down_read() to wait for
+			 * dc->writeback_lock.
+			 */
+			if (!down_read_trylock((&dc->writeback_lock))) {
+				contention = true;
+				dc->retry_nr++;
+				if (dc->retry_nr > dc->retry_max)
+					down_read(&dc->writeback_lock);
+			}
+
+			if (!contention || dc->retry_nr > dc->retry_max) {
+				__update_writeback_rate(dc);
+				update_gc_after_writeback(c);
+				up_read(&dc->writeback_lock);
+				dc->retry_nr = 0;
+			}
 		}
 	}
 
+	/*
+	 * In case no lock contention on dc->writeback_lock happens since
+	 * last retry, e.g. cache is clean or I/O idle for a while.
+	 */
+	if (!contention && dc->retry_nr)
+		dc->retry_nr = 0;
 
 	/*
 	 * CACHE_SET_IO_DISABLE might be set via sysfs interface,
@@ -257,8 +286,10 @@ static void update_writeback_rate(struct work_struct *work)
 	 */
 	if (test_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags) &&
 	    !test_bit(CACHE_SET_IO_DISABLE, &c->flags)) {
+		unsigned int scale = 1 + dc->retry_nr;
+
 		schedule_delayed_work(&dc->writeback_rate_update,
-			      dc->writeback_rate_update_seconds * HZ);
+			dc->writeback_rate_update_seconds * scale * HZ);
 	}
 
 	/*
@@ -1006,6 +1037,10 @@ void bch_cached_dev_writeback_init(struct cached_dev *dc)
 	dc->writeback_rate_fp_term_high = 1000;
 	dc->writeback_rate_i_term_inverse = 10000;
 
+	/* For dc->writeback_lock contention in update_writeback_rate() */
+	dc->retry_nr = 0;
+	dc->retry_max = 5;
+
 	WARN_ON(test_and_clear_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags));
 	INIT_DELAYED_WORK(&dc->writeback_rate_update, update_writeback_rate);
 }
-- 
2.35.3



* [PATCH 3/3] md: bcache: check the return value of kzalloc() in detached_dev_do_request()
  2022-05-27 15:28 [PATCH 0/3] bcache fixes for Linux v5.19 (2nd wave) Coly Li
  2022-05-27 15:28 ` [PATCH 1/3] bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init() Coly Li
  2022-05-27 15:28 ` [PATCH 2/3] bcache: avoid unnecessary soft lockup in kworker update_writeback_rate() Coly Li
@ 2022-05-27 15:28 ` Coly Li
  2022-05-27 15:50 ` (subset) [PATCH 0/3] bcache fixes for Linux v5.19 (2nd wave) Jens Axboe
  3 siblings, 0 replies; 9+ messages in thread
From: Coly Li @ 2022-05-27 15:28 UTC (permalink / raw)
  To: axboe; +Cc: linux-bcache, linux-block, Jia-Ju Bai, TOTE Robot, Coly Li

From: Jia-Ju Bai <baijiaju1990@gmail.com>

The function kzalloc() in detached_dev_do_request() can fail, so its
return value should be checked before it is dereferenced.
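
The pattern, sketched in userspace with calloc() standing in for
kzalloc() (hypothetical names, not the driver's API): on allocation
failure, complete the request with a resource error instead of
dereferencing a NULL pointer:

```c
#include <errno.h>
#include <stdlib.h>

struct demo_io_ctx {
	int id;
};

/* Returns 0 and a zeroed context on success, -ENOMEM on allocation
 * failure, mirroring the bail-out added to detached_dev_do_request(). */
static int demo_handle_request(struct demo_io_ctx **out)
{
	struct demo_io_ctx *ctx = calloc(1, sizeof(*ctx));

	if (!ctx) {
		*out = NULL;
		return -ENOMEM;	/* kernel side: BLK_STS_RESOURCE + bi_end_io() */
	}
	*out = ctx;
	return 0;
}
```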

Fixes: bc082a55d25c ("bcache: fix inaccurate io state for detached bcache devices")
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/request.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 320fcdfef48e..02df49d79b4b 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -1105,6 +1105,12 @@ static void detached_dev_do_request(struct bcache_device *d, struct bio *bio,
 	 * which would call closure_get(&dc->disk.cl)
 	 */
 	ddip = kzalloc(sizeof(struct detached_dev_io_private), GFP_NOIO);
+	if (!ddip) {
+		bio->bi_status = BLK_STS_RESOURCE;
+		bio->bi_end_io(bio);
+		return;
+	}
+
 	ddip->d = d;
 	/* Count on the bcache device */
 	ddip->orig_bdev = orig_bdev;
-- 
2.35.3



* Re: [PATCH 2/3] bcache: avoid unnecessary soft lockup in kworker update_writeback_rate()
  2022-05-27 15:28 ` [PATCH 2/3] bcache: avoid unnecessary soft lockup in kworker update_writeback_rate() Coly Li
@ 2022-05-27 15:49   ` Jens Axboe
  2022-05-27 17:05     ` colyli
  0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2022-05-27 15:49 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-bcache, linux-block

On 5/27/22 9:28 AM, Coly Li wrote:
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index d138a2d73240..c51671abe74e 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -214,6 +214,7 @@ static void update_writeback_rate(struct work_struct *work)
>  					     struct cached_dev,
>  					     writeback_rate_update);
>  	struct cache_set *c = dc->disk.c;
> +	bool contention = false;
>  
>  	/*
>  	 * should check BCACHE_DEV_RATE_DW_RUNNING before calling
> @@ -243,13 +244,41 @@ static void update_writeback_rate(struct work_struct *work)
>  		 * in maximum writeback rate number(s).
>  		 */
>  		if (!set_at_max_writeback_rate(c, dc)) {
> -			down_read(&dc->writeback_lock);
> -			__update_writeback_rate(dc);
> -			update_gc_after_writeback(c);
> -			up_read(&dc->writeback_lock);
> +			/*
> +			 * When contention happens on dc->writeback_lock with
> +			 * the writeback thread, this kwork may be blocked for
> +			 * very long time if there are too many dirty data to
> +			 * writeback, and kerne message will complain a (bogus)
> +			 * software lockup kernel message. To avoid potential
> +			 * starving, if down_read_trylock() fails, writeback
> +			 * rate updating will be skipped for dc->retry_max times
> +			 * at most while delay this worker a bit longer time.
> +			 * If dc->retry_max times are tried and the trylock
> +			 * still fails, then call down_read() to wait for
> +			 * dc->writeback_lock.
> +			 */
> +			if (!down_read_trylock((&dc->writeback_lock))) {
> +				contention = true;
> +				dc->retry_nr++;
> +				if (dc->retry_nr > dc->retry_max)
> +					down_read(&dc->writeback_lock);
> +			}
> +
> +			if (!contention || dc->retry_nr > dc->retry_max) {
> +				__update_writeback_rate(dc);
> +				update_gc_after_writeback(c);
> +				up_read(&dc->writeback_lock);
> +				dc->retry_nr = 0;
> +			}
>  		}
>  	}

This is really not very pretty. First of all, why bother with storing a
max retry value in there? Doesn't seem like it'd ever be different per
'dc' anyway. Secondly, something like the below would be a lot more
readable. Totally untested.

diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 9ee0005874cd..cbc01372c7a1 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -235,19 +235,27 @@ static void update_writeback_rate(struct work_struct *work)
 		return;
 	}
 
-	if (atomic_read(&dc->has_dirty) && dc->writeback_percent) {
+	if (atomic_read(&dc->has_dirty) && dc->writeback_percent &&
+	    !set_at_max_writeback_rate(c, dc)) {
 		/*
 		 * If the whole cache set is idle, set_at_max_writeback_rate()
 		 * will set writeback rate to a max number. Then it is
 		 * unncessary to update writeback rate for an idle cache set
 		 * in maximum writeback rate number(s).
 		 */
-		if (!set_at_max_writeback_rate(c, dc)) {
-			down_read(&dc->writeback_lock);
+		do {
+			if (!down_read_trylock(&dc->writeback_lock)) {
+				dc->rate_update_retry++;
+				if (dc->rate_update_retry < MY_MAX)
+					break;
+				down_read(&dc->writeback_lock);
+				dc->rate_update_retry = 0;
+			}
+
 			__update_writeback_rate(dc);
 			update_gc_after_writeback(c);
 			up_read(&dc->writeback_lock);
-		}
+		} while (0);
 	}
 
-- 
Jens Axboe



* Re: (subset) [PATCH 0/3] bcache fixes for Linux v5.19 (2nd wave)
  2022-05-27 15:28 [PATCH 0/3] bcache fixes for Linux v5.19 (2nd wave) Coly Li
                   ` (2 preceding siblings ...)
  2022-05-27 15:28 ` [PATCH 3/3] md: bcache: check the return value of kzalloc() in detached_dev_do_request() Coly Li
@ 2022-05-27 15:50 ` Jens Axboe
  3 siblings, 0 replies; 9+ messages in thread
From: Jens Axboe @ 2022-05-27 15:50 UTC (permalink / raw)
  To: colyli; +Cc: linux-block, linux-bcache

On Fri, 27 May 2022 23:28:15 +0800, Coly Li wrote:
> Here are the 2nd wave bcache fixes for Linux v5.19, they just survives
> from my I/O pressure testing and look fine.
> 
> The patch from Jia-Ju Bai is in my testing queue for a while, it handles
> a memory allocation failure in the I/O path on a backing device when it
> is not attached to cache device.
> 
> [...]

Applied, thanks!

[1/3] bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init()
      commit: 7d6b902ea0e02b2a25c480edf471cbaa4ebe6b3c
[3/3] md: bcache: check the return value of kzalloc() in detached_dev_do_request()
      commit: 40f567bbb3b0639d2ec7d1c6ad4b1b018f80cf19

Best regards,
-- 
Jens Axboe




* Re: [PATCH 2/3] bcache: avoid unnecessary soft lockup in kworker update_writeback_rate()
  2022-05-27 15:49   ` Jens Axboe
@ 2022-05-27 17:05     ` colyli
  2022-05-28  0:00       ` Jens Axboe
  0 siblings, 1 reply; 9+ messages in thread
From: colyli @ 2022-05-27 17:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-bcache, linux-block

On 2022-05-27 23:49, Jens Axboe wrote:
> On 5/27/22 9:28 AM, Coly Li wrote:
>> diff --git a/drivers/md/bcache/writeback.c 
>> b/drivers/md/bcache/writeback.c
>> index d138a2d73240..c51671abe74e 100644
>> --- a/drivers/md/bcache/writeback.c
>> +++ b/drivers/md/bcache/writeback.c
>> @@ -214,6 +214,7 @@ static void update_writeback_rate(struct 
>> work_struct *work)
>>  					     struct cached_dev,
>>  					     writeback_rate_update);
>>  	struct cache_set *c = dc->disk.c;
>> +	bool contention = false;
>> 
>>  	/*
>>  	 * should check BCACHE_DEV_RATE_DW_RUNNING before calling
>> @@ -243,13 +244,41 @@ static void update_writeback_rate(struct 
>> work_struct *work)
>>  		 * in maximum writeback rate number(s).
>>  		 */
>>  		if (!set_at_max_writeback_rate(c, dc)) {
>> -			down_read(&dc->writeback_lock);
>> -			__update_writeback_rate(dc);
>> -			update_gc_after_writeback(c);
>> -			up_read(&dc->writeback_lock);
>> +			/*
>> +			 * When contention happens on dc->writeback_lock with
>> +			 * the writeback thread, this kwork may be blocked for
>> +			 * very long time if there are too many dirty data to
>> +			 * writeback, and kerne message will complain a (bogus)
>> +			 * software lockup kernel message. To avoid potential
>> +			 * starving, if down_read_trylock() fails, writeback
>> +			 * rate updating will be skipped for dc->retry_max times
>> +			 * at most while delay this worker a bit longer time.
>> +			 * If dc->retry_max times are tried and the trylock
>> +			 * still fails, then call down_read() to wait for
>> +			 * dc->writeback_lock.
>> +			 */
>> +			if (!down_read_trylock((&dc->writeback_lock))) {
>> +				contention = true;
>> +				dc->retry_nr++;
>> +				if (dc->retry_nr > dc->retry_max)
>> +					down_read(&dc->writeback_lock);
>> +			}
>> +
>> +			if (!contention || dc->retry_nr > dc->retry_max) {
>> +				__update_writeback_rate(dc);
>> +				update_gc_after_writeback(c);
>> +				up_read(&dc->writeback_lock);
>> +				dc->retry_nr = 0;
>> +			}
>>  		}
>>  	}
> 

Hi Jens,

Thanks for looking into this :-)

> This is really not very pretty. First of all, why bother with storing a
> max retry value in there? Doesn't seem like it'd ever be different per

It is because the probability of lock contention on dc->writeback_lock
depends on the I/O speed of the backing device. From my observation
during the tests, for a fast backing device with a larger cache device,
its writeback thread may work harder to flush more dirty data to the
backing device, so the lock contention happens more often and lasts
longer, and the writeback rate update kworker has to wait longer before
acquiring dc->writeback_lock. So its dc->retry_max should be larger than
for a slow backing device.

Therefore I'd like to have a tunable per-backing-device retry_max, and
the sysfs interface will be added when users/customers want it. The use
case is from SAP HANA users: I have reports that they observe the soft
lockup warning for dc->writeback_lock contention and worry about whether
data is corrupted (indeed, of course, it is not).

> 'dc' anyway. Secondly, something like the below would be a lot more
> readable. Totally untested.

I respond inline to the following suggestion.

> 
> diff --git a/drivers/md/bcache/writeback.c 
> b/drivers/md/bcache/writeback.c
> index 9ee0005874cd..cbc01372c7a1 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -235,19 +235,27 @@ static void update_writeback_rate(struct
> work_struct *work)
>  		return;
>  	}
> 
> -	if (atomic_read(&dc->has_dirty) && dc->writeback_percent) {
> +	if (atomic_read(&dc->has_dirty) && dc->writeback_percent &&
> +	    !set_at_max_writeback_rate(c, dc)) {
>  		/*
>  		 * If the whole cache set is idle, set_at_max_writeback_rate()
>  		 * will set writeback rate to a max number. Then it is
>  		 * unncessary to update writeback rate for an idle cache set
>  		 * in maximum writeback rate number(s).
>  		 */
> -		if (!set_at_max_writeback_rate(c, dc)) {

The reason I didn't place '!set_at_max_writeback_rate' with the other
items in the previous if() was the above code comment. If I moved it to
the previous if() without the other items, I was not comfortable placing
the code comment either before or after the if() check. So I used a
separate if() check for '!set_at_max_writeback_rate'.

From your change, it seems placing the code comment after is fine (or
better); can I understand it this way? I am trying to learn and follow
your way of handling such code comment situations.


> -			down_read(&dc->writeback_lock);
> +		do {
> +			if (!down_read_trylock(&dc->writeback_lock)) {
> +				dc->rate_update_retry++;
> +				if (dc->rate_update_retry < MY_MAX)
> +					break;
> +				down_read(&dc->writeback_lock);
> +				dc->rate_update_retry = 0;

The incremental reschedule delay was meant to avoid might-be-useless
retries, but the above method works too. Just setting the default
retry_max from 5 to 15, for 10 more retries at a 5-second interval, is
fine. I can modify the change in this way to reduce the change size.

> +			}
> +
>  			__update_writeback_rate(dc);
>  			update_gc_after_writeback(c);
>  			up_read(&dc->writeback_lock);
> -		}
> +		} while (0);

Aha, this is cool! I never thought of using do{}while(0) and break in
such a genius way! Sure, I will use it, thanks for the hint :-)
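
A toy, self-contained version of that pattern (not the bcache code;
names are made up): do { ... } while (0) gives break a single structured
exit point, so the skip path needs no goto and no extra nesting:

```c
#include <stdbool.h>

/* Returns 1 if the update ran, 0 if it was skipped this round. */
static int demo_try_update(bool got_lock, int *retries, int max_retries)
{
	int updated = 0;

	do {
		if (!got_lock) {	/* trylock failed */
			(*retries)++;
			if (*retries < max_retries)
				break;	/* skip, retry on a later run */
			*retries = 0;	/* blocking lock would happen here */
		}
		updated = 1;		/* the rate update itself */
	} while (0);

	return updated;
}
```

The break jumps straight past the update, exactly like the early return
of a goto-less error path.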

After you reply to my defense of dc->retry_max and the question of the
code comment location, I will update and test the patch again, and
re-submit it to you.

Thanks for your constructive suggestions, especially the do{}while(0)
part!

Coly Li


* Re: [PATCH 2/3] bcache: avoid unnecessary soft lockup in kworker update_writeback_rate()
  2022-05-27 17:05     ` colyli
@ 2022-05-28  0:00       ` Jens Axboe
  2022-05-28  2:20         ` colyli
  0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2022-05-28  0:00 UTC (permalink / raw)
  To: colyli; +Cc: linux-bcache, linux-block

On 5/27/22 11:05 AM, colyli wrote:
> On 2022-05-27 23:49, Jens Axboe wrote:
>> On 5/27/22 9:28 AM, Coly Li wrote:
>>> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
>>> index d138a2d73240..c51671abe74e 100644
>>> --- a/drivers/md/bcache/writeback.c
>>> +++ b/drivers/md/bcache/writeback.c
>>> @@ -214,6 +214,7 @@ static void update_writeback_rate(struct work_struct *work)
>>>                           struct cached_dev,
>>>                           writeback_rate_update);
>>>      struct cache_set *c = dc->disk.c;
>>> +    bool contention = false;
>>>
>>>      /*
>>>       * should check BCACHE_DEV_RATE_DW_RUNNING before calling
>>> @@ -243,13 +244,41 @@ static void update_writeback_rate(struct work_struct *work)
>>>           * in maximum writeback rate number(s).
>>>           */
>>>          if (!set_at_max_writeback_rate(c, dc)) {
>>> -            down_read(&dc->writeback_lock);
>>> -            __update_writeback_rate(dc);
>>> -            update_gc_after_writeback(c);
>>> -            up_read(&dc->writeback_lock);
>>> +            /*
>>> +             * When contention happens on dc->writeback_lock with
>>> +             * the writeback thread, this kwork may be blocked for
>>> +             * very long time if there are too many dirty data to
>>> +             * writeback, and kerne message will complain a (bogus)
>>> +             * software lockup kernel message. To avoid potential
>>> +             * starving, if down_read_trylock() fails, writeback
>>> +             * rate updating will be skipped for dc->retry_max times
>>> +             * at most while delay this worker a bit longer time.
>>> +             * If dc->retry_max times are tried and the trylock
>>> +             * still fails, then call down_read() to wait for
>>> +             * dc->writeback_lock.
>>> +             */
>>> +            if (!down_read_trylock((&dc->writeback_lock))) {
>>> +                contention = true;
>>> +                dc->retry_nr++;
>>> +                if (dc->retry_nr > dc->retry_max)
>>> +                    down_read(&dc->writeback_lock);
>>> +            }
>>> +
>>> +            if (!contention || dc->retry_nr > dc->retry_max) {
>>> +                __update_writeback_rate(dc);
>>> +                update_gc_after_writeback(c);
>>> +                up_read(&dc->writeback_lock);
>>> +                dc->retry_nr = 0;
>>> +            }
>>>          }
>>>      }
>>
> 
> Hi Jens,
> 
> Thanks for looking into this :-)
> 
>> This is really not very pretty. First of all, why bother with storing a
>> max retry value in there? Doesn't seem like it'd ever be different per
> 
> It is because the probability of the lock contention on
> dc->writeback_lock depends on the I/O speed backing device. From my
> observation during the tests, for fast backing device with larger
> cache device, its writeback thread may work harder to flush more dirty
> data to backing device, the lock contention happens more and longer,
> so the writeback rate update kworker has to wait longer time before
> acquires dc->writeback_lock. So its dc->retry_max should be larger
> then slow backing device.
> 
> Therefore I'd like to have a tunable per-backing-device retry_max. And
> the syses interface will be added when users/customers want it. The
> use case is from SAP HANA users, I have report that they observe the
> soft lockup warning for dc->writeback_lock contention and worry about
> whether data is corrupted (indeed, of course not).

The initial patch has 5 as the default, and there are no sysfs knobs. If
you ever need a sysfs knob, by all means make it a variable and make it
configurable too. But don't do it upfront when a suitably named '5'
would do the job.

>> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
>> index 9ee0005874cd..cbc01372c7a1 100644
>> --- a/drivers/md/bcache/writeback.c
>> +++ b/drivers/md/bcache/writeback.c
>> @@ -235,19 +235,27 @@ static void update_writeback_rate(struct
>> work_struct *work)
>>          return;
>>      }
>>
>> -    if (atomic_read(&dc->has_dirty) && dc->writeback_percent) {
>> +    if (atomic_read(&dc->has_dirty) && dc->writeback_percent &&
>> +        !set_at_max_writeback_rate(c, dc)) {
>>          /*
>>           * If the whole cache set is idle, set_at_max_writeback_rate()
>>           * will set writeback rate to a max number. Then it is
>>           * unncessary to update writeback rate for an idle cache set
>>           * in maximum writeback rate number(s).
>>           */
>> -        if (!set_at_max_writeback_rate(c, dc)) {
> 
> The reason I didn't place '!set_at_max_writeback_rate' with other items in
> previous if() was for the above code comment. If I moved it to previous
> if() without other items, I was not comfortable to place the code comments
> neither before or after the if() check. So I used a separated if() check for
> '!set_at_max_writeback_rate'.
> 
> From your change, it seems placing the code comments behind is fine (or
> better), can I understand in this way? I try to learn and follow your way
> to handle such code comments situation.

Just put it higher up if you want, it doesn't really matter, or leave it
where it is.

>>              __update_writeback_rate(dc);
>>              update_gc_after_writeback(c);
>>              up_read(&dc->writeback_lock);
>> -        }
>> +        } while (0);
> 
> Aha, this is cool! I never though of using do{}while(0) and break in
> such a genius way! Sure I will use this, thanks for the hint :-)
> 
> After you reply my defense of dc->retry_max, and the question of code
> comments location, I will update and test the patch again, and
> re-sbumit to you.
> 
> Thanks for your constructive suggestion, especially the do{}while(0)
> part!

I would do something similar to my change and drop the 'dc' addition for
the max retries as it, by definition, can only be one value right now.
For all I know, you'll never need to change it again, and then you're
just wasting memory and making the code harder to read by putting it in
dc instead of just having this define.

-- 
Jens Axboe



* Re: [PATCH 2/3] bcache: avoid unnecessary soft lockup in kworker update_writeback_rate()
  2022-05-28  0:00       ` Jens Axboe
@ 2022-05-28  2:20         ` colyli
  0 siblings, 0 replies; 9+ messages in thread
From: colyli @ 2022-05-28  2:20 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-bcache, linux-block

On 2022-05-28 08:00, Jens Axboe wrote:
> On 5/27/22 11:05 AM, colyli wrote:
>> On 2022-05-27 23:49, Jens Axboe wrote:
>>> On 5/27/22 9:28 AM, Coly Li wrote:
>>>> diff --git a/drivers/md/bcache/writeback.c 
>>>> b/drivers/md/bcache/writeback.c
>>>> index d138a2d73240..c51671abe74e 100644
>>>> --- a/drivers/md/bcache/writeback.c
>>>> +++ b/drivers/md/bcache/writeback.c
>>>> @@ -214,6 +214,7 @@ static void update_writeback_rate(struct 
>>>> work_struct *work)
>>>>                           struct cached_dev,
>>>>                           writeback_rate_update);
>>>>      struct cache_set *c = dc->disk.c;
>>>> +    bool contention = false;
>>>> 
>>>>      /*
>>>>       * should check BCACHE_DEV_RATE_DW_RUNNING before calling
>>>> @@ -243,13 +244,41 @@ static void update_writeback_rate(struct 
>>>> work_struct *work)
>>>>           * in maximum writeback rate number(s).
>>>>           */
>>>>          if (!set_at_max_writeback_rate(c, dc)) {
>>>> -            down_read(&dc->writeback_lock);
>>>> -            __update_writeback_rate(dc);
>>>> -            update_gc_after_writeback(c);
>>>> -            up_read(&dc->writeback_lock);
>>>> +            /*
>>>> +             * When contention happens on dc->writeback_lock with
>>>> +             * the writeback thread, this kworker may be blocked
>>>> +             * for a very long time if there is too much dirty
>>>> +             * data to write back, and the kernel will complain
>>>> +             * with a (bogus) soft lockup message. To avoid
>>>> +             * potential starvation, if down_read_trylock() fails,
>>>> +             * the writeback rate update will be skipped at most
>>>> +             * dc->retry_max times while this worker is delayed a
>>>> +             * bit longer. If the trylock still fails after
>>>> +             * dc->retry_max attempts, then call down_read() to
>>>> +             * wait for dc->writeback_lock.
>>>> +             */
>>>> +            if (!down_read_trylock((&dc->writeback_lock))) {
>>>> +                contention = true;
>>>> +                dc->retry_nr++;
>>>> +                if (dc->retry_nr > dc->retry_max)
>>>> +                    down_read(&dc->writeback_lock);
>>>> +            }
>>>> +
>>>> +            if (!contention || dc->retry_nr > dc->retry_max) {
>>>> +                __update_writeback_rate(dc);
>>>> +                update_gc_after_writeback(c);
>>>> +                up_read(&dc->writeback_lock);
>>>> +                dc->retry_nr = 0;
>>>> +            }
>>>>          }
>>>>      }
>>> 
>> 
>> Hi Jens,
>> 
>> Thanks for looking into this :-)
>> 
>>> This is really not very pretty. First of all, why bother with
>>> storing a max retry value in there? Doesn't seem like it'd ever be
>>> different per device.
>> It is because the probability of lock contention on
>> dc->writeback_lock depends on the I/O speed of the backing device.
>> From my observation during the tests, for a fast backing device with
>> a larger cache device, the writeback thread works harder to flush
>> more dirty data to the backing device, so the lock contention happens
>> more often and lasts longer, and the writeback rate update kworker
>> has to wait longer before acquiring dc->writeback_lock. So its
>> dc->retry_max should be larger than for a slow backing device.
>> 
>> Therefore I'd like to have a tunable per-backing-device retry_max,
>> and the sysfs interface will be added when users/customers want it.
>> The use case is from SAP HANA users: I have reports that they observe
>> the soft lockup warning for dc->writeback_lock contention and worry
>> about whether data is corrupted (which, of course, it is not).
> 
> The initial patch has 5 as the default, and there are no sysfs knobs.
> If you ever need a sysfs knob, by all means make it a variable and
> make it configurable too. But don't do it upfront where the '5',
> suitably named, would do the job.
> 

Copied.

>>> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
>>> index 9ee0005874cd..cbc01372c7a1 100644
>>> --- a/drivers/md/bcache/writeback.c
>>> +++ b/drivers/md/bcache/writeback.c
>>> @@ -235,19 +235,27 @@ static void update_writeback_rate(struct work_struct *work)
>>>          return;
>>>      }
>>> 
>>> -    if (atomic_read(&dc->has_dirty) && dc->writeback_percent) {
>>> +    if (atomic_read(&dc->has_dirty) && dc->writeback_percent &&
>>> +        !set_at_max_writeback_rate(c, dc)) {
>>>          /*
>>>          * If the whole cache set is idle, set_at_max_writeback_rate()
>>>           * will set writeback rate to a max number. Then it is
>>>           * unncessary to update writeback rate for an idle cache set
>>>           * in maximum writeback rate number(s).
>>>           */
>>> -        if (!set_at_max_writeback_rate(c, dc)) {
>> 
>> The reason I didn't place '!set_at_max_writeback_rate' with the other
>> conditions in the previous if() was the above code comment. If I had
>> moved it into the previous if(), I wasn't comfortable placing the
>> code comment either before or after the if() check. So I used a
>> separate if() check for '!set_at_max_writeback_rate'.
>> 
>> From your change, it seems placing the code comment afterwards is fine
>> (or better); can I understand it that way? I'd like to learn and
>> follow your way of handling such code comment situations.
> 
> Just put it higher up if you want, it doesn't really matter, or leave
> it where it is.
> 

Copied.

>>>              __update_writeback_rate(dc);
>>>              update_gc_after_writeback(c);
>>>              up_read(&dc->writeback_lock);
>>> -        }
>>> +        } while (0);
>> 
>> Aha, this is cool! I never thought of using do{}while(0) and break in
>> such an ingenious way! Sure, I will use this, thanks for the hint :-)
>> 
>> Once you have replied to my defense of dc->retry_max and the question
>> of code comment location, I will update and test the patch again and
>> re-submit it to you.
>> 
>> Thanks for your constructive suggestion, especially the do{}while(0)
>> part!
> 
> I would do something similar to my change and drop the 'dc' addition
> for the max retries as it, by definition, can only be one value right
> now. For all I know, you'll never need to change it again, and then
> you're just wasting memory and making the code harder to read by
> putting it in dc instead of just having this define.

Copied. I will do the change and repost it again. Thank you for the
review and comments.

Coly Li
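As an aside, the do { ... } while (0) early-exit pattern Jens suggested can be sketched as below. This is a minimal illustration under assumed names: the flag and counter are placeholders, not the real bcache helpers, and the first break stands in for conditions like set_at_max_writeback_rate() returning true.

```c
#include <stdbool.h>

static int updates_done;	/* counts how often the update step ran */
static bool at_max_rate;	/* stands in for set_at_max_writeback_rate() */

static void update_rate(bool has_dirty)
{
	do {
		if (!has_dirty)
			break;		/* nothing dirty, leave the block early */
		if (at_max_rate)
			break;		/* already pinned at max, skip the update */
		updates_done++;		/* the __update_writeback_rate() step */
	} while (0);
	/* common rescheduling path runs here on every exit from the block */
}
```

The do{}while(0) body executes exactly once; break jumps past it, so every skip case falls through to the shared tail instead of adding another level of if() nesting.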


end of thread, other threads:[~2022-05-28  2:20 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-27 15:28 [PATCH 0/3] bcache fixes for Linux v5.19 (2nd wave) Coly Li
2022-05-27 15:28 ` [PATCH 1/3] bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init() Coly Li
2022-05-27 15:28 ` [PATCH 2/3] bcache: avoid unnecessary soft lockup in kworker update_writeback_rate() Coly Li
2022-05-27 15:49   ` Jens Axboe
2022-05-27 17:05     ` colyli
2022-05-28  0:00       ` Jens Axboe
2022-05-28  2:20         ` colyli
2022-05-27 15:28 ` [PATCH 3/3] md: bcache: check the return value of kzalloc() in detached_dev_do_request() Coly Li
2022-05-27 15:50 ` (subset) [PATCH 0/3] bcache fixes for Linux v5.19 (2nd wave) Jens Axboe
