* set max writeback rate when bcache device is idle
@ 2017-07-19  5:36 Coly Li
  2017-07-19  5:36 ` [PATCH 1/2] bcache: do not perform writeback if dirty data is no more than writeback_percent Coly Li
  2017-07-19  5:36 ` [PATCH 2/2] bcache: implement max_writeback_rate_when_idle option for writeback mode Coly Li
  0 siblings, 2 replies; 7+ messages in thread
From: Coly Li @ 2017-07-19  5:36 UTC (permalink / raw)
  To: linux-bcache; +Cc: bcache, tang.junhui, kubuxu

This patch set is an effort to make bcache write back dirty data when
the device is idle. Junhui Tang posted a fix a few days ago, which IMHO
interferes unstably with the writeback rate PD controller. This is
another approach to the same problem.

The first patch, with subject "bcache: do not perform writeback if dirty
data is no more than writeback_percent", makes writeback_percent work as
designed. The second patch depends on it.

The second patch, with subject "bcache: implement
max_writeback_rate_when_idle option for writeback mode", implements an
option to set a maximum writeback rate when the bcache device is idle.
Then dirty data can be written back to the cached device and cleaned on
the cache device. After the cache set is clean, subsequent write requests
on the bcache device won't trigger writeback unless dirty data exceeds
writeback_percent.

Please review the patch set; any comments are welcome :-)

Thanks in advance.

Coly Li

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] bcache: do not perform writeback if dirty data is no more than writeback_percent
  2017-07-19  5:36 set max writeback rate when bcache device is idle Coly Li
@ 2017-07-19  5:36 ` Coly Li
  2017-07-19  8:19   ` Coly Li
       [not found]   ` <OF7ADE25A8.9CA7AF89-ON48258165.00108B81-48258165.0010FA59@zte.com.cn>
  2017-07-19  5:36 ` [PATCH 2/2] bcache: implement max_writeback_rate_when_idle option for writeback mode Coly Li
  1 sibling, 2 replies; 7+ messages in thread
From: Coly Li @ 2017-07-19  5:36 UTC (permalink / raw)
  To: linux-bcache; +Cc: bcache, tang.junhui, kubuxu, Coly Li

The functionality of the sysfs entry writeback_percent is explained in
Documentation/ABI/testing/sysfs-block-bcache:
"For backing devices: If nonzero, writeback from cache to
 backing device only takes place when more than this percentage
 of the cache is used, allowing more write coalescing to take
 place and reducing total number of writes sent to the backing
 device. Integer between 0 and 40."

Currently in writeback mode, even when dirty data does not exceed
writeback_percent of the cached device, writeback still happens at a
rate of at least one key per second. This is not the designed behavior.

This patch does three things to fix the above issue:
1) Move writeback thread related checks from bch_writeback_thread()
   to no_writeback_now().
2) Track the duration from the last I/O request received to the moment
   no_writeback_now() is called. When the duration exceeds
   BCH_IDLE_DURATION_MSECS, perform a proactive writeback whether or
   not dirty data exceeds writeback_percent.
3) If the duration has not timed out and dirty data does not exceed the
   cached device's writeback_percent, the writeback thread just
   schedules out and tries again after BCH_IDLE_DURATION_MSECS.

With this patch, if dirty data does not exceed writeback_percent,
writeback will not happen. In that case dirty data may stay on the cache
device for a very long time if I/O on the bcache device is idle. To
address this, bch_writeback_queue() is called in update_writeback_rate()
to wake up the writeback thread every writeback_rate_update_seconds.
Then if no I/O happens on the bcache device for BCH_IDLE_DURATION_MSECS
milliseconds, the writeback thread still gets a chance to wake up and
detect the timeout without much delay.

Now writeback_percent behaves as documented.

Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/bcache.h    |  4 ++++
 drivers/md/bcache/request.c   |  6 +++++
 drivers/md/bcache/writeback.c | 55 +++++++++++++++++++++++++++++++++++++++----
 3 files changed, 61 insertions(+), 4 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index dee542fff68e..ab7e60336edb 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -662,6 +662,8 @@ struct cache_set {
 
 #define BUCKET_HASH_BITS	12
 	struct hlist_head	bucket_hash[1 << BUCKET_HASH_BITS];
+
+	uint64_t		last_request_time;
 };
 
 struct bbio {
@@ -680,6 +682,8 @@ struct bbio {
 #define BTREE_PRIO		USHRT_MAX
 #define INITIAL_PRIO		32768U
 
+#define BCH_IDLE_DURATION_MSECS	10000U
+
 #define btree_bytes(c)		((c)->btree_pages * PAGE_SIZE)
 #define btree_blocks(b)							\
 	((unsigned) (KEY_SIZE(&b->key) >> (b)->c->block_bits))
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 019b3df9f1c6..98971486f7f1 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -957,10 +957,13 @@ static blk_qc_t cached_dev_make_request(struct request_queue *q,
 	struct search *s;
 	struct bcache_device *d = bio->bi_bdev->bd_disk->private_data;
 	struct cached_dev *dc = container_of(d, struct cached_dev, disk);
+	struct cache_set *c = d->c;
 	int rw = bio_data_dir(bio);
 
 	generic_start_io_acct(rw, bio_sectors(bio), &d->disk->part0);
 
+	if (c)
+		c->last_request_time = local_clock();
 	bio->bi_bdev = dc->bdev;
 	bio->bi_iter.bi_sector += dc->sb.data_offset;
 
@@ -991,6 +994,9 @@ static blk_qc_t cached_dev_make_request(struct request_queue *q,
 		else
 			generic_make_request(bio);
 	}
+	/* update last_request_time again for better accuracy */
+	if (c)
+		c->last_request_time = local_clock();
 
 	return BLK_QC_T_NONE;
 }
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 42c66e76f05e..13594dd7f564 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -78,8 +78,15 @@ static void update_writeback_rate(struct work_struct *work)
 	down_read(&dc->writeback_lock);
 
 	if (atomic_read(&dc->has_dirty) &&
-	    dc->writeback_percent)
+	    dc->writeback_percent) {
 		__update_writeback_rate(dc);
+		/*
+		 * wake up the writeback thread to check in no_writeback_now()
+		 * whether the request duration has timed out. If so,
+		 * existing dirty data should be handled.
+		 */
+		bch_writeback_queue(dc);
+	}
 
 	up_read(&dc->writeback_lock);
 
@@ -412,6 +419,48 @@ static bool refill_dirty(struct cached_dev *dc)
 	return bkey_cmp(&buf->last_scanned, &start_pos) >= 0;
 }
 
+static bool no_writeback_now(struct cached_dev *dc)
+{
+	uint64_t duration;
+	struct cache_set *c = dc->disk.c;
+	uint64_t cache_set_sectors;
+	uint64_t dirty_sectors;
+	uint64_t percent_sectors;
+
+	if (!atomic_read(&dc->has_dirty))
+		return true;
+
+	if (!test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags) &&
+	    !dc->writeback_running)
+		return true;
+
+	/*
+	 * last_request_time is initialized to 0, so duration may be larger
+	 * than BCH_IDLE_DURATION_MSECS the first time it is calculated.
+	 * If the cached device has dirty data, bcache will try to perform
+	 * a proactive writeback.
+	 */
+	duration = local_clock() - c->last_request_time;
+	if ((duration/NSEC_PER_MSEC) > BCH_IDLE_DURATION_MSECS)
+		return false;
+
+	/*
+	 * If the duration has not timed out and dirty data does not exceed
+	 * writeback_percent, no writeback now.
+	 * This is what writeback_percent is designed for, see
+	 * /sys/block/<disk>/bcache/writeback_percent in
+	 * Documentation/ABI/testing/sysfs-block-bcache
+	 */
+	cache_set_sectors = c->nbuckets * c->sb.bucket_size;
+	percent_sectors = div64_u64(
+		cache_set_sectors * dc->writeback_percent, 100);
+	dirty_sectors = bcache_dev_sectors_dirty(&dc->disk);
+	if (dirty_sectors <= percent_sectors)
+		return true;
+
+	return false;
+}
+
 static int bch_writeback_thread(void *arg)
 {
 	struct cached_dev *dc = arg;
@@ -419,9 +468,7 @@ static int bch_writeback_thread(void *arg)
 
 	while (!kthread_should_stop()) {
 		down_write(&dc->writeback_lock);
-		if (!atomic_read(&dc->has_dirty) ||
-		    (!test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags) &&
-		     !dc->writeback_running)) {
+		if (no_writeback_now(dc)) {
 			up_write(&dc->writeback_lock);
 			set_current_state(TASK_INTERRUPTIBLE);
 
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] bcache: implement max_writeback_rate_when_idle option for writeback mode
  2017-07-19  5:36 set max writeback rate when bcache device is idle Coly Li
  2017-07-19  5:36 ` [PATCH 1/2] bcache: do not perform writeback if dirty data is no more than writeback_percent Coly Li
@ 2017-07-19  5:36 ` Coly Li
  1 sibling, 0 replies; 7+ messages in thread
From: Coly Li @ 2017-07-19  5:36 UTC (permalink / raw)
  To: linux-bcache; +Cc: bcache, tang.junhui, kubuxu, Coly Li

For bcache writeback mode, a writeback rate for each cached device is
adjusted dynamically by a PD controller in __update_writeback_rate().

If a bcache device is in idle state (no I/O requests for a while) and
there is dirty data on the cache device, the PD controller will decrease
its writeback rate continuously and finally set it to 1. That means the
writeback throughput of a cached device is around one key per second,
which is almost useless.

There are two major disadvantages to the above behavior:
1) The idle time is wasted. It could have been used for dirty data
   writeback and garbage collection, which might have contributed
   positively to the throughput and latency of later I/O requests.
2) With a large amount of dirty data, cached devices stay active with
   very low I/O efficiency. If all the dirty data could be written back
   to the cached devices in a relatively short time, energy could be
   saved because no cached device would need to stay active any more.

max_writeback_rate_when_idle is a cache set option exposed via a sysfs
interface, implemented to solve the above problems. Here is how it works:
1) When max_writeback_rate_when_idle of a cache set is enabled (set to 1)
   and the cache set goes into idle state (no I/O requests for a while),
   the writeback rate of all its cached devices containing dirty data is
   set to the maximum value (INT_MAX for now). Dirty data can then be
   written back to the cached devices as fast as possible.
2) If an I/O request arrives while max rate writeback is in progress, the
   writeback rates of all cached devices currently writing back are set
   to 1, so that writeback I/O won't interfere with regular I/O requests
   to the bcache device. The writeback rate of each cached device then
   continues to be adjusted dynamically by the PD controller in
   __update_writeback_rate(), until the next cache set I/O idle state.

This patch does not change the existing PD controller logic; most of its
changes are in update_writeback_rate(), setting the writeback rate to
INT_MAX or 1 depending on the cache set's I/O idle state.

This option can be configured via the sysfs file at
/sys/block/<bcache device>/bcache/cache/internal/max_writeback_rate_when_idle
Write integer 1 to enable it, and integer 0 to disable it. It is enabled
by default.

Signed-off-by: Coly Li <colyli@suse.de>
---
 drivers/md/bcache/bcache.h    |  2 ++
 drivers/md/bcache/super.c     |  1 +
 drivers/md/bcache/sysfs.c     |  6 +++++
 drivers/md/bcache/writeback.c | 51 ++++++++++++++++++++++++++++++++++---------
 4 files changed, 50 insertions(+), 10 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index ab7e60336edb..8499cd1c6c7f 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -659,6 +659,8 @@ struct cache_set {
 	unsigned		gc_always_rewrite:1;
 	unsigned		shrinker_disabled:1;
 	unsigned		copy_gc_enabled:1;
+	unsigned		max_writeback_rate_when_idle:1;
+	unsigned		request_to_cache_idle:1;
 
 #define BUCKET_HASH_BITS	12
 	struct hlist_head	bucket_hash[1 << BUCKET_HASH_BITS];
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 8352fad765f6..e21a1684ca2e 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1533,6 +1533,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
 	c->congested_read_threshold_us	= 2000;
 	c->congested_write_threshold_us	= 20000;
 	c->error_limit	= 8 << IO_ERROR_SHIFT;
+	c->max_writeback_rate_when_idle = 1;
 
 	return c;
 err:
diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
index f90f13616980..6b1652fe797d 100644
--- a/drivers/md/bcache/sysfs.c
+++ b/drivers/md/bcache/sysfs.c
@@ -106,6 +106,7 @@ rw_attribute(cache_replacement_policy);
 rw_attribute(btree_shrinker_disabled);
 rw_attribute(copy_gc_enabled);
 rw_attribute(size);
+rw_attribute(max_writeback_rate_when_idle);
 
 SHOW(__bch_cached_dev)
 {
@@ -570,6 +571,8 @@ SHOW(__bch_cache_set)
 	sysfs_printf(gc_always_rewrite,		"%i", c->gc_always_rewrite);
 	sysfs_printf(btree_shrinker_disabled,	"%i", c->shrinker_disabled);
 	sysfs_printf(copy_gc_enabled,		"%i", c->copy_gc_enabled);
+	sysfs_printf(max_writeback_rate_when_idle,
+		     "%i", c->max_writeback_rate_when_idle);
 
 	if (attr == &sysfs_bset_tree_stats)
 		return bch_bset_print_stats(c, buf);
@@ -653,6 +656,8 @@ STORE(__bch_cache_set)
 	sysfs_strtoul(gc_always_rewrite,	c->gc_always_rewrite);
 	sysfs_strtoul(btree_shrinker_disabled,	c->shrinker_disabled);
 	sysfs_strtoul(copy_gc_enabled,		c->copy_gc_enabled);
+	sysfs_strtoul(max_writeback_rate_when_idle,
+		      c->max_writeback_rate_when_idle);
 
 	return size;
 }
@@ -728,6 +733,7 @@ static struct attribute *bch_cache_set_internal_files[] = {
 	&sysfs_gc_always_rewrite,
 	&sysfs_btree_shrinker_disabled,
 	&sysfs_copy_gc_enabled,
+	&sysfs_max_writeback_rate_when_idle,
 	NULL
 };
 KTYPE(bch_cache_set_internal);
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 13594dd7f564..38dac49b28cb 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -74,22 +74,53 @@ static void update_writeback_rate(struct work_struct *work)
 	struct cached_dev *dc = container_of(to_delayed_work(work),
 					     struct cached_dev,
 					     writeback_rate_update);
+	struct cache_set *c = dc->disk.c;
+	uint64_t duration, now;
+	bool timeout = false, wakeup = false;
 
 	down_read(&dc->writeback_lock);
+	if (!atomic_read(&dc->has_dirty))
+		goto schedule_delay;
+
+	now = local_clock();
+	if (c->last_request_time) {
+		duration = now - c->last_request_time;
+	} else {
+		c->last_request_time = now;
+		duration = 0;
+	}
 
-	if (atomic_read(&dc->has_dirty) &&
-	    dc->writeback_percent) {
-		__update_writeback_rate(dc);
-		/*
-		 * wake up the writeback thread to check in no_writeback_now()
-		 * whether the request duration has timed out. If so,
-		 * existing dirty data should be handled.
-		 */
-		bch_writeback_queue(dc);
+	if ((duration/NSEC_PER_MSEC) > BCH_IDLE_DURATION_MSECS)
+		timeout = true;
+
+	if (timeout && c->max_writeback_rate_when_idle) {
+		dc->writeback_rate.rate = INT_MAX;
+		wakeup = true;
+		goto schedule_delay;
 	}
 
-	up_read(&dc->writeback_lock);
+	/*
+	 * New requests break the I/O idle status; set the writeback rate
+	 * to 1 to make sure requests on the cache device get good
+	 * throughput and latency as soon as possible. Then the PD
+	 * controller in __update_writeback_rate() may dynamically set a
+	 * proper writeback rate.
+	 */
+	if (!timeout && c->request_to_cache_idle)
+		dc->writeback_rate.rate = 1;
 
+	/*
+	 * Do not check writeback_percent here, because it might be set
+	 * to zero while dirty data exists. Once dc->has_dirty is set,
+	 * __update_writeback_rate() should always be called here.
+	 */
+	__update_writeback_rate(dc);
+
+schedule_delay:
+	c->request_to_cache_idle = timeout ? 1 : 0;
+	up_read(&dc->writeback_lock);
+	if (wakeup)
+		bch_writeback_queue(dc);
 	schedule_delayed_work(&dc->writeback_rate_update,
 			      dc->writeback_rate_update_seconds * HZ);
 }
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] bcache: do not perform writeback if dirty data is no more than writeback_percent
  2017-07-19  5:36 ` [PATCH 1/2] bcache: do not perform writeback if dirty data is no more than writeback_percent Coly Li
@ 2017-07-19  8:19   ` Coly Li
       [not found]     ` <OF2AB44ECA.94CE1BF0-ON48258162.00356FFD-48258162.00359404@zte.com.cn>
       [not found]   ` <OF7ADE25A8.9CA7AF89-ON48258165.00108B81-48258165.0010FA59@zte.com.cn>
  1 sibling, 1 reply; 7+ messages in thread
From: Coly Li @ 2017-07-19  8:19 UTC (permalink / raw)
  To: Coly Li, linux-bcache; +Cc: bcache, tang.junhui, kubuxu

On 2017/7/19 1:36 PM, Coly Li wrote:
> The functionality of the sysfs entry writeback_percent is explained in
> Documentation/ABI/testing/sysfs-block-bcache:
> "For backing devices: If nonzero, writeback from cache to
>  backing device only takes place when more than this percentage
>  of the cache is used, allowing more write coalescing to take
>  place and reducing total number of writes sent to the backing
>  device. Integer between 0 and 40."
> 
> Currently in writeback mode, even when dirty data does not exceed
> writeback_percent of the cached device, writeback still happens at a
> rate of at least one key per second. This is not the designed behavior.
> 
> This patch does three things to fix the above issue:
> 1) Move writeback thread related checks from bch_writeback_thread()
>    to no_writeback_now().
> 2) Track the duration from the last I/O request received to the moment
>    no_writeback_now() is called. When the duration exceeds
>    BCH_IDLE_DURATION_MSECS, perform a proactive writeback whether or
>    not dirty data exceeds writeback_percent.
> 3) If the duration has not timed out and dirty data does not exceed the
>    cached device's writeback_percent, the writeback thread just
>    schedules out and tries again after BCH_IDLE_DURATION_MSECS.
> 
> With this patch, if dirty data does not exceed writeback_percent,
> writeback will not happen. In that case dirty data may stay on the cache
> device for a very long time if I/O on the bcache device is idle. To
> address this, bch_writeback_queue() is called in update_writeback_rate()
> to wake up the writeback thread every writeback_rate_update_seconds.
> Then if no I/O happens on the bcache device for BCH_IDLE_DURATION_MSECS
> milliseconds, the writeback thread still gets a chance to wake up and
> detect the timeout without much delay.
> 
> Now writeback_percent behaves as documented.
> 
> Signed-off-by: Coly Li <colyli@suse.de>
> ---
>  drivers/md/bcache/bcache.h    |  4 ++++
>  drivers/md/bcache/request.c   |  6 +++++
>  drivers/md/bcache/writeback.c | 55 +++++++++++++++++++++++++++++++++++++++----
>  3 files changed, 61 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> index dee542fff68e..ab7e60336edb 100644
> --- a/drivers/md/bcache/bcache.h
> +++ b/drivers/md/bcache/bcache.h
> @@ -662,6 +662,8 @@ struct cache_set {
>  
>  #define BUCKET_HASH_BITS	12
>  	struct hlist_head	bucket_hash[1 << BUCKET_HASH_BITS];
> +
> +	uint64_t		last_request_time;
>  };
>  
>  struct bbio {
> @@ -680,6 +682,8 @@ struct bbio {
>  #define BTREE_PRIO		USHRT_MAX
>  #define INITIAL_PRIO		32768U
>  
> +#define BCH_IDLE_DURATION_MSECS	10000U
> +
>  #define btree_bytes(c)		((c)->btree_pages * PAGE_SIZE)
>  #define btree_blocks(b)							\
>  	((unsigned) (KEY_SIZE(&b->key) >> (b)->c->block_bits))
> diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
> index 019b3df9f1c6..98971486f7f1 100644
> --- a/drivers/md/bcache/request.c
> +++ b/drivers/md/bcache/request.c
> @@ -957,10 +957,13 @@ static blk_qc_t cached_dev_make_request(struct request_queue *q,
>  	struct search *s;
>  	struct bcache_device *d = bio->bi_bdev->bd_disk->private_data;
>  	struct cached_dev *dc = container_of(d, struct cached_dev, disk);
> +	struct cache_set *c = d->c;
>  	int rw = bio_data_dir(bio);
>  
>  	generic_start_io_acct(rw, bio_sectors(bio), &d->disk->part0);
>  
> +	if (c)
> +		c->last_request_time = local_clock();
>  	bio->bi_bdev = dc->bdev;
>  	bio->bi_iter.bi_sector += dc->sb.data_offset;
>  
> @@ -991,6 +994,9 @@ static blk_qc_t cached_dev_make_request(struct request_queue *q,
>  		else
>  			generic_make_request(bio);
>  	}
> +	/* update last_request_time again for better accuracy */
> +	if (c)
> +		c->last_request_time = local_clock();
>  
>  	return BLK_QC_T_NONE;
>  }
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index 42c66e76f05e..13594dd7f564 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -78,8 +78,15 @@ static void update_writeback_rate(struct work_struct *work)
>  	down_read(&dc->writeback_lock);
>  
>  	if (atomic_read(&dc->has_dirty) &&
> -	    dc->writeback_percent)
> +	    dc->writeback_percent) {
>  		__update_writeback_rate(dc);
> +		/*
> +		 * wake up the writeback thread to check in no_writeback_now()
> +		 * whether the request duration has timed out. If so,
> +		 * existing dirty data should be handled.
> +		 */
> +		bch_writeback_queue(dc);
> +	}
>  
>  	up_read(&dc->writeback_lock);
>  
> @@ -412,6 +419,48 @@ static bool refill_dirty(struct cached_dev *dc)
>  	return bkey_cmp(&buf->last_scanned, &start_pos) >= 0;
>  }
>  
> +static bool no_writeback_now(struct cached_dev *dc)
> +{
> +	uint64_t duration;
> +	struct cache_set *c = dc->disk.c;
> +	uint64_t cache_set_sectors;
> +	uint64_t dirty_sectors;
> +	uint64_t percent_sectors;
> +
> +	if (!atomic_read(&dc->has_dirty))
> +		return true;
> +
> +	if (!test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags) &&
> +	    !dc->writeback_running)
> +		return true;
> +
> +	/*
> +	 * last_request_time is initialized to 0, so duration may be larger
> +	 * than BCH_IDLE_DURATION_MSECS the first time it is calculated.
> +	 * If the cached device has dirty data, bcache will try to perform
> +	 * a proactive writeback.
> +	 */
> +	duration = local_clock() - c->last_request_time;
> +	if ((duration/NSEC_PER_MSEC) > BCH_IDLE_DURATION_MSECS)
> +		return false;
> +
> +	/*
> +	 * If the duration has not timed out and dirty data does not exceed
> +	 * writeback_percent, no writeback now.
> +	 * This is what writeback_percent is designed for, see
> +	 * /sys/block/<disk>/bcache/writeback_percent in
> +	 * Documentation/ABI/testing/sysfs-block-bcache
> +	 */
> +	cache_set_sectors = c->nbuckets * c->sb.bucket_size;
> +	percent_sectors = div64_u64(
> +		cache_set_sectors * dc->writeback_percent, 100);
> +	dirty_sectors = bcache_dev_sectors_dirty(&dc->disk);

Currently bcache_dev_sectors_dirty() iterates over all dirty stripes of
the bcache device; this takes around 15 milliseconds for a 15.2TB cache
device on my hardware. That latency is unacceptable. I also mentioned
this on Junhui's earlier work (the patch titled "bcache: update
bucket_in_use periodically").

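For reference, the walk in question is essentially a per-stripe sum of
atomic counters, roughly like this (a sketch from memory; details may
differ in the tree you are looking at):

	static inline uint64_t bcache_dev_sectors_dirty(struct bcache_device *d)
	{
		uint64_t i, ret = 0;

		/* one atomic_read() per stripe; a large device has
		 * enough stripes that this walk takes milliseconds */
		for (i = 0; i < d->nr_stripes; i++)
			ret += atomic_read(d->stripe_sectors_dirty + i);

		return ret;
	}
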
I am working on a low latency dirty sectors counter now; once it is
done, I will add it to this patch set and post again.

Just FYI. Thanks.


Coly



> +	if (dirty_sectors <= percent_sectors)
> +		return true;
> +
> +	return false;
> +}
> +
>  static int bch_writeback_thread(void *arg)
>  {
>  	struct cached_dev *dc = arg;
> @@ -419,9 +468,7 @@ static int bch_writeback_thread(void *arg)
>  
>  	while (!kthread_should_stop()) {
>  		down_write(&dc->writeback_lock);
> -		if (!atomic_read(&dc->has_dirty) ||
> -		    (!test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags) &&
> -		     !dc->writeback_running)) {
> +		if (no_writeback_now(dc)) {
>  			up_write(&dc->writeback_lock);
>  			set_current_state(TASK_INTERRUPTIBLE);
>  
> 


-- 
Coly Li

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] bcache: do not perform writeback if dirty data is no more than writeback_percent
       [not found]     ` <OF2AB44ECA.94CE1BF0-ON48258162.00356FFD-48258162.00359404@zte.com.cn>
@ 2017-07-19 10:32       ` Coly Li
  0 siblings, 0 replies; 7+ messages in thread
From: Coly Li @ 2017-07-19 10:32 UTC (permalink / raw)
  To: tang.junhui; +Cc: bcache, kubuxu, linux-bcache

On 2017/7/19 5:45 PM, tang.junhui@zte.com.cn wrote:
>> Currently bcache_dev_sectors_dirty() iterates over all dirty stripes
>> of the bcache device; this takes around 15 milliseconds for a 15.2TB
>> cache device on my hardware. That latency is unacceptable,
> 
> Why is it unacceptable? It does not block the front-end incoming IOs.

Hi Junhui,

Imagine the following situation,

1) The cached device has dirty data not exceeding its writeback_percent.
After its writeback thread is woken up, no_writeback_now() will iterate
over all dirty stripes of the bcache device (15 ms on my server) and
return true to bch_writeback_thread(); the thread then schedules out
and sleeps.

2) Before dirty data exceeds writeback_percent of the cached device, it
is not difficult to get more than one write request to the bcache device
every 15 ms (on my hardware). If that happens, the writeback thread is
woken up repeatedly, and each time no_writeback_now() is called it
consumes CPU cycles iterating over all stripes for 15 ms.

3) The result is a potential for the writeback thread to consume a full
CPU core before dirty data exceeds the cached device's
writeback_percent, even when there are only a few write requests on the
bcache device.

This is unacceptable in this patch set.

Thanks.

Coly

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] bcache: do not perform writeback if dirty data is no more than writeback_percent
       [not found]   ` <OF7ADE25A8.9CA7AF89-ON48258165.00108B81-48258165.0010FA59@zte.com.cn>
@ 2017-07-22 15:49     ` Coly Li
  0 siblings, 0 replies; 7+ messages in thread
From: Coly Li @ 2017-07-22 15:49 UTC (permalink / raw)
  To: tang.junhui; +Cc: bcache, kubuxu, linux-bcache, linux-bcache-owner

On 2017/7/22 11:05 AM, tang.junhui@zte.com.cn wrote:
> Hello Coly,
> 
> Good improvement for the writeback mechanism.
> 
> But you may still need to pay attention to three things below:
> 1) You may introduce a new race on the variable "c->last_request_time";
>    other devices or threads may access it at the same time;

Yes, operations on the 64-bit last_request_time cannot be atomic on
non-64-bit machines, so there is a race. I should change it to
atomic64_t and use atomic64_set()/atomic64_read().

And I feel the following line in no_writeback_now() should be fixed too:
	duration = local_clock() - c->last_request_time;
c->last_request_time might be updated after local_clock() returns, in
which case duration would be negative. I should calculate duration this
way:
	/*
	 * c->last_request_time might be updated after local_clock()
	 * returns; fetch it first to make sure last_request_time
	 * won't be newer than the return value of local_clock()
	 */
	int64_t last_request_time =
		atomic64_read(&c->last_request_time);
	duration = local_clock() - last_request_time;

Nice catch!

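Putting the two fixes together, the conversion might look like this
(hypothetical sketch, not a posted patch):

	/* bcache.h: in struct cache_set */
	atomic64_t		last_request_time;

	/* request.c: cached_dev_make_request() */
	if (c)
		atomic64_set(&c->last_request_time, local_clock());

	/* writeback.c: no_writeback_now(), as described above */
	int64_t last_request_time =
		atomic64_read(&c->last_request_time);
	duration = local_clock() - last_request_time;
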
> 2) Requests may also go to a thin flash device, so I think you should
>    also record c->last_request_time in flash_dev_make_request();

c->last_request_time is used by bcache devices to calculate
dc->writeback_rate.rate. A thin flash device has no cached device
(hence no dc->writeback_rate), which is why I don't record the time
stamp for flash-only volumes.


> 3) What I worry about the most is that this patch may not take effect
>    in real applications. For example, in OpenStack there are many
>    virtual machines running on the bcache device, so there are always
>    some I/O requests (few but frequent); it would never time out, and
>    the remaining dirty data would never be flushed cleanly.

In this patch set, the above situation is what the second patch
addresses. This patch only prevents writeback from continuing before
dirty data exceeds writeback_percent, and permits writeback to continue
once the timer expires.

Indeed, before the second patch of this patch set is applied, even when
((duration/NSEC_PER_MSEC) > BCH_IDLE_DURATION_MSECS) happens, the
writeback rate is very low, most of the time just 1. For data center
applications that won't be problematic, but for mobile devices it is.
This is what the second patch of this patch set tries to fix.

Let me also describe the idea behind the second patch here. The idea
behind the PD controller in __update_writeback_rate() is (see the
sketch after this list):
- when dirty data exceeds dc->writeback_percent,
  more dirty data means a higher writeback rate, less dirty data means
  a lower writeback rate.
- when dirty data is equal to or less than dc->writeback_percent,
  regular I/O always has the highest priority, and the writeback rate
  stays at the lowest value, 1.

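In code, the PD idea is roughly the following (a simplified sketch, not
the exact bcache formula; P_TERM_INVERSE and D_TERM_INVERSE stand in for
the real tuning knobs):

	int64_t target = div64_u64(cache_sectors * writeback_percent, 100);
	int64_t error = dirty_sectors - target;	/* proportional input */
	int64_t change = error - last_error;	/* derivative input */
	int64_t rate = error / P_TERM_INVERSE + change / D_TERM_INVERSE;

	/* at or below the target, regular I/O keeps priority and
	 * writeback trickles at the minimum rate of 1 */
	if (rate < 1)
		rate = 1;
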
For data center environments the above ideas are right, so in your
example the writeback rate will be very low (maybe only 1). And yes,
since writeback is based on iterating an in-memory RB tree, there is no
assurance that the oldest dirty data is selected first; it is possible
that some dirty data is never cleaned in your example.

Less dirty data and low I/O latency cannot be achieved at the same time
in the bcache code; obviously bcache chooses low I/O latency as the
first priority.

Even with the second patch of this patch set, max_writeback_rate_when_idle
sets the writeback rate back to 1 when update_writeback_rate() detects
that the I/O idle state is broken by a newly arrived request.

Thanks for comments :-)

Coly

^ permalink raw reply	[flat|nested] 7+ messages in thread

* set max writeback rate when bcache device is idle
@ 2017-07-19  5:34 Coly Li
  0 siblings, 0 replies; 7+ messages in thread
From: Coly Li @ 2017-07-19  5:34 UTC (permalink / raw)
  To: linux-bcache; +Cc: bcache, tang.junhui, kubuxu

This patch set is an effort to make bcache write back dirty data when
the device is idle. Junhui Tang posted a fix a few days ago, which IMHO
interferes unstably with the writeback rate PD controller. This is
another approach to the same problem.

The first patch, with subject "bcache: do not perform writeback if dirty
data is no more than writeback_percent", makes writeback_percent work as
designed. The second patch depends on it.

The second patch, with subject "bcache: implement
max_writeback_rate_when_idle option for writeback mode", implements an
option to set a maximum writeback rate when the bcache device is idle.
Then dirty data can be written back to the cached device and cleaned on
the cache device. After the cache set is clean, subsequent write requests
on the bcache device won't trigger writeback unless dirty data exceeds
writeback_percent.

Please review the patch set; any comments are welcome :-)

Thanks in advance.

Coly Li

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-07-22 15:49 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-19  5:36 set max writeback rate when bcache device is idle Coly Li
2017-07-19  5:36 ` [PATCH 1/2] bcache: do not perform writeback if dirty data is no more than writeback_percent Coly Li
2017-07-19  8:19   ` Coly Li
     [not found]     ` <OF2AB44ECA.94CE1BF0-ON48258162.00356FFD-48258162.00359404@zte.com.cn>
2017-07-19 10:32       ` Coly Li
     [not found]   ` <OF7ADE25A8.9CA7AF89-ON48258165.00108B81-48258165.0010FA59@zte.com.cn>
2017-07-22 15:49     ` Coly Li
2017-07-19  5:36 ` [PATCH 2/2] bcache: implement max_writeback_rate_when_idle option for writeback mode Coly Li
  -- strict thread matches above, loose matches on Subject: below --
2017-07-19  5:34 set max writeback rate when bcache device is idle Coly Li
