All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET v2 0/4] Add throttling for discards
@ 2018-05-07 16:13 Jens Axboe
  2018-05-07 16:13 ` [PATCH 1/4] block: break discard submissions into the user defined size Jens Axboe
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Jens Axboe @ 2018-05-07 16:13 UTC (permalink / raw)
  To: linux-block, linux-xfs; +Cc: dchinner, hch

I implemented support for discards in blk-wbt, so we treat them like
background writes. If we have competing foreground IO, then we may
throttle discards. Otherwise they should run at full speed.

Ran a quick local test case on a NVMe device. The test case here is
doing reads from a file, while other files are being delete. File
system is XFS. Three separate runs:

1) 'justreader' is just running the reader workload by itself.
2) 'nothrottle' is running the reader with the unlinks on the stock
   kernel.
3) 'throttled' is running the reader with the unlinks with the patches
   applied.

The fio run logs latencies in windows of 50msec, logging only the
worst latency seen in that window:

[read]
filename=file0
rw=randread
bs=4k
direct=1
ioengine=libaio
iodepth=8
write_lat_log=read_lat
log_avg_msec=50
log_max_value=1

Graph of the latencies here: http://kernel.dk/wbt-discard.jpg

As you can see, with the throttling, the latencies and performance is
the same as not having the unlinks running. They finish within 100msec
of each other, after having read 10G of data. Without the throttling
and unlinks running, we see spikes that are almost an order of magnitude
worse. The reader also takes 1 second longer to complete. 322MB/sec vs
313MB/sec. It's also visible in the higher Pxx. Here's the throttled
end of the percentiles:

     | 99.00th=[  145], 99.50th=[  153], 99.90th=[  180], 99.95th=[  198],
     | 99.99th=[  889]

and here's the unthrottled equivalent:

     | 99.00th=[  145], 99.50th=[  155], 99.90th=[  188], 99.95th=[  322],
     | 99.99th=[ 8586]


Changes since v1:

- Pass in enum wbt_flags for get_rq_wait()
- Turn wbt_should_throttle() into a switch statement

 blk-lib.c  |    7 +++---
 blk-stat.h |    6 ++---
 blk-wbt.c  |   70 ++++++++++++++++++++++++++++++++++++-------------------------
 blk-wbt.h  |    8 +++++-
 4 files changed, 55 insertions(+), 36 deletions(-)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/4] block: break discard submissions into the user defined size
  2018-05-07 16:13 [PATCHSET v2 0/4] Add throttling for discards Jens Axboe
@ 2018-05-07 16:13 ` Jens Axboe
  2018-05-07 23:56   ` Darrick J. Wong
  2018-05-08 20:43   ` Omar Sandoval
  2018-05-07 16:13 ` [PATCH 2/4] blk-wbt: account any writing command as a write Jens Axboe
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 14+ messages in thread
From: Jens Axboe @ 2018-05-07 16:13 UTC (permalink / raw)
  To: linux-block, linux-xfs; +Cc: dchinner, hch, Jens Axboe

Don't build discards bigger than what the user asked for, if the
user decided to limit the size by writing to 'discard_max_bytes'.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-lib.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index a676084d4740..7417d617091b 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -62,10 +62,11 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		unsigned int req_sects;
 		sector_t end_sect, tmp;
 
-		/* Make sure bi_size doesn't overflow */
-		req_sects = min_t(sector_t, nr_sects, UINT_MAX >> 9);
+		/* Issue in chunks of the user defined max discard setting */
+		req_sects = min_t(sector_t, nr_sects,
+					q->limits.max_discard_sectors);
 
-		/**
+		/*
 		 * If splitting a request, and the next starting sector would be
 		 * misaligned, stop the discard at the previous aligned sector.
 		 */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/4] blk-wbt: account any writing command as a write
  2018-05-07 16:13 [PATCHSET v2 0/4] Add throttling for discards Jens Axboe
  2018-05-07 16:13 ` [PATCH 1/4] block: break discard submissions into the user defined size Jens Axboe
@ 2018-05-07 16:13 ` Jens Axboe
  2018-05-07 23:58   ` Darrick J. Wong
  2018-05-08 20:44   ` Omar Sandoval
  2018-05-07 16:13 ` [PATCH 3/4] blk-wbt: pass in enum wbt_flags to get_rq_wait() Jens Axboe
  2018-05-07 16:13 ` [PATCH 4/4] blk-wbt: throttle discards like background writes Jens Axboe
  3 siblings, 2 replies; 14+ messages in thread
From: Jens Axboe @ 2018-05-07 16:13 UTC (permalink / raw)
  To: linux-block, linux-xfs; +Cc: dchinner, hch, Jens Axboe

We currently special case WRITE and FLUSH, but we should really
just include any command with the write bit set. This ensures
that we account DISCARD.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-wbt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index f92fc84b5e2c..3e34b41bcefc 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -701,7 +701,7 @@ static int wbt_data_dir(const struct request *rq)
 
 	if (op == REQ_OP_READ)
 		return READ;
-	else if (op == REQ_OP_WRITE || op == REQ_OP_FLUSH)
+	else if (op_is_write(op))
 		return WRITE;
 
 	/* don't account */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/4] blk-wbt: pass in enum wbt_flags to get_rq_wait()
  2018-05-07 16:13 [PATCHSET v2 0/4] Add throttling for discards Jens Axboe
  2018-05-07 16:13 ` [PATCH 1/4] block: break discard submissions into the user defined size Jens Axboe
  2018-05-07 16:13 ` [PATCH 2/4] blk-wbt: account any writing command as a write Jens Axboe
@ 2018-05-07 16:13 ` Jens Axboe
  2018-05-08  0:07   ` Darrick J. Wong
  2018-05-08 20:46   ` Omar Sandoval
  2018-05-07 16:13 ` [PATCH 4/4] blk-wbt: throttle discards like background writes Jens Axboe
  3 siblings, 2 replies; 14+ messages in thread
From: Jens Axboe @ 2018-05-07 16:13 UTC (permalink / raw)
  To: linux-block, linux-xfs; +Cc: dchinner, hch, Jens Axboe

This is in preparation for having more write queues, in which
case we would have needed to pass in more information than just
a simple 'is_kswapd' boolean.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-wbt.c | 25 +++++++++++++++----------
 block/blk-wbt.h |  4 +++-
 2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index 3e34b41bcefc..25d202345965 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -101,9 +101,13 @@ static bool wb_recent_wait(struct rq_wb *rwb)
 	return time_before(jiffies, wb->dirty_sleep + HZ);
 }
 
-static inline struct rq_wait *get_rq_wait(struct rq_wb *rwb, bool is_kswapd)
+static inline struct rq_wait *get_rq_wait(struct rq_wb *rwb,
+					  enum wbt_flags wb_acct)
 {
-	return &rwb->rq_wait[is_kswapd];
+	if (wb_acct & WBT_KSWAPD)
+		return &rwb->rq_wait[WBT_RWQ_KSWAPD];
+
+	return &rwb->rq_wait[WBT_RWQ_BG];
 }
 
 static void rwb_wake_all(struct rq_wb *rwb)
@@ -126,7 +130,7 @@ void __wbt_done(struct rq_wb *rwb, enum wbt_flags wb_acct)
 	if (!(wb_acct & WBT_TRACKED))
 		return;
 
-	rqw = get_rq_wait(rwb, wb_acct & WBT_KSWAPD);
+	rqw = get_rq_wait(rwb, wb_acct);
 	inflight = atomic_dec_return(&rqw->inflight);
 
 	/*
@@ -529,11 +533,12 @@ static inline bool may_queue(struct rq_wb *rwb, struct rq_wait *rqw,
  * Block if we will exceed our limit, or if we are currently waiting for
  * the timer to kick off queuing again.
  */
-static void __wbt_wait(struct rq_wb *rwb, unsigned long rw, spinlock_t *lock)
+static void __wbt_wait(struct rq_wb *rwb, enum wbt_flags wb_acct,
+		       unsigned long rw, spinlock_t *lock)
 	__releases(lock)
 	__acquires(lock)
 {
-	struct rq_wait *rqw = get_rq_wait(rwb, current_is_kswapd());
+	struct rq_wait *rqw = get_rq_wait(rwb, wb_acct);
 	DEFINE_WAIT(wait);
 
 	if (may_queue(rwb, rqw, &wait, rw))
@@ -584,7 +589,7 @@ static inline bool wbt_should_throttle(struct rq_wb *rwb, struct bio *bio)
  */
 enum wbt_flags wbt_wait(struct rq_wb *rwb, struct bio *bio, spinlock_t *lock)
 {
-	unsigned int ret = 0;
+	enum wbt_flags ret = 0;
 
 	if (!rwb_enabled(rwb))
 		return 0;
@@ -598,14 +603,14 @@ enum wbt_flags wbt_wait(struct rq_wb *rwb, struct bio *bio, spinlock_t *lock)
 		return ret;
 	}
 
-	__wbt_wait(rwb, bio->bi_opf, lock);
+	if (current_is_kswapd())
+		ret |= WBT_KSWAPD;
+
+	__wbt_wait(rwb, ret, bio->bi_opf, lock);
 
 	if (!blk_stat_is_active(rwb->cb))
 		rwb_arm_timer(rwb);
 
-	if (current_is_kswapd())
-		ret |= WBT_KSWAPD;
-
 	return ret | WBT_TRACKED;
 }
 
diff --git a/block/blk-wbt.h b/block/blk-wbt.h
index a232c98fbf4d..8038b4a0d4ef 100644
--- a/block/blk-wbt.h
+++ b/block/blk-wbt.h
@@ -19,7 +19,9 @@ enum wbt_flags {
 };
 
 enum {
-	WBT_NUM_RWQ		= 2,
+	WBT_RWQ_BG		= 0,
+	WBT_RWQ_KSWAPD,
+	WBT_NUM_RWQ,
 };
 
 /*
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/4] blk-wbt: throttle discards like background writes
  2018-05-07 16:13 [PATCHSET v2 0/4] Add throttling for discards Jens Axboe
                   ` (2 preceding siblings ...)
  2018-05-07 16:13 ` [PATCH 3/4] blk-wbt: pass in enum wbt_flags to get_rq_wait() Jens Axboe
@ 2018-05-07 16:13 ` Jens Axboe
  2018-05-08  0:13   ` Darrick J. Wong
  2018-05-08 20:48   ` Omar Sandoval
  3 siblings, 2 replies; 14+ messages in thread
From: Jens Axboe @ 2018-05-07 16:13 UTC (permalink / raw)
  To: linux-block, linux-xfs; +Cc: dchinner, hch, Jens Axboe

Throttle discards like we would any background write. Discards should
be background activity, so if they are impacting foreground IO, then
we will throttle them down.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-stat.h |  6 +++---
 block/blk-wbt.c  | 43 ++++++++++++++++++++++++++-----------------
 block/blk-wbt.h  |  4 +++-
 3 files changed, 32 insertions(+), 21 deletions(-)

diff --git a/block/blk-stat.h b/block/blk-stat.h
index 2dd36347252a..c22049a8125e 100644
--- a/block/blk-stat.h
+++ b/block/blk-stat.h
@@ -10,11 +10,11 @@
 
 /*
  * from upper:
- * 3 bits: reserved for other usage
+ * 4 bits: reserved for other usage
  * 12 bits: size
- * 49 bits: time
+ * 48 bits: time
  */
-#define BLK_STAT_RES_BITS	3
+#define BLK_STAT_RES_BITS	4
 #define BLK_STAT_SIZE_BITS	12
 #define BLK_STAT_RES_SHIFT	(64 - BLK_STAT_RES_BITS)
 #define BLK_STAT_SIZE_SHIFT	(BLK_STAT_RES_SHIFT - BLK_STAT_SIZE_BITS)
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index 25d202345965..a7a724580033 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -106,6 +106,8 @@ static inline struct rq_wait *get_rq_wait(struct rq_wb *rwb,
 {
 	if (wb_acct & WBT_KSWAPD)
 		return &rwb->rq_wait[WBT_RWQ_KSWAPD];
+	else if (wb_acct & WBT_DISCARD)
+		return &rwb->rq_wait[WBT_RWQ_DISCARD];
 
 	return &rwb->rq_wait[WBT_RWQ_BG];
 }
@@ -143,10 +145,13 @@ void __wbt_done(struct rq_wb *rwb, enum wbt_flags wb_acct)
 	}
 
 	/*
-	 * If the device does write back caching, drop further down
-	 * before we wake people up.
+	 * For discards, our limit is always the background. For writes, if
+	 * the device does write back caching, drop further down before we
+	 * wake people up.
 	 */
-	if (rwb->wc && !wb_recent_wait(rwb))
+	if (wb_acct & WBT_DISCARD)
+		limit = rwb->wb_background;
+	else if (rwb->wc && !wb_recent_wait(rwb))
 		limit = 0;
 	else
 		limit = rwb->wb_normal;
@@ -483,6 +488,9 @@ static inline unsigned int get_limit(struct rq_wb *rwb, unsigned long rw)
 {
 	unsigned int limit;
 
+	if ((rw & REQ_OP_MASK) == REQ_OP_DISCARD)
+		return rwb->wb_background;
+
 	/*
 	 * At this point we know it's a buffered write. If this is
 	 * kswapd trying to free memory, or REQ_SYNC is set, then
@@ -564,21 +572,20 @@ static void __wbt_wait(struct rq_wb *rwb, enum wbt_flags wb_acct,
 
 static inline bool wbt_should_throttle(struct rq_wb *rwb, struct bio *bio)
 {
-	const int op = bio_op(bio);
-
-	/*
-	 * If not a WRITE, do nothing
-	 */
-	if (op != REQ_OP_WRITE)
-		return false;
-
-	/*
-	 * Don't throttle WRITE_ODIRECT
-	 */
-	if ((bio->bi_opf & (REQ_SYNC | REQ_IDLE)) == (REQ_SYNC | REQ_IDLE))
+	switch (bio_op(bio)) {
+	case REQ_OP_WRITE:
+		/*
+		 * Don't throttle WRITE_ODIRECT
+		 */
+		if ((bio->bi_opf & (REQ_SYNC | REQ_IDLE)) ==
+		    (REQ_SYNC | REQ_IDLE))
+			return false;
+		/* fallthrough */
+	case REQ_OP_DISCARD:
+		return true;
+	default:
 		return false;
-
-	return true;
+	}
 }
 
 /*
@@ -605,6 +612,8 @@ enum wbt_flags wbt_wait(struct rq_wb *rwb, struct bio *bio, spinlock_t *lock)
 
 	if (current_is_kswapd())
 		ret |= WBT_KSWAPD;
+	if (bio_op(bio) == REQ_OP_DISCARD)
+		ret |= WBT_DISCARD;
 
 	__wbt_wait(rwb, ret, bio->bi_opf, lock);
 
diff --git a/block/blk-wbt.h b/block/blk-wbt.h
index 8038b4a0d4ef..d6a125e49db5 100644
--- a/block/blk-wbt.h
+++ b/block/blk-wbt.h
@@ -14,13 +14,15 @@ enum wbt_flags {
 	WBT_TRACKED		= 1,	/* write, tracked for throttling */
 	WBT_READ		= 2,	/* read */
 	WBT_KSWAPD		= 4,	/* write, from kswapd */
+	WBT_DISCARD		= 8,	/* discard */
 
-	WBT_NR_BITS		= 3,	/* number of bits */
+	WBT_NR_BITS		= 4,	/* number of bits */
 };
 
 enum {
 	WBT_RWQ_BG		= 0,
 	WBT_RWQ_KSWAPD,
+	WBT_RWQ_DISCARD,
 	WBT_NUM_RWQ,
 };
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] block: break discard submissions into the user defined size
  2018-05-07 16:13 ` [PATCH 1/4] block: break discard submissions into the user defined size Jens Axboe
@ 2018-05-07 23:56   ` Darrick J. Wong
  2018-05-08 20:43   ` Omar Sandoval
  1 sibling, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2018-05-07 23:56 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-xfs, dchinner, hch

On Mon, May 07, 2018 at 10:13:32AM -0600, Jens Axboe wrote:
> Don't build discards bigger than what the user asked for, if the
> user decided to limit the size by writing to 'discard_max_bytes'.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Seems fine to me,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  block/blk-lib.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index a676084d4740..7417d617091b 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -62,10 +62,11 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>  		unsigned int req_sects;
>  		sector_t end_sect, tmp;
>  
> -		/* Make sure bi_size doesn't overflow */
> -		req_sects = min_t(sector_t, nr_sects, UINT_MAX >> 9);
> +		/* Issue in chunks of the user defined max discard setting */
> +		req_sects = min_t(sector_t, nr_sects,
> +					q->limits.max_discard_sectors);
>  
> -		/**
> +		/*
>  		 * If splitting a request, and the next starting sector would be
>  		 * misaligned, stop the discard at the previous aligned sector.
>  		 */
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] blk-wbt: account any writing command as a write
  2018-05-07 16:13 ` [PATCH 2/4] blk-wbt: account any writing command as a write Jens Axboe
@ 2018-05-07 23:58   ` Darrick J. Wong
  2018-05-08 20:44   ` Omar Sandoval
  1 sibling, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2018-05-07 23:58 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-xfs, dchinner, hch

On Mon, May 07, 2018 at 10:13:33AM -0600, Jens Axboe wrote:
> We currently special case WRITE and FLUSH, but we should really
> just include any command with the write bit set. This ensures
> that we account DISCARD.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  block/blk-wbt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/blk-wbt.c b/block/blk-wbt.c
> index f92fc84b5e2c..3e34b41bcefc 100644
> --- a/block/blk-wbt.c
> +++ b/block/blk-wbt.c
> @@ -701,7 +701,7 @@ static int wbt_data_dir(const struct request *rq)
>  
>  	if (op == REQ_OP_READ)
>  		return READ;
> -	else if (op == REQ_OP_WRITE || op == REQ_OP_FLUSH)
> +	else if (op_is_write(op))
>  		return WRITE;
>  
>  	/* don't account */
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] blk-wbt: pass in enum wbt_flags to get_rq_wait()
  2018-05-07 16:13 ` [PATCH 3/4] blk-wbt: pass in enum wbt_flags to get_rq_wait() Jens Axboe
@ 2018-05-08  0:07   ` Darrick J. Wong
  2018-05-08 20:46   ` Omar Sandoval
  1 sibling, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2018-05-08  0:07 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-xfs, dchinner, hch

On Mon, May 07, 2018 at 10:13:34AM -0600, Jens Axboe wrote:
> This is in preparation for having more write queues, in which
> case we would have needed to pass in more information than just
> a simple 'is_kswapd' boolean.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Looks ok (having not run any kind of testing on this yet),
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  block/blk-wbt.c | 25 +++++++++++++++----------
>  block/blk-wbt.h |  4 +++-
>  2 files changed, 18 insertions(+), 11 deletions(-)
> 
> diff --git a/block/blk-wbt.c b/block/blk-wbt.c
> index 3e34b41bcefc..25d202345965 100644
> --- a/block/blk-wbt.c
> +++ b/block/blk-wbt.c
> @@ -101,9 +101,13 @@ static bool wb_recent_wait(struct rq_wb *rwb)
>  	return time_before(jiffies, wb->dirty_sleep + HZ);
>  }
>  
> -static inline struct rq_wait *get_rq_wait(struct rq_wb *rwb, bool is_kswapd)
> +static inline struct rq_wait *get_rq_wait(struct rq_wb *rwb,
> +					  enum wbt_flags wb_acct)
>  {
> -	return &rwb->rq_wait[is_kswapd];
> +	if (wb_acct & WBT_KSWAPD)
> +		return &rwb->rq_wait[WBT_RWQ_KSWAPD];
> +
> +	return &rwb->rq_wait[WBT_RWQ_BG];
>  }
>  
>  static void rwb_wake_all(struct rq_wb *rwb)
> @@ -126,7 +130,7 @@ void __wbt_done(struct rq_wb *rwb, enum wbt_flags wb_acct)
>  	if (!(wb_acct & WBT_TRACKED))
>  		return;
>  
> -	rqw = get_rq_wait(rwb, wb_acct & WBT_KSWAPD);
> +	rqw = get_rq_wait(rwb, wb_acct);
>  	inflight = atomic_dec_return(&rqw->inflight);
>  
>  	/*
> @@ -529,11 +533,12 @@ static inline bool may_queue(struct rq_wb *rwb, struct rq_wait *rqw,
>   * Block if we will exceed our limit, or if we are currently waiting for
>   * the timer to kick off queuing again.
>   */
> -static void __wbt_wait(struct rq_wb *rwb, unsigned long rw, spinlock_t *lock)
> +static void __wbt_wait(struct rq_wb *rwb, enum wbt_flags wb_acct,
> +		       unsigned long rw, spinlock_t *lock)
>  	__releases(lock)
>  	__acquires(lock)
>  {
> -	struct rq_wait *rqw = get_rq_wait(rwb, current_is_kswapd());
> +	struct rq_wait *rqw = get_rq_wait(rwb, wb_acct);
>  	DEFINE_WAIT(wait);
>  
>  	if (may_queue(rwb, rqw, &wait, rw))
> @@ -584,7 +589,7 @@ static inline bool wbt_should_throttle(struct rq_wb *rwb, struct bio *bio)
>   */
>  enum wbt_flags wbt_wait(struct rq_wb *rwb, struct bio *bio, spinlock_t *lock)
>  {
> -	unsigned int ret = 0;
> +	enum wbt_flags ret = 0;
>  
>  	if (!rwb_enabled(rwb))
>  		return 0;
> @@ -598,14 +603,14 @@ enum wbt_flags wbt_wait(struct rq_wb *rwb, struct bio *bio, spinlock_t *lock)
>  		return ret;
>  	}
>  
> -	__wbt_wait(rwb, bio->bi_opf, lock);
> +	if (current_is_kswapd())
> +		ret |= WBT_KSWAPD;
> +
> +	__wbt_wait(rwb, ret, bio->bi_opf, lock);
>  
>  	if (!blk_stat_is_active(rwb->cb))
>  		rwb_arm_timer(rwb);
>  
> -	if (current_is_kswapd())
> -		ret |= WBT_KSWAPD;
> -
>  	return ret | WBT_TRACKED;
>  }
>  
> diff --git a/block/blk-wbt.h b/block/blk-wbt.h
> index a232c98fbf4d..8038b4a0d4ef 100644
> --- a/block/blk-wbt.h
> +++ b/block/blk-wbt.h
> @@ -19,7 +19,9 @@ enum wbt_flags {
>  };
>  
>  enum {
> -	WBT_NUM_RWQ		= 2,
> +	WBT_RWQ_BG		= 0,
> +	WBT_RWQ_KSWAPD,
> +	WBT_NUM_RWQ,
>  };
>  
>  /*
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] blk-wbt: throttle discards like background writes
  2018-05-07 16:13 ` [PATCH 4/4] blk-wbt: throttle discards like background writes Jens Axboe
@ 2018-05-08  0:13   ` Darrick J. Wong
  2018-05-08 20:48   ` Omar Sandoval
  1 sibling, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2018-05-08  0:13 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-xfs, dchinner, hch

On Mon, May 07, 2018 at 10:13:35AM -0600, Jens Axboe wrote:
> Throttle discards like we would any background write. Discards should
> be background activity, so if they are impacting foreground IO, then
> we will throttle them down.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  block/blk-stat.h |  6 +++---
>  block/blk-wbt.c  | 43 ++++++++++++++++++++++++++-----------------
>  block/blk-wbt.h  |  4 +++-
>  3 files changed, 32 insertions(+), 21 deletions(-)
> 
> diff --git a/block/blk-stat.h b/block/blk-stat.h
> index 2dd36347252a..c22049a8125e 100644
> --- a/block/blk-stat.h
> +++ b/block/blk-stat.h
> @@ -10,11 +10,11 @@
>  
>  /*
>   * from upper:
> - * 3 bits: reserved for other usage
> + * 4 bits: reserved for other usage
>   * 12 bits: size
> - * 49 bits: time
> + * 48 bits: time
>   */
> -#define BLK_STAT_RES_BITS	3
> +#define BLK_STAT_RES_BITS	4
>  #define BLK_STAT_SIZE_BITS	12
>  #define BLK_STAT_RES_SHIFT	(64 - BLK_STAT_RES_BITS)
>  #define BLK_STAT_SIZE_SHIFT	(BLK_STAT_RES_SHIFT - BLK_STAT_SIZE_BITS)
> diff --git a/block/blk-wbt.c b/block/blk-wbt.c
> index 25d202345965..a7a724580033 100644
> --- a/block/blk-wbt.c
> +++ b/block/blk-wbt.c
> @@ -106,6 +106,8 @@ static inline struct rq_wait *get_rq_wait(struct rq_wb *rwb,
>  {
>  	if (wb_acct & WBT_KSWAPD)
>  		return &rwb->rq_wait[WBT_RWQ_KSWAPD];
> +	else if (wb_acct & WBT_DISCARD)
> +		return &rwb->rq_wait[WBT_RWQ_DISCARD];
>  
>  	return &rwb->rq_wait[WBT_RWQ_BG];
>  }
> @@ -143,10 +145,13 @@ void __wbt_done(struct rq_wb *rwb, enum wbt_flags wb_acct)
>  	}
>  
>  	/*
> -	 * If the device does write back caching, drop further down
> -	 * before we wake people up.
> +	 * For discards, our limit is always the background. For writes, if
> +	 * the device does write back caching, drop further down before we
> +	 * wake people up.
>  	 */
> -	if (rwb->wc && !wb_recent_wait(rwb))
> +	if (wb_acct & WBT_DISCARD)
> +		limit = rwb->wb_background;
> +	else if (rwb->wc && !wb_recent_wait(rwb))
>  		limit = 0;
>  	else
>  		limit = rwb->wb_normal;
> @@ -483,6 +488,9 @@ static inline unsigned int get_limit(struct rq_wb *rwb, unsigned long rw)
>  {
>  	unsigned int limit;
>  
> +	if ((rw & REQ_OP_MASK) == REQ_OP_DISCARD)
> +		return rwb->wb_background;
> +
>  	/*
>  	 * At this point we know it's a buffered write. If this is
>  	 * kswapd trying to free memory, or REQ_SYNC is set, then
> @@ -564,21 +572,20 @@ static void __wbt_wait(struct rq_wb *rwb, enum wbt_flags wb_acct,
>  
>  static inline bool wbt_should_throttle(struct rq_wb *rwb, struct bio *bio)
>  {
> -	const int op = bio_op(bio);
> -
> -	/*
> -	 * If not a WRITE, do nothing
> -	 */
> -	if (op != REQ_OP_WRITE)
> -		return false;
> -
> -	/*
> -	 * Don't throttle WRITE_ODIRECT
> -	 */
> -	if ((bio->bi_opf & (REQ_SYNC | REQ_IDLE)) == (REQ_SYNC | REQ_IDLE))
> +	switch (bio_op(bio)) {
> +	case REQ_OP_WRITE:
> +		/*
> +		 * Don't throttle WRITE_ODIRECT
> +		 */
> +		if ((bio->bi_opf & (REQ_SYNC | REQ_IDLE)) ==
> +		    (REQ_SYNC | REQ_IDLE))
> +			return false;
> +		/* fallthrough */
> +	case REQ_OP_DISCARD:
> +		return true;
> +	default:
>  		return false;
> -
> -	return true;
> +	}
>  }
>  
>  /*
> @@ -605,6 +612,8 @@ enum wbt_flags wbt_wait(struct rq_wb *rwb, struct bio *bio, spinlock_t *lock)
>  
>  	if (current_is_kswapd())
>  		ret |= WBT_KSWAPD;
> +	if (bio_op(bio) == REQ_OP_DISCARD)
> +		ret |= WBT_DISCARD;
>  
>  	__wbt_wait(rwb, ret, bio->bi_opf, lock);
>  
> diff --git a/block/blk-wbt.h b/block/blk-wbt.h
> index 8038b4a0d4ef..d6a125e49db5 100644
> --- a/block/blk-wbt.h
> +++ b/block/blk-wbt.h
> @@ -14,13 +14,15 @@ enum wbt_flags {
>  	WBT_TRACKED		= 1,	/* write, tracked for throttling */
>  	WBT_READ		= 2,	/* read */
>  	WBT_KSWAPD		= 4,	/* write, from kswapd */
> +	WBT_DISCARD		= 8,	/* discard */
>  
> -	WBT_NR_BITS		= 3,	/* number of bits */
> +	WBT_NR_BITS		= 4,	/* number of bits */
>  };
>  
>  enum {
>  	WBT_RWQ_BG		= 0,
>  	WBT_RWQ_KSWAPD,
> +	WBT_RWQ_DISCARD,
>  	WBT_NUM_RWQ,
>  };
>  
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] block: break discard submissions into the user defined size
  2018-05-07 16:13 ` [PATCH 1/4] block: break discard submissions into the user defined size Jens Axboe
  2018-05-07 23:56   ` Darrick J. Wong
@ 2018-05-08 20:43   ` Omar Sandoval
  2018-05-08 20:57     ` Jens Axboe
  1 sibling, 1 reply; 14+ messages in thread
From: Omar Sandoval @ 2018-05-08 20:43 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-xfs, dchinner, hch

On Mon, May 07, 2018 at 10:13:32AM -0600, Jens Axboe wrote:
> Don't build discards bigger than what the user asked for, if the
> user decided to limit the size by writing to 'discard_max_bytes'.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  block/blk-lib.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index a676084d4740..7417d617091b 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -62,10 +62,11 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>  		unsigned int req_sects;
>  		sector_t end_sect, tmp;
>  
> -		/* Make sure bi_size doesn't overflow */
> -		req_sects = min_t(sector_t, nr_sects, UINT_MAX >> 9);
> +		/* Issue in chunks of the user defined max discard setting */
> +		req_sects = min_t(sector_t, nr_sects,
> +					q->limits.max_discard_sectors);
>  

Some drivers, including nvme, do

   blk_queue_max_discard_sectors(q, UINT_MAX)

That means q->limits.max_discard_sectors can be UINT_MAX, and

    bio->bi_iter.bi_size = req_sects << 9;

will overflow. We should probably cap max_discard_sectors at
UINT_MAX >> 9 in a prep patch.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] blk-wbt: account any writing command as a write
  2018-05-07 16:13 ` [PATCH 2/4] blk-wbt: account any writing command as a write Jens Axboe
  2018-05-07 23:58   ` Darrick J. Wong
@ 2018-05-08 20:44   ` Omar Sandoval
  1 sibling, 0 replies; 14+ messages in thread
From: Omar Sandoval @ 2018-05-08 20:44 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-xfs, dchinner, hch

On Mon, May 07, 2018 at 10:13:33AM -0600, Jens Axboe wrote:
> We currently special case WRITE and FLUSH, but we should really
> just include any command with the write bit set. This ensures
> that we account DISCARD.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Omar Sandoval <osandov@fb.com>

> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  block/blk-wbt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/blk-wbt.c b/block/blk-wbt.c
> index f92fc84b5e2c..3e34b41bcefc 100644
> --- a/block/blk-wbt.c
> +++ b/block/blk-wbt.c
> @@ -701,7 +701,7 @@ static int wbt_data_dir(const struct request *rq)
>  
>  	if (op == REQ_OP_READ)
>  		return READ;
> -	else if (op == REQ_OP_WRITE || op == REQ_OP_FLUSH)
> +	else if (op_is_write(op))
>  		return WRITE;
>  
>  	/* don't account */
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] blk-wbt: pass in enum wbt_flags to get_rq_wait()
  2018-05-07 16:13 ` [PATCH 3/4] blk-wbt: pass in enum wbt_flags to get_rq_wait() Jens Axboe
  2018-05-08  0:07   ` Darrick J. Wong
@ 2018-05-08 20:46   ` Omar Sandoval
  1 sibling, 0 replies; 14+ messages in thread
From: Omar Sandoval @ 2018-05-08 20:46 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-xfs, dchinner, hch

On Mon, May 07, 2018 at 10:13:34AM -0600, Jens Axboe wrote:
> This is in preparation for having more write queues, in which
> case we would have needed to pass in more information than just
> a simple 'is_kswapd' boolean.


Reviewed-by: Omar Sandoval <osandov@fb.com>

> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  block/blk-wbt.c | 25 +++++++++++++++----------
>  block/blk-wbt.h |  4 +++-
>  2 files changed, 18 insertions(+), 11 deletions(-)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] blk-wbt: throttle discards like background writes
  2018-05-07 16:13 ` [PATCH 4/4] blk-wbt: throttle discards like background writes Jens Axboe
  2018-05-08  0:13   ` Darrick J. Wong
@ 2018-05-08 20:48   ` Omar Sandoval
  1 sibling, 0 replies; 14+ messages in thread
From: Omar Sandoval @ 2018-05-08 20:48 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-xfs, dchinner, hch

On Mon, May 07, 2018 at 10:13:35AM -0600, Jens Axboe wrote:
> Throttle discards like we would any background write. Discards should
> be background activity, so if they are impacting foreground IO, then
> we will throttle them down.

Seems reasonable.

Reviewed-by: Omar Sandoval <osandov@fb.com>

> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  block/blk-stat.h |  6 +++---
>  block/blk-wbt.c  | 43 ++++++++++++++++++++++++++-----------------
>  block/blk-wbt.h  |  4 +++-
>  3 files changed, 32 insertions(+), 21 deletions(-)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] block: break discard submissions into the user defined size
  2018-05-08 20:43   ` Omar Sandoval
@ 2018-05-08 20:57     ` Jens Axboe
  0 siblings, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2018-05-08 20:57 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-block, linux-xfs, dchinner, hch

On 5/8/18 2:43 PM, Omar Sandoval wrote:
> On Mon, May 07, 2018 at 10:13:32AM -0600, Jens Axboe wrote:
>> Don't build discards bigger than what the user asked for, if the
>> user decided to limit the size by writing to 'discard_max_bytes'.
>>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> ---
>>  block/blk-lib.c | 7 ++++---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/blk-lib.c b/block/blk-lib.c
>> index a676084d4740..7417d617091b 100644
>> --- a/block/blk-lib.c
>> +++ b/block/blk-lib.c
>> @@ -62,10 +62,11 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>>  		unsigned int req_sects;
>>  		sector_t end_sect, tmp;
>>  
>> -		/* Make sure bi_size doesn't overflow */
>> -		req_sects = min_t(sector_t, nr_sects, UINT_MAX >> 9);
>> +		/* Issue in chunks of the user defined max discard setting */
>> +		req_sects = min_t(sector_t, nr_sects,
>> +					q->limits.max_discard_sectors);
>>  
> 
> Some drivers, including nvme, do
> 
>    blk_queue_max_discard_sectors(q, UINT_MAX)
> 
> That means q->limits.max_discard_sectors can be UINT_MAX, and
> 
>     bio->bi_iter.bi_size = req_sects << 9;
> 
> will overflow. We should probably cap max_discard_sectors at
> UINT_MAX >> 9 in a prep patch.

Probably better to just keep the local overflow check, I'll do that.
Thanks for spotting it!

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-05-08 20:57 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-07 16:13 [PATCHSET v2 0/4] Add throttling for discards Jens Axboe
2018-05-07 16:13 ` [PATCH 1/4] block: break discard submissions into the user defined size Jens Axboe
2018-05-07 23:56   ` Darrick J. Wong
2018-05-08 20:43   ` Omar Sandoval
2018-05-08 20:57     ` Jens Axboe
2018-05-07 16:13 ` [PATCH 2/4] blk-wbt: account any writing command as a write Jens Axboe
2018-05-07 23:58   ` Darrick J. Wong
2018-05-08 20:44   ` Omar Sandoval
2018-05-07 16:13 ` [PATCH 3/4] blk-wbt: pass in enum wbt_flags to get_rq_wait() Jens Axboe
2018-05-08  0:07   ` Darrick J. Wong
2018-05-08 20:46   ` Omar Sandoval
2018-05-07 16:13 ` [PATCH 4/4] blk-wbt: throttle discards like background writes Jens Axboe
2018-05-08  0:13   ` Darrick J. Wong
2018-05-08 20:48   ` Omar Sandoval

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.