* [PATCHSET 0/3] Add throttling for discards @ 2018-05-03 15:20 Jens Axboe 2018-05-03 15:20 ` [PATCH 1/3] block: break discard submissions into the user defined size Jens Axboe ` (3 more replies) 0 siblings, 4 replies; 10+ messages in thread From: Jens Axboe @ 2018-05-03 15:20 UTC (permalink / raw) To: linux-block, linux-xfs; +Cc: dchinner, hch I implemented support for discards in blk-wbt, so we treat them like background writes. If we have competing foreground IO, then we may throttle discards. Otherwise they should run at full speed. -- Jens Axboe ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/3] block: break discard submissions into the user defined size 2018-05-03 15:20 [PATCHSET 0/3] Add throttling for discards Jens Axboe @ 2018-05-03 15:20 ` Jens Axboe 2018-05-07 9:51 ` Christoph Hellwig 2018-05-03 15:20 ` [PATCH 2/3] blk-wbt: account any writing command as a write Jens Axboe ` (2 subsequent siblings) 3 siblings, 1 reply; 10+ messages in thread From: Jens Axboe @ 2018-05-03 15:20 UTC (permalink / raw) To: linux-block, linux-xfs; +Cc: dchinner, hch, Jens Axboe Don't build discards bigger than what the user asked for, if the user decided to limit the size by writing to 'discard_max_bytes'. Signed-off-by: Jens Axboe <axboe@kernel.dk> --- block/blk-lib.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/block/blk-lib.c b/block/blk-lib.c index a676084d4740..7417d617091b 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -62,10 +62,11 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector, unsigned int req_sects; sector_t end_sect, tmp; - /* Make sure bi_size doesn't overflow */ - req_sects = min_t(sector_t, nr_sects, UINT_MAX >> 9); + /* Issue in chunks of the user defined max discard setting */ + req_sects = min_t(sector_t, nr_sects, + q->limits.max_discard_sectors); - /** + /* * If splitting a request, and the next starting sector would be * misaligned, stop the discard at the previous aligned sector. */ -- 2.7.4 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] block: break discard submissions into the user defined size 2018-05-03 15:20 ` [PATCH 1/3] block: break discard submissions into the user defined size Jens Axboe @ 2018-05-07 9:51 ` Christoph Hellwig 2018-05-07 15:47 ` Jens Axboe 0 siblings, 1 reply; 10+ messages in thread From: Christoph Hellwig @ 2018-05-07 9:51 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-block, linux-xfs, dchinner, hch On Thu, May 03, 2018 at 09:20:41AM -0600, Jens Axboe wrote: > Don't build discards bigger than what the user asked for, if the > user decided to limit the size by writing to 'discard_max_bytes'. > > Signed-off-by: Jens Axboe <axboe@kernel.dk> Why do this here when blk_bio_discard_split already takes care of it? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] block: break discard submissions into the user defined size 2018-05-07 9:51 ` Christoph Hellwig @ 2018-05-07 15:47 ` Jens Axboe 0 siblings, 0 replies; 10+ messages in thread From: Jens Axboe @ 2018-05-07 15:47 UTC (permalink / raw) To: Christoph Hellwig; +Cc: linux-block, linux-xfs, dchinner On 5/7/18 3:51 AM, Christoph Hellwig wrote: > On Thu, May 03, 2018 at 09:20:41AM -0600, Jens Axboe wrote: >> Don't build discards bigger than what the user asked for, if the >> user decided to limit the size by writing to 'discard_max_bytes'. >> >> Signed-off-by: Jens Axboe <axboe@kernel.dk> > > Why do this here when blk_bio_discard_split already takes care of it? Functionally it should be the same, just seems pointless to defer the chopping if we can build them the right size to begin with. -- Jens Axboe ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/3] blk-wbt: account any writing command as a write 2018-05-03 15:20 [PATCHSET 0/3] Add throttling for discards Jens Axboe 2018-05-03 15:20 ` [PATCH 1/3] block: break discard submissions into the user defined size Jens Axboe @ 2018-05-03 15:20 ` Jens Axboe 2018-05-07 9:53 ` Christoph Hellwig 2018-05-03 15:20 ` [PATCH 3/3] blk-wbt: throttle discards like background writes Jens Axboe 2018-05-03 17:45 ` [PATCHSET 0/3] Add throttling for discards Jens Axboe 3 siblings, 1 reply; 10+ messages in thread From: Jens Axboe @ 2018-05-03 15:20 UTC (permalink / raw) To: linux-block, linux-xfs; +Cc: dchinner, hch, Jens Axboe We currently special case WRITE and FLUSH, but we should really just include any command with the write bit set. This ensures that we account DISCARD. Signed-off-by: Jens Axboe <axboe@kernel.dk> --- block/blk-wbt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/blk-wbt.c b/block/blk-wbt.c index f92fc84b5e2c..3e34b41bcefc 100644 --- a/block/blk-wbt.c +++ b/block/blk-wbt.c @@ -701,7 +701,7 @@ static int wbt_data_dir(const struct request *rq) if (op == REQ_OP_READ) return READ; - else if (op == REQ_OP_WRITE || op == REQ_OP_FLUSH) + else if (op_is_write(op)) return WRITE; /* don't account */ -- 2.7.4 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 2/3] blk-wbt: account any writing command as a write 2018-05-03 15:20 ` [PATCH 2/3] blk-wbt: account any writing command as a write Jens Axboe @ 2018-05-07 9:53 ` Christoph Hellwig 0 siblings, 0 replies; 10+ messages in thread From: Christoph Hellwig @ 2018-05-07 9:53 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-block, linux-xfs, dchinner, hch On Thu, May 03, 2018 at 09:20:42AM -0600, Jens Axboe wrote: > We currently special case WRITE and FLUSH, but we should really > just include any command with the write bit set. This ensures > that we account DISCARD. Looks fine, Reviewed-by: Christoph Hellwig <hch@lst.de> > diff --git a/block/blk-wbt.c b/block/blk-wbt.c > index f92fc84b5e2c..3e34b41bcefc 100644 > --- a/block/blk-wbt.c > +++ b/block/blk-wbt.c > @@ -701,7 +701,7 @@ static int wbt_data_dir(const struct request *rq) > > if (op == REQ_OP_READ) > return READ; > - else if (op == REQ_OP_WRITE || op == REQ_OP_FLUSH) > + else if (op_is_write(op)) > return WRITE; > > /* don't account */ But this whole function smells, the return value is an array index, so instead of abusing read/write this should return 0 or 1 and be called _idx or similar. But I guess that is a question for the higher level interface. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 3/3] blk-wbt: throttle discards like background writes 2018-05-03 15:20 [PATCHSET 0/3] Add throttling for discards Jens Axboe 2018-05-03 15:20 ` [PATCH 1/3] block: break discard submissions into the user defined size Jens Axboe 2018-05-03 15:20 ` [PATCH 2/3] blk-wbt: account any writing command as a write Jens Axboe @ 2018-05-03 15:20 ` Jens Axboe 2018-05-07 9:57 ` Christoph Hellwig 2018-05-03 17:45 ` [PATCHSET 0/3] Add throttling for discards Jens Axboe 3 siblings, 1 reply; 10+ messages in thread From: Jens Axboe @ 2018-05-03 15:20 UTC (permalink / raw) To: linux-block, linux-xfs; +Cc: dchinner, hch, Jens Axboe Throttle discards like we would any background write. Discards should be background activity, so if they are impacting foreground IO, then we will throttle them down. Signed-off-by: Jens Axboe <axboe@kernel.dk> --- block/blk-stat.h | 6 +++--- block/blk-wbt.c | 52 ++++++++++++++++++++++++++++++++++------------------ block/blk-wbt.h | 9 +++++++-- 3 files changed, 44 insertions(+), 23 deletions(-) diff --git a/block/blk-stat.h b/block/blk-stat.h index 2dd36347252a..c22049a8125e 100644 --- a/block/blk-stat.h +++ b/block/blk-stat.h @@ -10,11 +10,11 @@ /* * from upper: - * 3 bits: reserved for other usage + * 4 bits: reserved for other usage * 12 bits: size - * 49 bits: time + * 48 bits: time */ -#define BLK_STAT_RES_BITS 3 +#define BLK_STAT_RES_BITS 4 #define BLK_STAT_SIZE_BITS 12 #define BLK_STAT_RES_SHIFT (64 - BLK_STAT_RES_BITS) #define BLK_STAT_SIZE_SHIFT (BLK_STAT_RES_SHIFT - BLK_STAT_SIZE_BITS) diff --git a/block/blk-wbt.c b/block/blk-wbt.c index 3e34b41bcefc..4d222100746c 100644 --- a/block/blk-wbt.c +++ b/block/blk-wbt.c @@ -101,9 +101,15 @@ static bool wb_recent_wait(struct rq_wb *rwb) return time_before(jiffies, wb->dirty_sleep + HZ); } -static inline struct rq_wait *get_rq_wait(struct rq_wb *rwb, bool is_kswapd) +static inline struct rq_wait *get_rq_wait(struct rq_wb *rwb, bool is_trim, + bool is_kswapd) { - return &rwb->rq_wait[is_kswapd]; + if (is_trim) + return &rwb->rq_wait[WBT_REQ_DISCARD]; + else if (is_kswapd) + return &rwb->rq_wait[WBT_REQ_KSWAPD]; + else + return &rwb->rq_wait[WBT_REQ_BG]; } static void rwb_wake_all(struct rq_wb *rwb) @@ -120,13 +126,14 @@ static void rwb_wake_all(struct rq_wb *rwb) void __wbt_done(struct rq_wb *rwb, enum wbt_flags wb_acct) { + const bool is_trim = wb_acct & WBT_DISCARD; struct rq_wait *rqw; int inflight, limit; if (!(wb_acct & WBT_TRACKED)) return; - rqw = get_rq_wait(rwb, wb_acct & WBT_KSWAPD); + rqw = get_rq_wait(rwb, is_trim, wb_acct & WBT_KSWAPD); inflight = atomic_dec_return(&rqw->inflight); /* @@ -139,10 +146,13 @@ void __wbt_done(struct rq_wb *rwb, enum wbt_flags wb_acct) } /* - * If the device does write back caching, drop further down - * before we wake people up. + * For discards, our limit is always the background. For writes, if + * the device does write back caching, drop further down before we + * wake people up. */ - if (rwb->wc && !wb_recent_wait(rwb)) + if (is_trim) + limit = rwb->wb_background; + else if (rwb->wc && !wb_recent_wait(rwb)) limit = 0; else limit = rwb->wb_normal; @@ -479,6 +489,9 @@ static inline unsigned int get_limit(struct rq_wb *rwb, unsigned long rw) { unsigned int limit; + if ((rw & REQ_OP_MASK) == REQ_OP_DISCARD) + return rwb->wb_background; + /* * At this point we know it's a buffered write. If this is * kswapd trying to free memory, or REQ_SYNC is set, then @@ -533,7 +546,8 @@ static void __wbt_wait(struct rq_wb *rwb, unsigned long rw, spinlock_t *lock) __releases(lock) __acquires(lock) { - struct rq_wait *rqw = get_rq_wait(rwb, current_is_kswapd()); + const bool is_trim = (rw & REQ_OP_MASK) == REQ_OP_DISCARD; + struct rq_wait *rqw = get_rq_wait(rwb, is_trim, current_is_kswapd()); DEFINE_WAIT(wait); if (may_queue(rwb, rqw, &wait, rw)) @@ -561,19 +575,19 @@ static inline bool wbt_should_throttle(struct rq_wb *rwb, struct bio *bio) { const int op = bio_op(bio); - /* - * If not a WRITE, do nothing - */ - if (op != REQ_OP_WRITE) - return false; + if (op == REQ_OP_WRITE) { + /* + * Don't throttle WRITE_ODIRECT + */ + if ((bio->bi_opf & (REQ_SYNC | REQ_IDLE)) == + (REQ_SYNC | REQ_IDLE)) + return false; - /* - * Don't throttle WRITE_ODIRECT - */ - if ((bio->bi_opf & (REQ_SYNC | REQ_IDLE)) == (REQ_SYNC | REQ_IDLE)) - return false; + return true; + } else if (op == REQ_OP_DISCARD) + return true; - return true; + return false; } /* @@ -605,6 +619,8 @@ enum wbt_flags wbt_wait(struct rq_wb *rwb, struct bio *bio, spinlock_t *lock) if (current_is_kswapd()) ret |= WBT_KSWAPD; + if (bio_op(bio) == REQ_OP_DISCARD) + ret |= WBT_DISCARD; return ret | WBT_TRACKED; } diff --git a/block/blk-wbt.h b/block/blk-wbt.h index a232c98fbf4d..af451876ea31 100644 --- a/block/blk-wbt.h +++ b/block/blk-wbt.h @@ -14,12 +14,17 @@ enum wbt_flags { WBT_TRACKED = 1, /* write, tracked for throttling */ WBT_READ = 2, /* read */ WBT_KSWAPD = 4, /* write, from kswapd */ + WBT_DISCARD = 8, - WBT_NR_BITS = 3, /* number of bits */ + WBT_NR_BITS = 4, /* number of bits */ }; enum { - WBT_NUM_RWQ = 2, + WBT_REQ_BG = 0, + WBT_REQ_KSWAPD, + WBT_REQ_DISCARD, + + WBT_NUM_RWQ, }; /* -- 2.7.4 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 3/3] blk-wbt: throttle discards like background writes 2018-05-03 15:20 ` [PATCH 3/3] blk-wbt: throttle discards like background writes Jens Axboe @ 2018-05-07 9:57 ` Christoph Hellwig 2018-05-07 15:51 ` Jens Axboe 0 siblings, 1 reply; 10+ messages in thread From: Christoph Hellwig @ 2018-05-07 9:57 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-block, linux-xfs, dchinner, hch > -static inline struct rq_wait *get_rq_wait(struct rq_wb *rwb, bool is_kswapd) > +static inline struct rq_wait *get_rq_wait(struct rq_wb *rwb, bool is_trim, > + bool is_kswapd) > { > - return &rwb->rq_wait[is_kswapd]; > + if (is_trim) > + return &rwb->rq_wait[WBT_REQ_DISCARD]; > + else if (is_kswapd) > + return &rwb->rq_wait[WBT_REQ_KSWAPD]; > + else > + return &rwb->rq_wait[WBT_REQ_BG]; > } Wouldn't it be more useful to pass a enum wbt_flag here? Or just have a wbt_flag_to_wait_idx helper and do the array indexing in the callers? > { > const int op = bio_op(bio); > > - /* > - * If not a WRITE, do nothing > - */ > - if (op != REQ_OP_WRITE) > - return false; > + if (op == REQ_OP_WRITE) { > + /* > + * Don't throttle WRITE_ODIRECT > + */ > + if ((bio->bi_opf & (REQ_SYNC | REQ_IDLE)) == > + (REQ_SYNC | REQ_IDLE)) > + return false; > > - /* > - * Don't throttle WRITE_ODIRECT > - */ > - if ((bio->bi_opf & (REQ_SYNC | REQ_IDLE)) == (REQ_SYNC | REQ_IDLE)) > - return false; > + return true; > + } else if (op == REQ_OP_DISCARD) > + return true; what about: switch (bio_op(bio)) { case REQ_OP_WRITE: /* * Don't throttle WRITE_ODIRECT */ if ((bio->bi_opf & (REQ_SYNC | REQ_IDLE)) == (REQ_SYNC | REQ_IDLE)) return false; /*FALLTHROUGH*/ case REQ_OP_DISCARD: return true; default: return false; ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 3/3] blk-wbt: throttle discards like background writes 2018-05-07 9:57 ` Christoph Hellwig @ 2018-05-07 15:51 ` Jens Axboe 0 siblings, 0 replies; 10+ messages in thread From: Jens Axboe @ 2018-05-07 15:51 UTC (permalink / raw) To: Christoph Hellwig; +Cc: linux-block, linux-xfs, dchinner On 5/7/18 3:57 AM, Christoph Hellwig wrote: >> -static inline struct rq_wait *get_rq_wait(struct rq_wb *rwb, bool is_kswapd) >> +static inline struct rq_wait *get_rq_wait(struct rq_wb *rwb, bool is_trim, >> + bool is_kswapd) >> { >> - return &rwb->rq_wait[is_kswapd]; >> + if (is_trim) >> + return &rwb->rq_wait[WBT_REQ_DISCARD]; >> + else if (is_kswapd) >> + return &rwb->rq_wait[WBT_REQ_KSWAPD]; >> + else >> + return &rwb->rq_wait[WBT_REQ_BG]; >> } > > Wouldn't it be more useful to pass a enum wbt_flag here? > > Or just have a wbt_flag_to_wait_idx helper and do the array indexing > in the callers? It would be cleaner, but we don't have wbt_flag everywhere we need it. Though I guess we could swap the masking in wbt_wait() and do it before the __wbt_wait() call, and just use that. Since we only do the indexing in that one spot, I don't think we should add a helper. > >> { >> const int op = bio_op(bio); >> >> - /* >> - * If not a WRITE, do nothing >> - */ >> - if (op != REQ_OP_WRITE) >> - return false; >> + if (op == REQ_OP_WRITE) { >> + /* >> + * Don't throttle WRITE_ODIRECT >> + */ >> + if ((bio->bi_opf & (REQ_SYNC | REQ_IDLE)) == >> + (REQ_SYNC | REQ_IDLE)) >> + return false; >> >> - /* >> - * Don't throttle WRITE_ODIRECT >> - */ >> - if ((bio->bi_opf & (REQ_SYNC | REQ_IDLE)) == (REQ_SYNC | REQ_IDLE)) >> - return false; >> + return true; >> + } else if (op == REQ_OP_DISCARD) >> + return true; > > what about: > > switch (bio_op(bio)) { > case REQ_OP_WRITE: > /* > * Don't throttle WRITE_ODIRECT > */ > if ((bio->bi_opf & (REQ_SYNC | REQ_IDLE)) == > (REQ_SYNC | REQ_IDLE)) > return false; > /*FALLTHROUGH*/ > case REQ_OP_DISCARD: > return true; > default: > return false; Sure, I can do that. I'll spin a v2. -- Jens Axboe ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCHSET 0/3] Add throttling for discards 2018-05-03 15:20 [PATCHSET 0/3] Add throttling for discards Jens Axboe ` (2 preceding siblings ...) 2018-05-03 15:20 ` [PATCH 3/3] blk-wbt: throttle discards like background writes Jens Axboe @ 2018-05-03 17:45 ` Jens Axboe 3 siblings, 0 replies; 10+ messages in thread From: Jens Axboe @ 2018-05-03 17:45 UTC (permalink / raw) To: linux-block, linux-xfs; +Cc: dchinner, hch [-- Attachment #1: Type: text/plain, Size: 1573 bytes --] On 5/3/18 9:20 AM, Jens Axboe wrote: > I implemented support for discards in blk-wbt, so we treat them like > background writes. If we have competing foreground IO, then we may > throttle discards. Otherwise they should run at full speed. Ran a quick local test case on a NVMe device. The test case here is doing reads from a file, while other files are being delete. File system is XFS. Three separate runs: 1) 'justreader' is just running the reader workload by itself. 2) 'nothrottle' is running the reader with the unlinks on the stock kernel. 3) 'throttled' is running the reader with the unlinks with the patches applied. The fio run logs latencies in windows of 50msec, logging only the worst latency seen in that window: [read] filename=file0 rw=randread bs=4k direct=1 ioengine=libaio iodepth=8 write_lat_log=read_lat log_avg_msec=50 log_max_value=1 As you can see, with the throttling, the latencies and performance is the same as not having the unlinks running. They finish within 100msec of each other, after having read 10G of data. Without the throttling and unlinks running, we see spikes that are almost an order of magnitude worse. The reader also takes 1 second longer to complete. 322MB/sec vs 313MB/sec. It's also visible in the higher Pxx. Here's the throttled end of the percentiles: | 99.00th=[ 145], 99.50th=[ 153], 99.90th=[ 180], 99.95th=[ 198], | 99.99th=[ 889] and here's the unthrottled equivalent: | 99.00th=[ 145], 99.50th=[ 155], 99.90th=[ 188], 99.95th=[ 322], | 99.99th=[ 8586] -- Jens Axboe [-- Attachment #2: Reader latencies-lat.jpg --] [-- Type: image/jpeg, Size: 100302 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2018-05-07 15:51 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-05-03 15:20 [PATCHSET 0/3] Add throttling for discards Jens Axboe 2018-05-03 15:20 ` [PATCH 1/3] block: break discard submissions into the user defined size Jens Axboe 2018-05-07 9:51 ` Christoph Hellwig 2018-05-07 15:47 ` Jens Axboe 2018-05-03 15:20 ` [PATCH 2/3] blk-wbt: account any writing command as a write Jens Axboe 2018-05-07 9:53 ` Christoph Hellwig 2018-05-03 15:20 ` [PATCH 3/3] blk-wbt: throttle discards like background writes Jens Axboe 2018-05-07 9:57 ` Christoph Hellwig 2018-05-07 15:51 ` Jens Axboe 2018-05-03 17:45 ` [PATCHSET 0/3] Add throttling for discards Jens Axboe
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.