All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET v2 0/6] block: more scalable inflight tracking
@ 2017-08-04 15:04 Jens Axboe
  2017-08-04 15:04 ` [PATCH 1/6] blk-mq-tag: check for NULL rq when iterating tags Jens Axboe
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Jens Axboe @ 2017-08-04 15:04 UTC (permalink / raw)
  To: linux-block; +Cc: brking, bart.vanassche

One scaling issue we currently have in the block code is the
inflight accounting. It's based on a per-device atomic count
for reads and writes, which means that even for an mq device
with lots of hardware queues, we end up dirtying a per-device
cacheline for each IO. The issue can easily be observed by
using null_blk:

modprobe null_blk submit_queues=48 queue_mode=2

and running a fio job that has 32 jobs doing sync reads on
the device (average of 3 runs, though deviation is low):

stats		IOPS		usr		sys
------------------------------------------------------
on		 2.6M		 5.4%		94.6%
off		21.0M		33.7%		67.3%

which shows a 10x slowdown with stats enabled. If we look at
the profile for stats on, the top entries are:

37.38%  fio      [kernel.vmlinux]    [k] blk_account_io_done
14.65%  fio      [kernel.vmlinux]    [k] blk_account_io_start
14.29%  fio      [kernel.vmlinux]    [k] part_round_stats_single
11.81%  fio      [kernel.vmlinux]    [k] blk_account_io_completion
 3.62%  fio      [kernel.vmlinux]    [k] part_round_stats
 0.81%  fio      [kernel.vmlinux]    [k] __blkdev_direct_IO_simple

which shows the system time being dominated by the stats accounting.

This patch series replaces the atomic counter with using the tags
hamming weight for tracking infligh counts. This means we don't have
to do anything when IO starts or completes, and for reading the value
we just have to check how many bits we have set in the tag maps on
the queues. This part is limited to 1000 per second (for HZ=1000).
Using this approach, running the same test again results in:

stats		IOPS		usr		sys
------------------------------------------------------
on		20.4M		30.9%		69.0%
off		21.4M		32.4%		67.4%

and doing a profiled run with stats on, the top of stats reporting is
now:

1.23%  fio  [kernel.vmlinux]  [k] blk_account_io_done
0.83%  fio  [kernel.vmlinux]  [k] blk_account_io_start
0.55%  fio  [kernel.vmlinux]  [k] blk_account_io_completion

which is a lot more reasonable. The difference between stats on and
off is now also neglible. Brian also ran a bunch of numbers on
various implementations of this, showing a big win on a real device
on PPC.

You can also download the code here:


git://git.kernel.dk/linux-block mq-inflight.2


Changes since v1:

- Reword commit message for patch #1, to indicate that the race is
  related to when a tag is allocated, not freed.
- Remove redundant init of inflight to 0 in intermediate patch. Didn't
  matter for the end result, since it went away in the last patch.
- Kill part_in_flight_double(). Add a prep patch that just changes
  part_in_flight() to take an array for accounting instead.
- Fix root vs part accounting.
- Since it's related to tag iteration, add a patch to remove the
  sbitmap tags iteration return value. It's unused.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/6] blk-mq-tag: check for NULL rq when iterating tags
  2017-08-04 15:04 [PATCHSET v2 0/6] block: more scalable inflight tracking Jens Axboe
@ 2017-08-04 15:04 ` Jens Axboe
  2017-08-04 19:34   ` Bart Van Assche
  2017-08-08 22:28   ` Omar Sandoval
  2017-08-04 15:04 ` [PATCH 2/6] block: pass in queue to inflight accounting Jens Axboe
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 27+ messages in thread
From: Jens Axboe @ 2017-08-04 15:04 UTC (permalink / raw)
  To: linux-block; +Cc: brking, bart.vanassche, Jens Axboe

Since we introduced blk-mq-sched, the tags->rqs[] array has been
dynamically assigned. So we need to check for NULL when iterating,
since there's a window of time where the bit is set, but we haven't
dynamically assigned the tags->rqs[] array position yet.

This is perfectly safe, since the memory backing of the request is
never going away while the device is alive.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq-tag.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index d0be72ccb091..b856b2827157 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -214,7 +214,7 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 		bitnr += tags->nr_reserved_tags;
 	rq = tags->rqs[bitnr];
 
-	if (rq->q == hctx->queue)
+	if (rq && rq->q == hctx->queue)
 		iter_data->fn(hctx, rq, iter_data->data, reserved);
 	return true;
 }
@@ -249,8 +249,8 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	if (!reserved)
 		bitnr += tags->nr_reserved_tags;
 	rq = tags->rqs[bitnr];
-
-	iter_data->fn(rq, iter_data->data, reserved);
+	if (rq)
+		iter_data->fn(rq, iter_data->data, reserved);
 	return true;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/6] block: pass in queue to inflight accounting
  2017-08-04 15:04 [PATCHSET v2 0/6] block: more scalable inflight tracking Jens Axboe
  2017-08-04 15:04 ` [PATCH 1/6] blk-mq-tag: check for NULL rq when iterating tags Jens Axboe
@ 2017-08-04 15:04 ` Jens Axboe
  2017-08-08 22:30   ` Omar Sandoval
  2017-08-04 15:04 ` [PATCH 3/6] block: make part_in_flight() take an array of two ints Jens Axboe
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Jens Axboe @ 2017-08-04 15:04 UTC (permalink / raw)
  To: linux-block; +Cc: brking, bart.vanassche, Jens Axboe

No functional change in this patch, just in preparation for
basing the inflight mechanism on the queue in question.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
---
 block/bio.c                   | 16 ++++++++--------
 block/blk-core.c              | 22 ++++++++++++----------
 block/blk-merge.c             |  4 ++--
 block/genhd.c                 |  4 ++--
 block/partition-generic.c     |  5 +++--
 drivers/block/drbd/drbd_req.c | 10 +++++++---
 drivers/block/rsxx/dev.c      |  6 +++---
 drivers/block/zram/zram_drv.c |  5 +++--
 drivers/md/bcache/request.c   |  7 ++++---
 drivers/md/dm.c               |  6 +++---
 drivers/nvdimm/nd.h           |  5 +++--
 include/linux/bio.h           |  9 +++++----
 include/linux/genhd.h         | 14 +++++++++-----
 13 files changed, 64 insertions(+), 49 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index e241bbc49f14..ecd1a9c7a301 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1736,29 +1736,29 @@ void bio_check_pages_dirty(struct bio *bio)
 	}
 }
 
-void generic_start_io_acct(int rw, unsigned long sectors,
-			   struct hd_struct *part)
+void generic_start_io_acct(struct request_queue *q, int rw,
+			   unsigned long sectors, struct hd_struct *part)
 {
 	int cpu = part_stat_lock();
 
-	part_round_stats(cpu, part);
+	part_round_stats(q, cpu, part);
 	part_stat_inc(cpu, part, ios[rw]);
 	part_stat_add(cpu, part, sectors[rw], sectors);
-	part_inc_in_flight(part, rw);
+	part_inc_in_flight(q, part, rw);
 
 	part_stat_unlock();
 }
 EXPORT_SYMBOL(generic_start_io_acct);
 
-void generic_end_io_acct(int rw, struct hd_struct *part,
-			 unsigned long start_time)
+void generic_end_io_acct(struct request_queue *q, int rw,
+			 struct hd_struct *part, unsigned long start_time)
 {
 	unsigned long duration = jiffies - start_time;
 	int cpu = part_stat_lock();
 
 	part_stat_add(cpu, part, ticks[rw], duration);
-	part_round_stats(cpu, part);
-	part_dec_in_flight(part, rw);
+	part_round_stats(q, cpu, part);
+	part_dec_in_flight(q, part, rw);
 
 	part_stat_unlock();
 }
diff --git a/block/blk-core.c b/block/blk-core.c
index dbecbf4a64e0..8ee954c12e9d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1469,15 +1469,15 @@ static void add_acct_request(struct request_queue *q, struct request *rq,
 	__elv_add_request(q, rq, where);
 }
 
-static void part_round_stats_single(int cpu, struct hd_struct *part,
-				    unsigned long now)
+static void part_round_stats_single(struct request_queue *q, int cpu,
+				    struct hd_struct *part, unsigned long now)
 {
 	int inflight;
 
 	if (now == part->stamp)
 		return;
 
-	inflight = part_in_flight(part);
+	inflight = part_in_flight(q, part);
 	if (inflight) {
 		__part_stat_add(cpu, part, time_in_queue,
 				inflight * (now - part->stamp));
@@ -1488,6 +1488,7 @@ static void part_round_stats_single(int cpu, struct hd_struct *part,
 
 /**
  * part_round_stats() - Round off the performance stats on a struct disk_stats.
+ * @q: target block queue
  * @cpu: cpu number for stats access
  * @part: target partition
  *
@@ -1502,13 +1503,14 @@ static void part_round_stats_single(int cpu, struct hd_struct *part,
  * /proc/diskstats.  This accounts immediately for all queue usage up to
  * the current jiffies and restarts the counters again.
  */
-void part_round_stats(int cpu, struct hd_struct *part)
+void part_round_stats(struct request_queue *q, int cpu, struct hd_struct *part)
 {
 	unsigned long now = jiffies;
 
 	if (part->partno)
-		part_round_stats_single(cpu, &part_to_disk(part)->part0, now);
-	part_round_stats_single(cpu, part, now);
+		part_round_stats_single(q, cpu, &part_to_disk(part)->part0,
+						now);
+	part_round_stats_single(q, cpu, part, now);
 }
 EXPORT_SYMBOL_GPL(part_round_stats);
 
@@ -2431,8 +2433,8 @@ void blk_account_io_done(struct request *req)
 
 		part_stat_inc(cpu, part, ios[rw]);
 		part_stat_add(cpu, part, ticks[rw], duration);
-		part_round_stats(cpu, part);
-		part_dec_in_flight(part, rw);
+		part_round_stats(req->q, cpu, part);
+		part_dec_in_flight(req->q, part, rw);
 
 		hd_struct_put(part);
 		part_stat_unlock();
@@ -2489,8 +2491,8 @@ void blk_account_io_start(struct request *rq, bool new_io)
 			part = &rq->rq_disk->part0;
 			hd_struct_get(part);
 		}
-		part_round_stats(cpu, part);
-		part_inc_in_flight(part, rw);
+		part_round_stats(rq->q, cpu, part);
+		part_inc_in_flight(rq->q, part, rw);
 		rq->part = part;
 	}
 
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 99038830fb42..05f116bfb99d 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -633,8 +633,8 @@ static void blk_account_io_merge(struct request *req)
 		cpu = part_stat_lock();
 		part = req->part;
 
-		part_round_stats(cpu, part);
-		part_dec_in_flight(part, rq_data_dir(req));
+		part_round_stats(req->q, cpu, part);
+		part_dec_in_flight(req->q, part, rq_data_dir(req));
 
 		hd_struct_put(part);
 		part_stat_unlock();
diff --git a/block/genhd.c b/block/genhd.c
index 7f520fa25d16..f735af67a0c9 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1217,7 +1217,7 @@ static int diskstats_show(struct seq_file *seqf, void *v)
 	disk_part_iter_init(&piter, gp, DISK_PITER_INCL_EMPTY_PART0);
 	while ((hd = disk_part_iter_next(&piter))) {
 		cpu = part_stat_lock();
-		part_round_stats(cpu, hd);
+		part_round_stats(gp->queue, cpu, hd);
 		part_stat_unlock();
 		seq_printf(seqf, "%4d %7d %s %lu %lu %lu "
 			   "%u %lu %lu %lu %u %u %u %u\n",
@@ -1231,7 +1231,7 @@ static int diskstats_show(struct seq_file *seqf, void *v)
 			   part_stat_read(hd, merges[WRITE]),
 			   part_stat_read(hd, sectors[WRITE]),
 			   jiffies_to_msecs(part_stat_read(hd, ticks[WRITE])),
-			   part_in_flight(hd),
+			   part_in_flight(gp->queue, hd),
 			   jiffies_to_msecs(part_stat_read(hd, io_ticks)),
 			   jiffies_to_msecs(part_stat_read(hd, time_in_queue))
 			);
diff --git a/block/partition-generic.c b/block/partition-generic.c
index c5ec8246e25e..d1bdd61e1d71 100644
--- a/block/partition-generic.c
+++ b/block/partition-generic.c
@@ -112,10 +112,11 @@ ssize_t part_stat_show(struct device *dev,
 		       struct device_attribute *attr, char *buf)
 {
 	struct hd_struct *p = dev_to_part(dev);
+	struct request_queue *q = dev_to_disk(dev)->queue;
 	int cpu;
 
 	cpu = part_stat_lock();
-	part_round_stats(cpu, p);
+	part_round_stats(q, cpu, p);
 	part_stat_unlock();
 	return sprintf(buf,
 		"%8lu %8lu %8llu %8u "
@@ -130,7 +131,7 @@ ssize_t part_stat_show(struct device *dev,
 		part_stat_read(p, merges[WRITE]),
 		(unsigned long long)part_stat_read(p, sectors[WRITE]),
 		jiffies_to_msecs(part_stat_read(p, ticks[WRITE])),
-		part_in_flight(p),
+		part_in_flight(q, p),
 		jiffies_to_msecs(part_stat_read(p, io_ticks)),
 		jiffies_to_msecs(part_stat_read(p, time_in_queue)));
 }
diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index f6e865b2d543..8d6b5d137b5e 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -36,14 +36,18 @@ static bool drbd_may_do_local_read(struct drbd_device *device, sector_t sector,
 /* Update disk stats at start of I/O request */
 static void _drbd_start_io_acct(struct drbd_device *device, struct drbd_request *req)
 {
-	generic_start_io_acct(bio_data_dir(req->master_bio), req->i.size >> 9,
-			      &device->vdisk->part0);
+	struct request_queue *q = device->rq_queue;
+
+	generic_start_io_acct(q, bio_data_dir(req->master_bio),
+				req->i.size >> 9, &device->vdisk->part0);
 }
 
 /* Update disk stats when completing request upwards */
 static void _drbd_end_io_acct(struct drbd_device *device, struct drbd_request *req)
 {
-	generic_end_io_acct(bio_data_dir(req->master_bio),
+	struct request_queue *q = device->rq_queue;
+
+	generic_end_io_acct(q, bio_data_dir(req->master_bio),
 			    &device->vdisk->part0, req->start_jif);
 }
 
diff --git a/drivers/block/rsxx/dev.c b/drivers/block/rsxx/dev.c
index 7f4acebf4657..e397d3ee7308 100644
--- a/drivers/block/rsxx/dev.c
+++ b/drivers/block/rsxx/dev.c
@@ -112,7 +112,7 @@ static const struct block_device_operations rsxx_fops = {
 
 static void disk_stats_start(struct rsxx_cardinfo *card, struct bio *bio)
 {
-	generic_start_io_acct(bio_data_dir(bio), bio_sectors(bio),
+	generic_start_io_acct(card->queue, bio_data_dir(bio), bio_sectors(bio),
 			     &card->gendisk->part0);
 }
 
@@ -120,8 +120,8 @@ static void disk_stats_complete(struct rsxx_cardinfo *card,
 				struct bio *bio,
 				unsigned long start_time)
 {
-	generic_end_io_acct(bio_data_dir(bio), &card->gendisk->part0,
-			   start_time);
+	generic_end_io_acct(card->queue, bio_data_dir(bio),
+				&card->gendisk->part0, start_time);
 }
 
 static void bio_dma_done_cb(struct rsxx_cardinfo *card,
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 856d5dc02451..1c3383b4a0cf 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -813,9 +813,10 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
 {
 	unsigned long start_time = jiffies;
 	int rw_acct = is_write ? REQ_OP_WRITE : REQ_OP_READ;
+	struct request_queue *q = zram->disk->queue;
 	int ret;
 
-	generic_start_io_acct(rw_acct, bvec->bv_len >> SECTOR_SHIFT,
+	generic_start_io_acct(q, rw_acct, bvec->bv_len >> SECTOR_SHIFT,
 			&zram->disk->part0);
 
 	if (!is_write) {
@@ -827,7 +828,7 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
 		ret = zram_bvec_write(zram, bvec, index, offset);
 	}
 
-	generic_end_io_acct(rw_acct, &zram->disk->part0, start_time);
+	generic_end_io_acct(q, rw_acct, &zram->disk->part0, start_time);
 
 	if (unlikely(ret)) {
 		if (!is_write)
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 019b3df9f1c6..72eb97176403 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -607,7 +607,8 @@ static void request_endio(struct bio *bio)
 static void bio_complete(struct search *s)
 {
 	if (s->orig_bio) {
-		generic_end_io_acct(bio_data_dir(s->orig_bio),
+		struct request_queue *q = bdev_get_queue(s->orig_bio->bi_bdev);
+		generic_end_io_acct(q, bio_data_dir(s->orig_bio),
 				    &s->d->disk->part0, s->start_time);
 
 		trace_bcache_request_end(s->d, s->orig_bio);
@@ -959,7 +960,7 @@ static blk_qc_t cached_dev_make_request(struct request_queue *q,
 	struct cached_dev *dc = container_of(d, struct cached_dev, disk);
 	int rw = bio_data_dir(bio);
 
-	generic_start_io_acct(rw, bio_sectors(bio), &d->disk->part0);
+	generic_start_io_acct(q, rw, bio_sectors(bio), &d->disk->part0);
 
 	bio->bi_bdev = dc->bdev;
 	bio->bi_iter.bi_sector += dc->sb.data_offset;
@@ -1074,7 +1075,7 @@ static blk_qc_t flash_dev_make_request(struct request_queue *q,
 	struct bcache_device *d = bio->bi_bdev->bd_disk->private_data;
 	int rw = bio_data_dir(bio);
 
-	generic_start_io_acct(rw, bio_sectors(bio), &d->disk->part0);
+	generic_start_io_acct(q, rw, bio_sectors(bio), &d->disk->part0);
 
 	s = search_alloc(bio, d);
 	cl = &s->cl;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 2edbcc2d7d3f..8612a2d1ccd9 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -520,7 +520,7 @@ static void start_io_acct(struct dm_io *io)
 	io->start_time = jiffies;
 
 	cpu = part_stat_lock();
-	part_round_stats(cpu, &dm_disk(md)->part0);
+	part_round_stats(md->queue, cpu, &dm_disk(md)->part0);
 	part_stat_unlock();
 	atomic_set(&dm_disk(md)->part0.in_flight[rw],
 		atomic_inc_return(&md->pending[rw]));
@@ -539,7 +539,7 @@ static void end_io_acct(struct dm_io *io)
 	int pending;
 	int rw = bio_data_dir(bio);
 
-	generic_end_io_acct(rw, &dm_disk(md)->part0, io->start_time);
+	generic_end_io_acct(md->queue, rw, &dm_disk(md)->part0, io->start_time);
 
 	if (unlikely(dm_stats_used(&md->stats)))
 		dm_stats_account_io(&md->stats, bio_data_dir(bio),
@@ -1542,7 +1542,7 @@ static blk_qc_t dm_make_request(struct request_queue *q, struct bio *bio)
 
 	map = dm_get_live_table(md, &srcu_idx);
 
-	generic_start_io_acct(rw, bio_sectors(bio), &dm_disk(md)->part0);
+	generic_start_io_acct(q, rw, bio_sectors(bio), &dm_disk(md)->part0);
 
 	/* if we're suspended, we have to queue this io for later */
 	if (unlikely(test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags))) {
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index e1b5715bd91f..73062da3177f 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -396,7 +396,7 @@ static inline bool nd_iostat_start(struct bio *bio, unsigned long *start)
 		return false;
 
 	*start = jiffies;
-	generic_start_io_acct(bio_data_dir(bio),
+	generic_start_io_acct(disk->queue, bio_data_dir(bio),
 			      bio_sectors(bio), &disk->part0);
 	return true;
 }
@@ -404,7 +404,8 @@ static inline void nd_iostat_end(struct bio *bio, unsigned long start)
 {
 	struct gendisk *disk = bio->bi_bdev->bd_disk;
 
-	generic_end_io_acct(bio_data_dir(bio), &disk->part0, start);
+	generic_end_io_acct(disk->queue, bio_data_dir(bio), &disk->part0,
+				start);
 }
 static inline bool is_bad_pmem(struct badblocks *bb, sector_t sector,
 		unsigned int len)
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 7b1cf4ba0902..9276788a9b24 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -463,10 +463,11 @@ extern struct bio *bio_copy_kern(struct request_queue *, void *, unsigned int,
 extern void bio_set_pages_dirty(struct bio *bio);
 extern void bio_check_pages_dirty(struct bio *bio);
 
-void generic_start_io_acct(int rw, unsigned long sectors,
-			   struct hd_struct *part);
-void generic_end_io_acct(int rw, struct hd_struct *part,
-			 unsigned long start_time);
+void generic_start_io_acct(struct request_queue *q, int rw,
+				unsigned long sectors, struct hd_struct *part);
+void generic_end_io_acct(struct request_queue *q, int rw,
+				struct hd_struct *part,
+				unsigned long start_time);
 
 #ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
 # error	"You should define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE for your platform"
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index e619fae2f037..7f7427e00f9c 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -362,23 +362,27 @@ static inline void free_part_stats(struct hd_struct *part)
 #define part_stat_sub(cpu, gendiskp, field, subnd)			\
 	part_stat_add(cpu, gendiskp, field, -subnd)
 
-static inline void part_inc_in_flight(struct hd_struct *part, int rw)
+static inline void part_inc_in_flight(struct request_queue *q,
+				      struct hd_struct *part, int rw)
 {
 	atomic_inc(&part->in_flight[rw]);
 	if (part->partno)
 		atomic_inc(&part_to_disk(part)->part0.in_flight[rw]);
 }
 
-static inline void part_dec_in_flight(struct hd_struct *part, int rw)
+static inline void part_dec_in_flight(struct request_queue *q,
+				      struct hd_struct *part, int rw)
 {
 	atomic_dec(&part->in_flight[rw]);
 	if (part->partno)
 		atomic_dec(&part_to_disk(part)->part0.in_flight[rw]);
 }
 
-static inline int part_in_flight(struct hd_struct *part)
+static inline int part_in_flight(struct request_queue *q,
+				 struct hd_struct *part)
 {
-	return atomic_read(&part->in_flight[0]) + atomic_read(&part->in_flight[1]);
+	return atomic_read(&part->in_flight[0]) +
+			atomic_read(&part->in_flight[1]);
 }
 
 static inline struct partition_meta_info *alloc_part_info(struct gendisk *disk)
@@ -395,7 +399,7 @@ static inline void free_part_info(struct hd_struct *part)
 }
 
 /* block/blk-core.c */
-extern void part_round_stats(int cpu, struct hd_struct *part);
+extern void part_round_stats(struct request_queue *q, int cpu, struct hd_struct *part);
 
 /* block/genhd.c */
 extern void device_add_disk(struct device *parent, struct gendisk *disk);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 3/6] block: make part_in_flight() take an array of two ints
  2017-08-04 15:04 [PATCHSET v2 0/6] block: more scalable inflight tracking Jens Axboe
  2017-08-04 15:04 ` [PATCH 1/6] blk-mq-tag: check for NULL rq when iterating tags Jens Axboe
  2017-08-04 15:04 ` [PATCH 2/6] block: pass in queue to inflight accounting Jens Axboe
@ 2017-08-04 15:04 ` Jens Axboe
  2017-08-04 19:38   ` Bart Van Assche
  2017-08-08 22:42   ` Omar Sandoval
  2017-08-04 15:04 ` [PATCH 4/6] blk-mq: provide internal in-flight variant Jens Axboe
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 27+ messages in thread
From: Jens Axboe @ 2017-08-04 15:04 UTC (permalink / raw)
  To: linux-block; +Cc: brking, bart.vanassche, Jens Axboe

Instead of returning the count that matches the partition, pass
in an array of two ints. Index 0 will be filled with the inflight
count for the partition in question, and index 1 will filled
with the root infligh count, if the partition passed in is not the
root.

This is in preparation for being able to calculate both in one
go.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-core.c          |  8 ++++----
 block/genhd.c             |  4 +++-
 block/partition-generic.c |  4 +++-
 include/linux/genhd.h     | 12 +++++++++---
 4 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 8ee954c12e9d..6ad2b8602c1d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1472,15 +1472,15 @@ static void add_acct_request(struct request_queue *q, struct request *rq,
 static void part_round_stats_single(struct request_queue *q, int cpu,
 				    struct hd_struct *part, unsigned long now)
 {
-	int inflight;
+	int inflight[2];
 
 	if (now == part->stamp)
 		return;
 
-	inflight = part_in_flight(q, part);
-	if (inflight) {
+	part_in_flight(q, part, inflight);
+	if (inflight[0]) {
 		__part_stat_add(cpu, part, time_in_queue,
-				inflight * (now - part->stamp));
+				inflight[0] * (now - part->stamp));
 		__part_stat_add(cpu, part, io_ticks, (now - part->stamp));
 	}
 	part->stamp = now;
diff --git a/block/genhd.c b/block/genhd.c
index f735af67a0c9..822f65f95e2a 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1204,6 +1204,7 @@ static int diskstats_show(struct seq_file *seqf, void *v)
 	struct disk_part_iter piter;
 	struct hd_struct *hd;
 	char buf[BDEVNAME_SIZE];
+	unsigned int inflight[2];
 	int cpu;
 
 	/*
@@ -1219,6 +1220,7 @@ static int diskstats_show(struct seq_file *seqf, void *v)
 		cpu = part_stat_lock();
 		part_round_stats(gp->queue, cpu, hd);
 		part_stat_unlock();
+		part_in_flight(gp->queue, hd, inflight);
 		seq_printf(seqf, "%4d %7d %s %lu %lu %lu "
 			   "%u %lu %lu %lu %u %u %u %u\n",
 			   MAJOR(part_devt(hd)), MINOR(part_devt(hd)),
@@ -1231,7 +1233,7 @@ static int diskstats_show(struct seq_file *seqf, void *v)
 			   part_stat_read(hd, merges[WRITE]),
 			   part_stat_read(hd, sectors[WRITE]),
 			   jiffies_to_msecs(part_stat_read(hd, ticks[WRITE])),
-			   part_in_flight(gp->queue, hd),
+			   inflight[0],
 			   jiffies_to_msecs(part_stat_read(hd, io_ticks)),
 			   jiffies_to_msecs(part_stat_read(hd, time_in_queue))
 			);
diff --git a/block/partition-generic.c b/block/partition-generic.c
index d1bdd61e1d71..fa5049a4d99b 100644
--- a/block/partition-generic.c
+++ b/block/partition-generic.c
@@ -113,11 +113,13 @@ ssize_t part_stat_show(struct device *dev,
 {
 	struct hd_struct *p = dev_to_part(dev);
 	struct request_queue *q = dev_to_disk(dev)->queue;
+	unsigned int inflight[2];
 	int cpu;
 
 	cpu = part_stat_lock();
 	part_round_stats(q, cpu, p);
 	part_stat_unlock();
+	part_in_flight(q, p, inflight);
 	return sprintf(buf,
 		"%8lu %8lu %8llu %8u "
 		"%8lu %8lu %8llu %8u "
@@ -131,7 +133,7 @@ ssize_t part_stat_show(struct device *dev,
 		part_stat_read(p, merges[WRITE]),
 		(unsigned long long)part_stat_read(p, sectors[WRITE]),
 		jiffies_to_msecs(part_stat_read(p, ticks[WRITE])),
-		part_in_flight(q, p),
+		inflight[0],
 		jiffies_to_msecs(part_stat_read(p, io_ticks)),
 		jiffies_to_msecs(part_stat_read(p, time_in_queue)));
 }
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 7f7427e00f9c..a9c8ea632fdc 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -378,11 +378,17 @@ static inline void part_dec_in_flight(struct request_queue *q,
 		atomic_dec(&part_to_disk(part)->part0.in_flight[rw]);
 }
 
-static inline int part_in_flight(struct request_queue *q,
-				 struct hd_struct *part)
+static inline void part_in_flight(struct request_queue *q,
+				  struct hd_struct *part,
+				  unsigned int inflight[2])
 {
-	return atomic_read(&part->in_flight[0]) +
+	inflight[0] = atomic_read(&part->in_flight[0]) +
 			atomic_read(&part->in_flight[1]);
+	if (part->partno) {
+		part = &part_to_disk(part)->part0;
+		inflight[1] = atomic_read(&part->in_flight[0]) +
+				atomic_read(&part->in_flight[1]);
+	}
 }
 
 static inline struct partition_meta_info *alloc_part_info(struct gendisk *disk)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 4/6] blk-mq: provide internal in-flight variant
  2017-08-04 15:04 [PATCHSET v2 0/6] block: more scalable inflight tracking Jens Axboe
                   ` (2 preceding siblings ...)
  2017-08-04 15:04 ` [PATCH 3/6] block: make part_in_flight() take an array of two ints Jens Axboe
@ 2017-08-04 15:04 ` Jens Axboe
  2017-08-04 19:43   ` Bart Van Assche
  2017-08-08 22:46   ` Omar Sandoval
  2017-08-04 15:04 ` [PATCH 5/6] blk-mq: enable checking two part inflight counts at the same time Jens Axboe
  2017-08-04 15:04 ` [PATCH 6/6] sbitmap: make sb_for_each_fn() return void Jens Axboe
  5 siblings, 2 replies; 27+ messages in thread
From: Jens Axboe @ 2017-08-04 15:04 UTC (permalink / raw)
  To: linux-block; +Cc: brking, bart.vanassche, Jens Axboe

We don't have to inc/dec some counter, since we can just
iterate the tags. That makes inc/dec a noop, but means we
have to iterate busy tags to get an in-flight count.

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq.c        | 31 +++++++++++++++++++++++++++++++
 block/blk-mq.h        |  3 +++
 block/genhd.c         | 37 +++++++++++++++++++++++++++++++++++++
 include/linux/genhd.h | 34 ++++++----------------------------
 4 files changed, 77 insertions(+), 28 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index a5d369dc7622..fe1aa1f5f069 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -83,6 +83,37 @@ static void blk_mq_hctx_clear_pending(struct blk_mq_hw_ctx *hctx,
 	sbitmap_clear_bit(&hctx->ctx_map, ctx->index_hw);
 }
 
+struct mq_inflight {
+	struct hd_struct *part;
+	unsigned int *inflight;
+};
+
+static void blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx,
+				  struct request *rq, void *priv,
+				  bool reserved)
+{
+	struct mq_inflight *mi = priv;
+
+	if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
+		return;
+
+	/*
+	 * Count as inflight if it either matches the partition we asked
+	 * for, or if it's the root
+	 */
+	if (rq->part == mi->part || mi->part->partno)
+		mi->inflight[0]++;
+}
+
+void blk_mq_in_flight(struct request_queue *q, struct hd_struct *part,
+		      unsigned int inflight[2])
+{
+	struct mq_inflight mi = { .part = part, .inflight = inflight, };
+
+	inflight[0] = 0;
+	blk_mq_queue_tag_busy_iter(q, blk_mq_check_inflight, &mi);
+}
+
 void blk_freeze_queue_start(struct request_queue *q)
 {
 	int freeze_depth;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 60b01c0309bc..98252b79b80b 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -133,4 +133,7 @@ static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx)
 	return hctx->nr_ctx && hctx->tags;
 }
 
+void blk_mq_in_flight(struct request_queue *q, struct hd_struct *part,
+			unsigned int inflight[2]);
+
 #endif
diff --git a/block/genhd.c b/block/genhd.c
index 822f65f95e2a..3dc4d115480f 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -45,6 +45,43 @@ static void disk_add_events(struct gendisk *disk);
 static void disk_del_events(struct gendisk *disk);
 static void disk_release_events(struct gendisk *disk);
 
+void part_inc_in_flight(struct request_queue *q, struct hd_struct *part, int rw)
+{
+	if (q->mq_ops)
+		return;
+
+	atomic_inc(&part->in_flight[rw]);
+	if (part->partno)
+		atomic_inc(&part_to_disk(part)->part0.in_flight[rw]);
+}
+
+void part_dec_in_flight(struct request_queue *q, struct hd_struct *part, int rw)
+{
+	if (q->mq_ops)
+		return;
+
+	atomic_dec(&part->in_flight[rw]);
+	if (part->partno)
+		atomic_dec(&part_to_disk(part)->part0.in_flight[rw]);
+}
+
+void part_in_flight(struct request_queue *q, struct hd_struct *part,
+		    unsigned int inflight[2])
+{
+	if (q->mq_ops) {
+		blk_mq_in_flight(q, part, inflight);
+		return;
+	}
+
+	inflight[0] = atomic_read(&part->in_flight[0]) +
+			atomic_read(&part->in_flight[1]);
+	if (part->partno) {
+		part = &part_to_disk(part)->part0;
+		inflight[1] = atomic_read(&part->in_flight[0]) +
+				atomic_read(&part->in_flight[1]);
+	}
+}
+
 /**
  * disk_get_part - get partition
  * @disk: disk to look partition from
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index a9c8ea632fdc..ea652bfcd675 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -362,34 +362,12 @@ static inline void free_part_stats(struct hd_struct *part)
 #define part_stat_sub(cpu, gendiskp, field, subnd)			\
 	part_stat_add(cpu, gendiskp, field, -subnd)
 
-static inline void part_inc_in_flight(struct request_queue *q,
-				      struct hd_struct *part, int rw)
-{
-	atomic_inc(&part->in_flight[rw]);
-	if (part->partno)
-		atomic_inc(&part_to_disk(part)->part0.in_flight[rw]);
-}
-
-static inline void part_dec_in_flight(struct request_queue *q,
-				      struct hd_struct *part, int rw)
-{
-	atomic_dec(&part->in_flight[rw]);
-	if (part->partno)
-		atomic_dec(&part_to_disk(part)->part0.in_flight[rw]);
-}
-
-static inline void part_in_flight(struct request_queue *q,
-				  struct hd_struct *part,
-				  unsigned int inflight[2])
-{
-	inflight[0] = atomic_read(&part->in_flight[0]) +
-			atomic_read(&part->in_flight[1]);
-	if (part->partno) {
-		part = &part_to_disk(part)->part0;
-		inflight[1] = atomic_read(&part->in_flight[0]) +
-				atomic_read(&part->in_flight[1]);
-	}
-}
+void part_in_flight(struct request_queue *q, struct hd_struct *part,
+			unsigned int inflight[2]);
+void part_dec_in_flight(struct request_queue *q, struct hd_struct *part,
+			int rw);
+void part_inc_in_flight(struct request_queue *q, struct hd_struct *part,
+			int rw);
 
 static inline struct partition_meta_info *alloc_part_info(struct gendisk *disk)
 {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 5/6] blk-mq: enable checking two part inflight counts at the same time
  2017-08-04 15:04 [PATCHSET v2 0/6] block: more scalable inflight tracking Jens Axboe
                   ` (3 preceding siblings ...)
  2017-08-04 15:04 ` [PATCH 4/6] blk-mq: provide internal in-flight variant Jens Axboe
@ 2017-08-04 15:04 ` Jens Axboe
  2017-08-04 19:44   ` Bart Van Assche
  2017-08-08 22:48   ` Omar Sandoval
  2017-08-04 15:04 ` [PATCH 6/6] sbitmap: make sb_for_each_fn() return void Jens Axboe
  5 siblings, 2 replies; 27+ messages in thread
From: Jens Axboe @ 2017-08-04 15:04 UTC (permalink / raw)
  To: linux-block; +Cc: brking, bart.vanassche, Jens Axboe

Modify blk_mq_in_flight() to count both a partition and root at
the same time. Then we only have to call it once, instead of
potentially looping the tags twice.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-core.c | 38 +++++++++++++++++++++++++-------------
 block/blk-mq.c   | 10 ++++++----
 2 files changed, 31 insertions(+), 17 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 6ad2b8602c1d..d836c84ad3da 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1470,17 +1470,12 @@ static void add_acct_request(struct request_queue *q, struct request *rq,
 }
 
 static void part_round_stats_single(struct request_queue *q, int cpu,
-				    struct hd_struct *part, unsigned long now)
+				    struct hd_struct *part, unsigned long now,
+				    unsigned int inflight)
 {
-	int inflight[2];
-
-	if (now == part->stamp)
-		return;
-
-	part_in_flight(q, part, inflight);
-	if (inflight[0]) {
+	if (inflight) {
 		__part_stat_add(cpu, part, time_in_queue,
-				inflight[0] * (now - part->stamp));
+				inflight * (now - part->stamp));
 		__part_stat_add(cpu, part, io_ticks, (now - part->stamp));
 	}
 	part->stamp = now;
@@ -1505,12 +1500,29 @@ static void part_round_stats_single(struct request_queue *q, int cpu,
  */
 void part_round_stats(struct request_queue *q, int cpu, struct hd_struct *part)
 {
+	struct hd_struct *part2 = NULL;
 	unsigned long now = jiffies;
+	unsigned int inflight[2];
+	int stats = 0;
+
+	if (part->stamp != now)
+		stats |= 1;
+
+	if (part->partno) {
+		part2 = &part_to_disk(part)->part0;
+		if (part2->stamp != now)
+			stats |= 2;
+	}
+
+	if (!stats)
+		return;
+
+	part_in_flight(q, part, inflight);
 
-	if (part->partno)
-		part_round_stats_single(q, cpu, &part_to_disk(part)->part0,
-						now);
-	part_round_stats_single(q, cpu, part, now);
+	if (stats & 2)
+		part_round_stats_single(q, cpu, part2, now, inflight[1]);
+	if (stats & 1)
+		part_round_stats_single(q, cpu, part, now, inflight[0]);
 }
 EXPORT_SYMBOL_GPL(part_round_stats);
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index fe1aa1f5f069..410ed246bc9b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -98,11 +98,13 @@ static void blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx,
 		return;
 
 	/*
-	 * Count as inflight if it either matches the partition we asked
-	 * for, or if it's the root
+	 * Count as inflight if it matches the partition, count separately
+	 * (but all) if we got asked for the root
 	 */
-	if (rq->part == mi->part || mi->part->partno)
+	if (rq->part == mi->part)
 		mi->inflight[0]++;
+	if (mi->part->partno)
+		mi->inflight[1]++;
 }
 
 void blk_mq_in_flight(struct request_queue *q, struct hd_struct *part,
@@ -110,7 +112,7 @@ void blk_mq_in_flight(struct request_queue *q, struct hd_struct *part,
 {
 	struct mq_inflight mi = { .part = part, .inflight = inflight, };
 
-	inflight[0] = 0;
+	inflight[0] = inflight[1] = 0;
 	blk_mq_queue_tag_busy_iter(q, blk_mq_check_inflight, &mi);
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 6/6] sbitmap: make sb_for_each_fn() return void
  2017-08-04 15:04 [PATCHSET v2 0/6] block: more scalable inflight tracking Jens Axboe
                   ` (4 preceding siblings ...)
  2017-08-04 15:04 ` [PATCH 5/6] blk-mq: enable checking two part inflight counts at the same time Jens Axboe
@ 2017-08-04 15:04 ` Jens Axboe
  2017-08-04 19:49   ` Bart Van Assche
  5 siblings, 1 reply; 27+ messages in thread
From: Jens Axboe @ 2017-08-04 15:04 UTC (permalink / raw)
  To: linux-block; +Cc: brking, bart.vanassche, Jens Axboe

The intent was for users to be able to continue/break iteration
based on the return value of this callback function, but all
callers now always return true. Hence it's pointless to return
anything at all. Change it to void, and update the users of it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq-tag.c      | 6 ++----
 block/blk-mq.c          | 3 +--
 include/linux/sbitmap.h | 6 ++----
 3 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index b856b2827157..5ee218f11f81 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -202,7 +202,7 @@ struct bt_iter_data {
 	bool reserved;
 };
 
-static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
+static void bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 {
 	struct bt_iter_data *iter_data = data;
 	struct blk_mq_hw_ctx *hctx = iter_data->hctx;
@@ -216,7 +216,6 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 
 	if (rq && rq->q == hctx->queue)
 		iter_data->fn(hctx, rq, iter_data->data, reserved);
-	return true;
 }
 
 static void bt_for_each(struct blk_mq_hw_ctx *hctx, struct sbitmap_queue *bt,
@@ -239,7 +238,7 @@ struct bt_tags_iter_data {
 	bool reserved;
 };
 
-static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
+static void bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 {
 	struct bt_tags_iter_data *iter_data = data;
 	struct blk_mq_tags *tags = iter_data->tags;
@@ -251,7 +250,6 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	rq = tags->rqs[bitnr];
 	if (rq)
 		iter_data->fn(rq, iter_data->data, reserved);
-	return true;
 }
 
 static void bt_tags_for_each(struct blk_mq_tags *tags, struct sbitmap_queue *bt,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 410ed246bc9b..c8d568aafcec 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -846,7 +846,7 @@ struct flush_busy_ctx_data {
 	struct list_head *list;
 };
 
-static bool flush_busy_ctx(struct sbitmap *sb, unsigned int bitnr, void *data)
+static void flush_busy_ctx(struct sbitmap *sb, unsigned int bitnr, void *data)
 {
 	struct flush_busy_ctx_data *flush_data = data;
 	struct blk_mq_hw_ctx *hctx = flush_data->hctx;
@@ -856,7 +856,6 @@ static bool flush_busy_ctx(struct sbitmap *sb, unsigned int bitnr, void *data)
 	spin_lock(&ctx->lock);
 	list_splice_tail_init(&ctx->rq_list, flush_data->list);
 	spin_unlock(&ctx->lock);
-	return true;
 }
 
 /*
diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
index a1904aadbc45..33c3d8957927 100644
--- a/include/linux/sbitmap.h
+++ b/include/linux/sbitmap.h
@@ -211,7 +211,7 @@ bool sbitmap_any_bit_set(const struct sbitmap *sb);
  */
 bool sbitmap_any_bit_clear(const struct sbitmap *sb);
 
-typedef bool (*sb_for_each_fn)(struct sbitmap *, unsigned int, void *);
+typedef void (*sb_for_each_fn)(struct sbitmap *, unsigned int, void *);
 
 /**
  * sbitmap_for_each_set() - Iterate over each set bit in a &struct sbitmap.
@@ -241,9 +241,7 @@ static inline void sbitmap_for_each_set(struct sbitmap *sb, sb_for_each_fn fn,
 			if (nr >= word->depth)
 				break;
 
-			if (!fn(sb, off + nr, data))
-				return;
-
+			fn(sb, off + nr, data);
 			nr++;
 		}
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/6] blk-mq-tag: check for NULL rq when iterating tags
  2017-08-04 15:04 ` [PATCH 1/6] blk-mq-tag: check for NULL rq when iterating tags Jens Axboe
@ 2017-08-04 19:34   ` Bart Van Assche
  2017-08-04 19:35     ` Jens Axboe
  2017-08-08 22:28   ` Omar Sandoval
  1 sibling, 1 reply; 27+ messages in thread
From: Bart Van Assche @ 2017-08-04 19:34 UTC (permalink / raw)
  To: linux-block, axboe; +Cc: brking

On Fri, 2017-08-04 at 09:04 -0600, Jens Axboe wrote:
> Since we introduced blk-mq-sched, the tags->rqs[] array has been
> dynamically assigned. So we need to check for NULL when iterating,
> since there's a window of time where the bit is set, but we haven't
> dynamically assigned the tags->rqs[] array position yet.
>=20
> This is perfectly safe, since the memory backing of the request is
> never going away while the device is alive.
>=20
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  block/blk-mq-tag.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>=20
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index d0be72ccb091..b856b2827157 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -214,7 +214,7 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned =
int bitnr, void *data)
>  		bitnr +=3D tags->nr_reserved_tags;
>  	rq =3D tags->rqs[bitnr];
> =20
> -	if (rq->q =3D=3D hctx->queue)
> +	if (rq && rq->q =3D=3D hctx->queue)
>  		iter_data->fn(hctx, rq, iter_data->data, reserved);
>  	return true;
>  }
> @@ -249,8 +249,8 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsi=
gned int bitnr, void *data)
>  	if (!reserved)
>  		bitnr +=3D tags->nr_reserved_tags;
>  	rq =3D tags->rqs[bitnr];
> -
> -	iter_data->fn(rq, iter_data->data, reserved);
> +	if (rq)
> +		iter_data->fn(rq, iter_data->data, reserved);
>  	return true;
>  }

Hello Jens,

Should it have been mentioned in the patch description or in a comment that
these functions can read tags->rqs[] after the corresponding bit has been s=
et
in the bitmap but before the tags->rqs[] pointer is updated?

Anyway:

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/6] blk-mq-tag: check for NULL rq when iterating tags
  2017-08-04 19:34   ` Bart Van Assche
@ 2017-08-04 19:35     ` Jens Axboe
  0 siblings, 0 replies; 27+ messages in thread
From: Jens Axboe @ 2017-08-04 19:35 UTC (permalink / raw)
  To: Bart Van Assche, linux-block; +Cc: brking

On 08/04/2017 01:34 PM, Bart Van Assche wrote:
> On Fri, 2017-08-04 at 09:04 -0600, Jens Axboe wrote:
>> Since we introduced blk-mq-sched, the tags->rqs[] array has been
>> dynamically assigned. So we need to check for NULL when iterating,
>> since there's a window of time where the bit is set, but we haven't
>> dynamically assigned the tags->rqs[] array position yet.
>>
>> This is perfectly safe, since the memory backing of the request is
>> never going away while the device is alive.
>>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> ---
>>  block/blk-mq-tag.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>> index d0be72ccb091..b856b2827157 100644
>> --- a/block/blk-mq-tag.c
>> +++ b/block/blk-mq-tag.c
>> @@ -214,7 +214,7 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
>>  		bitnr += tags->nr_reserved_tags;
>>  	rq = tags->rqs[bitnr];
>>  
>> -	if (rq->q == hctx->queue)
>> +	if (rq && rq->q == hctx->queue)
>>  		iter_data->fn(hctx, rq, iter_data->data, reserved);
>>  	return true;
>>  }
>> @@ -249,8 +249,8 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
>>  	if (!reserved)
>>  		bitnr += tags->nr_reserved_tags;
>>  	rq = tags->rqs[bitnr];
>> -
>> -	iter_data->fn(rq, iter_data->data, reserved);
>> +	if (rq)
>> +		iter_data->fn(rq, iter_data->data, reserved);
>>  	return true;
>>  }
> 
> Hello Jens,
> 
> Should it have been mentioned in the patch description or in a comment that
> these functions can read tags->rqs[] after the corresponding bit has been set
> in the bitmap but before the tags->rqs[] pointer is updated?

I'll add a comment to that effect.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/6] block: make part_in_flight() take an array of two ints
  2017-08-04 15:04 ` [PATCH 3/6] block: make part_in_flight() take an array of two ints Jens Axboe
@ 2017-08-04 19:38   ` Bart Van Assche
  2017-08-04 19:40     ` Jens Axboe
  2017-08-08 22:42   ` Omar Sandoval
  1 sibling, 1 reply; 27+ messages in thread
From: Bart Van Assche @ 2017-08-04 19:38 UTC (permalink / raw)
  To: linux-block, axboe; +Cc: brking

On Fri, 2017-08-04 at 09:04 -0600, Jens Axboe wrote:
> Instead of returning the count that matches the partition, pass
> in an array of two ints. Index 0 will be filled with the inflight
> count for the partition in question, and index 1 will filled
> with the root infligh count, if the partition passed in is not the
> root.

Hello Jens,

It looks like a letter 't' is missing from the patch description ("infligh"=
)?

Anyway:

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/6] block: make part_in_flight() take an array of two ints
  2017-08-04 19:38   ` Bart Van Assche
@ 2017-08-04 19:40     ` Jens Axboe
  0 siblings, 0 replies; 27+ messages in thread
From: Jens Axboe @ 2017-08-04 19:40 UTC (permalink / raw)
  To: Bart Van Assche, linux-block; +Cc: brking

On 08/04/2017 01:38 PM, Bart Van Assche wrote:
> On Fri, 2017-08-04 at 09:04 -0600, Jens Axboe wrote:
>> Instead of returning the count that matches the partition, pass
>> in an array of two ints. Index 0 will be filled with the inflight
>> count for the partition in question, and index 1 will filled
>> with the root infligh count, if the partition passed in is not the
>> root.
> 
> Hello Jens,
> 
> It looks like a letter 't' is missing from the patch description ("infligh")?

Oops thanks, fixed up.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] blk-mq: provide internal in-flight variant
  2017-08-04 15:04 ` [PATCH 4/6] blk-mq: provide internal in-flight variant Jens Axboe
@ 2017-08-04 19:43   ` Bart Van Assche
  2017-08-04 19:45     ` Jens Axboe
  2017-08-08 22:46   ` Omar Sandoval
  1 sibling, 1 reply; 27+ messages in thread
From: Bart Van Assche @ 2017-08-04 19:43 UTC (permalink / raw)
  To: linux-block, axboe; +Cc: brking

On Fri, 2017-08-04 at 09:04 -0600, Jens Axboe wrote:
> +static void blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx,
> +				  struct request *rq, void *priv,
> +				  bool reserved)
> +{
> +	struct mq_inflight *mi =3D priv;
> +
> +	if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
> +		return;
> +
> +	/*
> +	 * Count as inflight if it either matches the partition we asked
> +	 * for, or if it's the root
> +	 */
> +	if (rq->part =3D=3D mi->part || mi->part->partno)
> +		mi->inflight[0]++;
> +}

Hello Jens,

How to check for the root? Does partno =3D=3D 0 or partno !=3D 0 represent =
the
root? Please note that I'm not familiar with the partition code.

Thanks,

Bart.=

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/6] blk-mq: enable checking two part inflight counts at the same time
  2017-08-04 15:04 ` [PATCH 5/6] blk-mq: enable checking two part inflight counts at the same time Jens Axboe
@ 2017-08-04 19:44   ` Bart Van Assche
  2017-08-04 19:56     ` Jens Axboe
  2017-08-08 22:48   ` Omar Sandoval
  1 sibling, 1 reply; 27+ messages in thread
From: Bart Van Assche @ 2017-08-04 19:44 UTC (permalink / raw)
  To: linux-block, axboe; +Cc: brking

On Fri, 2017-08-04 at 09:04 -0600, Jens Axboe wrote:
> @@ -98,11 +98,13 @@ static void blk_mq_check_inflight(struct blk_mq_hw_ct=
x *hctx,
>  		return;
> =20
>  	/*
> -	 * Count as inflight if it either matches the partition we asked
> -	 * for, or if it's the root
> +	 * Count as inflight if it matches the partition, count separately
> +	 * (but all) if we got asked for the root
>  	 */
> -	if (rq->part =3D=3D mi->part || mi->part->partno)
> +	if (rq->part =3D=3D mi->part)
>  		mi->inflight[0]++;
> +	if (mi->part->partno)
> +		mi->inflight[1]++;
>  }

Hello Jens,

Same question here: should "if (mi->part->partno)" perhaps be changed into
"if (mi->part->partno =3D=3D 0)"?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] blk-mq: provide internal in-flight variant
  2017-08-04 19:43   ` Bart Van Assche
@ 2017-08-04 19:45     ` Jens Axboe
  0 siblings, 0 replies; 27+ messages in thread
From: Jens Axboe @ 2017-08-04 19:45 UTC (permalink / raw)
  To: Bart Van Assche, linux-block; +Cc: brking

On 08/04/2017 01:43 PM, Bart Van Assche wrote:
> On Fri, 2017-08-04 at 09:04 -0600, Jens Axboe wrote:
>> +static void blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx,
>> +				  struct request *rq, void *priv,
>> +				  bool reserved)
>> +{
>> +	struct mq_inflight *mi = priv;
>> +
>> +	if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
>> +		return;
>> +
>> +	/*
>> +	 * Count as inflight if it either matches the partition we asked
>> +	 * for, or if it's the root
>> +	 */
>> +	if (rq->part == mi->part || mi->part->partno)
>> +		mi->inflight[0]++;
>> +}
> 
> Hello Jens,
> 
> How to check for the root? Does partno == 0 or partno != 0 represent the
> root? Please note that I'm not familiar with the partition code.

if partno != 0, it's a partition. So for that, account for root too. I
guess the comment could be better...

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 6/6] sbitmap: make sb_for_each_fn() return void
  2017-08-04 15:04 ` [PATCH 6/6] sbitmap: make sb_for_each_fn() return void Jens Axboe
@ 2017-08-04 19:49   ` Bart Van Assche
  2017-08-04 19:50     ` Jens Axboe
  0 siblings, 1 reply; 27+ messages in thread
From: Bart Van Assche @ 2017-08-04 19:49 UTC (permalink / raw)
  To: linux-block, axboe; +Cc: brking

On Fri, 2017-08-04 at 09:04 -0600, Jens Axboe wrote:
> @@ -211,7 +211,7 @@ bool sbitmap_any_bit_set(const struct sbitmap *sb);
>   */
>  bool sbitmap_any_bit_clear(const struct sbitmap *sb);
> =20
> -typedef bool (*sb_for_each_fn)(struct sbitmap *, unsigned int, void *);
> +typedef void (*sb_for_each_fn)(struct sbitmap *, unsigned int, void *);
> =20
>  /**
>   * sbitmap_for_each_set() - Iterate over each set bit in a &struct sbitm=
ap.
> @@ -241,9 +241,7 @@ static inline void sbitmap_for_each_set(struct sbitma=
p *sb, sb_for_each_fn fn,
>  			if (nr >=3D word->depth)
>  				break;
> =20
> -			if (!fn(sb, off + nr, data))
> -				return;
> -
> +			fn(sb, off + nr, data);
>  			nr++;
>  		}
>  	}

Hello Jens,

Are you aware that this change will break one of Ming Lei's patches? See
also https://www.mail-archive.com/linux-block@vger.kernel.org/msg11230.html=
.

Thanks,

Bart.=

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 6/6] sbitmap: make sb_for_each_fn() return void
  2017-08-04 19:49   ` Bart Van Assche
@ 2017-08-04 19:50     ` Jens Axboe
  0 siblings, 0 replies; 27+ messages in thread
From: Jens Axboe @ 2017-08-04 19:50 UTC (permalink / raw)
  To: Bart Van Assche, linux-block; +Cc: brking

On 08/04/2017 01:49 PM, Bart Van Assche wrote:
> On Fri, 2017-08-04 at 09:04 -0600, Jens Axboe wrote:
>> @@ -211,7 +211,7 @@ bool sbitmap_any_bit_set(const struct sbitmap *sb);
>>   */
>>  bool sbitmap_any_bit_clear(const struct sbitmap *sb);
>>  
>> -typedef bool (*sb_for_each_fn)(struct sbitmap *, unsigned int, void *);
>> +typedef void (*sb_for_each_fn)(struct sbitmap *, unsigned int, void *);
>>  
>>  /**
>>   * sbitmap_for_each_set() - Iterate over each set bit in a &struct sbitmap.
>> @@ -241,9 +241,7 @@ static inline void sbitmap_for_each_set(struct sbitmap *sb, sb_for_each_fn fn,
>>  			if (nr >= word->depth)
>>  				break;
>>  
>> -			if (!fn(sb, off + nr, data))
>> -				return;
>> -
>> +			fn(sb, off + nr, data);
>>  			nr++;
>>  		}
>>  	}
> 
> Hello Jens,
> 
> Are you aware that this change will break one of Ming Lei's patches? See
> also https://www.mail-archive.com/linux-block@vger.kernel.org/msg11230.html.

I wasn't, I'll just drop it for now then.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/6] blk-mq: enable checking two part inflight counts at the same time
  2017-08-04 19:44   ` Bart Van Assche
@ 2017-08-04 19:56     ` Jens Axboe
  2017-08-06  0:17       ` Bart Van Assche
  0 siblings, 1 reply; 27+ messages in thread
From: Jens Axboe @ 2017-08-04 19:56 UTC (permalink / raw)
  To: Bart Van Assche, linux-block; +Cc: brking

On 08/04/2017 01:44 PM, Bart Van Assche wrote:
> On Fri, 2017-08-04 at 09:04 -0600, Jens Axboe wrote:
>> @@ -98,11 +98,13 @@ static void blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx,
>>  		return;
>>  
>>  	/*
>> -	 * Count as inflight if it either matches the partition we asked
>> -	 * for, or if it's the root
>> +	 * Count as inflight if it matches the partition, count separately
>> +	 * (but all) if we got asked for the root
>>  	 */
>> -	if (rq->part == mi->part || mi->part->partno)
>> +	if (rq->part == mi->part)
>>  		mi->inflight[0]++;
>> +	if (mi->part->partno)
>> +		mi->inflight[1]++;
>>  }
> 
> Hello Jens,
> 
> Same question here: should "if (mi->part->partno)" perhaps be changed into
> "if (mi->part->partno == 0)"?

It should be correct as-is.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/6] blk-mq: enable checking two part inflight counts at the same time
  2017-08-04 19:56     ` Jens Axboe
@ 2017-08-06  0:17       ` Bart Van Assche
  0 siblings, 0 replies; 27+ messages in thread
From: Bart Van Assche @ 2017-08-06  0:17 UTC (permalink / raw)
  To: linux-block, axboe; +Cc: brking

T24gRnJpLCAyMDE3LTA4LTA0IGF0IDEzOjU2IC0wNjAwLCBKZW5zIEF4Ym9lIHdyb3RlOg0KPiBP
biAwOC8wNC8yMDE3IDAxOjQ0IFBNLCBCYXJ0IFZhbiBBc3NjaGUgd3JvdGU6DQo+ID4gT24gRnJp
LCAyMDE3LTA4LTA0IGF0IDA5OjA0IC0wNjAwLCBKZW5zIEF4Ym9lIHdyb3RlOg0KPiA+ID4gQEAg
LTk4LDExICs5OCwxMyBAQCBzdGF0aWMgdm9pZCBibGtfbXFfY2hlY2tfaW5mbGlnaHQoc3RydWN0
IGJsa19tcV9od19jdHggKmhjdHgsDQo+ID4gPiAgCQlyZXR1cm47DQo+ID4gPiAgDQo+ID4gPiAg
CS8qDQo+ID4gPiAtCSAqIENvdW50IGFzIGluZmxpZ2h0IGlmIGl0IGVpdGhlciBtYXRjaGVzIHRo
ZSBwYXJ0aXRpb24gd2UgYXNrZWQNCj4gPiA+IC0JICogZm9yLCBvciBpZiBpdCdzIHRoZSByb290
DQo+ID4gPiArCSAqIENvdW50IGFzIGluZmxpZ2h0IGlmIGl0IG1hdGNoZXMgdGhlIHBhcnRpdGlv
biwgY291bnQgc2VwYXJhdGVseQ0KPiA+ID4gKwkgKiAoYnV0IGFsbCkgaWYgd2UgZ290IGFza2Vk
IGZvciB0aGUgcm9vdA0KPiA+ID4gIAkgKi8NCj4gPiA+IC0JaWYgKHJxLT5wYXJ0ID09IG1pLT5w
YXJ0IHx8IG1pLT5wYXJ0LT5wYXJ0bm8pDQo+ID4gPiArCWlmIChycS0+cGFydCA9PSBtaS0+cGFy
dCkNCj4gPiA+ICAJCW1pLT5pbmZsaWdodFswXSsrOw0KPiA+ID4gKwlpZiAobWktPnBhcnQtPnBh
cnRubykNCj4gPiA+ICsJCW1pLT5pbmZsaWdodFsxXSsrOw0KPiA+ID4gIH0NCj4gPiANCj4gPiBI
ZWxsbyBKZW5zLA0KPiA+IA0KPiA+IFNhbWUgcXVlc3Rpb24gaGVyZTogc2hvdWxkICJpZiAobWkt
PnBhcnQtPnBhcnRubykiIHBlcmhhcHMgYmUgY2hhbmdlZCBpbnRvDQo+ID4gImlmIChtaS0+cGFy
dC0+cGFydG5vID09IDApIj8NCj4gDQo+IEl0IHNob3VsZCBiZSBjb3JyZWN0IGFzLWlzLg0KDQpB
Z3JlZWQsIEkgZ290IG1pc2xlZCBieSB0aGUgY29tbWVudCBhYm92ZSB0aGUgY29kZS4NCg0KQmFy
dC4=

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/6] blk-mq-tag: check for NULL rq when iterating tags
  2017-08-04 15:04 ` [PATCH 1/6] blk-mq-tag: check for NULL rq when iterating tags Jens Axboe
  2017-08-04 19:34   ` Bart Van Assche
@ 2017-08-08 22:28   ` Omar Sandoval
  1 sibling, 0 replies; 27+ messages in thread
From: Omar Sandoval @ 2017-08-08 22:28 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, brking, bart.vanassche

On Fri, Aug 04, 2017 at 09:04:17AM -0600, Jens Axboe wrote:
> Since we introduced blk-mq-sched, the tags->rqs[] array has been
> dynamically assigned. So we need to check for NULL when iterating,
> since there's a window of time where the bit is set, but we haven't
> dynamically assigned the tags->rqs[] array position yet.
> 
> This is perfectly safe, since the memory backing of the request is
> never going away while the device is alive.

Reviewed-by: Omar Sandoval <osandov@fb.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/6] block: pass in queue to inflight accounting
  2017-08-04 15:04 ` [PATCH 2/6] block: pass in queue to inflight accounting Jens Axboe
@ 2017-08-08 22:30   ` Omar Sandoval
  0 siblings, 0 replies; 27+ messages in thread
From: Omar Sandoval @ 2017-08-08 22:30 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, brking, bart.vanassche

On Fri, Aug 04, 2017 at 09:04:18AM -0600, Jens Axboe wrote:
> No functional change in this patch, just in preparation for
> basing the inflight mechanism on the queue in question.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/6] block: make part_in_flight() take an array of two ints
  2017-08-04 15:04 ` [PATCH 3/6] block: make part_in_flight() take an array of two ints Jens Axboe
  2017-08-04 19:38   ` Bart Van Assche
@ 2017-08-08 22:42   ` Omar Sandoval
  2017-08-08 23:45     ` Jens Axboe
  1 sibling, 1 reply; 27+ messages in thread
From: Omar Sandoval @ 2017-08-08 22:42 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, brking, bart.vanassche

On Fri, Aug 04, 2017 at 09:04:19AM -0600, Jens Axboe wrote:
> Instead of returning the count that matches the partition, pass
> in an array of two ints. Index 0 will be filled with the inflight
> count for the partition in question, and index 1 will filled
> with the root infligh count, if the partition passed in is not the
> root.
> 
> This is in preparation for being able to calculate both in one
> go.

One tiny comment below, besides that

Reviewed-by: Omar Sandoval <osandov@fb.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
> index 7f7427e00f9c..a9c8ea632fdc 100644
> --- a/include/linux/genhd.h
> +++ b/include/linux/genhd.h
> @@ -378,11 +378,17 @@ static inline void part_dec_in_flight(struct request_queue *q,
>  		atomic_dec(&part_to_disk(part)->part0.in_flight[rw]);
>  }
>  
> -static inline int part_in_flight(struct request_queue *q,
> -				 struct hd_struct *part)
> +static inline void part_in_flight(struct request_queue *q,
> +				  struct hd_struct *part,
> +				  unsigned int inflight[2])
>  {
> -	return atomic_read(&part->in_flight[0]) +
> +	inflight[0] = atomic_read(&part->in_flight[0]) +
>  			atomic_read(&part->in_flight[1]);

It makes me a little nervous here that we only initialize inflight[1] if
part is not part0, that seems a little subtle and easy to miss. Can we
change the line above to this?

inflight[0] = inflight[1] = (atomic_read(&part->in_flight[0]) +
			     atomic_read(&part->in_flight[1]));

> +	if (part->partno) {
> +		part = &part_to_disk(part)->part0;
> +		inflight[1] = atomic_read(&part->in_flight[0]) +
> +				atomic_read(&part->in_flight[1]);
> +	}
>  }
>  
>  static inline struct partition_meta_info *alloc_part_info(struct gendisk *disk)
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] blk-mq: provide internal in-flight variant
  2017-08-04 15:04 ` [PATCH 4/6] blk-mq: provide internal in-flight variant Jens Axboe
  2017-08-04 19:43   ` Bart Van Assche
@ 2017-08-08 22:46   ` Omar Sandoval
  1 sibling, 0 replies; 27+ messages in thread
From: Omar Sandoval @ 2017-08-08 22:46 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, brking, bart.vanassche

On Fri, Aug 04, 2017 at 09:04:20AM -0600, Jens Axboe wrote:
> We don't have to inc/dec some counter, since we can just
> iterate the tags. That makes inc/dec a noop, but means we
> have to iterate busy tags to get an in-flight count.
>
> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>

Barring performance numbers that show that percpu is much better (which
I doubt),

Reviewed-by: Omar Sandoval <osandov@fb.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  block/blk-mq.c        | 31 +++++++++++++++++++++++++++++++
>  block/blk-mq.h        |  3 +++
>  block/genhd.c         | 37 +++++++++++++++++++++++++++++++++++++
>  include/linux/genhd.h | 34 ++++++----------------------------
>  4 files changed, 77 insertions(+), 28 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index a5d369dc7622..fe1aa1f5f069 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -83,6 +83,37 @@ static void blk_mq_hctx_clear_pending(struct blk_mq_hw_ctx *hctx,
>  	sbitmap_clear_bit(&hctx->ctx_map, ctx->index_hw);
>  }
>  
> +struct mq_inflight {
> +	struct hd_struct *part;
> +	unsigned int *inflight;
> +};
> +
> +static void blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx,
> +				  struct request *rq, void *priv,
> +				  bool reserved)
> +{
> +	struct mq_inflight *mi = priv;
> +
> +	if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
> +		return;
> +
> +	/*
> +	 * Count as inflight if it either matches the partition we asked
> +	 * for, or if it's the root
> +	 */
> +	if (rq->part == mi->part || mi->part->partno)
> +		mi->inflight[0]++;
> +}
> +
> +void blk_mq_in_flight(struct request_queue *q, struct hd_struct *part,
> +		      unsigned int inflight[2])
> +{
> +	struct mq_inflight mi = { .part = part, .inflight = inflight, };
> +
> +	inflight[0] = 0;
> +	blk_mq_queue_tag_busy_iter(q, blk_mq_check_inflight, &mi);
> +}
> +
>  void blk_freeze_queue_start(struct request_queue *q)
>  {
>  	int freeze_depth;
> diff --git a/block/blk-mq.h b/block/blk-mq.h
> index 60b01c0309bc..98252b79b80b 100644
> --- a/block/blk-mq.h
> +++ b/block/blk-mq.h
> @@ -133,4 +133,7 @@ static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx)
>  	return hctx->nr_ctx && hctx->tags;
>  }
>  
> +void blk_mq_in_flight(struct request_queue *q, struct hd_struct *part,
> +			unsigned int inflight[2]);
> +
>  #endif
> diff --git a/block/genhd.c b/block/genhd.c
> index 822f65f95e2a..3dc4d115480f 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -45,6 +45,43 @@ static void disk_add_events(struct gendisk *disk);
>  static void disk_del_events(struct gendisk *disk);
>  static void disk_release_events(struct gendisk *disk);
>  
> +void part_inc_in_flight(struct request_queue *q, struct hd_struct *part, int rw)
> +{
> +	if (q->mq_ops)
> +		return;
> +
> +	atomic_inc(&part->in_flight[rw]);
> +	if (part->partno)
> +		atomic_inc(&part_to_disk(part)->part0.in_flight[rw]);
> +}
> +
> +void part_dec_in_flight(struct request_queue *q, struct hd_struct *part, int rw)
> +{
> +	if (q->mq_ops)
> +		return;
> +
> +	atomic_dec(&part->in_flight[rw]);
> +	if (part->partno)
> +		atomic_dec(&part_to_disk(part)->part0.in_flight[rw]);
> +}
> +
> +void part_in_flight(struct request_queue *q, struct hd_struct *part,
> +		    unsigned int inflight[2])
> +{
> +	if (q->mq_ops) {
> +		blk_mq_in_flight(q, part, inflight);
> +		return;
> +	}
> +
> +	inflight[0] = atomic_read(&part->in_flight[0]) +
> +			atomic_read(&part->in_flight[1]);
> +	if (part->partno) {
> +		part = &part_to_disk(part)->part0;
> +		inflight[1] = atomic_read(&part->in_flight[0]) +
> +				atomic_read(&part->in_flight[1]);
> +	}
> +}
> +
>  /**
>   * disk_get_part - get partition
>   * @disk: disk to look partition from
> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
> index a9c8ea632fdc..ea652bfcd675 100644
> --- a/include/linux/genhd.h
> +++ b/include/linux/genhd.h
> @@ -362,34 +362,12 @@ static inline void free_part_stats(struct hd_struct *part)
>  #define part_stat_sub(cpu, gendiskp, field, subnd)			\
>  	part_stat_add(cpu, gendiskp, field, -subnd)
>  
> -static inline void part_inc_in_flight(struct request_queue *q,
> -				      struct hd_struct *part, int rw)
> -{
> -	atomic_inc(&part->in_flight[rw]);
> -	if (part->partno)
> -		atomic_inc(&part_to_disk(part)->part0.in_flight[rw]);
> -}
> -
> -static inline void part_dec_in_flight(struct request_queue *q,
> -				      struct hd_struct *part, int rw)
> -{
> -	atomic_dec(&part->in_flight[rw]);
> -	if (part->partno)
> -		atomic_dec(&part_to_disk(part)->part0.in_flight[rw]);
> -}
> -
> -static inline void part_in_flight(struct request_queue *q,
> -				  struct hd_struct *part,
> -				  unsigned int inflight[2])
> -{
> -	inflight[0] = atomic_read(&part->in_flight[0]) +
> -			atomic_read(&part->in_flight[1]);
> -	if (part->partno) {
> -		part = &part_to_disk(part)->part0;
> -		inflight[1] = atomic_read(&part->in_flight[0]) +
> -				atomic_read(&part->in_flight[1]);
> -	}
> -}
> +void part_in_flight(struct request_queue *q, struct hd_struct *part,
> +			unsigned int inflight[2]);
> +void part_dec_in_flight(struct request_queue *q, struct hd_struct *part,
> +			int rw);
> +void part_inc_in_flight(struct request_queue *q, struct hd_struct *part,
> +			int rw);
>  
>  static inline struct partition_meta_info *alloc_part_info(struct gendisk *disk)
>  {
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/6] blk-mq: enable checking two part inflight counts at the same time
  2017-08-04 15:04 ` [PATCH 5/6] blk-mq: enable checking two part inflight counts at the same time Jens Axboe
  2017-08-04 19:44   ` Bart Van Assche
@ 2017-08-08 22:48   ` Omar Sandoval
  2017-08-08 23:47     ` Jens Axboe
  1 sibling, 1 reply; 27+ messages in thread
From: Omar Sandoval @ 2017-08-08 22:48 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, brking, bart.vanassche

On Fri, Aug 04, 2017 at 09:04:21AM -0600, Jens Axboe wrote:
> Modify blk_mq_in_flight() to count both a partition and root at
> the same time. Then we only have to call it once, instead of
> potentially looping the tags twice.

Reviewed-by: Omar Sandoval <osandov@fb.com>

One comment below.

> Signed-off-by: Jens Axboe <axboe@kernel.dk>

[snip]

> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index fe1aa1f5f069..410ed246bc9b 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -98,11 +98,13 @@ static void blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx,
>  		return;
>  
>  	/*
> -	 * Count as inflight if it either matches the partition we asked
> -	 * for, or if it's the root
> +	 * Count as inflight if it matches the partition, count separately
> +	 * (but all) if we got asked for the root
>  	 */
> -	if (rq->part == mi->part || mi->part->partno)
> +	if (rq->part == mi->part)
>  		mi->inflight[0]++;

Similar concern as with patch 3, why special case the part0 case below?

> +	if (mi->part->partno)
> +		mi->inflight[1]++;
>  }

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/6] block: make part_in_flight() take an array of two ints
  2017-08-08 22:42   ` Omar Sandoval
@ 2017-08-08 23:45     ` Jens Axboe
  0 siblings, 0 replies; 27+ messages in thread
From: Jens Axboe @ 2017-08-08 23:45 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-block, brking, bart.vanassche

On 08/08/2017 04:42 PM, Omar Sandoval wrote:
> On Fri, Aug 04, 2017 at 09:04:19AM -0600, Jens Axboe wrote:
>> Instead of returning the count that matches the partition, pass
>> in an array of two ints. Index 0 will be filled with the inflight
>> count for the partition in question, and index 1 will filled
>> with the root infligh count, if the partition passed in is not the
>> root.
>>
>> This is in preparation for being able to calculate both in one
>> go.
> 
> One tiny comment below, besides that
> 
> Reviewed-by: Omar Sandoval <osandov@fb.com>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
>> index 7f7427e00f9c..a9c8ea632fdc 100644
>> --- a/include/linux/genhd.h
>> +++ b/include/linux/genhd.h
>> @@ -378,11 +378,17 @@ static inline void part_dec_in_flight(struct request_queue *q,
>>  		atomic_dec(&part_to_disk(part)->part0.in_flight[rw]);
>>  }
>>  
>> -static inline int part_in_flight(struct request_queue *q,
>> -				 struct hd_struct *part)
>> +static inline void part_in_flight(struct request_queue *q,
>> +				  struct hd_struct *part,
>> +				  unsigned int inflight[2])
>>  {
>> -	return atomic_read(&part->in_flight[0]) +
>> +	inflight[0] = atomic_read(&part->in_flight[0]) +
>>  			atomic_read(&part->in_flight[1]);
> 
> It makes me a little nervous here that we only initialize inflight[1] if
> part is not part0, that seems a little subtle and easy to miss. Can we
> change the line above to this?
> 
> inflight[0] = inflight[1] = (atomic_read(&part->in_flight[0]) +
> 			     atomic_read(&part->in_flight[1]));

Yes good point, I'll make that change.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/6] blk-mq: enable checking two part inflight counts at the same time
  2017-08-08 22:48   ` Omar Sandoval
@ 2017-08-08 23:47     ` Jens Axboe
  2017-08-09  7:14       ` Omar Sandoval
  0 siblings, 1 reply; 27+ messages in thread
From: Jens Axboe @ 2017-08-08 23:47 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-block, brking, bart.vanassche

On 08/08/2017 04:48 PM, Omar Sandoval wrote:
> On Fri, Aug 04, 2017 at 09:04:21AM -0600, Jens Axboe wrote:
>> Modify blk_mq_in_flight() to count both a partition and root at
>> the same time. Then we only have to call it once, instead of
>> potentially looping the tags twice.
> 
> Reviewed-by: Omar Sandoval <osandov@fb.com>
> 
> One comment below.
> 
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> [snip]
> 
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index fe1aa1f5f069..410ed246bc9b 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -98,11 +98,13 @@ static void blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx,
>>  		return;
>>  
>>  	/*
>> -	 * Count as inflight if it either matches the partition we asked
>> -	 * for, or if it's the root
>> +	 * Count as inflight if it matches the partition, count separately
>> +	 * (but all) if we got asked for the root
>>  	 */
>> -	if (rq->part == mi->part || mi->part->partno)
>> +	if (rq->part == mi->part)
>>  		mi->inflight[0]++;
> 
> Similar concern as with patch 3, why special case the part0 case below?

Not sure I follow, both are initialized for this case. Or do you mean
the increment? The comment isn't great, I should update that. Basically
we want to increment [1] if this is a partition, and increment [0] if it
matches the one that was asked for.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/6] blk-mq: enable checking two part inflight counts at the same time
  2017-08-08 23:47     ` Jens Axboe
@ 2017-08-09  7:14       ` Omar Sandoval
  2017-08-09 14:13         ` Jens Axboe
  0 siblings, 1 reply; 27+ messages in thread
From: Omar Sandoval @ 2017-08-09  7:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, brking, bart.vanassche

On Tue, Aug 08, 2017 at 05:47:15PM -0600, Jens Axboe wrote:
> On 08/08/2017 04:48 PM, Omar Sandoval wrote:
> > On Fri, Aug 04, 2017 at 09:04:21AM -0600, Jens Axboe wrote:
> >> Modify blk_mq_in_flight() to count both a partition and root at
> >> the same time. Then we only have to call it once, instead of
> >> potentially looping the tags twice.
> > 
> > Reviewed-by: Omar Sandoval <osandov@fb.com>
> > 
> > One comment below.
> > 
> >> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> > 
> > [snip]
> > 
> >> diff --git a/block/blk-mq.c b/block/blk-mq.c
> >> index fe1aa1f5f069..410ed246bc9b 100644
> >> --- a/block/blk-mq.c
> >> +++ b/block/blk-mq.c
> >> @@ -98,11 +98,13 @@ static void blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx,
> >>  		return;
> >>  
> >>  	/*
> >> -	 * Count as inflight if it either matches the partition we asked
> >> -	 * for, or if it's the root
> >> +	 * Count as inflight if it matches the partition, count separately
> >> +	 * (but all) if we got asked for the root
> >>  	 */
> >> -	if (rq->part == mi->part || mi->part->partno)
> >> +	if (rq->part == mi->part)
> >>  		mi->inflight[0]++;
> > 
> > Similar concern as with patch 3, why special case the part0 case below?
> 
> Not sure I follow, both are initialized for this case. Or do you mean
> the increment?

Yeah I mean the increment. If I'm reading this right, for the !part0
case, inflight[0] is the count of in-flight requests for the partition
and inflight[1] is the count of in-flight requests for the whole device.
For the part0 case, inflight[0] is the count of in-flight requests for
the root device and inflight[1] is always 0. Can we make inflight[1] the
same in the part0 and !part0 cases?

> The comment isn't great, I should update that. Basically
> we want to increment [1] if this is a partition, and increment [0] if it
> matches the one that was asked for.
> 
> -- 
> Jens Axboe
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/6] blk-mq: enable checking two part inflight counts at the same time
  2017-08-09  7:14       ` Omar Sandoval
@ 2017-08-09 14:13         ` Jens Axboe
  0 siblings, 0 replies; 27+ messages in thread
From: Jens Axboe @ 2017-08-09 14:13 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-block, brking, bart.vanassche

On 08/09/2017 01:14 AM, Omar Sandoval wrote:
> On Tue, Aug 08, 2017 at 05:47:15PM -0600, Jens Axboe wrote:
>> On 08/08/2017 04:48 PM, Omar Sandoval wrote:
>>> On Fri, Aug 04, 2017 at 09:04:21AM -0600, Jens Axboe wrote:
>>>> Modify blk_mq_in_flight() to count both a partition and root at
>>>> the same time. Then we only have to call it once, instead of
>>>> potentially looping the tags twice.
>>>
>>> Reviewed-by: Omar Sandoval <osandov@fb.com>
>>>
>>> One comment below.
>>>
>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>
>>> [snip]
>>>
>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>> index fe1aa1f5f069..410ed246bc9b 100644
>>>> --- a/block/blk-mq.c
>>>> +++ b/block/blk-mq.c
>>>> @@ -98,11 +98,13 @@ static void blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx,
>>>>  		return;
>>>>  
>>>>  	/*
>>>> -	 * Count as inflight if it either matches the partition we asked
>>>> -	 * for, or if it's the root
>>>> +	 * Count as inflight if it matches the partition, count separately
>>>> +	 * (but all) if we got asked for the root
>>>>  	 */
>>>> -	if (rq->part == mi->part || mi->part->partno)
>>>> +	if (rq->part == mi->part)
>>>>  		mi->inflight[0]++;
>>>
>>> Similar concern as with patch 3, why special case the part0 case below?
>>
>> Not sure I follow, both are initialized for this case. Or do you mean
>> the increment?
> 
> Yeah I mean the increment. If I'm reading this right, for the !part0
> case, inflight[0] is the count of in-flight requests for the partition
> and inflight[1] is the count of in-flight requests for the whole device.
> For the part0 case, inflight[0] is the count of in-flight requests for
> the root device and inflight[1] is always 0. Can we make inflight[1] the
> same in the part0 and !part0 cases?

Not sure that would change much. [0] is indeed the exact partition. That's
always the case. [1] is always the root, except if [0] is. We just don't
want to count it twice. So I'd argue they are the same :-)

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2017-08-09 14:13 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-04 15:04 [PATCHSET v2 0/6] block: more scalable inflight tracking Jens Axboe
2017-08-04 15:04 ` [PATCH 1/6] blk-mq-tag: check for NULL rq when iterating tags Jens Axboe
2017-08-04 19:34   ` Bart Van Assche
2017-08-04 19:35     ` Jens Axboe
2017-08-08 22:28   ` Omar Sandoval
2017-08-04 15:04 ` [PATCH 2/6] block: pass in queue to inflight accounting Jens Axboe
2017-08-08 22:30   ` Omar Sandoval
2017-08-04 15:04 ` [PATCH 3/6] block: make part_in_flight() take an array of two ints Jens Axboe
2017-08-04 19:38   ` Bart Van Assche
2017-08-04 19:40     ` Jens Axboe
2017-08-08 22:42   ` Omar Sandoval
2017-08-08 23:45     ` Jens Axboe
2017-08-04 15:04 ` [PATCH 4/6] blk-mq: provide internal in-flight variant Jens Axboe
2017-08-04 19:43   ` Bart Van Assche
2017-08-04 19:45     ` Jens Axboe
2017-08-08 22:46   ` Omar Sandoval
2017-08-04 15:04 ` [PATCH 5/6] blk-mq: enable checking two part inflight counts at the same time Jens Axboe
2017-08-04 19:44   ` Bart Van Assche
2017-08-04 19:56     ` Jens Axboe
2017-08-06  0:17       ` Bart Van Assche
2017-08-08 22:48   ` Omar Sandoval
2017-08-08 23:47     ` Jens Axboe
2017-08-09  7:14       ` Omar Sandoval
2017-08-09 14:13         ` Jens Axboe
2017-08-04 15:04 ` [PATCH 6/6] sbitmap: make sb_for_each_fn() return void Jens Axboe
2017-08-04 19:49   ` Bart Van Assche
2017-08-04 19:50     ` Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.