linux-block.vger.kernel.org archive mirror
* [PATCH RFC V2 0/4] block: add two statistic tables
@ 2020-07-13 21:13 Guoqing Jiang
  2020-07-13 21:13 ` [PATCH RFC V2 1/4] block: add a statistic table for io latency Guoqing Jiang
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Guoqing Jiang @ 2020-07-13 21:13 UTC
  To: axboe; +Cc: linux-block, Guoqing Jiang

Hi,

This version introduces a new io_extra_stats sysfs node that works
together with the config option. With this double safeguard, no one
should pay for the additional overhead unless they really need the
statistic tables.

If it doesn't make sense to have both a kernel config option and a sysfs
node, I will drop the config option. Reviews and comments are welcome.

Thanks,
Guoqing

RFC V1 -> RFC V2:
* don't call ktime_get_ns and drop unnecessary patches.
* add io_extra_stats to avoid potential overhead.

RFC V1 at https://marc.info/?l=linux-block&m=159419516730386&w=2

Guoqing Jiang (4):
  block: add a statistic table for io latency
  block: add a statistic table for io sector
  block: add io_extra_stats node
  block: call blk_additional_{latency,sector} only when io_extra_stats
    is true

 Documentation/block/queue-sysfs.rst |  6 ++++
 block/Kconfig                       |  9 +++++
 block/blk-core.c                    | 54 +++++++++++++++++++++++++++++
 block/blk-sysfs.c                   |  8 +++++
 block/genhd.c                       | 47 +++++++++++++++++++++++++
 include/linux/blkdev.h              |  2 ++
 include/linux/part_stat.h           |  8 +++++
 7 files changed, 134 insertions(+)

-- 
2.17.1



* [PATCH RFC V2 1/4] block: add a statistic table for io latency
  2020-07-13 21:13 [PATCH RFC V2 0/4] block: add two statistic tables Guoqing Jiang
@ 2020-07-13 21:13 ` Guoqing Jiang
  2020-07-14  7:43   ` Johannes Thumshirn
  2020-08-11  7:22   ` Danil Kipnis
  2020-07-13 21:13 ` [PATCH RFC V2 2/4] block: add a statistic table for io sector Guoqing Jiang
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 10+ messages in thread
From: Guoqing Jiang @ 2020-07-13 21:13 UTC
  To: axboe; +Cc: linux-block, Guoqing Jiang, Florian-Ewald Mueller

Usually, we check the status of a block device by reading its stat
file, but that file only reports the accumulated time spent on I/O.
We would like finer-grained statistics, such as the number of I/Os
in each latency range, which help diagnose whether there is a
hardware issue.

A new config option is also introduced to control whether the
additional statistics are collected; the next patch reuses it for
the I/O size statistics.

Signed-off-by: Florian-Ewald Mueller <florian-ewald.mueller@cloud.ionos.com>
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
---
 block/Kconfig             |  8 ++++++++
 block/blk-core.c          | 34 ++++++++++++++++++++++++++++++++++
 block/genhd.c             | 26 ++++++++++++++++++++++++++
 include/linux/part_stat.h |  7 +++++++
 4 files changed, 75 insertions(+)

diff --git a/block/Kconfig b/block/Kconfig
index bbad5e8bbffe..360f63111e2d 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -176,6 +176,14 @@ config BLK_DEBUG_FS
 	Unless you are building a kernel for a tiny system, you should
 	say Y here.
 
+config BLK_ADDITIONAL_DISKSTAT
+	bool "Block layer additional diskstat"
+	default n
+	help
+	Enabling this option adds io latency statistics for each block device.
+
+	If unsure, say N.
+
 config BLK_DEBUG_FS_ZONED
        bool
        default BLK_DEBUG_FS && BLK_DEV_ZONED
diff --git a/block/blk-core.c b/block/blk-core.c
index d9d632639bd1..036eb04782de 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1411,6 +1411,34 @@ static void update_io_ticks(struct hd_struct *part, unsigned long now, bool end)
 	}
 }
 
+#ifdef CONFIG_BLK_ADDITIONAL_DISKSTAT
+/*
+ * Either account additional stat for request if req is not NULL or account for bio.
+ */
+static void blk_additional_latency(struct hd_struct *part, const int sgrp,
+				   struct request *req, unsigned long start_jiffies)
+{
+	unsigned int idx;
+	unsigned long duration, now = READ_ONCE(jiffies);
+
+	if (req)
+		duration = jiffies_to_nsecs(now) - req->start_time_ns;
+	else
+		duration = jiffies_to_nsecs(now - start_jiffies);
+
+	duration /= NSEC_PER_MSEC;
+	duration /= HZ_TO_MSEC_NUM;
+	if (likely(duration > 0)) {
+		idx = ilog2(duration);
+		if (idx > ADD_STAT_NUM - 1)
+			idx = ADD_STAT_NUM - 1;
+	} else
+		idx = 0;
+	part_stat_inc(part, latency_table[idx][sgrp]);
+
+}
+#endif
+
 static void blk_account_io_completion(struct request *req, unsigned int bytes)
 {
 	if (req->part && blk_do_io_stat(req)) {
@@ -1440,6 +1468,9 @@ void blk_account_io_done(struct request *req, u64 now)
 		part = req->part;
 
 		update_io_ticks(part, jiffies, true);
+#ifdef CONFIG_BLK_ADDITIONAL_DISKSTAT
+		blk_additional_latency(part, sgrp, req, 0);
+#endif
 		part_stat_inc(part, ios[sgrp]);
 		part_stat_add(part, nsecs[sgrp], now - req->start_time_ns);
 		part_stat_unlock();
@@ -1488,6 +1519,9 @@ void disk_end_io_acct(struct gendisk *disk, unsigned int op,
 
 	part_stat_lock();
 	update_io_ticks(part, now, true);
+#ifdef CONFIG_BLK_ADDITIONAL_DISKSTAT
+	blk_additional_latency(part, sgrp, NULL, start_time);
+#endif
 	part_stat_add(part, nsecs[sgrp], jiffies_to_nsecs(duration));
 	part_stat_local_dec(part, in_flight[op_is_write(op)]);
 	part_stat_unlock();
diff --git a/block/genhd.c b/block/genhd.c
index c42a49f2f537..f5d2f110fb34 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1420,6 +1420,29 @@ static struct device_attribute dev_attr_fail_timeout =
 	__ATTR(io-timeout-fail, 0644, part_timeout_show, part_timeout_store);
 #endif
 
+#ifdef CONFIG_BLK_ADDITIONAL_DISKSTAT
+static ssize_t io_latency_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct hd_struct *p = dev_to_part(dev);
+	size_t count = 0;
+	int i, sgrp;
+
+	for (i = 0; i < ADD_STAT_NUM; i++) {
+		count += scnprintf(buf + count, PAGE_SIZE - count, "%5d ms: ",
+				   (1 << i) * HZ_TO_MSEC_NUM);
+		for (sgrp = 0; sgrp < NR_STAT_GROUPS; sgrp++)
+			count += scnprintf(buf + count, PAGE_SIZE - count, "%lu ",
+					   part_stat_read(p, latency_table[i][sgrp]));
+		count += scnprintf(buf + count, PAGE_SIZE - count, "\n");
+	}
+
+	return count;
+}
+
+static struct device_attribute dev_attr_io_latency =
+	__ATTR(io_latency, 0444, io_latency_show, NULL);
+#endif
+
 static struct attribute *disk_attrs[] = {
 	&dev_attr_range.attr,
 	&dev_attr_ext_range.attr,
@@ -1438,6 +1461,9 @@ static struct attribute *disk_attrs[] = {
 #endif
 #ifdef CONFIG_FAIL_IO_TIMEOUT
 	&dev_attr_fail_timeout.attr,
+#endif
+#ifdef CONFIG_BLK_ADDITIONAL_DISKSTAT
+	&dev_attr_io_latency.attr,
 #endif
 	NULL
 };
diff --git a/include/linux/part_stat.h b/include/linux/part_stat.h
index 24125778ef3e..fe3def8c69d7 100644
--- a/include/linux/part_stat.h
+++ b/include/linux/part_stat.h
@@ -9,6 +9,13 @@ struct disk_stats {
 	unsigned long sectors[NR_STAT_GROUPS];
 	unsigned long ios[NR_STAT_GROUPS];
 	unsigned long merges[NR_STAT_GROUPS];
+#ifdef CONFIG_BLK_ADDITIONAL_DISKSTAT
+/*
+ * We measure latency (ms) for 1, 2, ..., 1024 and >=1024.
+ */
+#define ADD_STAT_NUM	12
+	unsigned long latency_table[ADD_STAT_NUM][NR_STAT_GROUPS];
+#endif
 	unsigned long io_ticks;
 	local_t in_flight[2];
 };
-- 
2.17.1



* [PATCH RFC V2 2/4] block: add a statistic table for io sector
  2020-07-13 21:13 [PATCH RFC V2 0/4] block: add two statistic tables Guoqing Jiang
  2020-07-13 21:13 ` [PATCH RFC V2 1/4] block: add a statistic table for io latency Guoqing Jiang
@ 2020-07-13 21:13 ` Guoqing Jiang
  2020-08-11 15:04   ` Aleksei Marov
  2020-07-13 21:13 ` [PATCH RFC V2 3/4] block: add io_extra_stats node Guoqing Jiang
  2020-07-13 21:13 ` [PATCH RFC V2 4/4] block: call blk_additional_{latency,sector} only when io_extra_stats is true Guoqing Jiang
  3 siblings, 1 reply; 10+ messages in thread
From: Guoqing Jiang @ 2020-07-13 21:13 UTC
  To: axboe; +Cc: linux-block, Guoqing Jiang, Florian-Ewald Mueller

With the sector table, we can see the distribution of I/O sizes
issued by the upper layers, which gives us the opportunity to tune
performance for the most frequently issued I/O sizes.

Signed-off-by: Florian-Ewald Mueller <florian-ewald.mueller@cloud.ionos.com>
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
---
 block/Kconfig             |  3 ++-
 block/blk-core.c          | 16 ++++++++++++++++
 block/genhd.c             | 21 +++++++++++++++++++++
 include/linux/part_stat.h |  3 ++-
 4 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/block/Kconfig b/block/Kconfig
index 360f63111e2d..c9b9f99152d8 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -180,7 +180,8 @@ config BLK_ADDITIONAL_DISKSTAT
 	bool "Block layer additional diskstat"
 	default n
 	help
-	Enabling this option adds io latency statistics for each block device.
+	Enabling this option adds io latency and io size statistics for each
+	block device.
 
 	If unsure, say N.
 
diff --git a/block/blk-core.c b/block/blk-core.c
index 036eb04782de..b67aedfbcefc 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1437,6 +1437,16 @@ static void blk_additional_latency(struct hd_struct *part, const int sgrp,
 	part_stat_inc(part, latency_table[idx][sgrp]);
 
 }
+
+static void blk_additional_sector(struct hd_struct *part, const int sgrp,
+				  unsigned int sectors)
+{
+	unsigned int KB = sectors / 2, idx;
+
+	idx = (KB > 0) ? ilog2(KB) : 0;
+	idx = (idx > (ADD_STAT_NUM - 1)) ? (ADD_STAT_NUM - 1) : idx;
+	part_stat_inc(part, size_table[idx][sgrp]);
+}
 #endif
 
 static void blk_account_io_completion(struct request *req, unsigned int bytes)
@@ -1447,6 +1457,9 @@ static void blk_account_io_completion(struct request *req, unsigned int bytes)
 
 		part_stat_lock();
 		part = req->part;
+#ifdef CONFIG_BLK_ADDITIONAL_DISKSTAT
+		blk_additional_sector(part, sgrp, bytes >> SECTOR_SHIFT);
+#endif
 		part_stat_add(part, sectors[sgrp], bytes >> 9);
 		part_stat_unlock();
 	}
@@ -1502,6 +1515,9 @@ unsigned long disk_start_io_acct(struct gendisk *disk, unsigned int sectors,
 	update_io_ticks(part, now, false);
 	part_stat_inc(part, ios[sgrp]);
 	part_stat_add(part, sectors[sgrp], sectors);
+#ifdef CONFIG_BLK_ADDITIONAL_DISKSTAT
+	blk_additional_sector(part, sgrp, sectors);
+#endif
 	part_stat_local_inc(part, in_flight[op_is_write(op)]);
 	part_stat_unlock();
 
diff --git a/block/genhd.c b/block/genhd.c
index f5d2f110fb34..cb9394521a8f 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1441,6 +1441,26 @@ static ssize_t io_latency_show(struct device *dev, struct device_attribute *attr
 
 static struct device_attribute dev_attr_io_latency =
 	__ATTR(io_latency, 0444, io_latency_show, NULL);
+
+static ssize_t io_size_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct hd_struct *p = dev_to_part(dev);
+	size_t count = 0;
+	int i, sgrp;
+
+	for (i = 0; i < ADD_STAT_NUM; i++) {
+		count += scnprintf(buf + count, PAGE_SIZE - count, "%5d KB: ", 1 << i);
+		for (sgrp = 0; sgrp < NR_STAT_GROUPS; sgrp++)
+			count += scnprintf(buf + count, PAGE_SIZE - count, "%lu ",
+					   part_stat_read(p, size_table[i][sgrp]));
+		count += scnprintf(buf + count, PAGE_SIZE - count, "\n");
+	}
+
+	return count;
+}
+
+static struct device_attribute dev_attr_io_size =
+	__ATTR(io_size, 0444, io_size_show, NULL);
 #endif
 
 static struct attribute *disk_attrs[] = {
@@ -1464,6 +1484,7 @@ static struct attribute *disk_attrs[] = {
 #endif
 #ifdef CONFIG_BLK_ADDITIONAL_DISKSTAT
 	&dev_attr_io_latency.attr,
+	&dev_attr_io_size.attr,
 #endif
 	NULL
 };
diff --git a/include/linux/part_stat.h b/include/linux/part_stat.h
index fe3def8c69d7..2b056cd70d1f 100644
--- a/include/linux/part_stat.h
+++ b/include/linux/part_stat.h
@@ -11,10 +11,11 @@ struct disk_stats {
 	unsigned long merges[NR_STAT_GROUPS];
 #ifdef CONFIG_BLK_ADDITIONAL_DISKSTAT
 /*
- * We measure latency (ms) for 1, 2, ..., 1024 and >=1024.
+ * We measure latency (ms) and size (sector) for 1, 2, ..., 1024 and >=1024.
  */
 #define ADD_STAT_NUM	12
 	unsigned long latency_table[ADD_STAT_NUM][NR_STAT_GROUPS];
+	unsigned long size_table[ADD_STAT_NUM][NR_STAT_GROUPS];
 #endif
 	unsigned long io_ticks;
 	local_t in_flight[2];
-- 
2.17.1



* [PATCH RFC V2 3/4] block: add io_extra_stats node
  2020-07-13 21:13 [PATCH RFC V2 0/4] block: add two statistic tables Guoqing Jiang
  2020-07-13 21:13 ` [PATCH RFC V2 1/4] block: add a statistic table for io latency Guoqing Jiang
  2020-07-13 21:13 ` [PATCH RFC V2 2/4] block: add a statistic table for io sector Guoqing Jiang
@ 2020-07-13 21:13 ` Guoqing Jiang
  2020-07-13 21:13 ` [PATCH RFC V2 4/4] block: call blk_additional_{latency,sector} only when io_extra_stats is true Guoqing Jiang
  3 siblings, 0 replies; 10+ messages in thread
From: Guoqing Jiang @ 2020-07-13 21:13 UTC
  To: axboe; +Cc: linux-block, Guoqing Jiang

Even though a Kconfig option (default N) controls the accounting of
the additional data, the option could still be enabled inadvertently
by users who don't care about I/O size and latency, and they would
then pay the additional overhead. So introduce a dedicated sysfs node
to avoid such a mistake.

Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
---
 Documentation/block/queue-sysfs.rst | 6 ++++++
 block/blk-sysfs.c                   | 8 ++++++++
 include/linux/blkdev.h              | 2 ++
 3 files changed, 16 insertions(+)

diff --git a/Documentation/block/queue-sysfs.rst b/Documentation/block/queue-sysfs.rst
index 6a8513af9201..e7b5e0d77385 100644
--- a/Documentation/block/queue-sysfs.rst
+++ b/Documentation/block/queue-sysfs.rst
@@ -99,6 +99,12 @@ iostats (RW)
 This file is used to control (on/off) the iostats accounting of the
 disk.
 
+io_extra_stats (RW)
+-------------------
+This file is used to control (on/off) the additional accounting of the
+io size and io latency of disk, and BLK_ADDITIONAL_DISKSTAT should be
+enabled if you want the additional accounting.
+
 logical_block_size (RO)
 -----------------------
 This is the logical block size of the device, in bytes.
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index be67952e7be2..98bd788e32c3 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -287,6 +287,7 @@ queue_store_##name(struct request_queue *q, const char *page, size_t count) \
 QUEUE_SYSFS_BIT_FNS(nonrot, NONROT, 1);
 QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
 QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
+QUEUE_SYSFS_BIT_FNS(io_extra_stats, IO_EXTRA_STAT, 0);
 #undef QUEUE_SYSFS_BIT_FNS
 
 static ssize_t queue_zoned_show(struct request_queue *q, char *page)
@@ -686,6 +687,12 @@ static struct queue_sysfs_entry queue_iostats_entry = {
 	.store = queue_store_iostats,
 };
 
+static struct queue_sysfs_entry queue_io_extra_stats_entry = {
+	.attr = {.name = "io_extra_stats", .mode = 0644 },
+	.show = queue_show_io_extra_stats,
+	.store = queue_store_io_extra_stats,
+};
+
 static struct queue_sysfs_entry queue_random_entry = {
 	.attr = {.name = "add_random", .mode = 0644 },
 	.show = queue_show_random,
@@ -777,6 +784,7 @@ static struct attribute *queue_attrs[] = {
 	&queue_wb_lat_entry.attr,
 	&queue_poll_delay_entry.attr,
 	&queue_io_timeout_entry.attr,
+	&queue_io_extra_stats_entry.attr,
 #ifdef CONFIG_BLK_DEV_THROTTLING_LOW
 	&throtl_sample_time_entry.attr,
 #endif
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 69ad13dacd48..640190678bbc 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -610,6 +610,7 @@ struct request_queue {
 #define QUEUE_FLAG_PCI_P2PDMA	25	/* device supports PCI p2p requests */
 #define QUEUE_FLAG_ZONE_RESETALL 26	/* supports Zone Reset All */
 #define QUEUE_FLAG_RQ_ALLOC_TIME 27	/* record rq->alloc_time_ns */
+#define QUEUE_FLAG_IO_EXTRA_STAT 28	/* extra IO accounting for latency and size */
 
 #define QUEUE_FLAG_MQ_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_SAME_COMP))
@@ -652,6 +653,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
 #define blk_queue_pm_only(q)	atomic_read(&(q)->pm_only)
 #define blk_queue_fua(q)	test_bit(QUEUE_FLAG_FUA, &(q)->queue_flags)
 #define blk_queue_registered(q)	test_bit(QUEUE_FLAG_REGISTERED, &(q)->queue_flags)
+#define blk_queue_extra_io_stat(q) test_bit(QUEUE_FLAG_IO_EXTRA_STAT, &(q)->queue_flags)
 
 extern void blk_set_pm_only(struct request_queue *q);
 extern void blk_clear_pm_only(struct request_queue *q);
-- 
2.17.1



* [PATCH RFC V2 4/4] block: call blk_additional_{latency,sector} only when io_extra_stats is true
  2020-07-13 21:13 [PATCH RFC V2 0/4] block: add two statistic tables Guoqing Jiang
                   ` (2 preceding siblings ...)
  2020-07-13 21:13 ` [PATCH RFC V2 3/4] block: add io_extra_stats node Guoqing Jiang
@ 2020-07-13 21:13 ` Guoqing Jiang
  3 siblings, 0 replies; 10+ messages in thread
From: Guoqing Jiang @ 2020-07-13 21:13 UTC
  To: axboe; +Cc: linux-block, Guoqing Jiang

If BLK_ADDITIONAL_DISKSTAT is enabled carelessly, users who don't want
the additional overhead still pay for it.

Check the io_extra_stats queue flag before calling
blk_additional_{latency,sector}, which guarantees that only users who
explicitly enable the attribute account the additional data.

Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
---
 block/blk-core.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index b67aedfbcefc..171e99ed820b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1458,7 +1458,8 @@ static void blk_account_io_completion(struct request *req, unsigned int bytes)
 		part_stat_lock();
 		part = req->part;
 #ifdef CONFIG_BLK_ADDITIONAL_DISKSTAT
-		blk_additional_sector(part, sgrp, bytes >> SECTOR_SHIFT);
+		if (blk_queue_extra_io_stat(req->q))
+			blk_additional_sector(part, sgrp, bytes >> SECTOR_SHIFT);
 #endif
 		part_stat_add(part, sectors[sgrp], bytes >> 9);
 		part_stat_unlock();
@@ -1482,7 +1483,8 @@ void blk_account_io_done(struct request *req, u64 now)
 
 		update_io_ticks(part, jiffies, true);
 #ifdef CONFIG_BLK_ADDITIONAL_DISKSTAT
-		blk_additional_latency(part, sgrp, req, 0);
+		if (blk_queue_extra_io_stat(req->q))
+			blk_additional_latency(part, sgrp, req, 0);
 #endif
 		part_stat_inc(part, ios[sgrp]);
 		part_stat_add(part, nsecs[sgrp], now - req->start_time_ns);
@@ -1516,7 +1518,8 @@ unsigned long disk_start_io_acct(struct gendisk *disk, unsigned int sectors,
 	part_stat_inc(part, ios[sgrp]);
 	part_stat_add(part, sectors[sgrp], sectors);
 #ifdef CONFIG_BLK_ADDITIONAL_DISKSTAT
-	blk_additional_sector(part, sgrp, sectors);
+	if (blk_queue_extra_io_stat(disk->queue))
+		blk_additional_sector(part, sgrp, sectors);
 #endif
 	part_stat_local_inc(part, in_flight[op_is_write(op)]);
 	part_stat_unlock();
@@ -1536,7 +1539,8 @@ void disk_end_io_acct(struct gendisk *disk, unsigned int op,
 	part_stat_lock();
 	update_io_ticks(part, now, true);
 #ifdef CONFIG_BLK_ADDITIONAL_DISKSTAT
-	blk_additional_latency(part, sgrp, NULL, start_time);
+	if (blk_queue_extra_io_stat(disk->queue))
+		blk_additional_latency(part, sgrp, NULL, start_time);
 #endif
 	part_stat_add(part, nsecs[sgrp], jiffies_to_nsecs(duration));
 	part_stat_local_dec(part, in_flight[op_is_write(op)]);
-- 
2.17.1



* Re: [PATCH RFC V2 1/4] block: add a statistic table for io latency
  2020-07-13 21:13 ` [PATCH RFC V2 1/4] block: add a statistic table for io latency Guoqing Jiang
@ 2020-07-14  7:43   ` Johannes Thumshirn
  2020-07-14  8:25     ` Guoqing Jiang
  2020-08-11  7:22   ` Danil Kipnis
  1 sibling, 1 reply; 10+ messages in thread
From: Johannes Thumshirn @ 2020-07-14  7:43 UTC
  To: Guoqing Jiang, axboe; +Cc: linux-block, Florian-Ewald Mueller

On 13/07/2020 23:14, Guoqing Jiang wrote:
> diff --git a/block/blk-core.c b/block/blk-core.c
> index d9d632639bd1..036eb04782de 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1411,6 +1411,34 @@ static void update_io_ticks(struct hd_struct *part, unsigned long now, bool end)
>  	}
>  }
>  
> +#ifdef CONFIG_BLK_ADDITIONAL_DISKSTAT
> +/*
> + * Either account additional stat for request if req is not NULL or account for bio.
> + */
> +static void blk_additional_latency(struct hd_struct *part, const int sgrp,
> +				   struct request *req, unsigned long start_jiffies)
> +{
> +	unsigned int idx;
> +	unsigned long duration, now = READ_ONCE(jiffies);
> +
> +	if (req)
> +		duration = jiffies_to_nsecs(now) - req->start_time_ns;
> +	else
> +		duration = jiffies_to_nsecs(now - start_jiffies);
> +
> +	duration /= NSEC_PER_MSEC;
> +	duration /= HZ_TO_MSEC_NUM;
> +	if (likely(duration > 0)) {
> +		idx = ilog2(duration);
> +		if (idx > ADD_STAT_NUM - 1)
> +			idx = ADD_STAT_NUM - 1;
> +	} else
> +		idx = 0;
> +	part_stat_inc(part, latency_table[idx][sgrp]);
> +
> +}
> +#endif
> +
>  static void blk_account_io_completion(struct request *req, unsigned int bytes)
>  {
>  	if (req->part && blk_do_io_stat(req)) {
> @@ -1440,6 +1468,9 @@ void blk_account_io_done(struct request *req, u64 now)
>  		part = req->part;
>  
>  		update_io_ticks(part, jiffies, true);
> +#ifdef CONFIG_BLK_ADDITIONAL_DISKSTAT
> +		blk_additional_latency(part, sgrp, req, 0);
> +#endif

Not commenting on the general idea here but only the code. The above introduces quite a
lot of ifdefs in code. Please at least move the #ifdef CONFIG_BLK_ADDITIONAL_DISKSTAT
into the function body of blk_additional_latency() so you don't need any ifdefs at the
call sites.


* Re: [PATCH RFC V2 1/4] block: add a statistic table for io latency
  2020-07-14  7:43   ` Johannes Thumshirn
@ 2020-07-14  8:25     ` Guoqing Jiang
  0 siblings, 0 replies; 10+ messages in thread
From: Guoqing Jiang @ 2020-07-14  8:25 UTC
  To: Johannes Thumshirn, axboe; +Cc: linux-block, Florian-Ewald Mueller

On 7/14/20 9:43 AM, Johannes Thumshirn wrote:
> On 13/07/2020 23:14, Guoqing Jiang wrote:
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index d9d632639bd1..036eb04782de 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -1411,6 +1411,34 @@ static void update_io_ticks(struct hd_struct *part, unsigned long now, bool end)
>>   	}
>>   }
>>   
>> +#ifdef CONFIG_BLK_ADDITIONAL_DISKSTAT
>> +/*
>> + * Either account additional stat for request if req is not NULL or account for bio.
>> + */
>> +static void blk_additional_latency(struct hd_struct *part, const int sgrp,
>> +				   struct request *req, unsigned long start_jiffies)
>> +{
>> +	unsigned int idx;
>> +	unsigned long duration, now = READ_ONCE(jiffies);
>> +
>> +	if (req)
>> +		duration = jiffies_to_nsecs(now) - req->start_time_ns;
>> +	else
>> +		duration = jiffies_to_nsecs(now - start_jiffies);
>> +
>> +	duration /= NSEC_PER_MSEC;
>> +	duration /= HZ_TO_MSEC_NUM;
>> +	if (likely(duration > 0)) {
>> +		idx = ilog2(duration);
>> +		if (idx > ADD_STAT_NUM - 1)
>> +			idx = ADD_STAT_NUM - 1;
>> +	} else
>> +		idx = 0;
>> +	part_stat_inc(part, latency_table[idx][sgrp]);
>> +
>> +}
>> +#endif
>> +
>>   static void blk_account_io_completion(struct request *req, unsigned int bytes)
>>   {
>>   	if (req->part && blk_do_io_stat(req)) {
>> @@ -1440,6 +1468,9 @@ void blk_account_io_done(struct request *req, u64 now)
>>   		part = req->part;
>>   
>>   		update_io_ticks(part, jiffies, true);
>> +#ifdef CONFIG_BLK_ADDITIONAL_DISKSTAT
>> +		blk_additional_latency(part, sgrp, req, 0);
>> +#endif
> Not commenting on the general idea here but only the code. The above introduces quite a
> lot of ifdefs in code. Please at least move the #ifdef CONFIG_BLK_ADDITIONAL_DISKSTAT
> into the function body of blk_additional_latency() so you don't need any ifdefs at the
> call sites.

Sure, will do it, thanks for your suggestion.

Thanks,
Guoqing


* Re: [PATCH RFC V2 1/4] block: add a statistic table for io latency
  2020-07-13 21:13 ` [PATCH RFC V2 1/4] block: add a statistic table for io latency Guoqing Jiang
  2020-07-14  7:43   ` Johannes Thumshirn
@ 2020-08-11  7:22   ` Danil Kipnis
  1 sibling, 0 replies; 10+ messages in thread
From: Danil Kipnis @ 2020-08-11  7:22 UTC
  To: Guoqing Jiang; +Cc: Jens Axboe, linux-block, Florian-Ewald Mueller

On Mon, Jul 13, 2020 at 11:13 PM Guoqing Jiang
<guoqing.jiang@cloud.ionos.com> wrote:
>
> Usually, we get the status of block device by cat stat file,
> but we can only know the total time with that file. And we
> would like to know more accurate statistic, such as each
> latency range, which helps people to diagnose if there is
> issue about the hardware.
>
> Also a new config option is introduced to control if people
> want to know the additional statistics or not, and we use
> the option for io sector in next patch.
>
> Signed-off-by: Florian-Ewald Mueller <florian-ewald.mueller@cloud.ionos.com>
> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
> [...]
>
Hi,

This feature is very useful to analyse io performance in a cluster of
Linux machines. For example an io is generated in the block layer of a
VM, enters the block layer of the host, passes through a couple of
block devices, is then sent over a network to a number of remote
machines, enters the block layer there, crosses yet another couple of
block devices and finally gets submitted to the disks. Then
confirmations travel all the way back to the block layer of the host
and at some point bio_endio is called in the vm.

- Hey folks, a lower accumulated io latency would be nice.
- NP, where do we start?
- ...
- Ping?


* Re: [PATCH RFC V2 2/4] block: add a statistic table for io sector
  2020-07-13 21:13 ` [PATCH RFC V2 2/4] block: add a statistic table for io sector Guoqing Jiang
@ 2020-08-11 15:04   ` Aleksei Marov
  2020-08-11 15:48     ` Guoqing Jiang
  0 siblings, 1 reply; 10+ messages in thread
From: Aleksei Marov @ 2020-08-11 15:04 UTC
  To: Guoqing Jiang, axboe; +Cc: linux-block, Florian-Ewald Mueller

Is it possible to collect the very same stats (distribution of sizes and
distribution of latencies) without static tables in the kernel, using eBPF
tracing instead? For example, with https://github.com/iovisor/bcc/tree/master/tools/:
* biolatency for the latency distribution
* bitesize for the size distribution
Please have a look at these and similar tools (biolatpcts, biosnoop), and check
their examples:
https://github.com/iovisor/bcc/blob/master/tools/bitesize_example.txt
https://github.com/iovisor/bcc/blob/master/tools/biolatency_example.txt
Let me know how these differ from your stats.

Best Regards
Aleksei Marov

On Mon, 2020-07-13 at 23:13 +0200, Guoqing Jiang wrote:
> With the sector table, we can know the distribution of
> IO sizes issued from the upper layer, which gives us the
> opportunity to tune performance based on the most
> frequently issued IO sizes.
> 
> Signed-off-by: Florian-Ewald Mueller <florian-ewald.mueller@cloud.ionos.com>
> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
> ---
>  block/Kconfig             |  3 ++-
>  block/blk-core.c          | 16 ++++++++++++++++
>  block/genhd.c             | 21 +++++++++++++++++++++
>  include/linux/part_stat.h |  3 ++-
>  4 files changed, 41 insertions(+), 2 deletions(-)
> 
> diff --git a/block/Kconfig b/block/Kconfig
> index 360f63111e2d..c9b9f99152d8 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -180,7 +180,8 @@ config BLK_ADDITIONAL_DISKSTAT
>  	bool "Block layer additional diskstat"
>  	default n
>  	help
> -	Enabling this option adds io latency statistics for each block device.
> +	Enabling this option adds io latency and io size statistics for each
> +	block device.
>  
>  	If unsure, say N.
>  
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 036eb04782de..b67aedfbcefc 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1437,6 +1437,16 @@ static void blk_additional_latency(struct hd_struct *part, const int sgrp,
>  	part_stat_inc(part, latency_table[idx][sgrp]);
>  
>  }
> +
> +static void blk_additional_sector(struct hd_struct *part, const int sgrp,
> +				  unsigned int sectors)
> +{
> +	unsigned int KB = sectors / 2, idx;
> +
> +	idx = (KB > 0) ? ilog2(KB) : 0;
> +	idx = (idx > (ADD_STAT_NUM - 1)) ? (ADD_STAT_NUM - 1) : idx;
> +	part_stat_inc(part, size_table[idx][sgrp]);
> +}
>  #endif
>  
>  static void blk_account_io_completion(struct request *req, unsigned int bytes)
> @@ -1447,6 +1457,9 @@ static void blk_account_io_completion(struct request *req, unsigned int bytes)
>  
>  		part_stat_lock();
>  		part = req->part;
> +#ifdef CONFIG_BLK_ADDITIONAL_DISKSTAT
> +		blk_additional_sector(part, sgrp, bytes >> SECTOR_SHIFT);
> +#endif
>  		part_stat_add(part, sectors[sgrp], bytes >> 9);
>  		part_stat_unlock();
>  	}
> @@ -1502,6 +1515,9 @@ unsigned long disk_start_io_acct(struct gendisk *disk, unsigned int sectors,
>  	update_io_ticks(part, now, false);
>  	part_stat_inc(part, ios[sgrp]);
>  	part_stat_add(part, sectors[sgrp], sectors);
> +#ifdef CONFIG_BLK_ADDITIONAL_DISKSTAT
> +	blk_additional_sector(part, sgrp, sectors);
> +#endif
>  	part_stat_local_inc(part, in_flight[op_is_write(op)]);
>  	part_stat_unlock();
>  
> diff --git a/block/genhd.c b/block/genhd.c
> index f5d2f110fb34..cb9394521a8f 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -1441,6 +1441,26 @@ static ssize_t io_latency_show(struct device *dev, struct device_attribute *attr
>  
>  static struct device_attribute dev_attr_io_latency =
>  	__ATTR(io_latency, 0444, io_latency_show, NULL);
> +
> +static ssize_t io_size_show(struct device *dev, struct device_attribute *attr, char *buf)
> +{
> +	struct hd_struct *p = dev_to_part(dev);
> +	size_t count = 0;
> +	int i, sgrp;
> +
> +	for (i = 0; i < ADD_STAT_NUM; i++) {
> +		count += scnprintf(buf + count, PAGE_SIZE - count, "%5d KB: ", 1 << i);
> +		for (sgrp = 0; sgrp < NR_STAT_GROUPS; sgrp++)
> +			count += scnprintf(buf + count, PAGE_SIZE - count, "%lu ",
> +					   part_stat_read(p, size_table[i][sgrp]));
> +		count += scnprintf(buf + count, PAGE_SIZE - count, "\n");
> +	}
> +
> +	return count;
> +}
> +
> +static struct device_attribute dev_attr_io_size =
> +	__ATTR(io_size, 0444, io_size_show, NULL);
>  #endif
>  
>  static struct attribute *disk_attrs[] = {
> @@ -1464,6 +1484,7 @@ static struct attribute *disk_attrs[] = {
>  #endif
>  #ifdef CONFIG_BLK_ADDITIONAL_DISKSTAT
>  	&dev_attr_io_latency.attr,
> +	&dev_attr_io_size.attr,
>  #endif
>  	NULL
>  };
> diff --git a/include/linux/part_stat.h b/include/linux/part_stat.h
> index fe3def8c69d7..2b056cd70d1f 100644
> --- a/include/linux/part_stat.h
> +++ b/include/linux/part_stat.h
> @@ -11,10 +11,11 @@ struct disk_stats {
>  	unsigned long merges[NR_STAT_GROUPS];
>  #ifdef CONFIG_BLK_ADDITIONAL_DISKSTAT
>  /*
> - * We measure latency (ms) for 1, 2, ..., 1024 and >=1024.
> + * We measure latency (ms) and size (sector) for 1, 2, ..., 1024 and >=1024.
>   */
>  #define ADD_STAT_NUM	12
>  	unsigned long latency_table[ADD_STAT_NUM][NR_STAT_GROUPS];
> +	unsigned long size_table[ADD_STAT_NUM][NR_STAT_GROUPS];
>  #endif
>  	unsigned long io_ticks;
>  	local_t in_flight[2];
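blk_additional_sector() in the quoted patch converts 512-byte sectors to KB, takes the floor of log2, and clamps the result to the last slot. The userspace sketch below reproduces that mapping so the bucket boundaries can be checked by hand. `ilog2_u()`, `size_bucket()` and `bucket_label_kb()` are hypothetical helpers. `ilog2_u()` is only a stand-in for the kernel's `ilog2()`.

```c
#include <assert.h>

#define ADD_STAT_NUM 12 /* same bucket count as the patch */

/* Userspace stand-in for the kernel's ilog2(): floor(log2(v)) for v > 0. */
unsigned int ilog2_u(unsigned int v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Mirrors blk_additional_sector(): sectors -> KB -> log2 bucket,
 * clamped so that large sizes all land in the last bucket. */
unsigned int size_bucket(unsigned int sectors)
{
	const unsigned int last = ADD_STAT_NUM - 1;
	unsigned int kb = sectors / 2;
	unsigned int idx = (kb > 0) ? ilog2_u(kb) : 0;

	if (idx > last)
		idx = last;
	assert(idx <= last);
	return idx;
}

/* Row label printed by io_size_show() for bucket i ("%5d KB: ", 1 << i). */
unsigned int bucket_label_kb(unsigned int i)
{
	return 1u << i;
}
```

With this mapping, bucket i holds sizes from 2^i KB up to, but not including, 2^(i+1) KB. For example, a 6 KB request (12 sectors) is counted in the row that io_size_show() prints as `4 KB`. The two ends behave differently. A single 512-byte sector rounds down to 0 KB and is counted in the `1 KB` row. The last label is `1 << 11`, so the final row (`2048 KB`) collects every request of 2048 KB or more.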



* Re: [PATCH RFC V2 2/4] block: add a statistic table for io sector
  2020-08-11 15:04   ` Aleksei Marov
@ 2020-08-11 15:48     ` Guoqing Jiang
  0 siblings, 0 replies; 10+ messages in thread
From: Guoqing Jiang @ 2020-08-11 15:48 UTC
  To: Aleksei Marov, axboe; +Cc: linux-block

On 8/11/20 5:04 PM, Aleksei Marov wrote:
> Is it possible to collect the very same stats (distribution of sizes and
> distribution of latencies) with eBPF tracing instead of static maps in the kernel?
> For example, using https://github.com/iovisor/bcc/tree/master/tools/
> * biolatency for the latency distribution
> * bitesize for the size distribution
> Please have a look at these and similar tools (biolatpcts, biosnoop). Check the
> examples they provide:
> https://github.com/iovisor/bcc/blob/master/tools/bitesize_example.txt
> https://github.com/iovisor/bcc/blob/master/tools/biolatency_example.txt
> Let me know how they differ from your stats.

The difference is the cost; please see the link below.

https://marc.info/?l=linux-block&m=159458634517068&w=2

Thanks,
Guoqing


Thread overview: 10+ messages
2020-07-13 21:13 [PATCH RFC V2 0/4] block: add two statistic tables Guoqing Jiang
2020-07-13 21:13 ` [PATCH RFC V2 1/4] block: add a statistic table for io latency Guoqing Jiang
2020-07-14  7:43   ` Johannes Thumshirn
2020-07-14  8:25     ` Guoqing Jiang
2020-08-11  7:22   ` Danil Kipnis
2020-07-13 21:13 ` [PATCH RFC V2 2/4] block: add a statistic table for io sector Guoqing Jiang
2020-08-11 15:04   ` Aleksei Marov
2020-08-11 15:48     ` Guoqing Jiang
2020-07-13 21:13 ` [PATCH RFC V2 3/4] block: add io_extra_stats node Guoqing Jiang
2020-07-13 21:13 ` [PATCH RFC V2 4/4] block: call blk_additional_{latency,sector} only when io_extra_stats is true Guoqing Jiang
