All of lore.kernel.org
 help / color / mirror / Atom feed
From: Gulam Mohamed <gulam.mohamed@oracle.com>
To: linux-block@vger.kernel.org
Cc: axboe@kernel.dk, philipp.reisner@linbit.com,
	lars.ellenberg@linbit.com, christoph.boehmwalder@linbit.com,
	minchan@kernel.org, ngupta@vflare.org, senozhatsky@chromium.org,
	colyli@suse.de, kent.overstreet@gmail.com, agk@redhat.com,
	snitzer@kernel.org, dm-devel@redhat.com, song@kernel.org,
	dan.j.williams@intel.com, vishal.l.verma@intel.com,
	dave.jiang@intel.com, ira.weiny@intel.com, junxiao.bi@oracle.com,
	gulam.mohamed@oracle.com, martin.petersen@oracle.com,
	kch@nvidia.com, drbd-dev@lists.linbit.com,
	linux-kernel@vger.kernel.org, linux-bcache@vger.kernel.org,
	linux-raid@vger.kernel.org, nvdimm@lists.linux.dev,
	konrad.wilk@oracle.com, joe.jin@oracle.com,
	rajesh.sivaramasubramaniom@oracle.com,
	shminderjit.singh@oracle.com
Subject: [PATCH for-6.2/block V3 2/2] block: Change the granularity of io ticks from ms to ns
Date: Wed, 21 Dec 2022 04:05:06 +0000	[thread overview]
Message-ID: <20221221040506.1174644-2-gulam.mohamed@oracle.com> (raw)
In-Reply-To: <20221221040506.1174644-1-gulam.mohamed@oracle.com>

Problem Desc
============
The "iostat" user-space utility was showing %util as 100% for the disks
which has latencies less than a milli-second i.e for latencies in the
range of micro-seconds and below.

Root Cause
==========
The IO accounting in block layer is currently done by updating the
io_ticks in jiffies which is of milli-seconds granularity. Due to this,
for the devices with IO latencies less than a milli-second, the latency
will be accounted as 1 milli-second even-though its in the range of
micro-seconds. This was causing the iostat command to show %util
as 100% which is incorrect.

Recreationg of the issue
========================
Setup
-----
Devices: NVMe 24 devices
Model number: 4610 (Intel)

fio
---
[global]
bs=4K
iodepth=1
direct=1
ioengine=libaio
group_reporting
time_based
runtime=100
thinktime=1ms
numjobs=1
name=raw-write
rw=randrw
ignore_error=EIO:EIO
[job1]
filename=/dev/nvme0n1

iostat o/p
----------

Device   %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
nvme3n1   0.00    0.05    0.00  75.38     0.50     0.00   0.00 100.00

Device   %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
nvme3n1   0.00    0.05    0.00  74.74     0.50     0.00   0.00 100.00

Solution
========
Use ktime_get_ns() to update the disk_stats io_ticks so that the io_ticks
are updated for every io start and end times.

Issues using ktime
==================

Using ktime_get_ns() has a performance overhead as ktime will go ahead
and reads the clock everytime its called. Following test environment
was used by Jens Axboe on which t/io_uring was run which has shown a
high performance drop.

Devices
-------
SSDs: P5800X
No of devices: 24

io_uring config
---------------
Polled IO
iostats: Enabled
Reads: Random
Block Size: 512
QDepth: 128
Batch Submit: 32
Batch Complete: 32
No of Threads: 24

With the above environment and ktime patch, it has shown a performance
drop of ~25% from iostats disabled and ~19% performance drop from current
iostats enabled. This performance drop is high.

Suggestion from Jens Axboe
==========================
Jens Axboe suggested to split the bigger patch into two as follows:

1. In first patch, change all the types from unsigned long to u64, that
   can be done while retaining jiffies.

2. In second patch, add an iostats == 2 setting, which enables the higher
   resolution mode using ktime. We'd still default to 1, lower granularity
   iostats enabled.

Fix details
===========
1. Use ktime_get_ns() to get the current nano-seconds to update the
   io_ticks for start and end time stamps in block layer for io accounting

2. Create a new setting '2' in sysfs for iostats variable i.e for
   /sys/block/<device-name>/queue/iostats, to enable the iostat (io
   accounting) with nano-seconds (using ktime) granularity. This setting
   should be enabled only if the iostat is needed with high resolution
   mode as it has a high performance drop

3. Earlier available settings were 0 and 1 for disable and enable io
   accounting with milli-seconds granularity (jiffies)

Testing
=======
Ran the t/io_uring command with following setup:

Devices
-------
SSDs: P4610
No of devices: 8

io_uring config
---------------
Polled IO
iostats: Enabled
Reads: Random
Block Size: 512
QDepth: 128
Batch Submit: 32
Batch Complete: 32
No of Threads: 24

io_uring o/p
------------
iostat=0, with patch: Maximum IOPS=10.09M
iostat=1, with patch: Maximum IOPS=9.84M
iostat=2, with patch: Maximum IOPS=9.48M

Changes from V2 to V3
=====================
1. Changed all the required variables data-type to u64 as a first patch
2. Create a new setting '2' for iostats in sysfs in this patch
3. Change the code to get the ktime values when iostat=2, in this patch

Signed-off-by: Gulam Mohamed <gulam.mohamed@oracle.com>
---
 block/blk-core.c                  | 26 +++++++++++++++++----
 block/blk-mq.c                    |  4 ++--
 block/blk-sysfs.c                 | 39 ++++++++++++++++++++++++++++++-
 block/genhd.c                     | 18 ++++++++++----
 drivers/block/drbd/drbd_debugfs.c | 12 ++++++++--
 drivers/block/zram/zram_drv.c     |  3 ++-
 drivers/md/dm.c                   | 13 +++++++++--
 include/linux/blkdev.h            |  4 ++++
 8 files changed, 103 insertions(+), 16 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 5670032fe932..0b5e4eb909a5 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -927,6 +927,18 @@ int iocb_bio_iopoll(struct kiocb *kiocb, struct io_comp_batch *iob,
 }
 EXPORT_SYMBOL_GPL(iocb_bio_iopoll);
 
+/*
+ * Get the time based upon the available granularity for io accounting
+ * If the resolution mode is set to precise (2) i.e
+ * (/sys/block/<device>/queue/iostats = 2), then this will return time
+ * in nano-seconds else this will return time in jiffies
+ */
+u64  blk_get_iostat_ticks(struct request_queue *q)
+{
+       return (blk_queue_precise_io_stat(q) ? ktime_get_ns() : jiffies);
+}
+EXPORT_SYMBOL_GPL(blk_get_iostat_ticks);
+
 void update_io_ticks(struct block_device *part, u64 now, bool end)
 {
 	u64 stamp;
@@ -968,20 +980,26 @@ EXPORT_SYMBOL(bdev_start_io_acct);
 u64 bio_start_io_acct(struct bio *bio)
 {
 	return bdev_start_io_acct(bio->bi_bdev, bio_sectors(bio),
-				  bio_op(bio), jiffies);
+				  bio_op(bio),
+				  blk_get_iostat_ticks(bio->bi_bdev->bd_queue));
 }
 EXPORT_SYMBOL_GPL(bio_start_io_acct);
 
 void bdev_end_io_acct(struct block_device *bdev, enum req_op op,
 		      u64 start_time)
 {
+	u64 now;
+	u64 duration;
+	struct request_queue *q = bdev_get_queue(bdev);
 	const int sgrp = op_stat_group(op);
-	u64 now = READ_ONCE(jiffies);
-	u64 duration = (unsigned long)now -(unsigned long) start_time;
+	now = blk_get_iostat_ticks(q);;
+	duration = (unsigned long)now -(unsigned long) start_time;
 
 	part_stat_lock();
 	update_io_ticks(bdev, now, true);
-	part_stat_add(bdev, nsecs[sgrp], jiffies_to_nsecs(duration));
+	part_stat_add(bdev, nsecs[sgrp],
+		(blk_queue_precise_io_stat(q) ? duration :
+		 jiffies_to_nsecs(duration)));
 	part_stat_local_dec(bdev, in_flight[op_is_write(op)]);
 	part_stat_unlock();
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4e6b3ccd4989..5318bf87f17e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -975,7 +975,7 @@ static void __blk_account_io_done(struct request *req, u64 now)
 	const int sgrp = op_stat_group(req_op(req));
 
 	part_stat_lock();
-	update_io_ticks(req->part, jiffies, true);
+	update_io_ticks(req->part, blk_get_iostat_ticks(req->q), true);
 	part_stat_inc(req->part, ios[sgrp]);
 	part_stat_add(req->part, nsecs[sgrp], now - req->start_time_ns);
 	part_stat_unlock();
@@ -1007,7 +1007,7 @@ static void __blk_account_io_start(struct request *rq)
 		rq->part = rq->q->disk->part0;
 
 	part_stat_lock();
-	update_io_ticks(rq->part, jiffies, false);
+	update_io_ticks(rq->part, blk_get_iostat_ticks(rq->q), false);
 	part_stat_unlock();
 }
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 93d9e9c9a6ea..e7959782ce59 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -307,7 +307,6 @@ queue_##name##_store(struct request_queue *q, const char *page, size_t count) \
 
 QUEUE_SYSFS_BIT_FNS(nonrot, NONROT, 1);
 QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
-QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
 QUEUE_SYSFS_BIT_FNS(stable_writes, STABLE_WRITES, 0);
 #undef QUEUE_SYSFS_BIT_FNS
 
@@ -363,6 +362,44 @@ static ssize_t queue_nomerges_store(struct request_queue *q, const char *page,
 	return ret;
 }
 
+static ssize_t queue_iostats_show(struct request_queue *q, char *page)
+{
+       bool iostat = blk_queue_io_stat(q);
+       bool precise_iostat = blk_queue_precise_io_stat(q);
+
+       return queue_var_show(iostat << precise_iostat, page);
+}
+
+static ssize_t queue_iostats_store(struct request_queue *q, const char *page,
+                                  size_t count)
+{
+       unsigned long val;
+       ssize_t ret = -EINVAL;
+
+       ret = queue_var_store(&val, page, count);
+
+       if (ret < 0)
+               return ret;
+
+       if (val == 2) {
+               blk_queue_flag_set(QUEUE_FLAG_IO_STAT, q);
+               blk_queue_flag_set(QUEUE_FLAG_PRECISE_IO_STAT, q);
+               q->disk->part0->bd_stamp = 0;
+       }
+       else if (val == 1) {
+               blk_queue_flag_set(QUEUE_FLAG_IO_STAT, q);
+               blk_queue_flag_clear(QUEUE_FLAG_PRECISE_IO_STAT, q);
+               q->disk->part0->bd_stamp = 0;
+       }
+       else if (val == 0) {
+               blk_queue_flag_clear(QUEUE_FLAG_IO_STAT, q);
+               blk_queue_flag_clear(QUEUE_FLAG_PRECISE_IO_STAT, q);
+               q->disk->part0->bd_stamp = 0;
+       }
+
+       return ret;
+}
+
 static ssize_t queue_rq_affinity_show(struct request_queue *q, char *page)
 {
 	bool set = test_bit(QUEUE_FLAG_SAME_COMP, &q->queue_flags);
diff --git a/block/genhd.c b/block/genhd.c
index 03a96d6473e1..d034219a4683 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -951,6 +951,7 @@ ssize_t part_stat_show(struct device *dev,
 	struct request_queue *q = bdev_get_queue(bdev);
 	struct disk_stats stat;
 	unsigned int inflight;
+	u64 stat_ioticks;
 
 	if (queue_is_mq(q))
 		inflight = blk_mq_in_flight(q, bdev);
@@ -959,10 +960,13 @@ ssize_t part_stat_show(struct device *dev,
 
 	if (inflight) {
 		part_stat_lock();
-		update_io_ticks(bdev, jiffies, true);
+		update_io_ticks(bdev, blk_get_iostat_ticks(q), true);
 		part_stat_unlock();
 	}
 	part_stat_read_all(bdev, &stat);
+	stat_ioticks = (blk_queue_precise_io_stat(q) ?
+				div_u64(stat.io_ticks, NSEC_PER_MSEC) :
+				jiffies_to_msecs(stat.io_ticks));
 	return sprintf(buf,
 		"%8lu %8lu %8llu %8u "
 		"%8lu %8lu %8llu %8u "
@@ -979,7 +983,7 @@ ssize_t part_stat_show(struct device *dev,
 		(unsigned long long)stat.sectors[STAT_WRITE],
 		(unsigned int)div_u64(stat.nsecs[STAT_WRITE], NSEC_PER_MSEC),
 		inflight,
-		jiffies_to_msecs(stat.io_ticks),
+		(unsigned int)stat_ioticks,
 		(unsigned int)div_u64(stat.nsecs[STAT_READ] +
 				      stat.nsecs[STAT_WRITE] +
 				      stat.nsecs[STAT_DISCARD] +
@@ -1217,6 +1221,8 @@ static int diskstats_show(struct seq_file *seqf, void *v)
 	unsigned int inflight;
 	struct disk_stats stat;
 	unsigned long idx;
+	struct request_queue *q;
+	u64 stat_ioticks;
 
 	/*
 	if (&disk_to_dev(gp)->kobj.entry == block_class.devices.next)
@@ -1235,12 +1241,16 @@ static int diskstats_show(struct seq_file *seqf, void *v)
 		else
 			inflight = part_in_flight(hd);
 
+		q = bdev_get_queue(hd);
 		if (inflight) {
 			part_stat_lock();
-			update_io_ticks(hd, jiffies, true);
+			update_io_ticks(hd, blk_get_iostat_ticks(q), true);
 			part_stat_unlock();
 		}
 		part_stat_read_all(hd, &stat);
+		stat_ioticks = (blk_queue_precise_io_stat(q) ?
+				div_u64(stat.io_ticks, NSEC_PER_MSEC) :
+				jiffies_to_msecs(stat.io_ticks));
 		seq_printf(seqf, "%4d %7d %pg "
 			   "%lu %lu %lu %u "
 			   "%lu %lu %lu %u "
@@ -1260,7 +1270,7 @@ static int diskstats_show(struct seq_file *seqf, void *v)
 			   (unsigned int)div_u64(stat.nsecs[STAT_WRITE],
 							NSEC_PER_MSEC),
 			   inflight,
-			   jiffies_to_msecs(stat.io_ticks),
+			   (unsigned int)stat_ioticks,
 			   (unsigned int)div_u64(stat.nsecs[STAT_READ] +
 						 stat.nsecs[STAT_WRITE] +
 						 stat.nsecs[STAT_DISCARD] +
diff --git a/drivers/block/drbd/drbd_debugfs.c b/drivers/block/drbd/drbd_debugfs.c
index a72c096aa5b1..af193bcc4f3a 100644
--- a/drivers/block/drbd/drbd_debugfs.c
+++ b/drivers/block/drbd/drbd_debugfs.c
@@ -97,6 +97,12 @@ static void seq_print_one_request(struct seq_file *m, struct drbd_request *req,
 {
 	/* change anything here, fixup header below! */
 	unsigned int s = req->rq_state;
+	unsigned long start_time;
+	struct request_queue *q = req->device->rq_queue;
+
+	start_time = (blk_queue_precise_io_stat(q) ?
+			nsecs_to_jiffies(req->start_jif) :
+			req->start_jif);
 
 #define RQ_HDR_1 "epoch\tsector\tsize\trw"
 	seq_printf(m, "0x%x\t%llu\t%u\t%s",
@@ -105,7 +111,7 @@ static void seq_print_one_request(struct seq_file *m, struct drbd_request *req,
 		(s & RQ_WRITE) ? "W" : "R");
 
 #define RQ_HDR_2 "\tstart\tin AL\tsubmit"
-	seq_printf(m, "\t%d", jiffies_to_msecs(now - req->start_jif));
+	seq_printf(m, "\t%d", jiffies_to_msecs(now - start_time));
 	seq_print_age_or_dash(m, s & RQ_IN_ACT_LOG, now - req->in_actlog_jif);
 	seq_print_age_or_dash(m, s & RQ_LOCAL_PENDING, now - req->pre_submit_jif);
 
@@ -171,7 +177,9 @@ static void seq_print_waiting_for_AL(struct seq_file *m, struct drbd_resource *r
 			/* if the oldest request does not wait for the activity log
 			 * it is not interesting for us here */
 			if (req && !(req->rq_state & RQ_IN_ACT_LOG))
-				jif = req->start_jif;
+				jif = (blk_queue_precise_io_stat(device->rq_queue) ?
+						nsecs_to_jiffies(req->start_jif):
+						req->start_jif);
 			else
 				req = NULL;
 			spin_unlock_irq(&device->resource->req_lock);
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index da28eb83e6ed..8e22487de7de 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1663,6 +1663,7 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
 	struct zram *zram;
 	struct bio_vec bv;
 	u64 start_time;
+	struct request_queue *q = bdev_get_queue(bdev);
 
 	if (PageTransHuge(page))
 		return -ENOTSUPP;
@@ -1682,7 +1683,7 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
 	bv.bv_offset = 0;
 
 	start_time = bdev_start_io_acct(bdev->bd_disk->part0,
-			SECTORS_PER_PAGE, op, jiffies);
+			SECTORS_PER_PAGE, op, blk_get_iostat_ticks(q));
 	ret = zram_bvec_rw(zram, &bv, index, offset, op, NULL);
 	bdev_end_io_acct(bdev->bd_disk->part0, op, start_time);
 out:
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 011a85ea40da..1bb58a0b8cd1 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -482,7 +482,11 @@ static int dm_blk_ioctl(struct block_device *bdev, fmode_t mode,
 
 u64 dm_start_time_ns_from_clone(struct bio *bio)
 {
-	return jiffies_to_nsecs(clone_to_tio(bio)->io->start_time);
+	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
+	u64 start_time_ns = (blk_queue_precise_io_stat(q) ?
+				clone_to_tio(bio)->io->start_time :
+				jiffies_to_nsecs(clone_to_tio(bio)->io->start_time));
+	return start_time_ns;
 }
 EXPORT_SYMBOL_GPL(dm_start_time_ns_from_clone);
 
@@ -498,6 +502,11 @@ static void dm_io_acct(struct dm_io *io, bool end)
 	struct mapped_device *md = io->md;
 	struct bio *bio = io->orig_bio;
 	unsigned int sectors;
+	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
+
+	u64 start_time = (blk_queue_precise_io_stat(q) ?
+				nsecs_to_jiffies(io->start_time) :
+				io->start_time);
 
 	/*
 	 * If REQ_PREFLUSH set, don't account payload, it will be
@@ -589,7 +598,7 @@ static struct dm_io *alloc_io(struct mapped_device *md, struct bio *bio)
 	io->orig_bio = bio;
 	io->md = md;
 	spin_lock_init(&io->lock);
-	io->start_time = jiffies;
+	io->start_time = blk_get_iostat_ticks(bio->bi_bdev->bd_queue);
 	io->flags = 0;
 
 	if (static_branch_unlikely(&stats_enabled))
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ca94d690d292..0543a536c6e5 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -570,6 +570,7 @@ struct request_queue {
 #define QUEUE_FLAG_NOWAIT       29	/* device supports NOWAIT */
 #define QUEUE_FLAG_SQ_SCHED     30	/* single queue style io dispatch */
 #define QUEUE_FLAG_SKIP_TAGSET_QUIESCE	31 /* quiesce_tagset skip the queue*/
+#define QUEUE_FLAG_PRECISE_IO_STAT 32   /* To enable precise io accounting */
 
 #define QUEUE_FLAG_MQ_DEFAULT	((1UL << QUEUE_FLAG_IO_STAT) |		\
 				 (1UL << QUEUE_FLAG_SAME_COMP) |	\
@@ -578,6 +579,7 @@ struct request_queue {
 void blk_queue_flag_set(unsigned int flag, struct request_queue *q);
 void blk_queue_flag_clear(unsigned int flag, struct request_queue *q);
 bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
+u64  blk_get_iostat_ticks(struct request_queue *q);
 
 #define blk_queue_stopped(q)	test_bit(QUEUE_FLAG_STOPPED, &(q)->queue_flags)
 #define blk_queue_dying(q)	test_bit(QUEUE_FLAG_DYING, &(q)->queue_flags)
@@ -589,6 +591,8 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
 #define blk_queue_stable_writes(q) \
 	test_bit(QUEUE_FLAG_STABLE_WRITES, &(q)->queue_flags)
 #define blk_queue_io_stat(q)	test_bit(QUEUE_FLAG_IO_STAT, &(q)->queue_flags)
+#define blk_queue_precise_io_stat(q)   \
+	test_bit(QUEUE_FLAG_PRECISE_IO_STAT, &(q)->queue_flags)
 #define blk_queue_add_random(q)	test_bit(QUEUE_FLAG_ADD_RANDOM, &(q)->queue_flags)
 #define blk_queue_zone_resetall(q)	\
 	test_bit(QUEUE_FLAG_ZONE_RESETALL, &(q)->queue_flags)
-- 
2.31.1


WARNING: multiple messages have this Message-ID (diff)
From: Gulam Mohamed <gulam.mohamed@oracle.com>
To: linux-block@vger.kernel.org
Cc: nvdimm@lists.linux.dev, shminderjit.singh@oracle.com,
	linux-kernel@vger.kernel.org, song@kernel.org,
	dm-devel@redhat.com, ira.weiny@intel.com, agk@redhat.com,
	drbd-dev@lists.linbit.com, dave.jiang@intel.com,
	christoph.boehmwalder@linbit.com, vishal.l.verma@intel.com,
	konrad.wilk@oracle.com, joe.jin@oracle.com,
	kent.overstreet@gmail.com, ngupta@vflare.org, kch@nvidia.com,
	senozhatsky@chromium.org, gulam.mohamed@oracle.com,
	snitzer@kernel.org, colyli@suse.de, linux-raid@vger.kernel.org,
	linux-bcache@vger.kernel.org, dan.j.williams@intel.com,
	axboe@kernel.dk, martin.petersen@oracle.com,
	rajesh.sivaramasubramaniom@oracle.com,
	philipp.reisner@linbit.com, junxiao.bi@oracle.com,
	minchan@kernel.org, lars.ellenberg@linbit.com
Subject: [dm-devel] [PATCH for-6.2/block V3 2/2] block: Change the granularity of io ticks from ms to ns
Date: Wed, 21 Dec 2022 04:05:06 +0000	[thread overview]
Message-ID: <20221221040506.1174644-2-gulam.mohamed@oracle.com> (raw)
In-Reply-To: <20221221040506.1174644-1-gulam.mohamed@oracle.com>

Problem Desc
============
The "iostat" user-space utility was showing %util as 100% for the disks
which has latencies less than a milli-second i.e for latencies in the
range of micro-seconds and below.

Root Cause
==========
The IO accounting in block layer is currently done by updating the
io_ticks in jiffies which is of milli-seconds granularity. Due to this,
for the devices with IO latencies less than a milli-second, the latency
will be accounted as 1 milli-second even-though its in the range of
micro-seconds. This was causing the iostat command to show %util
as 100% which is incorrect.

Recreationg of the issue
========================
Setup
-----
Devices: NVMe 24 devices
Model number: 4610 (Intel)

fio
---
[global]
bs=4K
iodepth=1
direct=1
ioengine=libaio
group_reporting
time_based
runtime=100
thinktime=1ms
numjobs=1
name=raw-write
rw=randrw
ignore_error=EIO:EIO
[job1]
filename=/dev/nvme0n1

iostat o/p
----------

Device   %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
nvme3n1   0.00    0.05    0.00  75.38     0.50     0.00   0.00 100.00

Device   %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
nvme3n1   0.00    0.05    0.00  74.74     0.50     0.00   0.00 100.00

Solution
========
Use ktime_get_ns() to update the disk_stats io_ticks so that the io_ticks
are updated for every io start and end times.

Issues using ktime
==================

Using ktime_get_ns() has a performance overhead as ktime will go ahead
and reads the clock everytime its called. Following test environment
was used by Jens Axboe on which t/io_uring was run which has shown a
high performance drop.

Devices
-------
SSDs: P5800X
No of devices: 24

io_uring config
---------------
Polled IO
iostats: Enabled
Reads: Random
Block Size: 512
QDepth: 128
Batch Submit: 32
Batch Complete: 32
No of Threads: 24

With the above environment and ktime patch, it has shown a performance
drop of ~25% from iostats disabled and ~19% performance drop from current
iostats enabled. This performance drop is high.

Suggestion from Jens Axboe
==========================
Jens Axboe suggested to split the bigger patch into two as follows:

1. In first patch, change all the types from unsigned long to u64, that
   can be done while retaining jiffies.

2. In second patch, add an iostats == 2 setting, which enables the higher
   resolution mode using ktime. We'd still default to 1, lower granularity
   iostats enabled.

Fix details
===========
1. Use ktime_get_ns() to get the current nano-seconds to update the
   io_ticks for start and end time stamps in block layer for io accounting

2. Create a new setting '2' in sysfs for iostats variable i.e for
   /sys/block/<device-name>/queue/iostats, to enable the iostat (io
   accounting) with nano-seconds (using ktime) granularity. This setting
   should be enabled only if the iostat is needed with high resolution
   mode as it has a high performance drop

3. Earlier available settings were 0 and 1 for disable and enable io
   accounting with milli-seconds granularity (jiffies)

Testing
=======
Ran the t/io_uring command with following setup:

Devices
-------
SSDs: P4610
No of devices: 8

io_uring config
---------------
Polled IO
iostats: Enabled
Reads: Random
Block Size: 512
QDepth: 128
Batch Submit: 32
Batch Complete: 32
No of Threads: 24

io_uring o/p
------------
iostat=0, with patch: Maximum IOPS=10.09M
iostat=1, with patch: Maximum IOPS=9.84M
iostat=2, with patch: Maximum IOPS=9.48M

Changes from V2 to V3
=====================
1. Changed all the required variables data-type to u64 as a first patch
2. Create a new setting '2' for iostats in sysfs in this patch
3. Change the code to get the ktime values when iostat=2, in this patch

Signed-off-by: Gulam Mohamed <gulam.mohamed@oracle.com>
---
 block/blk-core.c                  | 26 +++++++++++++++++----
 block/blk-mq.c                    |  4 ++--
 block/blk-sysfs.c                 | 39 ++++++++++++++++++++++++++++++-
 block/genhd.c                     | 18 ++++++++++----
 drivers/block/drbd/drbd_debugfs.c | 12 ++++++++--
 drivers/block/zram/zram_drv.c     |  3 ++-
 drivers/md/dm.c                   | 13 +++++++++--
 include/linux/blkdev.h            |  4 ++++
 8 files changed, 103 insertions(+), 16 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 5670032fe932..0b5e4eb909a5 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -927,6 +927,18 @@ int iocb_bio_iopoll(struct kiocb *kiocb, struct io_comp_batch *iob,
 }
 EXPORT_SYMBOL_GPL(iocb_bio_iopoll);
 
+/*
+ * Get the time based upon the available granularity for io accounting
+ * If the resolution mode is set to precise (2) i.e
+ * (/sys/block/<device>/queue/iostats = 2), then this will return time
+ * in nano-seconds else this will return time in jiffies
+ */
+u64  blk_get_iostat_ticks(struct request_queue *q)
+{
+       return (blk_queue_precise_io_stat(q) ? ktime_get_ns() : jiffies);
+}
+EXPORT_SYMBOL_GPL(blk_get_iostat_ticks);
+
 void update_io_ticks(struct block_device *part, u64 now, bool end)
 {
 	u64 stamp;
@@ -968,20 +980,26 @@ EXPORT_SYMBOL(bdev_start_io_acct);
 u64 bio_start_io_acct(struct bio *bio)
 {
 	return bdev_start_io_acct(bio->bi_bdev, bio_sectors(bio),
-				  bio_op(bio), jiffies);
+				  bio_op(bio),
+				  blk_get_iostat_ticks(bio->bi_bdev->bd_queue));
 }
 EXPORT_SYMBOL_GPL(bio_start_io_acct);
 
 void bdev_end_io_acct(struct block_device *bdev, enum req_op op,
 		      u64 start_time)
 {
+	u64 now;
+	u64 duration;
+	struct request_queue *q = bdev_get_queue(bdev);
 	const int sgrp = op_stat_group(op);
-	u64 now = READ_ONCE(jiffies);
-	u64 duration = (unsigned long)now -(unsigned long) start_time;
+	now = blk_get_iostat_ticks(q);;
+	duration = (unsigned long)now -(unsigned long) start_time;
 
 	part_stat_lock();
 	update_io_ticks(bdev, now, true);
-	part_stat_add(bdev, nsecs[sgrp], jiffies_to_nsecs(duration));
+	part_stat_add(bdev, nsecs[sgrp],
+		(blk_queue_precise_io_stat(q) ? duration :
+		 jiffies_to_nsecs(duration)));
 	part_stat_local_dec(bdev, in_flight[op_is_write(op)]);
 	part_stat_unlock();
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4e6b3ccd4989..5318bf87f17e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -975,7 +975,7 @@ static void __blk_account_io_done(struct request *req, u64 now)
 	const int sgrp = op_stat_group(req_op(req));
 
 	part_stat_lock();
-	update_io_ticks(req->part, jiffies, true);
+	update_io_ticks(req->part, blk_get_iostat_ticks(req->q), true);
 	part_stat_inc(req->part, ios[sgrp]);
 	part_stat_add(req->part, nsecs[sgrp], now - req->start_time_ns);
 	part_stat_unlock();
@@ -1007,7 +1007,7 @@ static void __blk_account_io_start(struct request *rq)
 		rq->part = rq->q->disk->part0;
 
 	part_stat_lock();
-	update_io_ticks(rq->part, jiffies, false);
+	update_io_ticks(rq->part, blk_get_iostat_ticks(rq->q), false);
 	part_stat_unlock();
 }
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 93d9e9c9a6ea..e7959782ce59 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -307,7 +307,6 @@ queue_##name##_store(struct request_queue *q, const char *page, size_t count) \
 
 QUEUE_SYSFS_BIT_FNS(nonrot, NONROT, 1);
 QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
-QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
 QUEUE_SYSFS_BIT_FNS(stable_writes, STABLE_WRITES, 0);
 #undef QUEUE_SYSFS_BIT_FNS
 
@@ -363,6 +362,44 @@ static ssize_t queue_nomerges_store(struct request_queue *q, const char *page,
 	return ret;
 }
 
+static ssize_t queue_iostats_show(struct request_queue *q, char *page)
+{
+       bool iostat = blk_queue_io_stat(q);
+       bool precise_iostat = blk_queue_precise_io_stat(q);
+
+       return queue_var_show(iostat << precise_iostat, page);
+}
+
+static ssize_t queue_iostats_store(struct request_queue *q, const char *page,
+                                  size_t count)
+{
+       unsigned long val;
+       ssize_t ret = -EINVAL;
+
+       ret = queue_var_store(&val, page, count);
+
+       if (ret < 0)
+               return ret;
+
+       if (val == 2) {
+               blk_queue_flag_set(QUEUE_FLAG_IO_STAT, q);
+               blk_queue_flag_set(QUEUE_FLAG_PRECISE_IO_STAT, q);
+               q->disk->part0->bd_stamp = 0;
+       }
+       else if (val == 1) {
+               blk_queue_flag_set(QUEUE_FLAG_IO_STAT, q);
+               blk_queue_flag_clear(QUEUE_FLAG_PRECISE_IO_STAT, q);
+               q->disk->part0->bd_stamp = 0;
+       }
+       else if (val == 0) {
+               blk_queue_flag_clear(QUEUE_FLAG_IO_STAT, q);
+               blk_queue_flag_clear(QUEUE_FLAG_PRECISE_IO_STAT, q);
+               q->disk->part0->bd_stamp = 0;
+       }
+
+       return ret;
+}
+
 static ssize_t queue_rq_affinity_show(struct request_queue *q, char *page)
 {
 	bool set = test_bit(QUEUE_FLAG_SAME_COMP, &q->queue_flags);
diff --git a/block/genhd.c b/block/genhd.c
index 03a96d6473e1..d034219a4683 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -951,6 +951,7 @@ ssize_t part_stat_show(struct device *dev,
 	struct request_queue *q = bdev_get_queue(bdev);
 	struct disk_stats stat;
 	unsigned int inflight;
+	u64 stat_ioticks;
 
 	if (queue_is_mq(q))
 		inflight = blk_mq_in_flight(q, bdev);
@@ -959,10 +960,13 @@ ssize_t part_stat_show(struct device *dev,
 
 	if (inflight) {
 		part_stat_lock();
-		update_io_ticks(bdev, jiffies, true);
+		update_io_ticks(bdev, blk_get_iostat_ticks(q), true);
 		part_stat_unlock();
 	}
 	part_stat_read_all(bdev, &stat);
+	stat_ioticks = (blk_queue_precise_io_stat(q) ?
+				div_u64(stat.io_ticks, NSEC_PER_MSEC) :
+				jiffies_to_msecs(stat.io_ticks));
 	return sprintf(buf,
 		"%8lu %8lu %8llu %8u "
 		"%8lu %8lu %8llu %8u "
@@ -979,7 +983,7 @@ ssize_t part_stat_show(struct device *dev,
 		(unsigned long long)stat.sectors[STAT_WRITE],
 		(unsigned int)div_u64(stat.nsecs[STAT_WRITE], NSEC_PER_MSEC),
 		inflight,
-		jiffies_to_msecs(stat.io_ticks),
+		(unsigned int)stat_ioticks,
 		(unsigned int)div_u64(stat.nsecs[STAT_READ] +
 				      stat.nsecs[STAT_WRITE] +
 				      stat.nsecs[STAT_DISCARD] +
@@ -1217,6 +1221,8 @@ static int diskstats_show(struct seq_file *seqf, void *v)
 	unsigned int inflight;
 	struct disk_stats stat;
 	unsigned long idx;
+	struct request_queue *q;
+	u64 stat_ioticks;
 
 	/*
 	if (&disk_to_dev(gp)->kobj.entry == block_class.devices.next)
@@ -1235,12 +1241,16 @@ static int diskstats_show(struct seq_file *seqf, void *v)
 		else
 			inflight = part_in_flight(hd);
 
+		q = bdev_get_queue(hd);
 		if (inflight) {
 			part_stat_lock();
-			update_io_ticks(hd, jiffies, true);
+			update_io_ticks(hd, blk_get_iostat_ticks(q), true);
 			part_stat_unlock();
 		}
 		part_stat_read_all(hd, &stat);
+		stat_ioticks = (blk_queue_precise_io_stat(q) ?
+				div_u64(stat.io_ticks, NSEC_PER_MSEC) :
+				jiffies_to_msecs(stat.io_ticks));
 		seq_printf(seqf, "%4d %7d %pg "
 			   "%lu %lu %lu %u "
 			   "%lu %lu %lu %u "
@@ -1260,7 +1270,7 @@ static int diskstats_show(struct seq_file *seqf, void *v)
 			   (unsigned int)div_u64(stat.nsecs[STAT_WRITE],
 							NSEC_PER_MSEC),
 			   inflight,
-			   jiffies_to_msecs(stat.io_ticks),
+			   (unsigned int)stat_ioticks,
 			   (unsigned int)div_u64(stat.nsecs[STAT_READ] +
 						 stat.nsecs[STAT_WRITE] +
 						 stat.nsecs[STAT_DISCARD] +
diff --git a/drivers/block/drbd/drbd_debugfs.c b/drivers/block/drbd/drbd_debugfs.c
index a72c096aa5b1..af193bcc4f3a 100644
--- a/drivers/block/drbd/drbd_debugfs.c
+++ b/drivers/block/drbd/drbd_debugfs.c
@@ -97,6 +97,12 @@ static void seq_print_one_request(struct seq_file *m, struct drbd_request *req,
 {
 	/* change anything here, fixup header below! */
 	unsigned int s = req->rq_state;
+	unsigned long start_time;
+	struct request_queue *q = req->device->rq_queue;
+
+	start_time = (blk_queue_precise_io_stat(q) ?
+			nsecs_to_jiffies(req->start_jif) :
+			req->start_jif);
 
 #define RQ_HDR_1 "epoch\tsector\tsize\trw"
 	seq_printf(m, "0x%x\t%llu\t%u\t%s",
@@ -105,7 +111,7 @@ static void seq_print_one_request(struct seq_file *m, struct drbd_request *req,
 		(s & RQ_WRITE) ? "W" : "R");
 
 #define RQ_HDR_2 "\tstart\tin AL\tsubmit"
-	seq_printf(m, "\t%d", jiffies_to_msecs(now - req->start_jif));
+	seq_printf(m, "\t%d", jiffies_to_msecs(now - start_time));
 	seq_print_age_or_dash(m, s & RQ_IN_ACT_LOG, now - req->in_actlog_jif);
 	seq_print_age_or_dash(m, s & RQ_LOCAL_PENDING, now - req->pre_submit_jif);
 
@@ -171,7 +177,9 @@ static void seq_print_waiting_for_AL(struct seq_file *m, struct drbd_resource *r
 			/* if the oldest request does not wait for the activity log
 			 * it is not interesting for us here */
 			if (req && !(req->rq_state & RQ_IN_ACT_LOG))
-				jif = req->start_jif;
+				jif = (blk_queue_precise_io_stat(device->rq_queue) ?
+						nsecs_to_jiffies(req->start_jif):
+						req->start_jif);
 			else
 				req = NULL;
 			spin_unlock_irq(&device->resource->req_lock);
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index da28eb83e6ed..8e22487de7de 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1663,6 +1663,7 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
 	struct zram *zram;
 	struct bio_vec bv;
 	u64 start_time;
+	struct request_queue *q = bdev_get_queue(bdev);
 
 	if (PageTransHuge(page))
 		return -ENOTSUPP;
@@ -1682,7 +1683,7 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
 	bv.bv_offset = 0;
 
 	start_time = bdev_start_io_acct(bdev->bd_disk->part0,
-			SECTORS_PER_PAGE, op, jiffies);
+			SECTORS_PER_PAGE, op, blk_get_iostat_ticks(q));
 	ret = zram_bvec_rw(zram, &bv, index, offset, op, NULL);
 	bdev_end_io_acct(bdev->bd_disk->part0, op, start_time);
 out:
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 011a85ea40da..1bb58a0b8cd1 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -482,7 +482,11 @@ static int dm_blk_ioctl(struct block_device *bdev, fmode_t mode,
 
 u64 dm_start_time_ns_from_clone(struct bio *bio)
 {
-	return jiffies_to_nsecs(clone_to_tio(bio)->io->start_time);
+	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
+	u64 start_time_ns = (blk_queue_precise_io_stat(q) ?
+				clone_to_tio(bio)->io->start_time :
+				jiffies_to_nsecs(clone_to_tio(bio)->io->start_time));
+	return start_time_ns;
 }
 EXPORT_SYMBOL_GPL(dm_start_time_ns_from_clone);
 
@@ -498,6 +502,11 @@ static void dm_io_acct(struct dm_io *io, bool end)
 	struct mapped_device *md = io->md;
 	struct bio *bio = io->orig_bio;
 	unsigned int sectors;
+	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
+
+	u64 start_time = (blk_queue_precise_io_stat(q) ?
+				nsecs_to_jiffies(io->start_time) :
+				io->start_time);
 
 	/*
 	 * If REQ_PREFLUSH set, don't account payload, it will be
@@ -589,7 +598,7 @@ static struct dm_io *alloc_io(struct mapped_device *md, struct bio *bio)
 	io->orig_bio = bio;
 	io->md = md;
 	spin_lock_init(&io->lock);
-	io->start_time = jiffies;
+	io->start_time = blk_get_iostat_ticks(bio->bi_bdev->bd_queue);
 	io->flags = 0;
 
 	if (static_branch_unlikely(&stats_enabled))
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ca94d690d292..0543a536c6e5 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -570,6 +570,7 @@ struct request_queue {
 #define QUEUE_FLAG_NOWAIT       29	/* device supports NOWAIT */
 #define QUEUE_FLAG_SQ_SCHED     30	/* single queue style io dispatch */
 #define QUEUE_FLAG_SKIP_TAGSET_QUIESCE	31 /* quiesce_tagset skip the queue*/
+#define QUEUE_FLAG_PRECISE_IO_STAT 32   /* To enable precise io accounting */
 
 #define QUEUE_FLAG_MQ_DEFAULT	((1UL << QUEUE_FLAG_IO_STAT) |		\
 				 (1UL << QUEUE_FLAG_SAME_COMP) |	\
@@ -578,6 +579,7 @@ struct request_queue {
 void blk_queue_flag_set(unsigned int flag, struct request_queue *q);
 void blk_queue_flag_clear(unsigned int flag, struct request_queue *q);
 bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
+u64  blk_get_iostat_ticks(struct request_queue *q);
 
 #define blk_queue_stopped(q)	test_bit(QUEUE_FLAG_STOPPED, &(q)->queue_flags)
 #define blk_queue_dying(q)	test_bit(QUEUE_FLAG_DYING, &(q)->queue_flags)
@@ -589,6 +591,8 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
 #define blk_queue_stable_writes(q) \
 	test_bit(QUEUE_FLAG_STABLE_WRITES, &(q)->queue_flags)
 #define blk_queue_io_stat(q)	test_bit(QUEUE_FLAG_IO_STAT, &(q)->queue_flags)
+#define blk_queue_precise_io_stat(q)   \
+	test_bit(QUEUE_FLAG_PRECISE_IO_STAT, &(q)->queue_flags)
 #define blk_queue_add_random(q)	test_bit(QUEUE_FLAG_ADD_RANDOM, &(q)->queue_flags)
 #define blk_queue_zone_resetall(q)	\
 	test_bit(QUEUE_FLAG_ZONE_RESETALL, &(q)->queue_flags)
-- 
2.31.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


  reply	other threads:[~2022-12-21  4:10 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-21  4:05 [PATCH for-6.2/block V3 1/2] block: Data type conversion for IO accounting Gulam Mohamed
2022-12-21  4:05 ` [dm-devel] " Gulam Mohamed
2022-12-21  4:05 ` Gulam Mohamed [this message]
2022-12-21  4:05   ` [dm-devel] [PATCH for-6.2/block V3 2/2] block: Change the granularity of io ticks from ms to ns Gulam Mohamed
2022-12-21 16:09   ` kernel test robot
2022-12-21 16:09     ` [dm-devel] " kernel test robot
2022-12-21 17:09   ` Keith Busch
2022-12-21 17:09     ` [dm-devel] " Keith Busch
2022-12-23 14:47     ` Gulam Mohamed
2022-12-23 14:47       ` [dm-devel] " Gulam Mohamed
2022-12-21 22:54   ` kernel test robot
2022-12-21 22:54     ` [dm-devel] " kernel test robot
2022-12-24  1:26   ` kernel test robot
2022-12-24  1:26     ` [dm-devel] " kernel test robot
2023-01-17 17:52   ` Mike Snitzer
2023-01-17 17:52     ` [dm-devel] " Mike Snitzer
2022-12-21  9:25 ` [PATCH for-6.2/block V3 1/2] block: Data type conversion for IO accounting kernel test robot
2022-12-21  9:25   ` [dm-devel] " kernel test robot
2022-12-21 16:29 ` kernel test robot
2022-12-21 16:29   ` [dm-devel] " kernel test robot
2022-12-22 15:36 ` kernel test robot
2022-12-22 15:36   ` [dm-devel] " kernel test robot
2022-12-25 11:40 ` Sagi Grimberg
2022-12-25 11:40   ` [dm-devel] " Sagi Grimberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221221040506.1174644-2-gulam.mohamed@oracle.com \
    --to=gulam.mohamed@oracle.com \
    --cc=agk@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=christoph.boehmwalder@linbit.com \
    --cc=colyli@suse.de \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dm-devel@redhat.com \
    --cc=drbd-dev@lists.linbit.com \
    --cc=ira.weiny@intel.com \
    --cc=joe.jin@oracle.com \
    --cc=junxiao.bi@oracle.com \
    --cc=kch@nvidia.com \
    --cc=kent.overstreet@gmail.com \
    --cc=konrad.wilk@oracle.com \
    --cc=lars.ellenberg@linbit.com \
    --cc=linux-bcache@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=minchan@kernel.org \
    --cc=ngupta@vflare.org \
    --cc=nvdimm@lists.linux.dev \
    --cc=philipp.reisner@linbit.com \
    --cc=rajesh.sivaramasubramaniom@oracle.com \
    --cc=senozhatsky@chromium.org \
    --cc=shminderjit.singh@oracle.com \
    --cc=snitzer@kernel.org \
    --cc=song@kernel.org \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.