* [PATCHSET RFC 0/2] Cache issue side time querying
@ 2024-01-15 21:53 Jens Axboe
  2024-01-15 21:53 ` [PATCH 1/2] block: add blk_time_get_ns() helper Jens Axboe
  2024-01-15 21:53 ` [PATCH 2/2] block: cache current nsec time in struct blk_plug Jens Axboe
  0 siblings, 2 replies; 11+ messages in thread
From: Jens Axboe @ 2024-01-15 21:53 UTC (permalink / raw)
  To: linux-block

Hi,

When I run my peak testing to see if we've regressed, my test script
always does:

	echo 0 > /sys/block/$DEV/queue/iostats
	echo 2 > /sys/block/$DEV/queue/nomerges

for each device being used. It's unfortunate that we need to disable
iostats, but without doing that, I lose about 12% performance. The main
reason for that is the time querying we need to do, when iostats are
enabled. As it turns out, lots of other block code is quite trigger-happy
with querying the time as well. We do have some nice batching in place
which helps amortize that, but it's not perfect.

This trivial patchset simply caches the current time in struct blk_plug,
on the premise that any issue side time querying can get adequate
granularity through that. Nobody really needs nsec granularity on the
timestamp.
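
The gist of it is something like the below - a simplified sketch; the
actual change, including how a not-yet-set timestamp is flagged, is in
patch 2:

	static inline u64 blk_time_get_ns(void)
	{
		struct blk_plug *plug = current->plug;

		/* no plug, just query the time directly */
		if (!plug)
			return ktime_get_ns();
		/* cache the first reading for the duration of the plug */
		if (!plug->cur_ktime)
			plug->cur_ktime = ktime_get_ns();
		return plug->cur_ktime;
	}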

Results are in patch 2, but the tl;dr is a more than 6% improvement (108M -> 115M
IOPS) for my test case, which doesn't even enable most of the costly
block layer items that you'd typically find in a distro and which would
further increase the number of issue side time calls.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/2] block: add blk_time_get_ns() helper
  2024-01-15 21:53 [PATCHSET RFC 0/2] Cache issue side time querying Jens Axboe
@ 2024-01-15 21:53 ` Jens Axboe
  2024-01-16  6:42   ` Yu Kuai
  2024-01-15 21:53 ` [PATCH 2/2] block: cache current nsec time in struct blk_plug Jens Axboe
  1 sibling, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2024-01-15 21:53 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe

No functional changes intended; this patch just wraps ktime_get_ns()
with a block helper.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-cgroup.c     |  2 +-
 block/blk-flush.c      |  2 +-
 block/blk-iocost.c     |  6 +++---
 block/blk-mq.c         | 16 ++++++++--------
 block/blk-throttle.c   |  6 +++---
 block/blk-wbt.c        |  5 ++---
 include/linux/blkdev.h |  5 +++++
 7 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index ff93c385ba5a..bdbb557feb5a 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1846,7 +1846,7 @@ static void blkcg_maybe_throttle_blkg(struct blkcg_gq *blkg, bool use_memdelay)
 {
 	unsigned long pflags;
 	bool clamp;
-	u64 now = ktime_to_ns(ktime_get());
+	u64 now = blk_time_get_ns();
 	u64 exp;
 	u64 delay_nsec = 0;
 	int tok;
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 3f4d41952ef2..b0f314f4bc14 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -143,7 +143,7 @@ static void blk_account_io_flush(struct request *rq)
 	part_stat_lock();
 	part_stat_inc(part, ios[STAT_FLUSH]);
 	part_stat_add(part, nsecs[STAT_FLUSH],
-		      ktime_get_ns() - rq->start_time_ns);
+		      blk_time_get_ns() - rq->start_time_ns);
 	part_stat_unlock();
 }
 
diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index c8beec6d7df0..e54b17261d96 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -829,7 +829,7 @@ static int ioc_autop_idx(struct ioc *ioc, struct gendisk *disk)
 
 	/* step up/down based on the vrate */
 	vrate_pct = div64_u64(ioc->vtime_base_rate * 100, VTIME_PER_USEC);
-	now_ns = ktime_get_ns();
+	now_ns = blk_time_get_ns();
 
 	if (p->too_fast_vrate_pct && p->too_fast_vrate_pct <= vrate_pct) {
 		if (!ioc->autop_too_fast_at)
@@ -1044,7 +1044,7 @@ static void ioc_now(struct ioc *ioc, struct ioc_now *now)
 	unsigned seq;
 	u64 vrate;
 
-	now->now_ns = ktime_get();
+	now->now_ns = blk_time_get_ns();
 	now->now = ktime_to_us(now->now_ns);
 	vrate = atomic64_read(&ioc->vtime_rate);
 
@@ -2810,7 +2810,7 @@ static void ioc_rqos_done(struct rq_qos *rqos, struct request *rq)
 		return;
 	}
 
-	on_q_ns = ktime_get_ns() - rq->alloc_time_ns;
+	on_q_ns = blk_time_get_ns() - rq->alloc_time_ns;
 	rq_wait_ns = rq->start_time_ns - rq->alloc_time_ns;
 	size_nsec = div64_u64(calc_size_vtime_cost(rq, ioc), VTIME_PER_NSEC);
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index aa87fcfda1ec..aff9e9492f59 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -323,7 +323,7 @@ void blk_rq_init(struct request_queue *q, struct request *rq)
 	RB_CLEAR_NODE(&rq->rb_node);
 	rq->tag = BLK_MQ_NO_TAG;
 	rq->internal_tag = BLK_MQ_NO_TAG;
-	rq->start_time_ns = ktime_get_ns();
+	rq->start_time_ns = blk_time_get_ns();
 	rq->part = NULL;
 	blk_crypto_rq_set_defaults(rq);
 }
@@ -333,7 +333,7 @@ EXPORT_SYMBOL(blk_rq_init);
 static inline void blk_mq_rq_time_init(struct request *rq, u64 alloc_time_ns)
 {
 	if (blk_mq_need_time_stamp(rq))
-		rq->start_time_ns = ktime_get_ns();
+		rq->start_time_ns = blk_time_get_ns();
 	else
 		rq->start_time_ns = 0;
 
@@ -444,7 +444,7 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
 
 	/* alloc_time includes depth and tag waits */
 	if (blk_queue_rq_alloc_time(q))
-		alloc_time_ns = ktime_get_ns();
+		alloc_time_ns = blk_time_get_ns();
 
 	if (data->cmd_flags & REQ_NOWAIT)
 		data->flags |= BLK_MQ_REQ_NOWAIT;
@@ -629,7 +629,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 
 	/* alloc_time includes depth and tag waits */
 	if (blk_queue_rq_alloc_time(q))
-		alloc_time_ns = ktime_get_ns();
+		alloc_time_ns = blk_time_get_ns();
 
 	/*
 	 * If the tag allocator sleeps we could get an allocation for a
@@ -1042,7 +1042,7 @@ static inline void __blk_mq_end_request_acct(struct request *rq, u64 now)
 inline void __blk_mq_end_request(struct request *rq, blk_status_t error)
 {
 	if (blk_mq_need_time_stamp(rq))
-		__blk_mq_end_request_acct(rq, ktime_get_ns());
+		__blk_mq_end_request_acct(rq, blk_time_get_ns());
 
 	blk_mq_finish_request(rq);
 
@@ -1085,7 +1085,7 @@ void blk_mq_end_request_batch(struct io_comp_batch *iob)
 	u64 now = 0;
 
 	if (iob->need_ts)
-		now = ktime_get_ns();
+		now = blk_time_get_ns();
 
 	while ((rq = rq_list_pop(&iob->req_list)) != NULL) {
 		prefetch(rq->bio);
@@ -1255,7 +1255,7 @@ void blk_mq_start_request(struct request *rq)
 
 	if (test_bit(QUEUE_FLAG_STATS, &q->queue_flags) &&
 	    !blk_rq_is_passthrough(rq)) {
-		rq->io_start_time_ns = ktime_get_ns();
+		rq->io_start_time_ns = blk_time_get_ns();
 		rq->stats_sectors = blk_rq_sectors(rq);
 		rq->rq_flags |= RQF_STATS;
 		rq_qos_issue(q, rq);
@@ -3107,7 +3107,7 @@ blk_status_t blk_insert_cloned_request(struct request *rq)
 	blk_mq_run_dispatch_ops(q,
 			ret = blk_mq_request_issue_directly(rq, true));
 	if (ret)
-		blk_account_io_done(rq, ktime_get_ns());
+		blk_account_io_done(rq, blk_time_get_ns());
 	return ret;
 }
 EXPORT_SYMBOL_GPL(blk_insert_cloned_request);
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 16f5766620a4..da9dc1f793c3 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1815,7 +1815,7 @@ static bool throtl_tg_is_idle(struct throtl_grp *tg)
 	time = min_t(unsigned long, MAX_IDLE_TIME, 4 * tg->idletime_threshold);
 	ret = tg->latency_target == DFL_LATENCY_TARGET ||
 	      tg->idletime_threshold == DFL_IDLE_THRESHOLD ||
-	      (ktime_get_ns() >> 10) - tg->last_finish_time > time ||
+	      (blk_time_get_ns() >> 10) - tg->last_finish_time > time ||
 	      tg->avg_idletime > tg->idletime_threshold ||
 	      (tg->latency_target && tg->bio_cnt &&
 		tg->bad_bio_cnt * 5 < tg->bio_cnt);
@@ -2060,7 +2060,7 @@ static void blk_throtl_update_idletime(struct throtl_grp *tg)
 	if (last_finish_time == 0)
 		return;
 
-	now = ktime_get_ns() >> 10;
+	now = blk_time_get_ns() >> 10;
 	if (now <= last_finish_time ||
 	    last_finish_time == tg->checked_last_finish_time)
 		return;
@@ -2327,7 +2327,7 @@ void blk_throtl_bio_endio(struct bio *bio)
 	if (!tg->td->limit_valid[LIMIT_LOW])
 		return;
 
-	finish_time_ns = ktime_get_ns();
+	finish_time_ns = blk_time_get_ns();
 	tg->last_finish_time = finish_time_ns >> 10;
 
 	start_time = bio_issue_time(&bio->bi_issue) >> 10;
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index 5ba3cd574eac..d5dcf44ef0de 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -274,13 +274,12 @@ static inline bool stat_sample_valid(struct blk_rq_stat *stat)
 
 static u64 rwb_sync_issue_lat(struct rq_wb *rwb)
 {
-	u64 now, issue = READ_ONCE(rwb->sync_issue);
+	u64 issue = READ_ONCE(rwb->sync_issue);
 
 	if (!issue || !rwb->sync_cookie)
 		return 0;
 
-	now = ktime_to_ns(ktime_get());
-	return now - issue;
+	return blk_time_get_ns() - issue;
 }
 
 static inline unsigned int wbt_inflight(struct rq_wb *rwb)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 99e4f5e72213..2f9ceea0e23b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -974,6 +974,11 @@ static inline void blk_flush_plug(struct blk_plug *plug, bool async)
 
 int blkdev_issue_flush(struct block_device *bdev);
 long nr_blockdev_pages(void);
+
+static inline u64 blk_time_get_ns(void)
+{
+	return ktime_get_ns();
+}
 #else /* CONFIG_BLOCK */
 struct blk_plug {
 };
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/2] block: cache current nsec time in struct blk_plug
  2024-01-15 21:53 [PATCHSET RFC 0/2] Cache issue side time querying Jens Axboe
  2024-01-15 21:53 ` [PATCH 1/2] block: add blk_time_get_ns() helper Jens Axboe
@ 2024-01-15 21:53 ` Jens Axboe
  2024-01-16  9:51   ` Kanchan Joshi
  2024-01-16 16:13   ` Keith Busch
  1 sibling, 2 replies; 11+ messages in thread
From: Jens Axboe @ 2024-01-15 21:53 UTC (permalink / raw)
  To: linux-block; +Cc: Jens Axboe

Querying the current time is the most costly thing we do in the block
layer per IO, and depending on kernel config settings, we may do it
many times per IO.

None of the callers actually need nsec granularity. Take advantage of
that by caching the current time in the plug, with the assumption here
being that any time checking will be temporally close enough that the
slight loss of precision doesn't matter.

If the block plug gets flushed, e.g. on preempt or schedule out, then
we invalidate the cached clock.

On a basic peak IOPS test case with iostats enabled, this changes
the performance from:

IOPS=108.41M, BW=52.93GiB/s, IOS/call=31/31
IOPS=108.43M, BW=52.94GiB/s, IOS/call=32/32
IOPS=108.29M, BW=52.88GiB/s, IOS/call=31/32
IOPS=108.35M, BW=52.91GiB/s, IOS/call=32/32
IOPS=108.42M, BW=52.94GiB/s, IOS/call=31/31
IOPS=108.40M, BW=52.93GiB/s, IOS/call=32/32
IOPS=108.31M, BW=52.89GiB/s, IOS/call=32/31

to

IOPS=115.57M, BW=56.43GiB/s, IOS/call=32/32
IOPS=115.61M, BW=56.45GiB/s, IOS/call=32/32
IOPS=115.54M, BW=56.41GiB/s, IOS/call=32/31
IOPS=115.51M, BW=56.40GiB/s, IOS/call=31/31
IOPS=115.59M, BW=56.44GiB/s, IOS/call=32/32
IOPS=115.08M, BW=56.19GiB/s, IOS/call=31/31

which is more than a 6% improvement in performance. Looking at perf diff,
we can see a huge reduction in timekeeping overhead:

    10.55%     -9.88%  [kernel.vmlinux]  [k] read_tsc
     1.31%     -1.22%  [kernel.vmlinux]  [k] ktime_get

Note that since this relies on blk_plug for the caching, it's only
applicable to the issue side. But this is where most of the time calls
happen anyway. It's also worth noting that the above testing doesn't
enable any of the higher-cost CPU items on the block layer side, like
wbt, cgroups, iocost, etc, which would all add additional time querying.
IOW, the results would likely look even better in comparison with those
enabled, as a typical distro config would have them.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-core.c       |  1 +
 include/linux/blkdev.h | 11 ++++++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 11342af420d0..cc4db4d92c75 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1073,6 +1073,7 @@ void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
 	if (tsk->plug)
 		return;
 
+	plug->cur_ktime = 0;
 	plug->mq_list = NULL;
 	plug->cached_rq = NULL;
 	plug->nr_ios = min_t(unsigned short, nr_ios, BLK_MAX_REQUEST_COUNT);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2f9ceea0e23b..23c237b22071 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -942,6 +942,7 @@ struct blk_plug {
 
 	/* if ios_left is > 1, we can batch tag/rq allocations */
 	struct request *cached_rq;
+	u64 cur_ktime;
 	unsigned short nr_ios;
 
 	unsigned short rq_count;
@@ -977,7 +978,15 @@ long nr_blockdev_pages(void);
 
 static inline u64 blk_time_get_ns(void)
 {
-	return ktime_get_ns();
+	struct blk_plug *plug = current->plug;
+
+	if (!plug)
+		return ktime_get_ns();
+	if (!(plug->cur_ktime & 1ULL)) {
+		plug->cur_ktime = ktime_get_ns();
+		plug->cur_ktime |= 1ULL;
+	}
+	return plug->cur_ktime;
 }
 #else /* CONFIG_BLOCK */
 struct blk_plug {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] block: add blk_time_get_ns() helper
  2024-01-15 21:53 ` [PATCH 1/2] block: add blk_time_get_ns() helper Jens Axboe
@ 2024-01-16  6:42   ` Yu Kuai
  2024-01-16 14:41     ` Jens Axboe
  0 siblings, 1 reply; 11+ messages in thread
From: Yu Kuai @ 2024-01-16  6:42 UTC (permalink / raw)
  To: Jens Axboe, linux-block, yukuai (C)

Hi,

On 2024/01/16 5:53, Jens Axboe wrote:
>   
> @@ -2810,7 +2810,7 @@ static void ioc_rqos_done(struct rq_qos *rqos, struct request *rq)
>   		return;
>   	}
>   
> -	on_q_ns = ktime_get_ns() - rq->alloc_time_ns;
> +	on_q_ns = blk_time_get_ns() - rq->alloc_time_ns;

Just noticed that this is from the IO completion path, the same as
blk_account_io_done(), where the plug does not exist. Hence ktime_get_ns()
will still be called multiple times after patch 2.

Do you think this is worth optimizing? For example, we could add a new
field in the request so that each rq completion only calls
ktime_get_ns() once. Furthermore, we could optimize io_ticks precision
from jiffies to ns.
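
Something like the below, as a rough sketch (the field name is made up):

	/*
	 * Rough idea only: cache the completion-side timestamp in the
	 * request, so it is read at most once per completion.
	 */
	static inline u64 blk_rq_comp_time_get_ns(struct request *rq)
	{
		if (!rq->comp_time_ns)
			rq->comp_time_ns = ktime_get_ns();
		return rq->comp_time_ns;
	}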

Thanks,
Kuai


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] block: cache current nsec time in struct blk_plug
  2024-01-15 21:53 ` [PATCH 2/2] block: cache current nsec time in struct blk_plug Jens Axboe
@ 2024-01-16  9:51   ` Kanchan Joshi
  2024-01-16 10:25     ` Johannes Thumshirn
  2024-01-16 14:43     ` Jens Axboe
  2024-01-16 16:13   ` Keith Busch
  1 sibling, 2 replies; 11+ messages in thread
From: Kanchan Joshi @ 2024-01-16  9:51 UTC (permalink / raw)
  To: Jens Axboe, linux-block

On 1/16/2024 3:23 AM, Jens Axboe wrote:

> diff --git a/block/blk-core.c b/block/blk-core.c
> index 11342af420d0..cc4db4d92c75 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1073,6 +1073,7 @@ void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
>   	if (tsk->plug)
>   		return;
>   
> +	plug->cur_ktime = 0;
>   	plug->mq_list = NULL;
>   	plug->cached_rq = NULL;
>   	plug->nr_ios = min_t(unsigned short, nr_ios, BLK_MAX_REQUEST_COUNT);
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 2f9ceea0e23b..23c237b22071 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -942,6 +942,7 @@ struct blk_plug {
>   
>   	/* if ios_left is > 1, we can batch tag/rq allocations */
>   	struct request *cached_rq;
> +	u64 cur_ktime;
>   	unsigned short nr_ios;
>   
>   	unsigned short rq_count;
> @@ -977,7 +978,15 @@ long nr_blockdev_pages(void);
>   
>   static inline u64 blk_time_get_ns(void)
>   {
> -	return ktime_get_ns();
> +	struct blk_plug *plug = current->plug;
> +
> +	if (!plug)
> +		return ktime_get_ns();
> +	if (!(plug->cur_ktime & 1ULL)) {
> +		plug->cur_ktime = ktime_get_ns();
> +		plug->cur_ktime |= 1ULL;
> +	}
> +	return plug->cur_ktime;

I did not understand the relevance of 1ULL here.
If ktime_get_ns() returns an even value, it will turn that into an odd
value before caching. And that value will be returned for subsequent
calls.
But how is that better than just caching whatever ktime_get_ns()
returned?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] block: cache current nsec time in struct blk_plug
  2024-01-16  9:51   ` Kanchan Joshi
@ 2024-01-16 10:25     ` Johannes Thumshirn
  2024-01-16 10:47       ` Kanchan Joshi
  2024-01-16 14:43     ` Jens Axboe
  1 sibling, 1 reply; 11+ messages in thread
From: Johannes Thumshirn @ 2024-01-16 10:25 UTC (permalink / raw)
  To: Kanchan Joshi, Jens Axboe, linux-block

On 16.01.24 10:52, Kanchan Joshi wrote:
>> +	if (!plug)
>> +		return ktime_get_ns();
>> +	if (!(plug->cur_ktime & 1ULL)) {
>> +		plug->cur_ktime = ktime_get_ns();
>> +		plug->cur_ktime |= 1ULL;
>> +	}
>> +	return plug->cur_ktime;
> 
> I did not understand the relevance of 1ULL here.
> If ktime_get_ns() returns an even value, it will turn that into an odd
> value before caching. And that value will be returned for subsequent
> calls.
> But how is that better than just caching whatever ktime_get_ns()
> returned?
> 
> 

IIUC it's an indicator if plug->cur_ktime has been set or not.
But I don't understand why we can't compare with 0?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] block: cache current nsec time in struct blk_plug
  2024-01-16 10:25     ` Johannes Thumshirn
@ 2024-01-16 10:47       ` Kanchan Joshi
  0 siblings, 0 replies; 11+ messages in thread
From: Kanchan Joshi @ 2024-01-16 10:47 UTC (permalink / raw)
  To: Johannes Thumshirn, Jens Axboe, linux-block

On 1/16/2024 3:55 PM, Johannes Thumshirn wrote:
> On 16.01.24 10:52, Kanchan Joshi wrote:
>>> +	if (!plug)
>>> +		return ktime_get_ns();
>>> +	if (!(plug->cur_ktime & 1ULL)) {
>>> +		plug->cur_ktime = ktime_get_ns();
>>> +		plug->cur_ktime |= 1ULL;
>>> +	}
>>> +	return plug->cur_ktime;
>>
>> I did not understand the relevance of 1ULL here.
>> If ktime_get_ns() returns an even value, it will turn that into an odd
>> value before caching. And that value will be returned for subsequent
>> calls.
>> But how is that better than just caching whatever ktime_get_ns()
>> returned?
>>
>>
> 
> IIUC it's an indicator if plug->cur_ktime has been set or not.
> But I don't understand why we can't compare with 0?

Yes, that's what I meant by 'just caching whatever ktime_get_ns() returned'.
if (!plug->cur_ktime)
	plug->cur_ktime = ktime_get_ns();

An indicator should not alter the return value (unless there is a reason to).



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] block: add blk_time_get_ns() helper
  2024-01-16  6:42   ` Yu Kuai
@ 2024-01-16 14:41     ` Jens Axboe
  0 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2024-01-16 14:41 UTC (permalink / raw)
  To: Yu Kuai, linux-block, yukuai (C)

On 1/15/24 11:42 PM, Yu Kuai wrote:
> Hi,
> 
> On 2024/01/16 5:53, Jens Axboe wrote:
>>   @@ -2810,7 +2810,7 @@ static void ioc_rqos_done(struct rq_qos *rqos, struct request *rq)
>>           return;
>>       }
>>   -    on_q_ns = ktime_get_ns() - rq->alloc_time_ns;
>> +    on_q_ns = blk_time_get_ns() - rq->alloc_time_ns;
> 
> Just noticed that this is from the IO completion path, the same as
> blk_account_io_done(), where the plug does not exist. Hence ktime_get_ns()
> will still be called multiple times after patch 2.
> 
> Do you think this is worth optimizing? For example, we could add a new
> field in the request so that each rq completion only calls
> ktime_get_ns() once. Furthermore, we could optimize io_ticks precision
> from jiffies to ns.

Right, as mentioned this is just an issue side optimization; my intent
is/was to convert all callers of ktime_get_ns() to use the internal one.
The completion side just won't get any amortization of the frequency
of time calls for now, but it's prepared for that later on. That,
to me, was nicer than doing selective conversions, as then it's not clear
whether the two time sources could be compared.

It isn't complete yet, I just did the main components.

Putting it in the request is one way, ideally we'd have completion
batching via struct io_comp_batch for the completion side. At least that
was my plan.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] block: cache current nsec time in struct blk_plug
  2024-01-16  9:51   ` Kanchan Joshi
  2024-01-16 10:25     ` Johannes Thumshirn
@ 2024-01-16 14:43     ` Jens Axboe
  1 sibling, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2024-01-16 14:43 UTC (permalink / raw)
  To: Kanchan Joshi, linux-block

On 1/16/24 2:51 AM, Kanchan Joshi wrote:
> On 1/16/2024 3:23 AM, Jens Axboe wrote:
> 
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 11342af420d0..cc4db4d92c75 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -1073,6 +1073,7 @@ void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
>>   	if (tsk->plug)
>>   		return;
>>   
>> +	plug->cur_ktime = 0;
>>   	plug->mq_list = NULL;
>>   	plug->cached_rq = NULL;
>>   	plug->nr_ios = min_t(unsigned short, nr_ios, BLK_MAX_REQUEST_COUNT);
>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>> index 2f9ceea0e23b..23c237b22071 100644
>> --- a/include/linux/blkdev.h
>> +++ b/include/linux/blkdev.h
>> @@ -942,6 +942,7 @@ struct blk_plug {
>>   
>>   	/* if ios_left is > 1, we can batch tag/rq allocations */
>>   	struct request *cached_rq;
>> +	u64 cur_ktime;
>>   	unsigned short nr_ios;
>>   
>>   	unsigned short rq_count;
>> @@ -977,7 +978,15 @@ long nr_blockdev_pages(void);
>>   
>>   static inline u64 blk_time_get_ns(void)
>>   {
>> -	return ktime_get_ns();
>> +	struct blk_plug *plug = current->plug;
>> +
>> +	if (!plug)
>> +		return ktime_get_ns();
>> +	if (!(plug->cur_ktime & 1ULL)) {
>> +		plug->cur_ktime = ktime_get_ns();
>> +		plug->cur_ktime |= 1ULL;
>> +	}
>> +	return plug->cur_ktime;
> 
> I did not understand the relevance of 1ULL here. If ktime_get_ns()
> returns an even value, it will turn that into an odd value before
> caching.

Right, it'll potentially round it up by 1 nsec.

> And that value will be returned for subsequent calls. But
> how is that better than just caching whatever ktime_get_ns()
> returned?

0 could be a valid time. You could argue that this doesn't matter, as we'd
just do an extra ktime_get_ns() in that case. And that's probably fine.
The LSB was meant to indicate "timestamp is valid".

But it's not that important imho; I'll either add a comment, or just use 0
as the initializer (as it is now) and treat non-zero as meaning the
timestamp is valid.
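
IOW, something like this, with the caveat spelled out in a comment:

	if (!plug)
		return ktime_get_ns();
	/*
	 * 0 means "not yet set". If the clock ever reads exactly 0, we
	 * just end up doing one extra ktime_get_ns() call.
	 */
	if (!plug->cur_ktime)
		plug->cur_ktime = ktime_get_ns();
	return plug->cur_ktime;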

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] block: cache current nsec time in struct blk_plug
  2024-01-15 21:53 ` [PATCH 2/2] block: cache current nsec time in struct blk_plug Jens Axboe
  2024-01-16  9:51   ` Kanchan Joshi
@ 2024-01-16 16:13   ` Keith Busch
  2024-01-16 16:18     ` Jens Axboe
  1 sibling, 1 reply; 11+ messages in thread
From: Keith Busch @ 2024-01-16 16:13 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block

On Mon, Jan 15, 2024 at 02:53:55PM -0700, Jens Axboe wrote:
> If the block plug gets flushed, eg on preempt or schedule out, then
> we invalidate the cached clock.

There must be something implicitly happening that I am missing. Where is
the 'cur_ktime' cached clock invalidated on a plug flush?
 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 11342af420d0..cc4db4d92c75 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1073,6 +1073,7 @@ void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
>  	if (tsk->plug)
>  		return;
>  
> +	plug->cur_ktime = 0;
>  	plug->mq_list = NULL;
>  	plug->cached_rq = NULL;
>  	plug->nr_ios = min_t(unsigned short, nr_ios, BLK_MAX_REQUEST_COUNT);
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 2f9ceea0e23b..23c237b22071 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -942,6 +942,7 @@ struct blk_plug {
>  
>  	/* if ios_left is > 1, we can batch tag/rq allocations */
>  	struct request *cached_rq;
> +	u64 cur_ktime;
>  	unsigned short nr_ios;
>  
>  	unsigned short rq_count;
> @@ -977,7 +978,15 @@ long nr_blockdev_pages(void);
>  
>  static inline u64 blk_time_get_ns(void)
>  {
> -	return ktime_get_ns();
> +	struct blk_plug *plug = current->plug;
> +
> +	if (!plug)
> +		return ktime_get_ns();
> +	if (!(plug->cur_ktime & 1ULL)) {
> +		plug->cur_ktime = ktime_get_ns();
> +		plug->cur_ktime |= 1ULL;
> +	}
> +	return plug->cur_ktime;
>  }
>  #else /* CONFIG_BLOCK */
>  struct blk_plug {

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] block: cache current nsec time in struct blk_plug
  2024-01-16 16:13   ` Keith Busch
@ 2024-01-16 16:18     ` Jens Axboe
  0 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2024-01-16 16:18 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-block

On 1/16/24 9:13 AM, Keith Busch wrote:
> On Mon, Jan 15, 2024 at 02:53:55PM -0700, Jens Axboe wrote:
>> If the block plug gets flushed, eg on preempt or schedule out, then
>> we invalidate the cached clock.
> 
> There must be something implicitly happening that I am missing. Where is
> the 'cur_ktime' cached clock invalidated on a plug flush?

It's not invalidated just yet, and the invalidation is also missing on
preemption. I have those bits in my current tree. The current test doesn't
hit any of these cases, so it didn't impact the testing.
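
Roughly it's just clearing the cached value when the plug state may go
stale, e.g. something like the below in the flush/preempt paths (sketch,
not the exact change from my tree):

	/* timestamp no longer valid once the plug is flushed */
	plug->cur_ktime = 0;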

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-01-16 16:18 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-15 21:53 [PATCHSET RFC 0/2] Cache issue side time querying Jens Axboe
2024-01-15 21:53 ` [PATCH 1/2] block: add blk_time_get_ns() helper Jens Axboe
2024-01-16  6:42   ` Yu Kuai
2024-01-16 14:41     ` Jens Axboe
2024-01-15 21:53 ` [PATCH 2/2] block: cache current nsec time in struct blk_plug Jens Axboe
2024-01-16  9:51   ` Kanchan Joshi
2024-01-16 10:25     ` Johannes Thumshirn
2024-01-16 10:47       ` Kanchan Joshi
2024-01-16 14:43     ` Jens Axboe
2024-01-16 16:13   ` Keith Busch
2024-01-16 16:18     ` Jens Axboe
