All of lore.kernel.org
 help / color / mirror / Atom feed
* Deduplicate io_*_prep calls?
@ 2020-02-24  1:07 Andres Freund
  2020-02-24  3:17 ` Jens Axboe
  0 siblings, 1 reply; 22+ messages in thread
From: Andres Freund @ 2020-02-24  1:07 UTC (permalink / raw)
  To: io-uring

Hi,

There's currently two places that know to call io_*_prep() for
sqes. io_req_defer_prep() and io_issue_sqe(). E.g. for READV there's:

static int io_req_defer_prep(struct io_kiocb *req,
			     const struct io_uring_sqe *sqe)
...
	case IORING_OP_READV:
	case IORING_OP_READ_FIXED:
	case IORING_OP_READ:
		ret = io_read_prep(req, sqe, true);
		break;

and

static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
			struct io_kiocb **nxt, bool force_nonblock)
{
...
	case IORING_OP_READV:
	case IORING_OP_READ_FIXED:
	case IORING_OP_READ:
		if (sqe) {
			ret = io_read_prep(req, sqe, force_nonblock);
			if (ret < 0)
				break;
		}
		ret = io_read(req, nxt, force_nonblock);
		break;

that seems a bit unnecessary. How about breaking that out into a
separate function?  I can write up a patch, just didn't want to do so if
there's a reason for the current split.


Alternatively it'd could all be just be dispatches via io_op_defs, but
that'd be a bigger change with potential performance implications. And
it'd benefit from prior deduplication anyway.

Greetings,

Andres Freund

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Deduplicate io_*_prep calls?
  2020-02-24  1:07 Deduplicate io_*_prep calls? Andres Freund
@ 2020-02-24  3:17 ` Jens Axboe
  2020-02-24  3:33   ` Andres Freund
  0 siblings, 1 reply; 22+ messages in thread
From: Jens Axboe @ 2020-02-24  3:17 UTC (permalink / raw)
  To: Andres Freund, io-uring

On 2/23/20 6:07 PM, Andres Freund wrote:
> Hi,
> 
> There's currently two places that know to call io_*_prep() for
> sqes. io_req_defer_prep() and io_issue_sqe(). E.g. for READV there's:
> 
> static int io_req_defer_prep(struct io_kiocb *req,
> 			     const struct io_uring_sqe *sqe)
> ...
> 	case IORING_OP_READV:
> 	case IORING_OP_READ_FIXED:
> 	case IORING_OP_READ:
> 		ret = io_read_prep(req, sqe, true);
> 		break;
> 
> and
> 
> static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
> 			struct io_kiocb **nxt, bool force_nonblock)
> {
> ...
> 	case IORING_OP_READV:
> 	case IORING_OP_READ_FIXED:
> 	case IORING_OP_READ:
> 		if (sqe) {
> 			ret = io_read_prep(req, sqe, force_nonblock);
> 			if (ret < 0)
> 				break;
> 		}
> 		ret = io_read(req, nxt, force_nonblock);
> 		break;
> 
> that seems a bit unnecessary. How about breaking that out into a
> separate function?  I can write up a patch, just didn't want to do so if
> there's a reason for the current split.
> 
> 
> Alternatively it'd could all be just be dispatches via io_op_defs, but
> that'd be a bigger change with potential performance implications. And
> it'd benefit from prior deduplication anyway.

The reason for the split is that if we defer a request, it has to be
prepared up front. If the request has been deferred, then the
io_issue_sqe() invocation has sqe == NULL. Hence we only run the prep
handler once, and read the sqe just once.

This could of course be compacted with some indirect function calls, but
I didn't want to pay the overhead of doing so... The downside is that
the code is a bit bigger.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Deduplicate io_*_prep calls?
  2020-02-24  3:17 ` Jens Axboe
@ 2020-02-24  3:33   ` Andres Freund
  2020-02-24  3:52     ` Jens Axboe
  0 siblings, 1 reply; 22+ messages in thread
From: Andres Freund @ 2020-02-24  3:33 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring

Hi,

On 2020-02-23 20:17:45 -0700, Jens Axboe wrote:
> > that seems a bit unnecessary. How about breaking that out into a
> > separate function?  I can write up a patch, just didn't want to do so if
> > there's a reason for the current split.
> >
> >
> > Alternatively it'd could all be just be dispatches via io_op_defs, but
> > that'd be a bigger change with potential performance implications. And
> > it'd benefit from prior deduplication anyway.
>
> The reason for the split is that if we defer a request, it has to be
> prepared up front. If the request has been deferred, then the
> io_issue_sqe() invocation has sqe == NULL. Hence we only run the prep
> handler once, and read the sqe just once.

> This could of course be compacted with some indirect function calls, but
> I didn't want to pay the overhead of doing so... The downside is that
> the code is a bit bigger.

Shouldn't need indirect function calls? At most the switch() would be
duplicated, if the compiler can't optimize it away (ok, that's an
indirect jump...).  I was just thinking of moving the io_*_prep() switch
into something like io_prep_sqe().

io_req_defer_prep() would basically move its switch into io_prep_sqe
(but not touch the rest of its code). io_issue_sqe() would have

if (sqe) {
    ret = io_prep_sqe(req, sqe, force_nonblock);
    if (ret != 0)
        return ret;
}

at the start.

Even if the added switch can't be optimized away from io_issue_sqe(),
the code for all the branches inside the opcode cases isn't free
either...

Greetings,

Andres Freund

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Deduplicate io_*_prep calls?
  2020-02-24  3:33   ` Andres Freund
@ 2020-02-24  3:52     ` Jens Axboe
  2020-02-24  7:12       ` Andres Freund
  0 siblings, 1 reply; 22+ messages in thread
From: Jens Axboe @ 2020-02-24  3:52 UTC (permalink / raw)
  To: Andres Freund; +Cc: io-uring

On 2/23/20 8:33 PM, Andres Freund wrote:
> Hi,
> 
> On 2020-02-23 20:17:45 -0700, Jens Axboe wrote:
>>> that seems a bit unnecessary. How about breaking that out into a
>>> separate function?  I can write up a patch, just didn't want to do so if
>>> there's a reason for the current split.
>>>
>>>
>>> Alternatively it'd could all be just be dispatches via io_op_defs, but
>>> that'd be a bigger change with potential performance implications. And
>>> it'd benefit from prior deduplication anyway.
>>
>> The reason for the split is that if we defer a request, it has to be
>> prepared up front. If the request has been deferred, then the
>> io_issue_sqe() invocation has sqe == NULL. Hence we only run the prep
>> handler once, and read the sqe just once.
> 
>> This could of course be compacted with some indirect function calls, but
>> I didn't want to pay the overhead of doing so... The downside is that
>> the code is a bit bigger.
> 
> Shouldn't need indirect function calls? At most the switch() would be
> duplicated, if the compiler can't optimize it away (ok, that's an
> indirect jump...).  I was just thinking of moving the io_*_prep() switch
> into something like io_prep_sqe().
> 
> io_req_defer_prep() would basically move its switch into io_prep_sqe
> (but not touch the rest of its code). io_issue_sqe() would have
> 
> if (sqe) {
>     ret = io_prep_sqe(req, sqe, force_nonblock);
>     if (ret != 0)
>         return ret;
> }
> 
> at the start.
> 
> Even if the added switch can't be optimized away from io_issue_sqe(),
> the code for all the branches inside the opcode cases isn't free
> either...

The fast case is not being deferred, that's by far the common (and hot)
case, which means io_issue() is called with sqe != NULL. My worry is
that by moving it into a prep helper, the compiler isn't smart enough to
not make that basically two switches. Feel free to prove me wrong, I'd
love to reduce it ;-)

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Deduplicate io_*_prep calls?
  2020-02-24  3:52     ` Jens Axboe
@ 2020-02-24  7:12       ` Andres Freund
  2020-02-24  9:10         ` Pavel Begunkov
  2020-02-24 15:40         ` Jens Axboe
  0 siblings, 2 replies; 22+ messages in thread
From: Andres Freund @ 2020-02-24  7:12 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring

[-- Attachment #1: Type: text/plain, Size: 3729 bytes --]

Hi,

On 2020-02-23 20:52:26 -0700, Jens Axboe wrote:
> The fast case is not being deferred, that's by far the common (and hot)
> case, which means io_issue() is called with sqe != NULL. My worry is
> that by moving it into a prep helper, the compiler isn't smart enough to
> not make that basically two switches.

I'm not sure that benefit of a single switch isn't offset by the lower
code density due to the additional per-opcode branches.  Not inlining
the prepare function results in:

$ size fs/io_uring.o fs/io_uring.before.o
   text	   data	    bss	    dec	    hex	filename
  75383	   8237	      8	  83628	  146ac	fs/io_uring.o
  76959	   8237	      8	  85204	  14cd4	fs/io_uring.before.o

symbol size
-io_close_prep 0000000000000066
-io_connect_prep 0000000000000051
-io_epoll_ctl_prep 0000000000000051
-io_issue_sqe 0000000000001101
+io_issue_sqe 0000000000000de9
-io_openat2_prep 00000000000000ed
-io_openat_prep 0000000000000089
-io_poll_add_prep 0000000000000056
-io_prep_fsync 0000000000000053
-io_prep_sfr 000000000000004e
-io_read_prep 00000000000000ca
-io_recvmsg_prep 0000000000000079
-io_req_defer_prep 000000000000058e
+io_req_defer_prep 0000000000000160
+io_req_prep 0000000000000d26
-io_sendmsg_prep 000000000000006b
-io_statx_prep 00000000000000ed
-io_write_prep 00000000000000cd



> Feel free to prove me wrong, I'd love to reduce it ;-)

With a bit of handholding the compiler can deduplicate the switches. It
can't recognize on its own that req->opcode can't change between the
switch for prep and issue. Can be solved by moving the opcode into a
temporary variable. Also needs an inline for io_req_prep (not surpring,
it's a bit large).

That results in a bit bigger code. That's partially because of more
inlining:
   text	   data	    bss	    dec	    hex	filename
  78291	   8237	      8	  86536	  15208	fs/io_uring.o
  76959	   8237	      8	  85204	  14cd4	fs/io_uring.before.o

symbol size
+get_order 0000000000000015
-io_close_prep 0000000000000066
-io_connect_prep 0000000000000051
-io_epoll_ctl_prep 0000000000000051
-io_issue_sqe 0000000000001101
+io_issue_sqe 00000000000018fa
-io_openat2_prep 00000000000000ed
-io_openat_prep 0000000000000089
-io_poll_add_prep 0000000000000056
-io_prep_fsync 0000000000000053
-io_prep_sfr 000000000000004e
-io_read_prep 00000000000000ca
-io_recvmsg_prep 0000000000000079
-io_req_defer_prep 000000000000058e
+io_req_defer_prep 0000000000000f12
-io_sendmsg_prep 000000000000006b
-io_statx_prep 00000000000000ed
-io_write_prep 00000000000000cd


There's still some unnecessary branching on force_nonblocking. The
second patch just separates the cases needing force_nonblocking
out. Probably not quite the right structure.


Oddly enough gcc decides that io_queue_async_work() wouldn't be inlined
anymore after that. I'm quite doubtful it's a good candidate anyway?
Seems mighty complex, and not likely to win much. That's a noticable
win:
   text	   data	    bss	    dec	    hex	filename
  72857	   8141	      8	  81006	  13c6e	fs/io_uring.o
  76959	   8237	      8	  85204	  14cd4	fs/io_uring.before.o
--- /tmp/before.txt	2020-02-23 21:00:16.316753022 -0800
+++ /tmp/after.txt	2020-02-23 23:10:44.979496728 -0800
-io_commit_cqring 00000000000003ef
+io_commit_cqring 000000000000012c
+io_free_req 000000000000005e
-io_free_req 00000000000002ed
-io_issue_sqe 0000000000001101
+io_issue_sqe 0000000000000e86
-io_poll_remove_one 0000000000000308
+io_poll_remove_one 0000000000000074
-io_poll_wake 0000000000000498
+io_poll_wake 000000000000021c
+io_queue_async_work 00000000000002a0
-io_queue_sqe 00000000000008cc
+io_queue_sqe 0000000000000391


Not quite sure what the policy is with attaching POC patches? Also send
as separate emails?

Greetings,

Andres Freund

[-- Attachment #2: v1-0001-WIP-io_uring-Deduplicate-request-prep.patch --]
[-- Type: text/x-diff, Size: 7794 bytes --]

From edb629fc246ef146ad4e25bc51fd3f5db797b2be Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 23 Feb 2020 22:22:33 -0800
Subject: [PATCH v1 1/2] WIP: io_uring: Deduplicate request prep.

Signed-off-by: Andres Freund <andres@anarazel.de>
---
 fs/io_uring.c | 192 +++++++++++++-------------------------------------
 1 file changed, 49 insertions(+), 143 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index de650df9ac53..9a8fda8b28c9 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4116,31 +4116,24 @@ static int io_files_update(struct io_kiocb *req, bool force_nonblock)
 	return 0;
 }
 
-static int io_req_defer_prep(struct io_kiocb *req,
-			     const struct io_uring_sqe *sqe)
+static inline int io_req_prep(u8 opcode, struct io_kiocb *req,
+			      const struct io_uring_sqe *sqe,
+			      bool force_nonblock)
 {
 	ssize_t ret = 0;
 
-	if (io_op_defs[req->opcode].file_table) {
-		ret = io_grab_files(req);
-		if (unlikely(ret))
-			return ret;
-	}
-
-	io_req_work_grab_env(req, &io_op_defs[req->opcode]);
-
-	switch (req->opcode) {
+	switch (opcode) {
 	case IORING_OP_NOP:
 		break;
 	case IORING_OP_READV:
 	case IORING_OP_READ_FIXED:
 	case IORING_OP_READ:
-		ret = io_read_prep(req, sqe, true);
+		ret = io_read_prep(req, sqe, force_nonblock);
 		break;
 	case IORING_OP_WRITEV:
 	case IORING_OP_WRITE_FIXED:
 	case IORING_OP_WRITE:
-		ret = io_write_prep(req, sqe, true);
+		ret = io_write_prep(req, sqe, force_nonblock);
 		break;
 	case IORING_OP_POLL_ADD:
 		ret = io_poll_add_prep(req, sqe);
@@ -4162,23 +4155,23 @@ static int io_req_defer_prep(struct io_kiocb *req,
 	case IORING_OP_RECV:
 		ret = io_recvmsg_prep(req, sqe);
 		break;
-	case IORING_OP_CONNECT:
-		ret = io_connect_prep(req, sqe);
-		break;
 	case IORING_OP_TIMEOUT:
 		ret = io_timeout_prep(req, sqe, false);
 		break;
 	case IORING_OP_TIMEOUT_REMOVE:
 		ret = io_timeout_remove_prep(req, sqe);
 		break;
+	case IORING_OP_ACCEPT:
+		ret = io_accept_prep(req, sqe);
+		break;
 	case IORING_OP_ASYNC_CANCEL:
 		ret = io_async_cancel_prep(req, sqe);
 		break;
 	case IORING_OP_LINK_TIMEOUT:
 		ret = io_timeout_prep(req, sqe, true);
 		break;
-	case IORING_OP_ACCEPT:
-		ret = io_accept_prep(req, sqe);
+	case IORING_OP_CONNECT:
+		ret = io_connect_prep(req, sqe);
 		break;
 	case IORING_OP_FALLOCATE:
 		ret = io_fallocate_prep(req, sqe);
@@ -4217,6 +4210,23 @@ static int io_req_defer_prep(struct io_kiocb *req,
 	return ret;
 }
 
+static int io_req_defer_prep(struct io_kiocb *req,
+			     const struct io_uring_sqe *sqe)
+{
+	ssize_t ret = 0;
+	u8 opcode = req->opcode;
+
+	if (io_op_defs[opcode].file_table) {
+		ret = io_grab_files(req);
+		if (unlikely(ret))
+			return ret;
+	}
+
+	io_req_work_grab_env(req, &io_op_defs[opcode]);
+
+	return io_req_prep(opcode, req, sqe, true);
+}
+
 static int io_req_defer(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_ring_ctx *ctx = req->ctx;
@@ -4278,198 +4288,94 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 			struct io_kiocb **nxt, bool force_nonblock)
 {
 	struct io_ring_ctx *ctx = req->ctx;
+	/* allow compiler to infer opcode doesn't change */
+	u8 opcode = req->opcode;
 	int ret;
 
-	switch (req->opcode) {
+	if (sqe) {
+		ret = io_req_prep(opcode, req, sqe, force_nonblock);
+		if (ret)
+			return ret;
+	}
+
+	switch (opcode) {
 	case IORING_OP_NOP:
 		ret = io_nop(req);
 		break;
 	case IORING_OP_READV:
 	case IORING_OP_READ_FIXED:
 	case IORING_OP_READ:
-		if (sqe) {
-			ret = io_read_prep(req, sqe, force_nonblock);
-			if (ret < 0)
-				break;
-		}
 		ret = io_read(req, nxt, force_nonblock);
 		break;
 	case IORING_OP_WRITEV:
 	case IORING_OP_WRITE_FIXED:
 	case IORING_OP_WRITE:
-		if (sqe) {
-			ret = io_write_prep(req, sqe, force_nonblock);
-			if (ret < 0)
-				break;
-		}
 		ret = io_write(req, nxt, force_nonblock);
 		break;
-	case IORING_OP_FSYNC:
-		if (sqe) {
-			ret = io_prep_fsync(req, sqe);
-			if (ret < 0)
-				break;
-		}
-		ret = io_fsync(req, nxt, force_nonblock);
-		break;
 	case IORING_OP_POLL_ADD:
-		if (sqe) {
-			ret = io_poll_add_prep(req, sqe);
-			if (ret)
-				break;
-		}
 		ret = io_poll_add(req, nxt);
 		break;
 	case IORING_OP_POLL_REMOVE:
-		if (sqe) {
-			ret = io_poll_remove_prep(req, sqe);
-			if (ret < 0)
-				break;
-		}
 		ret = io_poll_remove(req);
 		break;
+	case IORING_OP_FSYNC:
+		ret = io_fsync(req, nxt, force_nonblock);
+		break;
 	case IORING_OP_SYNC_FILE_RANGE:
-		if (sqe) {
-			ret = io_prep_sfr(req, sqe);
-			if (ret < 0)
-				break;
-		}
 		ret = io_sync_file_range(req, nxt, force_nonblock);
 		break;
 	case IORING_OP_SENDMSG:
+		ret = io_sendmsg(req, nxt, force_nonblock);
+		break;
 	case IORING_OP_SEND:
-		if (sqe) {
-			ret = io_sendmsg_prep(req, sqe);
-			if (ret < 0)
-				break;
-		}
-		if (req->opcode == IORING_OP_SENDMSG)
-			ret = io_sendmsg(req, nxt, force_nonblock);
-		else
-			ret = io_send(req, nxt, force_nonblock);
+		ret = io_send(req, nxt, force_nonblock);
 		break;
 	case IORING_OP_RECVMSG:
+		ret = io_recvmsg(req, nxt, force_nonblock);
+		break;
 	case IORING_OP_RECV:
-		if (sqe) {
-			ret = io_recvmsg_prep(req, sqe);
-			if (ret)
-				break;
-		}
-		if (req->opcode == IORING_OP_RECVMSG)
-			ret = io_recvmsg(req, nxt, force_nonblock);
-		else
-			ret = io_recv(req, nxt, force_nonblock);
+		ret = io_recv(req, nxt, force_nonblock);
 		break;
 	case IORING_OP_TIMEOUT:
-		if (sqe) {
-			ret = io_timeout_prep(req, sqe, false);
-			if (ret)
-				break;
-		}
 		ret = io_timeout(req);
 		break;
 	case IORING_OP_TIMEOUT_REMOVE:
-		if (sqe) {
-			ret = io_timeout_remove_prep(req, sqe);
-			if (ret)
-				break;
-		}
 		ret = io_timeout_remove(req);
 		break;
 	case IORING_OP_ACCEPT:
-		if (sqe) {
-			ret = io_accept_prep(req, sqe);
-			if (ret)
-				break;
-		}
 		ret = io_accept(req, nxt, force_nonblock);
 		break;
-	case IORING_OP_CONNECT:
-		if (sqe) {
-			ret = io_connect_prep(req, sqe);
-			if (ret)
-				break;
-		}
-		ret = io_connect(req, nxt, force_nonblock);
-		break;
 	case IORING_OP_ASYNC_CANCEL:
-		if (sqe) {
-			ret = io_async_cancel_prep(req, sqe);
-			if (ret)
-				break;
-		}
 		ret = io_async_cancel(req, nxt);
 		break;
+	case IORING_OP_CONNECT:
+		ret = io_connect(req, nxt, force_nonblock);
+		break;
 	case IORING_OP_FALLOCATE:
-		if (sqe) {
-			ret = io_fallocate_prep(req, sqe);
-			if (ret)
-				break;
-		}
 		ret = io_fallocate(req, nxt, force_nonblock);
 		break;
 	case IORING_OP_OPENAT:
-		if (sqe) {
-			ret = io_openat_prep(req, sqe);
-			if (ret)
-				break;
-		}
 		ret = io_openat(req, nxt, force_nonblock);
 		break;
 	case IORING_OP_CLOSE:
-		if (sqe) {
-			ret = io_close_prep(req, sqe);
-			if (ret)
-				break;
-		}
 		ret = io_close(req, nxt, force_nonblock);
 		break;
 	case IORING_OP_FILES_UPDATE:
-		if (sqe) {
-			ret = io_files_update_prep(req, sqe);
-			if (ret)
-				break;
-		}
 		ret = io_files_update(req, force_nonblock);
 		break;
 	case IORING_OP_STATX:
-		if (sqe) {
-			ret = io_statx_prep(req, sqe);
-			if (ret)
-				break;
-		}
 		ret = io_statx(req, nxt, force_nonblock);
 		break;
 	case IORING_OP_FADVISE:
-		if (sqe) {
-			ret = io_fadvise_prep(req, sqe);
-			if (ret)
-				break;
-		}
 		ret = io_fadvise(req, nxt, force_nonblock);
 		break;
 	case IORING_OP_MADVISE:
-		if (sqe) {
-			ret = io_madvise_prep(req, sqe);
-			if (ret)
-				break;
-		}
 		ret = io_madvise(req, nxt, force_nonblock);
 		break;
 	case IORING_OP_OPENAT2:
-		if (sqe) {
-			ret = io_openat2_prep(req, sqe);
-			if (ret)
-				break;
-		}
 		ret = io_openat2(req, nxt, force_nonblock);
 		break;
 	case IORING_OP_EPOLL_CTL:
-		if (sqe) {
-			ret = io_epoll_ctl_prep(req, sqe);
-			if (ret)
-				break;
-		}
 		ret = io_epoll_ctl(req, nxt, force_nonblock);
 		break;
 	default:
-- 
2.25.0.114.g5b0ca878e0


[-- Attachment #3: v1-0002-WIP-io_uring-Separate-blocking-nonblocking-io_iss.patch --]
[-- Type: text/x-diff, Size: 2588 bytes --]

From 4efd092e07207d18b2f0fdbc6e68e93d5e7c93b0 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 23 Feb 2020 23:06:58 -0800
Subject: [PATCH v1 2/2] WIP: io_uring: Separate blocking/nonblocking
 io_issue_sqe cases.

Signed-off-by: Andres Freund <andres@anarazel.de>
---
 fs/io_uring.c | 33 ++++++++++++++++++++++-----------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 9a8fda8b28c9..b149ab57c5b4 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4284,20 +4284,12 @@ static void io_cleanup_req(struct io_kiocb *req)
 	req->flags &= ~REQ_F_NEED_CLEANUP;
 }
 
-static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
+static inline int __io_issue_sqe(u8 opcode, struct io_kiocb *req, const struct io_uring_sqe *sqe,
 			struct io_kiocb **nxt, bool force_nonblock)
 {
 	struct io_ring_ctx *ctx = req->ctx;
-	/* allow compiler to infer opcode doesn't change */
-	u8 opcode = req->opcode;
 	int ret;
 
-	if (sqe) {
-		ret = io_req_prep(opcode, req, sqe, force_nonblock);
-		if (ret)
-			return ret;
-	}
-
 	switch (opcode) {
 	case IORING_OP_NOP:
 		ret = io_nop(req);
@@ -4405,6 +4397,25 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 	return 0;
 }
 
+static int io_prep_issue_sqe_nonblock(struct io_kiocb *req, const struct io_uring_sqe *sqe,
+				   struct io_kiocb **nxt)
+{
+	/* allow compiler to infer opcode doesn't change */
+	u8 opcode = req->opcode;
+	int ret;
+
+	ret = io_req_prep(opcode, req, sqe, true);
+	if (ret)
+		return ret;
+
+	return __io_issue_sqe(opcode, req, NULL, nxt, true);
+}
+
+static int io_issue_sqe_block(struct io_kiocb *req, struct io_kiocb **nxt)
+{
+	return __io_issue_sqe(req->opcode, req, NULL, nxt, false);
+}
+
 static void io_wq_submit_work(struct io_wq_work **workptr)
 {
 	struct io_wq_work *work = *workptr;
@@ -4421,7 +4432,7 @@ static void io_wq_submit_work(struct io_wq_work **workptr)
 	if (!ret) {
 		req->in_async = true;
 		do {
-			ret = io_issue_sqe(req, NULL, &nxt, false);
+			ret = io_issue_sqe_block(req, &nxt);
 			/*
 			 * We can get EAGAIN for polled IO even though we're
 			 * forcing a sync submission from here, since we can't
@@ -4616,7 +4627,7 @@ static void __io_queue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 again:
 	linked_timeout = io_prep_linked_timeout(req);
 
-	ret = io_issue_sqe(req, sqe, &nxt, true);
+	ret = io_prep_issue_sqe_nonblock(req, sqe, &nxt);
 
 	/*
 	 * We async punt it if the file wasn't marked NOWAIT, or if the file
-- 
2.25.0.114.g5b0ca878e0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: Deduplicate io_*_prep calls?
  2020-02-24  7:12       ` Andres Freund
@ 2020-02-24  9:10         ` Pavel Begunkov
  2020-02-24 15:40         ` Jens Axboe
  1 sibling, 0 replies; 22+ messages in thread
From: Pavel Begunkov @ 2020-02-24  9:10 UTC (permalink / raw)
  To: Andres Freund, Jens Axboe; +Cc: io-uring

On 24/02/2020 10:12, Andres Freund wrote:
> Hi,
> 
> On 2020-02-23 20:52:26 -0700, Jens Axboe wrote:
>> The fast case is not being deferred, that's by far the common (and hot)
>> case, which means io_issue() is called with sqe != NULL. My worry is
>> that by moving it into a prep helper, the compiler isn't smart enough to
>> not make that basically two switches.
> 
> I'm not sure that benefit of a single switch isn't offset by the lower
> code density due to the additional per-opcode branches.  Not inlining
> the prepare function results in:
> 

The first looks good, I like the change. Do you have performance numbers?
e.g. tools/io_uring/io_uring-bench (do_nop=1, with high DEPTH e.g. 100)
would be good enough to estimate relative overhead.
I don't expect any difference, TBH.


> There's still some unnecessary branching on force_nonblocking. The
> second patch just separates the cases needing force_nonblocking
> out. Probably not quite the right structure.
> 

It's trickier there. It can get into io_prep_issue_sqe_nonblock() ->
io_req_prep() with sqe=NULL. With a glance look, it should crash.
The culprit is __io_queue_sqe() with linked requests.

Also, io_issue_sqe_nonblock() would look better than io_prep_issue_sqe_nonblock().

BTW, did you tried to run regression tests? It's under liburing repository.

> 
> Not quite sure what the policy is with attaching POC patches? Also send
> as separate emails?

I'd prefer it inlined (i.e. as text, not attachment), so it can be
inline-commented.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Deduplicate io_*_prep calls?
  2020-02-24  7:12       ` Andres Freund
  2020-02-24  9:10         ` Pavel Begunkov
@ 2020-02-24 15:40         ` Jens Axboe
  2020-02-24 15:44           ` Pavel Begunkov
  2020-02-24 16:53           ` Andres Freund
  1 sibling, 2 replies; 22+ messages in thread
From: Jens Axboe @ 2020-02-24 15:40 UTC (permalink / raw)
  To: Andres Freund; +Cc: io-uring

On 2/24/20 12:12 AM, Andres Freund wrote:
> Hi,
> 
> On 2020-02-23 20:52:26 -0700, Jens Axboe wrote:
>> The fast case is not being deferred, that's by far the common (and hot)
>> case, which means io_issue() is called with sqe != NULL. My worry is
>> that by moving it into a prep helper, the compiler isn't smart enough to
>> not make that basically two switches.
> 
> I'm not sure that benefit of a single switch isn't offset by the lower
> code density due to the additional per-opcode branches.  Not inlining
> the prepare function results in:
> 
> $ size fs/io_uring.o fs/io_uring.before.o
>    text	   data	    bss	    dec	    hex	filename
>   75383	   8237	      8	  83628	  146ac	fs/io_uring.o
>   76959	   8237	      8	  85204	  14cd4	fs/io_uring.before.o
> 
> symbol size
> -io_close_prep 0000000000000066
> -io_connect_prep 0000000000000051
> -io_epoll_ctl_prep 0000000000000051
> -io_issue_sqe 0000000000001101
> +io_issue_sqe 0000000000000de9
> -io_openat2_prep 00000000000000ed
> -io_openat_prep 0000000000000089
> -io_poll_add_prep 0000000000000056
> -io_prep_fsync 0000000000000053
> -io_prep_sfr 000000000000004e
> -io_read_prep 00000000000000ca
> -io_recvmsg_prep 0000000000000079
> -io_req_defer_prep 000000000000058e
> +io_req_defer_prep 0000000000000160
> +io_req_prep 0000000000000d26
> -io_sendmsg_prep 000000000000006b
> -io_statx_prep 00000000000000ed
> -io_write_prep 00000000000000cd
> 
> 
> 
>> Feel free to prove me wrong, I'd love to reduce it ;-)
> 
> With a bit of handholding the compiler can deduplicate the switches. It
> can't recognize on its own that req->opcode can't change between the
> switch for prep and issue. Can be solved by moving the opcode into a
> temporary variable. Also needs an inline for io_req_prep (not surpring,
> it's a bit large).
> 
> That results in a bit bigger code. That's partially because of more
> inlining:
>    text	   data	    bss	    dec	    hex	filename
>   78291	   8237	      8	  86536	  15208	fs/io_uring.o
>   76959	   8237	      8	  85204	  14cd4	fs/io_uring.before.o
> 
> symbol size
> +get_order 0000000000000015
> -io_close_prep 0000000000000066
> -io_connect_prep 0000000000000051
> -io_epoll_ctl_prep 0000000000000051
> -io_issue_sqe 0000000000001101
> +io_issue_sqe 00000000000018fa
> -io_openat2_prep 00000000000000ed
> -io_openat_prep 0000000000000089
> -io_poll_add_prep 0000000000000056
> -io_prep_fsync 0000000000000053
> -io_prep_sfr 000000000000004e
> -io_read_prep 00000000000000ca
> -io_recvmsg_prep 0000000000000079
> -io_req_defer_prep 000000000000058e
> +io_req_defer_prep 0000000000000f12
> -io_sendmsg_prep 000000000000006b
> -io_statx_prep 00000000000000ed
> -io_write_prep 00000000000000cd
> 
> 
> There's still some unnecessary branching on force_nonblocking. The
> second patch just separates the cases needing force_nonblocking
> out. Probably not quite the right structure.
> 
> 
> Oddly enough gcc decides that io_queue_async_work() wouldn't be inlined
> anymore after that. I'm quite doubtful it's a good candidate anyway?
> Seems mighty complex, and not likely to win much. That's a noticable
> win:
>    text	   data	    bss	    dec	    hex	filename
>   72857	   8141	      8	  81006	  13c6e	fs/io_uring.o
>   76959	   8237	      8	  85204	  14cd4	fs/io_uring.before.o
> --- /tmp/before.txt	2020-02-23 21:00:16.316753022 -0800
> +++ /tmp/after.txt	2020-02-23 23:10:44.979496728 -0800
> -io_commit_cqring 00000000000003ef
> +io_commit_cqring 000000000000012c
> +io_free_req 000000000000005e
> -io_free_req 00000000000002ed
> -io_issue_sqe 0000000000001101
> +io_issue_sqe 0000000000000e86
> -io_poll_remove_one 0000000000000308
> +io_poll_remove_one 0000000000000074
> -io_poll_wake 0000000000000498
> +io_poll_wake 000000000000021c
> +io_queue_async_work 00000000000002a0
> -io_queue_sqe 00000000000008cc
> +io_queue_sqe 0000000000000391

That's OK, it's slow path, I'd prefer it not to be inlined.

> Not quite sure what the policy is with attaching POC patches? Also send
> as separate emails?

Fine like this, though easier if you inline the patches so it's easier
to comment on them.

Agree that the first patch looks fine, though I don't quite see why
you want to pass in opcode as a separate argument as it's always
req->opcode. Seeing it separate makes me a bit nervous, thinking that
someone is reading it again from the sqe, or maybe not passing in
the right opcode for the given request. So that seems fragile and it
should go away.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Deduplicate io_*_prep calls?
  2020-02-24 15:40         ` Jens Axboe
@ 2020-02-24 15:44           ` Pavel Begunkov
  2020-02-24 15:46             ` Jens Axboe
  2020-02-24 16:53           ` Andres Freund
  1 sibling, 1 reply; 22+ messages in thread
From: Pavel Begunkov @ 2020-02-24 15:44 UTC (permalink / raw)
  To: Jens Axboe, Andres Freund; +Cc: io-uring


[-- Attachment #1.1: Type: text/plain, Size: 4939 bytes --]

On 24/02/2020 18:40, Jens Axboe wrote:
> On 2/24/20 12:12 AM, Andres Freund wrote:
>> Hi,
>>
>> On 2020-02-23 20:52:26 -0700, Jens Axboe wrote:
>>> The fast case is not being deferred, that's by far the common (and hot)
>>> case, which means io_issue() is called with sqe != NULL. My worry is
>>> that by moving it into a prep helper, the compiler isn't smart enough to
>>> not make that basically two switches.
>>
>> I'm not sure that benefit of a single switch isn't offset by the lower
>> code density due to the additional per-opcode branches.  Not inlining
>> the prepare function results in:
>>
>> $ size fs/io_uring.o fs/io_uring.before.o
>>    text	   data	    bss	    dec	    hex	filename
>>   75383	   8237	      8	  83628	  146ac	fs/io_uring.o
>>   76959	   8237	      8	  85204	  14cd4	fs/io_uring.before.o
>>
>> symbol size
>> -io_close_prep 0000000000000066
>> -io_connect_prep 0000000000000051
>> -io_epoll_ctl_prep 0000000000000051
>> -io_issue_sqe 0000000000001101
>> +io_issue_sqe 0000000000000de9
>> -io_openat2_prep 00000000000000ed
>> -io_openat_prep 0000000000000089
>> -io_poll_add_prep 0000000000000056
>> -io_prep_fsync 0000000000000053
>> -io_prep_sfr 000000000000004e
>> -io_read_prep 00000000000000ca
>> -io_recvmsg_prep 0000000000000079
>> -io_req_defer_prep 000000000000058e
>> +io_req_defer_prep 0000000000000160
>> +io_req_prep 0000000000000d26
>> -io_sendmsg_prep 000000000000006b
>> -io_statx_prep 00000000000000ed
>> -io_write_prep 00000000000000cd
>>
>>
>>
>>> Feel free to prove me wrong, I'd love to reduce it ;-)
>>
>> With a bit of handholding the compiler can deduplicate the switches. It
>> can't recognize on its own that req->opcode can't change between the
>> switch for prep and issue. Can be solved by moving the opcode into a
>> temporary variable. Also needs an inline for io_req_prep (not surpring,
>> it's a bit large).
>>
>> That results in a bit bigger code. That's partially because of more
>> inlining:
>>    text	   data	    bss	    dec	    hex	filename
>>   78291	   8237	      8	  86536	  15208	fs/io_uring.o
>>   76959	   8237	      8	  85204	  14cd4	fs/io_uring.before.o
>>
>> symbol size
>> +get_order 0000000000000015
>> -io_close_prep 0000000000000066
>> -io_connect_prep 0000000000000051
>> -io_epoll_ctl_prep 0000000000000051
>> -io_issue_sqe 0000000000001101
>> +io_issue_sqe 00000000000018fa
>> -io_openat2_prep 00000000000000ed
>> -io_openat_prep 0000000000000089
>> -io_poll_add_prep 0000000000000056
>> -io_prep_fsync 0000000000000053
>> -io_prep_sfr 000000000000004e
>> -io_read_prep 00000000000000ca
>> -io_recvmsg_prep 0000000000000079
>> -io_req_defer_prep 000000000000058e
>> +io_req_defer_prep 0000000000000f12
>> -io_sendmsg_prep 000000000000006b
>> -io_statx_prep 00000000000000ed
>> -io_write_prep 00000000000000cd
>>
>>
>> There's still some unnecessary branching on force_nonblocking. The
>> second patch just separates the cases needing force_nonblocking
>> out. Probably not quite the right structure.
>>
>>
>> Oddly enough gcc decides that io_queue_async_work() wouldn't be inlined
>> anymore after that. I'm quite doubtful it's a good candidate anyway?
>> Seems mighty complex, and not likely to win much. That's a noticable
>> win:
>>    text	   data	    bss	    dec	    hex	filename
>>   72857	   8141	      8	  81006	  13c6e	fs/io_uring.o
>>   76959	   8237	      8	  85204	  14cd4	fs/io_uring.before.o
>> --- /tmp/before.txt	2020-02-23 21:00:16.316753022 -0800
>> +++ /tmp/after.txt	2020-02-23 23:10:44.979496728 -0800
>> -io_commit_cqring 00000000000003ef
>> +io_commit_cqring 000000000000012c
>> +io_free_req 000000000000005e
>> -io_free_req 00000000000002ed
>> -io_issue_sqe 0000000000001101
>> +io_issue_sqe 0000000000000e86
>> -io_poll_remove_one 0000000000000308
>> +io_poll_remove_one 0000000000000074
>> -io_poll_wake 0000000000000498
>> +io_poll_wake 000000000000021c
>> +io_queue_async_work 00000000000002a0
>> -io_queue_sqe 00000000000008cc
>> +io_queue_sqe 0000000000000391
> 
> That's OK, it's slow path, I'd prefer it not to be inlined.
> 
>> Not quite sure what the policy is with attaching POC patches? Also send
>> as separate emails?
> 
> Fine like this, though easier if you inline the patches so it's easier
> to comment on them.
> 
> Agree that the first patch looks fine, though I don't quite see why
> you want to pass in opcode as a separate argument as it's always
> req->opcode. Seeing it separate makes me a bit nervous, thinking that
> someone is reading it again from the sqe, or maybe not passing in
> the right opcode for the given request. So that seems fragile and it
> should go away.

I suppose it's to hint a compiler, that opcode haven't been changed inside the
first switch. And any compiler I used breaks analysis there pretty easy.
Optimising C is such a pain...

-- 
Pavel Begunkov


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Deduplicate io_*_prep calls?
  2020-02-24 15:44           ` Pavel Begunkov
@ 2020-02-24 15:46             ` Jens Axboe
  2020-02-24 15:50               ` Pavel Begunkov
  0 siblings, 1 reply; 22+ messages in thread
From: Jens Axboe @ 2020-02-24 15:46 UTC (permalink / raw)
  To: Pavel Begunkov, Andres Freund; +Cc: io-uring

On 2/24/20 8:44 AM, Pavel Begunkov wrote:
>> Fine like this, though easier if you inline the patches so it's easier
>> to comment on them.
>>
>> Agree that the first patch looks fine, though I don't quite see why
>> you want to pass in opcode as a separate argument as it's always
>> req->opcode. Seeing it separate makes me a bit nervous, thinking that
>> someone is reading it again from the sqe, or maybe not passing in
>> the right opcode for the given request. So that seems fragile and it
>> should go away.
> 
> I suppose it's to hint a compiler, that opcode haven't been changed
> inside the first switch. And any compiler I used breaks analysis there
> pretty easy.  Optimising C is such a pain...

But if the choice is between confusion/fragility/performance vs obvious
and safe, then I'll go with the latter every time. We should definitely
not pass in req and opcode separately.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Deduplicate io_*_prep calls?
  2020-02-24 15:46             ` Jens Axboe
@ 2020-02-24 15:50               ` Pavel Begunkov
  2020-02-24 15:53                 ` Jens Axboe
  2020-02-25  9:26                 ` Pavel Begunkov
  0 siblings, 2 replies; 22+ messages in thread
From: Pavel Begunkov @ 2020-02-24 15:50 UTC (permalink / raw)
  To: Jens Axboe, Andres Freund; +Cc: io-uring


[-- Attachment #1.1: Type: text/plain, Size: 1153 bytes --]

On 24/02/2020 18:46, Jens Axboe wrote:
> On 2/24/20 8:44 AM, Pavel Begunkov wrote:
>>> Fine like this, though easier if you inline the patches so it's easier
>>> to comment on them.
>>>
>>> Agree that the first patch looks fine, though I don't quite see why
>>> you want to pass in opcode as a separate argument as it's always
>>> req->opcode. Seeing it separate makes me a bit nervous, thinking that
>>> someone is reading it again from the sqe, or maybe not passing in
>>> the right opcode for the given request. So that seems fragile and it
>>> should go away.
>>
>> I suppose it's to hint a compiler, that opcode haven't been changed
>> inside the first switch. And any compiler I used breaks analysis there
>> pretty easy.  Optimising C is such a pain...
> 
> But if the choice is between confusion/fragility/performance vs obvious
> and safe, then I'll go with the latter every time. We should definitely
> not pass in req and opcode separately.

Yep, and even better to go with the latter, and somehow hint, that it won't
change. Though, never found a way to do that. Have any tricks in a sleeve?


-- 
Pavel Begunkov


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Deduplicate io_*_prep calls?
  2020-02-24 15:50               ` Pavel Begunkov
@ 2020-02-24 15:53                 ` Jens Axboe
  2020-02-24 15:56                   ` Pavel Begunkov
  2020-02-25  9:26                 ` Pavel Begunkov
  1 sibling, 1 reply; 22+ messages in thread
From: Jens Axboe @ 2020-02-24 15:53 UTC (permalink / raw)
  To: Pavel Begunkov, Andres Freund; +Cc: io-uring

On 2/24/20 8:50 AM, Pavel Begunkov wrote:
> On 24/02/2020 18:46, Jens Axboe wrote:
>> On 2/24/20 8:44 AM, Pavel Begunkov wrote:
>>>> Fine like this, though easier if you inline the patches so it's easier
>>>> to comment on them.
>>>>
>>>> Agree that the first patch looks fine, though I don't quite see why
>>>> you want to pass in opcode as a separate argument as it's always
>>>> req->opcode. Seeing it separate makes me a bit nervous, thinking that
>>>> someone is reading it again from the sqe, or maybe not passing in
>>>> the right opcode for the given request. So that seems fragile and it
>>>> should go away.
>>>
>>> I suppose it's to hint a compiler, that opcode haven't been changed
>>> inside the first switch. And any compiler I used breaks analysis there
>>> pretty easy.  Optimising C is such a pain...
>>
>> But if the choice is between confusion/fragility/performance vs obvious
>> and safe, then I'll go with the latter every time. We should definitely
>> not pass in req and opcode separately.
> 
> Yep, and even better to go with the latter, and somehow hint, that it won't
> change. Though, never found a way to do that. Have any tricks in a sleeve?

We could make it const and just make the assignment a bit hackier... Apart
from that, don't have any tricks up my sleeve.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Deduplicate io_*_prep calls?
  2020-02-24 15:53                 ` Jens Axboe
@ 2020-02-24 15:56                   ` Pavel Begunkov
  2020-02-24 16:02                     ` Jens Axboe
  0 siblings, 1 reply; 22+ messages in thread
From: Pavel Begunkov @ 2020-02-24 15:56 UTC (permalink / raw)
  To: Jens Axboe, Andres Freund; +Cc: io-uring


[-- Attachment #1.1: Type: text/plain, Size: 1599 bytes --]

On 24/02/2020 18:53, Jens Axboe wrote:
> On 2/24/20 8:50 AM, Pavel Begunkov wrote:
>> On 24/02/2020 18:46, Jens Axboe wrote:
>>> On 2/24/20 8:44 AM, Pavel Begunkov wrote:
>>>>> Fine like this, though easier if you inline the patches so it's easier
>>>>> to comment on them.
>>>>>
>>>>> Agree that the first patch looks fine, though I don't quite see why
>>>>> you want to pass in opcode as a separate argument as it's always
>>>>> req->opcode. Seeing it separate makes me a bit nervous, thinking that
>>>>> someone is reading it again from the sqe, or maybe not passing in
>>>>> the right opcode for the given request. So that seems fragile and it
>>>>> should go away.
>>>>
>>>> I suppose it's to hint a compiler, that opcode haven't been changed
>>>> inside the first switch. And any compiler I used breaks analysis there
>>>> pretty easy.  Optimising C is such a pain...
>>>
>>> But if the choice is between confusion/fragility/performance vs obvious
>>> and safe, then I'll go with the latter every time. We should definitely
>>> not pass in req and opcode separately.
>>
>> Yep, and even better to go with the latter, and somehow hint, that it won't
>> change. Though, never found a way to do that. Have any tricks in a sleeve?
> 
> We could make it const and just make the assignment a bit hackier... Apart
> from that, don't have any tricks up my sleeve.

Usually doesn't work because of such possible "hackier assignments".
Ok, I have to go and experiment a bit. Anyway, it probably generates a lot of
useless stuff, e.g. for req->ctx

-- 
Pavel Begunkov


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Deduplicate io_*_prep calls?
  2020-02-24 15:56                   ` Pavel Begunkov
@ 2020-02-24 16:02                     ` Jens Axboe
  2020-02-24 16:18                       ` Pavel Begunkov
  0 siblings, 1 reply; 22+ messages in thread
From: Jens Axboe @ 2020-02-24 16:02 UTC (permalink / raw)
  To: Pavel Begunkov, Andres Freund; +Cc: io-uring

On 2/24/20 8:56 AM, Pavel Begunkov wrote:
> On 24/02/2020 18:53, Jens Axboe wrote:
>> On 2/24/20 8:50 AM, Pavel Begunkov wrote:
>>> On 24/02/2020 18:46, Jens Axboe wrote:
>>>> On 2/24/20 8:44 AM, Pavel Begunkov wrote:
>>>>>> Fine like this, though easier if you inline the patches so it's easier
>>>>>> to comment on them.
>>>>>>
>>>>>> Agree that the first patch looks fine, though I don't quite see why
>>>>>> you want to pass in opcode as a separate argument as it's always
>>>>>> req->opcode. Seeing it separate makes me a bit nervous, thinking that
>>>>>> someone is reading it again from the sqe, or maybe not passing in
>>>>>> the right opcode for the given request. So that seems fragile and it
>>>>>> should go away.
>>>>>
>>>>> I suppose it's to hint a compiler, that opcode haven't been changed
>>>>> inside the first switch. And any compiler I used breaks analysis there
>>>>> pretty easy.  Optimising C is such a pain...
>>>>
>>>> But if the choice is between confusion/fragility/performance vs obvious
>>>> and safe, then I'll go with the latter every time. We should definitely
>>>> not pass in req and opcode separately.
>>>
>>> Yep, and even better to go with the latter, and somehow hint, that it won't
>>> change. Though, never found a way to do that. Have any tricks in a sleeve?
>>
>> We could make it const and just make the assignment a bit hackier... Apart
>> from that, don't have any tricks up my sleeve.
> 
> Usually doesn't work because of such possible "hackier assignments".
> Ok, I have to go and experiment a bit. Anyway, it probably generates a lot of
> useless stuff, e.g. for req->ctx

Tried this, and it generates the same code...


diff --git a/fs/io_uring.c b/fs/io_uring.c
index ba8d4e2d9f99..8de5863aa749 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -598,7 +598,7 @@ struct io_kiocb {
 
 	struct io_async_ctx		*io;
 	bool				needs_fixed_file;
-	u8				opcode;
+	const u8			opcode;
 
 	struct io_ring_ctx	*ctx;
 	struct list_head	list;
@@ -5427,6 +5427,8 @@ static bool io_get_sqring(struct io_ring_ctx *ctx, struct io_kiocb *req,
 	 */
 	head = READ_ONCE(sq_array[ctx->cached_sq_head & ctx->sq_mask]);
 	if (likely(head < ctx->sq_entries)) {
+		u8 *op;
+
 		/*
 		 * All io need record the previous position, if LINK vs DARIN,
 		 * it can be used to mark the position of the first IO in the
@@ -5434,7 +5436,8 @@ static bool io_get_sqring(struct io_ring_ctx *ctx, struct io_kiocb *req,
 		 */
 		req->sequence = ctx->cached_sq_head;
 		*sqe_ptr = &ctx->sq_sqes[head];
-		req->opcode = READ_ONCE((*sqe_ptr)->opcode);
+		op = (void *) req + offsetof(struct io_kiocb, opcode);
+		*op = READ_ONCE((*sqe_ptr)->opcode);
 		req->user_data = READ_ONCE((*sqe_ptr)->user_data);
 		ctx->cached_sq_head++;
 		return true;

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: Deduplicate io_*_prep calls?
  2020-02-24 16:02                     ` Jens Axboe
@ 2020-02-24 16:18                       ` Pavel Begunkov
  2020-02-24 17:08                         ` Andres Freund
  0 siblings, 1 reply; 22+ messages in thread
From: Pavel Begunkov @ 2020-02-24 16:18 UTC (permalink / raw)
  To: Jens Axboe, Andres Freund; +Cc: io-uring


[-- Attachment #1.1: Type: text/plain, Size: 1721 bytes --]

On 24/02/2020 19:02, Jens Axboe wrote:
>> Usually doesn't work because of such possible "hackier assignments".
>> Ok, I have to go and experiment a bit. Anyway, it probably generates a lot of
>> useless stuff, e.g. for req->ctx
> 
> Tried this, and it generates the same code...

Maybe it wasn't able to optimise in the first place

E.g. for the following code any compiler generates 2 reads (thanks godbolt).

extern void foo(int);
int bar(const int *v)
{
    foo(*v);
    return *v;
}

> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index ba8d4e2d9f99..8de5863aa749 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -598,7 +598,7 @@ struct io_kiocb {
>  
>  	struct io_async_ctx		*io;
>  	bool				needs_fixed_file;
> -	u8				opcode;
> +	const u8			opcode;
>  
>  	struct io_ring_ctx	*ctx;
>  	struct list_head	list;
> @@ -5427,6 +5427,8 @@ static bool io_get_sqring(struct io_ring_ctx *ctx, struct io_kiocb *req,
>  	 */
>  	head = READ_ONCE(sq_array[ctx->cached_sq_head & ctx->sq_mask]);
>  	if (likely(head < ctx->sq_entries)) {
> +		u8 *op;
> +
>  		/*
>  		 * All io need record the previous position, if LINK vs DARIN,
>  		 * it can be used to mark the position of the first IO in the
> @@ -5434,7 +5436,8 @@ static bool io_get_sqring(struct io_ring_ctx *ctx, struct io_kiocb *req,
>  		 */
>  		req->sequence = ctx->cached_sq_head;
>  		*sqe_ptr = &ctx->sq_sqes[head];
> -		req->opcode = READ_ONCE((*sqe_ptr)->opcode);
> +		op = (void *) req + offsetof(struct io_kiocb, opcode);
> +		*op = READ_ONCE((*sqe_ptr)->opcode);
>  		req->user_data = READ_ONCE((*sqe_ptr)->user_data);
>  		ctx->cached_sq_head++;
>  		return true;
> 

-- 
Pavel Begunkov


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Deduplicate io_*_prep calls?
  2020-02-24 15:40         ` Jens Axboe
  2020-02-24 15:44           ` Pavel Begunkov
@ 2020-02-24 16:53           ` Andres Freund
  2020-02-24 17:19             ` Jens Axboe
  1 sibling, 1 reply; 22+ messages in thread
From: Andres Freund @ 2020-02-24 16:53 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring

Hi,

On 2020-02-24 08:40:16 -0700, Jens Axboe wrote:
> Agree that the first patch looks fine, though I don't quite see why
> you want to pass in opcode as a separate argument as it's always
> req->opcode. Seeing it separate makes me a bit nervous, thinking that
> someone is reading it again from the sqe, or maybe not passing in
> the right opcode for the given request. So that seems fragile and it
> should go away.

Without extracting it into an argument the compiler can't know that
io_kiocb->opcode doesn't change between the two switches - and therefore
is unable to merge the switches.

To my knowledge there's no easy and general way to avoid that in C,
unfortunately. const pointers etc aren't generally a workaround, even
they were applicable here - due to the potential for other pointers
existing, the compiler can't assume values don't change.  With
sufficient annotations of pointers with restrict, pure, etc. one can get
it there sometimes.

Another possibility is having a const copy of the struct on the stack,
because then the compiler often is able to deduce that the value
changing would be undefined behaviour.


I'm not sure that means it's worth going for the separate argument - I
was doing that mostly to address your concern about the duplicated
switch cost.

Greetings,

Andres Freund

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Deduplicate io_*_prep calls?
  2020-02-24 16:18                       ` Pavel Begunkov
@ 2020-02-24 17:08                         ` Andres Freund
  2020-02-24 17:16                           ` Pavel Begunkov
  0 siblings, 1 reply; 22+ messages in thread
From: Andres Freund @ 2020-02-24 17:08 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: Jens Axboe, io-uring

Hi,

On 2020-02-24 19:18:26 +0300, Pavel Begunkov wrote:
> On 24/02/2020 19:02, Jens Axboe wrote:
> >> Usually doesn't work because of such possible "hackier assignments".
> >> Ok, I have to go and experiment a bit. Anyway, it probably generates a lot of
> >> useless stuff, e.g. for req->ctx
> > 
> > Tried this, and it generates the same code...
> 
> Maybe it wasn't able to optimise in the first place
> 
> E.g. for the following code any compiler generates 2 reads (thanks godbolt).
> 
> extern void foo(int);
> int bar(const int *v)
> {
>     foo(*v);
>     return *v;
> }

Yea, the compiler really can't assume anything for this kind of
thing.
a) It's valid C to cast away the const here, as long as it's guaranteed
   that v isn't pointing to to actually const memory.
b) foo() could actually have access to *v without the argument,
   e.g. through a global.
and even in the case of a const member of a struct, as far as I know
it's legal to change the values, as long as the allocation isn't const.

Greetings,

Andres Freund

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Deduplicate io_*_prep calls?
  2020-02-24 17:08                         ` Andres Freund
@ 2020-02-24 17:16                           ` Pavel Begunkov
  0 siblings, 0 replies; 22+ messages in thread
From: Pavel Begunkov @ 2020-02-24 17:16 UTC (permalink / raw)
  To: Andres Freund; +Cc: Jens Axboe, io-uring


[-- Attachment #1.1: Type: text/plain, Size: 1263 bytes --]

On 24/02/2020 20:08, Andres Freund wrote:
> Hi,
> 
> On 2020-02-24 19:18:26 +0300, Pavel Begunkov wrote:
>> On 24/02/2020 19:02, Jens Axboe wrote:
>>>> Usually doesn't work because of such possible "hackier assignments".
>>>> Ok, I have to go and experiment a bit. Anyway, it probably generates a lot of
>>>> useless stuff, e.g. for req->ctx
>>>
>>> Tried this, and it generates the same code...
>>
>> Maybe it wasn't able to optimise in the first place
>>
>> E.g. for the following code any compiler generates 2 reads (thanks godbolt).
>>
>> extern void foo(int);
>> int bar(const int *v)
>> {
>>     foo(*v);
>>     return *v;
>> }
> 
> Yea, the compiler really can't assume anything for this kind of
> thing.
> a) It's valid C to cast away the const here, as long as it's guaranteed
>    that v isn't pointing to to actually const memory.
> b) foo() could actually have access to *v without the argument,
>    e.g. through a global.
> and even in the case of a const member of a struct, as far as I know
> it's legal to change the values, as long as the allocation isn't const. 

Yep, regular stuff. And that's why I want to find a way to force compilers to
think otherwise. e.g. kind of __attribute__

-- 
Pavel Begunkov


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Deduplicate io_*_prep calls?
  2020-02-24 16:53           ` Andres Freund
@ 2020-02-24 17:19             ` Jens Axboe
  2020-02-24 17:30               ` Jens Axboe
  2020-02-24 17:37               ` Pavel Begunkov
  0 siblings, 2 replies; 22+ messages in thread
From: Jens Axboe @ 2020-02-24 17:19 UTC (permalink / raw)
  To: Andres Freund; +Cc: io-uring

On 2/24/20 9:53 AM, Andres Freund wrote:
> Hi,
> 
> On 2020-02-24 08:40:16 -0700, Jens Axboe wrote:
>> Agree that the first patch looks fine, though I don't quite see why
>> you want to pass in opcode as a separate argument as it's always
>> req->opcode. Seeing it separate makes me a bit nervous, thinking that
>> someone is reading it again from the sqe, or maybe not passing in
>> the right opcode for the given request. So that seems fragile and it
>> should go away.
> 
> Without extracting it into an argument the compiler can't know that
> io_kiocb->opcode doesn't change between the two switches - and therefore
> is unable to merge the switches.
> 
> To my knowledge there's no easy and general way to avoid that in C,
> unfortunately. const pointers etc aren't generally a workaround, even
> they were applicable here - due to the potential for other pointers
> existing, the compiler can't assume values don't change.  With
> sufficient annotations of pointers with restrict, pure, etc. one can get
> it there sometimes.
> 
> Another possibility is having a const copy of the struct on the stack,
> because then the compiler often is able to deduce that the value
> changing would be undefined behaviour.
> 
> 
> I'm not sure that means it's worth going for the separate argument - I
> was doing that mostly to address your concern about the duplicated
> switch cost.

Yeah I get that, but I don't think that's worth the pain. An alternative
solution might be to make the prep an indirect call, and just pair it
with some variant of INDIRECT_CALL(). This would be trivial, as the
arguments should be the same, and each call site knows exactly what
the function should be.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Deduplicate io_*_prep calls?
  2020-02-24 17:19             ` Jens Axboe
@ 2020-02-24 17:30               ` Jens Axboe
  2020-02-24 17:37               ` Pavel Begunkov
  1 sibling, 0 replies; 22+ messages in thread
From: Jens Axboe @ 2020-02-24 17:30 UTC (permalink / raw)
  To: Andres Freund; +Cc: io-uring

On 2/24/20 10:19 AM, Jens Axboe wrote:
> On 2/24/20 9:53 AM, Andres Freund wrote:
>> Hi,
>>
>> On 2020-02-24 08:40:16 -0700, Jens Axboe wrote:
>>> Agree that the first patch looks fine, though I don't quite see why
>>> you want to pass in opcode as a separate argument as it's always
>>> req->opcode. Seeing it separate makes me a bit nervous, thinking that
>>> someone is reading it again from the sqe, or maybe not passing in
>>> the right opcode for the given request. So that seems fragile and it
>>> should go away.
>>
>> Without extracting it into an argument the compiler can't know that
>> io_kiocb->opcode doesn't change between the two switches - and therefore
>> is unable to merge the switches.
>>
>> To my knowledge there's no easy and general way to avoid that in C,
>> unfortunately. const pointers etc aren't generally a workaround, even
>> they were applicable here - due to the potential for other pointers
>> existing, the compiler can't assume values don't change.  With
>> sufficient annotations of pointers with restrict, pure, etc. one can get
>> it there sometimes.
>>
>> Another possibility is having a const copy of the struct on the stack,
>> because then the compiler often is able to deduce that the value
>> changing would be undefined behaviour.
>>
>>
>> I'm not sure that means it's worth going for the separate argument - I
>> was doing that mostly to address your concern about the duplicated
>> switch cost.
> 
> Yeah I get that, but I don't think that's worth the pain. An alternative
> solution might be to make the prep an indirect call, and just pair it
> with some variant of INDIRECT_CALL(). This would be trivial, as the
> arguments should be the same, and each call site knows exactly what
> the function should be.

I guess that won't work, as we'd still need it inside the switch then
and it sort of becomes a pointless exercise at that point...

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Deduplicate io_*_prep calls?
  2020-02-24 17:19             ` Jens Axboe
  2020-02-24 17:30               ` Jens Axboe
@ 2020-02-24 17:37               ` Pavel Begunkov
  1 sibling, 0 replies; 22+ messages in thread
From: Pavel Begunkov @ 2020-02-24 17:37 UTC (permalink / raw)
  To: Jens Axboe, Andres Freund; +Cc: io-uring


[-- Attachment #1.1: Type: text/plain, Size: 1998 bytes --]

On 24/02/2020 20:19, Jens Axboe wrote:
> On 2/24/20 9:53 AM, Andres Freund wrote:
>> Hi,
>>
>> On 2020-02-24 08:40:16 -0700, Jens Axboe wrote:
>>> Agree that the first patch looks fine, though I don't quite see why
>>> you want to pass in opcode as a separate argument as it's always
>>> req->opcode. Seeing it separate makes me a bit nervous, thinking that
>>> someone is reading it again from the sqe, or maybe not passing in
>>> the right opcode for the given request. So that seems fragile and it
>>> should go away.
>>
>> Without extracting it into an argument the compiler can't know that
>> io_kiocb->opcode doesn't change between the two switches - and therefore
>> is unable to merge the switches.
>>
>> To my knowledge there's no easy and general way to avoid that in C,
>> unfortunately. const pointers etc aren't generally a workaround, even
>> they were applicable here - due to the potential for other pointers
>> existing, the compiler can't assume values don't change.  With
>> sufficient annotations of pointers with restrict, pure, etc. one can get
>> it there sometimes.
>>
>> Another possibility is having a const copy of the struct on the stack,
>> because then the compiler often is able to deduce that the value
>> changing would be undefined behaviour.
>>
>>
>> I'm not sure that means it's worth going for the separate argument - I
>> was doing that mostly to address your concern about the duplicated
>> switch cost.
> 
> Yeah I get that, but I don't think that's worth the pain. An alternative
> solution might be to make the prep an indirect call, and just pair it
> with some variant of INDIRECT_CALL(). This would be trivial, as the
> arguments should be the same, and each call site knows exactly what
> the function should be.

And unfortunately, instead of a nice jump table or offset near-jump, it'll
generate a mess. Though, the attempt itself may prove useful by clarifying
handler/core/etc API.

-- 
Pavel Begunkov


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Deduplicate io_*_prep calls?
  2020-02-24 15:50               ` Pavel Begunkov
  2020-02-24 15:53                 ` Jens Axboe
@ 2020-02-25  9:26                 ` Pavel Begunkov
  2020-02-27 21:06                   ` Andres Freund
  1 sibling, 1 reply; 22+ messages in thread
From: Pavel Begunkov @ 2020-02-25  9:26 UTC (permalink / raw)
  To: Jens Axboe, Andres Freund; +Cc: io-uring

On 2/24/2020 6:50 PM, Pavel Begunkov wrote:
> On 24/02/2020 18:46, Jens Axboe wrote:
>> On 2/24/20 8:44 AM, Pavel Begunkov wrote:
>>>> Fine like this, though easier if you inline the patches so it's easier
>>>> to comment on them.
>>>>
>>>> Agree that the first patch looks fine, though I don't quite see why
>>>> you want to pass in opcode as a separate argument as it's always
>>>> req->opcode. Seeing it separate makes me a bit nervous, thinking that
>>>> someone is reading it again from the sqe, or maybe not passing in
>>>> the right opcode for the given request. So that seems fragile and it
>>>> should go away.
>>>
>>> I suppose it's to hint a compiler, that opcode haven't been changed
>>> inside the first switch. And any compiler I used breaks analysis there
>>> pretty easy.  Optimising C is such a pain...
>>
>> But if the choice is between confusion/fragility/performance vs obvious
>> and safe, then I'll go with the latter every time. We should definitely
>> not pass in req and opcode separately.
> 
> Yep, and even better to go with the latter, and somehow hint, that it won't
> change. Though, never found a way to do that. Have any tricks in a sleeve?

It seems I have one. It can be done by using a const-attributed getter
function. And I see nothing against it in gcc manuals.

__attribute__((const))
static inline u8 io_get_opcode(struct io_kiocb *req)
{
    return req->opcode;
}


-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Deduplicate io_*_prep calls?
  2020-02-25  9:26                 ` Pavel Begunkov
@ 2020-02-27 21:06                   ` Andres Freund
  0 siblings, 0 replies; 22+ messages in thread
From: Andres Freund @ 2020-02-27 21:06 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: Jens Axboe, io-uring

Hi,

On 2020-02-25 12:26:34 +0300, Pavel Begunkov wrote:
> On 2/24/2020 6:50 PM, Pavel Begunkov wrote:
> It seems I have one. It can be done by using a const-attributed getter
> function. And I see nothing against it in gcc manuals.
> 
> __attribute__((const))
> static inline u8 io_get_opcode(struct io_kiocb *req)
> {
>     return req->opcode;
> }

That's practically safe, I'd assume, but I'm not sure it's theoretically
sound. The gcc manual says:

> Note that a function that has pointer arguments and examines the data
> pointed to must not be declared const if the pointed-to data might
> change between successive invocations of the function. In general, since
> a function cannot distinguish data that might change from data that
> cannot, const functions should never take pointer or, in C++, reference
> arguments. Likewise, a function that calls a non-const function usually
> must not be const itself.

and since req might be reused, this seems to violate the point about
reading pointers that could change. Which certainly isn't the case
here. Theoretically the compiler could e.g. understand that fallback_req
or such should never change it's opcode.

Greetings,

Andres Freund

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2020-02-27 21:06 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-24  1:07 Deduplicate io_*_prep calls? Andres Freund
2020-02-24  3:17 ` Jens Axboe
2020-02-24  3:33   ` Andres Freund
2020-02-24  3:52     ` Jens Axboe
2020-02-24  7:12       ` Andres Freund
2020-02-24  9:10         ` Pavel Begunkov
2020-02-24 15:40         ` Jens Axboe
2020-02-24 15:44           ` Pavel Begunkov
2020-02-24 15:46             ` Jens Axboe
2020-02-24 15:50               ` Pavel Begunkov
2020-02-24 15:53                 ` Jens Axboe
2020-02-24 15:56                   ` Pavel Begunkov
2020-02-24 16:02                     ` Jens Axboe
2020-02-24 16:18                       ` Pavel Begunkov
2020-02-24 17:08                         ` Andres Freund
2020-02-24 17:16                           ` Pavel Begunkov
2020-02-25  9:26                 ` Pavel Begunkov
2020-02-27 21:06                   ` Andres Freund
2020-02-24 16:53           ` Andres Freund
2020-02-24 17:19             ` Jens Axboe
2020-02-24 17:30               ` Jens Axboe
2020-02-24 17:37               ` Pavel Begunkov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.