io-uring.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Stefan Metzmacher <metze@samba.org>,
	"Darrick J. Wong" <djwong@kernel.org>
Cc: io-uring@vger.kernel.org
Subject: Re: [PATCH 2/5] io_uring: add support for IORING_OP_URING_CMD
Date: Sat, 20 Feb 2021 07:50:58 -0700	[thread overview]
Message-ID: <187cbfad-0dfe-9a16-11f0-bfe18a6c7520@kernel.dk> (raw)
In-Reply-To: <81d6c90e-b853-27c0-c4b7-23605bd4abae@samba.org>

On 2/19/21 8:57 PM, Stefan Metzmacher wrote:
> Hi Jens,
> 
> Am 28.01.21 um 03:19 schrieb Jens Axboe:
>>>> Assuming that I got that right, that means that the pdu information
>>>> doesn't actually go all the way to the end of the sqe, which currently
>>>> is just a bunch of padding.  Was that intentional, or does this mean
>>>> that io_uring_pdu could actually be 8 bytes longer?
>>>
>>> Also correct. The reason is actually kind of stupid, and I think we
>>> should just fix that up. struct io_uring_cmd should fit within the first
>>> cacheline of io_kiocb, to avoid bloating that one. But with the members
>>> in there, it ends up being 8 bytes too big, if we grab those 8 bytes.
>>> What I think we should do is get rid of ->done, and just have drivers
>>> call io_uring_cmd_done() instead. We can provide an empty hook for that.
>>> Then we can reclaim the 8 bytes, and grow the io_uring_cmd to 56 bytes.
>>
>> Pushed out that version:
>>
>> https://git.kernel.dk/cgit/linux-block/log/?h=io_uring-fops.v2
>>
>> which gives you the full 56 bytes for the payload command.
> 
> I think we only have 48 bytes for the payload.

You are right, it's 64b minus 8 for the head, and 8 for user_data.

> I've rebased and improved your io_uring-fops.v2 on top of your io_uring-worker.v3.

Heh, I did that myself yesterday too, when I was folding in two fixes!

> See https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=refs/heads/io_uring-fops

Had a quick look, and some good stuff in there. So thanks for that. One
question, though - if you look at mine, you'll see I moved the force_nonblock
to be using the issue_flags instead. You dropped that from the issue path,
we definitely need that to be able to punt the request if we're in the
nonblock/fast path.

> 
> I've changed the layout like this:
> 
> struct io_uring_sqe {
>         __u8    opcode;         /* type of operation for this sqe */
>         __u8    flags;          /* IOSQE_ flags */
>         union {
>                 __u16   ioprio;         /* ioprio for the request */
>                 __u16   cmd_personality; /* IORING_OP_URING_CMD */
>         };
>         __s32   fd;             /* file descriptor to do IO on */
>         union {
>                 __u64   off;    /* offset into file */
>                 __u64   addr2;
>                 __u64   cmd_user_data; /* IORING_OP_URING_CMD: data to be passed back at completion time */
>         };
>         union {
>                 __u64   addr;   /* pointer to buffer or iovecs */
>                 __u64   splice_off_in;
>                 __u64   cmd_pdu_start; /* IORING_OP_URING_CMD: this is the start for the remaining 48 bytes */
>         };
> 
> And then use:
> 
> struct io_uring_cmd_pdu {
>        __u64 data[6]; /* 48 bytes available for free use */
> };
> 
> So we effectively have this:
> 
> struct io_uring_cmd_sqe {
>         __u8    opcode;         /* type of operation for this sqe */
>         __u8    flags;          /* IOSQE_ flags */
>         __u16   cmd_personality; /* IORING_OP_URING_CMD */
>         __s32   fd;             /* file descriptor to do IO on */
>         __u64   cmd_user_data; /* IORING_OP_URING_CMD: data to be passed back at completion time */
>         union {
>                 __u64   cmd_pdu_start; /* IORING_OP_URING_CMD: this is the start for the remaining 48 bytes */
>                 struct io_uring_cmd_pdu cmd_pdu;
>         };
> }
> 
> I think it's saner to have a complete block of 48 bytes available for the payload
> and move personality and user_data to to top if opcode is IORING_OP_URING_CMD
> instead of having a hole that can't be touched.
> 
> I also finished the socket glue from struct file -> struct socket -> struct sock
> 
> I think it compiles, but I haven't done any tests.
> 
> What do you think?

I've been thinking along the same lines, because having a sparse sqe layout
for the uring cmd is a pain. I do think 'personality' is a bit too specific
to be part of the shared space, that should probably belong in the pdu
instead if the user needs it. One thing they all have in common is that they'd
need a sub-command, so why not make that u16 that?

There's also the option of simply saying that the uring_cmd sqe is just
a different type, ala:

struct io_uring_cmd_sqe {
	__u8	opcode;		/* IO_OP_URING_CMD */
	__u8	flags;
	__u16	target_op;
	__s32	fd;
	__u64	user_data;
	strut io_uring_cmd_pdu cmd_pdu;
};

which is essentially the same as your suggestion in terms of layout
(because that is the one that makes the most sense), we just don't try
and shoe-horn it into the existing sqe. As long as we overlap
opcode/flags, then init is fine. And past init, sqe is already consumed.

Haven't tried and wire that up yet, and it may just be that the simple
layout change you did is just easier to deal with. The important part
here is the layout, and I certainly think we should do that. There's
effectively 54 bytes of data there, if you include the target op and fd
as part of that space. 48 fully usable for whatever.

-- 
Jens Axboe


  reply	other threads:[~2021-02-20 14:51 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-27 21:25 [PATCHSET RFC 0/5] file_operations based io_uring commands Jens Axboe
2021-01-27 21:25 ` [PATCH 1/5] fs: add file_operations->uring_cmd() Jens Axboe
2021-01-27 21:25 ` [PATCH 2/5] io_uring: add support for IORING_OP_URING_CMD Jens Axboe
2021-01-28  0:38   ` Darrick J. Wong
2021-01-28  1:45     ` Jens Axboe
2021-01-28  2:19       ` Jens Axboe
2021-02-20  3:57         ` Stefan Metzmacher
2021-02-20 14:50           ` Jens Axboe [this message]
2021-02-20 16:45             ` Jens Axboe
2021-02-22 20:04               ` Stefan Metzmacher
2021-02-22 20:14                 ` Jens Axboe
2021-02-23  8:14                   ` Stefan Metzmacher
2021-02-23 13:21                     ` Pavel Begunkov
2021-01-27 21:25 ` [PATCH 3/5] block: wire up support for file_operations->uring_cmd() Jens Axboe
2021-01-27 21:25 ` [PATCH 4/5] block: add example ioctl Jens Axboe
2021-01-27 21:25 ` [PATCH 5/5] net: wire up support for file_operations->uring_cmd() Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=187cbfad-0dfe-9a16-11f0-bfe18a6c7520@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=djwong@kernel.org \
    --cc=io-uring@vger.kernel.org \
    --cc=metze@samba.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).