Am 22.02.21 um 21:14 schrieb Jens Axboe: > On 2/22/21 1:04 PM, Stefan Metzmacher wrote: >> Hi Jens, >> >>>> I've been thinking along the same lines, because having a sparse sqe layout >>>> for the uring cmd is a pain. I do think 'personality' is a bit too specific >>>> to be part of the shared space, that should probably belong in the pdu >>>> instead if the user needs it. One thing they all have in common is that they'd >>>> need a sub-command, so why not make that u16 that? >>>> >>>> There's also the option of simply saying that the uring_cmd sqe is just >>>> a different type, ala: >>>> >>>> struct io_uring_cmd_sqe { >>>> __u8 opcode; /* IO_OP_URING_CMD */ >>>> __u8 flags; >>>> __u16 target_op; >>>> __s32 fd; >>>> __u64 user_data; >>>> strut io_uring_cmd_pdu cmd_pdu; >>>> }; >>>> >>>> which is essentially the same as your suggestion in terms of layout >>>> (because that is the one that makes the most sense), we just don't try >>>> and shoe-horn it into the existing sqe. As long as we overlap >>>> opcode/flags, then init is fine. And past init, sqe is already consumed. >>>> >>>> Haven't tried and wire that up yet, and it may just be that the simple >>>> layout change you did is just easier to deal with. The important part >>>> here is the layout, and I certainly think we should do that. There's >>>> effectively 54 bytes of data there, if you include the target op and fd >>>> as part of that space. 48 fully usable for whatever. >>> >>> OK, folded in some of your stuff, and pushed out a new branch. Find it >>> here: >>> >>> https://git.kernel.dk/cgit/linux-block/log/?h=io_uring-fops.v3 >>> >>> I did notice while doing so that you put the issue flags in the cmd, >>> I've made them external again. Just seems cleaner to me, otherwise >>> you'd have to modify the command for reissue rather than just >>> pass in the flags directly. >> >> I think the first two commits need more verbose comments, which clearly >> document the uring_cmd() API. > > Oh for sure, I just haven't gotten around to it yet :-) > >> Event before uring_cmd(), it's really not clear to me why we have >> 'enum io_uring_cmd_flags', as 'enum'. >> As it seems to be use it as 'flags' (IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER). > > They could be unsigned in too, but not really a big deal imho. > >> With uring_cmd() it's not clear what the backend is supposed to do >> with these flags. > > IO_URING_F_NONBLOCK tells the lower layers that the operation should be > non-blocking, and if that isn't possible, then it must return -EAGAIN. > If that happens, then the operation will be retried from a context where > IO_URING_F_NONBLOCK isn't set. > > IO_URING_F_COMPLETE_DEFER is just part of the flags that should be > passed to the completion side, the handler need not do anything else. > It's only used internally, but allows fast processing if the completion > occurs off the IO_URING_F_NONBLOCK path. > > It'll get documented... But the above is also why it should get passed > in, rather than stuffed in the command itself. Thanks! >> I'd assume that uring_cmd() would per definition never block and go >> async itself, by returning -EIOCBQUEUED. And a single &req->uring_cmd >> is only ever passed once to uring_cmd() without any retry. > > No, -EIOCBQUEUED would mean "operation is queued, I'll call the > completion callback for it later". That's what I meant with async here. > For example, you start the IO operation and you'll get a notification (eg IRQ) later on which allows > you to complete it. Yes, it's up to the implementation of uring_cmd() to do the processing and waiting in the background, based on timers, hardware events or whatever and finally call io_uring_cmd_done(). But with this: ret = file->f_op->uring_cmd(&req->uring_cmd, issue_flags); /* queued async, consumer will call io_uring_cmd_done() when complete */ if (ret == -EIOCBQUEUED) return 0; io_uring_cmd_done(&req->uring_cmd, ret); return 0; I don't see where -EAGAIN would trigger a retry in a io-wq worker context. Isn't -EAGAIN exposed to the cqe. Similar to ret == -EAGAIN && req->flags & REQ_F_NOWAIT. >> It's also not clear if IOSQE_ASYNC should have any impact. > > Handler doesn't need to care about that, it'll just mean that the > initial queue attempt will not have IO_URING_F_NONBLOCK set. Ok, because it's done from the io-wq worker, correct? >> I think we also need a way to pass IORING_OP_ASYNC_CANCEL down. > > Cancelation indeed needs some thought. There are a few options: > > 1) Request completes sync, obviously no cancelation needed here as the > request is never stuck in a state that requires cancelation. > > 2) Request needs blocking context, and hence an async handler is doing > it. The regular cancelation already works for that, nothing is needed > here. Would probably be better handled with a cancel handler. > > 3) uring_cmd handler returns -EIOCBQUEUED. This is the only case that > needs active cancelation support. Only case where that would > currently happen are things like block IO, where we don't support > cancelation to begin with (insert long rant on inadequate hw > support). > > So tldr here is that 1+2 is already there, and 3 not being fixed leaves > us no different than the existing support for cancelation. IOW, I don't > think this is an initial requirement, support can always be expanded > later. Given that you list 2) here again, I get the impression that the logic should be: ret = file->f_op->uring_cmd(&req->uring_cmd, issue_flags); /* reschedule in io-wq worker again */ if (ret == -EAGAIN) return ret; /* queued async, consumer will call io_uring_cmd_done() when complete */ if (ret == -EIOCBQUEUED) return 0; io_uring_cmd_done(&req->uring_cmd, ret); return 0; With that the above would make sense and seems to make the whole design more flexible for the uring_cmd implementers. However my primary use case would be using the -EIOCBQUEUED logic. And I think it would be good to have IORING_OP_ASYNC_CANCEL logic in place for that, as it would simplify the userspace logic to single io_uring_opcode_supported(IO_OP_URING_CMD). I also noticed that some sendmsg/recvmsg implementations support -EIOCBQUEUED, e.g. _aead_recvmsg(), I guess it would be nice to support that for IORING_OP_SENDMSG and IORING_OP_RECVMSG as well. It uses struct kiocb and iocb->ki_complete(). Would it make sense to also use struct kiocb and iocb->ki_complete() instead of a custom io_uring_cmd_done()? Maybe it would be possible to also have a common way to cancel an struct kiocb request... >>> Since we just need that one branch in req init, I do think that your >>> suggestion of just modifying io_uring_sqe is the way to go. So that's >>> what the above branch does. >> >> Thanks! I think it's much easier to handle the personality logic in >> the core only. >> >> For fixed files or fixed buffers I think helper functions like this: >> >> struct file *io_uring_cmd_get_file(struct io_uring_cmd *cmd, int fd, bool fixed); >> >> And similar functions for io_buffer_select or io_import_fixed. > > I did end up retaining that, at least in its current state it's like you > proposed. Only change is some packing on that very union, which should > not be necessary, but due to fun arm reasons it is. I noticed that thanks! Do you also think a io_uring_cmd_get_file() would be useful? My uring_cmd() implementation would need a 2nd struct file in order to do something similar to a splice operation. And it might be good to allow also fixed files to be used. Referencing fixed buffer may also be useful, I'm not 100% sure I'll need them, but it would be good to be flexible and prototype various solutions. >>> I tested the block side, and it works for getting the bs of the >>> device. That's all the testing that has been done so far :-) >> >> I've added EXPORT_SYMBOL(io_uring_cmd_done); and split your net patch, >> similar to the two block patches. So we can better isolate the core >> from the first consumers. >> >> See https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=refs/heads/io_uring-fops.v3 > > Great thanks, I'll take a look and fold back. I'll also expand those > commit messages :-) Thanks!