On 14/12/2019 21:52, Jens Axboe wrote: > On 12/14/19 10:56 AM, Pavel Begunkov wrote: >> >> On 14/12/2019 20:12, Jann Horn wrote: >>> On Sat, Dec 14, 2019 at 4:30 PM Pavel Begunkov wrote: >>>> This works almost like ioctl(2), except it doesn't support a bunch of >>>> common opcodes, (e.g. FIOCLEX and FIBMAP, see ioctl.c), and goes >>>> straight to a device specific implementation. >>>> >>>> The case in mind is dma-buf, drm and other ioctl-centric interfaces. >>>> >>>> Not-yet Signed-off-by: Pavel Begunkov >>>> --- >>>> >>>> It clearly needs some testing first, though works fine with dma-buf, >>>> but I'd like to discuss whether the use cases are convincing enough, >>>> and is it ok to desert some ioctl opcodes. For the last point it's >>>> fairly easy to add, maybe except three requiring fd (e.g. FIOCLEX) >>>> >>>> P.S. Probably, it won't benefit enough to consider using io_uring >>>> in drm/mesa, but anyway. >>> [...] >>>> +static int io_ioctl(struct io_kiocb *req, >>>> + struct io_kiocb **nxt, bool force_nonblock) >>>> +{ >>>> + const struct io_uring_sqe *sqe = req->sqe; >>>> + unsigned int cmd = READ_ONCE(sqe->ioctl_cmd); >>>> + unsigned long arg = READ_ONCE(sqe->ioctl_arg); >>>> + int ret; >>>> + >>>> + if (!req->file) >>>> + return -EBADF; >>>> + if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL)) >>>> + return -EINVAL; >>>> + if (unlikely(sqe->ioprio || sqe->addr || sqe->buf_index >>>> + || sqe->rw_flags)) >>>> + return -EINVAL; >>>> + if (force_nonblock) >>>> + return -EAGAIN; >>>> + >>>> + ret = security_file_ioctl(req->file, cmd, arg); >>>> + if (!ret) >>>> + ret = (int)vfs_ioctl(req->file, cmd, arg); >>> >>> This isn't going to work. For several of the syscalls that were added, >>> special care had to be taken to avoid bugs - like for RECVMSG, for the >>> upcoming OPEN/CLOSE stuff, and so on. >>> >>> And in principle, ioctls handlers can do pretty much all of the things >>> syscalls can do, and more. They can look at the caller's PID, they can >>> open and close (well, technically that's slightly unsafe, but IIRC >>> autofs does it anyway) things in the file descriptor table, they can >>> give another process access to the calling process in some way, and so >>> on. If you just allow calling arbitrary ioctls through io_uring, you >>> will certainly get bugs, and probably security bugs, too. >>> >>> Therefore, I would prefer to see this not happen at all; and if you do >>> have a usecase where you think the complexity is worth it, then I >>> think you'll have to add new infrastructure that allows each >>> file_operations instance to opt in to having specific ioctls called >>> via this mechanism, or something like that, and ensure that each of >>> the exposed ioctls only performs operations that are safe from uring >>> worker context. >> >> Sounds like hell of a problem. Thanks for sorting this out! > > While the ioctl approach is tempting, for the use cases where it makes > sense, I think we should just add a ioctl type opcode and have the > sub-opcode be somewhere else in the sqe. Because I do think there's > a large opportunity to expose a fast API that works with ioctl like > mechanisms. If we have > > IORING_OP_IOCTL > > and set aside an sqe field for the per-driver (or per-user) and > add a file_operations method for sending these to the fd, then we'll > have a much better (and faster + async) API than ioctls. We could > add fops->uring_issue() or something, and that passes the io_kiocb. > When it completes, the ->io_uring_issue() posts a completion by > calling io_uring_complete_req() or something. > > Outside of the issues that Jann outlined, ioctls are also such a > decade old mess that we have to do the -EAGAIN punt for all of them > like you did in your patch. If it's opt-in like ->uring_issue(), then > care could be taken to do this right and just have it return -EAGAIN > if it does need async context. Right. But there is an overhead within io_uring, small but still. IMHO, there won't be much merit unless utilising batching/async. From my perspective, to justify the work there should be such a user (or one should be created) with a prototype and performance numbers. Any ideas where to look? > > ret = fops->uring_issue(req, force_nonblock); > if (ret == -EAGAIN) { > ... usual punt ... > } > > I think working on this would be great, and some of the more performance > sensitive ioctl cases should flock to it. > -- Pavel Begunkov