From: Christoph Hellwig <hch@lst.de>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org,
linux-block@vger.kernel.org, hch@lst.de, jmoyer@redhat.com,
avi@scylladb.com, linux-api@vger.kernel.org,
linux-man@vger.kernel.org
Subject: Re: [PATCH 05/18] Add io_uring IO interface
Date: Mon, 28 Jan 2019 15:57:01 +0100
Message-ID: <20190128145700.GA9795@lst.de>
In-Reply-To: <20190123153536.7081-6-axboe@kernel.dk>
[please make sure linux-api and linux-man are CCed on new syscalls
so that we get API experts to review them]
> io_uring_enter(fd, to_submit, min_complete, flags)
> Initiates IO against the rings mapped to this fd, or waits for
> them to complete, or both. The behavior is controlled by the
> parameters passed in. If 'to_submit' is non-zero, then we'll
> try and submit new IO. If IORING_ENTER_GETEVENTS is set, the
> kernel will wait for 'min_complete' events, if they aren't
> already available. It's valid to set IORING_ENTER_GETEVENTS
> and 'min_complete' == 0 at the same time, this allows the
> kernel to return already completed events without waiting
> for them. This is useful only for polling, as for IRQ
> driven IO, the application can just check the CQ ring
> without entering the kernel.
Especially with poll support now in the series, don't we need a sigmask
argument similar to pselect/ppoll/io_pgetevents now to deal with signal
blocking during waiting for events?
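For reference, this is roughly what the sigmask argument buys in the
existing interfaces. The sketch below is plain userspace code built on
POSIX pselect(); the helper name wait_readable() and its millisecond
timeout are made up for illustration and are not part of the patch or of
any proposed io_uring API.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <time.h>

/*
 * Hypothetical helper: wait until fd becomes readable.
 *
 * For the duration of the wait the thread's signal mask is replaced by
 * *wait_mask. pselect() swaps the mask and starts waiting as one atomic
 * step and restores the previous mask before returning, so a signal that
 * is blocked outside the call cannot slip in between "unblock" and "wait".
 *
 * Returns 1 if fd is readable, 0 on timeout, -1 on error (errno is set,
 * for example to EINTR when a signal handler ran).
 */
static int wait_readable(int fd, const sigset_t *wait_mask, long timeout_ms)
{
	fd_set rfds;
	struct timespec ts;

	/* select-style fd sets only cover descriptors below FD_SETSIZE. */
	assert(fd >= 0 && fd < FD_SETSIZE);

	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);
	ts.tv_sec = timeout_ms / 1000;
	ts.tv_nsec = (timeout_ms % 1000) * 1000000L;

	return pselect(fd + 1, &rfds, NULL, NULL, &ts, wait_mask);
}
```

A caller would typically block the signal of interest with sigprocmask()
up front and pass a mask with that signal removed, so the handler can only
run while the thread is actually waiting.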
> +struct sqe_submit {
> + const struct io_uring_sqe *sqe;
> + unsigned index;
> +};
Can you make sure all the structs use tab indentation for their
field names? Maybe even the same for all structs just to be nice
to my eyes?
> +static int io_import_iovec(struct io_ring_ctx *ctx, int rw,
> + const struct io_uring_sqe *sqe,
> + struct iovec **iovec, struct iov_iter *iter)
> +{
> + void __user *buf = u64_to_user_ptr(sqe->addr);
> +
> +#ifdef CONFIG_COMPAT
> + if (ctx->compat)
> + return compat_import_iovec(rw, buf, sqe->len, UIO_FASTIOV,
> + iovec, iter);
> +#endif
I think we can just check in_compat_syscall() here, which means we
can kill the ->compat member, and the separate compat version of the
setup syscall.
> +/*
> + * IORING_OP_NOP just posts a completion event, nothing else.
> + */
> +static int io_nop(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> +{
> + struct io_ring_ctx *ctx = req->ctx;
> +
> + __io_cqring_add_event(ctx, sqe->user_data, 0, 0);
Can you explain why not taking the completion lock is safe here? And
why we want to have such a somewhat dangerous special case just for the
no-op benchmarking aid?
> +static bool io_get_sqring(struct io_ring_ctx *ctx, struct sqe_submit *s)
> +{
> + struct io_sq_ring *ring = ctx->sq_ring;
> + unsigned head;
> +
> + head = ctx->cached_sq_head;
> + smp_rmb();
> + if (head == READ_ONCE(ring->r.tail))
> + return false;
Do we really need to optimize the sq_head == tail case so much? Or
am I missing why we are using the cached sq head case here? Maybe
add some more comments for a start.
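As a starting point for such comments, here is a minimal userspace sketch
of one common reason to keep a private head in a single-consumer ring: the
consumer advances its own copy per entry and publishes it to the shared
head once per batch. This is an assumption about the intent, not a
statement of what the patch does; all demo_* names and the ring size are
hypothetical, and C11 atomics stand in for the kernel barriers.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RING_ENTRIES 8u		/* hypothetical size, must be a power of two */

struct demo_ring {
	_Atomic unsigned head;	/* shared: consumer publishes progress here */
	_Atomic unsigned tail;	/* shared: producer publishes new entries here */
	unsigned array[RING_ENTRIES];
	unsigned cached_head;	/* private to the consumer, never read by the producer */
};

/* Producer: write the entry, then make it visible with a release store. */
static bool demo_push(struct demo_ring *r, unsigned value)
{
	unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (tail - head == RING_ENTRIES)
		return false;	/* full as far as the producer can tell */
	r->array[tail & (RING_ENTRIES - 1)] = value;
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}

/* Consumer: fetch one entry, touching only the private head. */
static bool demo_get(struct demo_ring *r, unsigned *value)
{
	unsigned head = r->cached_head;

	/* The acquire load pairs with the producer's release store to tail. */
	if (head == atomic_load_explicit(&r->tail, memory_order_acquire))
		return false;	/* nothing new to consume */
	*value = r->array[head & (RING_ENTRIES - 1)];
	r->cached_head = head + 1;
	return true;
}

/* Consumer: publish everything consumed so far with a single store. */
static void demo_commit(struct demo_ring *r)
{
	atomic_store_explicit(&r->head, r->cached_head, memory_order_release);
}
```

In this sketch the producer only sees freed slots after demo_commit(), so
a batch of demo_get() calls costs one store to the shared cache line
instead of one per entry.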
> +static int __io_uring_enter(struct io_ring_ctx *ctx, unsigned to_submit,
> + unsigned min_complete, unsigned flags)
> +{
> + int ret = 0;
> +
> + if (to_submit) {
> + ret = io_ring_submit(ctx, to_submit);
> + if (ret < 0)
> + return ret;
> + }
> + if (flags & IORING_ENTER_GETEVENTS) {
> + int get_ret;
> +
> + if (!ret && to_submit)
> + min_complete = 0;
Why do we have this special case? Does it need some documentation?
> +
> + get_ret = io_cqring_wait(ctx, min_complete);
> + if (get_ret < 0 && !ret)
> + ret = get_ret;
> + }
> +
> + return ret;
Maybe using different names and slightly different semantics for the
return values would clear some of this up?
	if (to_submit) {
		submitted = io_ring_submit(ctx, to_submit);
		if (submitted < 0)
			return submitted;
	}
	if (flags & IORING_ENTER_GETEVENTS) {
		...
		ret = io_cqring_wait(ctx, min_complete);
	}
	return submitted ? submitted : ret;
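To make the resulting semantics concrete, the same control flow can be
exercised in userspace with stubbed submit and wait steps. Everything
named fake_* or *_sketch below is a stand-in invented for this example,
as is the ENTER_GETEVENTS value; only the shape of the function follows
the snippet above. Note that submitted has to start at 0 so the
to_submit == 0 case falls through to the wait result.

```c
#include <errno.h>

#define ENTER_GETEVENTS 0x1u	/* stand-in for IORING_ENTER_GETEVENTS */

/* Hypothetical context: the stub results are injected by the caller. */
struct fake_ctx {
	int submit_result;	/* value the submit stub returns */
	int wait_result;	/* value the wait stub returns */
	int waited;		/* set to 1 once the wait stub has run */
};

/* Stub for the submit step: returns a count or a negative errno. */
static int fake_submit(struct fake_ctx *ctx, unsigned to_submit)
{
	(void)to_submit;
	return ctx->submit_result;
}

/* Stub for the wait step: returns 0 or a negative errno. */
static int fake_wait(struct fake_ctx *ctx, unsigned min_complete)
{
	(void)min_complete;
	ctx->waited = 1;
	return ctx->wait_result;
}

static int enter_sketch(struct fake_ctx *ctx, unsigned to_submit,
			unsigned min_complete, unsigned flags)
{
	int submitted = 0;	/* stays 0 when nothing was asked to be submitted */
	int ret = 0;

	if (to_submit) {
		submitted = fake_submit(ctx, to_submit);
		if (submitted < 0)
			return submitted;	/* submit errors end the call early */
	}
	if (flags & ENTER_GETEVENTS)
		ret = fake_wait(ctx, min_complete);

	/* A non-zero submission count takes priority over the wait result. */
	return submitted ? submitted : ret;
}
```

With these rules a wait error is only reported when nothing was submitted,
which is the same priority the original ret/get_ret code implements.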
> +static int io_sq_offload_start(struct io_ring_ctx *ctx)
> +static void io_sq_offload_stop(struct io_ring_ctx *ctx)
Can we just merge these two functions into the callers? Currently
the flow is a little odd with these helpers that don't seem to be
too clear about their responsibilities.
> +static void io_free_scq_urings(struct io_ring_ctx *ctx)
> +{
> + if (ctx->sq_ring) {
> + page_frag_free(ctx->sq_ring);
> + ctx->sq_ring = NULL;
> + }
> + if (ctx->sq_sqes) {
> + page_frag_free(ctx->sq_sqes);
> + ctx->sq_sqes = NULL;
> + }
> + if (ctx->cq_ring) {
> + page_frag_free(ctx->cq_ring);
> + ctx->cq_ring = NULL;
> + }
Why is this using the page_frag helpers? Also the callers just free
the ctx structure, so there isn't much of a point zeroing them out.
Also I'd be tempted to open code the freeing in io_allocate_scq_urings
instead of calling the helper, which would avoid the NULL checks and
make the error handling code a little more obvious.
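A userspace sketch of the open-coded unwind pattern meant here, using
malloc() and free() in place of the kernel allocators. The demo_* names,
the fixed 64-byte sizes and the failure-injection counter are all
hypothetical and exist only so the error path can be exercised; the
sketch says nothing about which kernel allocator the rings should use.

```c
#include <errno.h>
#include <stdlib.h>

struct demo_ctx {
	void *sq_ring;
	void *sq_sqes;
	void *cq_ring;
};

static int demo_fail_at = -1;	/* fail the Nth allocation (0-based); -1 = never */
static int demo_allocs;		/* number of allocation attempts so far */
static int demo_frees;		/* number of frees done by the unwind path */

/* Allocation wrapper with failure injection, for illustration only. */
static void *demo_alloc(size_t size)
{
	if (demo_allocs++ == demo_fail_at)
		return NULL;
	return malloc(size);
}

static void demo_free(void *p)
{
	demo_frees++;
	free(p);
}

/*
 * Each error label frees exactly what was allocated before the failing
 * step, so no NULL checks and no pointer zeroing are needed.
 */
static int demo_allocate_rings(struct demo_ctx *ctx)
{
	ctx->sq_ring = demo_alloc(64);
	if (!ctx->sq_ring)
		return -ENOMEM;
	ctx->sq_sqes = demo_alloc(64);
	if (!ctx->sq_sqes)
		goto err_free_sq_ring;
	ctx->cq_ring = demo_alloc(64);
	if (!ctx->cq_ring)
		goto err_free_sq_sqes;
	return 0;

err_free_sq_sqes:
	demo_free(ctx->sq_sqes);
err_free_sq_ring:
	demo_free(ctx->sq_ring);
	return -ENOMEM;
}
```

The teardown path can then free the three buffers unconditionally, since
it only runs for a context whose allocation succeeded.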
> + if (mutex_trylock(&ctx->uring_lock)) {
> + ret = __io_uring_enter(ctx, to_submit, min_complete, flags);
Do we even need the separate __io_uring_enter helper?
> +static void io_fill_offsets(struct io_uring_params *p)
Do we really need this as a separate helper?