From: Christoph Hellwig <hch@lst.de>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org,
	linux-block@vger.kernel.org, hch@lst.de, jmoyer@redhat.com,
	avi@scylladb.com, linux-api@vger.kernel.org,
	linux-man@vger.kernel.org
Subject: Re: [PATCH 05/18] Add io_uring IO interface
Date: Mon, 28 Jan 2019 15:57:01 +0100
Message-ID: <20190128145700.GA9795@lst.de>
In-Reply-To: <20190123153536.7081-6-axboe@kernel.dk>

[please make sure linux-api and linux-man are CCed on new syscalls
so that we get API experts to review them]

> io_uring_enter(fd, to_submit, min_complete, flags)
> 	Initiates IO against the rings mapped to this fd, or waits for
> 	them to complete, or both. The behavior is controlled by the
> 	parameters passed in. If 'to_submit' is non-zero, then we'll
> 	try and submit new IO. If IORING_ENTER_GETEVENTS is set, the
> 	kernel will wait for 'min_complete' events, if they aren't
> 	already available. It's valid to set IORING_ENTER_GETEVENTS
> 	and 'min_complete' == 0 at the same time, this allows the
> 	kernel to return already completed events without waiting
> 	for them. This is useful only for polling, as for IRQ
> 	driven IO, the application can just check the CQ ring
> 	without entering the kernel.

Especially with poll support now in the series, don't we need a sigmask
argument similar to pselect/ppoll/io_pgetevents now to deal with signal
blocking during waiting for events?
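
For illustration only, a minimal userspace sketch of the pselect() pattern
referred to above: the signal mask is swapped in atomically for the
duration of the wait and restored on return. The helper names
wait_readable_masked() and demo_pipe_wait() are hypothetical and are not
part of the patch; how a sigmask would be passed to io_uring_enter() is
left open here.

```c
#include <errno.h>
#include <signal.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until 'fd' is readable while 'wait_mask' is
 * installed as the signal mask. pselect() installs the mask, waits, and
 * restores the old mask as one atomic step, which closes the race between
 * unblocking a signal and going to sleep.
 * Returns >0 if readable, 0 on timeout, -1 with errno set on error.
 */
static int wait_readable_masked(int fd, const sigset_t *wait_mask,
				long timeout_ms)
{
	fd_set rfds;
	struct timespec ts = {
		.tv_sec = timeout_ms / 1000,
		.tv_nsec = (timeout_ms % 1000) * 1000000L,
	};

	if (fd < 0 || fd >= FD_SETSIZE) {
		errno = EBADF;
		return -1;
	}
	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);
	return pselect(fd + 1, &rfds, NULL, NULL, &ts, wait_mask);
}

/* Small self-check: a pipe with one byte queued is readable at once. */
static int demo_pipe_wait(void)
{
	int fds[2], ret;
	sigset_t none;

	if (pipe(fds) < 0)
		return -1;
	sigemptyset(&none);	/* block no signals while waiting */
	if (write(fds[1], "x", 1) != 1)
		ret = -1;
	else
		ret = wait_readable_masked(fds[0], &none, 1000);
	close(fds[0]);
	close(fds[1]);
	return ret;
}
```

A caller that keeps, say, SIGINT blocked during normal operation would
pass a mask with SIGINT removed, so the signal can only be delivered while
the task is actually waiting.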

> +struct sqe_submit {
> +	const struct io_uring_sqe *sqe;
> +	unsigned index;
> +};

Can you make sure all the structs use tab indentation for their
field names?  Maybe even the same indentation for all structs, just
to be nice to my eyes?

> +static int io_import_iovec(struct io_ring_ctx *ctx, int rw,
> +			   const struct io_uring_sqe *sqe,
> +			   struct iovec **iovec, struct iov_iter *iter)
> +{
> +	void __user *buf = u64_to_user_ptr(sqe->addr);
> +
> +#ifdef CONFIG_COMPAT
> +	if (ctx->compat)
> +		return compat_import_iovec(rw, buf, sqe->len, UIO_FASTIOV,
> +						iovec, iter);
> +#endif

I think we can just check in_compat_syscall() here, which means we
can kill the ->compat member, and the separate compat version of the
setup syscall.

> +/*
> + * IORING_OP_NOP just posts a completion event, nothing else.
> + */
> +static int io_nop(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> +{
> +	struct io_ring_ctx *ctx = req->ctx;
> +
> +	__io_cqring_add_event(ctx, sqe->user_data, 0, 0);

Can you explain why not taking the completion lock is safe here?  And
why we want to have such a somewhat dangerous special case just for the
no-op benchmarking aid?

> +static bool io_get_sqring(struct io_ring_ctx *ctx, struct sqe_submit *s)
> +{
> +	struct io_sq_ring *ring = ctx->sq_ring;
> +	unsigned head;
> +
> +	head = ctx->cached_sq_head;
> +	smp_rmb();
> +	if (head == READ_ONCE(ring->r.tail))
> +		return false;

Do we really need to optimize the sq_head == tail case so much? Or
am I missing why we are using the cached sq head here?  Maybe
add some more comments for a start.
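
For context, a minimal userspace sketch of one common reason to keep a
consumer-private cached head: the consumer can take several entries while
only reading the shared tail, and publish the shared head once afterwards.
Whether that is the motivation in the patch is for the author to confirm.
The struct and function names below (demo_ring and friends) are
hypothetical, and C11 atomics stand in for the kernel barriers.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RING_ENTRIES 8U	/* power of two, so masking wraps the index */

struct demo_ring {
	_Atomic unsigned tail;	/* written by producer, read by consumer */
	_Atomic unsigned head;	/* published by consumer for the producer */
	unsigned cached_head;	/* consumer-private copy of head */
	int slots[RING_ENTRIES];
};

/* Producer side: queue one value, fail if the ring is full. */
static bool demo_ring_push(struct demo_ring *r, int val)
{
	unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (tail - head == RING_ENTRIES)
		return false;
	r->slots[tail & (RING_ENTRIES - 1)] = val;
	/* release: the slot write becomes visible before the new tail */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}

/* Consumer side: take one value using only the private cached head. */
static bool demo_ring_pop(struct demo_ring *r, int *val)
{
	unsigned head = r->cached_head;

	/* acquire pairs with the producer's release store of tail */
	if (head == atomic_load_explicit(&r->tail, memory_order_acquire))
		return false;
	*val = r->slots[head & (RING_ENTRIES - 1)];
	r->cached_head = head + 1;
	return true;
}

/* Publish all consumed entries at once so the producer can reuse them. */
static void demo_ring_commit(struct demo_ring *r)
{
	atomic_store_explicit(&r->head, r->cached_head, memory_order_release);
}
```

In this sketch the shared head is written once per batch in
demo_ring_commit() instead of once per entry, which is the saving a
cached head can buy.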

> +static int __io_uring_enter(struct io_ring_ctx *ctx, unsigned to_submit,
> +			    unsigned min_complete, unsigned flags)
> +{
> +	int ret = 0;
> +
> +	if (to_submit) {
> +		ret = io_ring_submit(ctx, to_submit);
> +		if (ret < 0)
> +			return ret;
> +	}
> +	if (flags & IORING_ENTER_GETEVENTS) {
> +		int get_ret;
> +
> +		if (!ret && to_submit)
> +			min_complete = 0;

Why do we have this special case?  Does it need some documentation?

> +
> +		get_ret = io_cqring_wait(ctx, min_complete);
> +		if (get_ret < 0 && !ret)
> +			ret = get_ret;
> +	}
> +
> +	return ret;

Maybe using different names and slightly different semantics for the
return values would clear some of this up?

	int submitted = 0, ret = 0;

	if (to_submit) {
		submitted = io_ring_submit(ctx, to_submit);
		if (submitted < 0)
			return submitted;
	}
	if (flags & IORING_ENTER_GETEVENTS) {
		...
		ret = io_cqring_wait(ctx, min_complete);
	}

	return submitted ? submitted : ret;
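
To spell out the convention this suggests, here is a small standalone
model of how the two results would combine. The function name
demo_enter_result() is hypothetical; it only models the return value and
does not touch any ring.

```c
#include <errno.h>

/*
 * Hypothetical model of the suggested return convention.
 * 'submitted' is the result of the submit step (0 if nothing was to be
 * submitted), 'wait_ret' is the result of the wait step (0 if no wait
 * was requested or the wait succeeded).
 */
static int demo_enter_result(int submitted, int wait_ret)
{
	if (submitted < 0)
		return submitted;	/* submit failed, wait is skipped */
	/* a non-zero submit count wins over the wait result */
	return submitted ? submitted : wait_ret;
}
```

With this model a wait error is only reported when nothing was submitted,
so the caller never loses track of requests that did go in.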

> +static int io_sq_offload_start(struct io_ring_ctx *ctx)

> +static void io_sq_offload_stop(struct io_ring_ctx *ctx)

Can we just merge these two functions into the callers?  Currently
the flow is a little odd with these helpers that don't seem to be
too clear about their responsibilities.

> +static void io_free_scq_urings(struct io_ring_ctx *ctx)
> +{
> +	if (ctx->sq_ring) {
> +		page_frag_free(ctx->sq_ring);
> +		ctx->sq_ring = NULL;
> +	}
> +	if (ctx->sq_sqes) {
> +		page_frag_free(ctx->sq_sqes);
> +		ctx->sq_sqes = NULL;
> +	}
> +	if (ctx->cq_ring) {
> +		page_frag_free(ctx->cq_ring);
> +		ctx->cq_ring = NULL;
> +	}

Why is this using the page_frag helpers?  Also the callers just free
the ctx structure afterwards, so there isn't much of a point in zeroing
these pointers out.

Also I'd be tempted to open code the freeing in io_allocate_scq_urings
instead of calling the helper, which would avoid the NULL checks and
make the error handling code a little more obvious.
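
As a rough userspace sketch of what open coding the freeing could look
like: each failure point jumps to a label that frees exactly what has been
allocated so far, so no NULL checks or zeroing are needed. The demo_*
names and the 'fail_at' test hook are hypothetical, and calloc()/free()
stand in for whatever allocator the rings end up using.

```c
#include <errno.h>
#include <stdlib.h>

struct demo_ctx {
	void *sq_ring;
	void *sq_sqes;
	void *cq_ring;
};

/* Test hook: the allocation numbered 'fail_at' (1-based) fails, 0 = none. */
static void *demo_alloc(size_t size, int step, int fail_at)
{
	return step == fail_at ? NULL : calloc(1, size);
}

/* Allocate all three areas, unwinding in reverse order on failure. */
static int demo_allocate_rings(struct demo_ctx *ctx, int fail_at)
{
	ctx->sq_ring = demo_alloc(64, 1, fail_at);
	if (!ctx->sq_ring)
		goto err;
	ctx->sq_sqes = demo_alloc(64, 2, fail_at);
	if (!ctx->sq_sqes)
		goto err_free_sq_ring;
	ctx->cq_ring = demo_alloc(64, 3, fail_at);
	if (!ctx->cq_ring)
		goto err_free_sq_sqes;
	return 0;

err_free_sq_sqes:
	free(ctx->sq_sqes);
err_free_sq_ring:
	free(ctx->sq_ring);
err:
	/* pointers are left stale: the caller discards ctx on failure */
	return -ENOMEM;
}

/* Teardown for a fully set up ctx: everything is known to be allocated. */
static void demo_free_rings(struct demo_ctx *ctx)
{
	free(ctx->cq_ring);
	free(ctx->sq_sqes);
	free(ctx->sq_ring);
}
```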

> +	if (mutex_trylock(&ctx->uring_lock)) {
> +		ret = __io_uring_enter(ctx, to_submit, min_complete, flags);

Do we even need the separate __io_uring_enter helper?

> +static void io_fill_offsets(struct io_uring_params *p)

Do we really need this as a separate helper?
