From: Christoph Hellwig <hch@lst.de>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org,
	linux-block@vger.kernel.org, hch@lst.de, jmoyer@redhat.com,
	avi@scylladb.com, linux-api@vger.kernel.org,
	linux-man@vger.kernel.org
Subject: Re: [PATCH 05/18] Add io_uring IO interface
Date: Mon, 28 Jan 2019 15:57:01 +0100
Message-ID: <20190128145700.GA9795@lst.de>
In-Reply-To: <20190123153536.7081-6-axboe@kernel.dk>

[please make sure linux-api and linux-man are CCed on new syscalls
so that we get API experts to review them]

> io_uring_enter(fd, to_submit, min_complete, flags)
> 	Initiates IO against the rings mapped to this fd, or waits for
> 	them to complete, or both. The behavior is controlled by the
> 	parameters passed in. If 'to_submit' is non-zero, then we'll
> 	try and submit new IO. If IORING_ENTER_GETEVENTS is set, the
> 	kernel will wait for 'min_complete' events, if they aren't
> 	already available. It's valid to set IORING_ENTER_GETEVENTS
> 	and 'min_complete' == 0 at the same time, this allows the
> 	kernel to return already completed events without waiting
> 	for them. This is useful only for polling, as for IRQ
> 	driven IO, the application can just check the CQ ring
> 	without entering the kernel.

Especially with poll support now in the series, don't we need a sigmask
argument similar to pselect/ppoll/io_pgetevents to deal with signal
blocking while waiting for events?
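
Something like this is what I'd expect, following the ppoll/pselect
precedent (just a sketch, the argument names and the explicit size
convention are placeholders):

	int io_uring_enter(unsigned fd, unsigned to_submit,
			   unsigned min_complete, unsigned flags,
			   const sigset_t __user *sig, size_t sigsz);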

> +struct sqe_submit {
> +	const struct io_uring_sqe *sqe;
> +	unsigned index;
> +};

Can you make sure all the structs use tab indentation for their
field names?  Maybe even the same for all structs just to be nice
to my eyes?
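
E.g. for the struct quoted above that would be:

	struct sqe_submit {
		const struct io_uring_sqe	*sqe;
		unsigned			index;
	};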

> +static int io_import_iovec(struct io_ring_ctx *ctx, int rw,
> +			   const struct io_uring_sqe *sqe,
> +			   struct iovec **iovec, struct iov_iter *iter)
> +{
> +	void __user *buf = u64_to_user_ptr(sqe->addr);
> +
> +#ifdef CONFIG_COMPAT
> +	if (ctx->compat)
> +		return compat_import_iovec(rw, buf, sqe->len, UIO_FASTIOV,
> +						iovec, iter);
> +#endif

I think we can just check in_compat_syscall() here, which means we
can kill the ->compat member, and the separate compat version of the
setup syscall.
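
That would boil down to something like this (just a sketch, keeping the
ifdef so the !CONFIG_COMPAT build still compiles):

#ifdef CONFIG_COMPAT
	if (in_compat_syscall())
		return compat_import_iovec(rw, buf, sqe->len, UIO_FASTIOV,
						iovec, iter);
#endif
	return import_iovec(rw, buf, sqe->len, UIO_FASTIOV, iovec, iter);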

> +/*
> + * IORING_OP_NOP just posts a completion event, nothing else.
> + */
> +static int io_nop(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> +{
> +	struct io_ring_ctx *ctx = req->ctx;
> +
> +	__io_cqring_add_event(ctx, sqe->user_data, 0, 0);

Can you explain why not taking the completion lock is safe here?  And
why we want to have such a somewhat dangerous special case just for the
no-op benchmarking aid?
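
Presumably this would otherwise look like the other completion paths,
i.e. (assuming io_cqring_add_event() is the locked counterpart of the
double-underscore version):

	io_cqring_add_event(ctx, sqe->user_data, 0, 0);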

> +static bool io_get_sqring(struct io_ring_ctx *ctx, struct sqe_submit *s)
> +{
> +	struct io_sq_ring *ring = ctx->sq_ring;
> +	unsigned head;
> +
> +	head = ctx->cached_sq_head;
> +	smp_rmb();
> +	if (head == READ_ONCE(ring->r.tail))
> +		return false;

Do we really need to optimize the sq_head == tail case so much? Or
am I missing why we are using the cached sq head here?  Maybe add
some more comments for a start.
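
For a start maybe something like this, assuming my reading that
cached_sq_head is the kernel-private copy of the shared ring head is
right:

	/*
	 * cached_sq_head is the kernel-private copy of the ring head;
	 * only the kernel ever advances it, so it can be read without
	 * barriers.  The tail is updated by userspace, hence the
	 * READ_ONCE().
	 */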

> +static int __io_uring_enter(struct io_ring_ctx *ctx, unsigned to_submit,
> +			    unsigned min_complete, unsigned flags)
> +{
> +	int ret = 0;
> +
> +	if (to_submit) {
> +		ret = io_ring_submit(ctx, to_submit);
> +		if (ret < 0)
> +			return ret;
> +	}
> +	if (flags & IORING_ENTER_GETEVENTS) {
> +		int get_ret;
> +
> +		if (!ret && to_submit)
> +			min_complete = 0;

Why do we have this special case?  Does it need some documentation?

> +
> +		get_ret = io_cqring_wait(ctx, min_complete);
> +		if (get_ret < 0 && !ret)
> +			ret = get_ret;
> +	}
> +
> +	return ret;

Maybe using different names and slightly different semantics for the
return values would clear some of this up?

	int submitted = 0, ret = 0;

	if (to_submit) {
		submitted = io_ring_submit(ctx, to_submit);
		if (submitted < 0)
			return submitted;
	}
	if (flags & IORING_ENTER_GETEVENTS) {
		...
		ret = io_cqring_wait(ctx, min_complete);
	}

	return submitted ? submitted : ret;

> +static int io_sq_offload_start(struct io_ring_ctx *ctx)

> +static void io_sq_offload_stop(struct io_ring_ctx *ctx)

Can we just merge these two functions into the callers?  Currently
the flow is a little odd with these helpers that don't seem to be
too clear about their responsibilities.

> +static void io_free_scq_urings(struct io_ring_ctx *ctx)
> +{
> +	if (ctx->sq_ring) {
> +		page_frag_free(ctx->sq_ring);
> +		ctx->sq_ring = NULL;
> +	}
> +	if (ctx->sq_sqes) {
> +		page_frag_free(ctx->sq_sqes);
> +		ctx->sq_sqes = NULL;
> +	}
> +	if (ctx->cq_ring) {
> +		page_frag_free(ctx->cq_ring);
> +		ctx->cq_ring = NULL;
> +	}

Why is this using the page_frag helpers?  Also the callers just free
the ctx structure afterwards, so there isn't much of a point in
zeroing these pointers out.

Also I'd be tempted to open code the freeing in io_allocate_scq_urings
instead of calling the helper, which would avoid the NULL checks and
make the error handling code a little more obvious.
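
Roughly like this, as a sketch (eliding the actual allocation calls):

	ctx->sq_ring = ...;
	if (!ctx->sq_ring)
		return -ENOMEM;
	ctx->sq_sqes = ...;
	if (!ctx->sq_sqes)
		goto free_sq_ring;
	ctx->cq_ring = ...;
	if (!ctx->cq_ring)
		goto free_sq_sqes;
	return 0;

free_sq_sqes:
	page_frag_free(ctx->sq_sqes);
free_sq_ring:
	page_frag_free(ctx->sq_ring);
	return -ENOMEM;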

> +	if (mutex_trylock(&ctx->uring_lock)) {
> +		ret = __io_uring_enter(ctx, to_submit, min_complete, flags);

Do we even need the separate __io_uring_enter helper?

> +static void io_fill_offsets(struct io_uring_params *p)

Do we really need this as a separate helper?

