linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* io_uring: REQ_F_PREPPED race condition with punting to workers
@ 2019-05-03 10:22 Stefan Bühler
  2019-05-03 14:48 ` Jens Axboe
  0 siblings, 1 reply; 4+ messages in thread
From: Stefan Bühler @ 2019-05-03 10:22 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-fsdevel

Hi,

if the initial operation returns EAGAIN (and REQ_F_NOWAIT) is not set,
io_submit_sqe copies the SQE for processing in a worker.

The worker will then read from the SQE copy to determine (some)
parameters for operations, but not all of those parameters will be
validated again, as the initial operation sets REQ_F_PREPPED.

So between the initial operation and the memcpy is a race in which the
application could change the SQE: for example it could change from
IORING_OP_FSYNC to IORING_OP_READV, which would result in broken kiocb
data afaict.

The only way around that I can see right now is copying the SQE in
io_submit_sqe (moving the call to io_cqring_add_event to io_submit_sqe
should simplify this afaict): does that sound acceptable?

cheers,
Stefan

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: io_uring: REQ_F_PREPPED race condition with punting to workers
  2019-05-03 10:22 io_uring: REQ_F_PREPPED race condition with punting to workers Stefan Bühler
@ 2019-05-03 14:48 ` Jens Axboe
  2019-05-11 17:08   ` [PATCH 1/1] io_uring: fix race condition reading SQE data Stefan Bühler
  0 siblings, 1 reply; 4+ messages in thread
From: Jens Axboe @ 2019-05-03 14:48 UTC (permalink / raw)
  To: Stefan Bühler, linux-block, linux-fsdevel

On 5/3/19 4:22 AM, Stefan Bühler wrote:
> Hi,
> 
> if the initial operation returns EAGAIN (and REQ_F_NOWAIT) is not set,
> io_submit_sqe copies the SQE for processing in a worker.
> 
> The worker will then read from the SQE copy to determine (some)
> parameters for operations, but not all of those parameters will be
> validated again, as the initial operation sets REQ_F_PREPPED.
> 
> So between the initial operation and the memcpy is a race in which the
> application could change the SQE: for example it could change from
> IORING_OP_FSYNC to IORING_OP_READV, which would result in broken kiocb
> data afaict.
> 
> The only way around that I can see right now is copying the SQE in
> io_submit_sqe (moving the call to io_cqring_add_event to io_submit_sqe
> should simplify this afaict): does that sound acceptable?

I'd be inclined to just fold the prep into the regular handling. The
only prep routine that does any significant work is the read/write one,
and if we're punting to async anyway, it's not a huge hit.

If we do that, then we can get rid of the PREPPED flag and the separate
need to call io_prep_xxx() for the command type.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 1/1] io_uring: fix race condition reading SQE data
  2019-05-03 14:48 ` Jens Axboe
@ 2019-05-11 17:08   ` Stefan Bühler
  2019-05-13 15:15     ` Jens Axboe
  0 siblings, 1 reply; 4+ messages in thread
From: Stefan Bühler @ 2019-05-11 17:08 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-fsdevel

When punting to workers the SQE gets copied after the initial try.
There is a race condition between reading SQE data for the initial try
and copying it for punting it to the workers.

For example io_rw_done calls kiocb->ki_complete even if it was prepared
for IORING_OP_FSYNC (and would be NULL).

The easiest solution for now is to alway prepare again in the worker.

req->file is safe to prepare though as long as it is checked before use.

Signed-off-by: Stefan Bühler <source@stbuehler.de>
---
 fs/io_uring.c | 17 ++---------------
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 48ea3977012a..576d9c652b4c 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -329,9 +329,8 @@ struct io_kiocb {
 #define REQ_F_IOPOLL_COMPLETED	2	/* polled IO has completed */
 #define REQ_F_FIXED_FILE	4	/* ctx owns file */
 #define REQ_F_SEQ_PREV		8	/* sequential with previous */
-#define REQ_F_PREPPED		16	/* prep already done */
-#define REQ_F_IO_DRAIN		32	/* drain existing IO first */
-#define REQ_F_IO_DRAINED	64	/* drain done */
+#define REQ_F_IO_DRAIN		16	/* drain existing IO first */
+#define REQ_F_IO_DRAINED	32	/* drain done */
 	u64			user_data;
 	u32			error;	/* iopoll result from callback */
 	u32			sequence;
@@ -896,9 +895,6 @@ static int io_prep_rw(struct io_kiocb *req, const struct sqe_submit *s,
 
 	if (!req->file)
 		return -EBADF;
-	/* For -EAGAIN retry, everything is already prepped */
-	if (req->flags & REQ_F_PREPPED)
-		return 0;
 
 	if (force_nonblock && !io_file_supports_async(req->file))
 		force_nonblock = false;
@@ -941,7 +937,6 @@ static int io_prep_rw(struct io_kiocb *req, const struct sqe_submit *s,
 			return -EINVAL;
 		kiocb->ki_complete = io_complete_rw;
 	}
-	req->flags |= REQ_F_PREPPED;
 	return 0;
 }
 
@@ -1227,16 +1222,12 @@ static int io_prep_fsync(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 
 	if (!req->file)
 		return -EBADF;
-	/* Prep already done (EAGAIN retry) */
-	if (req->flags & REQ_F_PREPPED)
-		return 0;
 
 	if (unlikely(ctx->flags & IORING_SETUP_IOPOLL))
 		return -EINVAL;
 	if (unlikely(sqe->addr || sqe->ioprio || sqe->buf_index))
 		return -EINVAL;
 
-	req->flags |= REQ_F_PREPPED;
 	return 0;
 }
 
@@ -1277,16 +1268,12 @@ static int io_prep_sfr(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 
 	if (!req->file)
 		return -EBADF;
-	/* Prep already done (EAGAIN retry) */
-	if (req->flags & REQ_F_PREPPED)
-		return 0;
 
 	if (unlikely(ctx->flags & IORING_SETUP_IOPOLL))
 		return -EINVAL;
 	if (unlikely(sqe->addr || sqe->ioprio || sqe->buf_index))
 		return -EINVAL;
 
-	req->flags |= REQ_F_PREPPED;
 	return ret;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/1] io_uring: fix race condition reading SQE data
  2019-05-11 17:08   ` [PATCH 1/1] io_uring: fix race condition reading SQE data Stefan Bühler
@ 2019-05-13 15:15     ` Jens Axboe
  0 siblings, 0 replies; 4+ messages in thread
From: Jens Axboe @ 2019-05-13 15:15 UTC (permalink / raw)
  To: Stefan Bühler, linux-block, linux-fsdevel

On 5/11/19 11:08 AM, Stefan Bühler wrote:
> When punting to workers the SQE gets copied after the initial try.
> There is a race condition between reading SQE data for the initial try
> and copying it for punting it to the workers.
> 
> For example io_rw_done calls kiocb->ki_complete even if it was prepared
> for IORING_OP_FSYNC (and would be NULL).
> 
> The easiest solution for now is to alway prepare again in the worker.
> 
> req->file is safe to prepare though as long as it is checked before use.

Looks good, as we discussed a week or so ago. Applied, thanks.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-05-13 15:16 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-03 10:22 io_uring: REQ_F_PREPPED race condition with punting to workers Stefan Bühler
2019-05-03 14:48 ` Jens Axboe
2019-05-11 17:08   ` [PATCH 1/1] io_uring: fix race condition reading SQE data Stefan Bühler
2019-05-13 15:15     ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).