* [PATCH] fs/io_uring.c: fix null ptr dereference in io_send_recvmsg()
@ 2020-08-04 12:56 Liu Yong
  2020-08-04 13:18 ` Pavel Begunkov
  0 siblings, 1 reply; 7+ messages in thread
From: Liu Yong @ 2020-08-04 12:56 UTC (permalink / raw)
  To: axboe; +Cc: linux-block

In io_send_recvmsg(), there is no check for req->file.
A user can change the opcode from IORING_OP_NOP to IORING_OP_SENDMSG
by racing with the kernel after io_req_set_file().

This vulnerability can be used to leak sensitive kernel information.
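
A minimal userspace sketch of the suspected race (hypothetical and
untested, not a confirmed reproducer; liburing is used for brevity,
and all names below are illustrative):

/* build with: gcc race.c -luring -lpthread */
#include <liburing.h>
#include <pthread.h>

static struct io_uring_sqe *victim;	/* lives in the shared SQ ring */

/* Keep rewriting the shared opcode so that a second kernel read of
 * sqe->opcode can observe SENDMSG even though the request was
 * prepared, and had its file resolved, as a NOP. */
static void *flip_opcode(void *arg)
{
	for (;;)
		__atomic_store_n(&victim->opcode, IORING_OP_SENDMSG,
				 __ATOMIC_RELAXED);
	return NULL;
}

int main(void)
{
	struct io_uring ring;
	struct io_uring_cqe *cqe;
	pthread_t t;

	io_uring_queue_init(8, &ring, 0);
	victim = io_uring_get_sqe(&ring);
	io_uring_prep_nop(victim);	/* NOP does not need req->file */
	pthread_create(&t, NULL, flip_opcode, NULL);

	for (;;) {	/* resubmit until the window is hit (kernel oops) */
		io_uring_submit(&ring);
		if (io_uring_wait_cqe(&ring, &cqe) == 0)
			io_uring_cqe_seen(&ring, cqe);
		victim = io_uring_get_sqe(&ring);
		if (!victim)
			break;
		io_uring_prep_nop(victim);
	}
	io_uring_queue_exit(&ring);
	return 0;
}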

[352693.910110] BUG: kernel NULL pointer dereference, address: 0000000000000028
[352693.910112] #PF: supervisor read access in kernel mode
[352693.910112] #PF: error_code(0x0000) - not-present page
[352693.910113] PGD 8000000251396067 P4D 8000000251396067 PUD 1d64ba067 PMD 0
[352693.910115] Oops: 0000 [#3] SMP PTI
[352693.910116] CPU: 11 PID: 303132 Comm: test Tainted: G      D
[352693.910117] Hardware name: Dell Inc. OptiPlex 3060/0T0MHW, BIOS 1.4.2 06/11/2019
[352693.910120] RIP: 0010:sock_from_file+0x9/0x30
[352693.910122] RSP: 0018:ffffc0a5084cfc50 EFLAGS: 00010246
[352693.910122] RAX: ffff9f6ee284d000 RBX: ffff9f6bd3675000 RCX: ffffffff8b111340
[352693.910123] RDX: 0000000000000001 RSI: ffffc0a5084cfc64 RDI: 0000000000000000
[352693.910124] RBP: ffffc0a5084cfc50 R08: 0000000000000000 R09: ffff9f6ee51a9200
[352693.910124] R10: ffff9f6ee284d200 R11: 0000000000000000 R12: ffff9f6ee51a9200
[352693.910125] R13: 0000000000000001 R14: ffffffff8b111340 R15: ffff9f6ee284d000
[352693.910126] FS:  00000000016d7880(0000) GS:ffff9f6eee2c0000(0000) knlGS:0000000000000000
[352693.910126] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[352693.910127] CR2: 0000000000000028 CR3: 000000041fb4a005 CR4: 00000000003626e0
[352693.910127] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[352693.910128] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[352693.910128] Call Trace:
[352693.910132]  io_send_recvmsg+0x49/0x170
[352693.910134]  ? __switch_to_asm+0x34/0x70
[352693.910135]  __io_submit_sqe+0x45e/0x8e0
[352693.910136]  ? __switch_to_asm+0x34/0x70
[352693.910137]  ? __switch_to_asm+0x40/0x70
[352693.910138]  ? __switch_to_asm+0x34/0x70
[352693.910138]  ? __switch_to_asm+0x40/0x70
[352693.910139]  ? __switch_to_asm+0x34/0x70
[352693.910140]  ? __switch_to_asm+0x40/0x70
[352693.910141]  ? __switch_to_asm+0x34/0x70
[352693.910142]  ? __switch_to_asm+0x40/0x70
[352693.910143]  ? __switch_to_asm+0x34/0x70
[352693.910144]  ? __switch_to_asm+0x34/0x70
[352693.910145]  __io_queue_sqe+0x23/0x230
[352693.910146]  io_queue_sqe+0x7a/0x90
[352693.910148]  io_submit_sqe+0x23d/0x330
[352693.910149]  io_ring_submit+0xca/0x200
[352693.910150]  ? do_nanosleep+0xad/0x160
[352693.910151]  ? hrtimer_init_sleeper+0x2c/0x90
[352693.910152]  ? hrtimer_nanosleep+0xc2/0x1a0
[352693.910154]  __x64_sys_io_uring_enter+0x1e4/0x2c0
[352693.910156]  do_syscall_64+0x57/0x190
[352693.910157]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Liu Yong <pkfxxxing@gmail.com>
---
 fs/io_uring.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index e0200406765c..0a26100b8260 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1675,6 +1675,9 @@ static int io_send_recvmsg(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
 		return -EINVAL;
 
+	if (!req->file)
+		return -EBADF;
+
 	sock = sock_from_file(req->file, &ret);
 	if (sock) {
 		struct user_msghdr __user *msg;
-- 
2.17.1



* Re: [PATCH] fs/io_uring.c: fix null ptr dereference in io_send_recvmsg()
  2020-08-04 12:56 [PATCH] fs/io_uring.c: fix null ptr dereference in io_send_recvmsg() Liu Yong
@ 2020-08-04 13:18 ` Pavel Begunkov
  2020-08-04 13:27   ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: Pavel Begunkov @ 2020-08-04 13:18 UTC (permalink / raw)
  To: Liu Yong, axboe; +Cc: linux-block, io-uring

On 04/08/2020 15:56, Liu Yong wrote:
> In io_send_recvmsg(), there is no check for req->file.
> A user can change the opcode from IORING_OP_NOP to IORING_OP_SENDMSG
> by racing with the kernel after io_req_set_file().

After sqe->opcode is read and copied in io_init_req(), only the
in-kernel req->opcode is used. Also, io_init_req() should check for a
NULL req->file, so this shouldn't happen afterwards.
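
(For reference, the upstream pattern being described is roughly:

	/* in io_init_req(): read the opcode from the SQE exactly once */
	req->opcode = READ_ONCE(sqe->opcode);

after which only the in-kernel copy is consulted.)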

Do you have a reproducer? What kernel version did you use?

> 
> This vulnerability can be used to leak sensitive kernel information.
> 
> [352693.910110] BUG: kernel NULL pointer dereference, address: 0000000000000028
> [352693.910112] #PF: supervisor read access in kernel mode
> [352693.910112] #PF: error_code(0x0000) - not-present page
> [352693.910113] PGD 8000000251396067 P4D 8000000251396067 PUD 1d64ba067 PMD 0
> [352693.910115] Oops: 0000 [#3] SMP PTI
> [352693.910116] CPU: 11 PID: 303132 Comm: test Tainted: G      D
> [352693.910117] Hardware name: Dell Inc. OptiPlex 3060/0T0MHW, BIOS 1.4.2 06/11/2019
> [352693.910120] RIP: 0010:sock_from_file+0x9/0x30
> [352693.910122] RSP: 0018:ffffc0a5084cfc50 EFLAGS: 00010246
> [352693.910122] RAX: ffff9f6ee284d000 RBX: ffff9f6bd3675000 RCX: ffffffff8b111340
> [352693.910123] RDX: 0000000000000001 RSI: ffffc0a5084cfc64 RDI: 0000000000000000
> [352693.910124] RBP: ffffc0a5084cfc50 R08: 0000000000000000 R09: ffff9f6ee51a9200
> [352693.910124] R10: ffff9f6ee284d200 R11: 0000000000000000 R12: ffff9f6ee51a9200
> [352693.910125] R13: 0000000000000001 R14: ffffffff8b111340 R15: ffff9f6ee284d000
> [352693.910126] FS:  00000000016d7880(0000) GS:ffff9f6eee2c0000(0000) knlGS:0000000000000000
> [352693.910126] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [352693.910127] CR2: 0000000000000028 CR3: 000000041fb4a005 CR4: 00000000003626e0
> [352693.910127] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [352693.910128] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [352693.910128] Call Trace:
> [352693.910132]  io_send_recvmsg+0x49/0x170
> [352693.910134]  ? __switch_to_asm+0x34/0x70
> [352693.910135]  __io_submit_sqe+0x45e/0x8e0
> [352693.910136]  ? __switch_to_asm+0x34/0x70
> [352693.910137]  ? __switch_to_asm+0x40/0x70
> [352693.910138]  ? __switch_to_asm+0x34/0x70
> [352693.910138]  ? __switch_to_asm+0x40/0x70
> [352693.910139]  ? __switch_to_asm+0x34/0x70
> [352693.910140]  ? __switch_to_asm+0x40/0x70
> [352693.910141]  ? __switch_to_asm+0x34/0x70
> [352693.910142]  ? __switch_to_asm+0x40/0x70
> [352693.910143]  ? __switch_to_asm+0x34/0x70
> [352693.910144]  ? __switch_to_asm+0x34/0x70
> [352693.910145]  __io_queue_sqe+0x23/0x230
> [352693.910146]  io_queue_sqe+0x7a/0x90
> [352693.910148]  io_submit_sqe+0x23d/0x330
> [352693.910149]  io_ring_submit+0xca/0x200
> [352693.910150]  ? do_nanosleep+0xad/0x160
> [352693.910151]  ? hrtimer_init_sleeper+0x2c/0x90
> [352693.910152]  ? hrtimer_nanosleep+0xc2/0x1a0
> [352693.910154]  __x64_sys_io_uring_enter+0x1e4/0x2c0
> [352693.910156]  do_syscall_64+0x57/0x190
> [352693.910157]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Signed-off-by: Liu Yong <pkfxxxing@gmail.com>
> ---
>  fs/io_uring.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index e0200406765c..0a26100b8260 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -1675,6 +1675,9 @@ static int io_send_recvmsg(struct io_kiocb *req, const struct io_uring_sqe *sqe,
>  	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>  		return -EINVAL;
>  
> +	if (!req->file)
> +		return -EBADF;
> +
>  	sock = sock_from_file(req->file, &ret);
>  	if (sock) {
>  		struct user_msghdr __user *msg;
> 

-- 
Pavel Begunkov


* Re: [PATCH] fs/io_uring.c: fix null ptr dereference in io_send_recvmsg()
  2020-08-04 13:18 ` Pavel Begunkov
@ 2020-08-04 13:27   ` Jens Axboe
       [not found]     ` <CAGAoTxzadSphnE2aLsFKS04TjTKYVq2uLFgH9dvLPwWiyqEGEQ@mail.gmail.com>
  0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2020-08-04 13:27 UTC (permalink / raw)
  To: Pavel Begunkov, Liu Yong; +Cc: linux-block, io-uring

On 8/4/20 7:18 AM, Pavel Begunkov wrote:
> On 04/08/2020 15:56, Liu Yong wrote:
>> In io_send_recvmsg(), there is no check for req->file.
>> A user can change the opcode from IORING_OP_NOP to IORING_OP_SENDMSG
>> by racing with the kernel after io_req_set_file().
> 
> After sqe->opcode is read and copied in io_init_req(), only the
> in-kernel req->opcode is used. Also, io_init_req() should check for a
> NULL req->file, so this shouldn't happen afterwards.
> 
> Do you have a reproducer? What kernel version did you use?

Was looking at this too, and I'm guessing this is some 5.4 based kernel.
Unfortunately the oops doesn't include that information.

-- 
Jens Axboe



* Re: [PATCH] fs/io_uring.c: fix null ptr dereference in io_send_recvmsg()
       [not found]     ` <CAGAoTxzadSphnE2aLsFKS04TjTKYVq2uLFgH9dvLPwWiyqEGEQ@mail.gmail.com>
@ 2020-08-04 17:15       ` Jens Axboe
  2020-08-04 21:55         ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2020-08-04 17:15 UTC (permalink / raw)
  To: xiao lin; +Cc: Pavel Begunkov, linux-block, io-uring

On 8/4/20 11:02 AM, xiao lin wrote:
> On Tue, Aug 4, 2020, Jens Axboe <axboe@kernel.dk> wrote:
> 
>     On 8/4/20 7:18 AM, Pavel Begunkov wrote:
>     > On 04/08/2020 15:56, Liu Yong wrote:
>     >> In io_send_recvmsg(), there is no check for req->file.
>     >> A user can change the opcode from IORING_OP_NOP to IORING_OP_SENDMSG
>     >> by racing with the kernel after io_req_set_file().
>     >
>     > After sqe->opcode is read and copied in io_init_req(), only the
>     > in-kernel req->opcode is used. Also, io_init_req() should check for a
>     > NULL req->file, so this shouldn't happen afterwards.
>     >
>     > Do you have a reproducer? What kernel version did you use?
> 
>     Was looking at this too, and I'm guessing this is some 5.4 based kernel.
>     Unfortunately the oops doesn't include that information.

> Sorry, I forgot to mention that the kernel version I am using is 5.4.55.

I think there are two options here:

1) Backport the series that ensured we only read those important bits once
2) Make s->sqe a full sqe, and memcpy it in
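
For illustration, option 2 might look roughly like the untested sketch
below: embed the SQE by value in struct sqe_submit and snapshot it out
of the shared ring, so userspace can no longer change any field the
kernel reads later (every s->sqe->field use would then become
s->sqe.field):

struct sqe_submit {
	struct io_uring_sqe	sqe;	/* private copy, not a pointer */
	/* ... remaining fields unchanged ... */
};

static bool io_get_sqring(struct io_ring_ctx *ctx, struct sqe_submit *s)
{
	/* ... head computed as before ... */
	if (head < ctx->sq_entries) {
		s->index = head;
		/* one memcpy while the entry is valid; every later read
		 * goes to this stable, kernel-private copy */
		memcpy(&s->sqe, &ctx->sq_sqes[head], sizeof(s->sqe));
		s->sequence = ctx->cached_sq_head;
		ctx->cached_sq_head++;
		return true;
	}
	/* ... */
}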

-- 
Jens Axboe



* Re: [PATCH] fs/io_uring.c: fix null ptr dereference in io_send_recvmsg()
  2020-08-04 17:15       ` Jens Axboe
@ 2020-08-04 21:55         ` Jens Axboe
  2020-08-05  3:40           ` Liu Yong
  0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2020-08-04 21:55 UTC (permalink / raw)
  To: xiao lin; +Cc: Pavel Begunkov, linux-block, io-uring

On 8/4/20 11:15 AM, Jens Axboe wrote:
> On 8/4/20 11:02 AM, xiao lin wrote:
>> On Tue, Aug 4, 2020, Jens Axboe <axboe@kernel.dk> wrote:
>>
>>     On 8/4/20 7:18 AM, Pavel Begunkov wrote:
>>     > On 04/08/2020 15:56, Liu Yong wrote:
>>     >> In io_send_recvmsg(), there is no check for req->file.
>>     >> A user can change the opcode from IORING_OP_NOP to IORING_OP_SENDMSG
>>     >> by racing with the kernel after io_req_set_file().
>>     >
>>     > After sqe->opcode is read and copied in io_init_req(), only the
>>     > in-kernel req->opcode is used. Also, io_init_req() should check for a
>>     > NULL req->file, so this shouldn't happen afterwards.
>>     >
>>     > Do you have a reproducer? What kernel version did you use?
>>
>>     Was looking at this too, and I'm guessing this is some 5.4 based kernel.
>>     Unfortunately the oops doesn't include that information.
> 
>> Sorry, I forgot to mention that the kernel version I am using is 5.4.55.
> 
> I think there are two options here:
> 
> 1) Backport the series that ensured we only read those important bits once
> 2) Make s->sqe a full sqe, and memcpy it in

Something like this should close the ->opcode re-read for 5.4-stable.

diff --git a/fs/io_uring.c b/fs/io_uring.c
index e0200406765c..8bb5e19b7c3c 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -279,6 +279,7 @@ struct sqe_submit {
 	bool				has_user;
 	bool				needs_lock;
 	bool				needs_fixed_file;
+	u8				opcode;
 };
 
 /*
@@ -505,7 +506,7 @@ static inline void io_queue_async_work(struct io_ring_ctx *ctx,
 	int rw = 0;
 
 	if (req->submit.sqe) {
-		switch (req->submit.sqe->opcode) {
+		switch (req->submit.opcode) {
 		case IORING_OP_WRITEV:
 		case IORING_OP_WRITE_FIXED:
 			rw = !(req->rw.ki_flags & IOCB_DIRECT);
@@ -1254,23 +1255,15 @@ static int io_import_fixed(struct io_ring_ctx *ctx, int rw,
 }
 
 static ssize_t io_import_iovec(struct io_ring_ctx *ctx, int rw,
-			       const struct sqe_submit *s, struct iovec **iovec,
+			       struct io_kiocb *req, struct iovec **iovec,
 			       struct iov_iter *iter)
 {
-	const struct io_uring_sqe *sqe = s->sqe;
+	const struct io_uring_sqe *sqe = req->submit.sqe;
 	void __user *buf = u64_to_user_ptr(READ_ONCE(sqe->addr));
 	size_t sqe_len = READ_ONCE(sqe->len);
 	u8 opcode;
 
-	/*
-	 * We're reading ->opcode for the second time, but the first read
-	 * doesn't care whether it's _FIXED or not, so it doesn't matter
-	 * whether ->opcode changes concurrently. The first read does care
-	 * about whether it is a READ or a WRITE, so we don't trust this read
-	 * for that purpose and instead let the caller pass in the read/write
-	 * flag.
-	 */
-	opcode = READ_ONCE(sqe->opcode);
+	opcode = req->submit.opcode;
 	if (opcode == IORING_OP_READ_FIXED ||
 	    opcode == IORING_OP_WRITE_FIXED) {
 		ssize_t ret = io_import_fixed(ctx, rw, sqe, iter);
@@ -1278,7 +1271,7 @@ static ssize_t io_import_iovec(struct io_ring_ctx *ctx, int rw,
 		return ret;
 	}
 
-	if (!s->has_user)
+	if (!req->submit.has_user)
 		return -EFAULT;
 
 #ifdef CONFIG_COMPAT
@@ -1425,7 +1418,7 @@ static int io_read(struct io_kiocb *req, const struct sqe_submit *s,
 	if (unlikely(!(file->f_mode & FMODE_READ)))
 		return -EBADF;
 
-	ret = io_import_iovec(req->ctx, READ, s, &iovec, &iter);
+	ret = io_import_iovec(req->ctx, READ, req, &iovec, &iter);
 	if (ret < 0)
 		return ret;
 
@@ -1490,7 +1483,7 @@ static int io_write(struct io_kiocb *req, const struct sqe_submit *s,
 	if (unlikely(!(file->f_mode & FMODE_WRITE)))
 		return -EBADF;
 
-	ret = io_import_iovec(req->ctx, WRITE, s, &iovec, &iter);
+	ret = io_import_iovec(req->ctx, WRITE, req, &iovec, &iter);
 	if (ret < 0)
 		return ret;
 
@@ -2109,15 +2102,14 @@ static int io_req_defer(struct io_ring_ctx *ctx, struct io_kiocb *req,
 static int __io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
 			   const struct sqe_submit *s, bool force_nonblock)
 {
-	int ret, opcode;
+	int ret;
 
 	req->user_data = READ_ONCE(s->sqe->user_data);
 
 	if (unlikely(s->index >= ctx->sq_entries))
 		return -EINVAL;
 
-	opcode = READ_ONCE(s->sqe->opcode);
-	switch (opcode) {
+	switch (req->submit.opcode) {
 	case IORING_OP_NOP:
 		ret = io_nop(req, req->user_data);
 		break;
@@ -2181,10 +2173,10 @@ static int __io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
 	return 0;
 }
 
-static struct async_list *io_async_list_from_sqe(struct io_ring_ctx *ctx,
-						 const struct io_uring_sqe *sqe)
+static struct async_list *io_async_list_from_req(struct io_ring_ctx *ctx,
+						 struct io_kiocb *req)
 {
-	switch (sqe->opcode) {
+	switch (req->submit.opcode) {
 	case IORING_OP_READV:
 	case IORING_OP_READ_FIXED:
 		return &ctx->pending_async[READ];
@@ -2196,12 +2188,10 @@ static struct async_list *io_async_list_from_sqe(struct io_ring_ctx *ctx,
 	}
 }
 
-static inline bool io_sqe_needs_user(const struct io_uring_sqe *sqe)
+static inline bool io_req_needs_user(struct io_kiocb *req)
 {
-	u8 opcode = READ_ONCE(sqe->opcode);
-
-	return !(opcode == IORING_OP_READ_FIXED ||
-		 opcode == IORING_OP_WRITE_FIXED);
+	return !(req->submit.opcode == IORING_OP_READ_FIXED ||
+		req->submit.opcode == IORING_OP_WRITE_FIXED);
 }
 
 static void io_sq_wq_submit_work(struct work_struct *work)
@@ -2217,7 +2207,7 @@ static void io_sq_wq_submit_work(struct work_struct *work)
 	int ret;
 
 	old_cred = override_creds(ctx->creds);
-	async_list = io_async_list_from_sqe(ctx, req->submit.sqe);
+	async_list = io_async_list_from_req(ctx, req);
 
 	allow_kernel_signal(SIGINT);
 restart:
@@ -2239,7 +2229,7 @@ static void io_sq_wq_submit_work(struct work_struct *work)
 		}
 
 		ret = 0;
-		if (io_sqe_needs_user(sqe) && !cur_mm) {
+		if (io_req_needs_user(req) && !cur_mm) {
 			if (!mmget_not_zero(ctx->sqo_mm)) {
 				ret = -EFAULT;
 			} else {
@@ -2387,11 +2377,9 @@ static bool io_add_to_prev_work(struct async_list *list, struct io_kiocb *req)
 	return ret;
 }
 
-static bool io_op_needs_file(const struct io_uring_sqe *sqe)
+static bool io_op_needs_file(struct io_kiocb *req)
 {
-	int op = READ_ONCE(sqe->opcode);
-
-	switch (op) {
+	switch (req->submit.opcode) {
 	case IORING_OP_NOP:
 	case IORING_OP_POLL_REMOVE:
 	case IORING_OP_TIMEOUT:
@@ -2419,7 +2407,7 @@ static int io_req_set_file(struct io_ring_ctx *ctx, const struct sqe_submit *s,
 	 */
 	req->sequence = s->sequence;
 
-	if (!io_op_needs_file(s->sqe))
+	if (!io_op_needs_file(req))
 		return 0;
 
 	if (flags & IOSQE_FIXED_FILE) {
@@ -2460,7 +2448,7 @@ static int __io_queue_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
 
 			s->sqe = sqe_copy;
 			memcpy(&req->submit, s, sizeof(*s));
-			list = io_async_list_from_sqe(ctx, s->sqe);
+			list = io_async_list_from_req(ctx, req);
 			if (!io_add_to_prev_work(list, req)) {
 				if (list)
 					atomic_inc(&list->cnt);
@@ -2582,7 +2570,7 @@ static void io_submit_sqe(struct io_ring_ctx *ctx, struct sqe_submit *s,
 	req->user_data = s->sqe->user_data;
 
 #if defined(CONFIG_NET)
-	switch (READ_ONCE(s->sqe->opcode)) {
+	switch (req->submit.opcode) {
 	case IORING_OP_SENDMSG:
 	case IORING_OP_RECVMSG:
 		spin_lock(&current->fs->lock);
@@ -2697,6 +2685,7 @@ static bool io_get_sqring(struct io_ring_ctx *ctx, struct sqe_submit *s)
 	if (head < ctx->sq_entries) {
 		s->index = head;
 		s->sqe = &ctx->sq_sqes[head];
+		s->opcode = READ_ONCE(s->sqe->opcode);
 		s->sequence = ctx->cached_sq_head;
 		ctx->cached_sq_head++;
 		return true;

-- 
Jens Axboe



* Re: [PATCH] fs/io_uring.c: fix null ptr dereference in io_send_recvmsg()
  2020-08-04 21:55         ` Jens Axboe
@ 2020-08-05  3:40           ` Liu Yong
  2020-08-05  4:10             ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: Liu Yong @ 2020-08-05  3:40 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Pavel Begunkov, linux-block@vger.kernel.org, io-uring

On Tue, Aug 04, 2020 at 03:55:16PM -0600, Jens Axboe wrote:
> On 8/4/20 11:15 AM, Jens Axboe wrote:
> > On 8/4/20 11:02 AM, xiao lin wrote:
> >> On Tue, Aug 4, 2020, Jens Axboe <axboe@kernel.dk> wrote:
> >>
> >>     On 8/4/20 7:18 AM, Pavel Begunkov wrote:
> >>     > On 04/08/2020 15:56, Liu Yong wrote:
> >>     >> In io_send_recvmsg(), there is no check for req->file.
> >>     >> A user can change the opcode from IORING_OP_NOP to IORING_OP_SENDMSG
> >>     >> by racing with the kernel after io_req_set_file().
> >>     >
> >>     > After sqe->opcode is read and copied in io_init_req(), only the
> >>     > in-kernel req->opcode is used. Also, io_init_req() should check for a
> >>     > NULL req->file, so this shouldn't happen afterwards.
> >>     >
> >>     > Do you have a reproducer? What kernel version did you use?
> >>
> >>     Was looking at this too, and I'm guessing this is some 5.4 based kernel.
> >>     Unfortunately the oops doesn't include that information.
> > 
> >> Sorry, I forgot to mention that the kernel version I am using is 5.4.55.
> > 
> > I think there are two options here:
> > 
> > 1) Backport the series that ensured we only read those important bits once
> > 2) Make s->sqe a full sqe, and memcpy it in
> 
> Something like this should close the ->opcode re-read for 5.4-stable.
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index e0200406765c..8bb5e19b7c3c 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -279,6 +279,7 @@ struct sqe_submit {
>  	bool				has_user;
>  	bool				needs_lock;
>  	bool				needs_fixed_file;
> +	u8				opcode;
>  };
>  
>  /*
> @@ -505,7 +506,7 @@ static inline void io_queue_async_work(struct io_ring_ctx *ctx,
>  	int rw = 0;
>  
>  	if (req->submit.sqe) {
> -		switch (req->submit.sqe->opcode) {
> +		switch (req->submit.opcode) {
>  		case IORING_OP_WRITEV:
>  		case IORING_OP_WRITE_FIXED:
>  			rw = !(req->rw.ki_flags & IOCB_DIRECT);
> @@ -1254,23 +1255,15 @@ static int io_import_fixed(struct io_ring_ctx *ctx, int rw,
>  }
>  
>  static ssize_t io_import_iovec(struct io_ring_ctx *ctx, int rw,
> -			       const struct sqe_submit *s, struct iovec **iovec,
> +			       struct io_kiocb *req, struct iovec **iovec,
>  			       struct iov_iter *iter)
>  {
> -	const struct io_uring_sqe *sqe = s->sqe;
> +	const struct io_uring_sqe *sqe = req->submit.sqe;
>  	void __user *buf = u64_to_user_ptr(READ_ONCE(sqe->addr));
>  	size_t sqe_len = READ_ONCE(sqe->len);
>  	u8 opcode;
>  
> -	/*
> -	 * We're reading ->opcode for the second time, but the first read
> -	 * doesn't care whether it's _FIXED or not, so it doesn't matter
> -	 * whether ->opcode changes concurrently. The first read does care
> -	 * about whether it is a READ or a WRITE, so we don't trust this read
> -	 * for that purpose and instead let the caller pass in the read/write
> -	 * flag.
> -	 */
> -	opcode = READ_ONCE(sqe->opcode);
> +	opcode = req->submit.opcode;
>  	if (opcode == IORING_OP_READ_FIXED ||
>  	    opcode == IORING_OP_WRITE_FIXED) {
>  		ssize_t ret = io_import_fixed(ctx, rw, sqe, iter);
> @@ -1278,7 +1271,7 @@ static ssize_t io_import_iovec(struct io_ring_ctx *ctx, int rw,
>  		return ret;
>  	}
>  
> -	if (!s->has_user)
> +	if (!req->submit.has_user)
>  		return -EFAULT;
>  
>  #ifdef CONFIG_COMPAT
> @@ -1425,7 +1418,7 @@ static int io_read(struct io_kiocb *req, const struct sqe_submit *s,
>  	if (unlikely(!(file->f_mode & FMODE_READ)))
>  		return -EBADF;
>  
> -	ret = io_import_iovec(req->ctx, READ, s, &iovec, &iter);
> +	ret = io_import_iovec(req->ctx, READ, req, &iovec, &iter);
>  	if (ret < 0)
>  		return ret;
>  
> @@ -1490,7 +1483,7 @@ static int io_write(struct io_kiocb *req, const struct sqe_submit *s,
>  	if (unlikely(!(file->f_mode & FMODE_WRITE)))
>  		return -EBADF;
>  
> -	ret = io_import_iovec(req->ctx, WRITE, s, &iovec, &iter);
> +	ret = io_import_iovec(req->ctx, WRITE, req, &iovec, &iter);
>  	if (ret < 0)
>  		return ret;
>  
> @@ -2109,15 +2102,14 @@ static int io_req_defer(struct io_ring_ctx *ctx, struct io_kiocb *req,
>  static int __io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
>  			   const struct sqe_submit *s, bool force_nonblock)
>  {
> -	int ret, opcode;
> +	int ret;
>  
>  	req->user_data = READ_ONCE(s->sqe->user_data);
>  
>  	if (unlikely(s->index >= ctx->sq_entries))
>  		return -EINVAL;
>  
> -	opcode = READ_ONCE(s->sqe->opcode);
> -	switch (opcode) {
> +	switch (req->submit.opcode) {
>  	case IORING_OP_NOP:
>  		ret = io_nop(req, req->user_data);
>  		break;
> @@ -2181,10 +2173,10 @@ static int __io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
>  	return 0;
>  }
>  
> -static struct async_list *io_async_list_from_sqe(struct io_ring_ctx *ctx,
> -						 const struct io_uring_sqe *sqe)
> +static struct async_list *io_async_list_from_req(struct io_ring_ctx *ctx,
> +						 struct io_kiocb *req)
>  {
> -	switch (sqe->opcode) {
> +	switch (req->submit.opcode) {
>  	case IORING_OP_READV:
>  	case IORING_OP_READ_FIXED:
>  		return &ctx->pending_async[READ];
> @@ -2196,12 +2188,10 @@ static struct async_list *io_async_list_from_sqe(struct io_ring_ctx *ctx,
>  	}
>  }
>  
> -static inline bool io_sqe_needs_user(const struct io_uring_sqe *sqe)
> +static inline bool io_req_needs_user(struct io_kiocb *req)
>  {
> -	u8 opcode = READ_ONCE(sqe->opcode);
> -
> -	return !(opcode == IORING_OP_READ_FIXED ||
> -		 opcode == IORING_OP_WRITE_FIXED);
> +	return !(req->submit.opcode == IORING_OP_READ_FIXED ||
> +		req->submit.opcode == IORING_OP_WRITE_FIXED);
>  }
>  
>  static void io_sq_wq_submit_work(struct work_struct *work)
> @@ -2217,7 +2207,7 @@ static void io_sq_wq_submit_work(struct work_struct *work)
>  	int ret;
>  
>  	old_cred = override_creds(ctx->creds);
> -	async_list = io_async_list_from_sqe(ctx, req->submit.sqe);
> +	async_list = io_async_list_from_req(ctx, req);
>  
>  	allow_kernel_signal(SIGINT);
>  restart:
> @@ -2239,7 +2229,7 @@ static void io_sq_wq_submit_work(struct work_struct *work)
>  		}
>  
>  		ret = 0;
> -		if (io_sqe_needs_user(sqe) && !cur_mm) {
> +		if (io_req_needs_user(req) && !cur_mm) {
>  			if (!mmget_not_zero(ctx->sqo_mm)) {
>  				ret = -EFAULT;
>  			} else {
> @@ -2387,11 +2377,9 @@ static bool io_add_to_prev_work(struct async_list *list, struct io_kiocb *req)
>  	return ret;
>  }
>  
> -static bool io_op_needs_file(const struct io_uring_sqe *sqe)
> +static bool io_op_needs_file(struct io_kiocb *req)
>  {
> -	int op = READ_ONCE(sqe->opcode);
> -
> -	switch (op) {
> +	switch (req->submit.opcode) {
>  	case IORING_OP_NOP:
>  	case IORING_OP_POLL_REMOVE:
>  	case IORING_OP_TIMEOUT:
> @@ -2419,7 +2407,7 @@ static int io_req_set_file(struct io_ring_ctx *ctx, const struct sqe_submit *s,
>  	 */
>  	req->sequence = s->sequence;
>  
> -	if (!io_op_needs_file(s->sqe))
> +	if (!io_op_needs_file(req))
>  		return 0;
>  
>  	if (flags & IOSQE_FIXED_FILE) {
> @@ -2460,7 +2448,7 @@ static int __io_queue_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
>  
>  			s->sqe = sqe_copy;
>  			memcpy(&req->submit, s, sizeof(*s));
> -			list = io_async_list_from_sqe(ctx, s->sqe);
> +			list = io_async_list_from_req(ctx, req);
>  			if (!io_add_to_prev_work(list, req)) {
>  				if (list)
>  					atomic_inc(&list->cnt);
> @@ -2582,7 +2570,7 @@ static void io_submit_sqe(struct io_ring_ctx *ctx, struct sqe_submit *s,
>  	req->user_data = s->sqe->user_data;
>  
>  #if defined(CONFIG_NET)
> -	switch (READ_ONCE(s->sqe->opcode)) {
> +	switch (req->submit.opcode) {
>  	case IORING_OP_SENDMSG:
>  	case IORING_OP_RECVMSG:
>  		spin_lock(&current->fs->lock);
> @@ -2697,6 +2685,7 @@ static bool io_get_sqring(struct io_ring_ctx *ctx, struct sqe_submit *s)
>  	if (head < ctx->sq_entries) {
>  		s->index = head;
>  		s->sqe = &ctx->sq_sqes[head];
> +		s->opcode = READ_ONCE(s->sqe->opcode);
>  		s->sequence = ctx->cached_sq_head;
>  		ctx->cached_sq_head++;
>  		return true;
> 
> -- 
> Jens Axboe
> 

I think this patch solves this class of problem at its root cause.
So, should I submit this commit, or will you?



* Re: [PATCH] fs/io_uring.c: fix null ptr dereference in io_send_recvmsg()
  2020-08-05  3:40           ` Liu Yong
@ 2020-08-05  4:10             ` Jens Axboe
  0 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2020-08-05  4:10 UTC (permalink / raw)
  To: Liu Yong
  Cc: Pavel Begunkov, linux-block@vger.kernel.org, io-uring

On 8/4/20 9:40 PM, Liu Yong wrote:
> On Tue, Aug 04, 2020 at 03:55:16PM -0600, Jens Axboe wrote:
>> On 8/4/20 11:15 AM, Jens Axboe wrote:
>>> On 8/4/20 11:02 AM, xiao lin wrote:
>>>> On Tue, Aug 4, 2020, Jens Axboe <axboe@kernel.dk> wrote:
>>>>
>>>>     On 8/4/20 7:18 AM, Pavel Begunkov wrote:
>>>>     > On 04/08/2020 15:56, Liu Yong wrote:
>>>>     >> In io_send_recvmsg(), there is no check for req->file.
>>>>     >> A user can change the opcode from IORING_OP_NOP to IORING_OP_SENDMSG
>>>>     >> by racing with the kernel after io_req_set_file().
>>>>     >
>>>>     > After sqe->opcode is read and copied in io_init_req(), only the
>>>>     > in-kernel req->opcode is used. Also, io_init_req() should check for a
>>>>     > NULL req->file, so this shouldn't happen afterwards.
>>>>     >
>>>>     > Do you have a reproducer? What kernel version did you use?
>>>>
>>>>     Was looking at this too, and I'm guessing this is some 5.4 based kernel.
>>>>     Unfortunately the oops doesn't include that information.
>>>
>>>> Sorry, I forgot to mention that the kernel version I am using is 5.4.55.
>>>
>>> I think there are two options here:
>>>
>>> 1) Backport the series that ensured we only read those important bits once
>>> 2) Make s->sqe a full sqe, and memcpy it in
>>
>> Something like this should close the ->opcode re-read for 5.4-stable.
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index e0200406765c..8bb5e19b7c3c 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -279,6 +279,7 @@ struct sqe_submit {
>>  	bool				has_user;
>>  	bool				needs_lock;
>>  	bool				needs_fixed_file;
>> +	u8				opcode;
>>  };
>>  
>>  /*
>> @@ -505,7 +506,7 @@ static inline void io_queue_async_work(struct io_ring_ctx *ctx,
>>  	int rw = 0;
>>  
>>  	if (req->submit.sqe) {
>> -		switch (req->submit.sqe->opcode) {
>> +		switch (req->submit.opcode) {
>>  		case IORING_OP_WRITEV:
>>  		case IORING_OP_WRITE_FIXED:
>>  			rw = !(req->rw.ki_flags & IOCB_DIRECT);
>> @@ -1254,23 +1255,15 @@ static int io_import_fixed(struct io_ring_ctx *ctx, int rw,
>>  }
>>  
>>  static ssize_t io_import_iovec(struct io_ring_ctx *ctx, int rw,
>> -			       const struct sqe_submit *s, struct iovec **iovec,
>> +			       struct io_kiocb *req, struct iovec **iovec,
>>  			       struct iov_iter *iter)
>>  {
>> -	const struct io_uring_sqe *sqe = s->sqe;
>> +	const struct io_uring_sqe *sqe = req->submit.sqe;
>>  	void __user *buf = u64_to_user_ptr(READ_ONCE(sqe->addr));
>>  	size_t sqe_len = READ_ONCE(sqe->len);
>>  	u8 opcode;
>>  
>> -	/*
>> -	 * We're reading ->opcode for the second time, but the first read
>> -	 * doesn't care whether it's _FIXED or not, so it doesn't matter
>> -	 * whether ->opcode changes concurrently. The first read does care
>> -	 * about whether it is a READ or a WRITE, so we don't trust this read
>> -	 * for that purpose and instead let the caller pass in the read/write
>> -	 * flag.
>> -	 */
>> -	opcode = READ_ONCE(sqe->opcode);
>> +	opcode = req->submit.opcode;
>>  	if (opcode == IORING_OP_READ_FIXED ||
>>  	    opcode == IORING_OP_WRITE_FIXED) {
>>  		ssize_t ret = io_import_fixed(ctx, rw, sqe, iter);
>> @@ -1278,7 +1271,7 @@ static ssize_t io_import_iovec(struct io_ring_ctx *ctx, int rw,
>>  		return ret;
>>  	}
>>  
>> -	if (!s->has_user)
>> +	if (!req->submit.has_user)
>>  		return -EFAULT;
>>  
>>  #ifdef CONFIG_COMPAT
>> @@ -1425,7 +1418,7 @@ static int io_read(struct io_kiocb *req, const struct sqe_submit *s,
>>  	if (unlikely(!(file->f_mode & FMODE_READ)))
>>  		return -EBADF;
>>  
>> -	ret = io_import_iovec(req->ctx, READ, s, &iovec, &iter);
>> +	ret = io_import_iovec(req->ctx, READ, req, &iovec, &iter);
>>  	if (ret < 0)
>>  		return ret;
>>  
>> @@ -1490,7 +1483,7 @@ static int io_write(struct io_kiocb *req, const struct sqe_submit *s,
>>  	if (unlikely(!(file->f_mode & FMODE_WRITE)))
>>  		return -EBADF;
>>  
>> -	ret = io_import_iovec(req->ctx, WRITE, s, &iovec, &iter);
>> +	ret = io_import_iovec(req->ctx, WRITE, req, &iovec, &iter);
>>  	if (ret < 0)
>>  		return ret;
>>  
>> @@ -2109,15 +2102,14 @@ static int io_req_defer(struct io_ring_ctx *ctx, struct io_kiocb *req,
>>  static int __io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
>>  			   const struct sqe_submit *s, bool force_nonblock)
>>  {
>> -	int ret, opcode;
>> +	int ret;
>>  
>>  	req->user_data = READ_ONCE(s->sqe->user_data);
>>  
>>  	if (unlikely(s->index >= ctx->sq_entries))
>>  		return -EINVAL;
>>  
>> -	opcode = READ_ONCE(s->sqe->opcode);
>> -	switch (opcode) {
>> +	switch (req->submit.opcode) {
>>  	case IORING_OP_NOP:
>>  		ret = io_nop(req, req->user_data);
>>  		break;
>> @@ -2181,10 +2173,10 @@ static int __io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
>>  	return 0;
>>  }
>>  
>> -static struct async_list *io_async_list_from_sqe(struct io_ring_ctx *ctx,
>> -						 const struct io_uring_sqe *sqe)
>> +static struct async_list *io_async_list_from_req(struct io_ring_ctx *ctx,
>> +						 struct io_kiocb *req)
>>  {
>> -	switch (sqe->opcode) {
>> +	switch (req->submit.opcode) {
>>  	case IORING_OP_READV:
>>  	case IORING_OP_READ_FIXED:
>>  		return &ctx->pending_async[READ];
>> @@ -2196,12 +2188,10 @@ static struct async_list *io_async_list_from_sqe(struct io_ring_ctx *ctx,
>>  	}
>>  }
>>  
>> -static inline bool io_sqe_needs_user(const struct io_uring_sqe *sqe)
>> +static inline bool io_req_needs_user(struct io_kiocb *req)
>>  {
>> -	u8 opcode = READ_ONCE(sqe->opcode);
>> -
>> -	return !(opcode == IORING_OP_READ_FIXED ||
>> -		 opcode == IORING_OP_WRITE_FIXED);
>> +	return !(req->submit.opcode == IORING_OP_READ_FIXED ||
>> +		req->submit.opcode == IORING_OP_WRITE_FIXED);
>>  }
>>  
>>  static void io_sq_wq_submit_work(struct work_struct *work)
>> @@ -2217,7 +2207,7 @@ static void io_sq_wq_submit_work(struct work_struct *work)
>>  	int ret;
>>  
>>  	old_cred = override_creds(ctx->creds);
>> -	async_list = io_async_list_from_sqe(ctx, req->submit.sqe);
>> +	async_list = io_async_list_from_req(ctx, req);
>>  
>>  	allow_kernel_signal(SIGINT);
>>  restart:
>> @@ -2239,7 +2229,7 @@ static void io_sq_wq_submit_work(struct work_struct *work)
>>  		}
>>  
>>  		ret = 0;
>> -		if (io_sqe_needs_user(sqe) && !cur_mm) {
>> +		if (io_req_needs_user(req) && !cur_mm) {
>>  			if (!mmget_not_zero(ctx->sqo_mm)) {
>>  				ret = -EFAULT;
>>  			} else {
>> @@ -2387,11 +2377,9 @@ static bool io_add_to_prev_work(struct async_list *list, struct io_kiocb *req)
>>  	return ret;
>>  }
>>  
>> -static bool io_op_needs_file(const struct io_uring_sqe *sqe)
>> +static bool io_op_needs_file(struct io_kiocb *req)
>>  {
>> -	int op = READ_ONCE(sqe->opcode);
>> -
>> -	switch (op) {
>> +	switch (req->submit.opcode) {
>>  	case IORING_OP_NOP:
>>  	case IORING_OP_POLL_REMOVE:
>>  	case IORING_OP_TIMEOUT:
>> @@ -2419,7 +2407,7 @@ static int io_req_set_file(struct io_ring_ctx *ctx, const struct sqe_submit *s,
>>  	 */
>>  	req->sequence = s->sequence;
>>  
>> -	if (!io_op_needs_file(s->sqe))
>> +	if (!io_op_needs_file(req))
>>  		return 0;
>>  
>>  	if (flags & IOSQE_FIXED_FILE) {
>> @@ -2460,7 +2448,7 @@ static int __io_queue_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
>>  
>>  			s->sqe = sqe_copy;
>>  			memcpy(&req->submit, s, sizeof(*s));
>> -			list = io_async_list_from_sqe(ctx, s->sqe);
>> +			list = io_async_list_from_req(ctx, req);
>>  			if (!io_add_to_prev_work(list, req)) {
>>  				if (list)
>>  					atomic_inc(&list->cnt);
>> @@ -2582,7 +2570,7 @@ static void io_submit_sqe(struct io_ring_ctx *ctx, struct sqe_submit *s,
>>  	req->user_data = s->sqe->user_data;
>>  
>>  #if defined(CONFIG_NET)
>> -	switch (READ_ONCE(s->sqe->opcode)) {
>> +	switch (req->submit.opcode) {
>>  	case IORING_OP_SENDMSG:
>>  	case IORING_OP_RECVMSG:
>>  		spin_lock(&current->fs->lock);
>> @@ -2697,6 +2685,7 @@ static bool io_get_sqring(struct io_ring_ctx *ctx, struct sqe_submit *s)
>>  	if (head < ctx->sq_entries) {
>>  		s->index = head;
>>  		s->sqe = &ctx->sq_sqes[head];
>> +		s->opcode = READ_ONCE(s->sqe->opcode);
>>  		s->sequence = ctx->cached_sq_head;
>>  		ctx->cached_sq_head++;
>>  		return true;
>>
>> -- 
>> Jens Axboe
>>
> 
> I think this patch solves this class of problem at its root cause.
> So, should I submit this commit, or will you?

Thanks for testing, I'll add your tested-by. Probably best if I do it,
since it's going to 5.4-stable only and it's not from upstream.


-- 
Jens Axboe


