All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] small 5.13 cleanups
@ 2021-04-20 11:03 Pavel Begunkov
  2021-04-20 11:03 ` [PATCH 1/3] io_uring: move inflight un-tracking into cleanup Pavel Begunkov
                   ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Pavel Begunkov @ 2021-04-20 11:03 UTC (permalink / raw)
  To: Jens Axboe, io-uring

Bunch of bundled cleanups

Pavel Begunkov (3):
  io_uring: move inflight un-tracking into cleanup
  io_uring: safer sq_creds putting
  io_uring: refactor io_sq_offload_create()

 fs/io_uring.c | 43 +++++++++++++++++--------------------------
 1 file changed, 17 insertions(+), 26 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/3] io_uring: move inflight un-tracking into cleanup
  2021-04-20 11:03 [PATCH 0/3] small 5.13 cleanups Pavel Begunkov
@ 2021-04-20 11:03 ` Pavel Begunkov
  2021-04-20 11:03 ` [PATCH 2/3] io_uring: safer sq_creds putting Pavel Begunkov
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 26+ messages in thread
From: Pavel Begunkov @ 2021-04-20 11:03 UTC (permalink / raw)
  To: Jens Axboe, io-uring

REQ_F_INFLIGHT deaccounting doesn't do any spinlocking or resource
freeing anymore, so it's safe to move it into the normal cleanup flow,
i.e. into io_clean_op(), so making it cleaner.

Also move io_req_needs_clean() to be first in io_dismantle_req() so it
doesn't reload req->flags.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index d3c2387b4629..f8b2fb553410 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1601,7 +1601,7 @@ static void io_req_complete_post(struct io_kiocb *req, long res,
 static inline bool io_req_needs_clean(struct io_kiocb *req)
 {
 	return req->flags & (REQ_F_BUFFER_SELECTED | REQ_F_NEED_CLEANUP |
-				REQ_F_POLLED);
+				REQ_F_POLLED | REQ_F_INFLIGHT);
 }
 
 static void io_req_complete_state(struct io_kiocb *req, long res,
@@ -1717,17 +1717,10 @@ static void io_dismantle_req(struct io_kiocb *req)
 {
 	unsigned int flags = req->flags;
 
+	if (io_req_needs_clean(req))
+		io_clean_op(req);
 	if (!(flags & REQ_F_FIXED_FILE))
 		io_put_file(req->file);
-	if (io_req_needs_clean(req) || (req->flags & REQ_F_INFLIGHT)) {
-		io_clean_op(req);
-		if (req->flags & REQ_F_INFLIGHT) {
-			struct io_uring_task *tctx = req->task->io_uring;
-
-			atomic_dec(&tctx->inflight_tracked);
-			req->flags &= ~REQ_F_INFLIGHT;
-		}
-	}
 	if (req->fixed_rsrc_refs)
 		percpu_ref_put(req->fixed_rsrc_refs);
 	if (req->async_data)
@@ -6051,6 +6044,12 @@ static void io_clean_op(struct io_kiocb *req)
 		kfree(req->apoll);
 		req->apoll = NULL;
 	}
+	if (req->flags & REQ_F_INFLIGHT) {
+		struct io_uring_task *tctx = req->task->io_uring;
+
+		atomic_dec(&tctx->inflight_tracked);
+		req->flags &= ~REQ_F_INFLIGHT;
+	}
 }
 
 static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/3] io_uring: safer sq_creds putting
  2021-04-20 11:03 [PATCH 0/3] small 5.13 cleanups Pavel Begunkov
  2021-04-20 11:03 ` [PATCH 1/3] io_uring: move inflight un-tracking into cleanup Pavel Begunkov
@ 2021-04-20 11:03 ` Pavel Begunkov
  2021-04-20 11:03 ` [PATCH 3/3] io_uring: refactor io_sq_offload_create() Pavel Begunkov
  2021-04-20 18:55 ` [PATCH 0/3] small 5.13 cleanups Jens Axboe
  3 siblings, 0 replies; 26+ messages in thread
From: Pavel Begunkov @ 2021-04-20 11:03 UTC (permalink / raw)
  To: Jens Axboe, io-uring

Put sq_creds as a part of io_ring_ctx_free(), it's easy to miss doing it
in io_sq_thread_finish(), especially considering past mistakes related
to ring creation failures.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index f8b2fb553410..482c77d57219 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7226,8 +7226,6 @@ static void io_sq_thread_finish(struct io_ring_ctx *ctx)
 
 		io_put_sq_data(sqd);
 		ctx->sq_data = NULL;
-		if (ctx->sq_creds)
-			put_cred(ctx->sq_creds);
 	}
 }
 
@@ -8422,6 +8420,8 @@ static void io_ring_ctx_free(struct io_ring_ctx *ctx)
 	mutex_unlock(&ctx->uring_lock);
 	io_eventfd_unregister(ctx);
 	io_destroy_buffers(ctx);
+	if (ctx->sq_creds)
+		put_cred(ctx->sq_creds);
 
 	/* there are no registered resources left, nobody uses it */
 	if (ctx->rsrc_node)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/3] io_uring: refactor io_sq_offload_create()
  2021-04-20 11:03 [PATCH 0/3] small 5.13 cleanups Pavel Begunkov
  2021-04-20 11:03 ` [PATCH 1/3] io_uring: move inflight un-tracking into cleanup Pavel Begunkov
  2021-04-20 11:03 ` [PATCH 2/3] io_uring: safer sq_creds putting Pavel Begunkov
@ 2021-04-20 11:03 ` Pavel Begunkov
  2021-07-22 21:59   ` Al Viro
  2021-04-20 18:55 ` [PATCH 0/3] small 5.13 cleanups Jens Axboe
  3 siblings, 1 reply; 26+ messages in thread
From: Pavel Begunkov @ 2021-04-20 11:03 UTC (permalink / raw)
  To: Jens Axboe, io-uring

Just a bit of code tossing in io_sq_offload_create(), so it looks a bit
better. No functional changes.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 20 ++++++--------------
 1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 482c77d57219..b2aa9b99b820 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7876,11 +7876,9 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx,
 		f = fdget(p->wq_fd);
 		if (!f.file)
 			return -ENXIO;
-		if (f.file->f_op != &io_uring_fops) {
-			fdput(f);
-			return -EINVAL;
-		}
 		fdput(f);
+		if (f.file->f_op != &io_uring_fops)
+			return -EINVAL;
 	}
 	if (ctx->flags & IORING_SETUP_SQPOLL) {
 		struct task_struct *tsk;
@@ -7899,13 +7897,11 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx,
 		if (!ctx->sq_thread_idle)
 			ctx->sq_thread_idle = HZ;
 
-		ret = 0;
 		io_sq_thread_park(sqd);
 		list_add(&ctx->sqd_list, &sqd->ctx_list);
 		io_sqd_update_thread_idle(sqd);
 		/* don't attach to a dying SQPOLL thread, would be racy */
-		if (attached && !sqd->thread)
-			ret = -ENXIO;
+		ret = (attached && !sqd->thread) ? -ENXIO : 0;
 		io_sq_thread_unpark(sqd);
 
 		if (ret < 0)
@@ -7917,11 +7913,8 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx,
 			int cpu = p->sq_thread_cpu;
 
 			ret = -EINVAL;
-			if (cpu >= nr_cpu_ids)
-				goto err_sqpoll;
-			if (!cpu_online(cpu))
+			if (cpu >= nr_cpu_ids || !cpu_online(cpu))
 				goto err_sqpoll;
-
 			sqd->sq_cpu = cpu;
 		} else {
 			sqd->sq_cpu = -1;
@@ -7947,12 +7940,11 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx,
 	}
 
 	return 0;
+err_sqpoll:
+	complete(&ctx->sq_data->exited);
 err:
 	io_sq_thread_finish(ctx);
 	return ret;
-err_sqpoll:
-	complete(&ctx->sq_data->exited);
-	goto err;
 }
 
 static inline void __io_unaccount_mem(struct user_struct *user,
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/3] small 5.13 cleanups
  2021-04-20 11:03 [PATCH 0/3] small 5.13 cleanups Pavel Begunkov
                   ` (2 preceding siblings ...)
  2021-04-20 11:03 ` [PATCH 3/3] io_uring: refactor io_sq_offload_create() Pavel Begunkov
@ 2021-04-20 18:55 ` Jens Axboe
  3 siblings, 0 replies; 26+ messages in thread
From: Jens Axboe @ 2021-04-20 18:55 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 4/20/21 5:03 AM, Pavel Begunkov wrote:
> Bunch of bundled cleanups

Applied, thanks Pavel.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] io_uring: refactor io_sq_offload_create()
  2021-04-20 11:03 ` [PATCH 3/3] io_uring: refactor io_sq_offload_create() Pavel Begunkov
@ 2021-07-22 21:59   ` Al Viro
  2021-07-22 23:06     ` Jens Axboe
  2021-07-23  9:59     ` Pavel Begunkov
  0 siblings, 2 replies; 26+ messages in thread
From: Al Viro @ 2021-07-22 21:59 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: Jens Axboe, io-uring, linux-fsdevel, Linus Torvalds

On Tue, Apr 20, 2021 at 12:03:33PM +0100, Pavel Begunkov wrote:
> Just a bit of code tossing in io_sq_offload_create(), so it looks a bit
> better. No functional changes.

Does a use-after-free count as a functional change?

>  		f = fdget(p->wq_fd);

Descriptor table is shared with another thread, grabbed a reference to file.
Refcount is 2 (1 from descriptor table, 1 held by us)

>  		if (!f.file)
>  			return -ENXIO;

Nope, not NULL.

> -		if (f.file->f_op != &io_uring_fops) {
> -			fdput(f);
> -			return -EINVAL;
> -		}
>  		fdput(f);

Decrement refcount, get preempted away.  f.file->f_count is 1 now.

Another thread: close() on the same descriptor.  Final reference to
struct file (from descriptor table) is gone, file closed, memory freed.

Regain CPU...

> +		if (f.file->f_op != &io_uring_fops)
> +			return -EINVAL;

... and dereference an already freed structure.

What scares me here is that you are playing with bloody fundamental objects,
without understanding even the basics regarding their handling ;-/

1) descriptor tables can be shared.
2) another thread can close file right under you.
3) once all references to opened file are gone, it gets shut down and
struct file gets freed.
4) inside an fdget()/fdput() pair you are guaranteed that (3) won't happen.
As soon as you've done fdput(), that promise is gone.

	In the above only (1) might have been non-obvious, because if you
accept _that_, you have to ask yourself what the fuck would prevent file
disappearing once you've done fdput(), seeing that it might be the last
thing your syscall is doing to the damn thing.  So either that would've
leaked it, or _something_ in the operations you've done to it must've
made it possible for close(2) to get the damn thing.  And dereferencing
->f_op is unlikely to be that, isn't it?  Which leaves fdput() the
only candidate.  It's common sense stuff...

	Again, descriptor table is a shared resource and threads sharing
it can issue syscalls at the same time.  Sure, I've got fewer excuses
than you do for lack of documentation, but that's really basic...

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] io_uring: refactor io_sq_offload_create()
  2021-07-22 21:59   ` Al Viro
@ 2021-07-22 23:06     ` Jens Axboe
  2021-07-22 23:30       ` Al Viro
  2021-07-23  9:59     ` Pavel Begunkov
  1 sibling, 1 reply; 26+ messages in thread
From: Jens Axboe @ 2021-07-22 23:06 UTC (permalink / raw)
  To: Al Viro, Pavel Begunkov; +Cc: io-uring, linux-fsdevel, Linus Torvalds

On 7/22/21 3:59 PM, Al Viro wrote:
> On Tue, Apr 20, 2021 at 12:03:33PM +0100, Pavel Begunkov wrote:
>> Just a bit of code tossing in io_sq_offload_create(), so it looks a bit
>> better. No functional changes.
> 
> Does a use-after-free count as a functional change?
> 
>>  		f = fdget(p->wq_fd);
> 
> Descriptor table is shared with another thread, grabbed a reference to file.
> Refcount is 2 (1 from descriptor table, 1 held by us)
> 
>>  		if (!f.file)
>>  			return -ENXIO;
> 
> Nope, not NULL.
> 
>> -		if (f.file->f_op != &io_uring_fops) {
>> -			fdput(f);
>> -			return -EINVAL;
>> -		}
>>  		fdput(f);
> 
> Decrement refcount, get preempted away.  f.file->f_count is 1 now.
> 
> Another thread: close() on the same descriptor.  Final reference to
> struct file (from descriptor table) is gone, file closed, memory freed.
> 
> Regain CPU...
> 
>> +		if (f.file->f_op != &io_uring_fops)
>> +			return -EINVAL;
> 
> ... and dereference an already freed structure.
> 
> What scares me here is that you are playing with bloody fundamental
> objects, without understanding even the basics regarding their
> handling ;-/

Let's calm down here, no need to resort to hyperbole. It looks like an
honest mistake to me, and I should have caught that in review. You don't
even need to understand file structure life times to realize that:

	put(shared_struct);
	if (shared_struct->foo)
		...

is a bad idea. Which Pavel obviously does.

But yes, that is not great and obviously a bug, and we'll of course get
it fixed up asap.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] io_uring: refactor io_sq_offload_create()
  2021-07-22 23:06     ` Jens Axboe
@ 2021-07-22 23:30       ` Al Viro
  2021-07-22 23:42         ` Jens Axboe
  2021-07-23  0:03         ` Al Viro
  0 siblings, 2 replies; 26+ messages in thread
From: Al Viro @ 2021-07-22 23:30 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Pavel Begunkov, io-uring, linux-fsdevel, Linus Torvalds

On Thu, Jul 22, 2021 at 05:06:24PM -0600, Jens Axboe wrote:

> But yes, that is not great and obviously a bug, and we'll of course get
> it fixed up asap.

Another fun question: in do_exit() you have
        io_uring_files_cancel(tsk->files);

with

static inline void io_uring_files_cancel(struct files_struct *files)
{
        if (current->io_uring)
		__io_uring_cancel(files);
}

and

void __io_uring_cancel(struct files_struct *files)
{
        io_uring_cancel_generic(!files, NULL);
}

What the hell is that about?  What are you trying to check there?

All assignments to ->files:
init/init_task.c:116:   .files          = &init_files,
	Not NULL.
fs/file.c:433:          tsk->files = NULL;
	exit_files(), sets to NULL
fs/file.c:741:          me->files = cur_fds;
	__close_range(), if the value has been changed at all, the new one
	came from if (fds) swap(cur_fds, fds), so it can't become NULL here.
kernel/fork.c:1482:     tsk->files = newf;
	copy_files(), immediately preceded by verifying newf != NULL
kernel/fork.c:3044:                     current->files = new_fd;
	ksys_unshare(), under if (new_fd)
kernel/fork.c:3097:     task->files = copy;
	unshare_files(), with if (error || !copy) return error;
	slightly upstream.

IOW, task->files can be NULL *ONLY* after exit_files().  There are two callers
of that; one is for stillborns in copy_process(), another - in do_exit(),
well past that call of io_uring_files_cancel().  And around that call we have

        if (unlikely(tsk->flags & PF_EXITING)) {
		pr_alert("Fixing recursive fault but reboot is needed!\n");
		futex_exit_recursive(tsk);
		set_current_state(TASK_UNINTERRUPTIBLE);
		schedule();
	}
        io_uring_files_cancel(tsk->files);
	exit_signals(tsk);  /* sets PF_EXITING */

So how can we possibly get there with tsk->files == NULL and what does it
have to do with files, anyway?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] io_uring: refactor io_sq_offload_create()
  2021-07-22 23:30       ` Al Viro
@ 2021-07-22 23:42         ` Jens Axboe
  2021-07-23  0:10           ` Al Viro
  2021-07-23  0:03         ` Al Viro
  1 sibling, 1 reply; 26+ messages in thread
From: Jens Axboe @ 2021-07-22 23:42 UTC (permalink / raw)
  To: Al Viro; +Cc: Pavel Begunkov, io-uring, linux-fsdevel, Linus Torvalds

On 7/22/21 5:30 PM, Al Viro wrote:
> On Thu, Jul 22, 2021 at 05:06:24PM -0600, Jens Axboe wrote:
> 
>> But yes, that is not great and obviously a bug, and we'll of course get
>> it fixed up asap.
> 
> Another fun question: in do_exit() you have
>         io_uring_files_cancel(tsk->files);
> 
> with
> 
> static inline void io_uring_files_cancel(struct files_struct *files)
> {
>         if (current->io_uring)
> 		__io_uring_cancel(files);
> }
> 
> and
> 
> void __io_uring_cancel(struct files_struct *files)
> {
>         io_uring_cancel_generic(!files, NULL);
> }
> 
> What the hell is that about?  What are you trying to check there?
> 
> All assignments to ->files:
> init/init_task.c:116:   .files          = &init_files,
> 	Not NULL.
> fs/file.c:433:          tsk->files = NULL;
> 	exit_files(), sets to NULL
> fs/file.c:741:          me->files = cur_fds;
> 	__close_range(), if the value has been changed at all, the new one
> 	came from if (fds) swap(cur_fds, fds), so it can't become NULL here.
> kernel/fork.c:1482:     tsk->files = newf;
> 	copy_files(), immediately preceded by verifying newf != NULL
> kernel/fork.c:3044:                     current->files = new_fd;
> 	ksys_unshare(), under if (new_fd)
> kernel/fork.c:3097:     task->files = copy;
> 	unshare_files(), with if (error || !copy) return error;
> 	slightly upstream.
> 
> IOW, task->files can be NULL *ONLY* after exit_files().  There are two callers
> of that; one is for stillborns in copy_process(), another - in do_exit(),
> well past that call of io_uring_files_cancel().  And around that call we have
> 
>         if (unlikely(tsk->flags & PF_EXITING)) {
> 		pr_alert("Fixing recursive fault but reboot is needed!\n");
> 		futex_exit_recursive(tsk);
> 		set_current_state(TASK_UNINTERRUPTIBLE);
> 		schedule();
> 	}
>         io_uring_files_cancel(tsk->files);
> 	exit_signals(tsk);  /* sets PF_EXITING */
> 
> So how can we possibly get there with tsk->files == NULL and what does it
> have to do with files, anyway?

It's not the clearest, but the files check is just to distinguish between
exec vs normal cancel. For exec, we pass in files == NULL. It's not
related to task->files being NULL or not, we explicitly pass NULL for
exec.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] io_uring: refactor io_sq_offload_create()
  2021-07-22 23:30       ` Al Viro
  2021-07-22 23:42         ` Jens Axboe
@ 2021-07-23  0:03         ` Al Viro
  1 sibling, 0 replies; 26+ messages in thread
From: Al Viro @ 2021-07-23  0:03 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Pavel Begunkov, io-uring, linux-fsdevel, Linus Torvalds

On Thu, Jul 22, 2021 at 11:30:35PM +0000, Al Viro wrote:

> IOW, task->files can be NULL *ONLY* after exit_files().  There are two callers
> of that; one is for stillborns in copy_process(), another - in do_exit(),
> well past that call of io_uring_files_cancel().  And around that call we have
> 
>         if (unlikely(tsk->flags & PF_EXITING)) {
> 		pr_alert("Fixing recursive fault but reboot is needed!\n");
> 		futex_exit_recursive(tsk);
> 		set_current_state(TASK_UNINTERRUPTIBLE);
> 		schedule();
> 	}
>         io_uring_files_cancel(tsk->files);
> 	exit_signals(tsk);  /* sets PF_EXITING */
> 
> So how can we possibly get there with tsk->files == NULL and what does it
> have to do with files, anyway?

PS: processes with ->files == NULL can be observed; e.g. access via procfs
can very well race with exit().  If procfs acquires task_struct reference
before exit(), the object won't get freed until we do put_task_struct().
However, the process in question can get through the entire do_exit(),
become a zombie, be successfull reaped, etc., so its state can be very
thoroughly taken apart while procfs tries to access it.

There the checks for tsk->files == NULL are meaningful; doing them for
current, OTOH, is basically asking "am I rather deep into do_exit()?"

	Once upon a time we had exit_files() done kernel threads.
Not for the last 9 years since 864bdb3b6cbd ("new helper:
daemonize_descriptors()"), though (and shortly after that the entire
daemonize() thing has disappeared - kernel threads are spawned by
kthreadd, and inherit ->files from it just fine).

	Should've killed the useless checks for NULL ->files at the same
time, hadn't...  FWIW, the checks in fget_task(), task_lookup_fd_rcu(),
task_lookup_next_fd_rcu(), task_state(), fs/proc/fd.c:seq_show()
and iterate_fd() are there for good reason.  The ones in unshare_fd(),
copy_files(), fs/proc/task_nommu.c:task_mem() and in exit_files() itself
are noise.  I'll throw their removal in vfs.git#work.misc...

	Anyway, if you intended to check for some(?) kernel threads,
that place needs fixing.  If not, I'd suggest just passing a boolean
to that thing (and giving it less confusing name)...

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] io_uring: refactor io_sq_offload_create()
  2021-07-22 23:42         ` Jens Axboe
@ 2021-07-23  0:10           ` Al Viro
  2021-07-23  0:12             ` Al Viro
  2021-07-23 16:17             ` Jens Axboe
  0 siblings, 2 replies; 26+ messages in thread
From: Al Viro @ 2021-07-23  0:10 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Pavel Begunkov, io-uring, linux-fsdevel, Linus Torvalds

On Thu, Jul 22, 2021 at 05:42:55PM -0600, Jens Axboe wrote:

> > So how can we possibly get there with tsk->files == NULL and what does it
> > have to do with files, anyway?
> 
> It's not the clearest, but the files check is just to distinguish between
> exec vs normal cancel. For exec, we pass in files == NULL. It's not
> related to task->files being NULL or not, we explicitly pass NULL for
> exec.

Er...  So turn that argument into bool cancel_all, and pass false on exit and
true on exit?  While we are at it, what happens if you pass io_uring descriptor
to another process, close yours and then have the recepient close the one it
has gotten?  AFAICS, io_ring_ctx_wait_and_kill(ctx) will be called in context
of a process that has never done anything io_uring-related.  Can it end up
trying to resubmit some requests?

I rather hope it can't happen, but I don't see what would prevent it...

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] io_uring: refactor io_sq_offload_create()
  2021-07-23  0:10           ` Al Viro
@ 2021-07-23  0:12             ` Al Viro
  2021-07-23 16:17             ` Jens Axboe
  1 sibling, 0 replies; 26+ messages in thread
From: Al Viro @ 2021-07-23  0:12 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Pavel Begunkov, io-uring, linux-fsdevel, Linus Torvalds

On Fri, Jul 23, 2021 at 12:10:32AM +0000, Al Viro wrote:
> On Thu, Jul 22, 2021 at 05:42:55PM -0600, Jens Axboe wrote:
> 
> > > So how can we possibly get there with tsk->files == NULL and what does it
> > > have to do with files, anyway?
> > 
> > It's not the clearest, but the files check is just to distinguish between
> > exec vs normal cancel. For exec, we pass in files == NULL. It's not
> > related to task->files being NULL or not, we explicitly pass NULL for
> > exec.
> 
> Er...  So turn that argument into bool cancel_all, and pass false on exit and
> true on exit?
          ^^^^ exec, that is.  Sorry.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] io_uring: refactor io_sq_offload_create()
  2021-07-22 21:59   ` Al Viro
  2021-07-22 23:06     ` Jens Axboe
@ 2021-07-23  9:59     ` Pavel Begunkov
  1 sibling, 0 replies; 26+ messages in thread
From: Pavel Begunkov @ 2021-07-23  9:59 UTC (permalink / raw)
  To: Al Viro; +Cc: Jens Axboe, io-uring, linux-fsdevel, Linus Torvalds

On 7/22/21 10:59 PM, Al Viro wrote:
> On Tue, Apr 20, 2021 at 12:03:33PM +0100, Pavel Begunkov wrote:
>> Just a bit of code tossing in io_sq_offload_create(), so it looks a bit
>> better. No functional changes.
> 
> Does a use-after-free count as a functional change?
> 
>>  		f = fdget(p->wq_fd);
> 
> Descriptor table is shared with another thread, grabbed a reference to file.
> Refcount is 2 (1 from descriptor table, 1 held by us)
> 
>>  		if (!f.file)
>>  			return -ENXIO;
> 
> Nope, not NULL.
> 
>> -		if (f.file->f_op != &io_uring_fops) {
>> -			fdput(f);
>> -			return -EINVAL;
>> -		}
>>  		fdput(f);
> 
> Decrement refcount, get preempted away.  f.file->f_count is 1 now.
> 
> Another thread: close() on the same descriptor.  Final reference to
> struct file (from descriptor table) is gone, file closed, memory freed.
> 
> Regain CPU...
> 
>> +		if (f.file->f_op != &io_uring_fops)
>> +			return -EINVAL;
> 
> ... and dereference an already freed structure.
> 
> What scares me here is that you are playing with bloody fundamental objects,
> without understanding even the basics regarding their handling ;-/

Yes, it's a stupid bug and slipped through accidentally, not proud of
it. And it's obvious to anyone that it shouldn't be touched after a
put, so have no clue why there is such a long explanation.
Anyway, thanks for letting know.

By luck, it should be of low severity, as it's a compatibility check,
the result of which is not depended upon by any code after. To
fault would need some RAM hot-remove (?). Not an excuse, how you
put it, but useful to notice.


> 1) descriptor tables can be shared.
> 2) another thread can close file right under you.
> 3) once all references to opened file are gone, it gets shut down and
> struct file gets freed.
> 4) inside an fdget()/fdput() pair you are guaranteed that (3) won't happen.
> As soon as you've done fdput(), that promise is gone.
> 
> 	In the above only (1) might have been non-obvious, because if you
> accept _that_, you have to ask yourself what the fuck would prevent file
> disappearing once you've done fdput(), seeing that it might be the last
> thing your syscall is doing to the damn thing.  So either that would've
> leaked it, or _something_ in the operations you've done to it must've
> made it possible for close(2) to get the damn thing.  And dereferencing
> ->f_op is unlikely to be that, isn't it?  Which leaves fdput() the
> only candidate.  It's common sense stuff...
> 
> 	Again, descriptor table is a shared resource and threads sharing
> it can issue syscalls at the same time.  Sure, I've got fewer excuses
> than you do for lack of documentation, but that's really basic...
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] io_uring: refactor io_sq_offload_create()
  2021-07-23  0:10           ` Al Viro
  2021-07-23  0:12             ` Al Viro
@ 2021-07-23 16:17             ` Jens Axboe
  2021-07-23 17:11               ` Al Viro
  1 sibling, 1 reply; 26+ messages in thread
From: Jens Axboe @ 2021-07-23 16:17 UTC (permalink / raw)
  To: Al Viro; +Cc: Pavel Begunkov, io-uring, linux-fsdevel, Linus Torvalds

On 7/22/21 6:10 PM, Al Viro wrote:
> On Thu, Jul 22, 2021 at 05:42:55PM -0600, Jens Axboe wrote:
> 
>>> So how can we possibly get there with tsk->files == NULL and what does it
>>> have to do with files, anyway?
>>
>> It's not the clearest, but the files check is just to distinguish between
>> exec vs normal cancel. For exec, we pass in files == NULL. It's not
>> related to task->files being NULL or not, we explicitly pass NULL for
>> exec.
> 
> Er...  So turn that argument into bool cancel_all, and pass false on exit and
> true on exec? 

Yes

> While we are at it, what happens if you pass io_uring descriptor
> to another process, close yours and then have the recepient close the one it
> has gotten?  AFAICS, io_ring_ctx_wait_and_kill(ctx) will be called in context
> of a process that has never done anything io_uring-related.  Can it end up
> trying to resubmit some requests?> 
> I rather hope it can't happen, but I don't see what would prevent it...

No, the pending request would either have gone to a created thread of
the original task on submission, or it would be sitting in a
ready-to-retry state. The retry would attempt to queue to original task,
and either succeed (if still alive) or get failed with -ECANCELED. Any
given request is tied to the original task.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] io_uring: refactor io_sq_offload_create()
  2021-07-23 16:17             ` Jens Axboe
@ 2021-07-23 17:11               ` Al Viro
  2021-07-23 17:32                 ` Jens Axboe
  0 siblings, 1 reply; 26+ messages in thread
From: Al Viro @ 2021-07-23 17:11 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Pavel Begunkov, io-uring, linux-fsdevel, Linus Torvalds

On Fri, Jul 23, 2021 at 10:17:27AM -0600, Jens Axboe wrote:
> On 7/22/21 6:10 PM, Al Viro wrote:
> > On Thu, Jul 22, 2021 at 05:42:55PM -0600, Jens Axboe wrote:
> > 
> >>> So how can we possibly get there with tsk->files == NULL and what does it
> >>> have to do with files, anyway?
> >>
> >> It's not the clearest, but the files check is just to distinguish between
> >> exec vs normal cancel. For exec, we pass in files == NULL. It's not
> >> related to task->files being NULL or not, we explicitly pass NULL for
> >> exec.
> > 
> > Er...  So turn that argument into bool cancel_all, and pass false on exit and
> > true on exec? 
> 
> Yes
> 
> > While we are at it, what happens if you pass io_uring descriptor
> > to another process, close yours and then have the recepient close the one it
> > has gotten?  AFAICS, io_ring_ctx_wait_and_kill(ctx) will be called in context
> > of a process that has never done anything io_uring-related.  Can it end up
> > trying to resubmit some requests?> 
> > I rather hope it can't happen, but I don't see what would prevent it...
> 
> No, the pending request would either have gone to a created thread of
> the original task on submission, or it would be sitting in a
> ready-to-retry state. The retry would attempt to queue to original task,
> and either succeed (if still alive) or get failed with -ECANCELED. Any
> given request is tied to the original task.

Hmm...  Sure, you'll be pushing it to the same io_wqe it went through originally,
but you are still in context of io_uring_release() caller, aren't you?

So you call io_wqe_wake_worker(), and it decides that all threads are busy,
but ->nr_workers is still below ->max_workers.  And proceeds to
	create_io_worker(wqe->wq, wqe, acct->index);
which will create a new io-worker thread, but do that in the thread group of
current, i.e. the caller of io_uring_release().  Looks like we'd get
an io-worker thread with the wrong parent...

What am I missing here?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] io_uring: refactor io_sq_offload_create()
  2021-07-23 17:11               ` Al Viro
@ 2021-07-23 17:32                 ` Jens Axboe
  2021-07-23 17:36                   ` Jens Axboe
  0 siblings, 1 reply; 26+ messages in thread
From: Jens Axboe @ 2021-07-23 17:32 UTC (permalink / raw)
  To: Al Viro; +Cc: Pavel Begunkov, io-uring, linux-fsdevel, Linus Torvalds

On 7/23/21 11:11 AM, Al Viro wrote:
> On Fri, Jul 23, 2021 at 10:17:27AM -0600, Jens Axboe wrote:
>> On 7/22/21 6:10 PM, Al Viro wrote:
>>> On Thu, Jul 22, 2021 at 05:42:55PM -0600, Jens Axboe wrote:
>>>
>>>>> So how can we possibly get there with tsk->files == NULL and what does it
>>>>> have to do with files, anyway?
>>>>
>>>> It's not the clearest, but the files check is just to distinguish between
>>>> exec vs normal cancel. For exec, we pass in files == NULL. It's not
>>>> related to task->files being NULL or not, we explicitly pass NULL for
>>>> exec.
>>>
>>> Er...  So turn that argument into bool cancel_all, and pass false on exit and
>>> true on exec? 
>>
>> Yes
>>
>>> While we are at it, what happens if you pass io_uring descriptor
>>> to another process, close yours and then have the recepient close the one it
>>> has gotten?  AFAICS, io_ring_ctx_wait_and_kill(ctx) will be called in context
>>> of a process that has never done anything io_uring-related.  Can it end up
>>> trying to resubmit some requests?> 
>>> I rather hope it can't happen, but I don't see what would prevent it...
>>
>> No, the pending request would either have gone to a created thread of
>> the original task on submission, or it would be sitting in a
>> ready-to-retry state. The retry would attempt to queue to original task,
>> and either succeed (if still alive) or get failed with -ECANCELED. Any
>> given request is tied to the original task.
> 
> Hmm...  Sure, you'll be pushing it to the same io_wqe it went through originally,
> but you are still in context of io_uring_release() caller, aren't you?
> 
> So you call io_wqe_wake_worker(), and it decides that all threads are busy,
> but ->nr_workers is still below ->max_workers.  And proceeds to
> 	create_io_worker(wqe->wq, wqe, acct->index);
> which will create a new io-worker thread, but do that in the thread group of
> current, i.e. the caller of io_uring_release().  Looks like we'd get
> an io-worker thread with the wrong parent...
> 
> What am I missing here?

I think there's two main cases here:

1) Request has already been issued by original task in some shape or form.
   This is the case I was mentioning in my previous reply.

2) Request has not been seen yet, this would be a new submit.

For case #2, let's say you pass the ring to another task, there are entries
in the SQ ring that haven't been submitted yet. Will these go under the new
task? Yes they would - you've passed the ring to someone else at that point.

For case #1, by definition the request has already been issued and is
assigned to the task that did that. This either happens from the syscall
itself, or from task_work which is also run from that original task.

For your particular case, it's either an original async queue (hasn't
been done on this task before), in which case it will create a thread
from the right task. The other option is that we decide to async requeue
from async context for some odd reason, and we're already in the right
context at that point to create a new thread (which should not even
happen, as the same thread is now available).

I don't see a case where this wouldn't work as expected. However, I do
think we could add a WARN_ON_ONCE (or similar) and reject any attempt
to io_wq_enqueue() from the wrong context as a proactive measure to
catch any bugs in testing rather than later.

Outside of that, we're not submitting off release, only killing anything
pending. The only odd case there is iopoll, but that doesn't resubmit
here.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] io_uring: refactor io_sq_offload_create()
  2021-07-23 17:32                 ` Jens Axboe
@ 2021-07-23 17:36                   ` Jens Axboe
  2021-07-23 17:56                     ` Jens Axboe
  0 siblings, 1 reply; 26+ messages in thread
From: Jens Axboe @ 2021-07-23 17:36 UTC (permalink / raw)
  To: Al Viro; +Cc: Pavel Begunkov, io-uring, linux-fsdevel, Linus Torvalds

On 7/23/21 11:32 AM, Jens Axboe wrote:
> Outside of that, we're not submitting off release, only killing anything
> pending. The only odd case there is iopoll, but that doesn't resubmit
> here.

OK perhaps I'm wrong on this one - if we have a pending iopoll request,
and we run into the rare case of needing resubmit, we are doing that off
the release path and that should not happen. Hence it could potentially
happen for iosched and/or low queue depth devices, if you are using a
ring for pure polling. I'll patch that up.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] io_uring: refactor io_sq_offload_create()
  2021-07-23 17:36                   ` Jens Axboe
@ 2021-07-23 17:56                     ` Jens Axboe
  2021-07-23 19:00                       ` Al Viro
  0 siblings, 1 reply; 26+ messages in thread
From: Jens Axboe @ 2021-07-23 17:56 UTC (permalink / raw)
  To: Al Viro; +Cc: Pavel Begunkov, io-uring, linux-fsdevel, Linus Torvalds

On 7/23/21 11:36 AM, Jens Axboe wrote:
> On 7/23/21 11:32 AM, Jens Axboe wrote:
>> Outside of that, we're not submitting off release, only killing anything
>> pending. The only odd case there is iopoll, but that doesn't resubmit
>> here.
> 
> OK perhaps I'm wrong on this one - if we have a pending iopoll request,
> and we run into the rare case of needing resubmit, we are doing that off
> the release path and that should not happen. Hence it could potentially
> happen for iosched and/or low queue depth devices, if you are using a
> ring for pure polling. I'll patch that up.

Will send out two patches for this. Note that I don't see this being a
real issue, as we explicitly gave the ring fd to another task, and being
that this is purely for read/write, it would result in -EFAULT anyway.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] io_uring: refactor io_sq_offload_create()
  2021-07-23 17:56                     ` Jens Axboe
@ 2021-07-23 19:00                       ` Al Viro
  2021-07-23 20:10                         ` Jens Axboe
  2021-07-23 20:19                         ` Al Viro
  0 siblings, 2 replies; 26+ messages in thread
From: Al Viro @ 2021-07-23 19:00 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Pavel Begunkov, io-uring, linux-fsdevel, Linus Torvalds

On Fri, Jul 23, 2021 at 11:56:29AM -0600, Jens Axboe wrote:

> Will send out two patches for this. Note that I don't see this being a
> real issue, as we explicitly gave the ring fd to another task, and being
> that this is purely for read/write, it would result in -EFAULT anyway.

You do realize that ->release() might come from seriously unexpected places,
right?  E.g. recvmsg() by something that doesn't expect SCM_RIGHTS attached
to it will end up with all struct file references stashed into the sucker
dropped, and if by that time that's the last reference - welcome to ->release()
run as soon as recepient hits task_work_run().

What's more, if you stash that into garbage for unix_gc() to pick, *any*
process closing an AF_UNIX socket might end up running your ->release().

So you really do *not* want to spawn any threads there, let alone
possibly exfiltrating memory contents of happy recepient of your present...

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] io_uring: refactor io_sq_offload_create()
  2021-07-23 19:00                       ` Al Viro
@ 2021-07-23 20:10                         ` Jens Axboe
  2021-07-23 20:24                           ` Al Viro
  2021-07-23 20:19                         ` Al Viro
  1 sibling, 1 reply; 26+ messages in thread
From: Jens Axboe @ 2021-07-23 20:10 UTC (permalink / raw)
  To: Al Viro; +Cc: Pavel Begunkov, io-uring, linux-fsdevel, Linus Torvalds

On 7/23/21 1:00 PM, Al Viro wrote:
> On Fri, Jul 23, 2021 at 11:56:29AM -0600, Jens Axboe wrote:
> 
>> Will send out two patches for this. Note that I don't see this being a
>> real issue, as we explicitly gave the ring fd to another task, and being
>> that this is purely for read/write, it would result in -EFAULT anyway.
> 
> You do realize that ->release() might come from seriously unexpected
> places, right?  E.g. recvmsg() by something that doesn't expect
> SCM_RIGHTS attached to it will end up with all struct file references
> stashed into the sucker dropped, and if by that time that's the last
> reference - welcome to ->release() run as soon as recepient hits
> task_work_run().
> 
> What's more, if you stash that into garbage for unix_gc() to pick,
> *any* process closing an AF_UNIX socket might end up running your
> ->release().
> 
> So you really do *not* want to spawn any threads there, let alone
> possibly exfiltrating memory contents of happy recepient of your
> present...

Yes I know, and the iopoll was the exception - we don't do anything but
cancel off release otherwise.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] io_uring: refactor io_sq_offload_create()
  2021-07-23 19:00                       ` Al Viro
  2021-07-23 20:10                         ` Jens Axboe
@ 2021-07-23 20:19                         ` Al Viro
  2021-07-23 23:45                           ` Matthew Wilcox
  1 sibling, 1 reply; 26+ messages in thread
From: Al Viro @ 2021-07-23 20:19 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Pavel Begunkov, io-uring, linux-fsdevel, Linus Torvalds

On Fri, Jul 23, 2021 at 07:00:40PM +0000, Al Viro wrote:
> On Fri, Jul 23, 2021 at 11:56:29AM -0600, Jens Axboe wrote:
> 
> > Will send out two patches for this. Note that I don't see this being a
> > real issue, as we explicitly gave the ring fd to another task, and being
> > that this is purely for read/write, it would result in -EFAULT anyway.
> 
> You do realize that ->release() might come from seriously unexpected places,
> right?  E.g. recvmsg() by something that doesn't expect SCM_RIGHTS attached
> to it will end up with all struct file references stashed into the sucker
> dropped, and if by that time that's the last reference - welcome to ->release()
> run as soon as recepient hits task_work_run().
> 
> What's more, if you stash that into garbage for unix_gc() to pick, *any*
> process closing an AF_UNIX socket might end up running your ->release().
> 
> So you really do *not* want to spawn any threads there, let alone
> possibly exfiltrating memory contents of happy recepient of your present...

To elaborate: ->release() instance may not assume anything about current->mm,
or assume anything about current, for that matter.  It is entirely possible
to arrange its execution in context of a process that is not yours and had not
consent to doing that.  In particular, it's a hard bug to have _any_ visible
effects depending upon the memory mappings, memory contents or the contents of
descriptor table of the process in question.

There's really no way around that.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] io_uring: refactor io_sq_offload_create()
  2021-07-23 20:10                         ` Jens Axboe
@ 2021-07-23 20:24                           ` Al Viro
  2021-07-23 22:32                             ` Jens Axboe
  0 siblings, 1 reply; 26+ messages in thread
From: Al Viro @ 2021-07-23 20:24 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Pavel Begunkov, io-uring, linux-fsdevel, Linus Torvalds

On Fri, Jul 23, 2021 at 02:10:40PM -0600, Jens Axboe wrote:
> On 7/23/21 1:00 PM, Al Viro wrote:
> > On Fri, Jul 23, 2021 at 11:56:29AM -0600, Jens Axboe wrote:
> > 
> >> Will send out two patches for this. Note that I don't see this being a
> >> real issue, as we explicitly gave the ring fd to another task, and being
> >> that this is purely for read/write, it would result in -EFAULT anyway.
> > 
> > You do realize that ->release() might come from seriously unexpected
> > places, right?  E.g. recvmsg() by something that doesn't expect
> > SCM_RIGHTS attached to it will end up with all struct file references
> > stashed into the sucker dropped, and if by that time that's the last
> > reference - welcome to ->release() run as soon as recepient hits
> > task_work_run().
> > 
> > What's more, if you stash that into garbage for unix_gc() to pick,
> > *any* process closing an AF_UNIX socket might end up running your
> > ->release().
> > 
> > So you really do *not* want to spawn any threads there, let alone
> > possibly exfiltrating memory contents of happy recepient of your
> > present...
> 
> Yes I know, and the iopoll was the exception - we don't do anything but
> cancel off release otherwise.

Not saying you don't - I just want to have that in (searchable) archives.
Ideally we need that kind of stuff in Documentation/*, but having it
findable by google search is at least better than nothing...

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] io_uring: refactor io_sq_offload_create()
  2021-07-23 20:24                           ` Al Viro
@ 2021-07-23 22:32                             ` Jens Axboe
  0 siblings, 0 replies; 26+ messages in thread
From: Jens Axboe @ 2021-07-23 22:32 UTC (permalink / raw)
  To: Al Viro; +Cc: Pavel Begunkov, io-uring, linux-fsdevel, Linus Torvalds

On 7/23/21 2:24 PM, Al Viro wrote:
> On Fri, Jul 23, 2021 at 02:10:40PM -0600, Jens Axboe wrote:
>> On 7/23/21 1:00 PM, Al Viro wrote:
>>> On Fri, Jul 23, 2021 at 11:56:29AM -0600, Jens Axboe wrote:
>>>
>>>> Will send out two patches for this. Note that I don't see this being a
>>>> real issue, as we explicitly gave the ring fd to another task, and being
>>>> that this is purely for read/write, it would result in -EFAULT anyway.
>>>
>>> You do realize that ->release() might come from seriously unexpected
>>> places, right?  E.g. recvmsg() by something that doesn't expect
>>> SCM_RIGHTS attached to it will end up with all struct file references
>>> stashed into the sucker dropped, and if by that time that's the last
>>> reference - welcome to ->release() run as soon as recepient hits
>>> task_work_run().
>>>
>>> What's more, if you stash that into garbage for unix_gc() to pick,
>>> *any* process closing an AF_UNIX socket might end up running your
>>> ->release().
>>>
>>> So you really do *not* want to spawn any threads there, let alone
>>> possibly exfiltrating memory contents of happy recepient of your
>>> present...
>>
>> Yes I know, and the iopoll was the exception - we don't do anything but
>> cancel off release otherwise.
> 
> Not saying you don't - I just want to have that in (searchable) archives.
> Ideally we need that kind of stuff in Documentation/*, but having it
> findable by google search is at least better than nothing...

Agree, and I'll amend the commit to include a reference to it as well
and expand the explanation a bit. The easier that kind of stuff is to
find, the better.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] io_uring: refactor io_sq_offload_create()
  2021-07-23 20:19                         ` Al Viro
@ 2021-07-23 23:45                           ` Matthew Wilcox
  2021-07-23 23:57                             ` Jens Axboe
  2021-07-24  1:31                             ` Al Viro
  0 siblings, 2 replies; 26+ messages in thread
From: Matthew Wilcox @ 2021-07-23 23:45 UTC (permalink / raw)
  To: Al Viro
  Cc: Jens Axboe, Pavel Begunkov, io-uring, linux-fsdevel, Linus Torvalds

On Fri, Jul 23, 2021 at 08:19:49PM +0000, Al Viro wrote:
> To elaborate: ->release() instance may not assume anything about current->mm,
> or assume anything about current, for that matter.  It is entirely possible
> to arrange its execution in context of a process that is not yours and had not
> consent to doing that.  In particular, it's a hard bug to have _any_ visible
> effects depending upon the memory mappings, memory contents or the contents of
> descriptor table of the process in question.

Hmm.  Could we add a poison_current() function?  Something like ...

static inline void call_release(struct file *file, struct inode *inode)
{
	void *tmp = poison_current();
	if (file->f_op->release)
		file->f_op->release(inode, file);
	restore_current(tmp);
}

Should be straightforward for asm-generic/current.h and for x86 too.
Probably have to disable preemption?  Maybe interrupts too?  Not sure
what's kept in current these days that an interrupt handler might
rely on being able to access temporarily.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] io_uring: refactor io_sq_offload_create()
  2021-07-23 23:45                           ` Matthew Wilcox
@ 2021-07-23 23:57                             ` Jens Axboe
  2021-07-24  1:31                             ` Al Viro
  1 sibling, 0 replies; 26+ messages in thread
From: Jens Axboe @ 2021-07-23 23:57 UTC (permalink / raw)
  To: Matthew Wilcox, Al Viro
  Cc: Pavel Begunkov, io-uring, linux-fsdevel, Linus Torvalds

On 7/23/21 5:45 PM, Matthew Wilcox wrote:
> On Fri, Jul 23, 2021 at 08:19:49PM +0000, Al Viro wrote:
>> To elaborate: ->release() instance may not assume anything about current->mm,
>> or assume anything about current, for that matter.  It is entirely possible
>> to arrange its execution in context of a process that is not yours and had not
>> consent to doing that.  In particular, it's a hard bug to have _any_ visible
>> effects depending upon the memory mappings, memory contents or the contents of
>> descriptor table of the process in question.
> 
> Hmm.  Could we add a poison_current() function?  Something like ...
> 
> static inline void call_release(struct file *file, struct inode *inode)
> {
> 	void *tmp = poison_current();
> 	if (file->f_op->release)
> 		file->f_op->release(inode, file);
> 	restore_current(tmp);
> }
> 
> Should be straightforward for asm-generic/current.h and for x86 too.
> Probably have to disable preemption?  Maybe interrupts too?  Not sure
> what's kept in current these days that an interrupt handler might
> rely on being able to access temporarily.

->release() would probably be unhappy with preempt and/or interrupts
disabled for a lot of legit cases...

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] io_uring: refactor io_sq_offload_create()
  2021-07-23 23:45                           ` Matthew Wilcox
  2021-07-23 23:57                             ` Jens Axboe
@ 2021-07-24  1:31                             ` Al Viro
  1 sibling, 0 replies; 26+ messages in thread
From: Al Viro @ 2021-07-24  1:31 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jens Axboe, Pavel Begunkov, io-uring, linux-fsdevel, Linus Torvalds

On Sat, Jul 24, 2021 at 12:45:12AM +0100, Matthew Wilcox wrote:
> On Fri, Jul 23, 2021 at 08:19:49PM +0000, Al Viro wrote:
> > To elaborate: ->release() instance may not assume anything about current->mm,
> > or assume anything about current, for that matter.  It is entirely possible
> > to arrange its execution in context of a process that is not yours and had not
> > consent to doing that.  In particular, it's a hard bug to have _any_ visible
> > effects depending upon the memory mappings, memory contents or the contents of
> > descriptor table of the process in question.
> 
> Hmm.  Could we add a poison_current() function?  Something like ...
> 
> static inline void call_release(struct file *file, struct inode *inode)
> {
> 	void *tmp = poison_current();
> 	if (file->f_op->release)
> 		file->f_op->release(inode, file);
> 	restore_current(tmp);
> }
> 
> Should be straightforward for asm-generic/current.h and for x86 too.
> Probably have to disable preemption?  Maybe interrupts too?  Not sure
> what's kept in current these days that an interrupt handler might
> rely on being able to access temporarily.

->release() might grab a mutex, for example.  Scheduler is going to be unhappy
if it runs into somebody playing silly buggers with current...

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-07-24  1:38 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-20 11:03 [PATCH 0/3] small 5.13 cleanups Pavel Begunkov
2021-04-20 11:03 ` [PATCH 1/3] io_uring: move inflight un-tracking into cleanup Pavel Begunkov
2021-04-20 11:03 ` [PATCH 2/3] io_uring: safer sq_creds putting Pavel Begunkov
2021-04-20 11:03 ` [PATCH 3/3] io_uring: refactor io_sq_offload_create() Pavel Begunkov
2021-07-22 21:59   ` Al Viro
2021-07-22 23:06     ` Jens Axboe
2021-07-22 23:30       ` Al Viro
2021-07-22 23:42         ` Jens Axboe
2021-07-23  0:10           ` Al Viro
2021-07-23  0:12             ` Al Viro
2021-07-23 16:17             ` Jens Axboe
2021-07-23 17:11               ` Al Viro
2021-07-23 17:32                 ` Jens Axboe
2021-07-23 17:36                   ` Jens Axboe
2021-07-23 17:56                     ` Jens Axboe
2021-07-23 19:00                       ` Al Viro
2021-07-23 20:10                         ` Jens Axboe
2021-07-23 20:24                           ` Al Viro
2021-07-23 22:32                             ` Jens Axboe
2021-07-23 20:19                         ` Al Viro
2021-07-23 23:45                           ` Matthew Wilcox
2021-07-23 23:57                             ` Jens Axboe
2021-07-24  1:31                             ` Al Viro
2021-07-23  0:03         ` Al Viro
2021-07-23  9:59     ` Pavel Begunkov
2021-04-20 18:55 ` [PATCH 0/3] small 5.13 cleanups Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.