>> I'm sorry, but I'm still unsure we're talking about the same thing >> (or maybe I'm missing some basics here). >> >> My understanding of the io_uring_enter() is that it will execute as much >> non-blocking calls as it can without switching to any other kernel thread. > > Correct, any SQE that we can do without switching, we will. > >> And my fear is that openat will use get_current_cred() instead of >> ctx->creds. > > OK, I think I follow your concern. So you'd like to setup the rings from > a _different_ user, and then later on use it for submission for SQEs that > a specific user. So sort of the same as our initial discussion, except > the mapping would be static. The difference being that you might setup > the ring from a different user than the user that would be submitting IO > on it? Our current (much simplified here) flow is this: # we start as root seteuid(0);setegid(0);setgroups()... ... # we become the user555 and # create our desired credential token seteuid(555); seteguid(555); setgroups()... # Start an openat2 on behalf of user555 openat2() # we unbecome the user again and run as root seteuid(0);setegid(0); setgroups()... ... # we become the user444 and # create our desired credential token seteuid(444); seteguid(444); setgroups()... # Start an openat2 on behalf of user444 openat2() # we unbecome the user again and run as root seteuid(0);setegid(0); setgroups()... ... # we become the user555 and # create our desired credential token seteuid(555); seteguid(555); setgroups()... # Start an openat2 on behalf of user555 openat2() # we unbecome the user again and run as root seteuid(0);setegid(0); setgroups()... It means we have to do about 7 syscalls in order to open a file on behalf of a user. (In reality we cache things and avoid set*id() calls most of the time, but I want to demonstrate the simplified design here) With io_uring I'd like to use a flow like this: # we start as root seteuid(0);setegid(0);setgroups()... ... # we become the user444 and # create our desired credential token seteuid(444); seteguid(444); setgroups()... # we snapshot the credentials to the new ring for user444 ring444 = io_uring_setup() # we unbecome the user again and run as root seteuid(0);setegid(0);setgroups()... ... # we become the user555 and # create our desired credential token seteuid(555); seteguid(555); setgroups()... # we snapshot the credentials to the new ring for user555 ring555 = io_uring_setup() # we unbecome the user again and run as root seteuid(0);setegid(0);setgroups()... ... # Start an openat2 on behalf of user555 io_uring_enter(ring555, OP_OPENAT2...) ... # Start an openat2 on behalf of user444 io_uring_enter(ring444, OP_OPENAT2...) ... # Start an openat2 on behalf of user555 io_uring_enter(ring555, OP_OPENAT2...) So instead of constantly doing 7 syscalls per open, we would be down to just at most one. And I would assume that io_uring_enter() would do the temporary credential switch for me also in the non-blocking case. > If so, then we do need something to support that, probably an > IORING_REGISTER_CREDS or similar. This would allow you to replace the > creds you currently have in ctx->creds with whatever new one. I don't want to change ctx->creds, but I want it to be used consistently. What I think is missing is something like this: diff --git a/fs/io_uring.c b/fs/io_uring.c index 32aee149f652..55dbb154915a 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -6359,10 +6359,27 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit, struct mm_struct *cur_mm; mutex_lock(&ctx->uring_lock); + if (current->mm != ctx->sqo_mm) { + // TODO: somthing like this... + restore_mm = current->mm; + use_mm(ctx->sqo_mm); + } /* already have mm, so io_submit_sqes() won't try to grab it */ cur_mm = ctx->sqo_mm; + if (current_cred() != ctx->creds) { + // TODO: somthing like this... + restore_cred = override_creds(ctx->creds); + } submitted = io_submit_sqes(ctx, to_submit, f.file, fd, &cur_mm, false); + if (restore_cred != NULL) { + revert_creds(restore_cred); + } + if (restore_mm != NULL) { + // TODO: something like this... + unuse_mm(ctx->sqo_mm); + use_mm(restore_mm); + } mutex_unlock(&ctx->uring_lock); if (submitted != to_submit) I'm not sure if current->mm is needed, I just added it for completeness and as hint that io_op_defs[req->opcode].needs_mm is there and a needs_creds could also be added (if it helps with performance) Is it possible to trigger a change of current->mm from userspace? An IORING_REGISTER_CREDS would only be useful if it's possible to register a set of credentials and then use per io_uring_sqe credentials. That would also be fine for me, but I'm not sure it's needed for now. Apart from IORING_REGISTER_CREDS I think a change like the one above is needed in order to avoid potential security problems. >> I'm I missing something? > > I think we're talking about the same thing, just different views of it :-) I hope it's clear from my side now :-) Thanks! metze