io-uring.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
@ 2020-01-28 10:18 Stefan Metzmacher
  2020-01-28 16:10 ` Jens Axboe
  0 siblings, 1 reply; 48+ messages in thread
From: Stefan Metzmacher @ 2020-01-28 10:18 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Linux API Mailing List


[-- Attachment #1.1: Type: text/plain, Size: 1080 bytes --]

Hi Jens,

now that we have IORING_FEAT_CUR_PERSONALITY...

How can we optimize the fileserver case now, in order to avoid the
overhead of always calling 5 syscalls before io_uring_enter()?:

 /* gain root again */
 setresuid(-1,0,-1); setresgid(-1,0,-1)
 /* impersonate the user with groups */
 setgroups(num, grps); setresgid(-1,gid,-1); setresuid(-1,uid,-1);
 /* trigger the operation */
 io_uring_enter();

I guess some kind of IORING_REGISTER_CREDS[_UPDATE] would be
good, together with a IOSQE_FIXED_CREDS in order to specify
credentials per operation.

Or we make it much more generic and introduce a credsfd_create()
syscall in order to get an fd for a credential handle, maybe
together with another syscall to activate the credentials of
the current thread (or let a write to the fd trigger the activation
in order to avoid an additional syscall number).

Having just an fd would allow IORING_REGISTER_CREDS[_UPDATE]
to be just an array of int values instead of a more complex
structure to define the credentials.

What do you think?
metze


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-28 10:18 IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? Stefan Metzmacher
@ 2020-01-28 16:10 ` Jens Axboe
  2020-01-28 16:17   ` Stefan Metzmacher
  0 siblings, 1 reply; 48+ messages in thread
From: Jens Axboe @ 2020-01-28 16:10 UTC (permalink / raw)
  To: Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List

On 1/28/20 3:18 AM, Stefan Metzmacher wrote:
> Hi Jens,
> 
> now that we have IORING_FEAT_CUR_PERSONALITY...
> 
> How can we optimize the fileserver case now, in order to avoid the
> overhead of always calling 5 syscalls before io_uring_enter()?:
> 
>  /* gain root again */
>  setresuid(-1,0,-1); setresgid(-1,0,-1)
>  /* impersonate the user with groups */
>  setgroups(num, grps); setresgid(-1,gid,-1); setresuid(-1,uid,-1);
>  /* trigger the operation */
>  io_uring_enter();
> 
> I guess some kind of IORING_REGISTER_CREDS[_UPDATE] would be
> good, together with a IOSQE_FIXED_CREDS in order to specify
> credentials per operation.
> 
> Or we make it much more generic and introduce a credsfd_create()
> syscall in order to get an fd for a credential handle, maybe
> together with another syscall to activate the credentials of
> the current thread (or let a write to the fd trigger the activation
> in order to avoid an additional syscall number).
> 
> Having just an fd would allow IORING_REGISTER_CREDS[_UPDATE]
> to be just an array of int values instead of a more complex
> structure to define the credentials.

I'd rather avoid having to add more infrastructure for this, even if
credsfd_create() would be nifty.

With that in mind, something like:

- Application does IORING_REGISTER_CREDS, which returns some index
- Add a IORING_OP_USE_CREDS opcode, which sets the creds associated
  with dependent commands
- Actual request is linked to the IORING_OP_USE_CREDS command, any
  link off IORING_OP_USE_CREDS will use those credentials
- IORING_UNREGISTER_CREDS removes the registered creds

Just throwing that out there, definitely willing to entertain other
methods that make sense for this. Trying to avoid needing to put this
information in the SQE itself, hence the idea to use a chain of links
for it.

The downside is that we'll need to maintain an array of key -> creds,
but that's probably not a big deal.

What do you think? Ideally I'd like to get this done for 5.6 even if we
are a bit late, so you'll have everything you need with that release.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-28 16:10 ` Jens Axboe
@ 2020-01-28 16:17   ` Stefan Metzmacher
  2020-01-28 16:19     ` Jens Axboe
  0 siblings, 1 reply; 48+ messages in thread
From: Stefan Metzmacher @ 2020-01-28 16:17 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Linux API Mailing List


[-- Attachment #1.1: Type: text/plain, Size: 2515 bytes --]

Am 28.01.20 um 17:10 schrieb Jens Axboe:
> On 1/28/20 3:18 AM, Stefan Metzmacher wrote:
>> Hi Jens,
>>
>> now that we have IORING_FEAT_CUR_PERSONALITY...
>>
>> How can we optimize the fileserver case now, in order to avoid the
>> overhead of always calling 5 syscalls before io_uring_enter()?:
>>
>>  /* gain root again */
>>  setresuid(-1,0,-1); setresgid(-1,0,-1)
>>  /* impersonate the user with groups */
>>  setgroups(num, grps); setresgid(-1,gid,-1); setresuid(-1,uid,-1);
>>  /* trigger the operation */
>>  io_uring_enter();
>>
>> I guess some kind of IORING_REGISTER_CREDS[_UPDATE] would be
>> good, together with a IOSQE_FIXED_CREDS in order to specify
>> credentials per operation.
>>
>> Or we make it much more generic and introduce a credsfd_create()
>> syscall in order to get an fd for a credential handle, maybe
>> together with another syscall to activate the credentials of
>> the current thread (or let a write to the fd trigger the activation
>> in order to avoid an additional syscall number).
>>
>> Having just an fd would allow IORING_REGISTER_CREDS[_UPDATE]
>> to be just an array of int values instead of a more complex
>> structure to define the credentials.
> 
> I'd rather avoid having to add more infrastructure for this, even if
> credsfd_create() would be nifty.
> 
> With that in mind, something like:
> 
> - Application does IORING_REGISTER_CREDS, which returns some index
>
> - Add a IORING_OP_USE_CREDS opcode, which sets the creds associated
>   with dependent commands
> - Actual request is linked to the IORING_OP_USE_CREDS command, any
>   link off IORING_OP_USE_CREDS will use those credentials

Using links for this sounds ok.

> - IORING_UNREGISTER_CREDS removes the registered creds
> 
> Just throwing that out there, definitely willing to entertain other
> methods that make sense for this. Trying to avoid needing to put this
> information in the SQE itself, hence the idea to use a chain of links
> for it.
> 
> The downside is that we'll need to maintain an array of key -> creds,
> but that's probably not a big deal.
> 
> What do you think?

So IORING_REGISTER_CREDS would be a simple operation that just takes a
snapshot of the current_cred() and returns an id that can be passed to
IORING_OP_USE_CREDS or IORING_UNREGISTER_CREDS?

> Ideally I'd like to get this done for 5.6 even if we
> are a bit late, so you'll have everything you need with that release.

That would be great!

Thanks!
metze


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-28 16:17   ` Stefan Metzmacher
@ 2020-01-28 16:19     ` Jens Axboe
  2020-01-28 17:19       ` Jens Axboe
  0 siblings, 1 reply; 48+ messages in thread
From: Jens Axboe @ 2020-01-28 16:19 UTC (permalink / raw)
  To: Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List

On 1/28/20 9:17 AM, Stefan Metzmacher wrote:
> Am 28.01.20 um 17:10 schrieb Jens Axboe:
>> On 1/28/20 3:18 AM, Stefan Metzmacher wrote:
>>> Hi Jens,
>>>
>>> now that we have IORING_FEAT_CUR_PERSONALITY...
>>>
>>> How can we optimize the fileserver case now, in order to avoid the
>>> overhead of always calling 5 syscalls before io_uring_enter()?:
>>>
>>>  /* gain root again */
>>>  setresuid(-1,0,-1); setresgid(-1,0,-1)
>>>  /* impersonate the user with groups */
>>>  setgroups(num, grps); setresgid(-1,gid,-1); setresuid(-1,uid,-1);
>>>  /* trigger the operation */
>>>  io_uring_enter();
>>>
>>> I guess some kind of IORING_REGISTER_CREDS[_UPDATE] would be
>>> good, together with a IOSQE_FIXED_CREDS in order to specify
>>> credentials per operation.
>>>
>>> Or we make it much more generic and introduce a credsfd_create()
>>> syscall in order to get an fd for a credential handle, maybe
>>> together with another syscall to activate the credentials of
>>> the current thread (or let a write to the fd trigger the activation
>>> in order to avoid an additional syscall number).
>>>
>>> Having just an fd would allow IORING_REGISTER_CREDS[_UPDATE]
>>> to be just an array of int values instead of a more complex
>>> structure to define the credentials.
>>
>> I'd rather avoid having to add more infrastructure for this, even if
>> credsfd_create() would be nifty.
>>
>> With that in mind, something like:
>>
>> - Application does IORING_REGISTER_CREDS, which returns some index
>>
>> - Add a IORING_OP_USE_CREDS opcode, which sets the creds associated
>>   with dependent commands
>> - Actual request is linked to the IORING_OP_USE_CREDS command, any
>>   link off IORING_OP_USE_CREDS will use those credentials
> 
> Using links for this sounds ok.

Great! I'll try and hack this up and see how it turns out.

>> - IORING_UNREGISTER_CREDS removes the registered creds
>>
>> Just throwing that out there, definitely willing to entertain other
>> methods that make sense for this. Trying to avoid needing to put this
>> information in the SQE itself, hence the idea to use a chain of links
>> for it.
>>
>> The downside is that we'll need to maintain an array of key -> creds,
>> but that's probably not a big deal.
>>
>> What do you think?
> 
> So IORING_REGISTER_CREDS would be a simple operation that just takes a
> snapshot of the current_cred() and returns an id that can be passed to
> IORING_OP_USE_CREDS or IORING_UNREGISTER_CREDS?

Right, you would not pass in any arguments, it'd have to be run from the
personality you wish to register. It simply returns an integer, which is
a key to use for IORING_OP_USE_CREDS, or at the end for
IORING_UNREGISTER_CREDS when you no longer wish to use this personality.

>> Ideally I'd like to get this done for 5.6 even if we
>> are a bit late, so you'll have everything you need with that release.
> 
> That would be great!

Crossing fingers...

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-28 16:19     ` Jens Axboe
@ 2020-01-28 17:19       ` Jens Axboe
  2020-01-28 18:04         ` Jens Axboe
  0 siblings, 1 reply; 48+ messages in thread
From: Jens Axboe @ 2020-01-28 17:19 UTC (permalink / raw)
  To: Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List, Pavel Begunkov

On 1/28/20 9:19 AM, Jens Axboe wrote:
> On 1/28/20 9:17 AM, Stefan Metzmacher wrote:
>> Am 28.01.20 um 17:10 schrieb Jens Axboe:
>>> On 1/28/20 3:18 AM, Stefan Metzmacher wrote:
>>>> Hi Jens,
>>>>
>>>> now that we have IORING_FEAT_CUR_PERSONALITY...
>>>>
>>>> How can we optimize the fileserver case now, in order to avoid the
>>>> overhead of always calling 5 syscalls before io_uring_enter()?:
>>>>
>>>>  /* gain root again */
>>>>  setresuid(-1,0,-1); setresgid(-1,0,-1)
>>>>  /* impersonate the user with groups */
>>>>  setgroups(num, grps); setresgid(-1,gid,-1); setresuid(-1,uid,-1);
>>>>  /* trigger the operation */
>>>>  io_uring_enter();
>>>>
>>>> I guess some kind of IORING_REGISTER_CREDS[_UPDATE] would be
>>>> good, together with a IOSQE_FIXED_CREDS in order to specify
>>>> credentials per operation.
>>>>
>>>> Or we make it much more generic and introduce a credsfd_create()
>>>> syscall in order to get an fd for a credential handle, maybe
>>>> together with another syscall to activate the credentials of
>>>> the current thread (or let a write to the fd trigger the activation
>>>> in order to avoid an additional syscall number).
>>>>
>>>> Having just an fd would allow IORING_REGISTER_CREDS[_UPDATE]
>>>> to be just an array of int values instead of a more complex
>>>> structure to define the credentials.
>>>
>>> I'd rather avoid having to add more infrastructure for this, even if
>>> credsfd_create() would be nifty.
>>>
>>> With that in mind, something like:
>>>
>>> - Application does IORING_REGISTER_CREDS, which returns some index
>>>
>>> - Add a IORING_OP_USE_CREDS opcode, which sets the creds associated
>>>   with dependent commands
>>> - Actual request is linked to the IORING_OP_USE_CREDS command, any
>>>   link off IORING_OP_USE_CREDS will use those credentials
>>
>> Using links for this sounds ok.
> 
> Great! I'll try and hack this up and see how it turns out.
> 
>>> - IORING_UNREGISTER_CREDS removes the registered creds
>>>
>>> Just throwing that out there, definitely willing to entertain other
>>> methods that make sense for this. Trying to avoid needing to put this
>>> information in the SQE itself, hence the idea to use a chain of links
>>> for it.
>>>
>>> The downside is that we'll need to maintain an array of key -> creds,
>>> but that's probably not a big deal.
>>>
>>> What do you think?
>>
>> So IORING_REGISTER_CREDS would be a simple operation that just takes a
>> snapshot of the current_cred() and returns an id that can be passed to
>> IORING_OP_USE_CREDS or IORING_UNREGISTER_CREDS?
> 
> Right, you would not pass in any arguments, it'd have to be run from the
> personality you wish to register. It simply returns an integer, which is
> a key to use for IORING_OP_USE_CREDS, or at the end for
> IORING_UNREGISTER_CREDS when you no longer wish to use this personality.
> 
>>> Ideally I'd like to get this done for 5.6 even if we
>>> are a bit late, so you'll have everything you need with that release.
>>
>> That would be great!
> 
> Crossing fingers...

OK, so here are two patches for testing:

https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds

#1 adds support for registering the personality of the invoking task,
and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to
just having one link, it doesn't support a chain of them.

I'll try and write a test case for this just to see if it actually works,
so far it's totally untested. 

Adding Pavel to the CC.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-28 17:19       ` Jens Axboe
@ 2020-01-28 18:04         ` Jens Axboe
  2020-01-28 19:42           ` Jens Axboe
  0 siblings, 1 reply; 48+ messages in thread
From: Jens Axboe @ 2020-01-28 18:04 UTC (permalink / raw)
  To: Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List, Pavel Begunkov

On 1/28/20 10:19 AM, Jens Axboe wrote:
> On 1/28/20 9:19 AM, Jens Axboe wrote:
>> On 1/28/20 9:17 AM, Stefan Metzmacher wrote:
>>> Am 28.01.20 um 17:10 schrieb Jens Axboe:
>>>> On 1/28/20 3:18 AM, Stefan Metzmacher wrote:
>>>>> Hi Jens,
>>>>>
>>>>> now that we have IORING_FEAT_CUR_PERSONALITY...
>>>>>
>>>>> How can we optimize the fileserver case now, in order to avoid the
>>>>> overhead of always calling 5 syscalls before io_uring_enter()?:
>>>>>
>>>>>  /* gain root again */
>>>>>  setresuid(-1,0,-1); setresgid(-1,0,-1)
>>>>>  /* impersonate the user with groups */
>>>>>  setgroups(num, grps); setresgid(-1,gid,-1); setresuid(-1,uid,-1);
>>>>>  /* trigger the operation */
>>>>>  io_uring_enter();
>>>>>
>>>>> I guess some kind of IORING_REGISTER_CREDS[_UPDATE] would be
>>>>> good, together with a IOSQE_FIXED_CREDS in order to specify
>>>>> credentials per operation.
>>>>>
>>>>> Or we make it much more generic and introduce a credsfd_create()
>>>>> syscall in order to get an fd for a credential handle, maybe
>>>>> together with another syscall to activate the credentials of
>>>>> the current thread (or let a write to the fd trigger the activation
>>>>> in order to avoid an additional syscall number).
>>>>>
>>>>> Having just an fd would allow IORING_REGISTER_CREDS[_UPDATE]
>>>>> to be just an array of int values instead of a more complex
>>>>> structure to define the credentials.
>>>>
>>>> I'd rather avoid having to add more infrastructure for this, even if
>>>> credsfd_create() would be nifty.
>>>>
>>>> With that in mind, something like:
>>>>
>>>> - Application does IORING_REGISTER_CREDS, which returns some index
>>>>
>>>> - Add a IORING_OP_USE_CREDS opcode, which sets the creds associated
>>>>   with dependent commands
>>>> - Actual request is linked to the IORING_OP_USE_CREDS command, any
>>>>   link off IORING_OP_USE_CREDS will use those credentials
>>>
>>> Using links for this sounds ok.
>>
>> Great! I'll try and hack this up and see how it turns out.
>>
>>>> - IORING_UNREGISTER_CREDS removes the registered creds
>>>>
>>>> Just throwing that out there, definitely willing to entertain other
>>>> methods that make sense for this. Trying to avoid needing to put this
>>>> information in the SQE itself, hence the idea to use a chain of links
>>>> for it.
>>>>
>>>> The downside is that we'll need to maintain an array of key -> creds,
>>>> but that's probably not a big deal.
>>>>
>>>> What do you think?
>>>
>>> So IORING_REGISTER_CREDS would be a simple operation that just takes a
>>> snapshot of the current_cred() and returns an id that can be passed to
>>> IORING_OP_USE_CREDS or IORING_UNREGISTER_CREDS?
>>
>> Right, you would not pass in any arguments, it'd have to be run from the
>> personality you wish to register. It simply returns an integer, which is
>> a key to use for IORING_OP_USE_CREDS, or at the end for
>> IORING_UNREGISTER_CREDS when you no longer wish to use this personality.
>>
>>>> Ideally I'd like to get this done for 5.6 even if we
>>>> are a bit late, so you'll have everything you need with that release.
>>>
>>> That would be great!
>>
>> Crossing fingers...
> 
> OK, so here are two patches for testing:
> 
> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
> 
> #1 adds support for registering the personality of the invoking task,
> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to
> just having one link, it doesn't support a chain of them.
> 
> I'll try and write a test case for this just to see if it actually works,
> so far it's totally untested. 
> 
> Adding Pavel to the CC.

Minor tweak to ensuring we do the right thing for async offload as well,
and it tests fine for me. Test case is:

- Run as root
- Register personality for root
- create root only file
- check we can IORING_OP_OPENAT the file
- switch to user id test
- check we cannot IORING_OP_OPENAT the file
- check that we can open the file with IORING_OP_USE_CREDS linked


-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-28 18:04         ` Jens Axboe
@ 2020-01-28 19:42           ` Jens Axboe
  2020-01-28 20:16             ` Pavel Begunkov
                               ` (2 more replies)
  0 siblings, 3 replies; 48+ messages in thread
From: Jens Axboe @ 2020-01-28 19:42 UTC (permalink / raw)
  To: Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List, Pavel Begunkov

On 1/28/20 11:04 AM, Jens Axboe wrote:
> On 1/28/20 10:19 AM, Jens Axboe wrote:
>> On 1/28/20 9:19 AM, Jens Axboe wrote:
>>> On 1/28/20 9:17 AM, Stefan Metzmacher wrote:
>>>> Am 28.01.20 um 17:10 schrieb Jens Axboe:
>>>>> On 1/28/20 3:18 AM, Stefan Metzmacher wrote:
>>>>>> Hi Jens,
>>>>>>
>>>>>> now that we have IORING_FEAT_CUR_PERSONALITY...
>>>>>>
>>>>>> How can we optimize the fileserver case now, in order to avoid the
>>>>>> overhead of always calling 5 syscalls before io_uring_enter()?:
>>>>>>
>>>>>>  /* gain root again */
>>>>>>  setresuid(-1,0,-1); setresgid(-1,0,-1)
>>>>>>  /* impersonate the user with groups */
>>>>>>  setgroups(num, grps); setresgid(-1,gid,-1); setresuid(-1,uid,-1);
>>>>>>  /* trigger the operation */
>>>>>>  io_uring_enter();
>>>>>>
>>>>>> I guess some kind of IORING_REGISTER_CREDS[_UPDATE] would be
>>>>>> good, together with a IOSQE_FIXED_CREDS in order to specify
>>>>>> credentials per operation.
>>>>>>
>>>>>> Or we make it much more generic and introduce a credsfd_create()
>>>>>> syscall in order to get an fd for a credential handle, maybe
>>>>>> together with another syscall to activate the credentials of
>>>>>> the current thread (or let a write to the fd trigger the activation
>>>>>> in order to avoid an additional syscall number).
>>>>>>
>>>>>> Having just an fd would allow IORING_REGISTER_CREDS[_UPDATE]
>>>>>> to be just an array of int values instead of a more complex
>>>>>> structure to define the credentials.
>>>>>
>>>>> I'd rather avoid having to add more infrastructure for this, even if
>>>>> credsfd_create() would be nifty.
>>>>>
>>>>> With that in mind, something like:
>>>>>
>>>>> - Application does IORING_REGISTER_CREDS, which returns some index
>>>>>
>>>>> - Add a IORING_OP_USE_CREDS opcode, which sets the creds associated
>>>>>   with dependent commands
>>>>> - Actual request is linked to the IORING_OP_USE_CREDS command, any
>>>>>   link off IORING_OP_USE_CREDS will use those credentials
>>>>
>>>> Using links for this sounds ok.
>>>
>>> Great! I'll try and hack this up and see how it turns out.
>>>
>>>>> - IORING_UNREGISTER_CREDS removes the registered creds
>>>>>
>>>>> Just throwing that out there, definitely willing to entertain other
>>>>> methods that make sense for this. Trying to avoid needing to put this
>>>>> information in the SQE itself, hence the idea to use a chain of links
>>>>> for it.
>>>>>
>>>>> The downside is that we'll need to maintain an array of key -> creds,
>>>>> but that's probably not a big deal.
>>>>>
>>>>> What do you think?
>>>>
>>>> So IORING_REGISTER_CREDS would be a simple operation that just takes a
>>>> snapshot of the current_cred() and returns an id that can be passed to
>>>> IORING_OP_USE_CREDS or IORING_UNREGISTER_CREDS?
>>>
>>> Right, you would not pass in any arguments, it'd have to be run from the
>>> personality you wish to register. It simply returns an integer, which is
>>> a key to use for IORING_OP_USE_CREDS, or at the end for
>>> IORING_UNREGISTER_CREDS when you no longer wish to use this personality.
>>>
>>>>> Ideally I'd like to get this done for 5.6 even if we
>>>>> are a bit late, so you'll have everything you need with that release.
>>>>
>>>> That would be great!
>>>
>>> Crossing fingers...
>>
>> OK, so here are two patches for testing:
>>
>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
>>
>> #1 adds support for registering the personality of the invoking task,
>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to
>> just having one link, it doesn't support a chain of them.
>>
>> I'll try and write a test case for this just to see if it actually works,
>> so far it's totally untested. 
>>
>> Adding Pavel to the CC.
> 
> Minor tweak to ensuring we do the right thing for async offload as well,
> and it tests fine for me. Test case is:
> 
> - Run as root
> - Register personality for root
> - create root only file
> - check we can IORING_OP_OPENAT the file
> - switch to user id test
> - check we cannot IORING_OP_OPENAT the file
> - check that we can open the file with IORING_OP_USE_CREDS linked

I didn't like it becoming a bit too complicated, both in terms of
implementation and use. And the fact that we'd have to jump through
hoops to make this work for a full chain.

So I punted and just added sqe->personality and IOSQE_PERSONALITY.
This makes it way easier to use. Same branch:

https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds

I'd feel much better with this variant for 5.6.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-28 19:42           ` Jens Axboe
@ 2020-01-28 20:16             ` Pavel Begunkov
  2020-01-28 20:19               ` Jens Axboe
  2020-01-28 23:36             ` Pavel Begunkov
  2020-01-29 14:59             ` Jann Horn
  2 siblings, 1 reply; 48+ messages in thread
From: Pavel Begunkov @ 2020-01-28 20:16 UTC (permalink / raw)
  To: Jens Axboe, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List


[-- Attachment #1.1: Type: text/plain, Size: 2158 bytes --]

On 28/01/2020 22:42, Jens Axboe wrote:
> On 1/28/20 11:04 AM, Jens Axboe wrote:
>> On 1/28/20 10:19 AM, Jens Axboe wrote:
>>> On 1/28/20 9:19 AM, Jens Axboe wrote:
>>>> On 1/28/20 9:17 AM, Stefan Metzmacher wrote:
>>> OK, so here are two patches for testing:
>>>
>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
>>>
>>> #1 adds support for registering the personality of the invoking task,
>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to
>>> just having one link, it doesn't support a chain of them.
>>>
>>> I'll try and write a test case for this just to see if it actually works,
>>> so far it's totally untested. 
>>>
>>> Adding Pavel to the CC.
>>
>> Minor tweak to ensuring we do the right thing for async offload as well,
>> and it tests fine for me. Test case is:
>>
>> - Run as root
>> - Register personality for root
>> - create root only file
>> - check we can IORING_OP_OPENAT the file
>> - switch to user id test
>> - check we cannot IORING_OP_OPENAT the file
>> - check that we can open the file with IORING_OP_USE_CREDS linked
> 
> I didn't like it becoming a bit too complicated, both in terms of
> implementation and use. And the fact that we'd have to jump through
> hoops to make this work for a full chain.
> 
> So I punted and just added sqe->personality and IOSQE_PERSONALITY.
> This makes it way easier to use. Same branch:
> 
> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
> 
> I'd feel much better with this variant for 5.6.
> 

To be honest, sounds pretty dangerous. Especially since somebody started talking
about stealing fds from a process, it could lead to a nasty loophole somehow.
E.g. root registers its credentials, passes io_uring it to non-privileged
children, and then some process steals the uring fd (though, it would need
priviledged mode for code-injection or else). Could we Cc here someone really
keen on security?

Stefan, could you please explain, how this 5 syscalls pattern from the first
email came in the first place? Just want to understand the case.

-- 
Pavel Begunkov


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-28 20:16             ` Pavel Begunkov
@ 2020-01-28 20:19               ` Jens Axboe
  2020-01-28 20:50                 ` Pavel Begunkov
  0 siblings, 1 reply; 48+ messages in thread
From: Jens Axboe @ 2020-01-28 20:19 UTC (permalink / raw)
  To: Pavel Begunkov, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List

On 1/28/20 1:16 PM, Pavel Begunkov wrote:
> On 28/01/2020 22:42, Jens Axboe wrote:
>> On 1/28/20 11:04 AM, Jens Axboe wrote:
>>> On 1/28/20 10:19 AM, Jens Axboe wrote:
>>>> On 1/28/20 9:19 AM, Jens Axboe wrote:
>>>>> On 1/28/20 9:17 AM, Stefan Metzmacher wrote:
>>>> OK, so here are two patches for testing:
>>>>
>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
>>>>
>>>> #1 adds support for registering the personality of the invoking task,
>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to
>>>> just having one link, it doesn't support a chain of them.
>>>>
>>>> I'll try and write a test case for this just to see if it actually works,
>>>> so far it's totally untested. 
>>>>
>>>> Adding Pavel to the CC.
>>>
>>> Minor tweak to ensuring we do the right thing for async offload as well,
>>> and it tests fine for me. Test case is:
>>>
>>> - Run as root
>>> - Register personality for root
>>> - create root only file
>>> - check we can IORING_OP_OPENAT the file
>>> - switch to user id test
>>> - check we cannot IORING_OP_OPENAT the file
>>> - check that we can open the file with IORING_OP_USE_CREDS linked
>>
>> I didn't like it becoming a bit too complicated, both in terms of
>> implementation and use. And the fact that we'd have to jump through
>> hoops to make this work for a full chain.
>>
>> So I punted and just added sqe->personality and IOSQE_PERSONALITY.
>> This makes it way easier to use. Same branch:
>>
>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
>>
>> I'd feel much better with this variant for 5.6.
>>
> 
> To be honest, sounds pretty dangerous. Especially since somebody started talking
> about stealing fds from a process, it could lead to a nasty loophole somehow.
> E.g. root registers its credentials, passes io_uring it to non-privileged
> children, and then some process steals the uring fd (though, it would need
> priviledged mode for code-injection or else). Could we Cc here someone really
> keen on security?

Link? If you can steal fds, then surely you've already lost any sense of
security in the first place? Besides, if root registered the ring, the root
credentials are already IN the ring. I don't see how this adds any extra
holes.

> Stefan, could you please explain, how this 5 syscalls pattern from the first
> email came in the first place? Just want to understand the case.

I think if you go back a bit in the archive, Stefan has a fuller explanation
of how samba does the credentials dance.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-28 20:19               ` Jens Axboe
@ 2020-01-28 20:50                 ` Pavel Begunkov
  2020-01-28 20:56                   ` Jens Axboe
  0 siblings, 1 reply; 48+ messages in thread
From: Pavel Begunkov @ 2020-01-28 20:50 UTC (permalink / raw)
  To: Jens Axboe, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List


[-- Attachment #1.1: Type: text/plain, Size: 3099 bytes --]

On 28/01/2020 23:19, Jens Axboe wrote:
> On 1/28/20 1:16 PM, Pavel Begunkov wrote:
>> On 28/01/2020 22:42, Jens Axboe wrote:
>>> On 1/28/20 11:04 AM, Jens Axboe wrote:
>>>> On 1/28/20 10:19 AM, Jens Axboe wrote:
>>>>> On 1/28/20 9:19 AM, Jens Axboe wrote:
>>>>>> On 1/28/20 9:17 AM, Stefan Metzmacher wrote:
>>>>> OK, so here are two patches for testing:
>>>>>
>>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
>>>>>
>>>>> #1 adds support for registering the personality of the invoking task,
>>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to
>>>>> just having one link, it doesn't support a chain of them.
>>>>>
>>>>> I'll try and write a test case for this just to see if it actually works,
>>>>> so far it's totally untested. 
>>>>>
>>>>> Adding Pavel to the CC.
>>>>
>>>> Minor tweak to ensuring we do the right thing for async offload as well,
>>>> and it tests fine for me. Test case is:
>>>>
>>>> - Run as root
>>>> - Register personality for root
>>>> - create root only file
>>>> - check we can IORING_OP_OPENAT the file
>>>> - switch to user id test
>>>> - check we cannot IORING_OP_OPENAT the file
>>>> - check that we can open the file with IORING_OP_USE_CREDS linked
>>>
>>> I didn't like it becoming a bit too complicated, both in terms of
>>> implementation and use. And the fact that we'd have to jump through
>>> hoops to make this work for a full chain.
>>>
>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY.
>>> This makes it way easier to use. Same branch:
>>>
>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
>>>
>>> I'd feel much better with this variant for 5.6.
>>>
>>
>> To be honest, sounds pretty dangerous. Especially since somebody started talking
>> about stealing fds from a process, it could lead to a nasty loophole somehow.
>> E.g. root registers its credentials, passes io_uring it to non-privileged
>> children, and then some process steals the uring fd (though, it would need
>> priviledged mode for code-injection or else). Could we Cc here someone really
>> keen on security?
> 
> Link? If you can steal fds, then surely you've already lost any sense of

https://lwn.net/Articles/808997/
But I didn't looked up it yet.

> security in the first place? Besides, if root registered the ring, the root
> credentials are already IN the ring. I don't see how this adds any extra
> holes.

Isn't it what you fixed in ("don't use static creds/mm assignments") ?

I'm not sure what capability (and whether any) it would need, but better to
think such cases through. Just saying, I would prefer to ask a person with
extensive security experience, unlike me.

>> Stefan, could you please explain, how this 5 syscalls pattern from the first
>> email came in the first place? Just want to understand the case.
> 
> I think if you go back a bit in the archive, Stefan has a fuller explanation
> of how samba does the credentials dance.

Missed it, I'll take a look, thanks

-- 
Pavel Begunkov


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-28 20:50                 ` Pavel Begunkov
@ 2020-01-28 20:56                   ` Jens Axboe
  2020-01-28 21:25                     ` Christian Brauner
  0 siblings, 1 reply; 48+ messages in thread
From: Jens Axboe @ 2020-01-28 20:56 UTC (permalink / raw)
  To: Pavel Begunkov, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List

On 1/28/20 1:50 PM, Pavel Begunkov wrote:
> On 28/01/2020 23:19, Jens Axboe wrote:
>> On 1/28/20 1:16 PM, Pavel Begunkov wrote:
>>> On 28/01/2020 22:42, Jens Axboe wrote:
>>>> On 1/28/20 11:04 AM, Jens Axboe wrote:
>>>>> On 1/28/20 10:19 AM, Jens Axboe wrote:
>>>>>> On 1/28/20 9:19 AM, Jens Axboe wrote:
>>>>>>> On 1/28/20 9:17 AM, Stefan Metzmacher wrote:
>>>>>> OK, so here are two patches for testing:
>>>>>>
>>>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
>>>>>>
>>>>>> #1 adds support for registering the personality of the invoking task,
>>>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to
>>>>>> just having one link, it doesn't support a chain of them.
>>>>>>
>>>>>> I'll try and write a test case for this just to see if it actually works,
>>>>>> so far it's totally untested. 
>>>>>>
>>>>>> Adding Pavel to the CC.
>>>>>
>>>>> Minor tweak to ensuring we do the right thing for async offload as well,
>>>>> and it tests fine for me. Test case is:
>>>>>
>>>>> - Run as root
>>>>> - Register personality for root
>>>>> - create root only file
>>>>> - check we can IORING_OP_OPENAT the file
>>>>> - switch to user id test
>>>>> - check we cannot IORING_OP_OPENAT the file
>>>>> - check that we can open the file with IORING_OP_USE_CREDS linked
>>>>
>>>> I didn't like it becoming a bit too complicated, both in terms of
>>>> implementation and use. And the fact that we'd have to jump through
>>>> hoops to make this work for a full chain.
>>>>
>>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY.
>>>> This makes it way easier to use. Same branch:
>>>>
>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
>>>>
>>>> I'd feel much better with this variant for 5.6.
>>>>
>>>
>>> To be honest, sounds pretty dangerous. Especially since somebody started talking
>>> about stealing fds from a process, it could lead to a nasty loophole somehow.
>>> E.g. root registers its credentials, passes io_uring it to non-privileged
>>> children, and then some process steals the uring fd (though, it would need
>>> priviledged mode for code-injection or else). Could we Cc here someone really
>>> keen on security?
>>
>> Link? If you can steal fds, then surely you've already lost any sense of
> 
> https://lwn.net/Articles/808997/
> But I didn't looked up it yet.

This isn't new by any stretch, it's always been possible to pass file
descriptors through SCM_RIGHTS. This just gives you a new way to do it.
That's not stealing or leaking, it's deliberately passing it to someone
else.

>> security in the first place? Besides, if root registered the ring, the root
>> credentials are already IN the ring. I don't see how this adds any extra
>> holes.
> 
> Isn't it what you fixed in ("don't use static creds/mm assignments") ?

Sure, but SQPOLL still uses it.

> I'm not sure what capability (and whether any) it would need, but
> better to think such cases through. Just saying, I would prefer to ask
> a person with extensive security experience, unlike me.

I don't disagree, but I really don't think this is any different than
what we already allow.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-28 20:56                   ` Jens Axboe
@ 2020-01-28 21:25                     ` Christian Brauner
  2020-01-28 22:38                       ` Pavel Begunkov
  0 siblings, 1 reply; 48+ messages in thread
From: Christian Brauner @ 2020-01-28 21:25 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Pavel Begunkov, Stefan Metzmacher, io-uring, Linux API Mailing List

On Tue, Jan 28, 2020 at 01:56:00PM -0700, Jens Axboe wrote:
> On 1/28/20 1:50 PM, Pavel Begunkov wrote:
> > On 28/01/2020 23:19, Jens Axboe wrote:
> >> On 1/28/20 1:16 PM, Pavel Begunkov wrote:
> >>> On 28/01/2020 22:42, Jens Axboe wrote:
> >>>> On 1/28/20 11:04 AM, Jens Axboe wrote:
> >>>>> On 1/28/20 10:19 AM, Jens Axboe wrote:
> >>>>>> On 1/28/20 9:19 AM, Jens Axboe wrote:
> >>>>>>> On 1/28/20 9:17 AM, Stefan Metzmacher wrote:
> >>>>>> OK, so here are two patches for testing:
> >>>>>>
> >>>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
> >>>>>>
> >>>>>> #1 adds support for registering the personality of the invoking task,
> >>>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to
> >>>>>> just having one link, it doesn't support a chain of them.
> >>>>>>
> >>>>>> I'll try and write a test case for this just to see if it actually works,
> >>>>>> so far it's totally untested. 
> >>>>>>
> >>>>>> Adding Pavel to the CC.
> >>>>>
> >>>>> Minor tweak to ensuring we do the right thing for async offload as well,
> >>>>> and it tests fine for me. Test case is:
> >>>>>
> >>>>> - Run as root
> >>>>> - Register personality for root
> >>>>> - create root only file
> >>>>> - check we can IORING_OP_OPENAT the file
> >>>>> - switch to user id test
> >>>>> - check we cannot IORING_OP_OPENAT the file
> >>>>> - check that we can open the file with IORING_OP_USE_CREDS linked
> >>>>
> >>>> I didn't like it becoming a bit too complicated, both in terms of
> >>>> implementation and use. And the fact that we'd have to jump through
> >>>> hoops to make this work for a full chain.
> >>>>
> >>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY.
> >>>> This makes it way easier to use. Same branch:
> >>>>
> >>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
> >>>>
> >>>> I'd feel much better with this variant for 5.6.
> >>>>
> >>>
> >>> To be honest, sounds pretty dangerous. Especially since somebody started talking
> >>> about stealing fds from a process, it could lead to a nasty loophole somehow.
> >>> E.g. root registers its credentials, passes io_uring it to non-privileged
> >>> children, and then some process steals the uring fd (though, it would need
> >>> priviledged mode for code-injection or else). Could we Cc here someone really
> >>> keen on security?
> >>
> >> Link? If you can steal fds, then surely you've already lost any sense of
> > 
> > https://lwn.net/Articles/808997/
> > But I didn't looked up it yet.
> 
> This isn't new by any stretch, it's always been possible to pass file
> descriptors through SCM_RIGHTS. This just gives you a new way to do it.
> That's not stealing or leaking, it's deliberately passing it to someone
> else.

I've been reading along quietly. In addition to what Jens said, to ease
everyone's mind: pidfd_getfd() doesn't allow to unconditionally grab
file descriptors for any task. That would be crazy. The calling task
needs ptrace_may_access() permissions on the target task, i.e. the task
from which you want to grab the io_uring file descriptor. And any
calling task that has ptrace_may_access() permissions on the target can
do much worse than just grabbing an fd.

Christian

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-28 21:25                     ` Christian Brauner
@ 2020-01-28 22:38                       ` Pavel Begunkov
  0 siblings, 0 replies; 48+ messages in thread
From: Pavel Begunkov @ 2020-01-28 22:38 UTC (permalink / raw)
  To: Christian Brauner, Jens Axboe
  Cc: Stefan Metzmacher, io-uring, Linux API Mailing List


[-- Attachment #1.1: Type: text/plain, Size: 578 bytes --]

On 29/01/2020 00:25, Christian Brauner wrote:
> I've been reading along quietly. In addition to what Jens said, to ease
> everyone's mind: pidfd_getfd() doesn't allow to unconditionally grab
> file descriptors for any task. That would be crazy. The calling task
> needs ptrace_may_access() permissions on the target task, i.e. the task
> from which you want to grab the io_uring file descriptor. And any
> calling task that has ptrace_may_access() permissions on the target can
> do much worse than just grabbing an fd.

Good to know, thanks!

-- 
Pavel Begunkov


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-28 19:42           ` Jens Axboe
  2020-01-28 20:16             ` Pavel Begunkov
@ 2020-01-28 23:36             ` Pavel Begunkov
  2020-01-28 23:40               ` Jens Axboe
  2020-01-29 14:59             ` Jann Horn
  2 siblings, 1 reply; 48+ messages in thread
From: Pavel Begunkov @ 2020-01-28 23:36 UTC (permalink / raw)
  To: Jens Axboe, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List


[-- Attachment #1.1: Type: text/plain, Size: 989 bytes --]

On 28/01/2020 22:42, Jens Axboe wrote:
> I didn't like it becoming a bit too complicated, both in terms of
> implementation and use. And the fact that we'd have to jump through
> hoops to make this work for a full chain.
> 
> So I punted and just added sqe->personality and IOSQE_PERSONALITY.
> This makes it way easier to use. Same branch:
> 
> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
> 
> I'd feel much better with this variant for 5.6.
> 

Checked out ("don't use static creds/mm assignments")

1. do we miscount cred refs? We grab one in get_current_cred() for each async
request, but if (worker->creds != work->creds) it will never be put.

2. shouldn't worker->creds be named {old,saved,etc}_creds? It's set as

    worker->creds = override_creds(work->creds);

Where override_creds() returns previous creds. And if so, then the following
fast check looks strange:

    worker->creds != work->creds

-- 
Pavel Begunkov


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-28 23:36             ` Pavel Begunkov
@ 2020-01-28 23:40               ` Jens Axboe
  2020-01-28 23:51                 ` Jens Axboe
  0 siblings, 1 reply; 48+ messages in thread
From: Jens Axboe @ 2020-01-28 23:40 UTC (permalink / raw)
  To: Pavel Begunkov, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List

On 1/28/20 4:36 PM, Pavel Begunkov wrote:
> On 28/01/2020 22:42, Jens Axboe wrote:
>> I didn't like it becoming a bit too complicated, both in terms of
>> implementation and use. And the fact that we'd have to jump through
>> hoops to make this work for a full chain.
>>
>> So I punted and just added sqe->personality and IOSQE_PERSONALITY.
>> This makes it way easier to use. Same branch:
>>
>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
>>
>> I'd feel much better with this variant for 5.6.
>>
> 
> Checked out ("don't use static creds/mm assignments")
> 
> 1. do we miscount cred refs? We grab one in get_current_cred() for each async
> request, but if (worker->creds != work->creds) it will never be put.

Yeah I think you're right, that needs a bit of fixing up.

> 2. shouldn't worker->creds be named {old,saved,etc}_creds? It's set as
> 
>     worker->creds = override_creds(work->creds);
> 
> Where override_creds() returns previous creds. And if so, then the following
> fast check looks strange:
> 
>     worker->creds != work->creds

Don't care too much about the naming, but the logic does appear off.
I'll take a look at both of these tonight, unless you beat me to it.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-28 23:40               ` Jens Axboe
@ 2020-01-28 23:51                 ` Jens Axboe
  2020-01-29  0:10                   ` Pavel Begunkov
  0 siblings, 1 reply; 48+ messages in thread
From: Jens Axboe @ 2020-01-28 23:51 UTC (permalink / raw)
  To: Pavel Begunkov, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List

On 1/28/20 4:40 PM, Jens Axboe wrote:
> On 1/28/20 4:36 PM, Pavel Begunkov wrote:
>> On 28/01/2020 22:42, Jens Axboe wrote:
>>> I didn't like it becoming a bit too complicated, both in terms of
>>> implementation and use. And the fact that we'd have to jump through
>>> hoops to make this work for a full chain.
>>>
>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY.
>>> This makes it way easier to use. Same branch:
>>>
>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
>>>
>>> I'd feel much better with this variant for 5.6.
>>>
>>
>> Checked out ("don't use static creds/mm assignments")
>>
>> 1. do we miscount cred refs? We grab one in get_current_cred() for each async
>> request, but if (worker->creds != work->creds) it will never be put.
> 
> Yeah I think you're right, that needs a bit of fixing up.

I think this may have gotten fixed with the later addition posted today?
I'll double check. But for the newer stuff, we put it for both cases
when the request is freed.

>> 2. shouldn't worker->creds be named {old,saved,etc}_creds? It's set as
>>
>>     worker->creds = override_creds(work->creds);
>>
>> Where override_creds() returns previous creds. And if so, then the following
>> fast check looks strange:
>>
>>     worker->creds != work->creds
> 
> Don't care too much about the naming, but the logic does appear off.
> I'll take a look at both of these tonight, unless you beat me to it.

Testing this now, what a braino.

diff --git a/fs/io-wq.c b/fs/io-wq.c
index ee49e8852d39..8fbbadf04cc3 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -56,7 +56,8 @@ struct io_worker {
 
 	struct rcu_head rcu;
 	struct mm_struct *mm;
-	const struct cred *creds;
+	const struct cred *cur_creds;
+	const struct cred *saved_creds;
 	struct files_struct *restore_files;
 };
 
@@ -135,9 +136,9 @@ static bool __io_worker_unuse(struct io_wqe *wqe, struct io_worker *worker)
 {
 	bool dropped_lock = false;
 
-	if (worker->creds) {
-		revert_creds(worker->creds);
-		worker->creds = NULL;
+	if (worker->saved_creds) {
+		revert_creds(worker->saved_creds);
+		worker->cur_creds = worker->saved_creds = NULL;
 	}
 
 	if (current->files != worker->restore_files) {
@@ -424,10 +425,11 @@ static void io_wq_switch_mm(struct io_worker *worker, struct io_wq_work *work)
 static void io_wq_switch_creds(struct io_worker *worker,
 			       struct io_wq_work *work)
 {
-	if (worker->creds)
-		revert_creds(worker->creds);
+	if (worker->saved_creds)
+		revert_creds(worker->saved_creds);
 
-	worker->creds = override_creds(work->creds);
+	worker->saved_creds = override_creds(work->creds);
+	worker->cur_creds = work->creds;
 }
 
 static void io_worker_handle_work(struct io_worker *worker)
@@ -480,7 +482,7 @@ static void io_worker_handle_work(struct io_worker *worker)
 		}
 		if (work->mm != worker->mm)
 			io_wq_switch_mm(worker, work);
-		if (worker->creds != work->creds)
+		if (worker->cur_creds != work->creds)
 			io_wq_switch_creds(worker, work);
 		/*
 		 * OK to set IO_WQ_WORK_CANCEL even for uncancellable work,

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-28 23:51                 ` Jens Axboe
@ 2020-01-29  0:10                   ` Pavel Begunkov
  2020-01-29  0:15                     ` Jens Axboe
  2020-01-29  0:20                     ` Jens Axboe
  0 siblings, 2 replies; 48+ messages in thread
From: Pavel Begunkov @ 2020-01-29  0:10 UTC (permalink / raw)
  To: Jens Axboe, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List


[-- Attachment #1.1: Type: text/plain, Size: 3959 bytes --]

On 29/01/2020 02:51, Jens Axboe wrote:
> On 1/28/20 4:40 PM, Jens Axboe wrote:
>> On 1/28/20 4:36 PM, Pavel Begunkov wrote:
>>> On 28/01/2020 22:42, Jens Axboe wrote:
>>>> I didn't like it becoming a bit too complicated, both in terms of
>>>> implementation and use. And the fact that we'd have to jump through
>>>> hoops to make this work for a full chain.
>>>>
>>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY.
>>>> This makes it way easier to use. Same branch:
>>>>
>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
>>>>
>>>> I'd feel much better with this variant for 5.6.
>>>>
>>>
>>> Checked out ("don't use static creds/mm assignments")
>>>
>>> 1. do we miscount cred refs? We grab one in get_current_cred() for each async
>>> request, but if (worker->creds != work->creds) it will never be put.
>>
>> Yeah I think you're right, that needs a bit of fixing up.
> 

Hmm, it seems it leaks it unconditionally, as it grabs in a ref in override_creds().

> I think this may have gotten fixed with the later addition posted today?
> I'll double check. But for the newer stuff, we put it for both cases
> when the request is freed.

Yeah, maybe. I got tangled trying to verify both at once and decided to start
with the old one.


>>> 2. shouldn't worker->creds be named {old,saved,etc}_creds? It's set as
>>>
>>>     worker->creds = override_creds(work->creds);
>>>
>>> Where override_creds() returns previous creds. And if so, then the following
>>> fast check looks strange:
>>>
>>>     worker->creds != work->creds
>>
>> Don't care too much about the naming, but the logic does appear off.
>> I'll take a look at both of these tonight, unless you beat me to it.

Apparently, you're faster :)

> 
> Testing this now, what a braino.
> 
> diff --git a/fs/io-wq.c b/fs/io-wq.c
> index ee49e8852d39..8fbbadf04cc3 100644
> --- a/fs/io-wq.c
> +++ b/fs/io-wq.c
> @@ -56,7 +56,8 @@ struct io_worker {
>  
>  	struct rcu_head rcu;
>  	struct mm_struct *mm;
> -	const struct cred *creds;
> +	const struct cred *cur_creds;
> +	const struct cred *saved_creds;
>  	struct files_struct *restore_files;
>  };
>  
> @@ -135,9 +136,9 @@ static bool __io_worker_unuse(struct io_wqe *wqe, struct io_worker *worker)
>  {
>  	bool dropped_lock = false;
>  
> -	if (worker->creds) {
> -		revert_creds(worker->creds);
> -		worker->creds = NULL;
> +	if (worker->saved_creds) {
> +		revert_creds(worker->saved_creds);
> +		worker->cur_creds = worker->saved_creds = NULL;
>  	}
>  
>  	if (current->files != worker->restore_files) {
> @@ -424,10 +425,11 @@ static void io_wq_switch_mm(struct io_worker *worker, struct io_wq_work *work)
>  static void io_wq_switch_creds(struct io_worker *worker,
>  			       struct io_wq_work *work)
>  {
> -	if (worker->creds)
> -		revert_creds(worker->creds);
> +	if (worker->saved_creds)
> +		revert_creds(worker->saved_creds);
>  
> -	worker->creds = override_creds(work->creds);
> +	worker->saved_creds = override_creds(work->creds);
> +	worker->cur_creds = work->creds;
>  }

How about as follows? rever_creds() is a bit heavier than put_creds().

static void io_wq_switch_creds(struct io_worker *worker,
			       struct io_wq_work *work)
{
	const struct cred *old_creds = override_creds(work->creds);

	if (worker->saved_creds)
		put_cred(old_creds);
	else
		worker->saved_creds = old;
	worker->cur_creds = work->creds;
}

>  
>  static void io_worker_handle_work(struct io_worker *worker)
> @@ -480,7 +482,7 @@ static void io_worker_handle_work(struct io_worker *worker)
>  		}
>  		if (work->mm != worker->mm)
>  			io_wq_switch_mm(worker, work);
> -		if (worker->creds != work->creds)
> +		if (worker->cur_creds != work->creds)
>  			io_wq_switch_creds(worker, work);
>  		/*
>  		 * OK to set IO_WQ_WORK_CANCEL even for uncancellable work,
> 

-- 
Pavel Begunkov


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-29  0:10                   ` Pavel Begunkov
@ 2020-01-29  0:15                     ` Jens Axboe
  2020-01-29  0:18                       ` Jens Axboe
  2020-01-29  0:20                     ` Jens Axboe
  1 sibling, 1 reply; 48+ messages in thread
From: Jens Axboe @ 2020-01-29  0:15 UTC (permalink / raw)
  To: Pavel Begunkov, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List

On 1/28/20 5:10 PM, Pavel Begunkov wrote:
> On 29/01/2020 02:51, Jens Axboe wrote:
>> On 1/28/20 4:40 PM, Jens Axboe wrote:
>>> On 1/28/20 4:36 PM, Pavel Begunkov wrote:
>>>> On 28/01/2020 22:42, Jens Axboe wrote:
>>>>> I didn't like it becoming a bit too complicated, both in terms of
>>>>> implementation and use. And the fact that we'd have to jump through
>>>>> hoops to make this work for a full chain.
>>>>>
>>>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY.
>>>>> This makes it way easier to use. Same branch:
>>>>>
>>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
>>>>>
>>>>> I'd feel much better with this variant for 5.6.
>>>>>
>>>>
>>>> Checked out ("don't use static creds/mm assignments")
>>>>
>>>> 1. do we miscount cred refs? We grab one in get_current_cred() for each async
>>>> request, but if (worker->creds != work->creds) it will never be put.
>>>
>>> Yeah I think you're right, that needs a bit of fixing up.
>>
> 
> Hmm, it seems it leaks it unconditionally, as it grabs in a ref in
> override_creds().
> 
>> I think this may have gotten fixed with the later addition posted today?
>> I'll double check. But for the newer stuff, we put it for both cases
>> when the request is freed.
> 
> Yeah, maybe. I got tangled trying to verify both at once and decided to start
> with the old one.
> 
> 
>>>> 2. shouldn't worker->creds be named {old,saved,etc}_creds? It's set as
>>>>
>>>>     worker->creds = override_creds(work->creds);
>>>>
>>>> Where override_creds() returns previous creds. And if so, then the following
>>>> fast check looks strange:
>>>>
>>>>     worker->creds != work->creds
>>>
>>> Don't care too much about the naming, but the logic does appear off.
>>> I'll take a look at both of these tonight, unless you beat me to it.
> 
> Apparently, you're faster :)
> 
>>
>> Testing this now, what a braino.
>>
>> diff --git a/fs/io-wq.c b/fs/io-wq.c
>> index ee49e8852d39..8fbbadf04cc3 100644
>> --- a/fs/io-wq.c
>> +++ b/fs/io-wq.c
>> @@ -56,7 +56,8 @@ struct io_worker {
>>  
>>  	struct rcu_head rcu;
>>  	struct mm_struct *mm;
>> -	const struct cred *creds;
>> +	const struct cred *cur_creds;
>> +	const struct cred *saved_creds;
>>  	struct files_struct *restore_files;
>>  };
>>  
>> @@ -135,9 +136,9 @@ static bool __io_worker_unuse(struct io_wqe *wqe, struct io_worker *worker)
>>  {
>>  	bool dropped_lock = false;
>>  
>> -	if (worker->creds) {
>> -		revert_creds(worker->creds);
>> -		worker->creds = NULL;
>> +	if (worker->saved_creds) {
>> +		revert_creds(worker->saved_creds);
>> +		worker->cur_creds = worker->saved_creds = NULL;
>>  	}
>>  
>>  	if (current->files != worker->restore_files) {
>> @@ -424,10 +425,11 @@ static void io_wq_switch_mm(struct io_worker *worker, struct io_wq_work *work)
>>  static void io_wq_switch_creds(struct io_worker *worker,
>>  			       struct io_wq_work *work)
>>  {
>> -	if (worker->creds)
>> -		revert_creds(worker->creds);
>> +	if (worker->saved_creds)
>> +		revert_creds(worker->saved_creds);
>>  
>> -	worker->creds = override_creds(work->creds);
>> +	worker->saved_creds = override_creds(work->creds);
>> +	worker->cur_creds = work->creds;
>>  }
> 
> How about as follows? rever_creds() is a bit heavier than put_creds().
> 
> static void io_wq_switch_creds(struct io_worker *worker,
> 			       struct io_wq_work *work)
> {
> 	const struct cred *old_creds = override_creds(work->creds);
> 
> 	if (worker->saved_creds)
> 		put_cred(old_creds);
> 	else
> 		worker->saved_creds = old;
> 	worker->cur_creds = work->creds;
> }

Looks good to me, I'll fold.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-29  0:15                     ` Jens Axboe
@ 2020-01-29  0:18                       ` Jens Axboe
  0 siblings, 0 replies; 48+ messages in thread
From: Jens Axboe @ 2020-01-29  0:18 UTC (permalink / raw)
  To: Pavel Begunkov, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List

On 1/28/20 5:15 PM, Jens Axboe wrote:
> On 1/28/20 5:10 PM, Pavel Begunkov wrote:
>> On 29/01/2020 02:51, Jens Axboe wrote:
>>> On 1/28/20 4:40 PM, Jens Axboe wrote:
>>>> On 1/28/20 4:36 PM, Pavel Begunkov wrote:
>>>>> On 28/01/2020 22:42, Jens Axboe wrote:
>>>>>> I didn't like it becoming a bit too complicated, both in terms of
>>>>>> implementation and use. And the fact that we'd have to jump through
>>>>>> hoops to make this work for a full chain.
>>>>>>
>>>>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY.
>>>>>> This makes it way easier to use. Same branch:
>>>>>>
>>>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
>>>>>>
>>>>>> I'd feel much better with this variant for 5.6.
>>>>>>
>>>>>
>>>>> Checked out ("don't use static creds/mm assignments")
>>>>>
>>>>> 1. do we miscount cred refs? We grab one in get_current_cred() for each async
>>>>> request, but if (worker->creds != work->creds) it will never be put.
>>>>
>>>> Yeah I think you're right, that needs a bit of fixing up.
>>>
>>
>> Hmm, it seems it leaks it unconditionally, as it grabs in a ref in
>> override_creds().
>>
>>> I think this may have gotten fixed with the later addition posted today?
>>> I'll double check. But for the newer stuff, we put it for both cases
>>> when the request is freed.
>>
>> Yeah, maybe. I got tangled trying to verify both at once and decided to start
>> with the old one.
>>
>>
>>>>> 2. shouldn't worker->creds be named {old,saved,etc}_creds? It's set as
>>>>>
>>>>>     worker->creds = override_creds(work->creds);
>>>>>
>>>>> Where override_creds() returns previous creds. And if so, then the following
>>>>> fast check looks strange:
>>>>>
>>>>>     worker->creds != work->creds
>>>>
>>>> Don't care too much about the naming, but the logic does appear off.
>>>> I'll take a look at both of these tonight, unless you beat me to it.
>>
>> Apparently, you're faster :)
>>
>>>
>>> Testing this now, what a braino.
>>>
>>> diff --git a/fs/io-wq.c b/fs/io-wq.c
>>> index ee49e8852d39..8fbbadf04cc3 100644
>>> --- a/fs/io-wq.c
>>> +++ b/fs/io-wq.c
>>> @@ -56,7 +56,8 @@ struct io_worker {
>>>  
>>>  	struct rcu_head rcu;
>>>  	struct mm_struct *mm;
>>> -	const struct cred *creds;
>>> +	const struct cred *cur_creds;
>>> +	const struct cred *saved_creds;
>>>  	struct files_struct *restore_files;
>>>  };
>>>  
>>> @@ -135,9 +136,9 @@ static bool __io_worker_unuse(struct io_wqe *wqe, struct io_worker *worker)
>>>  {
>>>  	bool dropped_lock = false;
>>>  
>>> -	if (worker->creds) {
>>> -		revert_creds(worker->creds);
>>> -		worker->creds = NULL;
>>> +	if (worker->saved_creds) {
>>> +		revert_creds(worker->saved_creds);
>>> +		worker->cur_creds = worker->saved_creds = NULL;
>>>  	}
>>>  
>>>  	if (current->files != worker->restore_files) {
>>> @@ -424,10 +425,11 @@ static void io_wq_switch_mm(struct io_worker *worker, struct io_wq_work *work)
>>>  static void io_wq_switch_creds(struct io_worker *worker,
>>>  			       struct io_wq_work *work)
>>>  {
>>> -	if (worker->creds)
>>> -		revert_creds(worker->creds);
>>> +	if (worker->saved_creds)
>>> +		revert_creds(worker->saved_creds);
>>>  
>>> -	worker->creds = override_creds(work->creds);
>>> +	worker->saved_creds = override_creds(work->creds);
>>> +	worker->cur_creds = work->creds;
>>>  }
>>
>> How about as follows? rever_creds() is a bit heavier than put_creds().
>>
>> static void io_wq_switch_creds(struct io_worker *worker,
>> 			       struct io_wq_work *work)
>> {
>> 	const struct cred *old_creds = override_creds(work->creds);
>>
>> 	if (worker->saved_creds)
>> 		put_cred(old_creds);
>> 	else
>> 		worker->saved_creds = old;
>> 	worker->cur_creds = work->creds;
>> }
> 
> Looks good to me, I'll fold.

Actually, it doesn't clear current->cred then, which seems a bit..
unfortunate. Could be a source of issues.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-29  0:10                   ` Pavel Begunkov
  2020-01-29  0:15                     ` Jens Axboe
@ 2020-01-29  0:20                     ` Jens Axboe
  2020-01-29  0:21                       ` Pavel Begunkov
  1 sibling, 1 reply; 48+ messages in thread
From: Jens Axboe @ 2020-01-29  0:20 UTC (permalink / raw)
  To: Pavel Begunkov, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List

On 1/28/20 5:10 PM, Pavel Begunkov wrote:
>>>> Checked out ("don't use static creds/mm assignments")
>>>>
>>>> 1. do we miscount cred refs? We grab one in get_current_cred() for each async
>>>> request, but if (worker->creds != work->creds) it will never be put.
>>>
>>> Yeah I think you're right, that needs a bit of fixing up.
>>
> 
> Hmm, it seems it leaks it unconditionally, as it grabs in a ref in
> override_creds().
> 

We grab one there, and an extra one. Then we drop one of them inline,
and the other in __io_req_aux_free().

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-29  0:20                     ` Jens Axboe
@ 2020-01-29  0:21                       ` Pavel Begunkov
  2020-01-29  0:24                         ` Jens Axboe
  0 siblings, 1 reply; 48+ messages in thread
From: Pavel Begunkov @ 2020-01-29  0:21 UTC (permalink / raw)
  To: Jens Axboe, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List


[-- Attachment #1.1: Type: text/plain, Size: 687 bytes --]

On 29/01/2020 03:20, Jens Axboe wrote:
> On 1/28/20 5:10 PM, Pavel Begunkov wrote:
>>>>> Checked out ("don't use static creds/mm assignments")
>>>>>
>>>>> 1. do we miscount cred refs? We grab one in get_current_cred() for each async
>>>>> request, but if (worker->creds != work->creds) it will never be put.
>>>>
>>>> Yeah I think you're right, that needs a bit of fixing up.
>>>
>>
>> Hmm, it seems it leaks it unconditionally, as it grabs in a ref in
>> override_creds().
>>
> 
> We grab one there, and an extra one. Then we drop one of them inline,
> and the other in __io_req_aux_free().
> 
Yeah, with the last patch it should make it even

-- 
Pavel Begunkov


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-29  0:21                       ` Pavel Begunkov
@ 2020-01-29  0:24                         ` Jens Axboe
  2020-01-29  0:54                           ` Jens Axboe
  0 siblings, 1 reply; 48+ messages in thread
From: Jens Axboe @ 2020-01-29  0:24 UTC (permalink / raw)
  To: Pavel Begunkov, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List

On 1/28/20 5:21 PM, Pavel Begunkov wrote:
> On 29/01/2020 03:20, Jens Axboe wrote:
>> On 1/28/20 5:10 PM, Pavel Begunkov wrote:
>>>>>> Checked out ("don't use static creds/mm assignments")
>>>>>>
>>>>>> 1. do we miscount cred refs? We grab one in get_current_cred() for each async
>>>>>> request, but if (worker->creds != work->creds) it will never be put.
>>>>>
>>>>> Yeah I think you're right, that needs a bit of fixing up.
>>>>
>>>
>>> Hmm, it seems it leaks it unconditionally, as it grabs in a ref in
>>> override_creds().
>>>
>>
>> We grab one there, and an extra one. Then we drop one of them inline,
>> and the other in __io_req_aux_free().
>>
> Yeah, with the last patch it should make it even

OK good we agree on that. I should probably pull back that bit to the
original patch to avoid having a hole in there...

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-29  0:24                         ` Jens Axboe
@ 2020-01-29  0:54                           ` Jens Axboe
  2020-01-29 10:17                             ` Pavel Begunkov
  0 siblings, 1 reply; 48+ messages in thread
From: Jens Axboe @ 2020-01-29  0:54 UTC (permalink / raw)
  To: Pavel Begunkov, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List

On 1/28/20 5:24 PM, Jens Axboe wrote:
> On 1/28/20 5:21 PM, Pavel Begunkov wrote:
>> On 29/01/2020 03:20, Jens Axboe wrote:
>>> On 1/28/20 5:10 PM, Pavel Begunkov wrote:
>>>>>>> Checked out ("don't use static creds/mm assignments")
>>>>>>>
>>>>>>> 1. do we miscount cred refs? We grab one in get_current_cred() for each async
>>>>>>> request, but if (worker->creds != work->creds) it will never be put.
>>>>>>
>>>>>> Yeah I think you're right, that needs a bit of fixing up.
>>>>>
>>>>
>>>> Hmm, it seems it leaks it unconditionally, as it grabs in a ref in
>>>> override_creds().
>>>>
>>>
>>> We grab one there, and an extra one. Then we drop one of them inline,
>>> and the other in __io_req_aux_free().
>>>
>> Yeah, with the last patch it should make it even
> 
> OK good we agree on that. I should probably pull back that bit to the
> original patch to avoid having a hole in there...

Done

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-29  0:54                           ` Jens Axboe
@ 2020-01-29 10:17                             ` Pavel Begunkov
  2020-01-29 13:11                               ` Stefan Metzmacher
  0 siblings, 1 reply; 48+ messages in thread
From: Pavel Begunkov @ 2020-01-29 10:17 UTC (permalink / raw)
  To: Jens Axboe, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List


[-- Attachment #1.1: Type: text/plain, Size: 1416 bytes --]

On 29/01/2020 03:54, Jens Axboe wrote:
> On 1/28/20 5:24 PM, Jens Axboe wrote:
>> On 1/28/20 5:21 PM, Pavel Begunkov wrote:
>>> On 29/01/2020 03:20, Jens Axboe wrote:
>>>> On 1/28/20 5:10 PM, Pavel Begunkov wrote:
>>>>>>>> Checked out ("don't use static creds/mm assignments")
>>>>>>>>
>>>>>>>> 1. do we miscount cred refs? We grab one in get_current_cred() for each async
>>>>>>>> request, but if (worker->creds != work->creds) it will never be put.
>>>>>>>
>>>>>>> Yeah I think you're right, that needs a bit of fixing up.
>>>>>>
>>>>>
>>>>> Hmm, it seems it leaks it unconditionally, as it grabs in a ref in
>>>>> override_creds().
>>>>>
>>>>
>>>> We grab one there, and an extra one. Then we drop one of them inline,
>>>> and the other in __io_req_aux_free().
>>>>
>>> Yeah, with the last patch it should make it even
>>
>> OK good we agree on that. I should probably pull back that bit to the
>> original patch to avoid having a hole in there...
> 
> Done
> 

("io_uring/io-wq: don't use static creds/mm assignments") and ("io_uring:
support using a registered personality for commands") looks good now.

Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>


BTW, why do use decided to use idr, but not linear buffer scheme as for the rest
registered ones? We can rework it later for performance with linear search in
the buffer upon registration.

-- 
Pavel Begunkov


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-29 10:17                             ` Pavel Begunkov
@ 2020-01-29 13:11                               ` Stefan Metzmacher
  2020-01-29 13:41                                 ` Pavel Begunkov
  0 siblings, 1 reply; 48+ messages in thread
From: Stefan Metzmacher @ 2020-01-29 13:11 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Linux API Mailing List


[-- Attachment #1.1: Type: text/plain, Size: 4113 bytes --]

Am 29.01.20 um 11:17 schrieb Pavel Begunkov:
> On 29/01/2020 03:54, Jens Axboe wrote:
>> On 1/28/20 5:24 PM, Jens Axboe wrote:
>>> On 1/28/20 5:21 PM, Pavel Begunkov wrote:
>>>> On 29/01/2020 03:20, Jens Axboe wrote:
>>>>> On 1/28/20 5:10 PM, Pavel Begunkov wrote:
>>>>>>>>> Checked out ("don't use static creds/mm assignments")
>>>>>>>>>
>>>>>>>>> 1. do we miscount cred refs? We grab one in get_current_cred() for each async
>>>>>>>>> request, but if (worker->creds != work->creds) it will never be put.
>>>>>>>>
>>>>>>>> Yeah I think you're right, that needs a bit of fixing up.
>>>>>>>
>>>>>>
>>>>>> Hmm, it seems it leaks it unconditionally, as it grabs in a ref in
>>>>>> override_creds().
>>>>>>
>>>>>
>>>>> We grab one there, and an extra one. Then we drop one of them inline,
>>>>> and the other in __io_req_aux_free().
>>>>>
>>>> Yeah, with the last patch it should make it even
>>>
>>> OK good we agree on that. I should probably pull back that bit to the
>>> original patch to avoid having a hole in there...
>>
>> Done
>>
> 
> ("io_uring/io-wq: don't use static creds/mm assignments") and ("io_uring:
> support using a registered personality for commands") looks good now.
> 
> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>


I'm very happy with the design, thanks!
That exactly what I had in mind:-)

It would also work with IORING_SETUP_SQPOLL, correct?

However I think there're a few things to improve/simplify.

> https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.6/io_uring-vfs&id=a26d26412e1e1783473f9dc8f030c3af3d54b1a6

In fs/io_uring.c mmgrab() and get_current_cred() are used together in
two places, why is put_cred() called in __io_req_aux_free while
mmdrop() is called from io_put_work(). I think both should be called
in io_put_work(), that makes the code much easier to understand.

My guess is that you choose __io_req_aux_free() for put_cred() because
of the following patches, but I'll explain on the other commit
why it's not needed.

> https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.6/io_uring-vfs&id=d9db233adf034bd7855ba06190525e10a05868be

A minor one would be starting with 1 instead of 0 and using
idr_alloc_cyclic() in order to avoid immediate reuse of ids.
That way we could include the id in the tracing message and
0 would mean the current creds were used.

> +static int io_remove_personalities(int id, void *p, void *data)
> +{
> +	struct io_ring_ctx *ctx = data;
> +
> +	idr_remove(&ctx->personality_idr, id);

Here we need something like:
put_creds((const struct cred *)p);

> +	return 0;
> +}


The io_uring_register() calles would look like this, correct?

 id = io_uring_register(ring_fd, IORING_REGISTER_PERSONALITY, NULL, 0);
 io_uring_register(ring_fd, IORING_UNREGISTER_PERSONALITY, NULL, id);

> https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.6/io_uring-vfs&id=eec9e69e0ad9ad364e1b6a5dfc52ad576afee235
> +
> +	if (sqe_flags & IOSQE_PERSONALITY) {
> +		int id = READ_ONCE(sqe->personality);
> +
> +		req->work.creds = idr_find(&ctx->personality_idr, id);
> +		if (unlikely(!req->work.creds)) {
> +			ret = -EINVAL;
> +			goto err_req;
> +		}
> +		get_cred(req->work.creds);> +		old_creds = override_creds(req->work.creds);
> +	}
> +

Here we could use a helper variable
const struct cred *personality_creds;
and leave req->work.creds as NULL.
It means we can avoid the explicit get_cred() call
and can skip the following hunk too:

> @@ -3977,7 +3977,8 @@ static int io_req_defer_prep(struct io_kiocb *req,
>  		mmgrab(current->mm);
>  		req->work.mm = current->mm;
>  	}
> -	req->work.creds = get_current_cred();
> +	if (!req->work.creds)
> +		req->work.creds = get_current_cred();
>  
>  	switch (req->opcode) {
>  	case IORING_OP_NOP:

The override_creds(personality_creds) has changed current->cred
and get_current_cred() will just pick it up as in the default case.

This would make the patch much simpler and allows put_cred() to be
in io_put_work() instead of __io_req_aux_free() as explained above.

metze


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-29 13:11                               ` Stefan Metzmacher
@ 2020-01-29 13:41                                 ` Pavel Begunkov
  2020-01-29 13:56                                   ` Stefan Metzmacher
  0 siblings, 1 reply; 48+ messages in thread
From: Pavel Begunkov @ 2020-01-29 13:41 UTC (permalink / raw)
  To: Stefan Metzmacher, Jens Axboe; +Cc: io-uring, Linux API Mailing List

On 1/29/2020 4:11 PM, Stefan Metzmacher wrote:
> Am 29.01.20 um 11:17 schrieb Pavel Begunkov:
>> On 29/01/2020 03:54, Jens Axboe wrote:
>>> On 1/28/20 5:24 PM, Jens Axboe wrote:
>>>> On 1/28/20 5:21 PM, Pavel Begunkov wrote:
>>>>> On 29/01/2020 03:20, Jens Axboe wrote:
>>>>>> On 1/28/20 5:10 PM, Pavel Begunkov wrote:
>>>>>>>>>> Checked out ("don't use static creds/mm assignments")
>>>>>>>>>>
>>>>>>>>>> 1. do we miscount cred refs? We grab one in get_current_cred() for each async
>>>>>>>>>> request, but if (worker->creds != work->creds) it will never be put.
>>>>>>>>>
>>>>>>>>> Yeah I think you're right, that needs a bit of fixing up.
>>>>>>>>
>>>>>>>
>>>>>>> Hmm, it seems it leaks it unconditionally, as it grabs in a ref in
>>>>>>> override_creds().
>>>>>>>
>>>>>>
>>>>>> We grab one there, and an extra one. Then we drop one of them inline,
>>>>>> and the other in __io_req_aux_free().
>>>>>>
>>>>> Yeah, with the last patch it should make it even
>>>>
>>>> OK good we agree on that. I should probably pull back that bit to the
>>>> original patch to avoid having a hole in there...
>>>
>>> Done
>>>
>>
>> ("io_uring/io-wq: don't use static creds/mm assignments") and ("io_uring:
>> support using a registered personality for commands") looks good now.
>>
>> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
> 
> 
> I'm very happy with the design, thanks!
> That exactly what I had in mind:-)
> 
> It would also work with IORING_SETUP_SQPOLL, correct?
> 

Yep

> However I think there're a few things to improve/simplify.
> 

Since 5.6 is already semi-open, it'd be great to have an incremental
patch for that. I'll retoss things as usual, if nobody do it before.

>> https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.6/io_uring-vfs&id=a26d26412e1e1783473f9dc8f030c3af3d54b1a6
> 
> In fs/io_uring.c mmgrab() and get_current_cred() are used together in
> two places, why is put_cred() called in __io_req_aux_free while
> mmdrop() is called from io_put_work(). I think both should be called
> in io_put_work(), that makes the code much easier to understand.
> 
> My guess is that you choose __io_req_aux_free() for put_cred() because
> of the following patches, but I'll explain on the other commit
> why it's not needed.
> 
>> https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.6/io_uring-vfs&id=d9db233adf034bd7855ba06190525e10a05868be
> 
> A minor one would be starting with 1 instead of 0 and using
> idr_alloc_cyclic() in order to avoid immediate reuse of ids.
> That way we could include the id in the tracing message and
> 0 would mean the current creds were used.
> 
>> +static int io_remove_personalities(int id, void *p, void *data)
>> +{
>> +	struct io_ring_ctx *ctx = data;
>> +
>> +	idr_remove(&ctx->personality_idr, id);
> 
> Here we need something like:
> put_creds((const struct cred *)p);

Good catch

> 
>> +	return 0;
>> +}
> 
> 
> The io_uring_register() calles would look like this, correct?
> 
>  id = io_uring_register(ring_fd, IORING_REGISTER_PERSONALITY, NULL, 0);
>  io_uring_register(ring_fd, IORING_UNREGISTER_PERSONALITY, NULL, id);
> 
>> https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.6/io_uring-vfs&id=eec9e69e0ad9ad364e1b6a5dfc52ad576afee235
>> +
>> +	if (sqe_flags & IOSQE_PERSONALITY) {
>> +		int id = READ_ONCE(sqe->personality);
>> +
>> +		req->work.creds = idr_find(&ctx->personality_idr, id);
>> +		if (unlikely(!req->work.creds)) {
>> +			ret = -EINVAL;
>> +			goto err_req;
>> +		}
>> +		get_cred(req->work.creds);> +		old_creds = override_creds(req->work.creds);
>> +	}
>> +
> 
> Here we could use a helper variable
> const struct cred *personality_creds;
> and leave req->work.creds as NULL.
> It means we can avoid the explicit get_cred() call
> and can skip the following hunk too:
> 
>> @@ -3977,7 +3977,8 @@ static int io_req_defer_prep(struct io_kiocb *req,
>>  		mmgrab(current->mm);
>>  		req->work.mm = current->mm;
>>  	}
>> -	req->work.creds = get_current_cred();
>> +	if (!req->work.creds)
>> +		req->work.creds = get_current_cred();
>>  
>>  	switch (req->opcode) {
>>  	case IORING_OP_NOP:
> 
> The override_creds(personality_creds) has changed current->cred
> and get_current_cred() will just pick it up as in the default case.
> 
> This would make the patch much simpler and allows put_cred() to be
> in io_put_work() instead of __io_req_aux_free() as explained above.
> 

It's one extra get_current_cred(). I'd prefer to find another way to
clean this up.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-29 13:41                                 ` Pavel Begunkov
@ 2020-01-29 13:56                                   ` Stefan Metzmacher
  2020-01-29 14:23                                     ` Pavel Begunkov
  0 siblings, 1 reply; 48+ messages in thread
From: Stefan Metzmacher @ 2020-01-29 13:56 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Linux API Mailing List


[-- Attachment #1.1: Type: text/plain, Size: 5356 bytes --]

Am 29.01.20 um 14:41 schrieb Pavel Begunkov:
> On 1/29/2020 4:11 PM, Stefan Metzmacher wrote:
>> Am 29.01.20 um 11:17 schrieb Pavel Begunkov:
>>> On 29/01/2020 03:54, Jens Axboe wrote:
>>>> On 1/28/20 5:24 PM, Jens Axboe wrote:
>>>>> On 1/28/20 5:21 PM, Pavel Begunkov wrote:
>>>>>> On 29/01/2020 03:20, Jens Axboe wrote:
>>>>>>> On 1/28/20 5:10 PM, Pavel Begunkov wrote:
>>>>>>>>>>> Checked out ("don't use static creds/mm assignments")
>>>>>>>>>>>
>>>>>>>>>>> 1. do we miscount cred refs? We grab one in get_current_cred() for each async
>>>>>>>>>>> request, but if (worker->creds != work->creds) it will never be put.
>>>>>>>>>>
>>>>>>>>>> Yeah I think you're right, that needs a bit of fixing up.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Hmm, it seems it leaks it unconditionally, as it grabs in a ref in
>>>>>>>> override_creds().
>>>>>>>>
>>>>>>>
>>>>>>> We grab one there, and an extra one. Then we drop one of them inline,
>>>>>>> and the other in __io_req_aux_free().
>>>>>>>
>>>>>> Yeah, with the last patch it should make it even
>>>>>
>>>>> OK good we agree on that. I should probably pull back that bit to the
>>>>> original patch to avoid having a hole in there...
>>>>
>>>> Done
>>>>
>>>
>>> ("io_uring/io-wq: don't use static creds/mm assignments") and ("io_uring:
>>> support using a registered personality for commands") looks good now.
>>>
>>> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
>>
>>
>> I'm very happy with the design, thanks!
>> That exactly what I had in mind:-)
>>
>> It would also work with IORING_SETUP_SQPOLL, correct?
>>
> 
> Yep
> 
>> However I think there're a few things to improve/simplify.
>>
> 
> Since 5.6 is already semi-open, it'd be great to have an incremental
> patch for that. I'll retoss things as usual, if nobody do it before.

I'll wait for comments from Jens first:-)
I guess we'll have things changed in his branch, when I wake up
tomorrow. Otherwise I can also create patches and submit them.

But I currently don't have an environment where I can do runtime tests
with it.

>>> https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.6/io_uring-vfs&id=a26d26412e1e1783473f9dc8f030c3af3d54b1a6
>>
>> In fs/io_uring.c mmgrab() and get_current_cred() are used together in
>> two places, why is put_cred() called in __io_req_aux_free while
>> mmdrop() is called from io_put_work(). I think both should be called
>> in io_put_work(), that makes the code much easier to understand.
>>
>> My guess is that you choose __io_req_aux_free() for put_cred() because
>> of the following patches, but I'll explain on the other commit
>> why it's not needed.
>>
>>> https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.6/io_uring-vfs&id=d9db233adf034bd7855ba06190525e10a05868be
>>
>> A minor one would be starting with 1 instead of 0 and using
>> idr_alloc_cyclic() in order to avoid immediate reuse of ids.
>> That way we could include the id in the tracing message and
>> 0 would mean the current creds were used.
>>
>>> +static int io_remove_personalities(int id, void *p, void *data)
>>> +{
>>> +	struct io_ring_ctx *ctx = data;
>>> +
>>> +	idr_remove(&ctx->personality_idr, id);
>>
>> Here we need something like:
>> put_creds((const struct cred *)p);
> 
> Good catch
> 
>>
>>> +	return 0;
>>> +}
>>
>>
>> The io_uring_register() calles would look like this, correct?
>>
>>  id = io_uring_register(ring_fd, IORING_REGISTER_PERSONALITY, NULL, 0);
>>  io_uring_register(ring_fd, IORING_UNREGISTER_PERSONALITY, NULL, id);
>>
>>> https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.6/io_uring-vfs&id=eec9e69e0ad9ad364e1b6a5dfc52ad576afee235
>>> +
>>> +	if (sqe_flags & IOSQE_PERSONALITY) {
>>> +		int id = READ_ONCE(sqe->personality);
>>> +
>>> +		req->work.creds = idr_find(&ctx->personality_idr, id);
>>> +		if (unlikely(!req->work.creds)) {
>>> +			ret = -EINVAL;
>>> +			goto err_req;
>>> +		}
>>> +		get_cred(req->work.creds);> +		old_creds = override_creds(req->work.creds);
>>> +	}
>>> +
>>
>> Here we could use a helper variable
>> const struct cred *personality_creds;
>> and leave req->work.creds as NULL.
>> It means we can avoid the explicit get_cred() call
>> and can skip the following hunk too:
>>
>>> @@ -3977,7 +3977,8 @@ static int io_req_defer_prep(struct io_kiocb *req,
>>>  		mmgrab(current->mm);
>>>  		req->work.mm = current->mm;
>>>  	}
>>> -	req->work.creds = get_current_cred();
>>> +	if (!req->work.creds)
>>> +		req->work.creds = get_current_cred();
>>>  
>>>  	switch (req->opcode) {
>>>  	case IORING_OP_NOP:
>>
>> The override_creds(personality_creds) has changed current->cred
>> and get_current_cred() will just pick it up as in the default case.
>>
>> This would make the patch much simpler and allows put_cred() to be
>> in io_put_work() instead of __io_req_aux_free() as explained above.
>>
> 
> It's one extra get_current_cred(). I'd prefer to find another way to
> clean this up.

As far as I can see it avoids a get_cred() in the IOSQE_PERSONALITY case
and the if (!req->work.creds) for both cases.

What do you mean exactly with one extra get_current_cred()?
Is that any worse than calling get_cred() and having an if check?

It also seems to avoid req->work.creds from being filled at all
for the non-blocking case.

metze


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-29 13:56                                   ` Stefan Metzmacher
@ 2020-01-29 14:23                                     ` Pavel Begunkov
  2020-01-29 14:27                                       ` Stefan Metzmacher
  2020-01-29 17:34                                       ` Jens Axboe
  0 siblings, 2 replies; 48+ messages in thread
From: Pavel Begunkov @ 2020-01-29 14:23 UTC (permalink / raw)
  To: Stefan Metzmacher, Jens Axboe; +Cc: io-uring, Linux API Mailing List

On 1/29/2020 4:56 PM, Stefan Metzmacher wrote:
>>> However I think there're a few things to improve/simplify.
>> Since 5.6 is already semi-open, it'd be great to have an incremental
>> patch for that. I'll retoss things as usual, if nobody do it before.
> 
> I'll wait for comments from Jens first:-)
> I guess we'll have things changed in his branch, when I wake up
> tomorrow. Otherwise I can also create patches and submit them.

Sure, I won't get there any time soon.

> 
> But I currently don't have an environment where I can do runtime tests
> with it.
> 
>>>> https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.6/io_uring-vfs&id=a26d26412e1e1783473f9dc8f030c3af3d54b1a6
>>>
>>> In fs/io_uring.c mmgrab() and get_current_cred() are used together in
>>> two places, why is put_cred() called in __io_req_aux_free while
>>> mmdrop() is called from io_put_work(). I think both should be called
>>> in io_put_work(), that makes the code much easier to understand.
>>>
>>> My guess is that you choose __io_req_aux_free() for put_cred() because
>>> of the following patches, but I'll explain on the other commit
>>> why it's not needed.
>>>
>>>> https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.6/io_uring-vfs&id=d9db233adf034bd7855ba06190525e10a05868be
>>>
>>> A minor one would be starting with 1 instead of 0 and using
>>> idr_alloc_cyclic() in order to avoid immediate reuse of ids.
>>> That way we could include the id in the tracing message and
>>> 0 would mean the current creds were used.
>>>
>>>> +static int io_remove_personalities(int id, void *p, void *data)
>>>> +{
>>>> +	struct io_ring_ctx *ctx = data;
>>>> +
>>>> +	idr_remove(&ctx->personality_idr, id);
>>>
>>> Here we need something like:
>>> put_creds((const struct cred *)p);
>>
>> Good catch
>>
>>>
>>>> +	return 0;
>>>> +}
>>>
>>>
>>> The io_uring_register() calles would look like this, correct?
>>>
>>>  id = io_uring_register(ring_fd, IORING_REGISTER_PERSONALITY, NULL, 0);
>>>  io_uring_register(ring_fd, IORING_UNREGISTER_PERSONALITY, NULL, id);
>>>
>>>> https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.6/io_uring-vfs&id=eec9e69e0ad9ad364e1b6a5dfc52ad576afee235
>>>> +
>>>> +	if (sqe_flags & IOSQE_PERSONALITY) {
>>>> +		int id = READ_ONCE(sqe->personality);
>>>> +
>>>> +		req->work.creds = idr_find(&ctx->personality_idr, id);
>>>> +		if (unlikely(!req->work.creds)) {
>>>> +			ret = -EINVAL;
>>>> +			goto err_req;
>>>> +		}
>>>> +		get_cred(req->work.creds);> +		old_creds = override_creds(req->work.creds);
>>>> +	}
>>>> +
>>>
>>> Here we could use a helper variable
>>> const struct cred *personality_creds;
>>> and leave req->work.creds as NULL.
>>> It means we can avoid the explicit get_cred() call
>>> and can skip the following hunk too:
>>>
>>>> @@ -3977,7 +3977,8 @@ static int io_req_defer_prep(struct io_kiocb *req,
>>>>  		mmgrab(current->mm);
>>>>  		req->work.mm = current->mm;
>>>>  	}
>>>> -	req->work.creds = get_current_cred();
>>>> +	if (!req->work.creds)
>>>> +		req->work.creds = get_current_cred();
>>>>  
>>>>  	switch (req->opcode) {
>>>>  	case IORING_OP_NOP:
>>>
>>> The override_creds(personality_creds) has changed current->cred
>>> and get_current_cred() will just pick it up as in the default case.
>>>
>>> This would make the patch much simpler and allows put_cred() to be
>>> in io_put_work() instead of __io_req_aux_free() as explained above.
>>>
>>
>> It's one extra get_current_cred(). I'd prefer to find another way to
>> clean this up.
> 
> As far as I can see it avoids a get_cred() in the IOSQE_PERSONALITY case
> and the if (!req->work.creds) for both cases.

Great, that you turned attention to that! override_creds() is already
grabbing a ref, so it shouldn't call get_cred() there.
So, that's a bug.

It could be I'm wrong with the statement above, need to recheck all this
code to be sure.

BTW, io_req_defer_prep() may be called twice for a req, so you will
reassign it without putting a ref. It's safer to leave NULL checks. At
least, until I've done reworking and fixing preparation paths.

> 
> What do you mean exactly with one extra get_current_cred()?
> Is that any worse than calling get_cred() and having an if check?
> 
> It also seems to avoid req->work.creds from being filled at all
> for the non-blocking case.
> 
> metze
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-29 14:23                                     ` Pavel Begunkov
@ 2020-01-29 14:27                                       ` Stefan Metzmacher
  2020-01-29 14:34                                         ` Pavel Begunkov
  2020-01-29 17:34                                       ` Jens Axboe
  1 sibling, 1 reply; 48+ messages in thread
From: Stefan Metzmacher @ 2020-01-29 14:27 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Linux API Mailing List


[-- Attachment #1.1: Type: text/plain, Size: 1590 bytes --]

Hi Pavel,

>>>>> @@ -3977,7 +3977,8 @@ static int io_req_defer_prep(struct io_kiocb *req,
>>>>>  		mmgrab(current->mm);
>>>>>  		req->work.mm = current->mm;
>>>>>  	}
>>>>> -	req->work.creds = get_current_cred();
>>>>> +	if (!req->work.creds)
>>>>> +		req->work.creds = get_current_cred();
>>>>>  
>>>>>  	switch (req->opcode) {
>>>>>  	case IORING_OP_NOP:
>>>>
>>>> The override_creds(personality_creds) has changed current->cred
>>>> and get_current_cred() will just pick it up as in the default case.
>>>>
>>>> This would make the patch much simpler and allows put_cred() to be
>>>> in io_put_work() instead of __io_req_aux_free() as explained above.
>>>>
>>>
>>> It's one extra get_current_cred(). I'd prefer to find another way to
>>> clean this up.
>>
>> As far as I can see it avoids a get_cred() in the IOSQE_PERSONALITY case
>> and the if (!req->work.creds) for both cases.
> 
> Great, that you turned attention to that! override_creds() is already
> grabbing a ref, so it shouldn't call get_cred() there.
> So, that's a bug.
> 
> It could be I'm wrong with the statement above, need to recheck all this
> code to be sure.
> 
> BTW, io_req_defer_prep() may be called twice for a req, so you will
> reassign it without putting a ref. It's safer to leave NULL checks. At
> least, until I've done reworking and fixing preparation paths.

Ok, but that would be already a bug in
"io_uring/io-wq: don't use static creds/mm assignments"
instead of logically being part of
"io_uring: support using a registered personality for commands"

metze


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-29 14:27                                       ` Stefan Metzmacher
@ 2020-01-29 14:34                                         ` Pavel Begunkov
  0 siblings, 0 replies; 48+ messages in thread
From: Pavel Begunkov @ 2020-01-29 14:34 UTC (permalink / raw)
  To: Stefan Metzmacher, Jens Axboe; +Cc: io-uring, Linux API Mailing List

On 1/29/2020 5:27 PM, Stefan Metzmacher wrote:
>> Great, that you turned attention to that! override_creds() is already
>> grabbing a ref, so it shouldn't call get_cred() there.
>> So, that's a bug.
>>
>> It could be I'm wrong with the statement above, need to recheck all this
>> code to be sure.
>>
>> BTW, io_req_defer_prep() may be called twice for a req, so you will
>> reassign it without putting a ref. It's safer to leave NULL checks. At
>> least, until I've done reworking and fixing preparation paths.
> 
> Ok, but that would be already a bug in
> "io_uring/io-wq: don't use static creds/mm assignments"
> instead of logically being part of
> "io_uring: support using a registered personality for commands"

Right. Probably should be backported there, since it's just 2 commits
before.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-28 19:42           ` Jens Axboe
  2020-01-28 20:16             ` Pavel Begunkov
  2020-01-28 23:36             ` Pavel Begunkov
@ 2020-01-29 14:59             ` Jann Horn
  2020-01-29 17:34               ` Jens Axboe
  2 siblings, 1 reply; 48+ messages in thread
From: Jann Horn @ 2020-01-29 14:59 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Stefan Metzmacher, io-uring, Linux API Mailing List, Pavel Begunkov

On Tue, Jan 28, 2020 at 8:42 PM Jens Axboe <axboe@kernel.dk> wrote:
> On 1/28/20 11:04 AM, Jens Axboe wrote:
> > On 1/28/20 10:19 AM, Jens Axboe wrote:
[...]
> >> #1 adds support for registering the personality of the invoking task,
> >> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to
> >> just having one link, it doesn't support a chain of them.
[...]
> I didn't like it becoming a bit too complicated, both in terms of
> implementation and use. And the fact that we'd have to jump through
> hoops to make this work for a full chain.
>
> So I punted and just added sqe->personality and IOSQE_PERSONALITY.
> This makes it way easier to use. Same branch:
>
> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
>
> I'd feel much better with this variant for 5.6.

Some general feedback from an inspectability/debuggability perspective:

At some point, it might be nice if you could add a .show_fdinfo
handler to the io_uring_fops that makes it possible to get a rough
overview over the state of the uring by reading /proc/$pid/fdinfo/$fd,
just like e.g. eventfd (see eventfd_show_fdinfo()). It might be
helpful for debugging to be able to see information about the fixed
files and buffers that have been registered. Same for the
personalities; that information might also be useful when someone is
trying to figure out what privileges a running process actually has.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-29 14:23                                     ` Pavel Begunkov
  2020-01-29 14:27                                       ` Stefan Metzmacher
@ 2020-01-29 17:34                                       ` Jens Axboe
  2020-01-29 17:42                                         ` Jens Axboe
  2020-01-29 17:46                                         ` Pavel Begunkov
  1 sibling, 2 replies; 48+ messages in thread
From: Jens Axboe @ 2020-01-29 17:34 UTC (permalink / raw)
  To: Pavel Begunkov, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List

On 1/29/20 7:23 AM, Pavel Begunkov wrote:
>>>> The override_creds(personality_creds) has changed current->cred
>>>> and get_current_cred() will just pick it up as in the default case.
>>>>
>>>> This would make the patch much simpler and allows put_cred() to be
>>>> in io_put_work() instead of __io_req_aux_free() as explained above.
>>>>
>>>
>>> It's one extra get_current_cred(). I'd prefer to find another way to
>>> clean this up.
>>
>> As far as I can see it avoids a get_cred() in the IOSQE_PERSONALITY case
>> and the if (!req->work.creds) for both cases.
> 
> Great, that you turned attention to that! override_creds() is already
> grabbing a ref, so it shouldn't call get_cred() there.
> So, that's a bug.

It's not though - one is dropped in that function, the other when the
request is freed. So we do need two references to it. With the proposed
change to keep the override_creds() variable local for that spot we
don't, and the get_cred() can then go.

> It could be I'm wrong with the statement above, need to recheck all this
> code to be sure.

I think you are :-)

> BTW, io_req_defer_prep() may be called twice for a req, so you will
> reassign it without putting a ref. It's safer to leave NULL checks. At
> least, until I've done reworking and fixing preparation paths.

Agree, the NULL checks are safer and we should keep them.

Going through the rest of this thread, I'm making the following changes:

- ID must be > 0. I like that change, as we don't need an sqe flag to
  select personality then, and it also makes it obvious that id == 0 is
  just using current creds.

- Fixed the missing put_cred() in the teardown

- Use a local variable in io_submit_sqe() instead of assigning the
  creds to req->work.creds there

- Use cyclic idr allocation

I'm going to fold in as appropriate. If there are fixes needed on top of
that, let's do them separately.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-29 14:59             ` Jann Horn
@ 2020-01-29 17:34               ` Jens Axboe
  2020-01-30  1:08                 ` Jens Axboe
  0 siblings, 1 reply; 48+ messages in thread
From: Jens Axboe @ 2020-01-29 17:34 UTC (permalink / raw)
  To: Jann Horn
  Cc: Stefan Metzmacher, io-uring, Linux API Mailing List, Pavel Begunkov

On 1/29/20 7:59 AM, Jann Horn wrote:
> On Tue, Jan 28, 2020 at 8:42 PM Jens Axboe <axboe@kernel.dk> wrote:
>> On 1/28/20 11:04 AM, Jens Axboe wrote:
>>> On 1/28/20 10:19 AM, Jens Axboe wrote:
> [...]
>>>> #1 adds support for registering the personality of the invoking task,
>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to
>>>> just having one link, it doesn't support a chain of them.
> [...]
>> I didn't like it becoming a bit too complicated, both in terms of
>> implementation and use. And the fact that we'd have to jump through
>> hoops to make this work for a full chain.
>>
>> So I punted and just added sqe->personality and IOSQE_PERSONALITY.
>> This makes it way easier to use. Same branch:
>>
>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
>>
>> I'd feel much better with this variant for 5.6.
> 
> Some general feedback from an inspectability/debuggability perspective:
> 
> At some point, it might be nice if you could add a .show_fdinfo
> handler to the io_uring_fops that makes it possible to get a rough
> overview over the state of the uring by reading /proc/$pid/fdinfo/$fd,
> just like e.g. eventfd (see eventfd_show_fdinfo()). It might be
> helpful for debugging to be able to see information about the fixed
> files and buffers that have been registered. Same for the
> personalities; that information might also be useful when someone is
> trying to figure out what privileges a running process actually has.

Agree, that would be a very useful addition. I'll take a look at it.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-29 17:34                                       ` Jens Axboe
@ 2020-01-29 17:42                                         ` Jens Axboe
  2020-01-29 20:09                                           ` Stefan Metzmacher
  2020-01-29 17:46                                         ` Pavel Begunkov
  1 sibling, 1 reply; 48+ messages in thread
From: Jens Axboe @ 2020-01-29 17:42 UTC (permalink / raw)
  To: Pavel Begunkov, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List

On 1/29/20 10:34 AM, Jens Axboe wrote:
> On 1/29/20 7:23 AM, Pavel Begunkov wrote:
>>>>> The override_creds(personality_creds) has changed current->cred
>>>>> and get_current_cred() will just pick it up as in the default case.
>>>>>
>>>>> This would make the patch much simpler and allows put_cred() to be
>>>>> in io_put_work() instead of __io_req_aux_free() as explained above.
>>>>>
>>>>
>>>> It's one extra get_current_cred(). I'd prefer to find another way to
>>>> clean this up.
>>>
>>> As far as I can see it avoids a get_cred() in the IOSQE_PERSONALITY case
>>> and the if (!req->work.creds) for both cases.
>>
>> Great, that you turned attention to that! override_creds() is already
>> grabbing a ref, so it shouldn't call get_cred() there.
>> So, that's a bug.
> 
> It's not though - one is dropped in that function, the other when the
> request is freed. So we do need two references to it. With the proposed
> change to keep the override_creds() variable local for that spot we
> don't, and the get_cred() can then go.
> 
>> It could be I'm wrong with the statement above, need to recheck all this
>> code to be sure.
> 
> I think you are :-)
> 
>> BTW, io_req_defer_prep() may be called twice for a req, so you will
>> reassign it without putting a ref. It's safer to leave NULL checks. At
>> least, until I've done reworking and fixing preparation paths.
> 
> Agree, the NULL checks are safer and we should keep them.
> 
> Going through the rest of this thread, I'm making the following changes:
> 
> - ID must be > 0. I like that change, as we don't need an sqe flag to
>   select personality then, and it also makes it obvious that id == 0 is
>   just using current creds.
> 
> - Fixed the missing put_cred() in the teardown
> 
> - Use a local variable in io_submit_sqe() instead of assigning the
>   creds to req->work.creds there
> 
> - Use cyclic idr allocation
> 
> I'm going to fold in as appropriate. If there are fixes needed on top of
> that, let's do them separately.

In particular, would love a patch that only assigns req->work.creds if
we do go async, so we can leave the put_cred() in io_put_work()
instead of needing it in __io_req_aux_free().

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-29 17:34                                       ` Jens Axboe
  2020-01-29 17:42                                         ` Jens Axboe
@ 2020-01-29 17:46                                         ` Pavel Begunkov
  1 sibling, 0 replies; 48+ messages in thread
From: Pavel Begunkov @ 2020-01-29 17:46 UTC (permalink / raw)
  To: Jens Axboe, Stefan Metzmacher; +Cc: io-uring, Linux API Mailing List

On 1/29/2020 8:34 PM, Jens Axboe wrote:
> On 1/29/20 7:23 AM, Pavel Begunkov wrote:
>>>>> The override_creds(personality_creds) has changed current->cred
>>>>> and get_current_cred() will just pick it up as in the default case.
>>>>>
>>>>> This would make the patch much simpler and allows put_cred() to be
>>>>> in io_put_work() instead of __io_req_aux_free() as explained above.
>>>>>
>>>>
>>>> It's one extra get_current_cred(). I'd prefer to find another way to
>>>> clean this up.
>>>
>>> As far as I can see it avoids a get_cred() in the IOSQE_PERSONALITY case
>>> and the if (!req->work.creds) for both cases.
>>
>> Great, that you turned attention to that! override_creds() is already
>> grabbing a ref, so it shouldn't call get_cred() there.
>> So, that's a bug.
> 
> It's not though - one is dropped in that function, the other when the
> request is freed. So we do need two references to it. With the proposed
> change to keep the override_creds() variable local for that spot we
> don't, and the get_cred() can then go.

You're right here. It seems, it was too much looking at the same code :)

> 
>> It could be I'm wrong with the statement above, need to recheck all this
>> code to be sure.
> 
> I think you are :-)

Considering above, there shouldn't be much difference indeed.
One extra rcu_dereference() in get_current_creds() instead of
get_cred(), but that's nothing.

Later we can hide one get by using submission state from the long
patchset, I sent a while ago.

> 
>> BTW, io_req_defer_prep() may be called twice for a req, so you will
>> reassign it without putting a ref. It's safer to leave NULL checks. At
>> least, until I've done reworking and fixing preparation paths.
> 
> Agree, the NULL checks are safer and we should keep them.
> 
> Going through the rest of this thread, I'm making the following changes:
> 
> - ID must be > 0. I like that change, as we don't need an sqe flag to
>   select personality then, and it also makes it obvious that id == 0 is
>   just using current creds.
> 
> - Fixed the missing put_cred() in the teardown
> 
> - Use a local variable in io_submit_sqe() instead of assigning the
>   creds to req->work.creds there
> 
> - Use cyclic idr allocation
> 
> I'm going to fold in as appropriate. If there are fixes needed on top of
> that, let's do them separately.
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-29 17:42                                         ` Jens Axboe
@ 2020-01-29 20:09                                           ` Stefan Metzmacher
  2020-01-29 20:48                                             ` Jens Axboe
  0 siblings, 1 reply; 48+ messages in thread
From: Stefan Metzmacher @ 2020-01-29 20:09 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov; +Cc: io-uring, Linux API Mailing List

Am 29.01.20 um 18:42 schrieb Jens Axboe:
> On 1/29/20 10:34 AM, Jens Axboe wrote:
>> On 1/29/20 7:23 AM, Pavel Begunkov wrote:
>>>>>> The override_creds(personality_creds) has changed current->cred
>>>>>> and get_current_cred() will just pick it up as in the default case.
>>>>>>
>>>>>> This would make the patch much simpler and allows put_cred() to be
>>>>>> in io_put_work() instead of __io_req_aux_free() as explained above.
>>>>>>
>>>>>
>>>>> It's one extra get_current_cred(). I'd prefer to find another way to
>>>>> clean this up.
>>>>
>>>> As far as I can see it avoids a get_cred() in the IOSQE_PERSONALITY case
>>>> and the if (!req->work.creds) for both cases.
>>>
>>> Great, that you turned attention to that! override_creds() is already
>>> grabbing a ref, so it shouldn't call get_cred() there.
>>> So, that's a bug.
>>
>> It's not though - one is dropped in that function, the other when the
>> request is freed. So we do need two references to it. With the proposed
>> change to keep the override_creds() variable local for that spot we
>> don't, and the get_cred() can then go.
>>
>>> It could be I'm wrong with the statement above, need to recheck all this
>>> code to be sure.
>>
>> I think you are :-)
>>
>>> BTW, io_req_defer_prep() may be called twice for a req, so you will
>>> reassign it without putting a ref. It's safer to leave NULL checks. At
>>> least, until I've done reworking and fixing preparation paths.
>>
>> Agree, the NULL checks are safer and we should keep them.
>>
>> Going through the rest of this thread, I'm making the following changes:
>>
>> - ID must be > 0. I like that change, as we don't need an sqe flag to
>>   select personality then, and it also makes it obvious that id == 0 is
>>   just using current creds.
>>
>> - Fixed the missing put_cred() in the teardown
>>
>> - Use a local variable in io_submit_sqe() instead of assigning the
>>   creds to req->work.creds there
>>
>> - Use cyclic idr allocation
>>
>> I'm going to fold in as appropriate. If there are fixes needed on top of
>> that, let's do them separately.
> 
> In particular, would love a patch that only assigns req->work.creds if
> we do go async, so we can leave the put_cred() in io_put_work()
> instead of needing it in __io_req_aux_free().

I made some improvements here:

https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=refs/heads/for-5.6/io_uring-vfs

Feel free to squash
https://git.samba.org/?p=metze/linux/wip.git;a=commitdiff;h=ce8812c9b935467bb08ed4d528dd92b9f67e221c
into
https://git.samba.org/?p=metze/linux/wip.git;a=commitdiff;h=22021e95e73d4658a6c834a3276886e127ab8425
and add my review to it.

If you're confident that it's safe you can
call io_req_work_drop_env() also in io_put_work(),

metze





^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-29 20:09                                           ` Stefan Metzmacher
@ 2020-01-29 20:48                                             ` Jens Axboe
  0 siblings, 0 replies; 48+ messages in thread
From: Jens Axboe @ 2020-01-29 20:48 UTC (permalink / raw)
  To: Stefan Metzmacher, Pavel Begunkov; +Cc: io-uring, Linux API Mailing List

On 1/29/20 1:09 PM, Stefan Metzmacher wrote:
> Am 29.01.20 um 18:42 schrieb Jens Axboe:
>> On 1/29/20 10:34 AM, Jens Axboe wrote:
>>> On 1/29/20 7:23 AM, Pavel Begunkov wrote:
>>>>>>> The override_creds(personality_creds) has changed current->cred
>>>>>>> and get_current_cred() will just pick it up as in the default case.
>>>>>>>
>>>>>>> This would make the patch much simpler and allows put_cred() to be
>>>>>>> in io_put_work() instead of __io_req_aux_free() as explained above.
>>>>>>>
>>>>>>
>>>>>> It's one extra get_current_cred(). I'd prefer to find another way to
>>>>>> clean this up.
>>>>>
>>>>> As far as I can see it avoids a get_cred() in the IOSQE_PERSONALITY case
>>>>> and the if (!req->work.creds) for both cases.
>>>>
>>>> Great, that you turned attention to that! override_creds() is already
>>>> grabbing a ref, so it shouldn't call get_cred() there.
>>>> So, that's a bug.
>>>
>>> It's not though - one is dropped in that function, the other when the
>>> request is freed. So we do need two references to it. With the proposed
>>> change to keep the override_creds() variable local for that spot we
>>> don't, and the get_cred() can then go.
>>>
>>>> It could be I'm wrong with the statement above, need to recheck all this
>>>> code to be sure.
>>>
>>> I think you are :-)
>>>
>>>> BTW, io_req_defer_prep() may be called twice for a req, so you will
>>>> reassign it without putting a ref. It's safer to leave NULL checks. At
>>>> least, until I've done reworking and fixing preparation paths.
>>>
>>> Agree, the NULL checks are safer and we should keep them.
>>>
>>> Going through the rest of this thread, I'm making the following changes:
>>>
>>> - ID must be > 0. I like that change, as we don't need an sqe flag to
>>>   select personality then, and it also makes it obvious that id == 0 is
>>>   just using current creds.
>>>
>>> - Fixed the missing put_cred() in the teardown
>>>
>>> - Use a local variable in io_submit_sqe() instead of assigning the
>>>   creds to req->work.creds there
>>>
>>> - Use cyclic idr allocation
>>>
>>> I'm going to fold in as appropriate. If there are fixes needed on top of
>>> that, let's do them separately.
>>
>> In particular, would love a patch that only assigns req->work.creds if
>> we do go async, so we can leave the put_cred() in io_put_work()
>> instead of needing it in __io_req_aux_free().
> 
> I made some improvements here:
> 
> https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=refs/heads/for-5.6/io_uring-vfs
> 
> Feel free to squash
> https://git.samba.org/?p=metze/linux/wip.git;a=commitdiff;h=ce8812c9b935467bb08ed4d528dd92b9f67e221c
> into
> https://git.samba.org/?p=metze/linux/wip.git;a=commitdiff;h=22021e95e73d4658a6c834a3276886e127ab8425
> and add my review to it.

Looks good to me, folded. Thanks!

> If you're confident that it's safe you can
> call io_req_work_drop_env() also in io_put_work(),

Let's look into that later, want to flush this out sooner rather than
later...


-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-29 17:34               ` Jens Axboe
@ 2020-01-30  1:08                 ` Jens Axboe
  2020-01-30  2:20                   ` Jens Axboe
                                     ` (2 more replies)
  0 siblings, 3 replies; 48+ messages in thread
From: Jens Axboe @ 2020-01-30  1:08 UTC (permalink / raw)
  To: Jann Horn
  Cc: Stefan Metzmacher, io-uring, Linux API Mailing List, Pavel Begunkov

On 1/29/20 10:34 AM, Jens Axboe wrote:
> On 1/29/20 7:59 AM, Jann Horn wrote:
>> On Tue, Jan 28, 2020 at 8:42 PM Jens Axboe <axboe@kernel.dk> wrote:
>>> On 1/28/20 11:04 AM, Jens Axboe wrote:
>>>> On 1/28/20 10:19 AM, Jens Axboe wrote:
>> [...]
>>>>> #1 adds support for registering the personality of the invoking task,
>>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to
>>>>> just having one link, it doesn't support a chain of them.
>> [...]
>>> I didn't like it becoming a bit too complicated, both in terms of
>>> implementation and use. And the fact that we'd have to jump through
>>> hoops to make this work for a full chain.
>>>
>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY.
>>> This makes it way easier to use. Same branch:
>>>
>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
>>>
>>> I'd feel much better with this variant for 5.6.
>>
>> Some general feedback from an inspectability/debuggability perspective:
>>
>> At some point, it might be nice if you could add a .show_fdinfo
>> handler to the io_uring_fops that makes it possible to get a rough
>> overview over the state of the uring by reading /proc/$pid/fdinfo/$fd,
>> just like e.g. eventfd (see eventfd_show_fdinfo()). It might be
>> helpful for debugging to be able to see information about the fixed
>> files and buffers that have been registered. Same for the
>> personalities; that information might also be useful when someone is
>> trying to figure out what privileges a running process actually has.
> 
> Agree, that would be a very useful addition. I'll take a look at it.

Jann, how much info are you looking for? Here's a rough start, just
shows the number of registered files and buffers, and lists the
personalities registered. We could also dump the buffer info for
each of them, and ditto for the files. Not sure how much verbosity
is acceptable in fdinfo?

Here's the test app for personality:

# cat 3
pos:	0
flags:	02000002
mnt_id:	14
user-files: 0
user-bufs: 0
personalities:
	    1: uid=0/gid=0


diff --git a/fs/io_uring.c b/fs/io_uring.c
index c5ca84a305d3..0b2c7d800297 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6511,6 +6505,45 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 	return submitted ? submitted : ret;
 }
 
+struct ring_show_idr {
+	struct io_ring_ctx *ctx;
+	struct seq_file *m;
+};
+
+static int io_uring_show_cred(int id, void *p, void *data)
+{
+	struct ring_show_idr *r = data;
+	const struct cred *cred = p;
+
+	seq_printf(r->m, "\t%5d: uid=%u/gid=%u\n", id, cred->uid.val,
+						cred->gid.val);
+	return 0;
+}
+
+static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m)
+{
+	struct ring_show_idr r = { .ctx = ctx, .m = m };
+
+	mutex_lock(&ctx->uring_lock);
+	seq_printf(m, "user-files: %d\n", ctx->nr_user_files);
+	seq_printf(m, "user-bufs: %d\n", ctx->nr_user_bufs);
+	if (!idr_is_empty(&ctx->personality_idr)) {
+		seq_printf(m, "personalities:\n");
+		idr_for_each(&ctx->personality_idr, io_uring_show_cred, &r);
+	}
+	mutex_unlock(&ctx->uring_lock);
+}
+
+static void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
+{
+	struct io_ring_ctx *ctx = f->private_data;
+
+	if (percpu_ref_tryget(&ctx->refs)) {
+		__io_uring_show_fdinfo(ctx, m);
+		percpu_ref_put(&ctx->refs);
+	}
+}
+
 static const struct file_operations io_uring_fops = {
 	.release	= io_uring_release,
 	.flush		= io_uring_flush,
@@ -6521,6 +6554,7 @@ static const struct file_operations io_uring_fops = {
 #endif
 	.poll		= io_uring_poll,
 	.fasync		= io_uring_fasync,
+	.show_fdinfo	= io_uring_show_fdinfo,
 };
 
 static int io_allocate_scq_urings(struct io_ring_ctx *ctx,

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-30  1:08                 ` Jens Axboe
@ 2020-01-30  2:20                   ` Jens Axboe
  2020-01-30  3:18                     ` Jens Axboe
  2020-01-30  6:53                   ` Stefan Metzmacher
  2020-01-30 10:11                   ` Jann Horn
  2 siblings, 1 reply; 48+ messages in thread
From: Jens Axboe @ 2020-01-30  2:20 UTC (permalink / raw)
  To: Jann Horn
  Cc: Stefan Metzmacher, io-uring, Linux API Mailing List, Pavel Begunkov

On 1/29/20 6:08 PM, Jens Axboe wrote:
> On 1/29/20 10:34 AM, Jens Axboe wrote:
>> On 1/29/20 7:59 AM, Jann Horn wrote:
>>> On Tue, Jan 28, 2020 at 8:42 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>> On 1/28/20 11:04 AM, Jens Axboe wrote:
>>>>> On 1/28/20 10:19 AM, Jens Axboe wrote:
>>> [...]
>>>>>> #1 adds support for registering the personality of the invoking task,
>>>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to
>>>>>> just having one link, it doesn't support a chain of them.
>>> [...]
>>>> I didn't like it becoming a bit too complicated, both in terms of
>>>> implementation and use. And the fact that we'd have to jump through
>>>> hoops to make this work for a full chain.
>>>>
>>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY.
>>>> This makes it way easier to use. Same branch:
>>>>
>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
>>>>
>>>> I'd feel much better with this variant for 5.6.
>>>
>>> Some general feedback from an inspectability/debuggability perspective:
>>>
>>> At some point, it might be nice if you could add a .show_fdinfo
>>> handler to the io_uring_fops that makes it possible to get a rough
>>> overview over the state of the uring by reading /proc/$pid/fdinfo/$fd,
>>> just like e.g. eventfd (see eventfd_show_fdinfo()). It might be
>>> helpful for debugging to be able to see information about the fixed
>>> files and buffers that have been registered. Same for the
>>> personalities; that information might also be useful when someone is
>>> trying to figure out what privileges a running process actually has.
>>
>> Agree, that would be a very useful addition. I'll take a look at it.
> 
> Jann, how much info are you looking for? Here's a rough start, just
> shows the number of registered files and buffers, and lists the
> personalities registered. We could also dump the buffer info for
> each of them, and ditto for the files. Not sure how much verbosity
> is acceptable in fdinfo?
> 
> Here's the test app for personality:
> 
> # cat 3
> pos:	0
> flags:	02000002
> mnt_id:	14
> user-files: 0
> user-bufs: 0
> personalities:
> 	    1: uid=0/gid=0

Here's one that adds the registered buffers and files as well. So
essentially this shows any info on the registered parts.


diff --git a/fs/io_uring.c b/fs/io_uring.c
index c5ca84a305d3..e306691bc7a4 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6511,6 +6505,55 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 	return submitted ? submitted : ret;
 }
 
+static int io_uring_show_cred(int id, void *p, void *data)
+{
+	const struct cred *cred = p;
+	struct seq_file *m = data;
+
+	seq_printf(m, "%5d: uid=%u/gid=%u\n", id, cred->uid.val, cred->gid.val);
+	return 0;
+}
+
+static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m)
+{
+	int i;
+
+	mutex_lock(&ctx->uring_lock);
+	seq_printf(m, "user-files: %d\n", ctx->nr_user_files);
+	for (i = 0; i < ctx->nr_user_files; i++) {
+		struct fixed_file_table *table;
+		struct file *f;
+
+		table = &ctx->file_data->table[i >> IORING_FILE_TABLE_SHIFT];
+		f = table->files[i & IORING_FILE_TABLE_MASK];
+		if (f)
+			seq_printf(m, "%5d: %s\n", i, file_dentry(f)->d_iname);
+		else
+			seq_printf(m, "%5d: <none>\n", i);
+	}
+	seq_printf(m, "user-bufs: %d\n", ctx->nr_user_bufs);
+	for (i = 0; i < ctx->nr_user_bufs; i++) {
+		struct io_mapped_ubuf *buf = &ctx->user_bufs[i];
+
+		seq_printf(m, "%5d: 0x%llx/%lu\n", i, buf->ubuf, buf->len);
+	}
+	if (!idr_is_empty(&ctx->personality_idr)) {
+		seq_printf(m, "personalities:\n");
+		idr_for_each(&ctx->personality_idr, io_uring_show_cred, m);
+	}
+	mutex_unlock(&ctx->uring_lock);
+}
+
+static void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
+{
+	struct io_ring_ctx *ctx = f->private_data;
+
+	if (percpu_ref_tryget(&ctx->refs)) {
+		__io_uring_show_fdinfo(ctx, m);
+		percpu_ref_put(&ctx->refs);
+	}
+}
+
 static const struct file_operations io_uring_fops = {
 	.release	= io_uring_release,
 	.flush		= io_uring_flush,
@@ -6521,6 +6564,7 @@ static const struct file_operations io_uring_fops = {
 #endif
 	.poll		= io_uring_poll,
 	.fasync		= io_uring_fasync,
+	.show_fdinfo	= io_uring_show_fdinfo,
 };
 
 static int io_allocate_scq_urings(struct io_ring_ctx *ctx,

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-30  2:20                   ` Jens Axboe
@ 2020-01-30  3:18                     ` Jens Axboe
  0 siblings, 0 replies; 48+ messages in thread
From: Jens Axboe @ 2020-01-30  3:18 UTC (permalink / raw)
  To: Jann Horn
  Cc: Stefan Metzmacher, io-uring, Linux API Mailing List, Pavel Begunkov

On 1/29/20 7:20 PM, Jens Axboe wrote:
> On 1/29/20 6:08 PM, Jens Axboe wrote:
>> On 1/29/20 10:34 AM, Jens Axboe wrote:
>>> On 1/29/20 7:59 AM, Jann Horn wrote:
>>>> On Tue, Jan 28, 2020 at 8:42 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>> On 1/28/20 11:04 AM, Jens Axboe wrote:
>>>>>> On 1/28/20 10:19 AM, Jens Axboe wrote:
>>>> [...]
>>>>>>> #1 adds support for registering the personality of the invoking task,
>>>>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to
>>>>>>> just having one link, it doesn't support a chain of them.
>>>> [...]
>>>>> I didn't like it becoming a bit too complicated, both in terms of
>>>>> implementation and use. And the fact that we'd have to jump through
>>>>> hoops to make this work for a full chain.
>>>>>
>>>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY.
>>>>> This makes it way easier to use. Same branch:
>>>>>
>>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
>>>>>
>>>>> I'd feel much better with this variant for 5.6.
>>>>
>>>> Some general feedback from an inspectability/debuggability perspective:
>>>>
>>>> At some point, it might be nice if you could add a .show_fdinfo
>>>> handler to the io_uring_fops that makes it possible to get a rough
>>>> overview over the state of the uring by reading /proc/$pid/fdinfo/$fd,
>>>> just like e.g. eventfd (see eventfd_show_fdinfo()). It might be
>>>> helpful for debugging to be able to see information about the fixed
>>>> files and buffers that have been registered. Same for the
>>>> personalities; that information might also be useful when someone is
>>>> trying to figure out what privileges a running process actually has.
>>>
>>> Agree, that would be a very useful addition. I'll take a look at it.
>>
>> Jann, how much info are you looking for? Here's a rough start, just
>> shows the number of registered files and buffers, and lists the
>> personalities registered. We could also dump the buffer info for
>> each of them, and ditto for the files. Not sure how much verbosity
>> is acceptable in fdinfo?
>>
>> Here's the test app for personality:
>>
>> # cat 3
>> pos:	0
>> flags:	02000002
>> mnt_id:	14
>> user-files: 0
>> user-bufs: 0
>> personalities:
>> 	    1: uid=0/gid=0
> 
> Here's one that adds the registered buffers and files as well. So
> essentially this shows any info on the registered parts.

Output of this version, using t/io_uring from fio:

# cat /proc/338/fdinfo/4
pos:	0
flags:	02000002
mnt_id:	14
user-files: 1
    0: nvme0n1p2
user-bufs: 128
    0: 55ea44b52000/4096
    1: 55ea44b54000/4096
    2: 55ea44b56000/4096
    3: 55ea44b58000/4096
    4: 55ea44b5a000/4096
    5: 55ea44b5c000/4096
    6: 55ea44b5e000/4096
    7: 55ea44b60000/4096
    8: 55ea44b62000/4096
    9: 55ea44b64000/4096
   10: 55ea44b66000/4096
   11: 55ea44b68000/4096
   12: 55ea44b6a000/4096
   13: 55ea44b6c000/4096
   14: 55ea44b6e000/4096
   15: 55ea44b70000/4096
   16: 55ea44b72000/4096
   17: 55ea44b74000/4096
   18: 55ea44b76000/4096
   19: 55ea44b78000/4096
   20: 55ea44b7a000/4096
   21: 55ea44b7c000/4096
   22: 55ea44b7e000/4096
   23: 55ea44b80000/4096
   24: 55ea44b82000/4096
   25: 55ea44b84000/4096
   26: 55ea44b86000/4096
   27: 55ea44b88000/4096
   28: 55ea44b8a000/4096
   29: 55ea44b8c000/4096
   30: 55ea44b8e000/4096
   31: 55ea44b90000/4096
   32: 55ea44b92000/4096
   33: 55ea44b94000/4096
   34: 55ea44b96000/4096
   35: 55ea44b98000/4096
   36: 55ea44b9a000/4096
   37: 55ea44b9c000/4096
   38: 55ea44b9e000/4096
   39: 55ea44ba0000/4096
   40: 55ea44ba2000/4096
   41: 55ea44ba4000/4096
   42: 55ea44ba6000/4096
   43: 55ea44ba8000/4096
   44: 55ea44baa000/4096
   45: 55ea44bac000/4096
   46: 55ea44bae000/4096
   47: 55ea44bb0000/4096
   48: 55ea44bb2000/4096
   49: 55ea44bb4000/4096
   50: 55ea44bb6000/4096
   51: 55ea44bb8000/4096
   52: 55ea44bba000/4096
   53: 55ea44bbc000/4096
   54: 55ea44bbe000/4096
   55: 55ea44bc0000/4096
   56: 55ea44bc2000/4096
   57: 55ea44bc4000/4096
   58: 55ea44bc6000/4096
   59: 55ea44bc8000/4096
   60: 55ea44bca000/4096
   61: 55ea44bcc000/4096
   62: 55ea44bce000/4096
   63: 55ea44bd0000/4096
   64: 55ea44bd2000/4096
   65: 55ea44bd4000/4096
   66: 55ea44bd6000/4096
   67: 55ea44bd8000/4096
   68: 55ea44bda000/4096
   69: 55ea44bdc000/4096
   70: 55ea44bde000/4096
   71: 55ea44be0000/4096
   72: 55ea44be2000/4096
   73: 55ea44be4000/4096
   74: 55ea44be6000/4096
   75: 55ea44be8000/4096
   76: 55ea44bea000/4096
   77: 55ea44bec000/4096
   78: 55ea44bee000/4096
   79: 55ea44bf0000/4096
   80: 55ea44bf2000/4096
   81: 55ea44bf4000/4096
   82: 55ea44bf6000/4096
   83: 55ea44bf8000/4096
   84: 55ea44bfa000/4096
   85: 55ea44bfc000/4096
   86: 55ea44bfe000/4096
   87: 55ea44c00000/4096
   88: 55ea44c02000/4096
   89: 55ea44c04000/4096
   90: 55ea44c06000/4096
   91: 55ea44c08000/4096
   92: 55ea44c0a000/4096
   93: 55ea44c0c000/4096
   94: 55ea44c0e000/4096
   95: 55ea44c10000/4096
   96: 55ea44c12000/4096
   97: 55ea44c14000/4096
   98: 55ea44c16000/4096
   99: 55ea44c18000/4096
  100: 55ea44c1a000/4096
  101: 55ea44c1c000/4096
  102: 55ea44c1e000/4096
  103: 55ea44c20000/4096
  104: 55ea44c22000/4096
  105: 55ea44c24000/4096
  106: 55ea44c26000/4096
  107: 55ea44c28000/4096
  108: 55ea44c2a000/4096
  109: 55ea44c2c000/4096
  110: 55ea44c2e000/4096
  111: 55ea44c30000/4096
  112: 55ea44c32000/4096
  113: 55ea44c34000/4096
  114: 55ea44c36000/4096
  115: 55ea44c38000/4096
  116: 55ea44c3a000/4096
  117: 55ea44c3c000/4096
  118: 55ea44c3e000/4096
  119: 55ea44c40000/4096
  120: 55ea44c42000/4096
  121: 55ea44c44000/4096
  122: 55ea44c46000/4096
  123: 55ea44c48000/4096
  124: 55ea44c4a000/4096
  125: 55ea44c4c000/4096
  126: 55ea44c4e000/4096
  127: 55ea44c50000/4096

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-30  1:08                 ` Jens Axboe
  2020-01-30  2:20                   ` Jens Axboe
@ 2020-01-30  6:53                   ` Stefan Metzmacher
  2020-01-30 10:11                   ` Jann Horn
  2 siblings, 0 replies; 48+ messages in thread
From: Stefan Metzmacher @ 2020-01-30  6:53 UTC (permalink / raw)
  To: Jens Axboe, Jann Horn; +Cc: io-uring, Linux API Mailing List, Pavel Begunkov


[-- Attachment #1.1: Type: text/plain, Size: 822 bytes --]

Hi Jens,

> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index c5ca84a305d3..0b2c7d800297 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -6511,6 +6505,45 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
>  	return submitted ? submitted : ret;
>  }
>  
> +struct ring_show_idr {
> +	struct io_ring_ctx *ctx;
> +	struct seq_file *m;
> +};
> +
> +static int io_uring_show_cred(int id, void *p, void *data)
> +{
> +	struct ring_show_idr *r = data;
> +	const struct cred *cred = p;
> +
> +	seq_printf(r->m, "\t%5d: uid=%u/gid=%u\n", id, cred->uid.val,
> +						cred->gid.val);
> +	return 0;
> +}

I think we should print similar information as task_state(),
there we have:

Uid:    1000   1000   1000   1000
Gid:    1000   1000   1000   1000
Groups: 1 2 3 4 5 1000


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-30  1:08                 ` Jens Axboe
  2020-01-30  2:20                   ` Jens Axboe
  2020-01-30  6:53                   ` Stefan Metzmacher
@ 2020-01-30 10:11                   ` Jann Horn
  2020-01-30 10:26                     ` Christian Brauner
  2 siblings, 1 reply; 48+ messages in thread
From: Jann Horn @ 2020-01-30 10:11 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Stefan Metzmacher, io-uring, Linux API Mailing List, Pavel Begunkov

On Thu, Jan 30, 2020 at 2:08 AM Jens Axboe <axboe@kernel.dk> wrote:
> On 1/29/20 10:34 AM, Jens Axboe wrote:
> > On 1/29/20 7:59 AM, Jann Horn wrote:
> >> On Tue, Jan 28, 2020 at 8:42 PM Jens Axboe <axboe@kernel.dk> wrote:
> >>> On 1/28/20 11:04 AM, Jens Axboe wrote:
> >>>> On 1/28/20 10:19 AM, Jens Axboe wrote:
> >> [...]
> >>>>> #1 adds support for registering the personality of the invoking task,
> >>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to
> >>>>> just having one link, it doesn't support a chain of them.
> >> [...]
> >>> I didn't like it becoming a bit too complicated, both in terms of
> >>> implementation and use. And the fact that we'd have to jump through
> >>> hoops to make this work for a full chain.
> >>>
> >>> So I punted and just added sqe->personality and IOSQE_PERSONALITY.
> >>> This makes it way easier to use. Same branch:
> >>>
> >>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
> >>>
> >>> I'd feel much better with this variant for 5.6.
> >>
> >> Some general feedback from an inspectability/debuggability perspective:
> >>
> >> At some point, it might be nice if you could add a .show_fdinfo
> >> handler to the io_uring_fops that makes it possible to get a rough
> >> overview over the state of the uring by reading /proc/$pid/fdinfo/$fd,
> >> just like e.g. eventfd (see eventfd_show_fdinfo()). It might be
> >> helpful for debugging to be able to see information about the fixed
> >> files and buffers that have been registered. Same for the
> >> personalities; that information might also be useful when someone is
> >> trying to figure out what privileges a running process actually has.
> >
> > Agree, that would be a very useful addition. I'll take a look at it.
>
> Jann, how much info are you looking for? Here's a rough start, just
> shows the number of registered files and buffers, and lists the
> personalities registered. We could also dump the buffer info for
> each of them, and ditto for the files. Not sure how much verbosity
> is acceptable in fdinfo?

At the moment, I personally am just interested in this from the
perspective of being able to audit the state of personalities, to make
important information about the security state of processes visible.

Good point about verbosity in fdinfo - I'm not sure about that myself either.

> Here's the test app for personality:

Oh, that was quick...

> # cat 3
> pos:    0
> flags:  02000002
> mnt_id: 14
> user-files: 0
> user-bufs: 0
> personalities:
>             1: uid=0/gid=0
>
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index c5ca84a305d3..0b2c7d800297 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -6511,6 +6505,45 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
>         return submitted ? submitted : ret;
>  }
>
> +struct ring_show_idr {
> +       struct io_ring_ctx *ctx;
> +       struct seq_file *m;
> +};
> +
> +static int io_uring_show_cred(int id, void *p, void *data)
> +{
> +       struct ring_show_idr *r = data;
> +       const struct cred *cred = p;
> +
> +       seq_printf(r->m, "\t%5d: uid=%u/gid=%u\n", id, cred->uid.val,
> +                                               cred->gid.val);

As Stefan said, the ->uid and ->gid aren't very useful, since when a
process switches UIDs for accessing things in the filesystem, it
probably only changes its EUID and FSUID, not its RUID.
I think what's particularly relevant for uring would be the ->fsuid
and the ->fsgid along with ->cap_effective; and perhaps for some
operations also the ->euid and ->egid. The real UID/GID aren't really
relevant when performing normal filesystem operations and such.

> +       return 0;
> +}

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-30 10:11                   ` Jann Horn
@ 2020-01-30 10:26                     ` Christian Brauner
  2020-01-30 14:11                       ` Jens Axboe
  0 siblings, 1 reply; 48+ messages in thread
From: Christian Brauner @ 2020-01-30 10:26 UTC (permalink / raw)
  To: Jann Horn
  Cc: Jens Axboe, Stefan Metzmacher, io-uring, Linux API Mailing List,
	Pavel Begunkov

On Thu, Jan 30, 2020 at 11:11:58AM +0100, Jann Horn wrote:
> On Thu, Jan 30, 2020 at 2:08 AM Jens Axboe <axboe@kernel.dk> wrote:
> > On 1/29/20 10:34 AM, Jens Axboe wrote:
> > > On 1/29/20 7:59 AM, Jann Horn wrote:
> > >> On Tue, Jan 28, 2020 at 8:42 PM Jens Axboe <axboe@kernel.dk> wrote:
> > >>> On 1/28/20 11:04 AM, Jens Axboe wrote:
> > >>>> On 1/28/20 10:19 AM, Jens Axboe wrote:
> > >> [...]
> > >>>>> #1 adds support for registering the personality of the invoking task,
> > >>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to
> > >>>>> just having one link, it doesn't support a chain of them.
> > >> [...]
> > >>> I didn't like it becoming a bit too complicated, both in terms of
> > >>> implementation and use. And the fact that we'd have to jump through
> > >>> hoops to make this work for a full chain.
> > >>>
> > >>> So I punted and just added sqe->personality and IOSQE_PERSONALITY.
> > >>> This makes it way easier to use. Same branch:
> > >>>
> > >>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
> > >>>
> > >>> I'd feel much better with this variant for 5.6.
> > >>
> > >> Some general feedback from an inspectability/debuggability perspective:
> > >>
> > >> At some point, it might be nice if you could add a .show_fdinfo
> > >> handler to the io_uring_fops that makes it possible to get a rough
> > >> overview over the state of the uring by reading /proc/$pid/fdinfo/$fd,
> > >> just like e.g. eventfd (see eventfd_show_fdinfo()). It might be
> > >> helpful for debugging to be able to see information about the fixed
> > >> files and buffers that have been registered. Same for the
> > >> personalities; that information might also be useful when someone is
> > >> trying to figure out what privileges a running process actually has.
> > >
> > > Agree, that would be a very useful addition. I'll take a look at it.
> >
> > Jann, how much info are you looking for? Here's a rough start, just
> > shows the number of registered files and buffers, and lists the
> > personalities registered. We could also dump the buffer info for
> > each of them, and ditto for the files. Not sure how much verbosity
> > is acceptable in fdinfo?
> 
> At the moment, I personally am just interested in this from the
> perspective of being able to audit the state of personalities, to make
> important information about the security state of processes visible.
> 
> Good point about verbosity in fdinfo - I'm not sure about that myself either.
> 
> > Here's the test app for personality:
> 
> Oh, that was quick...
> 
> > # cat 3
> > pos:    0
> > flags:  02000002
> > mnt_id: 14
> > user-files: 0
> > user-bufs: 0
> > personalities:
> >             1: uid=0/gid=0
> >
> >
> > diff --git a/fs/io_uring.c b/fs/io_uring.c
> > index c5ca84a305d3..0b2c7d800297 100644
> > --- a/fs/io_uring.c
> > +++ b/fs/io_uring.c
> > @@ -6511,6 +6505,45 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
> >         return submitted ? submitted : ret;
> >  }
> >
> > +struct ring_show_idr {
> > +       struct io_ring_ctx *ctx;
> > +       struct seq_file *m;
> > +};
> > +
> > +static int io_uring_show_cred(int id, void *p, void *data)
> > +{
> > +       struct ring_show_idr *r = data;
> > +       const struct cred *cred = p;
> > +
> > +       seq_printf(r->m, "\t%5d: uid=%u/gid=%u\n", id, cred->uid.val,
> > +                                               cred->gid.val);
> 
> As Stefan said, the ->uid and ->gid aren't very useful, since when a
> process switches UIDs for accessing things in the filesystem, it
> probably only changes its EUID and FSUID, not its RUID.
> I think what's particularly relevant for uring would be the ->fsuid
> and the ->fsgid along with ->cap_effective; and perhaps for some
> operations also the ->euid and ->egid. The real UID/GID aren't really
> relevant when performing normal filesystem operations and such.

This should probably just use the same format that is found in
/proc/<pid>/status to make it easy for tools to use the same parsing
logic and for the sake of consistency. We've adapted the same format for
pidfds. So that would mean:

Uid:	1000	1000	1000	1000
Gid:	1000	1000	1000	1000

Which would be: Real, effective, saved set, and filesystem {G,U}IDs

And CapEff in /proc/<pid>/status has the format:
CapEff:	0000000000000000

Christian

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-30 10:26                     ` Christian Brauner
@ 2020-01-30 14:11                       ` Jens Axboe
  2020-01-30 14:47                         ` Stefan Metzmacher
  2020-01-30 15:13                         ` Christian Brauner
  0 siblings, 2 replies; 48+ messages in thread
From: Jens Axboe @ 2020-01-30 14:11 UTC (permalink / raw)
  To: Christian Brauner, Jann Horn
  Cc: Stefan Metzmacher, io-uring, Linux API Mailing List, Pavel Begunkov

On 1/30/20 3:26 AM, Christian Brauner wrote:
> On Thu, Jan 30, 2020 at 11:11:58AM +0100, Jann Horn wrote:
>> On Thu, Jan 30, 2020 at 2:08 AM Jens Axboe <axboe@kernel.dk> wrote:
>>> On 1/29/20 10:34 AM, Jens Axboe wrote:
>>>> On 1/29/20 7:59 AM, Jann Horn wrote:
>>>>> On Tue, Jan 28, 2020 at 8:42 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>> On 1/28/20 11:04 AM, Jens Axboe wrote:
>>>>>>> On 1/28/20 10:19 AM, Jens Axboe wrote:
>>>>> [...]
>>>>>>>> #1 adds support for registering the personality of the invoking task,
>>>>>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to
>>>>>>>> just having one link, it doesn't support a chain of them.
>>>>> [...]
>>>>>> I didn't like it becoming a bit too complicated, both in terms of
>>>>>> implementation and use. And the fact that we'd have to jump through
>>>>>> hoops to make this work for a full chain.
>>>>>>
>>>>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY.
>>>>>> This makes it way easier to use. Same branch:
>>>>>>
>>>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
>>>>>>
>>>>>> I'd feel much better with this variant for 5.6.
>>>>>
>>>>> Some general feedback from an inspectability/debuggability perspective:
>>>>>
>>>>> At some point, it might be nice if you could add a .show_fdinfo
>>>>> handler to the io_uring_fops that makes it possible to get a rough
>>>>> overview over the state of the uring by reading /proc/$pid/fdinfo/$fd,
>>>>> just like e.g. eventfd (see eventfd_show_fdinfo()). It might be
>>>>> helpful for debugging to be able to see information about the fixed
>>>>> files and buffers that have been registered. Same for the
>>>>> personalities; that information might also be useful when someone is
>>>>> trying to figure out what privileges a running process actually has.
>>>>
>>>> Agree, that would be a very useful addition. I'll take a look at it.
>>>
>>> Jann, how much info are you looking for? Here's a rough start, just
>>> shows the number of registered files and buffers, and lists the
>>> personalities registered. We could also dump the buffer info for
>>> each of them, and ditto for the files. Not sure how much verbosity
>>> is acceptable in fdinfo?
>>
>> At the moment, I personally am just interested in this from the
>> perspective of being able to audit the state of personalities, to make
>> important information about the security state of processes visible.
>>
>> Good point about verbosity in fdinfo - I'm not sure about that myself either.
>>
>>> Here's the test app for personality:
>>
>> Oh, that was quick...
>>
>>> # cat 3
>>> pos:    0
>>> flags:  02000002
>>> mnt_id: 14
>>> user-files: 0
>>> user-bufs: 0
>>> personalities:
>>>             1: uid=0/gid=0
>>>
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index c5ca84a305d3..0b2c7d800297 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -6511,6 +6505,45 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
>>>         return submitted ? submitted : ret;
>>>  }
>>>
>>> +struct ring_show_idr {
>>> +       struct io_ring_ctx *ctx;
>>> +       struct seq_file *m;
>>> +};
>>> +
>>> +static int io_uring_show_cred(int id, void *p, void *data)
>>> +{
>>> +       struct ring_show_idr *r = data;
>>> +       const struct cred *cred = p;
>>> +
>>> +       seq_printf(r->m, "\t%5d: uid=%u/gid=%u\n", id, cred->uid.val,
>>> +                                               cred->gid.val);
>>
>> As Stefan said, the ->uid and ->gid aren't very useful, since when a
>> process switches UIDs for accessing things in the filesystem, it
>> probably only changes its EUID and FSUID, not its RUID.
>> I think what's particularly relevant for uring would be the ->fsuid
>> and the ->fsgid along with ->cap_effective; and perhaps for some
>> operations also the ->euid and ->egid. The real UID/GID aren't really
>> relevant when performing normal filesystem operations and such.
> 
> This should probably just use the same format that is found in
> /proc/<pid>/status to make it easy for tools to use the same parsing
> logic and for the sake of consistency. We've adapted the same format for
> pidfds. So that would mean:
> 
> Uid:	1000	1000	1000	1000
> Gid:	1000	1000	1000	1000
> 
> Which would be: Real, effective, saved set, and filesystem {G,U}IDs
> 
> And CapEff in /proc/<pid>/status has the format:
> CapEff:	0000000000000000

I agree, consistency is good. I've added this, and also changed the
naming to be CamelCase, which is seems like most of them are. Now it
looks like this:

pos:	0
flags:	02000002
mnt_id:	14
UserFiles:     0
UserBufs:     0
Personalities:
    1
	Uid:	0		0		0		0
	Gid:	0		0		0		0
	Groups:	0
	CapEff:	0000003fffffffff

for a single personality registered (root). I have to indent it an extra
tab to display each personality.


diff --git a/fs/io_uring.c b/fs/io_uring.c
index c5ca84a305d3..8b2411542f3e 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6511,6 +6505,79 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 	return submitted ? submitted : ret;
 }
 
+static int io_uring_show_cred(int id, void *p, void *data)
+{
+	const struct cred *cred = p;
+	struct seq_file *m = data;
+	struct user_namespace *uns = seq_user_ns(m);
+	struct group_info *gi;
+	kernel_cap_t cap;
+	unsigned __capi;
+	int g;
+
+	seq_printf(m, "%5d\n", id);
+	seq_put_decimal_ull(m, "\tUid:\t", from_kuid_munged(uns, cred->uid));
+	seq_put_decimal_ull(m, "\t\t", from_kuid_munged(uns, cred->euid));
+	seq_put_decimal_ull(m, "\t\t", from_kuid_munged(uns, cred->suid));
+	seq_put_decimal_ull(m, "\t\t", from_kuid_munged(uns, cred->fsuid));
+	seq_put_decimal_ull(m, "\n\tGid:\t", from_kgid_munged(uns, cred->gid));
+	seq_put_decimal_ull(m, "\t\t", from_kgid_munged(uns, cred->egid));
+	seq_put_decimal_ull(m, "\t\t", from_kgid_munged(uns, cred->sgid));
+	seq_put_decimal_ull(m, "\t\t", from_kgid_munged(uns, cred->fsgid));
+	seq_puts(m, "\n\tGroups:\t");
+	gi = cred->group_info;
+	for (g = 0; g < gi->ngroups; g++) {
+		seq_put_decimal_ull(m, g ? " " : "",
+					from_kgid_munged(uns, gi->gid[g]));
+	}
+	seq_puts(m, "\n\tCapEff:\t");
+	cap = cred->cap_effective;
+	CAP_FOR_EACH_U32(__capi)
+		seq_put_hex_ll(m, NULL, cap.cap[CAP_LAST_U32 - __capi], 8);
+	seq_putc(m, '\n');
+	return 0;
+}
+
+static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m)
+{
+	int i;
+
+	mutex_lock(&ctx->uring_lock);
+	seq_printf(m, "UserFiles: %5u\n", ctx->nr_user_files);
+	for (i = 0; i < ctx->nr_user_files; i++) {
+		struct fixed_file_table *table;
+		struct file *f;
+
+		table = &ctx->file_data->table[i >> IORING_FILE_TABLE_SHIFT];
+		f = table->files[i & IORING_FILE_TABLE_MASK];
+		if (f)
+			seq_printf(m, "%5u: %s\n", i, file_dentry(f)->d_iname);
+		else
+			seq_printf(m, "%5u: <none>\n", i);
+	}
+	seq_printf(m, "UserBufs: %5u\n", ctx->nr_user_bufs);
+	for (i = 0; i < ctx->nr_user_bufs; i++) {
+		struct io_mapped_ubuf *buf = &ctx->user_bufs[i];
+
+		seq_printf(m, "%5u: 0x%llx/%lu\n", i, buf->ubuf, buf->len);
+	}
+	if (!idr_is_empty(&ctx->personality_idr)) {
+		seq_printf(m, "Personalities:\n");
+		idr_for_each(&ctx->personality_idr, io_uring_show_cred, m);
+	}
+	mutex_unlock(&ctx->uring_lock);
+}
+
+static void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
+{
+	struct io_ring_ctx *ctx = f->private_data;
+
+	if (percpu_ref_tryget(&ctx->refs)) {
+		__io_uring_show_fdinfo(ctx, m);
+		percpu_ref_put(&ctx->refs);
+	}
+}
+
 static const struct file_operations io_uring_fops = {
 	.release	= io_uring_release,
 	.flush		= io_uring_flush,
@@ -6521,6 +6588,7 @@ static const struct file_operations io_uring_fops = {
 #endif
 	.poll		= io_uring_poll,
 	.fasync		= io_uring_fasync,
+	.show_fdinfo	= io_uring_show_fdinfo,
 };
 
 static int io_allocate_scq_urings(struct io_ring_ctx *ctx,

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-30 14:11                       ` Jens Axboe
@ 2020-01-30 14:47                         ` Stefan Metzmacher
  2020-01-30 15:34                           ` Jens Axboe
  2020-01-30 15:13                         ` Christian Brauner
  1 sibling, 1 reply; 48+ messages in thread
From: Stefan Metzmacher @ 2020-01-30 14:47 UTC (permalink / raw)
  To: Jens Axboe, Christian Brauner, Jann Horn
  Cc: io-uring, Linux API Mailing List, Pavel Begunkov


[-- Attachment #1.1: Type: text/plain, Size: 5697 bytes --]

Am 30.01.20 um 15:11 schrieb Jens Axboe:
> On 1/30/20 3:26 AM, Christian Brauner wrote:
>> On Thu, Jan 30, 2020 at 11:11:58AM +0100, Jann Horn wrote:
>>> On Thu, Jan 30, 2020 at 2:08 AM Jens Axboe <axboe@kernel.dk> wrote:
>>>> On 1/29/20 10:34 AM, Jens Axboe wrote:
>>>>> On 1/29/20 7:59 AM, Jann Horn wrote:
>>>>>> On Tue, Jan 28, 2020 at 8:42 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>>> On 1/28/20 11:04 AM, Jens Axboe wrote:
>>>>>>>> On 1/28/20 10:19 AM, Jens Axboe wrote:
>>>>>> [...]
>>>>>>>>> #1 adds support for registering the personality of the invoking task,
>>>>>>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to
>>>>>>>>> just having one link, it doesn't support a chain of them.
>>>>>> [...]
>>>>>>> I didn't like it becoming a bit too complicated, both in terms of
>>>>>>> implementation and use. And the fact that we'd have to jump through
>>>>>>> hoops to make this work for a full chain.
>>>>>>>
>>>>>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY.
>>>>>>> This makes it way easier to use. Same branch:
>>>>>>>
>>>>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
>>>>>>>
>>>>>>> I'd feel much better with this variant for 5.6.
>>>>>>
>>>>>> Some general feedback from an inspectability/debuggability perspective:
>>>>>>
>>>>>> At some point, it might be nice if you could add a .show_fdinfo
>>>>>> handler to the io_uring_fops that makes it possible to get a rough
>>>>>> overview over the state of the uring by reading /proc/$pid/fdinfo/$fd,
>>>>>> just like e.g. eventfd (see eventfd_show_fdinfo()). It might be
>>>>>> helpful for debugging to be able to see information about the fixed
>>>>>> files and buffers that have been registered. Same for the
>>>>>> personalities; that information might also be useful when someone is
>>>>>> trying to figure out what privileges a running process actually has.
>>>>>
>>>>> Agree, that would be a very useful addition. I'll take a look at it.
>>>>
>>>> Jann, how much info are you looking for? Here's a rough start, just
>>>> shows the number of registered files and buffers, and lists the
>>>> personalities registered. We could also dump the buffer info for
>>>> each of them, and ditto for the files. Not sure how much verbosity
>>>> is acceptable in fdinfo?
>>>
>>> At the moment, I personally am just interested in this from the
>>> perspective of being able to audit the state of personalities, to make
>>> important information about the security state of processes visible.
>>>
>>> Good point about verbosity in fdinfo - I'm not sure about that myself either.
>>>
>>>> Here's the test app for personality:
>>>
>>> Oh, that was quick...
>>>
>>>> # cat 3
>>>> pos:    0
>>>> flags:  02000002
>>>> mnt_id: 14
>>>> user-files: 0
>>>> user-bufs: 0
>>>> personalities:
>>>>             1: uid=0/gid=0
>>>>
>>>>
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index c5ca84a305d3..0b2c7d800297 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -6511,6 +6505,45 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
>>>>         return submitted ? submitted : ret;
>>>>  }
>>>>
>>>> +struct ring_show_idr {
>>>> +       struct io_ring_ctx *ctx;
>>>> +       struct seq_file *m;
>>>> +};
>>>> +
>>>> +static int io_uring_show_cred(int id, void *p, void *data)
>>>> +{
>>>> +       struct ring_show_idr *r = data;
>>>> +       const struct cred *cred = p;
>>>> +
>>>> +       seq_printf(r->m, "\t%5d: uid=%u/gid=%u\n", id, cred->uid.val,
>>>> +                                               cred->gid.val);
>>>
>>> As Stefan said, the ->uid and ->gid aren't very useful, since when a
>>> process switches UIDs for accessing things in the filesystem, it
>>> probably only changes its EUID and FSUID, not its RUID.
>>> I think what's particularly relevant for uring would be the ->fsuid
>>> and the ->fsgid along with ->cap_effective; and perhaps for some
>>> operations also the ->euid and ->egid. The real UID/GID aren't really
>>> relevant when performing normal filesystem operations and such.
>>
>> This should probably just use the same format that is found in
>> /proc/<pid>/status to make it easy for tools to use the same parsing
>> logic and for the sake of consistency. We've adapted the same format for
>> pidfds. So that would mean:
>>
>> Uid:	1000	1000	1000	1000
>> Gid:	1000	1000	1000	1000
>>
>> Which would be: Real, effective, saved set, and filesystem {G,U}IDs
>>
>> And CapEff in /proc/<pid>/status has the format:
>> CapEff:	0000000000000000
> 
> I agree, consistency is good. I've added this, and also changed the
> naming to be CamelCase, which is seems like most of them are. Now it
> looks like this:
> 
> pos:	0
> flags:	02000002
> mnt_id:	14
> UserFiles:     0
> UserBufs:     0
> Personalities:
>     1
> 	Uid:	0		0		0		0
> 	Gid:	0		0		0		0
> 	Groups:	0
> 	CapEff:	0000003fffffffff
> 
> for a single personality registered (root). I have to indent it an extra
> tab to display each personality.

That looks good.

Maybe also print some details of struct io_ring_ctx,
flags and the ring sizes, ctx->cred.

Maybe details for io_wq and sqo_thread.

Maybe pending requests?
I'm not sure about how io_wq threads work in detail.
Is it possible that a large number of blocking request
(against an external harddisk with disconnected cable)
to block other blocking requests to a working ssd?
It would be good to diagnose such situations from
the output.

How is this supposed to be ABI-wise? Is it possible to change
the output in later kernel versions?

metze


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-30 14:11                       ` Jens Axboe
  2020-01-30 14:47                         ` Stefan Metzmacher
@ 2020-01-30 15:13                         ` Christian Brauner
  2020-01-30 15:29                           ` Jens Axboe
  1 sibling, 1 reply; 48+ messages in thread
From: Christian Brauner @ 2020-01-30 15:13 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Jann Horn, Stefan Metzmacher, io-uring, Linux API Mailing List,
	Pavel Begunkov

On Thu, Jan 30, 2020 at 07:11:08AM -0700, Jens Axboe wrote:
> On 1/30/20 3:26 AM, Christian Brauner wrote:
> > On Thu, Jan 30, 2020 at 11:11:58AM +0100, Jann Horn wrote:
> >> On Thu, Jan 30, 2020 at 2:08 AM Jens Axboe <axboe@kernel.dk> wrote:
> >>> On 1/29/20 10:34 AM, Jens Axboe wrote:
> >>>> On 1/29/20 7:59 AM, Jann Horn wrote:
> >>>>> On Tue, Jan 28, 2020 at 8:42 PM Jens Axboe <axboe@kernel.dk> wrote:
> >>>>>> On 1/28/20 11:04 AM, Jens Axboe wrote:
> >>>>>>> On 1/28/20 10:19 AM, Jens Axboe wrote:
> >>>>> [...]
> >>>>>>>> #1 adds support for registering the personality of the invoking task,
> >>>>>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to
> >>>>>>>> just having one link, it doesn't support a chain of them.
> >>>>> [...]
> >>>>>> I didn't like it becoming a bit too complicated, both in terms of
> >>>>>> implementation and use. And the fact that we'd have to jump through
> >>>>>> hoops to make this work for a full chain.
> >>>>>>
> >>>>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY.
> >>>>>> This makes it way easier to use. Same branch:
> >>>>>>
> >>>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
> >>>>>>
> >>>>>> I'd feel much better with this variant for 5.6.
> >>>>>
> >>>>> Some general feedback from an inspectability/debuggability perspective:
> >>>>>
> >>>>> At some point, it might be nice if you could add a .show_fdinfo
> >>>>> handler to the io_uring_fops that makes it possible to get a rough
> >>>>> overview over the state of the uring by reading /proc/$pid/fdinfo/$fd,
> >>>>> just like e.g. eventfd (see eventfd_show_fdinfo()). It might be
> >>>>> helpful for debugging to be able to see information about the fixed
> >>>>> files and buffers that have been registered. Same for the
> >>>>> personalities; that information might also be useful when someone is
> >>>>> trying to figure out what privileges a running process actually has.
> >>>>
> >>>> Agree, that would be a very useful addition. I'll take a look at it.
> >>>
> >>> Jann, how much info are you looking for? Here's a rough start, just
> >>> shows the number of registered files and buffers, and lists the
> >>> personalities registered. We could also dump the buffer info for
> >>> each of them, and ditto for the files. Not sure how much verbosity
> >>> is acceptable in fdinfo?
> >>
> >> At the moment, I personally am just interested in this from the
> >> perspective of being able to audit the state of personalities, to make
> >> important information about the security state of processes visible.
> >>
> >> Good point about verbosity in fdinfo - I'm not sure about that myself either.

Afaik, there's no rule here. I would expect that it shouldn't exceed
4096kb just because that is the limit that seems to be enforced for
writes to proc files atm; other than that it should be the wild west.
The fdinfo files are mostly interesting for anon_inode fds imho and the
ones that come to mind right now simply don't have a lot of information
to provide:

eventfd
timerfd
seccomp_notify_fd

Potentially, the mount fds from David could be extended in the future.

(Side note: One thing that comes to mind is that we should probably
enforce^Wdocument that all fdinfo files use CamelCase?)

Christian

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-30 15:13                         ` Christian Brauner
@ 2020-01-30 15:29                           ` Jens Axboe
  0 siblings, 0 replies; 48+ messages in thread
From: Jens Axboe @ 2020-01-30 15:29 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jann Horn, Stefan Metzmacher, io-uring, Linux API Mailing List,
	Pavel Begunkov

On 1/30/20 8:13 AM, Christian Brauner wrote:
> On Thu, Jan 30, 2020 at 07:11:08AM -0700, Jens Axboe wrote:
>> On 1/30/20 3:26 AM, Christian Brauner wrote:
>>> On Thu, Jan 30, 2020 at 11:11:58AM +0100, Jann Horn wrote:
>>>> On Thu, Jan 30, 2020 at 2:08 AM Jens Axboe <axboe@kernel.dk> wrote:
>>>>> On 1/29/20 10:34 AM, Jens Axboe wrote:
>>>>>> On 1/29/20 7:59 AM, Jann Horn wrote:
>>>>>>> On Tue, Jan 28, 2020 at 8:42 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>>>> On 1/28/20 11:04 AM, Jens Axboe wrote:
>>>>>>>>> On 1/28/20 10:19 AM, Jens Axboe wrote:
>>>>>>> [...]
>>>>>>>>>> #1 adds support for registering the personality of the invoking task,
>>>>>>>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to
>>>>>>>>>> just having one link, it doesn't support a chain of them.
>>>>>>> [...]
>>>>>>>> I didn't like it becoming a bit too complicated, both in terms of
>>>>>>>> implementation and use. And the fact that we'd have to jump through
>>>>>>>> hoops to make this work for a full chain.
>>>>>>>>
>>>>>>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY.
>>>>>>>> This makes it way easier to use. Same branch:
>>>>>>>>
>>>>>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
>>>>>>>>
>>>>>>>> I'd feel much better with this variant for 5.6.
>>>>>>>
>>>>>>> Some general feedback from an inspectability/debuggability perspective:
>>>>>>>
>>>>>>> At some point, it might be nice if you could add a .show_fdinfo
>>>>>>> handler to the io_uring_fops that makes it possible to get a rough
>>>>>>> overview over the state of the uring by reading /proc/$pid/fdinfo/$fd,
>>>>>>> just like e.g. eventfd (see eventfd_show_fdinfo()). It might be
>>>>>>> helpful for debugging to be able to see information about the fixed
>>>>>>> files and buffers that have been registered. Same for the
>>>>>>> personalities; that information might also be useful when someone is
>>>>>>> trying to figure out what privileges a running process actually has.
>>>>>>
>>>>>> Agree, that would be a very useful addition. I'll take a look at it.
>>>>>
>>>>> Jann, how much info are you looking for? Here's a rough start, just
>>>>> shows the number of registered files and buffers, and lists the
>>>>> personalities registered. We could also dump the buffer info for
>>>>> each of them, and ditto for the files. Not sure how much verbosity
>>>>> is acceptable in fdinfo?
>>>>
>>>> At the moment, I personally am just interested in this from the
>>>> perspective of being able to audit the state of personalities, to make
>>>> important information about the security state of processes visible.
>>>>
>>>> Good point about verbosity in fdinfo - I'm not sure about that myself either.
> 
> Afaik, there's no rule here. I would expect that it shouldn't exceed
> 4096kb just because that is the limit that seems to be enforced for
> writes to proc files atm; other than that it should be the wild west.
> The fdinfo files are mostly interesting for anon_inode fds imho and the
> ones that come to mind right now simply don't have a lot of information
> to provide:
> 
> eventfd
> timerfd
> seccomp_notify_fd
> 
> Potentially, the mount fds from David could be extended in the future.

4MB is huge, I'd not get anywhere near that. So I'd say the current
format is probably fine. I honed it a little bit to be prettier, looks
good to me. I'll send it out for review.

> (Side note: One thing that comes to mind is that we should probably
> enforce^Wdocument that all fdinfo files use CamelCase?)

Fine with me, seems to already be the norm, would be nice to have it
documented.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()?
  2020-01-30 14:47                         ` Stefan Metzmacher
@ 2020-01-30 15:34                           ` Jens Axboe
  0 siblings, 0 replies; 48+ messages in thread
From: Jens Axboe @ 2020-01-30 15:34 UTC (permalink / raw)
  To: Stefan Metzmacher, Christian Brauner, Jann Horn
  Cc: io-uring, Linux API Mailing List, Pavel Begunkov

On 1/30/20 7:47 AM, Stefan Metzmacher wrote:
> Am 30.01.20 um 15:11 schrieb Jens Axboe:
>> On 1/30/20 3:26 AM, Christian Brauner wrote:
>>> On Thu, Jan 30, 2020 at 11:11:58AM +0100, Jann Horn wrote:
>>>> On Thu, Jan 30, 2020 at 2:08 AM Jens Axboe <axboe@kernel.dk> wrote:
>>>>> On 1/29/20 10:34 AM, Jens Axboe wrote:
>>>>>> On 1/29/20 7:59 AM, Jann Horn wrote:
>>>>>>> On Tue, Jan 28, 2020 at 8:42 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>>>> On 1/28/20 11:04 AM, Jens Axboe wrote:
>>>>>>>>> On 1/28/20 10:19 AM, Jens Axboe wrote:
>>>>>>> [...]
>>>>>>>>>> #1 adds support for registering the personality of the invoking task,
>>>>>>>>>> and #2 adds support for IORING_OP_USE_CREDS. Right now it's limited to
>>>>>>>>>> just having one link, it doesn't support a chain of them.
>>>>>>> [...]
>>>>>>>> I didn't like it becoming a bit too complicated, both in terms of
>>>>>>>> implementation and use. And the fact that we'd have to jump through
>>>>>>>> hoops to make this work for a full chain.
>>>>>>>>
>>>>>>>> So I punted and just added sqe->personality and IOSQE_PERSONALITY.
>>>>>>>> This makes it way easier to use. Same branch:
>>>>>>>>
>>>>>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.6/io_uring-vfs-creds
>>>>>>>>
>>>>>>>> I'd feel much better with this variant for 5.6.
>>>>>>>
>>>>>>> Some general feedback from an inspectability/debuggability perspective:
>>>>>>>
>>>>>>> At some point, it might be nice if you could add a .show_fdinfo
>>>>>>> handler to the io_uring_fops that makes it possible to get a rough
>>>>>>> overview over the state of the uring by reading /proc/$pid/fdinfo/$fd,
>>>>>>> just like e.g. eventfd (see eventfd_show_fdinfo()). It might be
>>>>>>> helpful for debugging to be able to see information about the fixed
>>>>>>> files and buffers that have been registered. Same for the
>>>>>>> personalities; that information might also be useful when someone is
>>>>>>> trying to figure out what privileges a running process actually has.
>>>>>>
>>>>>> Agree, that would be a very useful addition. I'll take a look at it.
>>>>>
>>>>> Jann, how much info are you looking for? Here's a rough start, just
>>>>> shows the number of registered files and buffers, and lists the
>>>>> personalities registered. We could also dump the buffer info for
>>>>> each of them, and ditto for the files. Not sure how much verbosity
>>>>> is acceptable in fdinfo?
>>>>
>>>> At the moment, I personally am just interested in this from the
>>>> perspective of being able to audit the state of personalities, to make
>>>> important information about the security state of processes visible.
>>>>
>>>> Good point about verbosity in fdinfo - I'm not sure about that myself either.
>>>>
>>>>> Here's the test app for personality:
>>>>
>>>> Oh, that was quick...
>>>>
>>>>> # cat 3
>>>>> pos:    0
>>>>> flags:  02000002
>>>>> mnt_id: 14
>>>>> user-files: 0
>>>>> user-bufs: 0
>>>>> personalities:
>>>>>             1: uid=0/gid=0
>>>>>
>>>>>
>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>> index c5ca84a305d3..0b2c7d800297 100644
>>>>> --- a/fs/io_uring.c
>>>>> +++ b/fs/io_uring.c
>>>>> @@ -6511,6 +6505,45 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
>>>>>         return submitted ? submitted : ret;
>>>>>  }
>>>>>
>>>>> +struct ring_show_idr {
>>>>> +       struct io_ring_ctx *ctx;
>>>>> +       struct seq_file *m;
>>>>> +};
>>>>> +
>>>>> +static int io_uring_show_cred(int id, void *p, void *data)
>>>>> +{
>>>>> +       struct ring_show_idr *r = data;
>>>>> +       const struct cred *cred = p;
>>>>> +
>>>>> +       seq_printf(r->m, "\t%5d: uid=%u/gid=%u\n", id, cred->uid.val,
>>>>> +                                               cred->gid.val);
>>>>
>>>> As Stefan said, the ->uid and ->gid aren't very useful, since when a
>>>> process switches UIDs for accessing things in the filesystem, it
>>>> probably only changes its EUID and FSUID, not its RUID.
>>>> I think what's particularly relevant for uring would be the ->fsuid
>>>> and the ->fsgid along with ->cap_effective; and perhaps for some
>>>> operations also the ->euid and ->egid. The real UID/GID aren't really
>>>> relevant when performing normal filesystem operations and such.
>>>
>>> This should probably just use the same format that is found in
>>> /proc/<pid>/status to make it easy for tools to use the same parsing
>>> logic and for the sake of consistency. We've adapted the same format for
>>> pidfds. So that would mean:
>>>
>>> Uid:	1000	1000	1000	1000
>>> Gid:	1000	1000	1000	1000
>>>
>>> Which would be: Real, effective, saved set, and filesystem {G,U}IDs
>>>
>>> And CapEff in /proc/<pid>/status has the format:
>>> CapEff:	0000000000000000
>>
>> I agree, consistency is good. I've added this, and also changed the
>> naming to be CamelCase, which is seems like most of them are. Now it
>> looks like this:
>>
>> pos:	0
>> flags:	02000002
>> mnt_id:	14
>> UserFiles:     0
>> UserBufs:     0
>> Personalities:
>>     1
>> 	Uid:	0		0		0		0
>> 	Gid:	0		0		0		0
>> 	Groups:	0
>> 	CapEff:	0000003fffffffff
>>
>> for a single personality registered (root). I have to indent it an extra
>> tab to display each personality.
> 
> That looks good.
> 
> Maybe also print some details of struct io_ring_ctx,
> flags and the ring sizes, ctx->cred.
> 
> Maybe details for io_wq and sqo_thread.

Yeah, I agree that we should probably just add a ton more, there's
plenty of information that would be useful. But let's start simple - I
forgot to CC you on the patch I just sent out, but it's basically the
above cleaned up. We dump information that's registered with the ring,
that's the theme right now. I'd be happy to add some of the state
information as well, we should do that as a separate patch.

> Maybe pending requests?
> I'm not sure about how io_wq threads work in detail.
> Is it possible that a large number of blocking request
> (against an external harddisk with disconnected cable)
> to block other blocking requests to a working ssd?
> It would be good to diagnose such situations from
> the output.

io_uring doesn't necessarily track pending requests, only if it has to.
For bounded request time IO, like the above, it'll depend on the
concurrency level. If you setup the ring with eg N entries, that'll be
at most N pending bounded requests. If all of those are blocked because
the disk isn't responding, yes, that could happen. At least until the
timeout happens.

> How is this supposed to be ABI-wise? Is it possible to change
> the output in later kernel versions?

We should always be able to append to the file, I'd just prefer if we
don't change the format of lines that have already been added.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2020-01-30 15:34 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-28 10:18 IORING_REGISTER_CREDS[_UPDATE]() and credfd_create()? Stefan Metzmacher
2020-01-28 16:10 ` Jens Axboe
2020-01-28 16:17   ` Stefan Metzmacher
2020-01-28 16:19     ` Jens Axboe
2020-01-28 17:19       ` Jens Axboe
2020-01-28 18:04         ` Jens Axboe
2020-01-28 19:42           ` Jens Axboe
2020-01-28 20:16             ` Pavel Begunkov
2020-01-28 20:19               ` Jens Axboe
2020-01-28 20:50                 ` Pavel Begunkov
2020-01-28 20:56                   ` Jens Axboe
2020-01-28 21:25                     ` Christian Brauner
2020-01-28 22:38                       ` Pavel Begunkov
2020-01-28 23:36             ` Pavel Begunkov
2020-01-28 23:40               ` Jens Axboe
2020-01-28 23:51                 ` Jens Axboe
2020-01-29  0:10                   ` Pavel Begunkov
2020-01-29  0:15                     ` Jens Axboe
2020-01-29  0:18                       ` Jens Axboe
2020-01-29  0:20                     ` Jens Axboe
2020-01-29  0:21                       ` Pavel Begunkov
2020-01-29  0:24                         ` Jens Axboe
2020-01-29  0:54                           ` Jens Axboe
2020-01-29 10:17                             ` Pavel Begunkov
2020-01-29 13:11                               ` Stefan Metzmacher
2020-01-29 13:41                                 ` Pavel Begunkov
2020-01-29 13:56                                   ` Stefan Metzmacher
2020-01-29 14:23                                     ` Pavel Begunkov
2020-01-29 14:27                                       ` Stefan Metzmacher
2020-01-29 14:34                                         ` Pavel Begunkov
2020-01-29 17:34                                       ` Jens Axboe
2020-01-29 17:42                                         ` Jens Axboe
2020-01-29 20:09                                           ` Stefan Metzmacher
2020-01-29 20:48                                             ` Jens Axboe
2020-01-29 17:46                                         ` Pavel Begunkov
2020-01-29 14:59             ` Jann Horn
2020-01-29 17:34               ` Jens Axboe
2020-01-30  1:08                 ` Jens Axboe
2020-01-30  2:20                   ` Jens Axboe
2020-01-30  3:18                     ` Jens Axboe
2020-01-30  6:53                   ` Stefan Metzmacher
2020-01-30 10:11                   ` Jann Horn
2020-01-30 10:26                     ` Christian Brauner
2020-01-30 14:11                       ` Jens Axboe
2020-01-30 14:47                         ` Stefan Metzmacher
2020-01-30 15:34                           ` Jens Axboe
2020-01-30 15:13                         ` Christian Brauner
2020-01-30 15:29                           ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).