linux-fsdevel.vger.kernel.org archive mirror
* Odd interaction with file capabilities and procfs files
@ 2022-10-19  0:42 Daniel Xu
  2022-10-19 13:22 ` Christian Brauner
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Xu @ 2022-10-19  0:42 UTC
  To: viro, linux-fsdevel, linux-kernel

Hi,

(Going off get_maintainer.pl for fs/namei.c here)

I'm seeing some weird interactions with file capabilities and S_IRUSR
procfs files. Best I can tell it doesn't occur with real files on my btrfs
home partition.

Test program:

        #include <fcntl.h>
        #include <stdio.h>
        
        int main()
        {
                int fd = open("/proc/self/auxv", O_RDONLY);
                if (fd < 0) {
                        perror("open");
                        return 1;
                }
       
                printf("ok\n");
                return 0;
        }

Steps to reproduce:

        $ gcc main.c
        $ ./a.out
        ok
        $ sudo setcap "cap_net_admin,cap_sys_admin+p" a.out
        $ ./a.out
        open: Permission denied

It's not obvious why this happens, even after spending a few hours
going through the standard documentation and kernel code. It's
intuitively odd b/c you'd think adding capabilities to the permitted
set wouldn't affect functionality.

Best I could tell the -EACCES error occurs in the fallthrough codepath
inside generic_permission().

Sorry if this is something dumb or obvious.

Thanks,
Daniel

* Re: Odd interaction with file capabilities and procfs files
  2022-10-19  0:42 Odd interaction with file capabilities and procfs files Daniel Xu
@ 2022-10-19 13:22 ` Christian Brauner
  2022-10-19 21:42   ` Daniel Xu
  0 siblings, 1 reply; 5+ messages in thread
From: Christian Brauner @ 2022-10-19 13:22 UTC
  To: Daniel Xu; +Cc: viro, linux-fsdevel, linux-kernel

On Tue, Oct 18, 2022 at 06:42:04PM -0600, Daniel Xu wrote:
> Sorry if this is something dumb or obvious.

Hey Daniel,

No, this is neither dumb nor obvious. :)

Basically, if you set fscaps then /proc/self/auxv will be owned by
root:root. You can verify this:

#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <stdio.h>
#include <errno.h>
#include <unistd.h>

int main()
{
        struct stat st;
        printf("%d | %d\n", getuid(), geteuid());

        if (stat("/proc/self/auxv", &st)) {
                fprintf(stderr, "stat: %d - %m\n", errno);
                return 1;
        }
        printf("stat: %d | %d\n", st.st_uid, st.st_gid);

        int fd = open("/proc/self/auxv", O_RDONLY);
        if (fd < 0) {
                fprintf(stderr, "open: %d - %m\n", errno);
                return 1;
        }

        printf("ok\n");
        return 0;
}

$ ./a.out
1000 | 1000
stat: 1000 | 1000
ok
$ sudo setcap "cap_net_admin,cap_sys_admin+p" a.out
$ ./a.out
1000 | 1000
stat: 0 | 0
open: 13 - Permission denied

So acl_permission_check() fails and returns -EACCES, which causes
generic_permission() to fall back to capable_wrt_inode_uidgid(). That
checks for CAP_DAC_READ_SEARCH, which you don't have as an unprivileged
user.

* Re: Odd interaction with file capabilities and procfs files
  2022-10-19 13:22 ` Christian Brauner
@ 2022-10-19 21:42   ` Daniel Xu
  2022-10-20  7:44     ` Christian Brauner
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Xu @ 2022-10-19 21:42 UTC
  To: Christian Brauner; +Cc: viro, linux-fsdevel, linux-kernel

Hi Christian,

On Wed, Oct 19, 2022, at 7:22 AM, Christian Brauner wrote:
> So acl_permission_check() fails and returns -EACCES, which causes
> generic_permission() to fall back to capable_wrt_inode_uidgid(). That
> checks for CAP_DAC_READ_SEARCH, which you don't have as an unprivileged
> user.

Thanks for checking on this.

That does explain the weirdness but at the expense of another
question: why do fscaps cause /proc/self/auxv to be owned by root?
Is that the correct semantics? This also seems rather unexpected.

I'll take a look tonight and see if I can come up with any answers.

Thanks,
Daniel

* Re: Odd interaction with file capabilities and procfs files
  2022-10-19 21:42   ` Daniel Xu
@ 2022-10-20  7:44     ` Christian Brauner
  2022-10-20 21:35       ` Daniel Xu
  0 siblings, 1 reply; 5+ messages in thread
From: Christian Brauner @ 2022-10-20  7:44 UTC
  To: Daniel Xu; +Cc: viro, linux-fsdevel, linux-kernel

On Wed, Oct 19, 2022 at 03:42:42PM -0600, Daniel Xu wrote:
> Thanks for checking on this.
> 
> That does explain the weirdness but at the expense of another
> question: why do fscaps cause /proc/self/auxv to be owned by root?
> Is that the correct semantics? This also seems rather unexpected.
> 
> I'll take a look tonight and see if I can come up with any answers.

Sorry I didn't explain this in more detail.
You mostly uncovered the reasons as evidenced by the Twitter thread.

Yes, this is expected. When a process gains privileges during exec, the
kernel will make it non-dumpable. That includes a change of the
e{g,u}id or fs{g,u}id of the process, execution of an s{g,u}id binary
that results in a changed e{g,u}id, or execution of a binary with fscaps
set if the new permitted caps aren't a subset of the currently permitted
caps.

The last reason is what causes your sample program's /proc/self to be
owned by root. The culprit here is cred_cap_issubset() which is called
during commit_creds() in begin_new_exec().

If the dumpable attribute is set to a value other than 1 then the files
in /proc/<pid> will be owned by (userns) root. To get the full picture
you'd need to at least read proc(5), execve(2), and prctl(2).

The reason behind the dumpability change is to prevent unprivileged
users from making privilege-elevating binaries (e.g., s{g,u}id binaries)
crash and produce (userns-)root-owned coredumps, which can be used in
exploits. A fairly recent example of this is:
https://alephsecurity.com/2021/10/20/sudump/
https://www.openwall.com/lists/oss-security/2021/10/20/2

* Re: Odd interaction with file capabilities and procfs files
  2022-10-20  7:44     ` Christian Brauner
@ 2022-10-20 21:35       ` Daniel Xu
  0 siblings, 0 replies; 5+ messages in thread
From: Daniel Xu @ 2022-10-20 21:35 UTC
  To: Christian Brauner; +Cc: viro, linux-fsdevel, linux-kernel


Thanks for the detailed explanation! I think each step makes sense to
me now, even if the final result is a little odd. One of those things I
guess :).

I'll see if a patch to the man-pages is appropriate.

Thanks,
Daniel
