* Odd interaction with file capabilities and procfs files
From: Daniel Xu @ 2022-10-19 0:42 UTC
To: viro, linux-fsdevel, linux-kernel

Hi,

(Going off get_maintainer.pl for fs/namei.c here)

I'm seeing some weird interactions with file capabilities and S_IRUSR
procfs files. Best I can tell it doesn't occur with real files on my btrfs
home partition.

Test program:

    #include <fcntl.h>
    #include <stdio.h>

    int main()
    {
        /* /proc/self/auxv is created with mode S_IRUSR (0400) */
        int fd = open("/proc/self/auxv", O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        printf("ok\n");
        return 0;
    }

Steps to reproduce:

    $ gcc main.c
    $ ./a.out
    ok
    $ sudo setcap "cap_net_admin,cap_sys_admin+p" a.out
    $ ./a.out
    open: Permission denied

It's not obvious why this happens, even after spending a few hours
going through the standard documentation and kernel code. It's
intuitively odd b/c you'd think adding capabilities to the permitted
set wouldn't affect functionality.

Best I could tell the -EACCES error occurs in the fallthrough codepath
inside generic_permission().

Sorry if this is something dumb or obvious.

Thanks,
Daniel

* Re: Odd interaction with file capabilities and procfs files
From: Christian Brauner @ 2022-10-19 13:22 UTC
To: Daniel Xu; +Cc: viro, linux-fsdevel, linux-kernel

On Tue, Oct 18, 2022 at 06:42:04PM -0600, Daniel Xu wrote:
> Best I could tell the -EACCES error occurs in the fallthrough codepath
> inside generic_permission().
>
> Sorry if this is something dumb or obvious.

Hey Daniel,

No, this is neither dumb nor obvious. :)

Basically, if you set fscaps then /proc/self/auxv will be owned by
root:root. You can verify this:

    #include <fcntl.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <stdio.h>
    #include <errno.h>
    #include <unistd.h>

    int main()
    {
        struct stat st;

        /* real and effective uid of the calling process */
        printf("%d | %d\n", getuid(), geteuid());

        /* owner and group of the procfs file as seen by this process */
        if (stat("/proc/self/auxv", &st)) {
            fprintf(stderr, "stat: %d - %m\n", errno);
            return 1;
        }
        printf("stat: %d | %d\n", st.st_uid, st.st_gid);

        int fd = open("/proc/self/auxv", O_RDONLY);
        if (fd < 0) {
            fprintf(stderr, "open: %d - %m\n", errno);
            return 1;
        }

        printf("ok\n");
        return 0;
    }

    $ ./a.out
    1000 | 1000
    stat: 1000 | 1000
    ok
    $ sudo setcap "cap_net_admin,cap_sys_admin+p" a.out
    $ ./a.out
    1000 | 1000
    stat: 0 | 0
    open: 13 - Permission denied

So acl_permission_check() fails and returns -EACCES, which causes
generic_permission() to fall back to capable_wrt_inode_uidgid(). That
checks for CAP_DAC_READ_SEARCH, which you don't have as an unprivileged
user.
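
The path described above can be modelled in user space. What follows is a simplified sketch under stated assumptions, not kernel code: struct model_cred, model_mode_check(), model_may_read() and model_examples() are hypothetical names made up for illustration, group membership is reduced to a single gid, ACLs and user namespaces are ignored, and a single flag stands in for holding CAP_DAC_READ_SEARCH.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical stand-in for the caller's credentials. */
struct model_cred {
    uid_t fsuid;
    gid_t fsgid;
    bool can_bypass_read_dac; /* models holding CAP_DAC_READ_SEARCH */
};

/*
 * Mode-bit step: choose the owner, group or other class and test its
 * read bit. Only one class is ever consulted.
 */
static int model_mode_check(uid_t i_uid, gid_t i_gid, mode_t mode,
                            const struct model_cred *cred)
{
    mode_t bits;

    if (cred->fsuid == i_uid)
        bits = (mode >> 6) & 7; /* owner class */
    else if (cred->fsgid == i_gid)
        bits = (mode >> 3) & 7; /* group class */
    else
        bits = mode & 7;        /* other class */

    return (bits & 4) ? 0 : -EACCES; /* 4 is the read bit */
}

/* Full step: the capability fallback only matters when the mode bits say no. */
int model_may_read(uid_t i_uid, gid_t i_gid, mode_t mode,
                   const struct model_cred *cred)
{
    if (model_mode_check(i_uid, i_gid, mode, cred) == 0)
        return 0;
    return cred->can_bypass_read_dac ? 0 : -EACCES;
}

/* The two situations from the thread, for an unprivileged uid 1000. */
void model_examples(void)
{
    struct model_cred user = { 1000, 1000, false };

    /* File owned by 1000:1000 with S_IRUSR: the owner bits grant read. */
    assert(model_may_read(1000, 1000, S_IRUSR, &user) == 0);

    /* Same mode but owned by root:root: uid 1000 falls into "other". */
    assert(model_may_read(0, 0, S_IRUSR, &user) == -EACCES);
}
```

In this model the only thing that differs between the working and the failing run is the owner of the inode. The caller's uid and the file mode stay the same, which matches the stat output shown in the message above.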

* Re: Odd interaction with file capabilities and procfs files
From: Daniel Xu @ 2022-10-19 21:42 UTC
To: Christian Brauner; +Cc: viro, linux-fsdevel, linux-kernel

Hi Christian,

On Wed, Oct 19, 2022, at 7:22 AM, Christian Brauner wrote:
> So acl_permission_check() fails and returns -EACCES which will cause
> generic_permission() to rely on capable_wrt_inode_uidgid() which checks
> for CAP_DAC_READ_SEARCH which you don't have as an unprivileged user.

Thanks for checking on this.

That does explain the weirdness but at the expense of another
question: why do fscaps cause /proc/self/auxv to be owned by root?
Is that the correct semantics? This also seems rather unexpected.

I'll take a look tonight and see if I can come up with any answers.

Thanks,
Daniel

* Re: Odd interaction with file capabilities and procfs files
From: Christian Brauner @ 2022-10-20 7:44 UTC
To: Daniel Xu; +Cc: viro, linux-fsdevel, linux-kernel

On Wed, Oct 19, 2022 at 03:42:42PM -0600, Daniel Xu wrote:
> That does explain the weirdness but at the expense of another
> question: why do fscaps cause /proc/self/auxv to be owned by root?
> Is that the correct semantics? This also seems rather unexpected.
>
> I'll take a look tonight and see if I can come up with any answers.

Sorry I didn't explain this in more detail.
You mostly uncovered the reasons as evidenced by the Twitter thread.

Yes, this is expected. When a new process gains privileges during
exec, the kernel will make it non-dumpable. That includes a change of
the e{g,u}id or fs{g,u}id of the process, s{g,u}id binary execution that
results in a changed e{g,u}id, or execution of a binary with fscaps set
if the new permitted caps aren't a subset of the currently permitted
caps.

The last reason is what causes your sample program's /proc/self to be
owned by root. The culprit here is cred_cap_issubset() which is called
during commit_creds() in begin_new_exec().

If the dumpable attribute is set to a value other than 1 then all files
in /proc/<pid> will be owned by (userns) root. To get the full picture
you'd need to at least read man proc(5), man execve(2), and man prctl(2).

The reason behind the dumpability change is to prevent unprivileged users
from making privilege-elevating binaries (e.g., s{g,u}id binaries) crash
to produce (userns-)root-owned coredumps which can be used in exploits. A
fairly recent example of this is:
https://alephsecurity.com/2021/10/20/sudump/
https://www.openwall.com/lists/oss-security/2021/10/20/2
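
The link between the dumpable attribute and /proc ownership can be observed from inside the process. The following is a minimal sketch, assuming a Linux system with /proc mounted; get_dumpable(), print_proc_owner() and set_dumpable_user() are hypothetical helper names, and only prctl(2) with PR_GET_DUMPABLE / PR_SET_DUMPABLE and stat(2) are real interfaces.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Return the "dumpable" attribute of the calling process, see prctl(2). */
int get_dumpable(void)
{
    return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}

/* Print the dumpable value next to the owner of one procfs entry. */
int print_proc_owner(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0) {
        perror("stat");
        return -1;
    }
    printf("dumpable=%d %s owner=%ld:%ld\n", get_dumpable(), path,
           (long)st.st_uid, (long)st.st_gid);
    return 0;
}

/*
 * Set the dumpable attribute back to 1. Assumption: this is acceptable
 * for the program in question. It re-enables core dumps and same-user
 * ptrace attach, which is exactly what the kernel turned off at exec.
 */
int set_dumpable_user(void)
{
    return prctl(PR_SET_DUMPABLE, 1, 0, 0, 0);
}
```

A test program could call print_proc_owner("/proc/self/auxv"), then set_dumpable_user(), then print_proc_owner() again, and compare the two lines with and without fscaps on the binary. proc(5) describes the ownership as following the dumpable attribute, so the second line is where a change would show up. Whether a program that carries extra capabilities should make itself dumpable again is a security decision and not something this sketch recommends.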

* Re: Odd interaction with file capabilities and procfs files
From: Daniel Xu @ 2022-10-20 21:35 UTC
To: Christian Brauner; +Cc: viro, linux-fsdevel, linux-kernel

On Thu, Oct 20, 2022, at 1:44 AM, Christian Brauner wrote:
> If the dumpable attribute is set to a value other than 1 then all files
> in /proc/<pid> will be owned by (userns) root. To get the full picture
> you'd need to at least read man proc(5), man execve(2), and man prctl(2).
>
> The reason behind the dumpability change is to prevent unprivileged users
> from making privilege-elevating binaries (e.g., s{g,u}id binaries) crash
> to produce (userns-)root-owned coredumps which can be used in exploits.

Thanks for the detailed explanation! I think each piece makes sense to me
now, even if the final result is a little odd. One of those things I
guess :).

I'll see if a patch to the man-pages is appropriate.

Thanks,
Daniel