* [GIT PULL] proc changes for v5.8-rc1 @ 2020-06-03 14:47 Eric W. Biederman 2020-06-04 21:15 ` pr-tracker-bot 2020-06-10 21:45 ` [GIT PULL] proc fixes " Eric W. Biederman 0 siblings, 2 replies; 10+ messages in thread From: Eric W. Biederman @ 2020-06-03 14:47 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel Please pull the proc-linus branch from the git tree: git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git proc-linus HEAD: 9d78edeaec759f997c303f286ecd39daee166f2a proc: proc_pid_ns takes super_block as an argument This branch has 4 sets of changes: proc: modernize proc to support multiple private instances proc: Ensure we see the exit of each process tid exactly Removing has_group_leader_pid posix-cpu-timers: Use pids not tasks in lookup Alexey updated proc so each mount of proc uses a new superblock. This allows people to actually use mount options with proc with no fear of messing up another mount of proc. Given the kernel's internal mounts of proc for things like uml this was a real problem, and resulted in Android's hidepid mount options being ignored and introducing security issues. The rest of the changes are small cleanups and fixes that came out of my work to allow this change to proc. In essence it is swapping the pids in de_thread during exec which revoves a special case the code had to handle. Then updating the code to stop handling that special case. Alexey Gladkov (9): proc: modernize proc to support multiple private instances proc: rename struct proc_fs_info to proc_fs_opts proc: allow to mount many instances of proc in one pid namespace proc: instantiate only pids that we can ptrace on 'hidepid=4' mount option proc: add option to mount only a pids subset docs: proc: add documentation for "hidepid=4" and "subset=pid" options and new mount behavior proc: use human-readable values for hidepid proc: use named enums for better readability Use proc_pid_ns() to get pid_namespace from the proc superblock proc: proc_pid_ns takes super_block as an argument Eric W. Biederman (14): proc: Use PIDTYPE_TGID in next_tgid rculist: Add hlists_swap_heads_rcu proc: Ensure we see the exit of each process tid exactly once proc: Ensure we see the exit of each process tid exactly posix-cpu-timer: Tidy up group_leader logic in lookup_task posix-cpu-timer: Unify the now redundant code in lookup_task exec: Remove BUG_ON(has_group_leader_pid) signal: Remove has_group_leader_pid posix-cpu-timers: Extend rcu_read_lock removing task_struct references posix-cpu-timers: Replace cpu_timer_pid_type with clock_pid_type posix-cpu-timers: Replace __get_task_for_clock with pid_for_clock Removing has_group_leader_pid posix-cpu-timers: Use pids not tasks in lookup Oleg Nesterov (1): remove the no longer needed pid_alive() check in __task_pid_nr_ns() Documentation/filesystems/proc.rst | 92 ++++++++++++--- fs/exec.c | 6 +- fs/locks.c | 4 +- fs/proc/array.c | 2 +- fs/proc/base.c | 74 ++++++------ fs/proc/generic.c | 9 ++ fs/proc/inode.c | 30 ++++- fs/proc/root.c | 131 ++++++++++++++++----- fs/proc/self.c | 8 +- fs/proc/thread_self.c | 8 +- fs/proc_namespace.c | 14 +-- include/linux/pid.h | 1 + include/linux/pid_namespace.h | 12 -- include/linux/proc_fs.h | 32 ++++- include/linux/rculist.h | 21 ++++ include/linux/sched/signal.h | 11 -- kernel/fork.c | 2 +- kernel/pid.c | 22 +++- kernel/time/posix-cpu-timers.c | 111 ++++++++--------- net/ipv6/ip6_flowlabel.c | 2 +- security/tomoyo/realpath.c | 4 +- tools/testing/selftests/proc/.gitignore | 2 + tools/testing/selftests/proc/Makefile | 2 + .../testing/selftests/proc/proc-fsconfig-hidepid.c | 50 ++++++++ .../testing/selftests/proc/proc-multiple-procfs.c | 48 ++++++++ 25 files changed, 492 insertions(+), 206 deletions(-) Eric ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [GIT PULL] proc changes for v5.8-rc1 2020-06-03 14:47 [GIT PULL] proc changes for v5.8-rc1 Eric W. Biederman @ 2020-06-04 21:15 ` pr-tracker-bot 2020-06-10 21:45 ` [GIT PULL] proc fixes " Eric W. Biederman 1 sibling, 0 replies; 10+ messages in thread From: pr-tracker-bot @ 2020-06-04 21:15 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Linus Torvalds, linux-kernel The pull request you sent on Wed, 03 Jun 2020 09:47:51 -0500: > git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git proc-linus has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/9ff7258575d5fee011649d20cc56de720a395191 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker ^ permalink raw reply [flat|nested] 10+ messages in thread
* [GIT PULL] proc fixes for v5.8-rc1 2020-06-03 14:47 [GIT PULL] proc changes for v5.8-rc1 Eric W. Biederman 2020-06-04 21:15 ` pr-tracker-bot @ 2020-06-10 21:45 ` Eric W. Biederman 2020-06-10 22:05 ` pr-tracker-bot 2020-06-12 19:29 ` [GIT PULL] proc fixes v2 " Eric W. Biederman 1 sibling, 2 replies; 10+ messages in thread From: Eric W. Biederman @ 2020-06-10 21:45 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel, Alexey Gladkov Please pull the proc-linus branch from the git tree: git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git proc-linus HEAD: 058f2e4da79b23afb56ce3d03d907d6cdd36f2b8 proc: s_fs_info may be NULL when proc_kill_sb is called Syzbot found a NULL pointer dereference if kzalloc of proc's s_fs_info fails. The fix is the patch below. --- From: Alexey Gladkov <gladkov.alexey@gmail.com> Date: Wed, 10 Jun 2020 20:35:49 +0200 Subject: [PATCH] proc: s_fs_info may be NULL when proc_kill_sb is called syzbot found that proc_fill_super() fails before filling up sb->s_fs_info, deactivate_locked_super() will be called and sb->s_fs_info will be NULL. The proc_kill_sb() does not expect fs_info to be NULL which is wrong. Link: https://lore.kernel.org/lkml/0000000000002d7ca605a7b8b1c5@google.com Reported-by: syzbot+4abac52934a48af5ff19@syzkaller.appspotmail.com Fixes: fa10fed30f25 ("proc: allow to mount many instances of proc in one pid namespace") Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> --- fs/proc/root.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/fs/proc/root.c b/fs/proc/root.c index ffebed1999e5..5e444d4f9717 100644 --- a/fs/proc/root.c +++ b/fs/proc/root.c @@ -264,11 +264,13 @@ static void proc_kill_sb(struct super_block *sb) { struct proc_fs_info *fs_info = proc_sb_info(sb); - if (fs_info->proc_self) - dput(fs_info->proc_self); + if (!fs_info) { + kill_anon_super(sb); + return; + } - if (fs_info->proc_thread_self) - dput(fs_info->proc_thread_self); + dput(fs_info->proc_self); + dput(fs_info->proc_thread_self); kill_anon_super(sb); put_pid_ns(fs_info->pid_ns); -- 2.20.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [GIT PULL] proc fixes for v5.8-rc1 2020-06-10 21:45 ` [GIT PULL] proc fixes " Eric W. Biederman @ 2020-06-10 22:05 ` pr-tracker-bot 2020-06-12 19:29 ` [GIT PULL] proc fixes v2 " Eric W. Biederman 1 sibling, 0 replies; 10+ messages in thread From: pr-tracker-bot @ 2020-06-10 22:05 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Linus Torvalds, linux-kernel, Alexey Gladkov The pull request you sent on Wed, 10 Jun 2020 16:45:50 -0500: > git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git proc-linus has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/79ca035d2d941839f55f3b8b69f8e81c66946ed8 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [GIT PULL] proc fixes v2 for v5.8-rc1 2020-06-10 21:45 ` [GIT PULL] proc fixes " Eric W. Biederman 2020-06-10 22:05 ` pr-tracker-bot @ 2020-06-12 19:29 ` Eric W. Biederman 2020-06-12 19:46 ` Linus Torvalds 2020-06-12 19:50 ` pr-tracker-bot 1 sibling, 2 replies; 10+ messages in thread From: Eric W. Biederman @ 2020-06-12 19:29 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel, Alexey Gladkov ebiederm@xmission.com (Eric W. Biederman) writes: Please pull the proc-linus branch from the git tree: git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git proc-linus HEAD: ef1548adada51a2f32ed7faef50aa465e1b4c5da proc: Use new_inode not new_inode_pseudo Much to my surprise syzbot found a very old bug in proc that the recent changes made easier to reproce. This bug is subtle enough it looks like it fooled everyone who should know better. Eric --- From ef1548adada51a2f32ed7faef50aa465e1b4c5da Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" <ebiederm@xmission.com> Date: Fri, 12 Jun 2020 09:42:03 -0500 Subject: [PATCH] proc: Use new_inode not new_inode_pseudo Recently syzbot reported that unmounting proc when there is an ongoing inotify watch on the root directory of proc could result in a use after free when the watch is removed after the unmount of proc when the watcher exits. Commit 69879c01a0c3 ("proc: Remove the now unnecessary internal mount of proc") made it easier to unmount proc and allowed syzbot to see the problem, but looking at the code it has been around for a long time. Looking at the code the fsnotify watch should have been removed by fsnotify_sb_delete in generic_shutdown_super. Unfortunately the inode was allocated with new_inode_pseudo instead of new_inode so the inode was not on the sb->s_inodes list. Which prevented fsnotify_unmount_inodes from finding the inode and removing the watch as well as made it so the "VFS: Busy inodes after unmount" warning could not find the inodes to warn about them. Make all of the inodes in proc visible to generic_shutdown_super, and fsnotify_sb_delete by using new_inode instead of new_inode_pseudo. The only functional difference is that new_inode places the inodes on the sb->s_inodes list. I wrote a small test program and I can verify that without changes it can trigger this issue, and by replacing new_inode_pseudo with new_inode the issues goes away. Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/000000000000d788c905a7dfa3f4@google.com Reported-by: syzbot+7d2debdcdb3cb93c1e5e@syzkaller.appspotmail.com Fixes: 0097875bd415 ("proc: Implement /proc/thread-self to point at the directory of the current thread") Fixes: 021ada7dff22 ("procfs: switch /proc/self away from proc_dir_entry") Fixes: 51f0885e5415 ("vfs,proc: guarantee unique inodes in /proc") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- fs/proc/inode.c | 2 +- fs/proc/self.c | 2 +- fs/proc/thread_self.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/proc/inode.c b/fs/proc/inode.c index f40c2532c057..28d6105e908e 100644 --- a/fs/proc/inode.c +++ b/fs/proc/inode.c @@ -617,7 +617,7 @@ const struct inode_operations proc_link_inode_operations = { struct inode *proc_get_inode(struct super_block *sb, struct proc_dir_entry *de) { - struct inode *inode = new_inode_pseudo(sb); + struct inode *inode = new_inode(sb); if (inode) { inode->i_ino = de->low_ino; diff --git a/fs/proc/self.c b/fs/proc/self.c index ca5158fa561c..72cd69bcaf4a 100644 --- a/fs/proc/self.c +++ b/fs/proc/self.c @@ -43,7 +43,7 @@ int proc_setup_self(struct super_block *s) inode_lock(root_inode); self = d_alloc_name(s->s_root, "self"); if (self) { - struct inode *inode = new_inode_pseudo(s); + struct inode *inode = new_inode(s); if (inode) { inode->i_ino = self_inum; inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode); diff --git a/fs/proc/thread_self.c b/fs/proc/thread_self.c index ac284f409568..a553273fbd41 100644 --- a/fs/proc/thread_self.c +++ b/fs/proc/thread_self.c @@ -43,7 +43,7 @@ int proc_setup_thread_self(struct super_block *s) inode_lock(root_inode); thread_self = d_alloc_name(s->s_root, "thread-self"); if (thread_self) { - struct inode *inode = new_inode_pseudo(s); + struct inode *inode = new_inode(s); if (inode) { inode->i_ino = thread_self_inum; inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode); -- 2.20.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [GIT PULL] proc fixes v2 for v5.8-rc1 2020-06-12 19:29 ` [GIT PULL] proc fixes v2 " Eric W. Biederman @ 2020-06-12 19:46 ` Linus Torvalds 2020-06-12 20:02 ` Eric W. Biederman 2020-06-12 19:50 ` pr-tracker-bot 1 sibling, 1 reply; 10+ messages in thread From: Linus Torvalds @ 2020-06-12 19:46 UTC (permalink / raw) To: Eric W. Biederman, Alexey Dobriyan, Al Viro Cc: Linux Kernel Mailing List, Alexey Gladkov On Fri, Jun 12, 2020 at 12:34 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > > ebiederm@xmission.com (Eric W. Biederman) writes: What happened to that first version of the email? I never got it.. > Looking at the code the fsnotify watch should have been removed by > fsnotify_sb_delete in generic_shutdown_super. Hmm. Correct. The new_inode_pseudo() is for things that don't have quotas, fsnotify or writeback. That was used somewhat intentionally on /proc, though. /proc certainly doesn't have quotas or writeback. And fsnotify on /proc seems a bit questionably too. Do people actually _do_ this and depend on it, or is this just about syzbot doing something odd and thus showing the problem? Anyway, I have pulled your fix, because I think it's reasonable and safe, but I do wonder if we should have kept the new_inode_pseudo(), and instead just make fsnotify say "you can't notify on an inode that isn't on the superblock list". Hmm? Is fsnotify on /proc really sensible? Do we actually generate any useful notifications? Linus ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [GIT PULL] proc fixes v2 for v5.8-rc1 2020-06-12 19:46 ` Linus Torvalds @ 2020-06-12 20:02 ` Eric W. Biederman 2020-06-12 20:16 ` Linus Torvalds 0 siblings, 1 reply; 10+ messages in thread From: Eric W. Biederman @ 2020-06-12 20:02 UTC (permalink / raw) To: Linus Torvalds Cc: Alexey Dobriyan, Al Viro, Linux Kernel Mailing List, Alexey Gladkov Linus Torvalds <torvalds@linux-foundation.org> writes: > On Fri, Jun 12, 2020 at 12:34 PM Eric W. Biederman > <ebiederm@xmission.com> wrote: >> >> ebiederm@xmission.com (Eric W. Biederman) writes: > > What happened to that first version of the email? I never got it.. A little distracted I think. I forgot to edit the above line out, and v2 because it is my second pull request for a proc fix into v5.8-rc1. >> Looking at the code the fsnotify watch should have been removed by >> fsnotify_sb_delete in generic_shutdown_super. > > Hmm. Correct. The new_inode_pseudo() is for things that don't have > quotas, fsnotify or writeback. > > That was used somewhat intentionally on /proc, though. /proc certainly > doesn't have quotas or writeback. It also means we break our debugging by not putting inodes on the s_inodes list. AKA this line in generic_shutdown_super is also depent on that behavior. if (!list_empty(&sb->s_inodes)) { printk("VFS: Busy inodes after unmount of %s. " "Self-destruct in 5 seconds. Have a nice day...\n", sb->s_id); } > And fsnotify on /proc seems a bit questionably too. Do people actually > _do_ this and depend on it, or is this just about syzbot doing > something odd and thus showing the problem? > > Anyway, I have pulled your fix, because I think it's reasonable and > safe, but I do wonder if we should have kept the new_inode_pseudo(), > and instead just make fsnotify say "you can't notify on an inode that > isn't on the superblock list". Hmm? > > Is fsnotify on /proc really sensible? Do we actually generate any > useful notifications? I don't know of any proc code that does anything to make fsnotify/inotify work. Since changes to proc are not initialiated through the vfs that probably means fsnotify is pretty much useless. I have a sense that a use after free that anyone can trigger could be a bit dangerous, and despite not being the only virtual filesystem in the kernel proc is the only virtual filesystem that called new_inode_pseudo. Eric ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [GIT PULL] proc fixes v2 for v5.8-rc1 2020-06-12 20:02 ` Eric W. Biederman @ 2020-06-12 20:16 ` Linus Torvalds 2020-06-12 22:47 ` Eric W. Biederman 0 siblings, 1 reply; 10+ messages in thread From: Linus Torvalds @ 2020-06-12 20:16 UTC (permalink / raw) To: Eric W. Biederman Cc: Alexey Dobriyan, Al Viro, Linux Kernel Mailing List, Alexey Gladkov On Fri, Jun 12, 2020 at 1:06 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > > I have a sense that a use after free that anyone can trigger could be a > bit dangerous, and despite not being the only virtual filesystem in the > kernel proc is the only virtual filesystem that called new_inode_pseudo. So the reason I pulled that change despite my questions was that I do agree with the whole "there's probably little point to use new_inode_pseudo() here" argument. But at the same time, I ghet the feeling that this partly just is papering over the problem. If fsnotify causes problems with a new_inode_pseudo() inode, then fsnotify should be _checking_ for that case. And if fsnotify were to check for it, then the reason for /proc to use it would largely go away. Maybe the debug check for umount matters, but honestly it doesn't really seem to be a big deal. A pseudo-inode is basically independent of the filesystem it was mounted as, so the generic_shutdown_super() check not triggering for them is sensible, I feel. But yeah, we could also make the rule go the other way, and simply make sure that "new_inode_pseudo()" itself checks that the super-block you give it is something fundamenally unmountable and was created with 'kern_mount()'. That would have also figured out that the /proc case was broken. So the main objection I have here is really that this fix feels incomplete, and doesn't really reflect the underlying issue, just fixes the symptom. Either the underlying issue is that you shouldn't call 'fsnotify' on /proc, or the underlying issue is that /proc was using a bad inode and nobody even noticed until the fsnotify issue. This is not a huge deal. I think you've fixed the bug, I just have this itch that the thing that triggered it shouldn't have happened in the first place. Linus ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [GIT PULL] proc fixes v2 for v5.8-rc1 2020-06-12 20:16 ` Linus Torvalds @ 2020-06-12 22:47 ` Eric W. Biederman 0 siblings, 0 replies; 10+ messages in thread From: Eric W. Biederman @ 2020-06-12 22:47 UTC (permalink / raw) To: Linus Torvalds Cc: Alexey Dobriyan, Al Viro, Linux Kernel Mailing List, Alexey Gladkov Linus Torvalds <torvalds@linux-foundation.org> writes: > On Fri, Jun 12, 2020 at 1:06 PM Eric W. Biederman <ebiederm@xmission.com> wrote: >> >> I have a sense that a use after free that anyone can trigger could be a >> bit dangerous, and despite not being the only virtual filesystem in the >> kernel proc is the only virtual filesystem that called new_inode_pseudo. > > So the reason I pulled that change despite my questions was that I do > agree with the whole "there's probably little point to use > new_inode_pseudo() here" argument. > > But at the same time, I ghet the feeling that this partly just is > papering over the problem. If fsnotify causes problems with a > new_inode_pseudo() inode, then fsnotify should be _checking_ for that > case. > > And if fsnotify were to check for it, then the reason for /proc to use > it would largely go away. Maybe the debug check for umount matters, > but honestly it doesn't really seem to be a big deal. > > A pseudo-inode is basically independent of the filesystem it was > mounted as, so the generic_shutdown_super() check not triggering for > them is sensible, I feel. > > But yeah, we could also make the rule go the other way, and simply > make sure that "new_inode_pseudo()" itself checks that the super-block > you give it is something fundamenally unmountable and was created with > 'kern_mount()'. > > That would have also figured out that the /proc case was broken. > > So the main objection I have here is really that this fix feels > incomplete, and doesn't really reflect the underlying issue, just > fixes the symptom. > > Either the underlying issue is that you shouldn't call 'fsnotify' on > /proc, or the underlying issue is that /proc was using a bad inode and > nobody even noticed until the fsnotify issue. > > This is not a huge deal. I think you've fixed the bug, I just have > this itch that the thing that triggered it shouldn't have happened in > the first place. Yeah. I have a similar feeling. Especially since it took about 7 years for someone to notice something was not right. Still right now I am not up to digging and adding warnings and causing things to fail. What energy I have I am going to use to keep sorting out exec. I am not ready to page back in the fine details of how all of the filesystem pieces interact right now. I am wondering now if there are any crazy corner cases of the unix domain sockets where fsnotify could latch onto one, and have problems. It looks like the inode comes from the underlying filesystem so we should be safe. Eric ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Re: [GIT PULL] proc fixes v2 for v5.8-rc1 2020-06-12 19:29 ` [GIT PULL] proc fixes v2 " Eric W. Biederman 2020-06-12 19:46 ` Linus Torvalds @ 2020-06-12 19:50 ` pr-tracker-bot 1 sibling, 0 replies; 10+ messages in thread From: pr-tracker-bot @ 2020-06-12 19:50 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Linus Torvalds, linux-kernel, Alexey Gladkov The pull request you sent on Fri, 12 Jun 2020 14:29:50 -0500: > git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git proc-linus has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/44ebe016df3aad96e3be8f95ec52397728dd7701 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2020-06-12 22:51 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-06-03 14:47 [GIT PULL] proc changes for v5.8-rc1 Eric W. Biederman 2020-06-04 21:15 ` pr-tracker-bot 2020-06-10 21:45 ` [GIT PULL] proc fixes " Eric W. Biederman 2020-06-10 22:05 ` pr-tracker-bot 2020-06-12 19:29 ` [GIT PULL] proc fixes v2 " Eric W. Biederman 2020-06-12 19:46 ` Linus Torvalds 2020-06-12 20:02 ` Eric W. Biederman 2020-06-12 20:16 ` Linus Torvalds 2020-06-12 22:47 ` Eric W. Biederman 2020-06-12 19:50 ` pr-tracker-bot
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).