* [GIT PULL] namespace updates for v3.17-rc1 @ 2014-08-06 0:57 ` Eric W. Biederman 0 siblings, 0 replies; 67+ messages in thread From: Eric W. Biederman @ 2014-08-06 0:57 UTC (permalink / raw) To: Linus Torvalds Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA Linus, Please pull the for-linus branch from the git tree: git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts This is a bunch of small changes built against 3.16-rc6. The most significant change for users is the first patch which makes setns drmatically faster by removing unneded rcu handling. The next chunk of changes are so that "mount -o remount,.." will not allow the user namespace root to drop flags on a mount set by the system wide root. Aks this forces read-only mounts to stay read-only, no-dev mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to stay no exec and it prevents unprivileged users from messing with a mounts atime settings. I have included my test case as the last patch in this series so people performing backports can verify this change works correctly. The next change fixes a bug in NFS that was discovered while auditing nsproxy users for the first optimization. Today you can oops the kernel by reading /proc/fs/nfsfs/{servers,volumes} if you are clever with pid namespaces. I rebased and fixed the build of the !CONFIG_NFS_FS case yesterday when a build bot caught my typo. Given that no one to my knowledge bases anything on my tree fixing the typo in place seems more responsible that requiring a typo-fix to be backported as well. The last change is a small semantic cleanup introducing /proc/thread-self and pointing /proc/mounts and /proc/net at it. This prevents several kinds of problemantic corner cases. It is a user-visible change so it has a minute chance of causing regressions so the change to /proc/mounts and /proc/net are individual one line commits that can be trivially reverted. Unfortunately I lost and could not find the email of the original reporter so he is not credited. From at least one perspective this change to /proc/net is a refgression fix to allow pthread /proc/net uses that were broken by the introduction of the network namespace. Eric Eric W. Biederman (11): namespaces: Use task_lock and not rcu to protect nsproxy mnt: Only change user settable mount flags in remount mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount mnt: Correct permission checks in do_remount mnt: Change the default remount atime from relatime to the existing value mnt: Add tests for unprivileged remount cases that have found to be faulty NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes proc: Have net show up under /proc/<tgid>/task/<tid> proc: Implement /proc/thread-self to point at the directory of the current thread proc: Point /proc/net at /proc/thread-self/net instead of /proc/self/net proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts fs/namespace.c | 65 +++++- fs/nfs/client.c | 95 ++++---- fs/nfs/inode.c | 3 +- fs/nfs/internal.h | 9 + fs/nfs/netns.h | 3 + fs/proc/Makefile | 1 + fs/proc/base.c | 18 +- fs/proc/inode.c | 7 +- fs/proc/internal.h | 6 + fs/proc/proc_net.c | 6 +- fs/proc/root.c | 5 +- fs/proc/thread_self.c | 85 ++++++++ fs/proc_namespace.c | 8 +- include/linux/mount.h | 9 +- include/linux/nsproxy.h | 16 +- include/linux/pid_namespace.h | 1 + ipc/namespace.c | 6 +- kernel/nsproxy.c | 15 +- kernel/utsname.c | 6 +- net/core/net_namespace.c | 10 +- tools/testing/selftests/Makefile | 1 + tools/testing/selftests/mount/Makefile | 17 ++ .../selftests/mount/unprivileged-remount-test.c | 242 +++++++++++++++++++++ 23 files changed, 537 insertions(+), 97 deletions(-) ^ permalink raw reply [flat|nested] 67+ messages in thread
* [GIT PULL] namespace updates for v3.17-rc1 @ 2014-08-06 0:57 ` Eric W. Biederman 0 siblings, 0 replies; 67+ messages in thread From: Eric W. Biederman @ 2014-08-06 0:57 UTC (permalink / raw) To: Linus Torvalds; +Cc: Linux Containers, linux-fsdevel, linux-kernel Linus, Please pull the for-linus branch from the git tree: git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts This is a bunch of small changes built against 3.16-rc6. The most significant change for users is the first patch which makes setns drmatically faster by removing unneded rcu handling. The next chunk of changes are so that "mount -o remount,.." will not allow the user namespace root to drop flags on a mount set by the system wide root. Aks this forces read-only mounts to stay read-only, no-dev mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to stay no exec and it prevents unprivileged users from messing with a mounts atime settings. I have included my test case as the last patch in this series so people performing backports can verify this change works correctly. The next change fixes a bug in NFS that was discovered while auditing nsproxy users for the first optimization. Today you can oops the kernel by reading /proc/fs/nfsfs/{servers,volumes} if you are clever with pid namespaces. I rebased and fixed the build of the !CONFIG_NFS_FS case yesterday when a build bot caught my typo. Given that no one to my knowledge bases anything on my tree fixing the typo in place seems more responsible that requiring a typo-fix to be backported as well. The last change is a small semantic cleanup introducing /proc/thread-self and pointing /proc/mounts and /proc/net at it. This prevents several kinds of problemantic corner cases. It is a user-visible change so it has a minute chance of causing regressions so the change to /proc/mounts and /proc/net are individual one line commits that can be trivially reverted. Unfortunately I lost and could not find the email of the original reporter so he is not credited. From at least one perspective this change to /proc/net is a refgression fix to allow pthread /proc/net uses that were broken by the introduction of the network namespace. Eric Eric W. Biederman (11): namespaces: Use task_lock and not rcu to protect nsproxy mnt: Only change user settable mount flags in remount mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount mnt: Correct permission checks in do_remount mnt: Change the default remount atime from relatime to the existing value mnt: Add tests for unprivileged remount cases that have found to be faulty NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes proc: Have net show up under /proc/<tgid>/task/<tid> proc: Implement /proc/thread-self to point at the directory of the current thread proc: Point /proc/net at /proc/thread-self/net instead of /proc/self/net proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts fs/namespace.c | 65 +++++- fs/nfs/client.c | 95 ++++---- fs/nfs/inode.c | 3 +- fs/nfs/internal.h | 9 + fs/nfs/netns.h | 3 + fs/proc/Makefile | 1 + fs/proc/base.c | 18 +- fs/proc/inode.c | 7 +- fs/proc/internal.h | 6 + fs/proc/proc_net.c | 6 +- fs/proc/root.c | 5 +- fs/proc/thread_self.c | 85 ++++++++ fs/proc_namespace.c | 8 +- include/linux/mount.h | 9 +- include/linux/nsproxy.h | 16 +- include/linux/pid_namespace.h | 1 + ipc/namespace.c | 6 +- kernel/nsproxy.c | 15 +- kernel/utsname.c | 6 +- net/core/net_namespace.c | 10 +- tools/testing/selftests/Makefile | 1 + tools/testing/selftests/mount/Makefile | 17 ++ .../selftests/mount/unprivileged-remount-test.c | 242 +++++++++++++++++++++ 23 files changed, 537 insertions(+), 97 deletions(-) ^ permalink raw reply [flat|nested] 67+ messages in thread
[parent not found: <87fvhav3ic.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>]
* Re: [GIT PULL] namespace updates for v3.17-rc1 2014-08-06 0:57 ` Eric W. Biederman (?) @ 2014-08-06 4:46 ` Stephen Rothwell -1 siblings, 0 replies; 67+ messages in thread From: Stephen Rothwell @ 2014-08-06 4:46 UTC (permalink / raw) To: Eric W. Biederman Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux Containers, Linus Torvalds, linux-kernel-u79uwXL29TY76Z2rM5mHXA [-- Attachment #1.1: Type: text/plain, Size: 1373 bytes --] Hi Eric, On Tue, 05 Aug 2014 17:57:31 -0700 ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman) wrote: > > Please pull the for-linus branch from the git tree: > > git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus > > HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts This has had 4 commits added since the merge window opened that have no Reviewed-by, Acked-by or Tested-by tags and only one Signed-off-by tag. > The last change is a small semantic cleanup introducing > /proc/thread-self and pointing /proc/mounts and /proc/net at it. This > prevents several kinds of problemantic corner cases. It is a > user-visible change so it has a minute chance of causing regressions so > the change to /proc/mounts and /proc/net are individual one line commits > that can be trivially reverted. Unfortunately I lost and could not find > the email of the original reporter so he is not credited. From at least > one perspective this change to /proc/net is a refgression fix to allow > pthread /proc/net uses that were broken by the introduction of the network > namespace. They appear to include this last change. -- Cheers, Stephen Rothwell sfr-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org [-- Attachment #1.2: signature.asc --] [-- Type: application/pgp-signature, Size: 819 bytes --] [-- Attachment #2: Type: text/plain, Size: 205 bytes --] _______________________________________________ Containers mailing list Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org https://lists.linuxfoundation.org/mailman/listinfo/containers ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 @ 2014-08-06 4:46 ` Stephen Rothwell 0 siblings, 0 replies; 67+ messages in thread From: Stephen Rothwell @ 2014-08-06 4:46 UTC (permalink / raw) To: Eric W. Biederman Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux Containers, Linus Torvalds, linux-kernel-u79uwXL29TY76Z2rM5mHXA [-- Attachment #1.1: Type: text/plain, Size: 1373 bytes --] Hi Eric, On Tue, 05 Aug 2014 17:57:31 -0700 ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman) wrote: > > Please pull the for-linus branch from the git tree: > > git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus > > HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts This has had 4 commits added since the merge window opened that have no Reviewed-by, Acked-by or Tested-by tags and only one Signed-off-by tag. > The last change is a small semantic cleanup introducing > /proc/thread-self and pointing /proc/mounts and /proc/net at it. This > prevents several kinds of problemantic corner cases. It is a > user-visible change so it has a minute chance of causing regressions so > the change to /proc/mounts and /proc/net are individual one line commits > that can be trivially reverted. Unfortunately I lost and could not find > the email of the original reporter so he is not credited. From at least > one perspective this change to /proc/net is a refgression fix to allow > pthread /proc/net uses that were broken by the introduction of the network > namespace. They appear to include this last change. -- Cheers, Stephen Rothwell sfr-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org [-- Attachment #1.2: signature.asc --] [-- Type: application/pgp-signature, Size: 819 bytes --] [-- Attachment #2: Type: text/plain, Size: 205 bytes --] _______________________________________________ Containers mailing list Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org https://lists.linuxfoundation.org/mailman/listinfo/containers ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 @ 2014-08-06 4:46 ` Stephen Rothwell 0 siblings, 0 replies; 67+ messages in thread From: Stephen Rothwell @ 2014-08-06 4:46 UTC (permalink / raw) To: Eric W. Biederman Cc: Linus Torvalds, Linux Containers, linux-fsdevel, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1323 bytes --] Hi Eric, On Tue, 05 Aug 2014 17:57:31 -0700 ebiederm@xmission.com (Eric W. Biederman) wrote: > > Please pull the for-linus branch from the git tree: > > git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus > > HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts This has had 4 commits added since the merge window opened that have no Reviewed-by, Acked-by or Tested-by tags and only one Signed-off-by tag. > The last change is a small semantic cleanup introducing > /proc/thread-self and pointing /proc/mounts and /proc/net at it. This > prevents several kinds of problemantic corner cases. It is a > user-visible change so it has a minute chance of causing regressions so > the change to /proc/mounts and /proc/net are individual one line commits > that can be trivially reverted. Unfortunately I lost and could not find > the email of the original reporter so he is not credited. From at least > one perspective this change to /proc/net is a refgression fix to allow > pthread /proc/net uses that were broken by the introduction of the network > namespace. They appear to include this last change. -- Cheers, Stephen Rothwell sfr@canb.auug.org.au [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 67+ messages in thread
[parent not found: <20140806144643.45e5dab8-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org>]
* Re: [GIT PULL] namespace updates for v3.17-rc1 2014-08-06 4:46 ` Stephen Rothwell @ 2014-08-06 5:16 ` Eric W. Biederman -1 siblings, 0 replies; 67+ messages in thread From: Eric W. Biederman @ 2014-08-06 5:16 UTC (permalink / raw) To: Stephen Rothwell Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux Containers, Linus Torvalds, linux-kernel-u79uwXL29TY76Z2rM5mHXA Stephen Rothwell <sfr-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org> writes: > Hi Eric, > > On Tue, 05 Aug 2014 17:57:31 -0700 ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman) wrote: >> >> Please pull the for-linus branch from the git tree: >> >> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus >> >> HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts > > This has had 4 commits added since the merge window opened that have no > Reviewed-by, Acked-by or Tested-by tags and only one Signed-off-by > tag. I am not certain what your point is. There have been no commits added since the merge window opened. There was one commit changed to fix a typo. I documented that already. There were some commits pushed to the tree as late as friday that had been out for review earlier than that and it is possible that you did not pick them up in linux-next until monday. That doesn't mean I added anything after the merge window opened. I have also made certain all of these commits have at least had a chance to show up in linux-next. As for missing cool tags shrug. The people looking at my code didn't feel like saying the magic words so I didn't include cool tags. Furthermore the code is all quite trivial. Beyond that I have been quite out of it recently and this is what I had time to do. If I had had a little more time and energy I would have included unmount on unlink patches that still need magic to happen to keep from blowing the stack in pathological cases on everything except x86_64. That code has been sitting in linux-next. Which is a my long winded way of say it sounds like you are accusing me of being irresponsible, and my way of saying that I have tested and vetted these trivial patches and been as careful as I reasonably could, and I that I apologise for not being able to do things a few days earlier. Eric ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 @ 2014-08-06 5:16 ` Eric W. Biederman 0 siblings, 0 replies; 67+ messages in thread From: Eric W. Biederman @ 2014-08-06 5:16 UTC (permalink / raw) To: Stephen Rothwell Cc: Linus Torvalds, Linux Containers, linux-fsdevel, linux-kernel Stephen Rothwell <sfr@canb.auug.org.au> writes: > Hi Eric, > > On Tue, 05 Aug 2014 17:57:31 -0700 ebiederm@xmission.com (Eric W. Biederman) wrote: >> >> Please pull the for-linus branch from the git tree: >> >> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus >> >> HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts > > This has had 4 commits added since the merge window opened that have no > Reviewed-by, Acked-by or Tested-by tags and only one Signed-off-by > tag. I am not certain what your point is. There have been no commits added since the merge window opened. There was one commit changed to fix a typo. I documented that already. There were some commits pushed to the tree as late as friday that had been out for review earlier than that and it is possible that you did not pick them up in linux-next until monday. That doesn't mean I added anything after the merge window opened. I have also made certain all of these commits have at least had a chance to show up in linux-next. As for missing cool tags shrug. The people looking at my code didn't feel like saying the magic words so I didn't include cool tags. Furthermore the code is all quite trivial. Beyond that I have been quite out of it recently and this is what I had time to do. If I had had a little more time and energy I would have included unmount on unlink patches that still need magic to happen to keep from blowing the stack in pathological cases on everything except x86_64. That code has been sitting in linux-next. Which is a my long winded way of say it sounds like you are accusing me of being irresponsible, and my way of saying that I have tested and vetted these trivial patches and been as careful as I reasonably could, and I that I apologise for not being able to do things a few days earlier. Eric ^ permalink raw reply [flat|nested] 67+ messages in thread
[parent not found: <87lhr2tcyx.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>]
* Re: [GIT PULL] namespace updates for v3.17-rc1 2014-08-06 5:16 ` Eric W. Biederman (?) @ 2014-08-06 6:06 ` Stephen Rothwell -1 siblings, 0 replies; 67+ messages in thread From: Stephen Rothwell @ 2014-08-06 6:06 UTC (permalink / raw) To: Eric W. Biederman Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux Containers, Linus Torvalds, linux-kernel-u79uwXL29TY76Z2rM5mHXA [-- Attachment #1.1: Type: text/plain, Size: 1964 bytes --] Hi Eric, On Tue, 05 Aug 2014 22:16:06 -0700 ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman) wrote: > > I am not certain what your point is. I am just trying to give Linus a heads up for branches that have not had much exposure before he is asked to pull them. > There have been no commits added since the merge window opened. > > There was one commit changed to fix a typo. I documented that already. Yes, that is what confused my, sorry, since that commit and the ones following are new commits but unchanged patches. > There were some commits pushed to the tree as late as friday that had > been out for review earlier than that and it is possible that you did > not pick them up in linux-next until monday. That doesn't mean I added > anything after the merge window opened. Correct. > I have also made certain all of these commits have at least had a chance > to show up in linux-next. > > As for missing cool tags shrug. The people looking at my code didn't > feel like saying the magic words so I didn't include cool tags. Maybe you should push them ... these tags are not just "cool", they give less involved people some indications of what has happened in the life of a patch. > Beyond that I have been quite out of it recently and this is what I had > time to do. If I had had a little more time and energy I would have > included unmount on unlink patches that still need magic to happen to > keep from blowing the stack in pathological cases on everything except > x86_64. That code has been sitting in linux-next. > > Which is a my long winded way of say it sounds like you are accusing me > of being irresponsible, and my way of saying that I have tested and I did not mean to accuse, there are reasons (as you have pointed out) that things get added late, or rewritten. -- Cheers, Stephen Rothwell sfr-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org [-- Attachment #1.2: signature.asc --] [-- Type: application/pgp-signature, Size: 819 bytes --] [-- Attachment #2: Type: text/plain, Size: 205 bytes --] _______________________________________________ Containers mailing list Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org https://lists.linuxfoundation.org/mailman/listinfo/containers ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 @ 2014-08-06 6:06 ` Stephen Rothwell 0 siblings, 0 replies; 67+ messages in thread From: Stephen Rothwell @ 2014-08-06 6:06 UTC (permalink / raw) To: Eric W. Biederman Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux Containers, Linus Torvalds, linux-kernel-u79uwXL29TY76Z2rM5mHXA [-- Attachment #1.1: Type: text/plain, Size: 1964 bytes --] Hi Eric, On Tue, 05 Aug 2014 22:16:06 -0700 ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman) wrote: > > I am not certain what your point is. I am just trying to give Linus a heads up for branches that have not had much exposure before he is asked to pull them. > There have been no commits added since the merge window opened. > > There was one commit changed to fix a typo. I documented that already. Yes, that is what confused my, sorry, since that commit and the ones following are new commits but unchanged patches. > There were some commits pushed to the tree as late as friday that had > been out for review earlier than that and it is possible that you did > not pick them up in linux-next until monday. That doesn't mean I added > anything after the merge window opened. Correct. > I have also made certain all of these commits have at least had a chance > to show up in linux-next. > > As for missing cool tags shrug. The people looking at my code didn't > feel like saying the magic words so I didn't include cool tags. Maybe you should push them ... these tags are not just "cool", they give less involved people some indications of what has happened in the life of a patch. > Beyond that I have been quite out of it recently and this is what I had > time to do. If I had had a little more time and energy I would have > included unmount on unlink patches that still need magic to happen to > keep from blowing the stack in pathological cases on everything except > x86_64. That code has been sitting in linux-next. > > Which is a my long winded way of say it sounds like you are accusing me > of being irresponsible, and my way of saying that I have tested and I did not mean to accuse, there are reasons (as you have pointed out) that things get added late, or rewritten. -- Cheers, Stephen Rothwell sfr-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org [-- Attachment #1.2: signature.asc --] [-- Type: application/pgp-signature, Size: 819 bytes --] [-- Attachment #2: Type: text/plain, Size: 205 bytes --] _______________________________________________ Containers mailing list Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org https://lists.linuxfoundation.org/mailman/listinfo/containers ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 @ 2014-08-06 6:06 ` Stephen Rothwell 0 siblings, 0 replies; 67+ messages in thread From: Stephen Rothwell @ 2014-08-06 6:06 UTC (permalink / raw) To: Eric W. Biederman Cc: Linus Torvalds, Linux Containers, linux-fsdevel, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1914 bytes --] Hi Eric, On Tue, 05 Aug 2014 22:16:06 -0700 ebiederm@xmission.com (Eric W. Biederman) wrote: > > I am not certain what your point is. I am just trying to give Linus a heads up for branches that have not had much exposure before he is asked to pull them. > There have been no commits added since the merge window opened. > > There was one commit changed to fix a typo. I documented that already. Yes, that is what confused my, sorry, since that commit and the ones following are new commits but unchanged patches. > There were some commits pushed to the tree as late as friday that had > been out for review earlier than that and it is possible that you did > not pick them up in linux-next until monday. That doesn't mean I added > anything after the merge window opened. Correct. > I have also made certain all of these commits have at least had a chance > to show up in linux-next. > > As for missing cool tags shrug. The people looking at my code didn't > feel like saying the magic words so I didn't include cool tags. Maybe you should push them ... these tags are not just "cool", they give less involved people some indications of what has happened in the life of a patch. > Beyond that I have been quite out of it recently and this is what I had > time to do. If I had had a little more time and energy I would have > included unmount on unlink patches that still need magic to happen to > keep from blowing the stack in pathological cases on everything except > x86_64. That code has been sitting in linux-next. > > Which is a my long winded way of say it sounds like you are accusing me > of being irresponsible, and my way of saying that I have tested and I did not mean to accuse, there are reasons (as you have pointed out) that things get added late, or rewritten. -- Cheers, Stephen Rothwell sfr@canb.auug.org.au [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 67+ messages in thread
[parent not found: <20140806160608.218b6944-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org>]
* Re: [GIT PULL] namespace updates for v3.17-rc1 2014-08-06 6:06 ` Stephen Rothwell @ 2014-08-06 6:30 ` Eric W. Biederman -1 siblings, 0 replies; 67+ messages in thread From: Eric W. Biederman @ 2014-08-06 6:30 UTC (permalink / raw) To: Stephen Rothwell Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux Containers, Linus Torvalds, linux-kernel-u79uwXL29TY76Z2rM5mHXA Stephen Rothwell <sfr-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org> writes: > Hi Eric, > >> As for missing cool tags shrug. The people looking at my code didn't >> feel like saying the magic words so I didn't include cool tags. > > Maybe you should push them ... these tags are not just "cool", they > give less involved people some indications of what has happened in the > life of a patch. Etiquette is a good idea where it helps things flow smoothly. Other times like this standing on etiquette would have resulted in trivial fixes for long standing issues not going anywhere. Althogh I will keep it in mind in the future. >> Which is a my long winded way of say it sounds like you are accusing me >> of being irresponsible, and my way of saying that I have tested and > > I did not mean to accuse, there are reasons (as you have pointed out) > that things get added late, or rewritten. I really try to avoid those reasons but I missed it this round and the buildbot did not manage to find the comibination with a build error until the code hit linux-next :( Eric ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 @ 2014-08-06 6:30 ` Eric W. Biederman 0 siblings, 0 replies; 67+ messages in thread From: Eric W. Biederman @ 2014-08-06 6:30 UTC (permalink / raw) To: Stephen Rothwell Cc: Linus Torvalds, Linux Containers, linux-fsdevel, linux-kernel Stephen Rothwell <sfr@canb.auug.org.au> writes: > Hi Eric, > >> As for missing cool tags shrug. The people looking at my code didn't >> feel like saying the magic words so I didn't include cool tags. > > Maybe you should push them ... these tags are not just "cool", they > give less involved people some indications of what has happened in the > life of a patch. Etiquette is a good idea where it helps things flow smoothly. Other times like this standing on etiquette would have resulted in trivial fixes for long standing issues not going anywhere. Althogh I will keep it in mind in the future. >> Which is a my long winded way of say it sounds like you are accusing me >> of being irresponsible, and my way of saying that I have tested and > > I did not mean to accuse, there are reasons (as you have pointed out) > that things get added late, or rewritten. I really try to avoid those reasons but I missed it this round and the buildbot did not manage to find the comibination with a build error until the code hit linux-next :( Eric ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 2014-08-06 6:06 ` Stephen Rothwell @ 2014-08-07 13:28 ` Theodore Ts'o -1 siblings, 0 replies; 67+ messages in thread From: Theodore Ts'o @ 2014-08-07 13:28 UTC (permalink / raw) To: Stephen Rothwell Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux Containers, Linus Torvalds, Eric W. Biederman, linux-kernel-u79uwXL29TY76Z2rM5mHXA On Wed, Aug 06, 2014 at 04:06:08PM +1000, Stephen Rothwell wrote: > > Yes, that is what confused my, sorry, since that commit and the ones > following are new commits but unchanged patches. As a suggestion, in addition to trying to count the number of commits that are new, or with the same patch id's, etc., is to provide the diff stats between the last version of the branch that landed in linux-next before the merge window opened, and the branch head that was requested for Linus to pull? This will get tripped up by those poeple who send multiple pull requests which are subsets of the branch that is tracked by linux-next (for example, to arrange the order of merging to minimize conflicts), so like all things, it's not perfect, but it might be useful additional indicator, at least in the early days of the merge window. Also, if you could report the commit id of the branch head before the merge window opened, then other people (including Linus) can do their own investigation and draw their own conclusions. Cheers, - Ted P.S. If there was some way this could be automated so that a 'bot sends out a report for every single pull request, then it minimizes the "J'accuse!" aspect, since the report is being done for all pull requests, and everyone knows that robots makes mistakes. :-) ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 @ 2014-08-07 13:28 ` Theodore Ts'o 0 siblings, 0 replies; 67+ messages in thread From: Theodore Ts'o @ 2014-08-07 13:28 UTC (permalink / raw) To: Stephen Rothwell Cc: Eric W. Biederman, Linus Torvalds, Linux Containers, linux-fsdevel, linux-kernel On Wed, Aug 06, 2014 at 04:06:08PM +1000, Stephen Rothwell wrote: > > Yes, that is what confused my, sorry, since that commit and the ones > following are new commits but unchanged patches. As a suggestion, in addition to trying to count the number of commits that are new, or with the same patch id's, etc., is to provide the diff stats between the last version of the branch that landed in linux-next before the merge window opened, and the branch head that was requested for Linus to pull? This will get tripped up by those poeple who send multiple pull requests which are subsets of the branch that is tracked by linux-next (for example, to arrange the order of merging to minimize conflicts), so like all things, it's not perfect, but it might be useful additional indicator, at least in the early days of the merge window. Also, if you could report the commit id of the branch head before the merge window opened, then other people (including Linus) can do their own investigation and draw their own conclusions. Cheers, - Ted P.S. If there was some way this could be automated so that a 'bot sends out a report for every single pull request, then it minimizes the "J'accuse!" aspect, since the report is being done for all pull requests, and everyone knows that robots makes mistakes. :-) ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 2014-08-06 0:57 ` Eric W. Biederman @ 2014-08-13 2:46 ` Andy Lutomirski -1 siblings, 0 replies; 67+ messages in thread From: Andy Lutomirski @ 2014-08-13 2:46 UTC (permalink / raw) To: Eric W. Biederman, Linus Torvalds Cc: Linux FS Devel, kenton-AuYgBwuPrUQTaNkGU808tA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Linux Kernel Mailing List On 08/05/2014 05:57 PM, Eric W. Biederman wrote: > > Linus, > > Please pull the for-linus branch from the git tree: > > git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus > > HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts > > This is a bunch of small changes built against 3.16-rc6. The most > significant change for users is the first patch which makes setns > drmatically faster by removing unneded rcu handling. > > The next chunk of changes are so that "mount -o remount,.." will not > allow the user namespace root to drop flags on a mount set by the system > wide root. Aks this forces read-only mounts to stay read-only, no-dev > mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to > stay no exec and it prevents unprivileged users from messing with a > mounts atime settings. I have included my test case as the last patch > in this series so people performing backports can verify this change > works correctly. > Sorry for catching this late. I think this fix is likely to unnecessarily break valid userspace due to an odd interaction. do_new_mount contains this: if (user_ns != &init_user_ns) { if (!(type->fs_flags & FS_USERNS_MOUNT)) { put_filesystem(type); return -EPERM; } /* Only in special cases allow devices from mounts * created outside the initial user namespace. */ if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) { flags |= MS_NODEV; mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV; } } This means that private tmpfs mounts end up with MNT_NODEV | MNT_LOCK_NODEV. Certainly the change from MNT_NODEV to MNT_LOCK_NODEV is sensible *if MNT_NODEV made sense in the first place*. But we didn't have MNT_LOCK_NODEV in the past, so this well-intentioned code never really worked. This has a very strange effect: mounting a private tmpfs with all default flags and then remounting with MS_REMOUNT | MS_BIND *without MS_NODEV* will now fail. I suspect that this practice is rather common, since most likely no one ever paid attention to that implicit MNT_NODEV before. I would argue that the correct fix is to either remove the implicit MNT_NODEV entirely or to set FS_USERNS_DEV_MOUNT on most filesystems. The relevant filesystems are tmpfs, mqueue, sysfs, proc, ramfs, and devpts. devpts already sets FS_USERNS_DEV_MOUNT. Setting FS_USERNS_DEV_MOUNT should be safe on all of the others in this list -- I think that the only one that initially contains device nodes is sysfs on selinux systems, which contains a null node. I think the point of this is to prevent mounts of filesystems with user-controlled initial contents from being dangerous. But we don't have any of those yet. --Andy ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 @ 2014-08-13 2:46 ` Andy Lutomirski 0 siblings, 0 replies; 67+ messages in thread From: Andy Lutomirski @ 2014-08-13 2:46 UTC (permalink / raw) To: Eric W. Biederman, Linus Torvalds Cc: Linux FS Devel, containers, Linux Kernel Mailing List, kenton On 08/05/2014 05:57 PM, Eric W. Biederman wrote: > > Linus, > > Please pull the for-linus branch from the git tree: > > git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus > > HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts > > This is a bunch of small changes built against 3.16-rc6. The most > significant change for users is the first patch which makes setns > drmatically faster by removing unneded rcu handling. > > The next chunk of changes are so that "mount -o remount,.." will not > allow the user namespace root to drop flags on a mount set by the system > wide root. Aks this forces read-only mounts to stay read-only, no-dev > mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to > stay no exec and it prevents unprivileged users from messing with a > mounts atime settings. I have included my test case as the last patch > in this series so people performing backports can verify this change > works correctly. > Sorry for catching this late. I think this fix is likely to unnecessarily break valid userspace due to an odd interaction. do_new_mount contains this: if (user_ns != &init_user_ns) { if (!(type->fs_flags & FS_USERNS_MOUNT)) { put_filesystem(type); return -EPERM; } /* Only in special cases allow devices from mounts * created outside the initial user namespace. */ if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) { flags |= MS_NODEV; mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV; } } This means that private tmpfs mounts end up with MNT_NODEV | MNT_LOCK_NODEV. Certainly the change from MNT_NODEV to MNT_LOCK_NODEV is sensible *if MNT_NODEV made sense in the first place*. But we didn't have MNT_LOCK_NODEV in the past, so this well-intentioned code never really worked. This has a very strange effect: mounting a private tmpfs with all default flags and then remounting with MS_REMOUNT | MS_BIND *without MS_NODEV* will now fail. I suspect that this practice is rather common, since most likely no one ever paid attention to that implicit MNT_NODEV before. I would argue that the correct fix is to either remove the implicit MNT_NODEV entirely or to set FS_USERNS_DEV_MOUNT on most filesystems. The relevant filesystems are tmpfs, mqueue, sysfs, proc, ramfs, and devpts. devpts already sets FS_USERNS_DEV_MOUNT. Setting FS_USERNS_DEV_MOUNT should be safe on all of the others in this list -- I think that the only one that initially contains device nodes is sysfs on selinux systems, which contains a null node. I think the point of this is to prevent mounts of filesystems with user-controlled initial contents from being dangerous. But we don't have any of those yet. --Andy ^ permalink raw reply [flat|nested] 67+ messages in thread
[parent not found: <53EAD180.4010906-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>]
* Re: [GIT PULL] namespace updates for v3.17-rc1 2014-08-13 2:46 ` Andy Lutomirski @ 2014-08-13 4:17 ` Eric W. Biederman -1 siblings, 0 replies; 67+ messages in thread From: Eric W. Biederman @ 2014-08-13 4:17 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux FS Devel, kenton-AuYgBwuPrUQTaNkGU808tA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Linus Torvalds, Linux Kernel Mailing List Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes: > On 08/05/2014 05:57 PM, Eric W. Biederman wrote: > > Sorry for catching this late. I think this fix is likely to > unnecessarily break valid userspace due to an odd interaction. The code is correct and safe (no security issues), but yes a blind remount might hit a snag. If you can find a userspace application that matters I might care that a security fix breaks it. I think you have made a point that several more filesystems might be ok to not set nodev on (because we can not do anything to create device nodes on those filesystems). I personally would prefer the much more paranoid approach of only allowing device nodes on a unprivileged mount if we have audited all of the code paths and know it is safe for device nodes to appear there. I don't actually think anyone cares ad remounts of filesystems like tmpfs, mqueue, sysfs, proc, ramfs are all quite rare. Blind remounts are even rare. The normal userspace utilities look at the appropriate version of /proc/mounts on remount. These are not filesystems that a blind remount will likely be applied to. Furthermore there is work underway to prepare patches to allow "mount --bind -ro" to work as expected. That will further reduce the pressure from blind remounts. If there is an actual regression of actual code I am happy to deal with it. But having the MNT_NODEV on those mounts has been the case for a long time now and is not new (no regression). This change just closed the security hole that allowed nodev to be removed. And that security hole we need to have fixed. Eric ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 @ 2014-08-13 4:17 ` Eric W. Biederman 0 siblings, 0 replies; 67+ messages in thread From: Eric W. Biederman @ 2014-08-13 4:17 UTC (permalink / raw) To: Andy Lutomirski Cc: Linus Torvalds, Linux FS Devel, containers, Linux Kernel Mailing List, kenton Andy Lutomirski <luto@amacapital.net> writes: > On 08/05/2014 05:57 PM, Eric W. Biederman wrote: > > Sorry for catching this late. I think this fix is likely to > unnecessarily break valid userspace due to an odd interaction. The code is correct and safe (no security issues), but yes a blind remount might hit a snag. If you can find a userspace application that matters I might care that a security fix breaks it. I think you have made a point that several more filesystems might be ok to not set nodev on (because we can not do anything to create device nodes on those filesystems). I personally would prefer the much more paranoid approach of only allowing device nodes on a unprivileged mount if we have audited all of the code paths and know it is safe for device nodes to appear there. I don't actually think anyone cares ad remounts of filesystems like tmpfs, mqueue, sysfs, proc, ramfs are all quite rare. Blind remounts are even rare. The normal userspace utilities look at the appropriate version of /proc/mounts on remount. These are not filesystems that a blind remount will likely be applied to. Furthermore there is work underway to prepare patches to allow "mount --bind -ro" to work as expected. That will further reduce the pressure from blind remounts. If there is an actual regression of actual code I am happy to deal with it. But having the MNT_NODEV on those mounts has been the case for a long time now and is not new (no regression). This change just closed the security hole that allowed nodev to be removed. And that security hole we need to have fixed. Eric ^ permalink raw reply [flat|nested] 67+ messages in thread
[parent not found: <87sil1nhut.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>]
* Re: [GIT PULL] namespace updates for v3.17-rc1 2014-08-13 4:17 ` Eric W. Biederman @ 2014-08-13 4:38 ` Andy Lutomirski -1 siblings, 0 replies; 67+ messages in thread From: Andy Lutomirski @ 2014-08-13 4:38 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux FS Devel, Kenton Varda, Linux Containers, Linus Torvalds, Linux Kernel Mailing List On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote: > Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes: > >> On 08/05/2014 05:57 PM, Eric W. Biederman wrote: >> >> Sorry for catching this late. I think this fix is likely to >> unnecessarily break valid userspace due to an odd interaction. > > The code is correct and safe (no security issues), but yes a blind > remount might hit a snag. > > If you can find a userspace application that matters I might care > that a security fix breaks it. > > I think you have made a point that several more filesystems might > be ok to not set nodev on (because we can not do anything to create > device nodes on those filesystems). I personally would prefer the much > more paranoid approach of only allowing device nodes on a unprivileged > mount if we have audited all of the code paths and know it is safe > for device nodes to appear there. > > I don't actually think anyone cares ad remounts of filesystems like > tmpfs, mqueue, sysfs, proc, ramfs are all quite rare. Blind remounts > are even rare. The normal userspace utilities look at the appropriate > version of /proc/mounts on remount. Bind remounts are the only kind of remounts, because we've never supported do_remount_sb in a user namespace. So, if you want to create some static content in your user namespace, the way to do it is: unshare(CLONE_NEWUSER | CLONE_NEWNS); mount tmpfs somewhere; write to the tmpfs; mount("path to tmpfs", "path to tmpfs", nullptr, MS_REMOUNT | MS_BIND | MS_RDONLY); > > These are not filesystems that a blind remount will likely be applied > to. > > Furthermore there is work underway to prepare patches to allow > "mount --bind -ro" to work as expected. That will further reduce > the pressure from blind remounts. Not for example above. It really does need the remount. > > If there is an actual regression of actual code I am happy to deal > with it. But having the MNT_NODEV on those mounts has been the case > for a long time now and is not new (no regression). This change just > closed the security hole that allowed nodev to be removed. And that > security hole we need to have fixed. Sandstorm does this. (Well, it did until today.) --Andy ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 @ 2014-08-13 4:38 ` Andy Lutomirski 0 siblings, 0 replies; 67+ messages in thread From: Andy Lutomirski @ 2014-08-13 4:38 UTC (permalink / raw) To: Eric W. Biederman Cc: Linus Torvalds, Linux FS Devel, Linux Containers, Linux Kernel Mailing List, Kenton Varda On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > Andy Lutomirski <luto@amacapital.net> writes: > >> On 08/05/2014 05:57 PM, Eric W. Biederman wrote: >> >> Sorry for catching this late. I think this fix is likely to >> unnecessarily break valid userspace due to an odd interaction. > > The code is correct and safe (no security issues), but yes a blind > remount might hit a snag. > > If you can find a userspace application that matters I might care > that a security fix breaks it. > > I think you have made a point that several more filesystems might > be ok to not set nodev on (because we can not do anything to create > device nodes on those filesystems). I personally would prefer the much > more paranoid approach of only allowing device nodes on a unprivileged > mount if we have audited all of the code paths and know it is safe > for device nodes to appear there. > > I don't actually think anyone cares ad remounts of filesystems like > tmpfs, mqueue, sysfs, proc, ramfs are all quite rare. Blind remounts > are even rare. The normal userspace utilities look at the appropriate > version of /proc/mounts on remount. Bind remounts are the only kind of remounts, because we've never supported do_remount_sb in a user namespace. So, if you want to create some static content in your user namespace, the way to do it is: unshare(CLONE_NEWUSER | CLONE_NEWNS); mount tmpfs somewhere; write to the tmpfs; mount("path to tmpfs", "path to tmpfs", nullptr, MS_REMOUNT | MS_BIND | MS_RDONLY); > > These are not filesystems that a blind remount will likely be applied > to. > > Furthermore there is work underway to prepare patches to allow > "mount --bind -ro" to work as expected. That will further reduce > the pressure from blind remounts. Not for example above. It really does need the remount. > > If there is an actual regression of actual code I am happy to deal > with it. But having the MNT_NODEV on those mounts has been the case > for a long time now and is not new (no regression). This change just > closed the security hole that allowed nodev to be removed. And that > security hole we need to have fixed. Sandstorm does this. (Well, it did until today.) --Andy ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 [not found] ` <87sil1nhut.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 2014-08-13 4:38 ` Andy Lutomirski @ 2014-08-13 4:45 ` Kenton Varda [not found] ` <CAOP=4widH1rMZ1O=hzAT+M_8exdzRPA8pJ+wH29AQ9L0ogu9nw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 1 sibling, 1 reply; 67+ messages in thread From: Kenton Varda @ 2014-08-13 4:45 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux FS Devel, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Linus Torvalds, Linux Kernel Mailing List, Andy Lutomirski On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote: > If you can find a userspace application that matters I might care > that a security fix breaks it. > FWIW, it broke Sandstorm.io, but we already pushed a fix, and I'm not sure if you'd say that we "matter". > If there is an actual regression of actual code I am happy to deal > with it. But having the MNT_NODEV on those mounts has been the case > for a long time now and is not new (no regression). This change just > closed the security hole that allowed nodev to be removed. And that > security hole we need to have fixed. > The problem is that users like us had no idea that nodev was being silently added in the first place, and thus didn't know that we needed to specify it in remounts. We create the tmpfs, put some things in it, and then want to remount it read-only for the sandbox. It seems reasonable to expect that a newly-created tmpfs would have exactly the flags I gave it when I created it, not silently get an additional flag that I then need to pass on remount. Note further that it may be very hard for normal developers to figure out why their remount is failing in this case. Andy only discovered the silent nodev by reading the kernel code. -Kenton ^ permalink raw reply [flat|nested] 67+ messages in thread
[parent not found: <CAOP=4widH1rMZ1O=hzAT+M_8exdzRPA8pJ+wH29AQ9L0ogu9nw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [GIT PULL] namespace updates for v3.17-rc1 2014-08-13 4:45 ` Kenton Varda @ 2014-08-13 10:24 ` Eric W. Biederman 0 siblings, 0 replies; 67+ messages in thread From: Eric W. Biederman @ 2014-08-13 10:24 UTC (permalink / raw) To: Kenton Varda Cc: Linux FS Devel, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Linus Torvalds, Linux Kernel Mailing List, Andy Lutomirski Kenton Varda <kenton-AuYgBwuPrUQTaNkGU808tA@public.gmane.org> writes: > On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman > <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote: > > If you can find a userspace application that matters I might care > > that a security fix breaks it. > > FWIW, it broke Sandstorm.io, but we already pushed a fix, and I'm not > sure if you'd say that we "matter". I should meant to have said: "If you can find a userspace application that breaks I might care that a security fix breaks it" Hearing this actually break something changes this from a theoretical discussion to something real. I have a proposed fix as the end of this email. Could you please test it? If I can get a Tested-by and possibly a Reviewed-by responses I would appreciate it. > The problem is that users like us had no idea that nodev was being > silently added in the first place, and thus didn't know that we needed > to specify it in remounts. Agreed. And frankly that is extremely reasonable and it would in fact break anything doing a traditional tracking of mounts with /etc/mtab. So it is a pretty siginificant discontinuity. I am not prepared to remove the the implicit adding of nodev from the implementation as that would just break your code and probably others code in a different way. What we can do and that will work cleanly long term is make remount match mount and implicitly add nodev. > We create the tmpfs, put some things in it, > and then want to remount it read-only for the sandbox. It seems > reasonable to expect that a newly-created tmpfs would have exactly the > flags I gave it when I created it, not silently get an additional flag > that I then need to pass on remount. A reasonable expectation yes. At the same time it is also a very reasonable security requirement to not allow devices on filesystems that the global root does not control. It is also reasonable that code that does not care should not have to worry about this issue and work the same as the global root or as a container root. Which I think is the reasoning that I went with when adding the implicit nodev. > Note further that it may be very hard for normal developers to figure > out why their remount is failing in this case. Andy only discovered > the silent nodev by reading the kernel code. Agreed. So the simple solution I see with minimal danger and minimal security risk is to remove the inconsitency between mount and remount and add implicitly add MNT_NODEV during remount as well. That removes the minor regression and keeps the code in the paranoid state where device nodes are not honored and not allowed on filesystems mounted by ordinary users. Compared to Andy's suggestion of relaxing the nodev restriction this will also work for filesystems like fuse and nfs when we allow begin to allow unprivileged mounts of those filesystems, and it allows to sleep better knowing the code is extremely paranoid. This reminds me that the dust is settling and the man pages need to be updated for all of these small tweaks to the API that have happened, so there is a good source of documentation for how all of this actually works and why. I am going to be travelling pretty much the rest of the week, so won't be particularly responsive. But if you can get this tested I think this counts as a minor regression fix that solves this issue. Eric From: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> Date: Wed, 13 Aug 2014 01:33:38 -0700 Subject: [PATCH] mnt: Implicitly add MNT_NODEV on remount as we do on mount Now that remount is properly enforcing the rule that you can't remove nodev at least sandstorm.io is breaking when performing a remount. It turns out that there is an easy intuitive solution implicitly add nodev on remount under the same conditions that we do on mount. Update the regression test for remounts to verify that this new addition does not regress either. Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> --- fs/namespace.c | 9 ++++ .../selftests/mount/unprivileged-remount-test.c | 51 ++++++++++++---------- 2 files changed, 36 insertions(+), 24 deletions(-) diff --git a/fs/namespace.c b/fs/namespace.c index 7886176232c1..0f158300c866 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -1934,6 +1934,15 @@ static int do_remount(struct path *path, int flags, int mnt_flags, if (path->dentry != path->mnt->mnt_root) return -EINVAL; + /* Only in special cases allow devices from mounts created + * outside the initial user namespace. + */ + if ((mnt->mnt_ns->user_ns != &init_user_ns) && + !(sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT)) { + flags |= MS_NODEV; + mnt_flags |= MNT_NODEV; + } + /* Don't allow changing of locked mnt flags. * * No locks need to be held here while testing the various diff --git a/tools/testing/selftests/mount/unprivileged-remount-test.c b/tools/testing/selftests/mount/unprivileged-remount-test.c index 1b3ff2fda4d0..6d408c155e0f 100644 --- a/tools/testing/selftests/mount/unprivileged-remount-test.c +++ b/tools/testing/selftests/mount/unprivileged-remount-test.c @@ -118,7 +118,8 @@ static void create_and_enter_userns(void) } static -bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags) +bool test_unpriv_remount(const char *fstype, const char *mount_options, + int mount_flags, int remount_flags, int invalid_flags) { pid_t child; @@ -151,9 +152,11 @@ bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags) strerror(errno)); } - if (mount("testing", "/tmp", "ramfs", mount_flags, NULL) != 0) { - die("mount of /tmp failed: %s\n", - strerror(errno)); + if (mount("testing", "/tmp", fstype, mount_flags, mount_options) != 0) { + die("mount of %s with options '%s' on /tmp failed: %s\n", + fstype, + mount_options? mount_options : "", + strerror(errno)); } create_and_enter_userns(); @@ -181,60 +184,60 @@ bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags) static bool test_unpriv_remount_simple(int mount_flags) { - return test_unpriv_remount(mount_flags, mount_flags, 0); + return test_unpriv_remount("ramfs", NULL, mount_flags, mount_flags, 0); } static bool test_unpriv_remount_atime(int mount_flags, int invalid_flags) { - return test_unpriv_remount(mount_flags, mount_flags, invalid_flags); + return test_unpriv_remount("ramfs", NULL, mount_flags, mount_flags, + invalid_flags); } int main(int argc, char **argv) { - if (!test_unpriv_remount_simple(MS_RDONLY|MS_NODEV)) { + if (!test_unpriv_remount_simple(MS_RDONLY)) { die("MS_RDONLY malfunctions\n"); } - if (!test_unpriv_remount_simple(MS_NODEV)) { + if (!test_unpriv_remount("devpts", "newinstance", MS_NODEV, MS_NODEV, 0)) { die("MS_NODEV malfunctions\n"); } - if (!test_unpriv_remount_simple(MS_NOSUID|MS_NODEV)) { + if (!test_unpriv_remount_simple(MS_NOSUID)) { die("MS_NOSUID malfunctions\n"); } - if (!test_unpriv_remount_simple(MS_NOEXEC|MS_NODEV)) { + if (!test_unpriv_remount_simple(MS_NOEXEC)) { die("MS_NOEXEC malfunctions\n"); } - if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODEV, - MS_NOATIME|MS_NODEV)) + if (!test_unpriv_remount_atime(MS_RELATIME, + MS_NOATIME)) { die("MS_RELATIME malfunctions\n"); } - if (!test_unpriv_remount_atime(MS_STRICTATIME|MS_NODEV, - MS_NOATIME|MS_NODEV)) + if (!test_unpriv_remount_atime(MS_STRICTATIME, + MS_NOATIME)) { die("MS_STRICTATIME malfunctions\n"); } - if (!test_unpriv_remount_atime(MS_NOATIME|MS_NODEV, - MS_STRICTATIME|MS_NODEV)) + if (!test_unpriv_remount_atime(MS_NOATIME, + MS_STRICTATIME)) { die("MS_RELATIME malfunctions\n"); } - if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODIRATIME|MS_NODEV, - MS_NOATIME|MS_NODEV)) + if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODIRATIME, + MS_NOATIME)) { die("MS_RELATIME malfunctions\n"); } - if (!test_unpriv_remount_atime(MS_STRICTATIME|MS_NODIRATIME|MS_NODEV, - MS_NOATIME|MS_NODEV)) + if (!test_unpriv_remount_atime(MS_STRICTATIME|MS_NODIRATIME, + MS_NOATIME)) { die("MS_RELATIME malfunctions\n"); } - if (!test_unpriv_remount_atime(MS_NOATIME|MS_NODIRATIME|MS_NODEV, - MS_STRICTATIME|MS_NODEV)) + if (!test_unpriv_remount_atime(MS_NOATIME|MS_NODIRATIME, + MS_STRICTATIME)) { die("MS_RELATIME malfunctions\n"); } - if (!test_unpriv_remount(MS_STRICTATIME|MS_NODEV, MS_NODEV, - MS_NOATIME|MS_NODEV)) + if (!test_unpriv_remount("ramfs", NULL, MS_STRICTATIME, 0, MS_NOATIME)) { die("Default atime malfunctions\n"); } -- 1.9.1 ^ permalink raw reply related [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 @ 2014-08-13 10:24 ` Eric W. Biederman 0 siblings, 0 replies; 67+ messages in thread From: Eric W. Biederman @ 2014-08-13 10:24 UTC (permalink / raw) To: Kenton Varda Cc: Andy Lutomirski, Linus Torvalds, Linux FS Devel, containers, Linux Kernel Mailing List Kenton Varda <kenton@sandstorm.io> writes: > On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman > <ebiederm@xmission.com> wrote: > > If you can find a userspace application that matters I might care > > that a security fix breaks it. > > FWIW, it broke Sandstorm.io, but we already pushed a fix, and I'm not > sure if you'd say that we "matter". I should meant to have said: "If you can find a userspace application that breaks I might care that a security fix breaks it" Hearing this actually break something changes this from a theoretical discussion to something real. I have a proposed fix as the end of this email. Could you please test it? If I can get a Tested-by and possibly a Reviewed-by responses I would appreciate it. > The problem is that users like us had no idea that nodev was being > silently added in the first place, and thus didn't know that we needed > to specify it in remounts. Agreed. And frankly that is extremely reasonable and it would in fact break anything doing a traditional tracking of mounts with /etc/mtab. So it is a pretty siginificant discontinuity. I am not prepared to remove the the implicit adding of nodev from the implementation as that would just break your code and probably others code in a different way. What we can do and that will work cleanly long term is make remount match mount and implicitly add nodev. > We create the tmpfs, put some things in it, > and then want to remount it read-only for the sandbox. It seems > reasonable to expect that a newly-created tmpfs would have exactly the > flags I gave it when I created it, not silently get an additional flag > that I then need to pass on remount. A reasonable expectation yes. At the same time it is also a very reasonable security requirement to not allow devices on filesystems that the global root does not control. It is also reasonable that code that does not care should not have to worry about this issue and work the same as the global root or as a container root. Which I think is the reasoning that I went with when adding the implicit nodev. > Note further that it may be very hard for normal developers to figure > out why their remount is failing in this case. Andy only discovered > the silent nodev by reading the kernel code. Agreed. So the simple solution I see with minimal danger and minimal security risk is to remove the inconsitency between mount and remount and add implicitly add MNT_NODEV during remount as well. That removes the minor regression and keeps the code in the paranoid state where device nodes are not honored and not allowed on filesystems mounted by ordinary users. Compared to Andy's suggestion of relaxing the nodev restriction this will also work for filesystems like fuse and nfs when we allow begin to allow unprivileged mounts of those filesystems, and it allows to sleep better knowing the code is extremely paranoid. This reminds me that the dust is settling and the man pages need to be updated for all of these small tweaks to the API that have happened, so there is a good source of documentation for how all of this actually works and why. I am going to be travelling pretty much the rest of the week, so won't be particularly responsive. But if you can get this tested I think this counts as a minor regression fix that solves this issue. Eric From: "Eric W. Biederman" <ebiederm@xmission.com> Date: Wed, 13 Aug 2014 01:33:38 -0700 Subject: [PATCH] mnt: Implicitly add MNT_NODEV on remount as we do on mount Now that remount is properly enforcing the rule that you can't remove nodev at least sandstorm.io is breaking when performing a remount. It turns out that there is an easy intuitive solution implicitly add nodev on remount under the same conditions that we do on mount. Update the regression test for remounts to verify that this new addition does not regress either. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- fs/namespace.c | 9 ++++ .../selftests/mount/unprivileged-remount-test.c | 51 ++++++++++++---------- 2 files changed, 36 insertions(+), 24 deletions(-) diff --git a/fs/namespace.c b/fs/namespace.c index 7886176232c1..0f158300c866 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -1934,6 +1934,15 @@ static int do_remount(struct path *path, int flags, int mnt_flags, if (path->dentry != path->mnt->mnt_root) return -EINVAL; + /* Only in special cases allow devices from mounts created + * outside the initial user namespace. + */ + if ((mnt->mnt_ns->user_ns != &init_user_ns) && + !(sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT)) { + flags |= MS_NODEV; + mnt_flags |= MNT_NODEV; + } + /* Don't allow changing of locked mnt flags. * * No locks need to be held here while testing the various diff --git a/tools/testing/selftests/mount/unprivileged-remount-test.c b/tools/testing/selftests/mount/unprivileged-remount-test.c index 1b3ff2fda4d0..6d408c155e0f 100644 --- a/tools/testing/selftests/mount/unprivileged-remount-test.c +++ b/tools/testing/selftests/mount/unprivileged-remount-test.c @@ -118,7 +118,8 @@ static void create_and_enter_userns(void) } static -bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags) +bool test_unpriv_remount(const char *fstype, const char *mount_options, + int mount_flags, int remount_flags, int invalid_flags) { pid_t child; @@ -151,9 +152,11 @@ bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags) strerror(errno)); } - if (mount("testing", "/tmp", "ramfs", mount_flags, NULL) != 0) { - die("mount of /tmp failed: %s\n", - strerror(errno)); + if (mount("testing", "/tmp", fstype, mount_flags, mount_options) != 0) { + die("mount of %s with options '%s' on /tmp failed: %s\n", + fstype, + mount_options? mount_options : "", + strerror(errno)); } create_and_enter_userns(); @@ -181,60 +184,60 @@ bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags) static bool test_unpriv_remount_simple(int mount_flags) { - return test_unpriv_remount(mount_flags, mount_flags, 0); + return test_unpriv_remount("ramfs", NULL, mount_flags, mount_flags, 0); } static bool test_unpriv_remount_atime(int mount_flags, int invalid_flags) { - return test_unpriv_remount(mount_flags, mount_flags, invalid_flags); + return test_unpriv_remount("ramfs", NULL, mount_flags, mount_flags, + invalid_flags); } int main(int argc, char **argv) { - if (!test_unpriv_remount_simple(MS_RDONLY|MS_NODEV)) { + if (!test_unpriv_remount_simple(MS_RDONLY)) { die("MS_RDONLY malfunctions\n"); } - if (!test_unpriv_remount_simple(MS_NODEV)) { + if (!test_unpriv_remount("devpts", "newinstance", MS_NODEV, MS_NODEV, 0)) { die("MS_NODEV malfunctions\n"); } - if (!test_unpriv_remount_simple(MS_NOSUID|MS_NODEV)) { + if (!test_unpriv_remount_simple(MS_NOSUID)) { die("MS_NOSUID malfunctions\n"); } - if (!test_unpriv_remount_simple(MS_NOEXEC|MS_NODEV)) { + if (!test_unpriv_remount_simple(MS_NOEXEC)) { die("MS_NOEXEC malfunctions\n"); } - if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODEV, - MS_NOATIME|MS_NODEV)) + if (!test_unpriv_remount_atime(MS_RELATIME, + MS_NOATIME)) { die("MS_RELATIME malfunctions\n"); } - if (!test_unpriv_remount_atime(MS_STRICTATIME|MS_NODEV, - MS_NOATIME|MS_NODEV)) + if (!test_unpriv_remount_atime(MS_STRICTATIME, + MS_NOATIME)) { die("MS_STRICTATIME malfunctions\n"); } - if (!test_unpriv_remount_atime(MS_NOATIME|MS_NODEV, - MS_STRICTATIME|MS_NODEV)) + if (!test_unpriv_remount_atime(MS_NOATIME, + MS_STRICTATIME)) { die("MS_RELATIME malfunctions\n"); } - if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODIRATIME|MS_NODEV, - MS_NOATIME|MS_NODEV)) + if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODIRATIME, + MS_NOATIME)) { die("MS_RELATIME malfunctions\n"); } - if (!test_unpriv_remount_atime(MS_STRICTATIME|MS_NODIRATIME|MS_NODEV, - MS_NOATIME|MS_NODEV)) + if (!test_unpriv_remount_atime(MS_STRICTATIME|MS_NODIRATIME, + MS_NOATIME)) { die("MS_RELATIME malfunctions\n"); } - if (!test_unpriv_remount_atime(MS_NOATIME|MS_NODIRATIME|MS_NODEV, - MS_STRICTATIME|MS_NODEV)) + if (!test_unpriv_remount_atime(MS_NOATIME|MS_NODIRATIME, + MS_STRICTATIME)) { die("MS_RELATIME malfunctions\n"); } - if (!test_unpriv_remount(MS_STRICTATIME|MS_NODEV, MS_NODEV, - MS_NOATIME|MS_NODEV)) + if (!test_unpriv_remount("ramfs", NULL, MS_STRICTATIME, 0, MS_NOATIME)) { die("Default atime malfunctions\n"); } -- 1.9.1 ^ permalink raw reply related [flat|nested] 67+ messages in thread
[parent not found: <87tx5ghekp.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>]
* Re: [GIT PULL] namespace updates for v3.17-rc1 2014-08-13 10:24 ` Eric W. Biederman @ 2014-08-13 17:03 ` Andy Lutomirski -1 siblings, 0 replies; 67+ messages in thread From: Andy Lutomirski @ 2014-08-13 17:03 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux FS Devel, Kenton Varda, Linux Containers, Linus Torvalds, Linux Kernel Mailing List On Wed, Aug 13, 2014 at 3:24 AM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote: > Kenton Varda <kenton-AuYgBwuPrUQTaNkGU808tA@public.gmane.org> writes: > >> On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman > I have a proposed fix as the end of this email. Could you please > test it? > > If I can get a Tested-by and possibly a Reviewed-by responses I would > appreciate it. Will test later today. > >> The problem is that users like us had no idea that nodev was being >> silently added in the first place, and thus didn't know that we needed >> to specify it in remounts. > > Agreed. And frankly that is extremely reasonable and it would in fact > break anything doing a traditional tracking of mounts with /etc/mtab. > So it is a pretty siginificant discontinuity. > > I am not prepared to remove the the implicit adding of nodev from the > implementation as that would just break your code and probably others > code in a different way. I'm having trouble figuring out what would break. The only way to end up with a device node on one of these filesystems would be to pass an fd to the file system out of the userns and have some globally privileged code put the device node there. > > What we can do and that will work cleanly long term is make remount > match mount and implicitly add nodev. > >> We create the tmpfs, put some things in it, >> and then want to remount it read-only for the sandbox. It seems >> reasonable to expect that a newly-created tmpfs would have exactly the >> flags I gave it when I created it, not silently get an additional flag >> that I then need to pass on remount. > > A reasonable expectation yes. At the same time it is also a very > reasonable security requirement to not allow devices on filesystems > that the global root does not control. But the magic nodev isn't needed for that; the checks in mknod are the right place for this -- they're sufficient, and they're required anyway. > > Compared to Andy's suggestion of relaxing the nodev restriction this > will also work for filesystems like fuse and nfs when we allow begin to > allow unprivileged mounts of those filesystems, and it allows to sleep > better knowing the code is extremely paranoid. If we allow fuse and nfs, we need to deal with nosuid, too. I want to avoid using nosuid for this; I want to change the suid logic to check the namespace, since I think that suid *should* work in a namespace. Also, that implicit nodev may be a problem down the road if we ever add device namespaces. ... > > From: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> > Date: Wed, 13 Aug 2014 01:33:38 -0700 > Subject: [PATCH] mnt: Implicitly add MNT_NODEV on remount as we do on mount > > Now that remount is properly enforcing the rule that you can't > remove nodev at least sandstorm.io is breaking when performing > a remount. > > It turns out that there is an easy intuitive solution implicitly > add nodev on remount under the same conditions that we do on mount. > > Update the regression test for remounts to verify that this new > addition does not regress either. > > Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> > --- > fs/namespace.c | 9 ++++ > .../selftests/mount/unprivileged-remount-test.c | 51 ++++++++++++---------- > 2 files changed, 36 insertions(+), 24 deletions(-) > > diff --git a/fs/namespace.c b/fs/namespace.c > index 7886176232c1..0f158300c866 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -1934,6 +1934,15 @@ static int do_remount(struct path *path, int flags, int mnt_flags, > if (path->dentry != path->mnt->mnt_root) > return -EINVAL; > > + /* Only in special cases allow devices from mounts created > + * outside the initial user namespace. I don't like this comment at all. If we stick with this approach, I think that this should say: Certain mounts are forced to have MNT_NODEV set. For these mounts, user code might not realize that nodev is set, so force MS_NODEV on remount attempts to prevent confusing remount failures. > + */ > + if ((mnt->mnt_ns->user_ns != &init_user_ns) && > + !(sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT)) { > + flags |= MS_NODEV; > + mnt_flags |= MNT_NODEV; > + } > + > /* Don't allow changing of locked mnt flags. > * > * No locks need to be held here while testing the various I will still advocate a different and much less magical approach. I'll send a patch later today. --Andy ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 @ 2014-08-13 17:03 ` Andy Lutomirski 0 siblings, 0 replies; 67+ messages in thread From: Andy Lutomirski @ 2014-08-13 17:03 UTC (permalink / raw) To: Eric W. Biederman Cc: Kenton Varda, Linus Torvalds, Linux FS Devel, Linux Containers, Linux Kernel Mailing List On Wed, Aug 13, 2014 at 3:24 AM, Eric W. Biederman <ebiederm@xmission.com> wrote: > Kenton Varda <kenton@sandstorm.io> writes: > >> On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman > I have a proposed fix as the end of this email. Could you please > test it? > > If I can get a Tested-by and possibly a Reviewed-by responses I would > appreciate it. Will test later today. > >> The problem is that users like us had no idea that nodev was being >> silently added in the first place, and thus didn't know that we needed >> to specify it in remounts. > > Agreed. And frankly that is extremely reasonable and it would in fact > break anything doing a traditional tracking of mounts with /etc/mtab. > So it is a pretty siginificant discontinuity. > > I am not prepared to remove the the implicit adding of nodev from the > implementation as that would just break your code and probably others > code in a different way. I'm having trouble figuring out what would break. The only way to end up with a device node on one of these filesystems would be to pass an fd to the file system out of the userns and have some globally privileged code put the device node there. > > What we can do and that will work cleanly long term is make remount > match mount and implicitly add nodev. > >> We create the tmpfs, put some things in it, >> and then want to remount it read-only for the sandbox. It seems >> reasonable to expect that a newly-created tmpfs would have exactly the >> flags I gave it when I created it, not silently get an additional flag >> that I then need to pass on remount. > > A reasonable expectation yes. At the same time it is also a very > reasonable security requirement to not allow devices on filesystems > that the global root does not control. But the magic nodev isn't needed for that; the checks in mknod are the right place for this -- they're sufficient, and they're required anyway. > > Compared to Andy's suggestion of relaxing the nodev restriction this > will also work for filesystems like fuse and nfs when we allow begin to > allow unprivileged mounts of those filesystems, and it allows to sleep > better knowing the code is extremely paranoid. If we allow fuse and nfs, we need to deal with nosuid, too. I want to avoid using nosuid for this; I want to change the suid logic to check the namespace, since I think that suid *should* work in a namespace. Also, that implicit nodev may be a problem down the road if we ever add device namespaces. ... > > From: "Eric W. Biederman" <ebiederm@xmission.com> > Date: Wed, 13 Aug 2014 01:33:38 -0700 > Subject: [PATCH] mnt: Implicitly add MNT_NODEV on remount as we do on mount > > Now that remount is properly enforcing the rule that you can't > remove nodev at least sandstorm.io is breaking when performing > a remount. > > It turns out that there is an easy intuitive solution implicitly > add nodev on remount under the same conditions that we do on mount. > > Update the regression test for remounts to verify that this new > addition does not regress either. > > Cc: stable@vger.kernel.org > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > --- > fs/namespace.c | 9 ++++ > .../selftests/mount/unprivileged-remount-test.c | 51 ++++++++++++---------- > 2 files changed, 36 insertions(+), 24 deletions(-) > > diff --git a/fs/namespace.c b/fs/namespace.c > index 7886176232c1..0f158300c866 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -1934,6 +1934,15 @@ static int do_remount(struct path *path, int flags, int mnt_flags, > if (path->dentry != path->mnt->mnt_root) > return -EINVAL; > > + /* Only in special cases allow devices from mounts created > + * outside the initial user namespace. I don't like this comment at all. If we stick with this approach, I think that this should say: Certain mounts are forced to have MNT_NODEV set. For these mounts, user code might not realize that nodev is set, so force MS_NODEV on remount attempts to prevent confusing remount failures. > + */ > + if ((mnt->mnt_ns->user_ns != &init_user_ns) && > + !(sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT)) { > + flags |= MS_NODEV; > + mnt_flags |= MNT_NODEV; > + } > + > /* Don't allow changing of locked mnt flags. > * > * No locks need to be held here while testing the various I will still advocate a different and much less magical approach. I'll send a patch later today. --Andy ^ permalink raw reply [flat|nested] 67+ messages in thread
[parent not found: <CALCETrWT_p1-5nkiAjWoeta19fkO3rDiJe9_mhRVqF8x1zXv2A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* [PATCH] fs: Remove implicit nodev for new mounts in non-root userns [not found] ` <CALCETrWT_p1-5nkiAjWoeta19fkO3rDiJe9_mhRVqF8x1zXv2A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2014-08-14 0:03 ` Andy Lutomirski 0 siblings, 0 replies; 67+ messages in thread From: Andy Lutomirski @ 2014-08-14 0:03 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux Containers, Serge Hallyn, linux-kernel-u79uwXL29TY76Z2rM5mHXA, stable-u79uwXL29TY76Z2rM5mHXA, Andy Lutomirski, Kenton Varda, Linux FS Devel, Linus Torvalds Currently, creating a new mount (as opposed to bindmount) in a non-root userns will implicitly set nodev unless the fs is devpts. Something like this will be necessary for file systems that allow the mounter to create device nodes without using mknod (e.g. FUSE if/when that is allowed), but none of the currently allowed filesystems do this. Implicitly adding nodev is problematic, though. It will make it unsafe to ever remove the implicit addition, since userspace might start to rely on it. This fixes a minor regression in: 9566d6742852 mnt: Correct permission checks in do_remount Prior to that commit, MNT_NODEV wasn't enforced for remounts, so there is existing user code that creates a new mount in a userns without MS_NODEV and then expects a remount with matching options to work. That commit broke code that does this. Fortunately, since the implicit nodev has no effect on any existing filesystems, we can still safely remove it. This replaces the implicit nodev with an explicit nodev requirement: anyone who mounts a filesystem without FS_USERNS_DEV_MOUNT will get -EPERM unless they set nodev. If they set nodev, that setting will be locked. As an added benefit, if anything like device namespaces is ever added, then user code will be able to opt out of nodev by clearing nodev. To keep existing code working, this adds FS_USERNS_DEV_MOUNT to all FS_USERNS_MOUNT filesystems. All of the current filesystems with FS_USERNS_MOUNT set are safe. I confirmed that this is compatible with Sandstorm's revision 1bf0c4847b. That revision of Sandstorm does not work without this fix if 9566d6742852 is applied. Cc: Kenton Varda <kenton-AuYgBwuPrUQTaNkGU808tA@public.gmane.org> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org Signed-off-by: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> --- fs/namespace.c | 16 ++++++++++++---- fs/proc/root.c | 2 +- fs/ramfs/inode.c | 2 +- fs/sysfs/mount.c | 2 +- ipc/mqueue.c | 2 +- mm/shmem.c | 4 ++-- 6 files changed, 18 insertions(+), 10 deletions(-) diff --git a/fs/namespace.c b/fs/namespace.c index 0acabea58319..835fa9e8307e 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -2154,12 +2154,20 @@ static int do_new_mount(struct path *path, const char *fstype, int flags, put_filesystem(type); return -EPERM; } - /* Only in special cases allow devices from mounts - * created outside the initial user namespace. + + /* + * If a filesystem might allow the mounter to put + * device nodes on it without the checks in mknod, + * then require MS_NODEV to mount it. */ if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) { - flags |= MS_NODEV; - mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV; + if (!(mnt_flags & MNT_NODEV)) { + put_filesystem(type); + return -EPERM; + } + + /* Do not allow nodev to be cleared. */ + mnt_flags |= MNT_LOCK_NODEV; } } diff --git a/fs/proc/root.c b/fs/proc/root.c index 094e44d4a6be..2313b280729e 100644 --- a/fs/proc/root.c +++ b/fs/proc/root.c @@ -159,7 +159,7 @@ static struct file_system_type proc_fs_type = { .name = "proc", .mount = proc_mount, .kill_sb = proc_kill_sb, - .fs_flags = FS_USERNS_MOUNT, + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, }; void __init proc_root_init(void) diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c index d365b1c4eb3c..b95b7302d4cc 100644 --- a/fs/ramfs/inode.c +++ b/fs/ramfs/inode.c @@ -261,7 +261,7 @@ static struct file_system_type ramfs_fs_type = { .name = "ramfs", .mount = ramfs_mount, .kill_sb = ramfs_kill_sb, - .fs_flags = FS_USERNS_MOUNT, + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, }; int __init init_ramfs_fs(void) diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c index 8a49486bf30c..56ba59317e24 100644 --- a/fs/sysfs/mount.c +++ b/fs/sysfs/mount.c @@ -58,7 +58,7 @@ static struct file_system_type sysfs_fs_type = { .name = "sysfs", .mount = sysfs_mount, .kill_sb = sysfs_kill_sb, - .fs_flags = FS_USERNS_MOUNT, + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, }; int __init sysfs_init(void) diff --git a/ipc/mqueue.c b/ipc/mqueue.c index 4fcf39af1776..56abbc848d4c 100644 --- a/ipc/mqueue.c +++ b/ipc/mqueue.c @@ -1394,7 +1394,7 @@ static struct file_system_type mqueue_fs_type = { .name = "mqueue", .mount = mqueue_mount, .kill_sb = kill_litter_super, - .fs_flags = FS_USERNS_MOUNT, + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, }; int mq_init_ns(struct ipc_namespace *ns) diff --git a/mm/shmem.c b/mm/shmem.c index a42add14331c..f4a708a8f9e3 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -3149,7 +3149,7 @@ static struct file_system_type shmem_fs_type = { .name = "tmpfs", .mount = shmem_mount, .kill_sb = kill_litter_super, - .fs_flags = FS_USERNS_MOUNT, + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, }; int __init shmem_init(void) @@ -3208,7 +3208,7 @@ static struct file_system_type shmem_fs_type = { .name = "tmpfs", .mount = ramfs_mount, .kill_sb = kill_litter_super, - .fs_flags = FS_USERNS_MOUNT, + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, }; int __init shmem_init(void) -- 1.9.3 ^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH] fs: Remove implicit nodev for new mounts in non-root userns 2014-08-13 17:03 ` Andy Lutomirski (?) (?) @ 2014-08-14 0:03 ` Andy Lutomirski 2014-08-15 19:05 ` Serge Hallyn ` (3 more replies) -1 siblings, 4 replies; 67+ messages in thread From: Andy Lutomirski @ 2014-08-14 0:03 UTC (permalink / raw) To: Eric W. Biederman Cc: linux-kernel, Linux Containers, Linux FS Devel, Linus Torvalds, Serge Hallyn, Andy Lutomirski, Kenton Varda, stable Currently, creating a new mount (as opposed to bindmount) in a non-root userns will implicitly set nodev unless the fs is devpts. Something like this will be necessary for file systems that allow the mounter to create device nodes without using mknod (e.g. FUSE if/when that is allowed), but none of the currently allowed filesystems do this. Implicitly adding nodev is problematic, though. It will make it unsafe to ever remove the implicit addition, since userspace might start to rely on it. This fixes a minor regression in: 9566d6742852 mnt: Correct permission checks in do_remount Prior to that commit, MNT_NODEV wasn't enforced for remounts, so there is existing user code that creates a new mount in a userns without MS_NODEV and then expects a remount with matching options to work. That commit broke code that does this. Fortunately, since the implicit nodev has no effect on any existing filesystems, we can still safely remove it. This replaces the implicit nodev with an explicit nodev requirement: anyone who mounts a filesystem without FS_USERNS_DEV_MOUNT will get -EPERM unless they set nodev. If they set nodev, that setting will be locked. As an added benefit, if anything like device namespaces is ever added, then user code will be able to opt out of nodev by clearing nodev. To keep existing code working, this adds FS_USERNS_DEV_MOUNT to all FS_USERNS_MOUNT filesystems. All of the current filesystems with FS_USERNS_MOUNT set are safe. I confirmed that this is compatible with Sandstorm's revision 1bf0c4847b. That revision of Sandstorm does not work without this fix if 9566d6742852 is applied. Cc: Kenton Varda <kenton@sandstorm.io> Cc: stable@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@amacapital.net> --- fs/namespace.c | 16 ++++++++++++---- fs/proc/root.c | 2 +- fs/ramfs/inode.c | 2 +- fs/sysfs/mount.c | 2 +- ipc/mqueue.c | 2 +- mm/shmem.c | 4 ++-- 6 files changed, 18 insertions(+), 10 deletions(-) diff --git a/fs/namespace.c b/fs/namespace.c index 0acabea58319..835fa9e8307e 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -2154,12 +2154,20 @@ static int do_new_mount(struct path *path, const char *fstype, int flags, put_filesystem(type); return -EPERM; } - /* Only in special cases allow devices from mounts - * created outside the initial user namespace. + + /* + * If a filesystem might allow the mounter to put + * device nodes on it without the checks in mknod, + * then require MS_NODEV to mount it. */ if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) { - flags |= MS_NODEV; - mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV; + if (!(mnt_flags & MNT_NODEV)) { + put_filesystem(type); + return -EPERM; + } + + /* Do not allow nodev to be cleared. */ + mnt_flags |= MNT_LOCK_NODEV; } } diff --git a/fs/proc/root.c b/fs/proc/root.c index 094e44d4a6be..2313b280729e 100644 --- a/fs/proc/root.c +++ b/fs/proc/root.c @@ -159,7 +159,7 @@ static struct file_system_type proc_fs_type = { .name = "proc", .mount = proc_mount, .kill_sb = proc_kill_sb, - .fs_flags = FS_USERNS_MOUNT, + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, }; void __init proc_root_init(void) diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c index d365b1c4eb3c..b95b7302d4cc 100644 --- a/fs/ramfs/inode.c +++ b/fs/ramfs/inode.c @@ -261,7 +261,7 @@ static struct file_system_type ramfs_fs_type = { .name = "ramfs", .mount = ramfs_mount, .kill_sb = ramfs_kill_sb, - .fs_flags = FS_USERNS_MOUNT, + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, }; int __init init_ramfs_fs(void) diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c index 8a49486bf30c..56ba59317e24 100644 --- a/fs/sysfs/mount.c +++ b/fs/sysfs/mount.c @@ -58,7 +58,7 @@ static struct file_system_type sysfs_fs_type = { .name = "sysfs", .mount = sysfs_mount, .kill_sb = sysfs_kill_sb, - .fs_flags = FS_USERNS_MOUNT, + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, }; int __init sysfs_init(void) diff --git a/ipc/mqueue.c b/ipc/mqueue.c index 4fcf39af1776..56abbc848d4c 100644 --- a/ipc/mqueue.c +++ b/ipc/mqueue.c @@ -1394,7 +1394,7 @@ static struct file_system_type mqueue_fs_type = { .name = "mqueue", .mount = mqueue_mount, .kill_sb = kill_litter_super, - .fs_flags = FS_USERNS_MOUNT, + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, }; int mq_init_ns(struct ipc_namespace *ns) diff --git a/mm/shmem.c b/mm/shmem.c index a42add14331c..f4a708a8f9e3 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -3149,7 +3149,7 @@ static struct file_system_type shmem_fs_type = { .name = "tmpfs", .mount = shmem_mount, .kill_sb = kill_litter_super, - .fs_flags = FS_USERNS_MOUNT, + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, }; int __init shmem_init(void) @@ -3208,7 +3208,7 @@ static struct file_system_type shmem_fs_type = { .name = "tmpfs", .mount = ramfs_mount, .kill_sb = kill_litter_super, - .fs_flags = FS_USERNS_MOUNT, + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, }; int __init shmem_init(void) -- 1.9.3 ^ permalink raw reply related [flat|nested] 67+ messages in thread
* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns 2014-08-14 0:03 ` Andy Lutomirski @ 2014-08-15 19:05 ` Serge Hallyn 2014-08-15 19:16 ` Andy Lutomirski [not found] ` <2686c32f00b14148379e8cfee9c028c794d4aa1a.1407974494.git.luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> ` (2 subsequent siblings) 3 siblings, 1 reply; 67+ messages in thread From: Serge Hallyn @ 2014-08-15 19:05 UTC (permalink / raw) To: Andy Lutomirski Cc: Eric W. Biederman, linux-kernel, Linux Containers, Linux FS Devel, Linus Torvalds, Kenton Varda, stable Quoting Andy Lutomirski (luto@amacapital.net): > Currently, creating a new mount (as opposed to bindmount) in a > non-root userns will implicitly set nodev unless the fs is devpts. > Something like this will be necessary for file systems that allow > the mounter to create device nodes without using mknod (e.g. FUSE > if/when that is allowed), but none of the currently allowed > filesystems do this. Hi, Sorry, I'm probably thinking stupidly, but I don't see this restriction being the case serge@sl:~$ mount | grep tmp [...] tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755) serge@sl:~$ sudo mknod /run/kvm c 10 232 [sudo] password for serge: serge@sl:~$ echo $? 0 serge@sl:~$ ls -l /run/kvm crw-r--r-- 1 root root 10, 232 Aug 15 14:04 /run/kvm But you seem to be saying I shouldn't be allowed to create a device inside a tmpfs. What am I overlooking? > Implicitly adding nodev is problematic, though. It will make it > unsafe to ever remove the implicit addition, since userspace might > start to rely on it. > > This fixes a minor regression in: > > 9566d6742852 mnt: Correct permission checks in do_remount > > Prior to that commit, MNT_NODEV wasn't enforced for remounts, so > there is existing user code that creates a new mount in a userns > without MS_NODEV and then expects a remount with matching options to > work. That commit broke code that does this. > > Fortunately, since the implicit nodev has no effect on any existing > filesystems, we can still safely remove it. > > This replaces the implicit nodev with an explicit nodev requirement: > anyone who mounts a filesystem without FS_USERNS_DEV_MOUNT will get > -EPERM unless they set nodev. If they set nodev, that setting will > be locked. > > As an added benefit, if anything like device namespaces is ever > added, then user code will be able to opt out of nodev by clearing > nodev. > > To keep existing code working, this adds FS_USERNS_DEV_MOUNT to all > FS_USERNS_MOUNT filesystems. All of the current filesystems with > FS_USERNS_MOUNT set are safe. > > I confirmed that this is compatible with Sandstorm's revision > 1bf0c4847b. That revision of Sandstorm does not work without this > fix if 9566d6742852 is applied. > > Cc: Kenton Varda <kenton@sandstorm.io> > Cc: stable@vger.kernel.org > Signed-off-by: Andy Lutomirski <luto@amacapital.net> > --- > fs/namespace.c | 16 ++++++++++++---- > fs/proc/root.c | 2 +- > fs/ramfs/inode.c | 2 +- > fs/sysfs/mount.c | 2 +- > ipc/mqueue.c | 2 +- > mm/shmem.c | 4 ++-- > 6 files changed, 18 insertions(+), 10 deletions(-) > > diff --git a/fs/namespace.c b/fs/namespace.c > index 0acabea58319..835fa9e8307e 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -2154,12 +2154,20 @@ static int do_new_mount(struct path *path, const char *fstype, int flags, > put_filesystem(type); > return -EPERM; > } > - /* Only in special cases allow devices from mounts > - * created outside the initial user namespace. > + > + /* > + * If a filesystem might allow the mounter to put > + * device nodes on it without the checks in mknod, > + * then require MS_NODEV to mount it. > */ > if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) { > - flags |= MS_NODEV; > - mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV; > + if (!(mnt_flags & MNT_NODEV)) { > + put_filesystem(type); > + return -EPERM; > + } > + > + /* Do not allow nodev to be cleared. */ > + mnt_flags |= MNT_LOCK_NODEV; > } > } > > diff --git a/fs/proc/root.c b/fs/proc/root.c > index 094e44d4a6be..2313b280729e 100644 > --- a/fs/proc/root.c > +++ b/fs/proc/root.c > @@ -159,7 +159,7 @@ static struct file_system_type proc_fs_type = { > .name = "proc", > .mount = proc_mount, > .kill_sb = proc_kill_sb, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > void __init proc_root_init(void) > diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c > index d365b1c4eb3c..b95b7302d4cc 100644 > --- a/fs/ramfs/inode.c > +++ b/fs/ramfs/inode.c > @@ -261,7 +261,7 @@ static struct file_system_type ramfs_fs_type = { > .name = "ramfs", > .mount = ramfs_mount, > .kill_sb = ramfs_kill_sb, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int __init init_ramfs_fs(void) > diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c > index 8a49486bf30c..56ba59317e24 100644 > --- a/fs/sysfs/mount.c > +++ b/fs/sysfs/mount.c > @@ -58,7 +58,7 @@ static struct file_system_type sysfs_fs_type = { > .name = "sysfs", > .mount = sysfs_mount, > .kill_sb = sysfs_kill_sb, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int __init sysfs_init(void) > diff --git a/ipc/mqueue.c b/ipc/mqueue.c > index 4fcf39af1776..56abbc848d4c 100644 > --- a/ipc/mqueue.c > +++ b/ipc/mqueue.c > @@ -1394,7 +1394,7 @@ static struct file_system_type mqueue_fs_type = { > .name = "mqueue", > .mount = mqueue_mount, > .kill_sb = kill_litter_super, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int mq_init_ns(struct ipc_namespace *ns) > diff --git a/mm/shmem.c b/mm/shmem.c > index a42add14331c..f4a708a8f9e3 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -3149,7 +3149,7 @@ static struct file_system_type shmem_fs_type = { > .name = "tmpfs", > .mount = shmem_mount, > .kill_sb = kill_litter_super, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int __init shmem_init(void) > @@ -3208,7 +3208,7 @@ static struct file_system_type shmem_fs_type = { > .name = "tmpfs", > .mount = ramfs_mount, > .kill_sb = kill_litter_super, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int __init shmem_init(void) > -- > 1.9.3 > ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns 2014-08-15 19:05 ` Serge Hallyn @ 2014-08-15 19:16 ` Andy Lutomirski 0 siblings, 0 replies; 67+ messages in thread From: Andy Lutomirski @ 2014-08-15 19:16 UTC (permalink / raw) To: Serge Hallyn Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA, stable, Kenton Varda, Eric W. Biederman, Linux FS Devel, Linus Torvalds On Fri, Aug 15, 2014 at 12:05 PM, Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> wrote: > Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org): >> Currently, creating a new mount (as opposed to bindmount) in a >> non-root userns will implicitly set nodev unless the fs is devpts. >> Something like this will be necessary for file systems that allow >> the mounter to create device nodes without using mknod (e.g. FUSE >> if/when that is allowed), but none of the currently allowed >> filesystems do this. > > Hi, > > Sorry, I'm probably thinking stupidly, but I don't see this restriction > being the case > > serge@sl:~$ mount | grep tmp > [...] > tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755) > serge@sl:~$ sudo mknod /run/kvm c 10 232 > [sudo] password for serge: > serge@sl:~$ echo $? > 0 > serge@sl:~$ ls -l /run/kvm > crw-r--r-- 1 root root 10, 232 Aug 15 14:04 /run/kvm > > But you seem to be saying I shouldn't be allowed to create a device inside > a tmpfs. What am I overlooking? I assume you're in the root userns. This patch is unnecessary, and has no effect, if you're in the root userns. The code in Sandstorm that's currently broken in Linus' tree runs in a new userns with a matching mount ns. It does (copied verbatim): KJ_SYSCALL(mount("sandstorm-dev", "dev", "tmpfs", MS_NOSUID | MS_NOEXEC, "size=1m,nr_inodes=16,mode=755")); makeCharDeviceNode("null", "null", 1, 3); makeCharDeviceNode("zero", "zero", 1, 5); makeCharDeviceNode("random", "urandom", 1, 9); makeCharDeviceNode("urandom", "urandom", 1, 9); KJ_SYSCALL(mount("dev", "dev", nullptr, MS_REMOUNT | MS_BIND | MS_NOSUID | MS_NOEXEC | MS_RDONLY, nullptr)); makeCharDeviceNode is a helper that creates an empty file and mounts a device node over it. This code needs the fs to be read/write, but Sandstorm wants to make /dev read-only when it's done. In Linus' tree, the remount fails with -EPERM because the mount is secretly nodev. It was always secretly nodev, but no one noticed because of CVE-2014-5207, which caused that remount to succeed. (Yay for programs that inadvertently exploited a serious security vulnerability for their normal function.) --Andy ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns @ 2014-08-15 19:16 ` Andy Lutomirski 0 siblings, 0 replies; 67+ messages in thread From: Andy Lutomirski @ 2014-08-15 19:16 UTC (permalink / raw) To: Serge Hallyn Cc: Eric W. Biederman, linux-kernel, Linux Containers, Linux FS Devel, Linus Torvalds, Kenton Varda, stable On Fri, Aug 15, 2014 at 12:05 PM, Serge Hallyn <serge.hallyn@ubuntu.com> wrote: > Quoting Andy Lutomirski (luto@amacapital.net): >> Currently, creating a new mount (as opposed to bindmount) in a >> non-root userns will implicitly set nodev unless the fs is devpts. >> Something like this will be necessary for file systems that allow >> the mounter to create device nodes without using mknod (e.g. FUSE >> if/when that is allowed), but none of the currently allowed >> filesystems do this. > > Hi, > > Sorry, I'm probably thinking stupidly, but I don't see this restriction > being the case > > serge@sl:~$ mount | grep tmp > [...] > tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755) > serge@sl:~$ sudo mknod /run/kvm c 10 232 > [sudo] password for serge: > serge@sl:~$ echo $? > 0 > serge@sl:~$ ls -l /run/kvm > crw-r--r-- 1 root root 10, 232 Aug 15 14:04 /run/kvm > > But you seem to be saying I shouldn't be allowed to create a device inside > a tmpfs. What am I overlooking? I assume you're in the root userns. This patch is unnecessary, and has no effect, if you're in the root userns. The code in Sandstorm that's currently broken in Linus' tree runs in a new userns with a matching mount ns. It does (copied verbatim): KJ_SYSCALL(mount("sandstorm-dev", "dev", "tmpfs", MS_NOSUID | MS_NOEXEC, "size=1m,nr_inodes=16,mode=755")); makeCharDeviceNode("null", "null", 1, 3); makeCharDeviceNode("zero", "zero", 1, 5); makeCharDeviceNode("random", "urandom", 1, 9); makeCharDeviceNode("urandom", "urandom", 1, 9); KJ_SYSCALL(mount("dev", "dev", nullptr, MS_REMOUNT | MS_BIND | MS_NOSUID | MS_NOEXEC | MS_RDONLY, nullptr)); makeCharDeviceNode is a helper that creates an empty file and mounts a device node over it. This code needs the fs to be read/write, but Sandstorm wants to make /dev read-only when it's done. In Linus' tree, the remount fails with -EPERM because the mount is secretly nodev. It was always secretly nodev, but no one noticed because of CVE-2014-5207, which caused that remount to succeed. (Yay for programs that inadvertently exploited a serious security vulnerability for their normal function.) --Andy ^ permalink raw reply [flat|nested] 67+ messages in thread
[parent not found: <CALCETrVKq1Fxnsd9jKDi5_fcKfCJxBZ1w-zGXD3FR-pF-jLsmQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns 2014-08-15 19:16 ` Andy Lutomirski @ 2014-08-15 19:37 ` Serge Hallyn -1 siblings, 0 replies; 67+ messages in thread From: Serge Hallyn @ 2014-08-15 19:37 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA, stable, Kenton Varda, Eric W. Biederman, Linux FS Devel, Linus Torvalds Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org): > On Fri, Aug 15, 2014 at 12:05 PM, Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> wrote: > > Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org): > >> Currently, creating a new mount (as opposed to bindmount) in a > >> non-root userns will implicitly set nodev unless the fs is devpts. > >> Something like this will be necessary for file systems that allow > >> the mounter to create device nodes without using mknod (e.g. FUSE > >> if/when that is allowed), but none of the currently allowed > >> filesystems do this. > > > > Hi, > > > > Sorry, I'm probably thinking stupidly, but I don't see this restriction > > being the case > > > > serge@sl:~$ mount | grep tmp > > [...] > > tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755) > > serge@sl:~$ sudo mknod /run/kvm c 10 232 > > [sudo] password for serge: > > serge@sl:~$ echo $? > > 0 > > serge@sl:~$ ls -l /run/kvm > > crw-r--r-- 1 root root 10, 232 Aug 15 14:04 /run/kvm > > > > But you seem to be saying I shouldn't be allowed to create a device inside > > a tmpfs. What am I overlooking? > > I assume you're in the root userns. This patch is unnecessary, and > has no effect, if you're in the root userns. Right, but I thought you were justifying adding FS_USERNS_DEV_MOUNT by saying that you cannot mknod in those filesystems. But I see you actually said "without using mknod". I guess I don't understand that caveat. -serge ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns @ 2014-08-15 19:37 ` Serge Hallyn 0 siblings, 0 replies; 67+ messages in thread From: Serge Hallyn @ 2014-08-15 19:37 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, linux-kernel, stable, Kenton Varda, Eric W. Biederman, Linux FS Devel, Linus Torvalds Quoting Andy Lutomirski (luto@amacapital.net): > On Fri, Aug 15, 2014 at 12:05 PM, Serge Hallyn <serge.hallyn@ubuntu.com> wrote: > > Quoting Andy Lutomirski (luto@amacapital.net): > >> Currently, creating a new mount (as opposed to bindmount) in a > >> non-root userns will implicitly set nodev unless the fs is devpts. > >> Something like this will be necessary for file systems that allow > >> the mounter to create device nodes without using mknod (e.g. FUSE > >> if/when that is allowed), but none of the currently allowed > >> filesystems do this. > > > > Hi, > > > > Sorry, I'm probably thinking stupidly, but I don't see this restriction > > being the case > > > > serge@sl:~$ mount | grep tmp > > [...] > > tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755) > > serge@sl:~$ sudo mknod /run/kvm c 10 232 > > [sudo] password for serge: > > serge@sl:~$ echo $? > > 0 > > serge@sl:~$ ls -l /run/kvm > > crw-r--r-- 1 root root 10, 232 Aug 15 14:04 /run/kvm > > > > But you seem to be saying I shouldn't be allowed to create a device inside > > a tmpfs. What am I overlooking? > > I assume you're in the root userns. This patch is unnecessary, and > has no effect, if you're in the root userns. Right, but I thought you were justifying adding FS_USERNS_DEV_MOUNT by saying that you cannot mknod in those filesystems. But I see you actually said "without using mknod". I guess I don't understand that caveat. -serge ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns 2014-08-15 19:37 ` Serge Hallyn @ 2014-08-15 19:56 ` Andy Lutomirski -1 siblings, 0 replies; 67+ messages in thread From: Andy Lutomirski @ 2014-08-15 19:56 UTC (permalink / raw) To: Serge Hallyn Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA, stable, Kenton Varda, Eric W. Biederman, Linux FS Devel, Linus Torvalds On Fri, Aug 15, 2014 at 12:37 PM, Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> wrote: > Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org): >> On Fri, Aug 15, 2014 at 12:05 PM, Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> wrote: >> > Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org): >> >> Currently, creating a new mount (as opposed to bindmount) in a >> >> non-root userns will implicitly set nodev unless the fs is devpts. >> >> Something like this will be necessary for file systems that allow >> >> the mounter to create device nodes without using mknod (e.g. FUSE >> >> if/when that is allowed), but none of the currently allowed >> >> filesystems do this. >> > >> > Hi, >> > >> > Sorry, I'm probably thinking stupidly, but I don't see this restriction >> > being the case >> > >> > serge@sl:~$ mount | grep tmp >> > [...] >> > tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755) >> > serge@sl:~$ sudo mknod /run/kvm c 10 232 >> > [sudo] password for serge: >> > serge@sl:~$ echo $? >> > 0 >> > serge@sl:~$ ls -l /run/kvm >> > crw-r--r-- 1 root root 10, 232 Aug 15 14:04 /run/kvm >> > >> > But you seem to be saying I shouldn't be allowed to create a device inside >> > a tmpfs. What am I overlooking? >> >> I assume you're in the root userns. This patch is unnecessary, and >> has no effect, if you're in the root userns. > > Right, but I thought you were justifying adding FS_USERNS_DEV_MOUNT by saying > that you cannot mknod in those filesystems. But I see you actually said > "without using mknod". I guess I don't understand that caveat. IIUC, there are two ways that a user could put a device node into their filesystem. The obvious way is using mknod. But mknod has its own perfectly valid permission checks, and it doesn't need any special handling at mount time. The less obvious way is to mount a filesystem that already contains a device node or to mount a filesystem that gives some other means of inserting a device node (e.g. a network filesystem or FUSE). Those might allow inserting device nodes without passing a global capability check, so unprivileged users in a userns must not be allowed to mount such a filesystem without MNT_NODEV | MNT_LOCK_NODEV. Fortunately, none of the existing FS_USERNS_MOUNT filesystems have that property. FUSE will, but we don't support FUSE in a userns yet (unfortunately -- it would be a *very* useful feature.) I think that, if we ever allow FUSE in a userns, we should return -EPERM when trying to mount it unless the user specifies MS_NODEV, which is what this patch does. I don't think there's any reason to play complicated games to allow programs to get away with omitting MS_NODEV. --Andy ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns @ 2014-08-15 19:56 ` Andy Lutomirski 0 siblings, 0 replies; 67+ messages in thread From: Andy Lutomirski @ 2014-08-15 19:56 UTC (permalink / raw) To: Serge Hallyn Cc: Linux Containers, linux-kernel, stable, Kenton Varda, Eric W. Biederman, Linux FS Devel, Linus Torvalds On Fri, Aug 15, 2014 at 12:37 PM, Serge Hallyn <serge.hallyn@ubuntu.com> wrote: > Quoting Andy Lutomirski (luto@amacapital.net): >> On Fri, Aug 15, 2014 at 12:05 PM, Serge Hallyn <serge.hallyn@ubuntu.com> wrote: >> > Quoting Andy Lutomirski (luto@amacapital.net): >> >> Currently, creating a new mount (as opposed to bindmount) in a >> >> non-root userns will implicitly set nodev unless the fs is devpts. >> >> Something like this will be necessary for file systems that allow >> >> the mounter to create device nodes without using mknod (e.g. FUSE >> >> if/when that is allowed), but none of the currently allowed >> >> filesystems do this. >> > >> > Hi, >> > >> > Sorry, I'm probably thinking stupidly, but I don't see this restriction >> > being the case >> > >> > serge@sl:~$ mount | grep tmp >> > [...] >> > tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755) >> > serge@sl:~$ sudo mknod /run/kvm c 10 232 >> > [sudo] password for serge: >> > serge@sl:~$ echo $? >> > 0 >> > serge@sl:~$ ls -l /run/kvm >> > crw-r--r-- 1 root root 10, 232 Aug 15 14:04 /run/kvm >> > >> > But you seem to be saying I shouldn't be allowed to create a device inside >> > a tmpfs. What am I overlooking? >> >> I assume you're in the root userns. This patch is unnecessary, and >> has no effect, if you're in the root userns. > > Right, but I thought you were justifying adding FS_USERNS_DEV_MOUNT by saying > that you cannot mknod in those filesystems. But I see you actually said > "without using mknod". I guess I don't understand that caveat. IIUC, there are two ways that a user could put a device node into their filesystem. The obvious way is using mknod. But mknod has its own perfectly valid permission checks, and it doesn't need any special handling at mount time. The less obvious way is to mount a filesystem that already contains a device node or to mount a filesystem that gives some other means of inserting a device node (e.g. a network filesystem or FUSE). Those might allow inserting device nodes without passing a global capability check, so unprivileged users in a userns must not be allowed to mount such a filesystem without MNT_NODEV | MNT_LOCK_NODEV. Fortunately, none of the existing FS_USERNS_MOUNT filesystems have that property. FUSE will, but we don't support FUSE in a userns yet (unfortunately -- it would be a *very* useful feature.) I think that, if we ever allow FUSE in a userns, we should return -EPERM when trying to mount it unless the user specifies MS_NODEV, which is what this patch does. I don't think there's any reason to play complicated games to allow programs to get away with omitting MS_NODEV. --Andy ^ permalink raw reply [flat|nested] 67+ messages in thread
[parent not found: <CALCETrWB0qBiyfJbapFnjxoNyNvS+aHvgc_eob3fC1j=cv+v5w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns 2014-08-15 19:56 ` Andy Lutomirski @ 2014-08-15 20:16 ` Serge Hallyn -1 siblings, 0 replies; 67+ messages in thread From: Serge Hallyn @ 2014-08-15 20:16 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA, stable, Kenton Varda, Eric W. Biederman, Linux FS Devel, Linus Torvalds Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org): > On Fri, Aug 15, 2014 at 12:37 PM, Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> wrote: > > Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org): > >> On Fri, Aug 15, 2014 at 12:05 PM, Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> wrote: > >> > Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org): > >> >> Currently, creating a new mount (as opposed to bindmount) in a > >> >> non-root userns will implicitly set nodev unless the fs is devpts. > >> >> Something like this will be necessary for file systems that allow > >> >> the mounter to create device nodes without using mknod (e.g. FUSE > >> >> if/when that is allowed), but none of the currently allowed > >> >> filesystems do this. > >> > > >> > Hi, > >> > > >> > Sorry, I'm probably thinking stupidly, but I don't see this restriction > >> > being the case > >> > > >> > serge@sl:~$ mount | grep tmp > >> > [...] > >> > tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755) > >> > serge@sl:~$ sudo mknod /run/kvm c 10 232 > >> > [sudo] password for serge: > >> > serge@sl:~$ echo $? > >> > 0 > >> > serge@sl:~$ ls -l /run/kvm > >> > crw-r--r-- 1 root root 10, 232 Aug 15 14:04 /run/kvm > >> > > >> > But you seem to be saying I shouldn't be allowed to create a device inside > >> > a tmpfs. What am I overlooking? > >> > >> I assume you're in the root userns. This patch is unnecessary, and > >> has no effect, if you're in the root userns. > > > > Right, but I thought you were justifying adding FS_USERNS_DEV_MOUNT by saying > > that you cannot mknod in those filesystems. But I see you actually said > > "without using mknod". I guess I don't understand that caveat. > > IIUC, there are two ways that a user could put a device node into > their filesystem. > > The obvious way is using mknod. But mknod has its own perfectly valid > permission checks, and it doesn't need any special handling at mount > time. > > The less obvious way is to mount a filesystem that already contains a > device node or to mount a filesystem that gives some other means of > inserting a device node (e.g. a network filesystem or FUSE). Those > might allow inserting device nodes without passing a global capability > check, so unprivileged users in a userns must not be allowed to mount > such a filesystem without MNT_NODEV | MNT_LOCK_NODEV. > > Fortunately, none of the existing FS_USERNS_MOUNT filesystems have > that property. FUSE will, but we don't support FUSE in a userns yet > (unfortunately -- it would be a *very* useful feature.) > > I think that, if we ever allow FUSE in a userns, we should return Which, btw, I'm hoping we'll be allowing soon. > -EPERM when trying to mount it unless the user specifies MS_NODEV, In either case we can think that through when the time comes. > which is what this patch does. I don't think there's any reason to > play complicated games to allow programs to get away with omitting > MS_NODEV. Thanks, Andy. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns @ 2014-08-15 20:16 ` Serge Hallyn 0 siblings, 0 replies; 67+ messages in thread From: Serge Hallyn @ 2014-08-15 20:16 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, linux-kernel, stable, Kenton Varda, Eric W. Biederman, Linux FS Devel, Linus Torvalds Quoting Andy Lutomirski (luto@amacapital.net): > On Fri, Aug 15, 2014 at 12:37 PM, Serge Hallyn <serge.hallyn@ubuntu.com> wrote: > > Quoting Andy Lutomirski (luto@amacapital.net): > >> On Fri, Aug 15, 2014 at 12:05 PM, Serge Hallyn <serge.hallyn@ubuntu.com> wrote: > >> > Quoting Andy Lutomirski (luto@amacapital.net): > >> >> Currently, creating a new mount (as opposed to bindmount) in a > >> >> non-root userns will implicitly set nodev unless the fs is devpts. > >> >> Something like this will be necessary for file systems that allow > >> >> the mounter to create device nodes without using mknod (e.g. FUSE > >> >> if/when that is allowed), but none of the currently allowed > >> >> filesystems do this. > >> > > >> > Hi, > >> > > >> > Sorry, I'm probably thinking stupidly, but I don't see this restriction > >> > being the case > >> > > >> > serge@sl:~$ mount | grep tmp > >> > [...] > >> > tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755) > >> > serge@sl:~$ sudo mknod /run/kvm c 10 232 > >> > [sudo] password for serge: > >> > serge@sl:~$ echo $? > >> > 0 > >> > serge@sl:~$ ls -l /run/kvm > >> > crw-r--r-- 1 root root 10, 232 Aug 15 14:04 /run/kvm > >> > > >> > But you seem to be saying I shouldn't be allowed to create a device inside > >> > a tmpfs. What am I overlooking? > >> > >> I assume you're in the root userns. This patch is unnecessary, and > >> has no effect, if you're in the root userns. > > > > Right, but I thought you were justifying adding FS_USERNS_DEV_MOUNT by saying > > that you cannot mknod in those filesystems. But I see you actually said > > "without using mknod". I guess I don't understand that caveat. > > IIUC, there are two ways that a user could put a device node into > their filesystem. > > The obvious way is using mknod. But mknod has its own perfectly valid > permission checks, and it doesn't need any special handling at mount > time. > > The less obvious way is to mount a filesystem that already contains a > device node or to mount a filesystem that gives some other means of > inserting a device node (e.g. a network filesystem or FUSE). Those > might allow inserting device nodes without passing a global capability > check, so unprivileged users in a userns must not be allowed to mount > such a filesystem without MNT_NODEV | MNT_LOCK_NODEV. > > Fortunately, none of the existing FS_USERNS_MOUNT filesystems have > that property. FUSE will, but we don't support FUSE in a userns yet > (unfortunately -- it would be a *very* useful feature.) > > I think that, if we ever allow FUSE in a userns, we should return Which, btw, I'm hoping we'll be allowing soon. > -EPERM when trying to mount it unless the user specifies MS_NODEV, In either case we can think that through when the time comes. > which is what this patch does. I don't think there's any reason to > play complicated games to allow programs to get away with omitting > MS_NODEV. Thanks, Andy. ^ permalink raw reply [flat|nested] 67+ messages in thread
[parent not found: <2686c32f00b14148379e8cfee9c028c794d4aa1a.1407974494.git.luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>]
* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns [not found] ` <2686c32f00b14148379e8cfee9c028c794d4aa1a.1407974494.git.luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> @ 2014-08-15 19:05 ` Serge Hallyn 2014-08-15 20:16 ` Serge Hallyn 2014-08-28 1:35 ` Andy Lutomirski 2 siblings, 0 replies; 67+ messages in thread From: Serge Hallyn @ 2014-08-15 19:05 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA, stable-u79uwXL29TY76Z2rM5mHXA, Kenton Varda, Eric W. Biederman, Linux FS Devel, Linus Torvalds Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org): > Currently, creating a new mount (as opposed to bindmount) in a > non-root userns will implicitly set nodev unless the fs is devpts. > Something like this will be necessary for file systems that allow > the mounter to create device nodes without using mknod (e.g. FUSE > if/when that is allowed), but none of the currently allowed > filesystems do this. Hi, Sorry, I'm probably thinking stupidly, but I don't see this restriction being the case serge@sl:~$ mount | grep tmp [...] tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755) serge@sl:~$ sudo mknod /run/kvm c 10 232 [sudo] password for serge: serge@sl:~$ echo $? 0 serge@sl:~$ ls -l /run/kvm crw-r--r-- 1 root root 10, 232 Aug 15 14:04 /run/kvm But you seem to be saying I shouldn't be allowed to create a device inside a tmpfs. What am I overlooking? > Implicitly adding nodev is problematic, though. It will make it > unsafe to ever remove the implicit addition, since userspace might > start to rely on it. > > This fixes a minor regression in: > > 9566d6742852 mnt: Correct permission checks in do_remount > > Prior to that commit, MNT_NODEV wasn't enforced for remounts, so > there is existing user code that creates a new mount in a userns > without MS_NODEV and then expects a remount with matching options to > work. That commit broke code that does this. > > Fortunately, since the implicit nodev has no effect on any existing > filesystems, we can still safely remove it. > > This replaces the implicit nodev with an explicit nodev requirement: > anyone who mounts a filesystem without FS_USERNS_DEV_MOUNT will get > -EPERM unless they set nodev. If they set nodev, that setting will > be locked. > > As an added benefit, if anything like device namespaces is ever > added, then user code will be able to opt out of nodev by clearing > nodev. > > To keep existing code working, this adds FS_USERNS_DEV_MOUNT to all > FS_USERNS_MOUNT filesystems. All of the current filesystems with > FS_USERNS_MOUNT set are safe. > > I confirmed that this is compatible with Sandstorm's revision > 1bf0c4847b. That revision of Sandstorm does not work without this > fix if 9566d6742852 is applied. > > Cc: Kenton Varda <kenton-AuYgBwuPrUQTaNkGU808tA@public.gmane.org> > Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > Signed-off-by: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> > --- > fs/namespace.c | 16 ++++++++++++---- > fs/proc/root.c | 2 +- > fs/ramfs/inode.c | 2 +- > fs/sysfs/mount.c | 2 +- > ipc/mqueue.c | 2 +- > mm/shmem.c | 4 ++-- > 6 files changed, 18 insertions(+), 10 deletions(-) > > diff --git a/fs/namespace.c b/fs/namespace.c > index 0acabea58319..835fa9e8307e 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -2154,12 +2154,20 @@ static int do_new_mount(struct path *path, const char *fstype, int flags, > put_filesystem(type); > return -EPERM; > } > - /* Only in special cases allow devices from mounts > - * created outside the initial user namespace. > + > + /* > + * If a filesystem might allow the mounter to put > + * device nodes on it without the checks in mknod, > + * then require MS_NODEV to mount it. > */ > if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) { > - flags |= MS_NODEV; > - mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV; > + if (!(mnt_flags & MNT_NODEV)) { > + put_filesystem(type); > + return -EPERM; > + } > + > + /* Do not allow nodev to be cleared. */ > + mnt_flags |= MNT_LOCK_NODEV; > } > } > > diff --git a/fs/proc/root.c b/fs/proc/root.c > index 094e44d4a6be..2313b280729e 100644 > --- a/fs/proc/root.c > +++ b/fs/proc/root.c > @@ -159,7 +159,7 @@ static struct file_system_type proc_fs_type = { > .name = "proc", > .mount = proc_mount, > .kill_sb = proc_kill_sb, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > void __init proc_root_init(void) > diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c > index d365b1c4eb3c..b95b7302d4cc 100644 > --- a/fs/ramfs/inode.c > +++ b/fs/ramfs/inode.c > @@ -261,7 +261,7 @@ static struct file_system_type ramfs_fs_type = { > .name = "ramfs", > .mount = ramfs_mount, > .kill_sb = ramfs_kill_sb, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int __init init_ramfs_fs(void) > diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c > index 8a49486bf30c..56ba59317e24 100644 > --- a/fs/sysfs/mount.c > +++ b/fs/sysfs/mount.c > @@ -58,7 +58,7 @@ static struct file_system_type sysfs_fs_type = { > .name = "sysfs", > .mount = sysfs_mount, > .kill_sb = sysfs_kill_sb, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int __init sysfs_init(void) > diff --git a/ipc/mqueue.c b/ipc/mqueue.c > index 4fcf39af1776..56abbc848d4c 100644 > --- a/ipc/mqueue.c > +++ b/ipc/mqueue.c > @@ -1394,7 +1394,7 @@ static struct file_system_type mqueue_fs_type = { > .name = "mqueue", > .mount = mqueue_mount, > .kill_sb = kill_litter_super, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int mq_init_ns(struct ipc_namespace *ns) > diff --git a/mm/shmem.c b/mm/shmem.c > index a42add14331c..f4a708a8f9e3 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -3149,7 +3149,7 @@ static struct file_system_type shmem_fs_type = { > .name = "tmpfs", > .mount = shmem_mount, > .kill_sb = kill_litter_super, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int __init shmem_init(void) > @@ -3208,7 +3208,7 @@ static struct file_system_type shmem_fs_type = { > .name = "tmpfs", > .mount = ramfs_mount, > .kill_sb = kill_litter_super, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int __init shmem_init(void) > -- > 1.9.3 > ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns [not found] ` <2686c32f00b14148379e8cfee9c028c794d4aa1a.1407974494.git.luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> 2014-08-15 19:05 ` Serge Hallyn @ 2014-08-15 20:16 ` Serge Hallyn 2014-08-28 1:35 ` Andy Lutomirski 2 siblings, 0 replies; 67+ messages in thread From: Serge Hallyn @ 2014-08-15 20:16 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA, stable-u79uwXL29TY76Z2rM5mHXA, Kenton Varda, Eric W. Biederman, Linux FS Devel, Linus Torvalds Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org): > Currently, creating a new mount (as opposed to bindmount) in a > non-root userns will implicitly set nodev unless the fs is devpts. > Something like this will be necessary for file systems that allow > the mounter to create device nodes without using mknod (e.g. FUSE > if/when that is allowed), but none of the currently allowed > filesystems do this. > > Implicitly adding nodev is problematic, though. It will make it > unsafe to ever remove the implicit addition, since userspace might > start to rely on it. > > This fixes a minor regression in: > > 9566d6742852 mnt: Correct permission checks in do_remount > > Prior to that commit, MNT_NODEV wasn't enforced for remounts, so > there is existing user code that creates a new mount in a userns > without MS_NODEV and then expects a remount with matching options to > work. That commit broke code that does this. > > Fortunately, since the implicit nodev has no effect on any existing > filesystems, we can still safely remove it. > > This replaces the implicit nodev with an explicit nodev requirement: > anyone who mounts a filesystem without FS_USERNS_DEV_MOUNT will get > -EPERM unless they set nodev. If they set nodev, that setting will > be locked. > > As an added benefit, if anything like device namespaces is ever > added, then user code will be able to opt out of nodev by clearing > nodev. > > To keep existing code working, this adds FS_USERNS_DEV_MOUNT to all > FS_USERNS_MOUNT filesystems. All of the current filesystems with > FS_USERNS_MOUNT set are safe. > > I confirmed that this is compatible with Sandstorm's revision > 1bf0c4847b. That revision of Sandstorm does not work without this > fix if 9566d6742852 is applied. > Acked-by: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> This seems like the best alternative by far. > Cc: Kenton Varda <kenton-AuYgBwuPrUQTaNkGU808tA@public.gmane.org> > Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > Signed-off-by: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> > --- > fs/namespace.c | 16 ++++++++++++---- > fs/proc/root.c | 2 +- > fs/ramfs/inode.c | 2 +- > fs/sysfs/mount.c | 2 +- > ipc/mqueue.c | 2 +- > mm/shmem.c | 4 ++-- > 6 files changed, 18 insertions(+), 10 deletions(-) > > diff --git a/fs/namespace.c b/fs/namespace.c > index 0acabea58319..835fa9e8307e 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -2154,12 +2154,20 @@ static int do_new_mount(struct path *path, const char *fstype, int flags, > put_filesystem(type); > return -EPERM; > } > - /* Only in special cases allow devices from mounts > - * created outside the initial user namespace. > + > + /* > + * If a filesystem might allow the mounter to put > + * device nodes on it without the checks in mknod, > + * then require MS_NODEV to mount it. > */ > if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) { > - flags |= MS_NODEV; > - mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV; > + if (!(mnt_flags & MNT_NODEV)) { > + put_filesystem(type); > + return -EPERM; > + } > + > + /* Do not allow nodev to be cleared. */ > + mnt_flags |= MNT_LOCK_NODEV; > } > } > > diff --git a/fs/proc/root.c b/fs/proc/root.c > index 094e44d4a6be..2313b280729e 100644 > --- a/fs/proc/root.c > +++ b/fs/proc/root.c > @@ -159,7 +159,7 @@ static struct file_system_type proc_fs_type = { > .name = "proc", > .mount = proc_mount, > .kill_sb = proc_kill_sb, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > void __init proc_root_init(void) > diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c > index d365b1c4eb3c..b95b7302d4cc 100644 > --- a/fs/ramfs/inode.c > +++ b/fs/ramfs/inode.c > @@ -261,7 +261,7 @@ static struct file_system_type ramfs_fs_type = { > .name = "ramfs", > .mount = ramfs_mount, > .kill_sb = ramfs_kill_sb, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int __init init_ramfs_fs(void) > diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c > index 8a49486bf30c..56ba59317e24 100644 > --- a/fs/sysfs/mount.c > +++ b/fs/sysfs/mount.c > @@ -58,7 +58,7 @@ static struct file_system_type sysfs_fs_type = { > .name = "sysfs", > .mount = sysfs_mount, > .kill_sb = sysfs_kill_sb, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int __init sysfs_init(void) > diff --git a/ipc/mqueue.c b/ipc/mqueue.c > index 4fcf39af1776..56abbc848d4c 100644 > --- a/ipc/mqueue.c > +++ b/ipc/mqueue.c > @@ -1394,7 +1394,7 @@ static struct file_system_type mqueue_fs_type = { > .name = "mqueue", > .mount = mqueue_mount, > .kill_sb = kill_litter_super, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int mq_init_ns(struct ipc_namespace *ns) > diff --git a/mm/shmem.c b/mm/shmem.c > index a42add14331c..f4a708a8f9e3 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -3149,7 +3149,7 @@ static struct file_system_type shmem_fs_type = { > .name = "tmpfs", > .mount = shmem_mount, > .kill_sb = kill_litter_super, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int __init shmem_init(void) > @@ -3208,7 +3208,7 @@ static struct file_system_type shmem_fs_type = { > .name = "tmpfs", > .mount = ramfs_mount, > .kill_sb = kill_litter_super, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int __init shmem_init(void) > -- > 1.9.3 > ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns [not found] ` <2686c32f00b14148379e8cfee9c028c794d4aa1a.1407974494.git.luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> 2014-08-15 19:05 ` Serge Hallyn 2014-08-15 20:16 ` Serge Hallyn @ 2014-08-28 1:35 ` Andy Lutomirski 2 siblings, 0 replies; 67+ messages in thread From: Andy Lutomirski @ 2014-08-28 1:35 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux Containers, Serge Hallyn, linux-kernel-u79uwXL29TY76Z2rM5mHXA, stable, Andy Lutomirski, Kenton Varda, Linux FS Devel, Linus Torvalds On Wed, Aug 13, 2014 at 5:03 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote: > Currently, creating a new mount (as opposed to bindmount) in a > non-root userns will implicitly set nodev unless the fs is devpts. > Something like this will be necessary for file systems that allow > the mounter to create device nodes without using mknod (e.g. FUSE > if/when that is allowed), but none of the currently allowed > filesystems do this. > Any thoughts, Eric? > Implicitly adding nodev is problematic, though. It will make it > unsafe to ever remove the implicit addition, since userspace might > start to rely on it. > > This fixes a minor regression in: > > 9566d6742852 mnt: Correct permission checks in do_remount > > Prior to that commit, MNT_NODEV wasn't enforced for remounts, so > there is existing user code that creates a new mount in a userns > without MS_NODEV and then expects a remount with matching options to > work. That commit broke code that does this. > > Fortunately, since the implicit nodev has no effect on any existing > filesystems, we can still safely remove it. > > This replaces the implicit nodev with an explicit nodev requirement: > anyone who mounts a filesystem without FS_USERNS_DEV_MOUNT will get > -EPERM unless they set nodev. If they set nodev, that setting will > be locked. > > As an added benefit, if anything like device namespaces is ever > added, then user code will be able to opt out of nodev by clearing > nodev. > > To keep existing code working, this adds FS_USERNS_DEV_MOUNT to all > FS_USERNS_MOUNT filesystems. All of the current filesystems with > FS_USERNS_MOUNT set are safe. > > I confirmed that this is compatible with Sandstorm's revision > 1bf0c4847b. That revision of Sandstorm does not work without this > fix if 9566d6742852 is applied. > > Cc: Kenton Varda <kenton-AuYgBwuPrUQTaNkGU808tA@public.gmane.org> > Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > Signed-off-by: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> > --- > fs/namespace.c | 16 ++++++++++++---- > fs/proc/root.c | 2 +- > fs/ramfs/inode.c | 2 +- > fs/sysfs/mount.c | 2 +- > ipc/mqueue.c | 2 +- > mm/shmem.c | 4 ++-- > 6 files changed, 18 insertions(+), 10 deletions(-) > > diff --git a/fs/namespace.c b/fs/namespace.c > index 0acabea58319..835fa9e8307e 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -2154,12 +2154,20 @@ static int do_new_mount(struct path *path, const char *fstype, int flags, > put_filesystem(type); > return -EPERM; > } > - /* Only in special cases allow devices from mounts > - * created outside the initial user namespace. > + > + /* > + * If a filesystem might allow the mounter to put > + * device nodes on it without the checks in mknod, > + * then require MS_NODEV to mount it. > */ > if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) { > - flags |= MS_NODEV; > - mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV; > + if (!(mnt_flags & MNT_NODEV)) { > + put_filesystem(type); > + return -EPERM; > + } > + > + /* Do not allow nodev to be cleared. */ > + mnt_flags |= MNT_LOCK_NODEV; > } > } > > diff --git a/fs/proc/root.c b/fs/proc/root.c > index 094e44d4a6be..2313b280729e 100644 > --- a/fs/proc/root.c > +++ b/fs/proc/root.c > @@ -159,7 +159,7 @@ static struct file_system_type proc_fs_type = { > .name = "proc", > .mount = proc_mount, > .kill_sb = proc_kill_sb, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > void __init proc_root_init(void) > diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c > index d365b1c4eb3c..b95b7302d4cc 100644 > --- a/fs/ramfs/inode.c > +++ b/fs/ramfs/inode.c > @@ -261,7 +261,7 @@ static struct file_system_type ramfs_fs_type = { > .name = "ramfs", > .mount = ramfs_mount, > .kill_sb = ramfs_kill_sb, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int __init init_ramfs_fs(void) > diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c > index 8a49486bf30c..56ba59317e24 100644 > --- a/fs/sysfs/mount.c > +++ b/fs/sysfs/mount.c > @@ -58,7 +58,7 @@ static struct file_system_type sysfs_fs_type = { > .name = "sysfs", > .mount = sysfs_mount, > .kill_sb = sysfs_kill_sb, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int __init sysfs_init(void) > diff --git a/ipc/mqueue.c b/ipc/mqueue.c > index 4fcf39af1776..56abbc848d4c 100644 > --- a/ipc/mqueue.c > +++ b/ipc/mqueue.c > @@ -1394,7 +1394,7 @@ static struct file_system_type mqueue_fs_type = { > .name = "mqueue", > .mount = mqueue_mount, > .kill_sb = kill_litter_super, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int mq_init_ns(struct ipc_namespace *ns) > diff --git a/mm/shmem.c b/mm/shmem.c > index a42add14331c..f4a708a8f9e3 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -3149,7 +3149,7 @@ static struct file_system_type shmem_fs_type = { > .name = "tmpfs", > .mount = shmem_mount, > .kill_sb = kill_litter_super, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int __init shmem_init(void) > @@ -3208,7 +3208,7 @@ static struct file_system_type shmem_fs_type = { > .name = "tmpfs", > .mount = ramfs_mount, > .kill_sb = kill_litter_super, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int __init shmem_init(void) > -- > 1.9.3 > -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns 2014-08-14 0:03 ` Andy Lutomirski 2014-08-15 19:05 ` Serge Hallyn [not found] ` <2686c32f00b14148379e8cfee9c028c794d4aa1a.1407974494.git.luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> @ 2014-08-15 20:16 ` Serge Hallyn 2014-08-28 1:35 ` Andy Lutomirski 3 siblings, 0 replies; 67+ messages in thread From: Serge Hallyn @ 2014-08-15 20:16 UTC (permalink / raw) To: Andy Lutomirski Cc: Eric W. Biederman, linux-kernel, Linux Containers, Linux FS Devel, Linus Torvalds, Kenton Varda, stable Quoting Andy Lutomirski (luto@amacapital.net): > Currently, creating a new mount (as opposed to bindmount) in a > non-root userns will implicitly set nodev unless the fs is devpts. > Something like this will be necessary for file systems that allow > the mounter to create device nodes without using mknod (e.g. FUSE > if/when that is allowed), but none of the currently allowed > filesystems do this. > > Implicitly adding nodev is problematic, though. It will make it > unsafe to ever remove the implicit addition, since userspace might > start to rely on it. > > This fixes a minor regression in: > > 9566d6742852 mnt: Correct permission checks in do_remount > > Prior to that commit, MNT_NODEV wasn't enforced for remounts, so > there is existing user code that creates a new mount in a userns > without MS_NODEV and then expects a remount with matching options to > work. That commit broke code that does this. > > Fortunately, since the implicit nodev has no effect on any existing > filesystems, we can still safely remove it. > > This replaces the implicit nodev with an explicit nodev requirement: > anyone who mounts a filesystem without FS_USERNS_DEV_MOUNT will get > -EPERM unless they set nodev. If they set nodev, that setting will > be locked. > > As an added benefit, if anything like device namespaces is ever > added, then user code will be able to opt out of nodev by clearing > nodev. > > To keep existing code working, this adds FS_USERNS_DEV_MOUNT to all > FS_USERNS_MOUNT filesystems. All of the current filesystems with > FS_USERNS_MOUNT set are safe. > > I confirmed that this is compatible with Sandstorm's revision > 1bf0c4847b. That revision of Sandstorm does not work without this > fix if 9566d6742852 is applied. > Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> This seems like the best alternative by far. > Cc: Kenton Varda <kenton@sandstorm.io> > Cc: stable@vger.kernel.org > Signed-off-by: Andy Lutomirski <luto@amacapital.net> > --- > fs/namespace.c | 16 ++++++++++++---- > fs/proc/root.c | 2 +- > fs/ramfs/inode.c | 2 +- > fs/sysfs/mount.c | 2 +- > ipc/mqueue.c | 2 +- > mm/shmem.c | 4 ++-- > 6 files changed, 18 insertions(+), 10 deletions(-) > > diff --git a/fs/namespace.c b/fs/namespace.c > index 0acabea58319..835fa9e8307e 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -2154,12 +2154,20 @@ static int do_new_mount(struct path *path, const char *fstype, int flags, > put_filesystem(type); > return -EPERM; > } > - /* Only in special cases allow devices from mounts > - * created outside the initial user namespace. > + > + /* > + * If a filesystem might allow the mounter to put > + * device nodes on it without the checks in mknod, > + * then require MS_NODEV to mount it. > */ > if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) { > - flags |= MS_NODEV; > - mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV; > + if (!(mnt_flags & MNT_NODEV)) { > + put_filesystem(type); > + return -EPERM; > + } > + > + /* Do not allow nodev to be cleared. */ > + mnt_flags |= MNT_LOCK_NODEV; > } > } > > diff --git a/fs/proc/root.c b/fs/proc/root.c > index 094e44d4a6be..2313b280729e 100644 > --- a/fs/proc/root.c > +++ b/fs/proc/root.c > @@ -159,7 +159,7 @@ static struct file_system_type proc_fs_type = { > .name = "proc", > .mount = proc_mount, > .kill_sb = proc_kill_sb, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > void __init proc_root_init(void) > diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c > index d365b1c4eb3c..b95b7302d4cc 100644 > --- a/fs/ramfs/inode.c > +++ b/fs/ramfs/inode.c > @@ -261,7 +261,7 @@ static struct file_system_type ramfs_fs_type = { > .name = "ramfs", > .mount = ramfs_mount, > .kill_sb = ramfs_kill_sb, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int __init init_ramfs_fs(void) > diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c > index 8a49486bf30c..56ba59317e24 100644 > --- a/fs/sysfs/mount.c > +++ b/fs/sysfs/mount.c > @@ -58,7 +58,7 @@ static struct file_system_type sysfs_fs_type = { > .name = "sysfs", > .mount = sysfs_mount, > .kill_sb = sysfs_kill_sb, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int __init sysfs_init(void) > diff --git a/ipc/mqueue.c b/ipc/mqueue.c > index 4fcf39af1776..56abbc848d4c 100644 > --- a/ipc/mqueue.c > +++ b/ipc/mqueue.c > @@ -1394,7 +1394,7 @@ static struct file_system_type mqueue_fs_type = { > .name = "mqueue", > .mount = mqueue_mount, > .kill_sb = kill_litter_super, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int mq_init_ns(struct ipc_namespace *ns) > diff --git a/mm/shmem.c b/mm/shmem.c > index a42add14331c..f4a708a8f9e3 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -3149,7 +3149,7 @@ static struct file_system_type shmem_fs_type = { > .name = "tmpfs", > .mount = shmem_mount, > .kill_sb = kill_litter_super, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int __init shmem_init(void) > @@ -3208,7 +3208,7 @@ static struct file_system_type shmem_fs_type = { > .name = "tmpfs", > .mount = ramfs_mount, > .kill_sb = kill_litter_super, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int __init shmem_init(void) > -- > 1.9.3 > ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns 2014-08-14 0:03 ` Andy Lutomirski ` (2 preceding siblings ...) 2014-08-15 20:16 ` Serge Hallyn @ 2014-08-28 1:35 ` Andy Lutomirski 3 siblings, 0 replies; 67+ messages in thread From: Andy Lutomirski @ 2014-08-28 1:35 UTC (permalink / raw) To: Eric W. Biederman Cc: linux-kernel, Linux Containers, Linux FS Devel, Linus Torvalds, Serge Hallyn, Andy Lutomirski, Kenton Varda, stable On Wed, Aug 13, 2014 at 5:03 PM, Andy Lutomirski <luto@amacapital.net> wrote: > Currently, creating a new mount (as opposed to bindmount) in a > non-root userns will implicitly set nodev unless the fs is devpts. > Something like this will be necessary for file systems that allow > the mounter to create device nodes without using mknod (e.g. FUSE > if/when that is allowed), but none of the currently allowed > filesystems do this. > Any thoughts, Eric? > Implicitly adding nodev is problematic, though. It will make it > unsafe to ever remove the implicit addition, since userspace might > start to rely on it. > > This fixes a minor regression in: > > 9566d6742852 mnt: Correct permission checks in do_remount > > Prior to that commit, MNT_NODEV wasn't enforced for remounts, so > there is existing user code that creates a new mount in a userns > without MS_NODEV and then expects a remount with matching options to > work. That commit broke code that does this. > > Fortunately, since the implicit nodev has no effect on any existing > filesystems, we can still safely remove it. > > This replaces the implicit nodev with an explicit nodev requirement: > anyone who mounts a filesystem without FS_USERNS_DEV_MOUNT will get > -EPERM unless they set nodev. If they set nodev, that setting will > be locked. > > As an added benefit, if anything like device namespaces is ever > added, then user code will be able to opt out of nodev by clearing > nodev. > > To keep existing code working, this adds FS_USERNS_DEV_MOUNT to all > FS_USERNS_MOUNT filesystems. All of the current filesystems with > FS_USERNS_MOUNT set are safe. > > I confirmed that this is compatible with Sandstorm's revision > 1bf0c4847b. That revision of Sandstorm does not work without this > fix if 9566d6742852 is applied. > > Cc: Kenton Varda <kenton@sandstorm.io> > Cc: stable@vger.kernel.org > Signed-off-by: Andy Lutomirski <luto@amacapital.net> > --- > fs/namespace.c | 16 ++++++++++++---- > fs/proc/root.c | 2 +- > fs/ramfs/inode.c | 2 +- > fs/sysfs/mount.c | 2 +- > ipc/mqueue.c | 2 +- > mm/shmem.c | 4 ++-- > 6 files changed, 18 insertions(+), 10 deletions(-) > > diff --git a/fs/namespace.c b/fs/namespace.c > index 0acabea58319..835fa9e8307e 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -2154,12 +2154,20 @@ static int do_new_mount(struct path *path, const char *fstype, int flags, > put_filesystem(type); > return -EPERM; > } > - /* Only in special cases allow devices from mounts > - * created outside the initial user namespace. > + > + /* > + * If a filesystem might allow the mounter to put > + * device nodes on it without the checks in mknod, > + * then require MS_NODEV to mount it. > */ > if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) { > - flags |= MS_NODEV; > - mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV; > + if (!(mnt_flags & MNT_NODEV)) { > + put_filesystem(type); > + return -EPERM; > + } > + > + /* Do not allow nodev to be cleared. */ > + mnt_flags |= MNT_LOCK_NODEV; > } > } > > diff --git a/fs/proc/root.c b/fs/proc/root.c > index 094e44d4a6be..2313b280729e 100644 > --- a/fs/proc/root.c > +++ b/fs/proc/root.c > @@ -159,7 +159,7 @@ static struct file_system_type proc_fs_type = { > .name = "proc", > .mount = proc_mount, > .kill_sb = proc_kill_sb, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > void __init proc_root_init(void) > diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c > index d365b1c4eb3c..b95b7302d4cc 100644 > --- a/fs/ramfs/inode.c > +++ b/fs/ramfs/inode.c > @@ -261,7 +261,7 @@ static struct file_system_type ramfs_fs_type = { > .name = "ramfs", > .mount = ramfs_mount, > .kill_sb = ramfs_kill_sb, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int __init init_ramfs_fs(void) > diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c > index 8a49486bf30c..56ba59317e24 100644 > --- a/fs/sysfs/mount.c > +++ b/fs/sysfs/mount.c > @@ -58,7 +58,7 @@ static struct file_system_type sysfs_fs_type = { > .name = "sysfs", > .mount = sysfs_mount, > .kill_sb = sysfs_kill_sb, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int __init sysfs_init(void) > diff --git a/ipc/mqueue.c b/ipc/mqueue.c > index 4fcf39af1776..56abbc848d4c 100644 > --- a/ipc/mqueue.c > +++ b/ipc/mqueue.c > @@ -1394,7 +1394,7 @@ static struct file_system_type mqueue_fs_type = { > .name = "mqueue", > .mount = mqueue_mount, > .kill_sb = kill_litter_super, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int mq_init_ns(struct ipc_namespace *ns) > diff --git a/mm/shmem.c b/mm/shmem.c > index a42add14331c..f4a708a8f9e3 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -3149,7 +3149,7 @@ static struct file_system_type shmem_fs_type = { > .name = "tmpfs", > .mount = shmem_mount, > .kill_sb = kill_litter_super, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int __init shmem_init(void) > @@ -3208,7 +3208,7 @@ static struct file_system_type shmem_fs_type = { > .name = "tmpfs", > .mount = ramfs_mount, > .kill_sb = kill_litter_super, > - .fs_flags = FS_USERNS_MOUNT, > + .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT, > }; > > int __init shmem_init(void) > -- > 1.9.3 > -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 [not found] ` <87tx5ghekp.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 2014-08-13 17:03 ` Andy Lutomirski @ 2014-08-15 18:41 ` Andy Lutomirski 1 sibling, 0 replies; 67+ messages in thread From: Andy Lutomirski @ 2014-08-15 18:41 UTC (permalink / raw) To: Eric W. Biederman Cc: Linux FS Devel, Kenton Varda, Linux Containers, Linus Torvalds, Linux Kernel Mailing List On Wed, Aug 13, 2014 at 3:24 AM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote: > Kenton Varda <kenton-AuYgBwuPrUQTaNkGU808tA@public.gmane.org> writes: > >> On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman >> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote: > From: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> > Date: Wed, 13 Aug 2014 01:33:38 -0700 > Subject: [PATCH] mnt: Implicitly add MNT_NODEV on remount as we do on mount > > Now that remount is properly enforcing the rule that you can't > remove nodev at least sandstorm.io is breaking when performing > a remount. > > It turns out that there is an easy intuitive solution implicitly > add nodev on remount under the same conditions that we do on mount. > > Update the regression test for remounts to verify that this new > addition does not regress either. Tested-but-disliked-by: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> Quoting Alan Cox [1]: Lying to apps generally ends up like children lying to parents - the lie gets more complicated to keep up each case you find until it breaks. The kernel was unnecessarily fudging mount flags, and it broke, because the fudge didn't go far enough. This extends the fudge. Let's just delete it instead. --Andy [1] https://lwn.net/Articles/608435/ > > Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> > --- > fs/namespace.c | 9 ++++ > .../selftests/mount/unprivileged-remount-test.c | 51 ++++++++++++---------- > 2 files changed, 36 insertions(+), 24 deletions(-) > > diff --git a/fs/namespace.c b/fs/namespace.c > index 7886176232c1..0f158300c866 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -1934,6 +1934,15 @@ static int do_remount(struct path *path, int flags, int mnt_flags, > if (path->dentry != path->mnt->mnt_root) > return -EINVAL; > > + /* Only in special cases allow devices from mounts created > + * outside the initial user namespace. > + */ > + if ((mnt->mnt_ns->user_ns != &init_user_ns) && > + !(sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT)) { > + flags |= MS_NODEV; > + mnt_flags |= MNT_NODEV; > + } > + > /* Don't allow changing of locked mnt flags. > * > * No locks need to be held here while testing the various > diff --git a/tools/testing/selftests/mount/unprivileged-remount-test.c b/tools/testing/selftests/mount/unprivileged-remount-test.c > index 1b3ff2fda4d0..6d408c155e0f 100644 > --- a/tools/testing/selftests/mount/unprivileged-remount-test.c > +++ b/tools/testing/selftests/mount/unprivileged-remount-test.c > @@ -118,7 +118,8 @@ static void create_and_enter_userns(void) > } > > static > -bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags) > +bool test_unpriv_remount(const char *fstype, const char *mount_options, > + int mount_flags, int remount_flags, int invalid_flags) > { > pid_t child; > > @@ -151,9 +152,11 @@ bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags) > strerror(errno)); > } > > - if (mount("testing", "/tmp", "ramfs", mount_flags, NULL) != 0) { > - die("mount of /tmp failed: %s\n", > - strerror(errno)); > + if (mount("testing", "/tmp", fstype, mount_flags, mount_options) != 0) { > + die("mount of %s with options '%s' on /tmp failed: %s\n", > + fstype, > + mount_options? mount_options : "", > + strerror(errno)); > } > > create_and_enter_userns(); > @@ -181,60 +184,60 @@ bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags) > > static bool test_unpriv_remount_simple(int mount_flags) > { > - return test_unpriv_remount(mount_flags, mount_flags, 0); > + return test_unpriv_remount("ramfs", NULL, mount_flags, mount_flags, 0); > } > > static bool test_unpriv_remount_atime(int mount_flags, int invalid_flags) > { > - return test_unpriv_remount(mount_flags, mount_flags, invalid_flags); > + return test_unpriv_remount("ramfs", NULL, mount_flags, mount_flags, > + invalid_flags); > } > > int main(int argc, char **argv) > { > - if (!test_unpriv_remount_simple(MS_RDONLY|MS_NODEV)) { > + if (!test_unpriv_remount_simple(MS_RDONLY)) { > die("MS_RDONLY malfunctions\n"); > } > - if (!test_unpriv_remount_simple(MS_NODEV)) { > + if (!test_unpriv_remount("devpts", "newinstance", MS_NODEV, MS_NODEV, 0)) { > die("MS_NODEV malfunctions\n"); > } > - if (!test_unpriv_remount_simple(MS_NOSUID|MS_NODEV)) { > + if (!test_unpriv_remount_simple(MS_NOSUID)) { > die("MS_NOSUID malfunctions\n"); > } > - if (!test_unpriv_remount_simple(MS_NOEXEC|MS_NODEV)) { > + if (!test_unpriv_remount_simple(MS_NOEXEC)) { > die("MS_NOEXEC malfunctions\n"); > } > - if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODEV, > - MS_NOATIME|MS_NODEV)) > + if (!test_unpriv_remount_atime(MS_RELATIME, > + MS_NOATIME)) > { > die("MS_RELATIME malfunctions\n"); > } > - if (!test_unpriv_remount_atime(MS_STRICTATIME|MS_NODEV, > - MS_NOATIME|MS_NODEV)) > + if (!test_unpriv_remount_atime(MS_STRICTATIME, > + MS_NOATIME)) > { > die("MS_STRICTATIME malfunctions\n"); > } > - if (!test_unpriv_remount_atime(MS_NOATIME|MS_NODEV, > - MS_STRICTATIME|MS_NODEV)) > + if (!test_unpriv_remount_atime(MS_NOATIME, > + MS_STRICTATIME)) > { > die("MS_RELATIME malfunctions\n"); > } > - if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODIRATIME|MS_NODEV, > - MS_NOATIME|MS_NODEV)) > + if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODIRATIME, > + MS_NOATIME)) > { > die("MS_RELATIME malfunctions\n"); > } > - if (!test_unpriv_remount_atime(MS_STRICTATIME|MS_NODIRATIME|MS_NODEV, > - MS_NOATIME|MS_NODEV)) > + if (!test_unpriv_remount_atime(MS_STRICTATIME|MS_NODIRATIME, > + MS_NOATIME)) > { > die("MS_RELATIME malfunctions\n"); > } > - if (!test_unpriv_remount_atime(MS_NOATIME|MS_NODIRATIME|MS_NODEV, > - MS_STRICTATIME|MS_NODEV)) > + if (!test_unpriv_remount_atime(MS_NOATIME|MS_NODIRATIME, > + MS_STRICTATIME)) > { > die("MS_RELATIME malfunctions\n"); > } > - if (!test_unpriv_remount(MS_STRICTATIME|MS_NODEV, MS_NODEV, > - MS_NOATIME|MS_NODEV)) > + if (!test_unpriv_remount("ramfs", NULL, MS_STRICTATIME, 0, MS_NOATIME)) > { > die("Default atime malfunctions\n"); > } > -- > 1.9.1 > -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 2014-08-13 10:24 ` Eric W. Biederman (?) (?) @ 2014-08-15 18:41 ` Andy Lutomirski -1 siblings, 0 replies; 67+ messages in thread From: Andy Lutomirski @ 2014-08-15 18:41 UTC (permalink / raw) To: Eric W. Biederman Cc: Kenton Varda, Linus Torvalds, Linux FS Devel, Linux Containers, Linux Kernel Mailing List On Wed, Aug 13, 2014 at 3:24 AM, Eric W. Biederman <ebiederm@xmission.com> wrote: > Kenton Varda <kenton@sandstorm.io> writes: > >> On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman >> <ebiederm@xmission.com> wrote: > From: "Eric W. Biederman" <ebiederm@xmission.com> > Date: Wed, 13 Aug 2014 01:33:38 -0700 > Subject: [PATCH] mnt: Implicitly add MNT_NODEV on remount as we do on mount > > Now that remount is properly enforcing the rule that you can't > remove nodev at least sandstorm.io is breaking when performing > a remount. > > It turns out that there is an easy intuitive solution implicitly > add nodev on remount under the same conditions that we do on mount. > > Update the regression test for remounts to verify that this new > addition does not regress either. Tested-but-disliked-by: Andy Lutomirski <luto@amacapital.net> Quoting Alan Cox [1]: Lying to apps generally ends up like children lying to parents - the lie gets more complicated to keep up each case you find until it breaks. The kernel was unnecessarily fudging mount flags, and it broke, because the fudge didn't go far enough. This extends the fudge. Let's just delete it instead. --Andy [1] https://lwn.net/Articles/608435/ > > Cc: stable@vger.kernel.org > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > --- > fs/namespace.c | 9 ++++ > .../selftests/mount/unprivileged-remount-test.c | 51 ++++++++++++---------- > 2 files changed, 36 insertions(+), 24 deletions(-) > > diff --git a/fs/namespace.c b/fs/namespace.c > index 7886176232c1..0f158300c866 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -1934,6 +1934,15 @@ static int do_remount(struct path *path, int flags, int mnt_flags, > if (path->dentry != path->mnt->mnt_root) > return -EINVAL; > > + /* Only in special cases allow devices from mounts created > + * outside the initial user namespace. > + */ > + if ((mnt->mnt_ns->user_ns != &init_user_ns) && > + !(sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT)) { > + flags |= MS_NODEV; > + mnt_flags |= MNT_NODEV; > + } > + > /* Don't allow changing of locked mnt flags. > * > * No locks need to be held here while testing the various > diff --git a/tools/testing/selftests/mount/unprivileged-remount-test.c b/tools/testing/selftests/mount/unprivileged-remount-test.c > index 1b3ff2fda4d0..6d408c155e0f 100644 > --- a/tools/testing/selftests/mount/unprivileged-remount-test.c > +++ b/tools/testing/selftests/mount/unprivileged-remount-test.c > @@ -118,7 +118,8 @@ static void create_and_enter_userns(void) > } > > static > -bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags) > +bool test_unpriv_remount(const char *fstype, const char *mount_options, > + int mount_flags, int remount_flags, int invalid_flags) > { > pid_t child; > > @@ -151,9 +152,11 @@ bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags) > strerror(errno)); > } > > - if (mount("testing", "/tmp", "ramfs", mount_flags, NULL) != 0) { > - die("mount of /tmp failed: %s\n", > - strerror(errno)); > + if (mount("testing", "/tmp", fstype, mount_flags, mount_options) != 0) { > + die("mount of %s with options '%s' on /tmp failed: %s\n", > + fstype, > + mount_options? mount_options : "", > + strerror(errno)); > } > > create_and_enter_userns(); > @@ -181,60 +184,60 @@ bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags) > > static bool test_unpriv_remount_simple(int mount_flags) > { > - return test_unpriv_remount(mount_flags, mount_flags, 0); > + return test_unpriv_remount("ramfs", NULL, mount_flags, mount_flags, 0); > } > > static bool test_unpriv_remount_atime(int mount_flags, int invalid_flags) > { > - return test_unpriv_remount(mount_flags, mount_flags, invalid_flags); > + return test_unpriv_remount("ramfs", NULL, mount_flags, mount_flags, > + invalid_flags); > } > > int main(int argc, char **argv) > { > - if (!test_unpriv_remount_simple(MS_RDONLY|MS_NODEV)) { > + if (!test_unpriv_remount_simple(MS_RDONLY)) { > die("MS_RDONLY malfunctions\n"); > } > - if (!test_unpriv_remount_simple(MS_NODEV)) { > + if (!test_unpriv_remount("devpts", "newinstance", MS_NODEV, MS_NODEV, 0)) { > die("MS_NODEV malfunctions\n"); > } > - if (!test_unpriv_remount_simple(MS_NOSUID|MS_NODEV)) { > + if (!test_unpriv_remount_simple(MS_NOSUID)) { > die("MS_NOSUID malfunctions\n"); > } > - if (!test_unpriv_remount_simple(MS_NOEXEC|MS_NODEV)) { > + if (!test_unpriv_remount_simple(MS_NOEXEC)) { > die("MS_NOEXEC malfunctions\n"); > } > - if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODEV, > - MS_NOATIME|MS_NODEV)) > + if (!test_unpriv_remount_atime(MS_RELATIME, > + MS_NOATIME)) > { > die("MS_RELATIME malfunctions\n"); > } > - if (!test_unpriv_remount_atime(MS_STRICTATIME|MS_NODEV, > - MS_NOATIME|MS_NODEV)) > + if (!test_unpriv_remount_atime(MS_STRICTATIME, > + MS_NOATIME)) > { > die("MS_STRICTATIME malfunctions\n"); > } > - if (!test_unpriv_remount_atime(MS_NOATIME|MS_NODEV, > - MS_STRICTATIME|MS_NODEV)) > + if (!test_unpriv_remount_atime(MS_NOATIME, > + MS_STRICTATIME)) > { > die("MS_RELATIME malfunctions\n"); > } > - if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODIRATIME|MS_NODEV, > - MS_NOATIME|MS_NODEV)) > + if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODIRATIME, > + MS_NOATIME)) > { > die("MS_RELATIME malfunctions\n"); > } > - if (!test_unpriv_remount_atime(MS_STRICTATIME|MS_NODIRATIME|MS_NODEV, > - MS_NOATIME|MS_NODEV)) > + if (!test_unpriv_remount_atime(MS_STRICTATIME|MS_NODIRATIME, > + MS_NOATIME)) > { > die("MS_RELATIME malfunctions\n"); > } > - if (!test_unpriv_remount_atime(MS_NOATIME|MS_NODIRATIME|MS_NODEV, > - MS_STRICTATIME|MS_NODEV)) > + if (!test_unpriv_remount_atime(MS_NOATIME|MS_NODIRATIME, > + MS_STRICTATIME)) > { > die("MS_RELATIME malfunctions\n"); > } > - if (!test_unpriv_remount(MS_STRICTATIME|MS_NODEV, MS_NODEV, > - MS_NOATIME|MS_NODEV)) > + if (!test_unpriv_remount("ramfs", NULL, MS_STRICTATIME, 0, MS_NOATIME)) > { > die("Default atime malfunctions\n"); > } > -- > 1.9.1 > -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 2014-08-06 0:57 ` Eric W. Biederman @ 2014-08-20 15:06 ` Richard Weinberger -1 siblings, 0 replies; 67+ messages in thread From: Richard Weinberger @ 2014-08-20 15:06 UTC (permalink / raw) To: Eric W. Biederman Cc: libvir-list-H+wXaHxf7aLQT0dZR+AlfA, Linux Containers, LKML, linux-fsdevel, Linus Torvalds On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote: > > Linus, > > Please pull the for-linus branch from the git tree: > > git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus > > HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts > > This is a bunch of small changes built against 3.16-rc6. The most > significant change for users is the first patch which makes setns > drmatically faster by removing unneded rcu handling. > > The next chunk of changes are so that "mount -o remount,.." will not > allow the user namespace root to drop flags on a mount set by the system > wide root. Aks this forces read-only mounts to stay read-only, no-dev > mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to > stay no exec and it prevents unprivileged users from messing with a > mounts atime settings. I have included my test case as the last patch > in this series so people performing backports can verify this change > works correctly. > > The next change fixes a bug in NFS that was discovered while auditing > nsproxy users for the first optimization. Today you can oops the kernel > by reading /proc/fs/nfsfs/{servers,volumes} if you are clever with pid > namespaces. I rebased and fixed the build of the !CONFIG_NFS_FS case > yesterday when a build bot caught my typo. Given that no one to my > knowledge bases anything on my tree fixing the typo in place seems more > responsible that requiring a typo-fix to be backported as well. > > The last change is a small semantic cleanup introducing > /proc/thread-self and pointing /proc/mounts and /proc/net at it. This > prevents several kinds of problemantic corner cases. It is a > user-visible change so it has a minute chance of causing regressions so > the change to /proc/mounts and /proc/net are individual one line commits > that can be trivially reverted. Unfortunately I lost and could not find > the email of the original reporter so he is not credited. From at least > one perspective this change to /proc/net is a refgression fix to allow > pthread /proc/net uses that were broken by the introduction of the network > namespace. > > Eric > > Eric W. Biederman (11): > namespaces: Use task_lock and not rcu to protect nsproxy > mnt: Only change user settable mount flags in remount > mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount > mnt: Correct permission checks in do_remount This commit breaks libvirt-lxc. libvirt does in lxcContainerMountBasicFS(): /* * We can't immediately set the MS_RDONLY flag when mounting filesystems * because (in at least some kernel versions) this will propagate back * to the original mount in the host OS, turning it readonly too. Thus * we mount the filesystem in read-write mode initially, and then do a * separate read-only bind mount on top of that. */ bindOverReadonly = !!(mnt_mflags & MS_RDONLY); VIR_DEBUG("Mount %s on %s type=%s flags=%x", mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY); if (mount(mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY, NULL) < 0) { ^^^^ Here it fails for sysfs because with user namespaces we bind the existing /sys into the container and would have to read out all existing mount flags from the current /sys mount. Otherwise mount() fails with EPERM. On my test system /sys is mounted with "rw,nosuid,nodev,noexec,relatime" and libvirt misses the realtime... virReportSystemError(errno, _("Failed to mount %s on %s type %s flags=%x"), mnt_src, mnt->dst, NULLSTR(mnt->type), mnt_mflags & ~MS_RDONLY); goto cleanup; } if (bindOverReadonly && mount(mnt_src, mnt->dst, NULL, MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) { ^^^ Here it fails because now we'd have to specify all flags as used for the first mount. For the procfs case MS_NOSUID|MS_NOEXEC|MS_NODEV. See lxcBasicMounts[]. In this case the fix is easy, add mnt_mflags to the mount flags. virReportSystemError(errno, _("Failed to re-mount %s on %s flags=%x"), mnt_src, mnt->dst, MS_BIND|MS_REMOUNT|MS_RDONLY); goto cleanup; } -- Thanks, //richard ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 @ 2014-08-20 15:06 ` Richard Weinberger 0 siblings, 0 replies; 67+ messages in thread From: Richard Weinberger @ 2014-08-20 15:06 UTC (permalink / raw) To: Eric W. Biederman Cc: Linus Torvalds, Linux Containers, linux-fsdevel, LKML, libvir-list, Daniel P. Berrange On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm@xmission.com> wrote: > > Linus, > > Please pull the for-linus branch from the git tree: > > git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus > > HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts > > This is a bunch of small changes built against 3.16-rc6. The most > significant change for users is the first patch which makes setns > drmatically faster by removing unneded rcu handling. > > The next chunk of changes are so that "mount -o remount,.." will not > allow the user namespace root to drop flags on a mount set by the system > wide root. Aks this forces read-only mounts to stay read-only, no-dev > mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to > stay no exec and it prevents unprivileged users from messing with a > mounts atime settings. I have included my test case as the last patch > in this series so people performing backports can verify this change > works correctly. > > The next change fixes a bug in NFS that was discovered while auditing > nsproxy users for the first optimization. Today you can oops the kernel > by reading /proc/fs/nfsfs/{servers,volumes} if you are clever with pid > namespaces. I rebased and fixed the build of the !CONFIG_NFS_FS case > yesterday when a build bot caught my typo. Given that no one to my > knowledge bases anything on my tree fixing the typo in place seems more > responsible that requiring a typo-fix to be backported as well. > > The last change is a small semantic cleanup introducing > /proc/thread-self and pointing /proc/mounts and /proc/net at it. This > prevents several kinds of problemantic corner cases. It is a > user-visible change so it has a minute chance of causing regressions so > the change to /proc/mounts and /proc/net are individual one line commits > that can be trivially reverted. Unfortunately I lost and could not find > the email of the original reporter so he is not credited. From at least > one perspective this change to /proc/net is a refgression fix to allow > pthread /proc/net uses that were broken by the introduction of the network > namespace. > > Eric > > Eric W. Biederman (11): > namespaces: Use task_lock and not rcu to protect nsproxy > mnt: Only change user settable mount flags in remount > mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount > mnt: Correct permission checks in do_remount This commit breaks libvirt-lxc. libvirt does in lxcContainerMountBasicFS(): /* * We can't immediately set the MS_RDONLY flag when mounting filesystems * because (in at least some kernel versions) this will propagate back * to the original mount in the host OS, turning it readonly too. Thus * we mount the filesystem in read-write mode initially, and then do a * separate read-only bind mount on top of that. */ bindOverReadonly = !!(mnt_mflags & MS_RDONLY); VIR_DEBUG("Mount %s on %s type=%s flags=%x", mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY); if (mount(mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY, NULL) < 0) { ^^^^ Here it fails for sysfs because with user namespaces we bind the existing /sys into the container and would have to read out all existing mount flags from the current /sys mount. Otherwise mount() fails with EPERM. On my test system /sys is mounted with "rw,nosuid,nodev,noexec,relatime" and libvirt misses the realtime... virReportSystemError(errno, _("Failed to mount %s on %s type %s flags=%x"), mnt_src, mnt->dst, NULLSTR(mnt->type), mnt_mflags & ~MS_RDONLY); goto cleanup; } if (bindOverReadonly && mount(mnt_src, mnt->dst, NULL, MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) { ^^^ Here it fails because now we'd have to specify all flags as used for the first mount. For the procfs case MS_NOSUID|MS_NOEXEC|MS_NODEV. See lxcBasicMounts[]. In this case the fix is easy, add mnt_mflags to the mount flags. virReportSystemError(errno, _("Failed to re-mount %s on %s flags=%x"), mnt_src, mnt->dst, MS_BIND|MS_REMOUNT|MS_RDONLY); goto cleanup; } -- Thanks, //richard ^ permalink raw reply [flat|nested] 67+ messages in thread
[parent not found: <CAFLxGvwi-iJRyfwv8v9fcRkiSu2d-az8W55xMPbp_d8wQKmwjg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [GIT PULL] namespace updates for v3.17-rc1 2014-08-20 15:06 ` Richard Weinberger @ 2014-08-21 4:53 ` Eric W. Biederman -1 siblings, 0 replies; 67+ messages in thread From: Eric W. Biederman @ 2014-08-21 4:53 UTC (permalink / raw) To: Richard Weinberger Cc: libvir-list-H+wXaHxf7aLQT0dZR+AlfA, Linux Containers, LKML, linux-fsdevel, Linus Torvalds Richard Weinberger <richard.weinberger-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes: > On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote: > > This commit breaks libvirt-lxc. > libvirt does in lxcContainerMountBasicFS(): The bugs fixed are security issues, so if we have to break a small number of userspace applications we will. Anything that we can reasonably do to avoid regressions will be done. Could you please look at my user-namespace.git#for-next branch I have a fix for at least one regresion causing issue in there. I think it may fix your issues but I am not fully certain more comments below. > /* > * We can't immediately set the MS_RDONLY flag when mounting filesystems > * because (in at least some kernel versions) this will propagate back > * to the original mount in the host OS, turning it readonly too. Thus > * we mount the filesystem in read-write mode initially, and then do a > * separate read-only bind mount on top of that. > */ > bindOverReadonly = !!(mnt_mflags & MS_RDONLY); > > VIR_DEBUG("Mount %s on %s type=%s flags=%x", > mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY); > if (mount(mnt_src, mnt->dst, mnt->type, mnt_mflags & > ~MS_RDONLY, NULL) < 0) { > > ^^^^ Here it fails for sysfs because with user namespaces we bind the > existing /sys into the container > and would have to read out all existing mount flags from the current /sys mount. > Otherwise mount() fails with EPERM. > On my test system /sys is mounted with > "rw,nosuid,nodev,noexec,relatime" and libvirt > misses the realtime... Not specifying any atime flags to mount should be safe as that asks for the default atime flags which for remount I have made the default atime flags the existing atime flags. So I am scratching my head a little on this one. > > virReportSystemError(errno, > _("Failed to mount %s on %s type %s flags=%x"), > mnt_src, mnt->dst, NULLSTR(mnt->type), > mnt_mflags & ~MS_RDONLY); > goto cleanup; > } > > if (bindOverReadonly && > mount(mnt_src, mnt->dst, NULL, > MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) { > > ^^^ Here it fails because now we'd have to specify all flags as used > for the first > mount. For the procfs case MS_NOSUID|MS_NOEXEC|MS_NODEV. > See lxcBasicMounts[]. > In this case the fix is easy, add mnt_mflags to the mount flags. That has always been a bug in general because remount has always required specifying the complete set of mount flags you want to have. That fact that flags such as nosuid are now properly locked so you can not change them if you are not the global root user just makes this obvious. Andy Lutermorski has observed that statvfs will return the mount flags making reading them simple. Eric ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 @ 2014-08-21 4:53 ` Eric W. Biederman 0 siblings, 0 replies; 67+ messages in thread From: Eric W. Biederman @ 2014-08-21 4:53 UTC (permalink / raw) To: Richard Weinberger Cc: Linus Torvalds, Linux Containers, linux-fsdevel, LKML, libvir-list, Daniel P. Berrange Richard Weinberger <richard.weinberger@gmail.com> writes: > On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm@xmission.com> wrote: > > This commit breaks libvirt-lxc. > libvirt does in lxcContainerMountBasicFS(): The bugs fixed are security issues, so if we have to break a small number of userspace applications we will. Anything that we can reasonably do to avoid regressions will be done. Could you please look at my user-namespace.git#for-next branch I have a fix for at least one regresion causing issue in there. I think it may fix your issues but I am not fully certain more comments below. > /* > * We can't immediately set the MS_RDONLY flag when mounting filesystems > * because (in at least some kernel versions) this will propagate back > * to the original mount in the host OS, turning it readonly too. Thus > * we mount the filesystem in read-write mode initially, and then do a > * separate read-only bind mount on top of that. > */ > bindOverReadonly = !!(mnt_mflags & MS_RDONLY); > > VIR_DEBUG("Mount %s on %s type=%s flags=%x", > mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY); > if (mount(mnt_src, mnt->dst, mnt->type, mnt_mflags & > ~MS_RDONLY, NULL) < 0) { > > ^^^^ Here it fails for sysfs because with user namespaces we bind the > existing /sys into the container > and would have to read out all existing mount flags from the current /sys mount. > Otherwise mount() fails with EPERM. > On my test system /sys is mounted with > "rw,nosuid,nodev,noexec,relatime" and libvirt > misses the realtime... Not specifying any atime flags to mount should be safe as that asks for the default atime flags which for remount I have made the default atime flags the existing atime flags. So I am scratching my head a little on this one. > > virReportSystemError(errno, > _("Failed to mount %s on %s type %s flags=%x"), > mnt_src, mnt->dst, NULLSTR(mnt->type), > mnt_mflags & ~MS_RDONLY); > goto cleanup; > } > > if (bindOverReadonly && > mount(mnt_src, mnt->dst, NULL, > MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) { > > ^^^ Here it fails because now we'd have to specify all flags as used > for the first > mount. For the procfs case MS_NOSUID|MS_NOEXEC|MS_NODEV. > See lxcBasicMounts[]. > In this case the fix is easy, add mnt_mflags to the mount flags. That has always been a bug in general because remount has always required specifying the complete set of mount flags you want to have. That fact that flags such as nosuid are now properly locked so you can not change them if you are not the global root user just makes this obvious. Andy Lutermorski has observed that statvfs will return the mount flags making reading them simple. Eric ^ permalink raw reply [flat|nested] 67+ messages in thread
[parent not found: <87vbpm4f4y.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>]
* Re: [GIT PULL] namespace updates for v3.17-rc1 [not found] ` <87vbpm4f4y.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> @ 2014-08-21 6:29 ` Richard Weinberger 2014-08-21 13:12 ` Christoph Hellwig 1 sibling, 0 replies; 67+ messages in thread From: Richard Weinberger @ 2014-08-21 6:29 UTC (permalink / raw) To: Eric W. Biederman Cc: libvir-list-H+wXaHxf7aLQT0dZR+AlfA, Linux Containers, LKML, linux-fsdevel, Linus Torvalds Am 21.08.2014 06:53, schrieb Eric W. Biederman: > The bugs fixed are security issues, so if we have to break a small > number of userspace applications we will. Anything that we can > reasonably do to avoid regressions will be done. > > Could you please look at my user-namespace.git#for-next branch I have a > fix for at least one regresion causing issue in there. I think it may > fix your issues but I am not fully certain more comments below. I'll run this on my LXC testbed today. >> /* >> * We can't immediately set the MS_RDONLY flag when mounting filesystems >> * because (in at least some kernel versions) this will propagate back >> * to the original mount in the host OS, turning it readonly too. Thus >> * we mount the filesystem in read-write mode initially, and then do a >> * separate read-only bind mount on top of that. >> */ >> bindOverReadonly = !!(mnt_mflags & MS_RDONLY); >> >> VIR_DEBUG("Mount %s on %s type=%s flags=%x", >> mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY); >> if (mount(mnt_src, mnt->dst, mnt->type, mnt_mflags & >> ~MS_RDONLY, NULL) < 0) { >> >> ^^^^ Here it fails for sysfs because with user namespaces we bind the >> existing /sys into the container >> and would have to read out all existing mount flags from the current /sys mount. >> Otherwise mount() fails with EPERM. >> On my test system /sys is mounted with >> "rw,nosuid,nodev,noexec,relatime" and libvirt >> misses the realtime... > > Not specifying any atime flags to mount should be safe as that asks for > the default atime flags which for remount I have made the default atime > flags the existing atime flags. So I am scratching my head a little on > this one. Okay, let me find out why exactly libvirt gets a EPERM here. Maybe there are more odds hidden. >> >> virReportSystemError(errno, >> _("Failed to mount %s on %s type %s flags=%x"), >> mnt_src, mnt->dst, NULLSTR(mnt->type), >> mnt_mflags & ~MS_RDONLY); >> goto cleanup; >> } >> >> if (bindOverReadonly && >> mount(mnt_src, mnt->dst, NULL, >> MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) { >> >> ^^^ Here it fails because now we'd have to specify all flags as used >> for the first >> mount. For the procfs case MS_NOSUID|MS_NOEXEC|MS_NODEV. >> See lxcBasicMounts[]. >> In this case the fix is easy, add mnt_mflags to the mount flags. > > That has always been a bug in general because remount has always > required specifying the complete set of mount flags you want to have. > > That fact that flags such as nosuid are now properly locked so you can > not change them if you are not the global root user just makes this > obvious. > > Andy Lutermorski has observed that statvfs will return the mount flags > making reading them simple. Thanks for the clarification, I'll create a fix for libvirt. Thanks, //richard ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 2014-08-21 4:53 ` Eric W. Biederman @ 2014-08-21 13:12 ` Christoph Hellwig -1 siblings, 0 replies; 67+ messages in thread From: Christoph Hellwig @ 2014-08-21 13:12 UTC (permalink / raw) To: Eric W. Biederman Cc: Richard Weinberger, Linux Containers, LKML, libvir-list-H+wXaHxf7aLQT0dZR+AlfA, linux-fsdevel, Linus Torvalds On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote: > Richard Weinberger <richard.weinberger-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes: > > > On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote: > > > > This commit breaks libvirt-lxc. > > libvirt does in lxcContainerMountBasicFS(): > > The bugs fixed are security issues, so if we have to break a small > number of userspace applications we will. Anything that we can > reasonably do to avoid regressions will be done. Can you explain the security issues in detail? Breaking common userspace like libvirt-lxc with just a little bit of handwaiving is entirely unacceptable. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 @ 2014-08-21 13:12 ` Christoph Hellwig 0 siblings, 0 replies; 67+ messages in thread From: Christoph Hellwig @ 2014-08-21 13:12 UTC (permalink / raw) To: Eric W. Biederman Cc: Richard Weinberger, Linus Torvalds, Linux Containers, linux-fsdevel, LKML, libvir-list, Daniel P. Berrange On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote: > Richard Weinberger <richard.weinberger@gmail.com> writes: > > > On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm@xmission.com> wrote: > > > > This commit breaks libvirt-lxc. > > libvirt does in lxcContainerMountBasicFS(): > > The bugs fixed are security issues, so if we have to break a small > number of userspace applications we will. Anything that we can > reasonably do to avoid regressions will be done. Can you explain the security issues in detail? Breaking common userspace like libvirt-lxc with just a little bit of handwaiving is entirely unacceptable. ^ permalink raw reply [flat|nested] 67+ messages in thread
[parent not found: <20140821131257.GA4264-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>]
* Re: [GIT PULL] namespace updates for v3.17-rc1 2014-08-21 13:12 ` Christoph Hellwig @ 2014-08-21 13:22 ` Richard Weinberger -1 siblings, 0 replies; 67+ messages in thread From: Richard Weinberger @ 2014-08-21 13:22 UTC (permalink / raw) To: Christoph Hellwig, Eric W. Biederman Cc: libvir-list-H+wXaHxf7aLQT0dZR+AlfA, Linux Containers, LKML, linux-fsdevel, Linus Torvalds Am 21.08.2014 15:12, schrieb Christoph Hellwig: > On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote: >> Richard Weinberger <richard.weinberger-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes: >> >>> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote: >>> >>> This commit breaks libvirt-lxc. >>> libvirt does in lxcContainerMountBasicFS(): >> >> The bugs fixed are security issues, so if we have to break a small >> number of userspace applications we will. Anything that we can >> reasonably do to avoid regressions will be done. > > Can you explain the security issues in detail? Breaking common > userspace like libvirt-lxc with just a little bit of handwaiving is > entirely unacceptable. It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next unbreaks libvirt-lxc. I hope it hits Linus tree and -stable before the offending commit hits users. Thanks, //richard ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 @ 2014-08-21 13:22 ` Richard Weinberger 0 siblings, 0 replies; 67+ messages in thread From: Richard Weinberger @ 2014-08-21 13:22 UTC (permalink / raw) To: Christoph Hellwig, Eric W. Biederman Cc: Linus Torvalds, Linux Containers, linux-fsdevel, LKML, libvir-list, Daniel P. Berrange Am 21.08.2014 15:12, schrieb Christoph Hellwig: > On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote: >> Richard Weinberger <richard.weinberger@gmail.com> writes: >> >>> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm@xmission.com> wrote: >>> >>> This commit breaks libvirt-lxc. >>> libvirt does in lxcContainerMountBasicFS(): >> >> The bugs fixed are security issues, so if we have to break a small >> number of userspace applications we will. Anything that we can >> reasonably do to avoid regressions will be done. > > Can you explain the security issues in detail? Breaking common > userspace like libvirt-lxc with just a little bit of handwaiving is > entirely unacceptable. It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next unbreaks libvirt-lxc. I hope it hits Linus tree and -stable before the offending commit hits users. Thanks, //richard ^ permalink raw reply [flat|nested] 67+ messages in thread
[parent not found: <53F5F2AD.5010607-/L3Ra7n9ekc@public.gmane.org>]
* Re: [GIT PULL] namespace updates for v3.17-rc1 2014-08-21 13:22 ` Richard Weinberger @ 2014-08-21 14:09 ` Eric W. Biederman -1 siblings, 0 replies; 67+ messages in thread From: Eric W. Biederman @ 2014-08-21 14:09 UTC (permalink / raw) To: Richard Weinberger Cc: libvir-list-H+wXaHxf7aLQT0dZR+AlfA, Linux Containers, LKML, Christoph Hellwig, linux-fsdevel, Linus Torvalds Richard Weinberger <richard-/L3Ra7n9ekc@public.gmane.org> writes: > Am 21.08.2014 15:12, schrieb Christoph Hellwig: >> On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote: >>> Richard Weinberger <richard.weinberger-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes: >>> >>>> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote: >>>> >>>> This commit breaks libvirt-lxc. >>>> libvirt does in lxcContainerMountBasicFS(): >>> >>> The bugs fixed are security issues, so if we have to break a small >>> number of userspace applications we will. Anything that we can >>> reasonably do to avoid regressions will be done. >> >> Can you explain the security issues in detail? Breaking common >> userspace like libvirt-lxc with just a little bit of handwaiving is >> entirely unacceptable. > > It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in > git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next > unbreaks libvirt-lxc. > I hope it hits Linus tree and -stable before the offending commit hits users. I plan to send the pull request to Linus as soon as I have caught my breath (from all of the conferences this week) that I can be certain I am thinking clearly and not rushing things. Eric ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 @ 2014-08-21 14:09 ` Eric W. Biederman 0 siblings, 0 replies; 67+ messages in thread From: Eric W. Biederman @ 2014-08-21 14:09 UTC (permalink / raw) To: Richard Weinberger Cc: Christoph Hellwig, Linus Torvalds, Linux Containers, linux-fsdevel, LKML, libvir-list, Daniel P. Berrange Richard Weinberger <richard@nod.at> writes: > Am 21.08.2014 15:12, schrieb Christoph Hellwig: >> On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote: >>> Richard Weinberger <richard.weinberger@gmail.com> writes: >>> >>>> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm@xmission.com> wrote: >>>> >>>> This commit breaks libvirt-lxc. >>>> libvirt does in lxcContainerMountBasicFS(): >>> >>> The bugs fixed are security issues, so if we have to break a small >>> number of userspace applications we will. Anything that we can >>> reasonably do to avoid regressions will be done. >> >> Can you explain the security issues in detail? Breaking common >> userspace like libvirt-lxc with just a little bit of handwaiving is >> entirely unacceptable. > > It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in > git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next > unbreaks libvirt-lxc. > I hope it hits Linus tree and -stable before the offending commit hits users. I plan to send the pull request to Linus as soon as I have caught my breath (from all of the conferences this week) that I can be certain I am thinking clearly and not rushing things. Eric ^ permalink raw reply [flat|nested] 67+ messages in thread
[parent not found: <87k362vsr5.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>]
* Re: [GIT PULL] namespace updates for v3.17-rc1 2014-08-21 14:09 ` Eric W. Biederman @ 2014-09-03 21:18 ` Richard Weinberger -1 siblings, 0 replies; 67+ messages in thread From: Richard Weinberger @ 2014-09-03 21:18 UTC (permalink / raw) To: Eric W. Biederman Cc: libvir-list-H+wXaHxf7aLQT0dZR+AlfA, Richard Weinberger, Linux Containers, LKML, Christoph Hellwig, linux-fsdevel, Linus Torvalds On Thu, Aug 21, 2014 at 4:09 PM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote: >> It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in >> git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next >> unbreaks libvirt-lxc. >> I hope it hits Linus tree and -stable before the offending commit hits users. > > I plan to send the pull request to Linus as soon as I have caught my > breath (from all of the conferences this week) that I can be certain I > am thinking clearly and not rushing things. *kind reminder* :-) -- Thanks, //richard ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 @ 2014-09-03 21:18 ` Richard Weinberger 0 siblings, 0 replies; 67+ messages in thread From: Richard Weinberger @ 2014-09-03 21:18 UTC (permalink / raw) To: Eric W. Biederman Cc: Richard Weinberger, Christoph Hellwig, Linus Torvalds, Linux Containers, linux-fsdevel, LKML, libvir-list, Daniel P. Berrange On Thu, Aug 21, 2014 at 4:09 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: >> It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in >> git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next >> unbreaks libvirt-lxc. >> I hope it hits Linus tree and -stable before the offending commit hits users. > > I plan to send the pull request to Linus as soon as I have caught my > breath (from all of the conferences this week) that I can be certain I > am thinking clearly and not rushing things. *kind reminder* :-) -- Thanks, //richard ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 2014-08-21 14:09 ` Eric W. Biederman @ 2014-11-25 23:15 ` Richard Weinberger -1 siblings, 0 replies; 67+ messages in thread From: Richard Weinberger @ 2014-11-25 23:15 UTC (permalink / raw) To: Eric W. Biederman Cc: libvir-list-H+wXaHxf7aLQT0dZR+AlfA, Richard Weinberger, Linux Containers, LKML, Christoph Hellwig, linux-fsdevel, Linus Torvalds Eric, On Thu, Aug 21, 2014 at 4:09 PM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote: > Richard Weinberger <richard-/L3Ra7n9ekc@public.gmane.org> writes: > >> Am 21.08.2014 15:12, schrieb Christoph Hellwig: >>> On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote: >>>> Richard Weinberger <richard.weinberger-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes: >>>> >>>>> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote: >>>>> >>>>> This commit breaks libvirt-lxc. >>>>> libvirt does in lxcContainerMountBasicFS(): >>>> >>>> The bugs fixed are security issues, so if we have to break a small >>>> number of userspace applications we will. Anything that we can >>>> reasonably do to avoid regressions will be done. >>> >>> Can you explain the security issues in detail? Breaking common >>> userspace like libvirt-lxc with just a little bit of handwaiving is >>> entirely unacceptable. >> >> It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in >> git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next >> unbreaks libvirt-lxc. >> I hope it hits Linus tree and -stable before the offending commit hits users. > > I plan to send the pull request to Linus as soon as I have caught my > breath (from all of the conferences this week) that I can be certain I > am thinking clearly and not rushing things. Today I've upgraded my LXC testbed to the most recent kernel and found libvirt-lxc broken again (sic!). Remounting /proc/sys/ is failing. Investigating into the issue showed that commit "mnt: Implicitly add MNT_NODEV on remount as we do on mount" is not mainline. Why did you left out this patch? In my previous mails I explicitly stated that exactly this commit unbreaks libvirt-lxc. Now the userspace breaking changes are mainline and hit users hard. :-( -- Thanks, //richard ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 @ 2014-11-25 23:15 ` Richard Weinberger 0 siblings, 0 replies; 67+ messages in thread From: Richard Weinberger @ 2014-11-25 23:15 UTC (permalink / raw) To: Eric W. Biederman Cc: Richard Weinberger, Christoph Hellwig, Linus Torvalds, Linux Containers, linux-fsdevel, LKML, libvir-list, Daniel P. Berrange Eric, On Thu, Aug 21, 2014 at 4:09 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > Richard Weinberger <richard@nod.at> writes: > >> Am 21.08.2014 15:12, schrieb Christoph Hellwig: >>> On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote: >>>> Richard Weinberger <richard.weinberger@gmail.com> writes: >>>> >>>>> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm@xmission.com> wrote: >>>>> >>>>> This commit breaks libvirt-lxc. >>>>> libvirt does in lxcContainerMountBasicFS(): >>>> >>>> The bugs fixed are security issues, so if we have to break a small >>>> number of userspace applications we will. Anything that we can >>>> reasonably do to avoid regressions will be done. >>> >>> Can you explain the security issues in detail? Breaking common >>> userspace like libvirt-lxc with just a little bit of handwaiving is >>> entirely unacceptable. >> >> It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in >> git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next >> unbreaks libvirt-lxc. >> I hope it hits Linus tree and -stable before the offending commit hits users. > > I plan to send the pull request to Linus as soon as I have caught my > breath (from all of the conferences this week) that I can be certain I > am thinking clearly and not rushing things. Today I've upgraded my LXC testbed to the most recent kernel and found libvirt-lxc broken again (sic!). Remounting /proc/sys/ is failing. Investigating into the issue showed that commit "mnt: Implicitly add MNT_NODEV on remount as we do on mount" is not mainline. Why did you left out this patch? In my previous mails I explicitly stated that exactly this commit unbreaks libvirt-lxc. Now the userspace breaking changes are mainline and hit users hard. :-( -- Thanks, //richard ^ permalink raw reply [flat|nested] 67+ messages in thread
[parent not found: <CAFLxGvzyhHC+QF-bFfp-yNBpCkS3JJ+RAr+5iCj0k_su9wJbGw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [GIT PULL] namespace updates for v3.17-rc1 2014-11-25 23:15 ` Richard Weinberger @ 2014-11-29 16:58 ` Richard Weinberger -1 siblings, 0 replies; 67+ messages in thread From: Richard Weinberger @ 2014-11-29 16:58 UTC (permalink / raw) To: Eric W. Biederman Cc: libvir-list-H+wXaHxf7aLQT0dZR+AlfA, Linux Containers, LKML, Christoph Hellwig, linux-fsdevel, Linus Torvalds Am 26.11.2014 um 00:15 schrieb Richard Weinberger: > Eric, > > On Thu, Aug 21, 2014 at 4:09 PM, Eric W. Biederman > <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote: >> Richard Weinberger <richard-/L3Ra7n9ekc@public.gmane.org> writes: >> >>> Am 21.08.2014 15:12, schrieb Christoph Hellwig: >>>> On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote: >>>>> Richard Weinberger <richard.weinberger-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes: >>>>> >>>>>> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote: >>>>>> >>>>>> This commit breaks libvirt-lxc. >>>>>> libvirt does in lxcContainerMountBasicFS(): >>>>> >>>>> The bugs fixed are security issues, so if we have to break a small >>>>> number of userspace applications we will. Anything that we can >>>>> reasonably do to avoid regressions will be done. >>>> >>>> Can you explain the security issues in detail? Breaking common >>>> userspace like libvirt-lxc with just a little bit of handwaiving is >>>> entirely unacceptable. >>> >>> It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in >>> git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next >>> unbreaks libvirt-lxc. >>> I hope it hits Linus tree and -stable before the offending commit hits users. >> >> I plan to send the pull request to Linus as soon as I have caught my >> breath (from all of the conferences this week) that I can be certain I >> am thinking clearly and not rushing things. > > Today I've upgraded my LXC testbed to the most recent kernel and found > libvirt-lxc broken again (sic!). > Remounting /proc/sys/ is failing. > Investigating into the issue showed that commit "mnt: Implicitly add > MNT_NODEV on remount as we do on mount" > is not mainline. > Why did you left out this patch? In my previous mails I explicitly > stated that exactly this commit unbreaks libvirt-lxc. > > Now the userspace breaking changes are mainline and hit users hard. :-( *kind ping* ...to make sure that this issue doesn't get lost. Thanks, //richard ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 @ 2014-11-29 16:58 ` Richard Weinberger 0 siblings, 0 replies; 67+ messages in thread From: Richard Weinberger @ 2014-11-29 16:58 UTC (permalink / raw) To: Eric W. Biederman Cc: Christoph Hellwig, Linus Torvalds, Linux Containers, linux-fsdevel, LKML, libvir-list, Daniel P. Berrange Am 26.11.2014 um 00:15 schrieb Richard Weinberger: > Eric, > > On Thu, Aug 21, 2014 at 4:09 PM, Eric W. Biederman > <ebiederm@xmission.com> wrote: >> Richard Weinberger <richard@nod.at> writes: >> >>> Am 21.08.2014 15:12, schrieb Christoph Hellwig: >>>> On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote: >>>>> Richard Weinberger <richard.weinberger@gmail.com> writes: >>>>> >>>>>> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm@xmission.com> wrote: >>>>>> >>>>>> This commit breaks libvirt-lxc. >>>>>> libvirt does in lxcContainerMountBasicFS(): >>>>> >>>>> The bugs fixed are security issues, so if we have to break a small >>>>> number of userspace applications we will. Anything that we can >>>>> reasonably do to avoid regressions will be done. >>>> >>>> Can you explain the security issues in detail? Breaking common >>>> userspace like libvirt-lxc with just a little bit of handwaiving is >>>> entirely unacceptable. >>> >>> It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in >>> git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next >>> unbreaks libvirt-lxc. >>> I hope it hits Linus tree and -stable before the offending commit hits users. >> >> I plan to send the pull request to Linus as soon as I have caught my >> breath (from all of the conferences this week) that I can be certain I >> am thinking clearly and not rushing things. > > Today I've upgraded my LXC testbed to the most recent kernel and found > libvirt-lxc broken again (sic!). > Remounting /proc/sys/ is failing. > Investigating into the issue showed that commit "mnt: Implicitly add > MNT_NODEV on remount as we do on mount" > is not mainline. > Why did you left out this patch? In my previous mails I explicitly > stated that exactly this commit unbreaks libvirt-lxc. > > Now the userspace breaking changes are mainline and hit users hard. :-( *kind ping* ...to make sure that this issue doesn't get lost. Thanks, //richard ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 2014-08-21 13:12 ` Christoph Hellwig @ 2014-08-21 13:43 ` Eric W. Biederman -1 siblings, 0 replies; 67+ messages in thread From: Eric W. Biederman @ 2014-08-21 13:43 UTC (permalink / raw) To: Christoph Hellwig Cc: Richard Weinberger, Linux Containers, LKML, libvir-list-H+wXaHxf7aLQT0dZR+AlfA, linux-fsdevel, Linus Torvalds Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> writes: > On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote: >> Richard Weinberger <richard.weinberger-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes: >> >> > On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote: >> > >> > This commit breaks libvirt-lxc. >> > libvirt does in lxcContainerMountBasicFS(): >> >> The bugs fixed are security issues, so if we have to break a small >> number of userspace applications we will. Anything that we can >> reasonably do to avoid regressions will be done. > > Can you explain the security issues in detail? Breaking common > userspace like libvirt-lxc with just a little bit of handwaiving is > entirely unacceptable. The biggies ability for an unprivileged user to clear nosuid, nodev, read-only, noexec with remount. Apparently part of what libvirt-lxc clearing all mount flags except for read-only on some filesystem it remounts read-only. Which if root mounted the filesystem with nosuid, nodev, or noexec is not supportable. So if it isn't the implicit setting of nodev on mount but not on remount causing problems (which I have the trivial fix for) there is a strong chance I need to break libvirt-lxc. I am several bugs deep already where fixing one bug reveals yet another bug. It is possible there is something else that that I have not discovered that I am missing. I am more to happy to investigate to see if it possible to avoid breaking libvirt-lxc. Apologies for being a little vague being at the conference I only have access to email about an hour a day. Eric ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 @ 2014-08-21 13:43 ` Eric W. Biederman 0 siblings, 0 replies; 67+ messages in thread From: Eric W. Biederman @ 2014-08-21 13:43 UTC (permalink / raw) To: Christoph Hellwig Cc: Richard Weinberger, Linus Torvalds, Linux Containers, linux-fsdevel, LKML, libvir-list, Daniel P. Berrange Christoph Hellwig <hch@infradead.org> writes: > On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote: >> Richard Weinberger <richard.weinberger@gmail.com> writes: >> >> > On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm@xmission.com> wrote: >> > >> > This commit breaks libvirt-lxc. >> > libvirt does in lxcContainerMountBasicFS(): >> >> The bugs fixed are security issues, so if we have to break a small >> number of userspace applications we will. Anything that we can >> reasonably do to avoid regressions will be done. > > Can you explain the security issues in detail? Breaking common > userspace like libvirt-lxc with just a little bit of handwaiving is > entirely unacceptable. The biggies ability for an unprivileged user to clear nosuid, nodev, read-only, noexec with remount. Apparently part of what libvirt-lxc clearing all mount flags except for read-only on some filesystem it remounts read-only. Which if root mounted the filesystem with nosuid, nodev, or noexec is not supportable. So if it isn't the implicit setting of nodev on mount but not on remount causing problems (which I have the trivial fix for) there is a strong chance I need to break libvirt-lxc. I am several bugs deep already where fixing one bug reveals yet another bug. It is possible there is something else that that I have not discovered that I am missing. I am more to happy to investigate to see if it possible to avoid breaking libvirt-lxc. Apologies for being a little vague being at the conference I only have access to email about an hour a day. Eric ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 2014-08-21 4:53 ` Eric W. Biederman (?) (?) @ 2014-08-21 6:29 ` Richard Weinberger [not found] ` <53F591E7.3010509-/L3Ra7n9ekc@public.gmane.org> -1 siblings, 1 reply; 67+ messages in thread From: Richard Weinberger @ 2014-08-21 6:29 UTC (permalink / raw) To: Eric W. Biederman Cc: Linus Torvalds, Linux Containers, linux-fsdevel, LKML, libvir-list, Daniel P. Berrange Am 21.08.2014 06:53, schrieb Eric W. Biederman: > The bugs fixed are security issues, so if we have to break a small > number of userspace applications we will. Anything that we can > reasonably do to avoid regressions will be done. > > Could you please look at my user-namespace.git#for-next branch I have a > fix for at least one regresion causing issue in there. I think it may > fix your issues but I am not fully certain more comments below. I'll run this on my LXC testbed today. >> /* >> * We can't immediately set the MS_RDONLY flag when mounting filesystems >> * because (in at least some kernel versions) this will propagate back >> * to the original mount in the host OS, turning it readonly too. Thus >> * we mount the filesystem in read-write mode initially, and then do a >> * separate read-only bind mount on top of that. >> */ >> bindOverReadonly = !!(mnt_mflags & MS_RDONLY); >> >> VIR_DEBUG("Mount %s on %s type=%s flags=%x", >> mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY); >> if (mount(mnt_src, mnt->dst, mnt->type, mnt_mflags & >> ~MS_RDONLY, NULL) < 0) { >> >> ^^^^ Here it fails for sysfs because with user namespaces we bind the >> existing /sys into the container >> and would have to read out all existing mount flags from the current /sys mount. >> Otherwise mount() fails with EPERM. >> On my test system /sys is mounted with >> "rw,nosuid,nodev,noexec,relatime" and libvirt >> misses the realtime... > > Not specifying any atime flags to mount should be safe as that asks for > the default atime flags which for remount I have made the default atime > flags the existing atime flags. So I am scratching my head a little on > this one. Okay, let me find out why exactly libvirt gets a EPERM here. Maybe there are more odds hidden. >> >> virReportSystemError(errno, >> _("Failed to mount %s on %s type %s flags=%x"), >> mnt_src, mnt->dst, NULLSTR(mnt->type), >> mnt_mflags & ~MS_RDONLY); >> goto cleanup; >> } >> >> if (bindOverReadonly && >> mount(mnt_src, mnt->dst, NULL, >> MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) { >> >> ^^^ Here it fails because now we'd have to specify all flags as used >> for the first >> mount. For the procfs case MS_NOSUID|MS_NOEXEC|MS_NODEV. >> See lxcBasicMounts[]. >> In this case the fix is easy, add mnt_mflags to the mount flags. > > That has always been a bug in general because remount has always > required specifying the complete set of mount flags you want to have. > > That fact that flags such as nosuid are now properly locked so you can > not change them if you are not the global root user just makes this > obvious. > > Andy Lutermorski has observed that statvfs will return the mount flags > making reading them simple. Thanks for the clarification, I'll create a fix for libvirt. Thanks, //richard ^ permalink raw reply [flat|nested] 67+ messages in thread
[parent not found: <53F591E7.3010509-/L3Ra7n9ekc@public.gmane.org>]
* Re: [GIT PULL] namespace updates for v3.17-rc1 2014-08-21 6:29 ` Richard Weinberger @ 2014-08-21 7:24 ` Richard Weinberger 0 siblings, 0 replies; 67+ messages in thread From: Richard Weinberger @ 2014-08-21 7:24 UTC (permalink / raw) To: Eric W. Biederman Cc: libvir-list-H+wXaHxf7aLQT0dZR+AlfA, Linux Containers, LKML, linux-fsdevel, Linus Torvalds Am 21.08.2014 08:29, schrieb Richard Weinberger: > Am 21.08.2014 06:53, schrieb Eric W. Biederman: >> The bugs fixed are security issues, so if we have to break a small >> number of userspace applications we will. Anything that we can >> reasonably do to avoid regressions will be done. >> >> Could you please look at my user-namespace.git#for-next branch I have a >> fix for at least one regresion causing issue in there. I think it may >> fix your issues but I am not fully certain more comments below. > > I'll run this on my LXC testbed today. Looks good. With these patches applied libvirt works again. :) Thanks, //richard ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 @ 2014-08-21 7:24 ` Richard Weinberger 0 siblings, 0 replies; 67+ messages in thread From: Richard Weinberger @ 2014-08-21 7:24 UTC (permalink / raw) To: Eric W. Biederman Cc: Linus Torvalds, Linux Containers, linux-fsdevel, LKML, libvir-list, Daniel P. Berrange Am 21.08.2014 08:29, schrieb Richard Weinberger: > Am 21.08.2014 06:53, schrieb Eric W. Biederman: >> The bugs fixed are security issues, so if we have to break a small >> number of userspace applications we will. Anything that we can >> reasonably do to avoid regressions will be done. >> >> Could you please look at my user-namespace.git#for-next branch I have a >> fix for at least one regresion causing issue in there. I think it may >> fix your issues but I am not fully certain more comments below. > > I'll run this on my LXC testbed today. Looks good. With these patches applied libvirt works again. :) Thanks, //richard ^ permalink raw reply [flat|nested] 67+ messages in thread
[parent not found: <53F59EC7.6060107-/L3Ra7n9ekc@public.gmane.org>]
* Re: [GIT PULL] namespace updates for v3.17-rc1 2014-08-21 7:24 ` Richard Weinberger @ 2014-08-21 13:54 ` Eric W. Biederman -1 siblings, 0 replies; 67+ messages in thread From: Eric W. Biederman @ 2014-08-21 13:54 UTC (permalink / raw) To: Richard Weinberger Cc: libvir-list-H+wXaHxf7aLQT0dZR+AlfA, Linux Containers, LKML, linux-fsdevel, Linus Torvalds Richard Weinberger <richard-/L3Ra7n9ekc@public.gmane.org> writes: > Am 21.08.2014 08:29, schrieb Richard Weinberger: >> Am 21.08.2014 06:53, schrieb Eric W. Biederman: >>> The bugs fixed are security issues, so if we have to break a small >>> number of userspace applications we will. Anything that we can >>> reasonably do to avoid regressions will be done. >>> >>> Could you please look at my user-namespace.git#for-next branch I have a >>> fix for at least one regresion causing issue in there. I think it may >>> fix your issues but I am not fully certain more comments below. >> >> I'll run this on my LXC testbed today. > > Looks good. With these patches applied libvirt works again. :) Darn I read my email in the wrong order. I am glad to hear that my changes were enough to fix libvirt-lxc. I will aim at pushing this to Linus after the conference is over and I can trust myself to think clearly. Eric ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [GIT PULL] namespace updates for v3.17-rc1 @ 2014-08-21 13:54 ` Eric W. Biederman 0 siblings, 0 replies; 67+ messages in thread From: Eric W. Biederman @ 2014-08-21 13:54 UTC (permalink / raw) To: Richard Weinberger Cc: Linus Torvalds, Linux Containers, linux-fsdevel, LKML, libvir-list, Daniel P. Berrange Richard Weinberger <richard@nod.at> writes: > Am 21.08.2014 08:29, schrieb Richard Weinberger: >> Am 21.08.2014 06:53, schrieb Eric W. Biederman: >>> The bugs fixed are security issues, so if we have to break a small >>> number of userspace applications we will. Anything that we can >>> reasonably do to avoid regressions will be done. >>> >>> Could you please look at my user-namespace.git#for-next branch I have a >>> fix for at least one regresion causing issue in there. I think it may >>> fix your issues but I am not fully certain more comments below. >> >> I'll run this on my LXC testbed today. > > Looks good. With these patches applied libvirt works again. :) Darn I read my email in the wrong order. I am glad to hear that my changes were enough to fix libvirt-lxc. I will aim at pushing this to Linus after the conference is over and I can trust myself to think clearly. Eric ^ permalink raw reply [flat|nested] 67+ messages in thread
end of thread, other threads:[~2014-11-29 16:58 UTC | newest] Thread overview: 67+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-08-06 0:57 [GIT PULL] namespace updates for v3.17-rc1 Eric W. Biederman 2014-08-06 0:57 ` Eric W. Biederman [not found] ` <87fvhav3ic.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 2014-08-06 4:46 ` Stephen Rothwell 2014-08-06 4:46 ` Stephen Rothwell 2014-08-06 4:46 ` Stephen Rothwell [not found] ` <20140806144643.45e5dab8-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org> 2014-08-06 5:16 ` Eric W. Biederman 2014-08-06 5:16 ` Eric W. Biederman [not found] ` <87lhr2tcyx.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 2014-08-06 6:06 ` Stephen Rothwell 2014-08-06 6:06 ` Stephen Rothwell 2014-08-06 6:06 ` Stephen Rothwell [not found] ` <20140806160608.218b6944-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org> 2014-08-06 6:30 ` Eric W. Biederman 2014-08-06 6:30 ` Eric W. Biederman 2014-08-07 13:28 ` Theodore Ts'o 2014-08-07 13:28 ` Theodore Ts'o 2014-08-13 2:46 ` Andy Lutomirski 2014-08-13 2:46 ` Andy Lutomirski [not found] ` <53EAD180.4010906-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> 2014-08-13 4:17 ` Eric W. Biederman 2014-08-13 4:17 ` Eric W. Biederman [not found] ` <87sil1nhut.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 2014-08-13 4:38 ` Andy Lutomirski 2014-08-13 4:38 ` Andy Lutomirski 2014-08-13 4:45 ` Kenton Varda [not found] ` <CAOP=4widH1rMZ1O=hzAT+M_8exdzRPA8pJ+wH29AQ9L0ogu9nw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2014-08-13 10:24 ` Eric W. Biederman 2014-08-13 10:24 ` Eric W. Biederman [not found] ` <87tx5ghekp.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 2014-08-13 17:03 ` Andy Lutomirski 2014-08-13 17:03 ` Andy Lutomirski [not found] ` <CALCETrWT_p1-5nkiAjWoeta19fkO3rDiJe9_mhRVqF8x1zXv2A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2014-08-14 0:03 ` [PATCH] fs: Remove implicit nodev for new mounts in non-root userns Andy Lutomirski 2014-08-14 0:03 ` Andy Lutomirski 2014-08-15 19:05 ` Serge Hallyn 2014-08-15 19:16 ` Andy Lutomirski 2014-08-15 19:16 ` Andy Lutomirski [not found] ` <CALCETrVKq1Fxnsd9jKDi5_fcKfCJxBZ1w-zGXD3FR-pF-jLsmQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2014-08-15 19:37 ` Serge Hallyn 2014-08-15 19:37 ` Serge Hallyn 2014-08-15 19:56 ` Andy Lutomirski 2014-08-15 19:56 ` Andy Lutomirski [not found] ` <CALCETrWB0qBiyfJbapFnjxoNyNvS+aHvgc_eob3fC1j=cv+v5w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2014-08-15 20:16 ` Serge Hallyn 2014-08-15 20:16 ` Serge Hallyn [not found] ` <2686c32f00b14148379e8cfee9c028c794d4aa1a.1407974494.git.luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> 2014-08-15 19:05 ` Serge Hallyn 2014-08-15 20:16 ` Serge Hallyn 2014-08-28 1:35 ` Andy Lutomirski 2014-08-15 20:16 ` Serge Hallyn 2014-08-28 1:35 ` Andy Lutomirski 2014-08-15 18:41 ` [GIT PULL] namespace updates for v3.17-rc1 Andy Lutomirski 2014-08-15 18:41 ` Andy Lutomirski 2014-08-20 15:06 ` Richard Weinberger 2014-08-20 15:06 ` Richard Weinberger [not found] ` <CAFLxGvwi-iJRyfwv8v9fcRkiSu2d-az8W55xMPbp_d8wQKmwjg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2014-08-21 4:53 ` Eric W. Biederman 2014-08-21 4:53 ` Eric W. Biederman [not found] ` <87vbpm4f4y.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 2014-08-21 6:29 ` Richard Weinberger 2014-08-21 13:12 ` Christoph Hellwig 2014-08-21 13:12 ` Christoph Hellwig [not found] ` <20140821131257.GA4264-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> 2014-08-21 13:22 ` Richard Weinberger 2014-08-21 13:22 ` Richard Weinberger [not found] ` <53F5F2AD.5010607-/L3Ra7n9ekc@public.gmane.org> 2014-08-21 14:09 ` Eric W. Biederman 2014-08-21 14:09 ` Eric W. Biederman [not found] ` <87k362vsr5.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 2014-09-03 21:18 ` Richard Weinberger 2014-09-03 21:18 ` Richard Weinberger 2014-11-25 23:15 ` Richard Weinberger 2014-11-25 23:15 ` Richard Weinberger [not found] ` <CAFLxGvzyhHC+QF-bFfp-yNBpCkS3JJ+RAr+5iCj0k_su9wJbGw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2014-11-29 16:58 ` Richard Weinberger 2014-11-29 16:58 ` Richard Weinberger 2014-08-21 13:43 ` Eric W. Biederman 2014-08-21 13:43 ` Eric W. Biederman 2014-08-21 6:29 ` Richard Weinberger [not found] ` <53F591E7.3010509-/L3Ra7n9ekc@public.gmane.org> 2014-08-21 7:24 ` Richard Weinberger 2014-08-21 7:24 ` Richard Weinberger [not found] ` <53F59EC7.6060107-/L3Ra7n9ekc@public.gmane.org> 2014-08-21 13:54 ` Eric W. Biederman 2014-08-21 13:54 ` Eric W. Biederman
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.