All of lore.kernel.org
 help / color / mirror / Atom feed
* [GIT PULL] namespace updates for v3.17-rc1
@ 2014-08-06  0:57 ` Eric W. Biederman
  0 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2014-08-06  0:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux Containers,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA


Linus,

Please pull the for-linus branch from the git tree:

   git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus

   HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts

This is a bunch of small changes built against 3.16-rc6.  The most
significant change for users is the first patch which makes setns
drmatically faster by removing unneded rcu handling.

The next chunk of changes are so that "mount -o remount,.." will not
allow the user namespace root to drop flags on a mount set by the system
wide root.  Aks this forces read-only mounts to stay read-only, no-dev
mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to
stay no exec and it prevents unprivileged users from messing with a
mounts atime settings.  I have included my test case as the last patch
in this series so people performing backports can verify this change
works correctly.

The next change fixes a bug in NFS that was discovered while auditing
nsproxy users for the first optimization.  Today you can oops the kernel
by reading /proc/fs/nfsfs/{servers,volumes} if you are clever with pid
namespaces.  I rebased and fixed the build of the !CONFIG_NFS_FS case
yesterday when a build bot caught my typo.  Given that no one to my
knowledge bases anything on my tree fixing the typo in place seems more
responsible that requiring a typo-fix to be backported as well.

The last change is a small semantic cleanup introducing
/proc/thread-self and pointing /proc/mounts and /proc/net at it.  This
prevents several kinds of problemantic corner cases.  It is a
user-visible change so it has a minute chance of causing regressions so
the change to /proc/mounts and /proc/net are individual one line commits
that can be trivially reverted.  Unfortunately I lost and could not find
the email of the original reporter so he is not credited.  From at least
one perspective this change to /proc/net is a refgression fix to allow
pthread /proc/net uses that were broken by the introduction of the network
namespace.

Eric

Eric W. Biederman (11):
      namespaces: Use task_lock and not rcu to protect nsproxy
      mnt: Only change user settable mount flags in remount
      mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount
      mnt: Correct permission checks in do_remount
      mnt: Change the default remount atime from relatime to the existing value
      mnt: Add tests for unprivileged remount cases that have found to be faulty
      NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes
      proc: Have net show up under /proc/<tgid>/task/<tid>
      proc: Implement /proc/thread-self to point at the directory of the current thread
      proc: Point /proc/net at /proc/thread-self/net instead of /proc/self/net
      proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts

 fs/namespace.c                                     |  65 +++++-
 fs/nfs/client.c                                    |  95 ++++----
 fs/nfs/inode.c                                     |   3 +-
 fs/nfs/internal.h                                  |   9 +
 fs/nfs/netns.h                                     |   3 +
 fs/proc/Makefile                                   |   1 +
 fs/proc/base.c                                     |  18 +-
 fs/proc/inode.c                                    |   7 +-
 fs/proc/internal.h                                 |   6 +
 fs/proc/proc_net.c                                 |   6 +-
 fs/proc/root.c                                     |   5 +-
 fs/proc/thread_self.c                              |  85 ++++++++
 fs/proc_namespace.c                                |   8 +-
 include/linux/mount.h                              |   9 +-
 include/linux/nsproxy.h                            |  16 +-
 include/linux/pid_namespace.h                      |   1 +
 ipc/namespace.c                                    |   6 +-
 kernel/nsproxy.c                                   |  15 +-
 kernel/utsname.c                                   |   6 +-
 net/core/net_namespace.c                           |  10 +-
 tools/testing/selftests/Makefile                   |   1 +
 tools/testing/selftests/mount/Makefile             |  17 ++
 .../selftests/mount/unprivileged-remount-test.c    | 242 +++++++++++++++++++++
 23 files changed, 537 insertions(+), 97 deletions(-)

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [GIT PULL] namespace updates for v3.17-rc1
@ 2014-08-06  0:57 ` Eric W. Biederman
  0 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2014-08-06  0:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Containers, linux-fsdevel, linux-kernel


Linus,

Please pull the for-linus branch from the git tree:

   git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus

   HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts

This is a bunch of small changes built against 3.16-rc6.  The most
significant change for users is the first patch which makes setns
drmatically faster by removing unneded rcu handling.

The next chunk of changes are so that "mount -o remount,.." will not
allow the user namespace root to drop flags on a mount set by the system
wide root.  Aks this forces read-only mounts to stay read-only, no-dev
mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to
stay no exec and it prevents unprivileged users from messing with a
mounts atime settings.  I have included my test case as the last patch
in this series so people performing backports can verify this change
works correctly.

The next change fixes a bug in NFS that was discovered while auditing
nsproxy users for the first optimization.  Today you can oops the kernel
by reading /proc/fs/nfsfs/{servers,volumes} if you are clever with pid
namespaces.  I rebased and fixed the build of the !CONFIG_NFS_FS case
yesterday when a build bot caught my typo.  Given that no one to my
knowledge bases anything on my tree fixing the typo in place seems more
responsible that requiring a typo-fix to be backported as well.

The last change is a small semantic cleanup introducing
/proc/thread-self and pointing /proc/mounts and /proc/net at it.  This
prevents several kinds of problemantic corner cases.  It is a
user-visible change so it has a minute chance of causing regressions so
the change to /proc/mounts and /proc/net are individual one line commits
that can be trivially reverted.  Unfortunately I lost and could not find
the email of the original reporter so he is not credited.  From at least
one perspective this change to /proc/net is a refgression fix to allow
pthread /proc/net uses that were broken by the introduction of the network
namespace.

Eric

Eric W. Biederman (11):
      namespaces: Use task_lock and not rcu to protect nsproxy
      mnt: Only change user settable mount flags in remount
      mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount
      mnt: Correct permission checks in do_remount
      mnt: Change the default remount atime from relatime to the existing value
      mnt: Add tests for unprivileged remount cases that have found to be faulty
      NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes
      proc: Have net show up under /proc/<tgid>/task/<tid>
      proc: Implement /proc/thread-self to point at the directory of the current thread
      proc: Point /proc/net at /proc/thread-self/net instead of /proc/self/net
      proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts

 fs/namespace.c                                     |  65 +++++-
 fs/nfs/client.c                                    |  95 ++++----
 fs/nfs/inode.c                                     |   3 +-
 fs/nfs/internal.h                                  |   9 +
 fs/nfs/netns.h                                     |   3 +
 fs/proc/Makefile                                   |   1 +
 fs/proc/base.c                                     |  18 +-
 fs/proc/inode.c                                    |   7 +-
 fs/proc/internal.h                                 |   6 +
 fs/proc/proc_net.c                                 |   6 +-
 fs/proc/root.c                                     |   5 +-
 fs/proc/thread_self.c                              |  85 ++++++++
 fs/proc_namespace.c                                |   8 +-
 include/linux/mount.h                              |   9 +-
 include/linux/nsproxy.h                            |  16 +-
 include/linux/pid_namespace.h                      |   1 +
 ipc/namespace.c                                    |   6 +-
 kernel/nsproxy.c                                   |  15 +-
 kernel/utsname.c                                   |   6 +-
 net/core/net_namespace.c                           |  10 +-
 tools/testing/selftests/Makefile                   |   1 +
 tools/testing/selftests/mount/Makefile             |  17 ++
 .../selftests/mount/unprivileged-remount-test.c    | 242 +++++++++++++++++++++
 23 files changed, 537 insertions(+), 97 deletions(-)

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
  2014-08-06  0:57 ` Eric W. Biederman
  (?)
@ 2014-08-06  4:46     ` Stephen Rothwell
  -1 siblings, 0 replies; 67+ messages in thread
From: Stephen Rothwell @ 2014-08-06  4:46 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux Containers,
	Linus Torvalds, linux-kernel-u79uwXL29TY76Z2rM5mHXA


[-- Attachment #1.1: Type: text/plain, Size: 1373 bytes --]

Hi Eric,

On Tue, 05 Aug 2014 17:57:31 -0700 ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman) wrote:
>
> Please pull the for-linus branch from the git tree:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus
> 
>    HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts

This has had 4 commits added since the merge window opened that have no
Reviewed-by, Acked-by or Tested-by tags and only one Signed-off-by tag.

> The last change is a small semantic cleanup introducing
> /proc/thread-self and pointing /proc/mounts and /proc/net at it.  This
> prevents several kinds of problemantic corner cases.  It is a
> user-visible change so it has a minute chance of causing regressions so
> the change to /proc/mounts and /proc/net are individual one line commits
> that can be trivially reverted.  Unfortunately I lost and could not find
> the email of the original reporter so he is not credited.  From at least
> one perspective this change to /proc/net is a refgression fix to allow
> pthread /proc/net uses that were broken by the introduction of the network
> namespace.

They appear to include this last change.

-- 
Cheers,
Stephen Rothwell                    sfr-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 205 bytes --]

_______________________________________________
Containers mailing list
Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
@ 2014-08-06  4:46     ` Stephen Rothwell
  0 siblings, 0 replies; 67+ messages in thread
From: Stephen Rothwell @ 2014-08-06  4:46 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Linux Containers, linux-fsdevel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1323 bytes --]

Hi Eric,

On Tue, 05 Aug 2014 17:57:31 -0700 ebiederm@xmission.com (Eric W. Biederman) wrote:
>
> Please pull the for-linus branch from the git tree:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus
> 
>    HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts

This has had 4 commits added since the merge window opened that have no
Reviewed-by, Acked-by or Tested-by tags and only one Signed-off-by tag.

> The last change is a small semantic cleanup introducing
> /proc/thread-self and pointing /proc/mounts and /proc/net at it.  This
> prevents several kinds of problemantic corner cases.  It is a
> user-visible change so it has a minute chance of causing regressions so
> the change to /proc/mounts and /proc/net are individual one line commits
> that can be trivially reverted.  Unfortunately I lost and could not find
> the email of the original reporter so he is not credited.  From at least
> one perspective this change to /proc/net is a refgression fix to allow
> pthread /proc/net uses that were broken by the introduction of the network
> namespace.

They appear to include this last change.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
@ 2014-08-06  4:46     ` Stephen Rothwell
  0 siblings, 0 replies; 67+ messages in thread
From: Stephen Rothwell @ 2014-08-06  4:46 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux Containers,
	Linus Torvalds, linux-kernel-u79uwXL29TY76Z2rM5mHXA


[-- Attachment #1.1: Type: text/plain, Size: 1373 bytes --]

Hi Eric,

On Tue, 05 Aug 2014 17:57:31 -0700 ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman) wrote:
>
> Please pull the for-linus branch from the git tree:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus
> 
>    HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts

This has had 4 commits added since the merge window opened that have no
Reviewed-by, Acked-by or Tested-by tags and only one Signed-off-by tag.

> The last change is a small semantic cleanup introducing
> /proc/thread-self and pointing /proc/mounts and /proc/net at it.  This
> prevents several kinds of problemantic corner cases.  It is a
> user-visible change so it has a minute chance of causing regressions so
> the change to /proc/mounts and /proc/net are individual one line commits
> that can be trivially reverted.  Unfortunately I lost and could not find
> the email of the original reporter so he is not credited.  From at least
> one perspective this change to /proc/net is a refgression fix to allow
> pthread /proc/net uses that were broken by the introduction of the network
> namespace.

They appear to include this last change.

-- 
Cheers,
Stephen Rothwell                    sfr-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 205 bytes --]

_______________________________________________
Containers mailing list
Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
  2014-08-06  4:46     ` Stephen Rothwell
@ 2014-08-06  5:16         ` Eric W. Biederman
  -1 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2014-08-06  5:16 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux Containers,
	Linus Torvalds, linux-kernel-u79uwXL29TY76Z2rM5mHXA

Stephen Rothwell <sfr-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org> writes:

> Hi Eric,
>
> On Tue, 05 Aug 2014 17:57:31 -0700 ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman) wrote:
>>
>> Please pull the for-linus branch from the git tree:
>> 
>>    git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus
>> 
>>    HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts
>
> This has had 4 commits added since the merge window opened that have no
> Reviewed-by, Acked-by or Tested-by tags and only one Signed-off-by
> tag.

I am not certain what your point is.

There have been no commits added since the merge window opened.

There was one commit changed to fix a typo.  I documented that already.

There were some commits pushed to the tree as late as friday that had
been out for review earlier than that and it is possible that you did
not pick them up in linux-next until monday.  That doesn't mean I added
anything after the merge window opened.

I have also made certain all of these commits have at least had a chance
to show up in linux-next.

As for missing cool tags shrug.  The people looking at my code didn't
feel like saying the magic words so I didn't include cool tags.
Furthermore the code is all quite trivial.

Beyond that I have been quite out of it recently and this is what I had
time to do.  If I had had a little more time and energy I would have
included unmount on unlink patches that still need magic to happen to
keep from blowing the stack in pathological cases on everything except
x86_64.  That code has been sitting in linux-next.

Which is a my long winded way of say it sounds like you are accusing me
of being irresponsible, and my way of saying that I have tested and
vetted these trivial patches and been as careful as I reasonably could,
and I that I apologise for not being able to do things a few days
earlier.

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
@ 2014-08-06  5:16         ` Eric W. Biederman
  0 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2014-08-06  5:16 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Linus Torvalds, Linux Containers, linux-fsdevel, linux-kernel

Stephen Rothwell <sfr@canb.auug.org.au> writes:

> Hi Eric,
>
> On Tue, 05 Aug 2014 17:57:31 -0700 ebiederm@xmission.com (Eric W. Biederman) wrote:
>>
>> Please pull the for-linus branch from the git tree:
>> 
>>    git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus
>> 
>>    HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts
>
> This has had 4 commits added since the merge window opened that have no
> Reviewed-by, Acked-by or Tested-by tags and only one Signed-off-by
> tag.

I am not certain what your point is.

There have been no commits added since the merge window opened.

There was one commit changed to fix a typo.  I documented that already.

There were some commits pushed to the tree as late as friday that had
been out for review earlier than that and it is possible that you did
not pick them up in linux-next until monday.  That doesn't mean I added
anything after the merge window opened.

I have also made certain all of these commits have at least had a chance
to show up in linux-next.

As for missing cool tags shrug.  The people looking at my code didn't
feel like saying the magic words so I didn't include cool tags.
Furthermore the code is all quite trivial.

Beyond that I have been quite out of it recently and this is what I had
time to do.  If I had had a little more time and energy I would have
included unmount on unlink patches that still need magic to happen to
keep from blowing the stack in pathological cases on everything except
x86_64.  That code has been sitting in linux-next.

Which is a my long winded way of say it sounds like you are accusing me
of being irresponsible, and my way of saying that I have tested and
vetted these trivial patches and been as careful as I reasonably could,
and I that I apologise for not being able to do things a few days
earlier.

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
  2014-08-06  5:16         ` Eric W. Biederman
  (?)
@ 2014-08-06  6:06             ` Stephen Rothwell
  -1 siblings, 0 replies; 67+ messages in thread
From: Stephen Rothwell @ 2014-08-06  6:06 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux Containers,
	Linus Torvalds, linux-kernel-u79uwXL29TY76Z2rM5mHXA


[-- Attachment #1.1: Type: text/plain, Size: 1964 bytes --]

Hi Eric,

On Tue, 05 Aug 2014 22:16:06 -0700 ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman) wrote:
>
> I am not certain what your point is.

I am just trying to give Linus a heads up for branches that have not
had much exposure before he is asked to pull them.

> There have been no commits added since the merge window opened.
> 
> There was one commit changed to fix a typo.  I documented that already.

Yes, that is what confused my, sorry, since that commit and the ones
following are new commits but unchanged patches.

> There were some commits pushed to the tree as late as friday that had
> been out for review earlier than that and it is possible that you did
> not pick them up in linux-next until monday.  That doesn't mean I added
> anything after the merge window opened.

Correct.

> I have also made certain all of these commits have at least had a chance
> to show up in linux-next.
> 
> As for missing cool tags shrug.  The people looking at my code didn't
> feel like saying the magic words so I didn't include cool tags.

Maybe you should push them ... these tags are not just "cool", they
give less involved people some indications of what has happened in the
life of a patch.

> Beyond that I have been quite out of it recently and this is what I had
> time to do.  If I had had a little more time and energy I would have
> included unmount on unlink patches that still need magic to happen to
> keep from blowing the stack in pathological cases on everything except
> x86_64.  That code has been sitting in linux-next.
> 
> Which is a my long winded way of say it sounds like you are accusing me
> of being irresponsible, and my way of saying that I have tested and

I did not mean to accuse, there are reasons (as you have pointed out)
that things get added late, or rewritten.

-- 
Cheers,
Stephen Rothwell                    sfr-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 205 bytes --]

_______________________________________________
Containers mailing list
Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
@ 2014-08-06  6:06             ` Stephen Rothwell
  0 siblings, 0 replies; 67+ messages in thread
From: Stephen Rothwell @ 2014-08-06  6:06 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Linux Containers, linux-fsdevel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1914 bytes --]

Hi Eric,

On Tue, 05 Aug 2014 22:16:06 -0700 ebiederm@xmission.com (Eric W. Biederman) wrote:
>
> I am not certain what your point is.

I am just trying to give Linus a heads up for branches that have not
had much exposure before he is asked to pull them.

> There have been no commits added since the merge window opened.
> 
> There was one commit changed to fix a typo.  I documented that already.

Yes, that is what confused my, sorry, since that commit and the ones
following are new commits but unchanged patches.

> There were some commits pushed to the tree as late as friday that had
> been out for review earlier than that and it is possible that you did
> not pick them up in linux-next until monday.  That doesn't mean I added
> anything after the merge window opened.

Correct.

> I have also made certain all of these commits have at least had a chance
> to show up in linux-next.
> 
> As for missing cool tags shrug.  The people looking at my code didn't
> feel like saying the magic words so I didn't include cool tags.

Maybe you should push them ... these tags are not just "cool", they
give less involved people some indications of what has happened in the
life of a patch.

> Beyond that I have been quite out of it recently and this is what I had
> time to do.  If I had had a little more time and energy I would have
> included unmount on unlink patches that still need magic to happen to
> keep from blowing the stack in pathological cases on everything except
> x86_64.  That code has been sitting in linux-next.
> 
> Which is a my long winded way of say it sounds like you are accusing me
> of being irresponsible, and my way of saying that I have tested and

I did not mean to accuse, there are reasons (as you have pointed out)
that things get added late, or rewritten.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
@ 2014-08-06  6:06             ` Stephen Rothwell
  0 siblings, 0 replies; 67+ messages in thread
From: Stephen Rothwell @ 2014-08-06  6:06 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux Containers,
	Linus Torvalds, linux-kernel-u79uwXL29TY76Z2rM5mHXA


[-- Attachment #1.1: Type: text/plain, Size: 1964 bytes --]

Hi Eric,

On Tue, 05 Aug 2014 22:16:06 -0700 ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman) wrote:
>
> I am not certain what your point is.

I am just trying to give Linus a heads up for branches that have not
had much exposure before he is asked to pull them.

> There have been no commits added since the merge window opened.
> 
> There was one commit changed to fix a typo.  I documented that already.

Yes, that is what confused my, sorry, since that commit and the ones
following are new commits but unchanged patches.

> There were some commits pushed to the tree as late as friday that had
> been out for review earlier than that and it is possible that you did
> not pick them up in linux-next until monday.  That doesn't mean I added
> anything after the merge window opened.

Correct.

> I have also made certain all of these commits have at least had a chance
> to show up in linux-next.
> 
> As for missing cool tags shrug.  The people looking at my code didn't
> feel like saying the magic words so I didn't include cool tags.

Maybe you should push them ... these tags are not just "cool", they
give less involved people some indications of what has happened in the
life of a patch.

> Beyond that I have been quite out of it recently and this is what I had
> time to do.  If I had had a little more time and energy I would have
> included unmount on unlink patches that still need magic to happen to
> keep from blowing the stack in pathological cases on everything except
> x86_64.  That code has been sitting in linux-next.
> 
> Which is a my long winded way of say it sounds like you are accusing me
> of being irresponsible, and my way of saying that I have tested and

I did not mean to accuse, there are reasons (as you have pointed out)
that things get added late, or rewritten.

-- 
Cheers,
Stephen Rothwell                    sfr-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 205 bytes --]

_______________________________________________
Containers mailing list
Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
  2014-08-06  6:06             ` Stephen Rothwell
@ 2014-08-06  6:30                 ` Eric W. Biederman
  -1 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2014-08-06  6:30 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux Containers,
	Linus Torvalds, linux-kernel-u79uwXL29TY76Z2rM5mHXA

Stephen Rothwell <sfr-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org> writes:

> Hi Eric,
>
>> As for missing cool tags shrug.  The people looking at my code didn't
>> feel like saying the magic words so I didn't include cool tags.
>
> Maybe you should push them ... these tags are not just "cool", they
> give less involved people some indications of what has happened in the
> life of a patch.

Etiquette is a good idea where it helps things flow smoothly.  Other
times like this standing on etiquette would have resulted in trivial
fixes for long standing issues not going anywhere.

Althogh I will keep it in mind in the future.

>> Which is a my long winded way of say it sounds like you are accusing me
>> of being irresponsible, and my way of saying that I have tested and
>
> I did not mean to accuse, there are reasons (as you have pointed out)
> that things get added late, or rewritten.

I really try to avoid those reasons but I missed it this round and the
buildbot did not manage to find the comibination with a build error
until the code hit linux-next :(

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
@ 2014-08-06  6:30                 ` Eric W. Biederman
  0 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2014-08-06  6:30 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Linus Torvalds, Linux Containers, linux-fsdevel, linux-kernel

Stephen Rothwell <sfr@canb.auug.org.au> writes:

> Hi Eric,
>
>> As for missing cool tags shrug.  The people looking at my code didn't
>> feel like saying the magic words so I didn't include cool tags.
>
> Maybe you should push them ... these tags are not just "cool", they
> give less involved people some indications of what has happened in the
> life of a patch.

Etiquette is a good idea where it helps things flow smoothly.  Other
times like this standing on etiquette would have resulted in trivial
fixes for long standing issues not going anywhere.

Althogh I will keep it in mind in the future.

>> Which is a my long winded way of say it sounds like you are accusing me
>> of being irresponsible, and my way of saying that I have tested and
>
> I did not mean to accuse, there are reasons (as you have pointed out)
> that things get added late, or rewritten.

I really try to avoid those reasons but I missed it this round and the
buildbot did not manage to find the comibination with a build error
until the code hit linux-next :(

Eric


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
  2014-08-06  6:06             ` Stephen Rothwell
@ 2014-08-07 13:28                 ` Theodore Ts'o
  -1 siblings, 0 replies; 67+ messages in thread
From: Theodore Ts'o @ 2014-08-07 13:28 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Linux Containers,
	Linus Torvalds, Eric W. Biederman,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Wed, Aug 06, 2014 at 04:06:08PM +1000, Stephen Rothwell wrote:
> 
> Yes, that is what confused my, sorry, since that commit and the ones
> following are new commits but unchanged patches.

As a suggestion, in addition to trying to count the number of commits
that are new, or with the same patch id's, etc., is to provide the
diff stats between the last version of the branch that landed in
linux-next before the merge window opened, and the branch head that
was requested for Linus to pull?

This will get tripped up by those poeple who send multiple pull
requests which are subsets of the branch that is tracked by linux-next
(for example, to arrange the order of merging to minimize conflicts),
so like all things, it's not perfect, but it might be useful
additional indicator, at least in the early days of the merge window.

Also, if you could report the commit id of the branch head before the
merge window opened, then other people (including Linus) can do their
own investigation and draw their own conclusions.

Cheers,

						- Ted

P.S.  If there was some way this could be automated so that a 'bot
sends out a report for every single pull request, then it minimizes
the "J'accuse!" aspect, since the report is being done for all pull
requests, and everyone knows that robots makes mistakes.  :-)

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
@ 2014-08-07 13:28                 ` Theodore Ts'o
  0 siblings, 0 replies; 67+ messages in thread
From: Theodore Ts'o @ 2014-08-07 13:28 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Eric W. Biederman, Linus Torvalds, Linux Containers,
	linux-fsdevel, linux-kernel

On Wed, Aug 06, 2014 at 04:06:08PM +1000, Stephen Rothwell wrote:
> 
> Yes, that is what confused my, sorry, since that commit and the ones
> following are new commits but unchanged patches.

As a suggestion, in addition to trying to count the number of commits
that are new, or with the same patch id's, etc., is to provide the
diff stats between the last version of the branch that landed in
linux-next before the merge window opened, and the branch head that
was requested for Linus to pull?

This will get tripped up by those poeple who send multiple pull
requests which are subsets of the branch that is tracked by linux-next
(for example, to arrange the order of merging to minimize conflicts),
so like all things, it's not perfect, but it might be useful
additional indicator, at least in the early days of the merge window.

Also, if you could report the commit id of the branch head before the
merge window opened, then other people (including Linus) can do their
own investigation and draw their own conclusions.

Cheers,

						- Ted

P.S.  If there was some way this could be automated so that a 'bot
sends out a report for every single pull request, then it minimizes
the "J'accuse!" aspect, since the report is being done for all pull
requests, and everyone knows that robots makes mistakes.  :-)

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
  2014-08-06  0:57 ` Eric W. Biederman
@ 2014-08-13  2:46     ` Andy Lutomirski
  -1 siblings, 0 replies; 67+ messages in thread
From: Andy Lutomirski @ 2014-08-13  2:46 UTC (permalink / raw)
  To: Eric W. Biederman, Linus Torvalds
  Cc: Linux FS Devel, kenton-AuYgBwuPrUQTaNkGU808tA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Linux Kernel Mailing List

On 08/05/2014 05:57 PM, Eric W. Biederman wrote:
> 
> Linus,
> 
> Please pull the for-linus branch from the git tree:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus
> 
>    HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts
> 
> This is a bunch of small changes built against 3.16-rc6.  The most
> significant change for users is the first patch which makes setns
> drmatically faster by removing unneded rcu handling.
> 
> The next chunk of changes are so that "mount -o remount,.." will not
> allow the user namespace root to drop flags on a mount set by the system
> wide root.  Aks this forces read-only mounts to stay read-only, no-dev
> mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to
> stay no exec and it prevents unprivileged users from messing with a
> mounts atime settings.  I have included my test case as the last patch
> in this series so people performing backports can verify this change
> works correctly.
> 

Sorry for catching this late.  I think this fix is likely to
unnecessarily break valid userspace due to an odd interaction.

do_new_mount contains this:

	if (user_ns != &init_user_ns) {
		if (!(type->fs_flags & FS_USERNS_MOUNT)) {
			put_filesystem(type);
			return -EPERM;
		}
		/* Only in special cases allow devices from mounts
		 * created outside the initial user namespace.
		 */
		if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) {
			flags |= MS_NODEV;
			mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV;
		}
	}

This means that private tmpfs mounts end up with MNT_NODEV |
MNT_LOCK_NODEV.  Certainly the change from MNT_NODEV to MNT_LOCK_NODEV
is sensible *if MNT_NODEV made sense in the first place*.  But we didn't
have MNT_LOCK_NODEV in the past, so this well-intentioned code never
really worked.

This has a very strange effect: mounting a private tmpfs with all
default flags and then remounting with MS_REMOUNT | MS_BIND *without
MS_NODEV* will now fail.  I suspect that this practice is rather common,
since most likely no one ever paid attention to that implicit MNT_NODEV
before.

I would argue that the correct fix is to either remove the implicit
MNT_NODEV entirely or to set FS_USERNS_DEV_MOUNT on most filesystems.

The relevant filesystems are tmpfs, mqueue, sysfs, proc, ramfs, and
devpts.  devpts already sets FS_USERNS_DEV_MOUNT.  Setting
FS_USERNS_DEV_MOUNT should be safe on all of the others in this list --
I think that the only one that initially contains device nodes is sysfs
on selinux systems, which contains a null node.

I think the point of this is to prevent mounts of filesystems with
user-controlled initial contents from being dangerous.  But we don't
have any of those yet.

--Andy

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
@ 2014-08-13  2:46     ` Andy Lutomirski
  0 siblings, 0 replies; 67+ messages in thread
From: Andy Lutomirski @ 2014-08-13  2:46 UTC (permalink / raw)
  To: Eric W. Biederman, Linus Torvalds
  Cc: Linux FS Devel, containers, Linux Kernel Mailing List, kenton

On 08/05/2014 05:57 PM, Eric W. Biederman wrote:
> 
> Linus,
> 
> Please pull the for-linus branch from the git tree:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus
> 
>    HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts
> 
> This is a bunch of small changes built against 3.16-rc6.  The most
> significant change for users is the first patch which makes setns
> drmatically faster by removing unneded rcu handling.
> 
> The next chunk of changes are so that "mount -o remount,.." will not
> allow the user namespace root to drop flags on a mount set by the system
> wide root.  Aks this forces read-only mounts to stay read-only, no-dev
> mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to
> stay no exec and it prevents unprivileged users from messing with a
> mounts atime settings.  I have included my test case as the last patch
> in this series so people performing backports can verify this change
> works correctly.
> 

Sorry for catching this late.  I think this fix is likely to
unnecessarily break valid userspace due to an odd interaction.

do_new_mount contains this:

	if (user_ns != &init_user_ns) {
		if (!(type->fs_flags & FS_USERNS_MOUNT)) {
			put_filesystem(type);
			return -EPERM;
		}
		/* Only in special cases allow devices from mounts
		 * created outside the initial user namespace.
		 */
		if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) {
			flags |= MS_NODEV;
			mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV;
		}
	}

This means that private tmpfs mounts end up with MNT_NODEV |
MNT_LOCK_NODEV.  Certainly the change from MNT_NODEV to MNT_LOCK_NODEV
is sensible *if MNT_NODEV made sense in the first place*.  But we didn't
have MNT_LOCK_NODEV in the past, so this well-intentioned code never
really worked.

This has a very strange effect: mounting a private tmpfs with all
default flags and then remounting with MS_REMOUNT | MS_BIND *without
MS_NODEV* will now fail.  I suspect that this practice is rather common,
since most likely no one ever paid attention to that implicit MNT_NODEV
before.

I would argue that the correct fix is to either remove the implicit
MNT_NODEV entirely or to set FS_USERNS_DEV_MOUNT on most filesystems.

The relevant filesystems are tmpfs, mqueue, sysfs, proc, ramfs, and
devpts.  devpts already sets FS_USERNS_DEV_MOUNT.  Setting
FS_USERNS_DEV_MOUNT should be safe on all of the others in this list --
I think that the only one that initially contains device nodes is sysfs
on selinux systems, which contains a null node.

I think the point of this is to prevent mounts of filesystems with
user-controlled initial contents from being dangerous.  But we don't
have any of those yet.

--Andy

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
  2014-08-13  2:46     ` Andy Lutomirski
@ 2014-08-13  4:17         ` Eric W. Biederman
  -1 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2014-08-13  4:17 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux FS Devel, kenton-AuYgBwuPrUQTaNkGU808tA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Linus Torvalds, Linux Kernel Mailing List

Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes:

> On 08/05/2014 05:57 PM, Eric W. Biederman wrote:
>
> Sorry for catching this late.  I think this fix is likely to
> unnecessarily break valid userspace due to an odd interaction.

The code is correct and safe (no security issues), but yes a blind
remount might hit a snag.

If you can find a userspace application that matters I might care
that a security fix breaks it.

I think you have made a point that several more filesystems might
be ok to not set nodev on (because we can not do anything to create
device nodes on those filesystems).  I personally would prefer the much
more paranoid approach of only allowing device nodes on a unprivileged
mount if we have audited all of the code paths and know it is safe
for device nodes to appear there.

I don't actually think anyone cares ad remounts of filesystems like
tmpfs, mqueue, sysfs, proc, ramfs are all quite rare.  Blind remounts
are even rare.  The normal userspace utilities look at the appropriate
version of /proc/mounts on remount.

These are not filesystems that a blind remount will likely be applied
to.

Furthermore there is work underway to prepare patches to allow
"mount --bind -ro" to work as expected.  That will further reduce
the pressure from blind remounts.

If there is an actual regression of actual code I am happy to deal
with it.  But having the MNT_NODEV on those mounts has been the case
for a long time now and is not new (no regression).  This change just
closed the security hole that allowed nodev to be removed.  And that
security hole we need to have fixed.

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
@ 2014-08-13  4:17         ` Eric W. Biederman
  0 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2014-08-13  4:17 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Linux FS Devel, containers,
	Linux Kernel Mailing List, kenton

Andy Lutomirski <luto@amacapital.net> writes:

> On 08/05/2014 05:57 PM, Eric W. Biederman wrote:
>
> Sorry for catching this late.  I think this fix is likely to
> unnecessarily break valid userspace due to an odd interaction.

The code is correct and safe (no security issues), but yes a blind
remount might hit a snag.

If you can find a userspace application that matters I might care
that a security fix breaks it.

I think you have made a point that several more filesystems might
be ok to not set nodev on (because we can not do anything to create
device nodes on those filesystems).  I personally would prefer the much
more paranoid approach of only allowing device nodes on a unprivileged
mount if we have audited all of the code paths and know it is safe
for device nodes to appear there.

I don't actually think anyone cares ad remounts of filesystems like
tmpfs, mqueue, sysfs, proc, ramfs are all quite rare.  Blind remounts
are even rare.  The normal userspace utilities look at the appropriate
version of /proc/mounts on remount.

These are not filesystems that a blind remount will likely be applied
to.

Furthermore there is work underway to prepare patches to allow
"mount --bind -ro" to work as expected.  That will further reduce
the pressure from blind remounts.

If there is an actual regression of actual code I am happy to deal
with it.  But having the MNT_NODEV on those mounts has been the case
for a long time now and is not new (no regression).  This change just
closed the security hole that allowed nodev to be removed.  And that
security hole we need to have fixed.

Eric


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
  2014-08-13  4:17         ` Eric W. Biederman
@ 2014-08-13  4:38             ` Andy Lutomirski
  -1 siblings, 0 replies; 67+ messages in thread
From: Andy Lutomirski @ 2014-08-13  4:38 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux FS Devel, Kenton Varda, Linux Containers, Linus Torvalds,
	Linux Kernel Mailing List

On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes:
>
>> On 08/05/2014 05:57 PM, Eric W. Biederman wrote:
>>
>> Sorry for catching this late.  I think this fix is likely to
>> unnecessarily break valid userspace due to an odd interaction.
>
> The code is correct and safe (no security issues), but yes a blind
> remount might hit a snag.
>
> If you can find a userspace application that matters I might care
> that a security fix breaks it.
>
> I think you have made a point that several more filesystems might
> be ok to not set nodev on (because we can not do anything to create
> device nodes on those filesystems).  I personally would prefer the much
> more paranoid approach of only allowing device nodes on a unprivileged
> mount if we have audited all of the code paths and know it is safe
> for device nodes to appear there.
>
> I don't actually think anyone cares ad remounts of filesystems like
> tmpfs, mqueue, sysfs, proc, ramfs are all quite rare.  Blind remounts
> are even rare.  The normal userspace utilities look at the appropriate
> version of /proc/mounts on remount.

Bind remounts are the only kind of remounts, because we've never
supported do_remount_sb in a user namespace.  So, if you want to
create some static content in your user namespace, the way to do it
is:

unshare(CLONE_NEWUSER | CLONE_NEWNS);
mount tmpfs somewhere;
write to the tmpfs;
mount("path to tmpfs", "path to tmpfs", nullptr, MS_REMOUNT | MS_BIND
| MS_RDONLY);

>
> These are not filesystems that a blind remount will likely be applied
> to.
>
> Furthermore there is work underway to prepare patches to allow
> "mount --bind -ro" to work as expected.  That will further reduce
> the pressure from blind remounts.

Not for example above.  It really does need the remount.

>
> If there is an actual regression of actual code I am happy to deal
> with it.  But having the MNT_NODEV on those mounts has been the case
> for a long time now and is not new (no regression).  This change just
> closed the security hole that allowed nodev to be removed.  And that
> security hole we need to have fixed.

Sandstorm does this.  (Well, it did until today.)

--Andy

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
@ 2014-08-13  4:38             ` Andy Lutomirski
  0 siblings, 0 replies; 67+ messages in thread
From: Andy Lutomirski @ 2014-08-13  4:38 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Linux FS Devel, Linux Containers,
	Linux Kernel Mailing List, Kenton Varda

On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Andy Lutomirski <luto@amacapital.net> writes:
>
>> On 08/05/2014 05:57 PM, Eric W. Biederman wrote:
>>
>> Sorry for catching this late.  I think this fix is likely to
>> unnecessarily break valid userspace due to an odd interaction.
>
> The code is correct and safe (no security issues), but yes a blind
> remount might hit a snag.
>
> If you can find a userspace application that matters I might care
> that a security fix breaks it.
>
> I think you have made a point that several more filesystems might
> be ok to not set nodev on (because we can not do anything to create
> device nodes on those filesystems).  I personally would prefer the much
> more paranoid approach of only allowing device nodes on a unprivileged
> mount if we have audited all of the code paths and know it is safe
> for device nodes to appear there.
>
> I don't actually think anyone cares ad remounts of filesystems like
> tmpfs, mqueue, sysfs, proc, ramfs are all quite rare.  Blind remounts
> are even rare.  The normal userspace utilities look at the appropriate
> version of /proc/mounts on remount.

Bind remounts are the only kind of remounts, because we've never
supported do_remount_sb in a user namespace.  So, if you want to
create some static content in your user namespace, the way to do it
is:

unshare(CLONE_NEWUSER | CLONE_NEWNS);
mount tmpfs somewhere;
write to the tmpfs;
mount("path to tmpfs", "path to tmpfs", nullptr, MS_REMOUNT | MS_BIND
| MS_RDONLY);

>
> These are not filesystems that a blind remount will likely be applied
> to.
>
> Furthermore there is work underway to prepare patches to allow
> "mount --bind -ro" to work as expected.  That will further reduce
> the pressure from blind remounts.

Not for example above.  It really does need the remount.

>
> If there is an actual regression of actual code I am happy to deal
> with it.  But having the MNT_NODEV on those mounts has been the case
> for a long time now and is not new (no regression).  This change just
> closed the security hole that allowed nodev to be removed.  And that
> security hole we need to have fixed.

Sandstorm does this.  (Well, it did until today.)

--Andy

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
       [not found]         ` <87sil1nhut.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
  2014-08-13  4:38             ` Andy Lutomirski
@ 2014-08-13  4:45           ` Kenton Varda
       [not found]             ` <CAOP=4widH1rMZ1O=hzAT+M_8exdzRPA8pJ+wH29AQ9L0ogu9nw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 67+ messages in thread
From: Kenton Varda @ 2014-08-13  4:45 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux FS Devel,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Linus Torvalds, Linux Kernel Mailing List, Andy Lutomirski

On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
wrote:

> If you can find a userspace application that matters I might care
> that a security fix breaks it.
>

FWIW, it broke Sandstorm.io, but we already pushed a fix, and I'm not sure
if you'd say that we "matter".


> If there is an actual regression of actual code I am happy to deal
> with it.  But having the MNT_NODEV on those mounts has been the case
> for a long time now and is not new (no regression).  This change just
> closed the security hole that allowed nodev to be removed.  And that
> security hole we need to have fixed.
>

The problem is that users like us had no idea that nodev was being silently
added in the first place, and thus didn't know that we needed to specify it
in remounts. We create the tmpfs, put some things in it, and then want to
remount it read-only for the sandbox. It seems reasonable to expect that a
newly-created tmpfs would have exactly the flags I gave it when I created
it, not silently get an additional flag that I then need to pass on remount.

Note further that it may be very hard for normal developers to figure out
why their remount is failing in this case. Andy only discovered the silent
nodev by reading the kernel code.

-Kenton

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
  2014-08-13  4:45           ` Kenton Varda
@ 2014-08-13 10:24                 ` Eric W. Biederman
  0 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2014-08-13 10:24 UTC (permalink / raw)
  To: Kenton Varda
  Cc: Linux FS Devel,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Linus Torvalds, Linux Kernel Mailing List, Andy Lutomirski

Kenton Varda <kenton-AuYgBwuPrUQTaNkGU808tA@public.gmane.org> writes:

> On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman
> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>
>     If you can find a userspace application that matters I might care
>     
>     that a security fix breaks it.
>
> FWIW, it broke Sandstorm.io, but we already pushed a fix, and I'm not
> sure if you'd say that we "matter".

I should meant to have said:  "If you can find a userspace application
that breaks I might care that a security fix breaks it"

Hearing this actually break something changes this from a theoretical
discussion to something real.

I have a proposed fix as the end of this email.  Could you please
test it?

If I can get a Tested-by and possibly a Reviewed-by responses I would
appreciate it.

> The problem is that users like us had no idea that nodev was being
> silently added in the first place, and thus didn't know that we needed
> to specify it in remounts. 

Agreed.  And frankly that is extremely reasonable and it would in fact
break anything doing a traditional tracking of mounts with /etc/mtab.
So it is a pretty siginificant discontinuity.

I am not prepared to remove the the implicit adding of nodev from the
implementation as that would just break your code and probably others
code in a different way.

What we can do and that will work cleanly long term is make remount
match mount and implicitly add nodev.

> We create the tmpfs, put some things in it,
> and then want to remount it read-only for the sandbox. It seems
> reasonable to expect that a newly-created tmpfs would have exactly the
> flags I gave it when I created it, not silently get an additional flag
> that I then need to pass on remount.

A reasonable expectation yes.  At the same time it is also a very
reasonable security requirement to not allow devices on filesystems
that the global root does not control.

It is also reasonable that code that does not care should not have
to worry about this issue and work the same as the global root or
as a container root.  Which I think is the reasoning that I went
with when adding the implicit nodev.

> Note further that it may be very hard for normal developers to figure
> out why their remount is failing in this case. Andy only discovered
> the silent nodev by reading the kernel code.

Agreed.

So the simple solution I see with minimal danger and minimal security
risk is to remove the inconsitency between mount and remount and add
implicitly add MNT_NODEV during remount as well.

That removes the minor regression and keeps the code in the paranoid
state where device nodes are not honored and not allowed on filesystems
mounted by ordinary users.

Compared to Andy's suggestion of relaxing the nodev restriction this
will also work for filesystems like fuse and nfs when we allow begin to
allow unprivileged mounts of those filesystems, and it allows to sleep
better knowing the code is extremely paranoid.

This reminds me that the dust is settling and the man pages need to be
updated for all of these small tweaks to the API that have happened, so
there is a good source of documentation for how all of this actually
works and why.

I am going to be travelling pretty much the rest of the week, so won't
be particularly responsive.  But if you can get this tested I think
this counts as a minor regression fix that solves this issue.

Eric

From: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
Date: Wed, 13 Aug 2014 01:33:38 -0700
Subject: [PATCH] mnt: Implicitly add MNT_NODEV on remount as we do on mount

Now that remount is properly enforcing the rule that you can't
remove nodev at least sandstorm.io is breaking when performing
a remount.

It turns out that there is an easy intuitive solution implicitly
add nodev on remount under the same conditions that we do on mount.

Update the regression test for remounts to verify that this new
addition does not regress either.

Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/namespace.c                                     |  9 ++++
 .../selftests/mount/unprivileged-remount-test.c    | 51 ++++++++++++----------
 2 files changed, 36 insertions(+), 24 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 7886176232c1..0f158300c866 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1934,6 +1934,15 @@ static int do_remount(struct path *path, int flags, int mnt_flags,
 	if (path->dentry != path->mnt->mnt_root)
 		return -EINVAL;
 
+	/* Only in special cases allow devices from mounts created
+	 * outside the initial user namespace.
+	 */
+	if ((mnt->mnt_ns->user_ns != &init_user_ns) &&
+	    !(sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT)) {
+		flags |= MS_NODEV;
+		mnt_flags |= MNT_NODEV;
+	}
+
 	/* Don't allow changing of locked mnt flags.
 	 *
 	 * No locks need to be held here while testing the various
diff --git a/tools/testing/selftests/mount/unprivileged-remount-test.c b/tools/testing/selftests/mount/unprivileged-remount-test.c
index 1b3ff2fda4d0..6d408c155e0f 100644
--- a/tools/testing/selftests/mount/unprivileged-remount-test.c
+++ b/tools/testing/selftests/mount/unprivileged-remount-test.c
@@ -118,7 +118,8 @@ static void create_and_enter_userns(void)
 }
 
 static
-bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags)
+bool test_unpriv_remount(const char *fstype, const char *mount_options,
+			 int mount_flags, int remount_flags, int invalid_flags)
 {
 	pid_t child;
 
@@ -151,9 +152,11 @@ bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags)
 			strerror(errno));
 	}
 
-	if (mount("testing", "/tmp", "ramfs", mount_flags, NULL) != 0) {
-		die("mount of /tmp failed: %s\n",
-			strerror(errno));
+	if (mount("testing", "/tmp", fstype, mount_flags, mount_options) != 0) {
+		die("mount of %s with options '%s' on /tmp failed: %s\n",
+		    fstype,
+		    mount_options? mount_options : "",
+		    strerror(errno));
 	}
 
 	create_and_enter_userns();
@@ -181,60 +184,60 @@ bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags)
 
 static bool test_unpriv_remount_simple(int mount_flags)
 {
-	return test_unpriv_remount(mount_flags, mount_flags, 0);
+	return test_unpriv_remount("ramfs", NULL, mount_flags, mount_flags, 0);
 }
 
 static bool test_unpriv_remount_atime(int mount_flags, int invalid_flags)
 {
-	return test_unpriv_remount(mount_flags, mount_flags, invalid_flags);
+	return test_unpriv_remount("ramfs", NULL, mount_flags, mount_flags,
+				   invalid_flags);
 }
 
 int main(int argc, char **argv)
 {
-	if (!test_unpriv_remount_simple(MS_RDONLY|MS_NODEV)) {
+	if (!test_unpriv_remount_simple(MS_RDONLY)) {
 		die("MS_RDONLY malfunctions\n");
 	}
-	if (!test_unpriv_remount_simple(MS_NODEV)) {
+	if (!test_unpriv_remount("devpts", "newinstance", MS_NODEV, MS_NODEV, 0)) {
 		die("MS_NODEV malfunctions\n");
 	}
-	if (!test_unpriv_remount_simple(MS_NOSUID|MS_NODEV)) {
+	if (!test_unpriv_remount_simple(MS_NOSUID)) {
 		die("MS_NOSUID malfunctions\n");
 	}
-	if (!test_unpriv_remount_simple(MS_NOEXEC|MS_NODEV)) {
+	if (!test_unpriv_remount_simple(MS_NOEXEC)) {
 		die("MS_NOEXEC malfunctions\n");
 	}
-	if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODEV,
-				       MS_NOATIME|MS_NODEV))
+	if (!test_unpriv_remount_atime(MS_RELATIME,
+				       MS_NOATIME))
 	{
 		die("MS_RELATIME malfunctions\n");
 	}
-	if (!test_unpriv_remount_atime(MS_STRICTATIME|MS_NODEV,
-				       MS_NOATIME|MS_NODEV))
+	if (!test_unpriv_remount_atime(MS_STRICTATIME,
+				       MS_NOATIME))
 	{
 		die("MS_STRICTATIME malfunctions\n");
 	}
-	if (!test_unpriv_remount_atime(MS_NOATIME|MS_NODEV,
-				       MS_STRICTATIME|MS_NODEV))
+	if (!test_unpriv_remount_atime(MS_NOATIME,
+				       MS_STRICTATIME))
 	{
 		die("MS_RELATIME malfunctions\n");
 	}
-	if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODIRATIME|MS_NODEV,
-				       MS_NOATIME|MS_NODEV))
+	if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODIRATIME,
+				       MS_NOATIME))
 	{
 		die("MS_RELATIME malfunctions\n");
 	}
-	if (!test_unpriv_remount_atime(MS_STRICTATIME|MS_NODIRATIME|MS_NODEV,
-				       MS_NOATIME|MS_NODEV))
+	if (!test_unpriv_remount_atime(MS_STRICTATIME|MS_NODIRATIME,
+				       MS_NOATIME))
 	{
 		die("MS_RELATIME malfunctions\n");
 	}
-	if (!test_unpriv_remount_atime(MS_NOATIME|MS_NODIRATIME|MS_NODEV,
-				       MS_STRICTATIME|MS_NODEV))
+	if (!test_unpriv_remount_atime(MS_NOATIME|MS_NODIRATIME,
+				       MS_STRICTATIME))
 	{
 		die("MS_RELATIME malfunctions\n");
 	}
-	if (!test_unpriv_remount(MS_STRICTATIME|MS_NODEV, MS_NODEV,
-				 MS_NOATIME|MS_NODEV))
+	if (!test_unpriv_remount("ramfs", NULL, MS_STRICTATIME, 0, MS_NOATIME))
 	{
 		die("Default atime malfunctions\n");
 	}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
@ 2014-08-13 10:24                 ` Eric W. Biederman
  0 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2014-08-13 10:24 UTC (permalink / raw)
  To: Kenton Varda
  Cc: Andy Lutomirski, Linus Torvalds, Linux FS Devel, containers,
	Linux Kernel Mailing List

Kenton Varda <kenton@sandstorm.io> writes:

> On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>
>     If you can find a userspace application that matters I might care
>     
>     that a security fix breaks it.
>
> FWIW, it broke Sandstorm.io, but we already pushed a fix, and I'm not
> sure if you'd say that we "matter".

I should meant to have said:  "If you can find a userspace application
that breaks I might care that a security fix breaks it"

Hearing this actually break something changes this from a theoretical
discussion to something real.

I have a proposed fix as the end of this email.  Could you please
test it?

If I can get a Tested-by and possibly a Reviewed-by responses I would
appreciate it.

> The problem is that users like us had no idea that nodev was being
> silently added in the first place, and thus didn't know that we needed
> to specify it in remounts. 

Agreed.  And frankly that is extremely reasonable and it would in fact
break anything doing a traditional tracking of mounts with /etc/mtab.
So it is a pretty siginificant discontinuity.

I am not prepared to remove the the implicit adding of nodev from the
implementation as that would just break your code and probably others
code in a different way.

What we can do and that will work cleanly long term is make remount
match mount and implicitly add nodev.

> We create the tmpfs, put some things in it,
> and then want to remount it read-only for the sandbox. It seems
> reasonable to expect that a newly-created tmpfs would have exactly the
> flags I gave it when I created it, not silently get an additional flag
> that I then need to pass on remount.

A reasonable expectation yes.  At the same time it is also a very
reasonable security requirement to not allow devices on filesystems
that the global root does not control.

It is also reasonable that code that does not care should not have
to worry about this issue and work the same as the global root or
as a container root.  Which I think is the reasoning that I went
with when adding the implicit nodev.

> Note further that it may be very hard for normal developers to figure
> out why their remount is failing in this case. Andy only discovered
> the silent nodev by reading the kernel code.

Agreed.

So the simple solution I see with minimal danger and minimal security
risk is to remove the inconsitency between mount and remount and add
implicitly add MNT_NODEV during remount as well.

That removes the minor regression and keeps the code in the paranoid
state where device nodes are not honored and not allowed on filesystems
mounted by ordinary users.

Compared to Andy's suggestion of relaxing the nodev restriction this
will also work for filesystems like fuse and nfs when we allow begin to
allow unprivileged mounts of those filesystems, and it allows to sleep
better knowing the code is extremely paranoid.

This reminds me that the dust is settling and the man pages need to be
updated for all of these small tweaks to the API that have happened, so
there is a good source of documentation for how all of this actually
works and why.

I am going to be travelling pretty much the rest of the week, so won't
be particularly responsive.  But if you can get this tested I think
this counts as a minor regression fix that solves this issue.

Eric

From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Wed, 13 Aug 2014 01:33:38 -0700
Subject: [PATCH] mnt: Implicitly add MNT_NODEV on remount as we do on mount

Now that remount is properly enforcing the rule that you can't
remove nodev at least sandstorm.io is breaking when performing
a remount.

It turns out that there is an easy intuitive solution implicitly
add nodev on remount under the same conditions that we do on mount.

Update the regression test for remounts to verify that this new
addition does not regress either.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/namespace.c                                     |  9 ++++
 .../selftests/mount/unprivileged-remount-test.c    | 51 ++++++++++++----------
 2 files changed, 36 insertions(+), 24 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 7886176232c1..0f158300c866 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1934,6 +1934,15 @@ static int do_remount(struct path *path, int flags, int mnt_flags,
 	if (path->dentry != path->mnt->mnt_root)
 		return -EINVAL;
 
+	/* Only in special cases allow devices from mounts created
+	 * outside the initial user namespace.
+	 */
+	if ((mnt->mnt_ns->user_ns != &init_user_ns) &&
+	    !(sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT)) {
+		flags |= MS_NODEV;
+		mnt_flags |= MNT_NODEV;
+	}
+
 	/* Don't allow changing of locked mnt flags.
 	 *
 	 * No locks need to be held here while testing the various
diff --git a/tools/testing/selftests/mount/unprivileged-remount-test.c b/tools/testing/selftests/mount/unprivileged-remount-test.c
index 1b3ff2fda4d0..6d408c155e0f 100644
--- a/tools/testing/selftests/mount/unprivileged-remount-test.c
+++ b/tools/testing/selftests/mount/unprivileged-remount-test.c
@@ -118,7 +118,8 @@ static void create_and_enter_userns(void)
 }
 
 static
-bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags)
+bool test_unpriv_remount(const char *fstype, const char *mount_options,
+			 int mount_flags, int remount_flags, int invalid_flags)
 {
 	pid_t child;
 
@@ -151,9 +152,11 @@ bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags)
 			strerror(errno));
 	}
 
-	if (mount("testing", "/tmp", "ramfs", mount_flags, NULL) != 0) {
-		die("mount of /tmp failed: %s\n",
-			strerror(errno));
+	if (mount("testing", "/tmp", fstype, mount_flags, mount_options) != 0) {
+		die("mount of %s with options '%s' on /tmp failed: %s\n",
+		    fstype,
+		    mount_options? mount_options : "",
+		    strerror(errno));
 	}
 
 	create_and_enter_userns();
@@ -181,60 +184,60 @@ bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags)
 
 static bool test_unpriv_remount_simple(int mount_flags)
 {
-	return test_unpriv_remount(mount_flags, mount_flags, 0);
+	return test_unpriv_remount("ramfs", NULL, mount_flags, mount_flags, 0);
 }
 
 static bool test_unpriv_remount_atime(int mount_flags, int invalid_flags)
 {
-	return test_unpriv_remount(mount_flags, mount_flags, invalid_flags);
+	return test_unpriv_remount("ramfs", NULL, mount_flags, mount_flags,
+				   invalid_flags);
 }
 
 int main(int argc, char **argv)
 {
-	if (!test_unpriv_remount_simple(MS_RDONLY|MS_NODEV)) {
+	if (!test_unpriv_remount_simple(MS_RDONLY)) {
 		die("MS_RDONLY malfunctions\n");
 	}
-	if (!test_unpriv_remount_simple(MS_NODEV)) {
+	if (!test_unpriv_remount("devpts", "newinstance", MS_NODEV, MS_NODEV, 0)) {
 		die("MS_NODEV malfunctions\n");
 	}
-	if (!test_unpriv_remount_simple(MS_NOSUID|MS_NODEV)) {
+	if (!test_unpriv_remount_simple(MS_NOSUID)) {
 		die("MS_NOSUID malfunctions\n");
 	}
-	if (!test_unpriv_remount_simple(MS_NOEXEC|MS_NODEV)) {
+	if (!test_unpriv_remount_simple(MS_NOEXEC)) {
 		die("MS_NOEXEC malfunctions\n");
 	}
-	if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODEV,
-				       MS_NOATIME|MS_NODEV))
+	if (!test_unpriv_remount_atime(MS_RELATIME,
+				       MS_NOATIME))
 	{
 		die("MS_RELATIME malfunctions\n");
 	}
-	if (!test_unpriv_remount_atime(MS_STRICTATIME|MS_NODEV,
-				       MS_NOATIME|MS_NODEV))
+	if (!test_unpriv_remount_atime(MS_STRICTATIME,
+				       MS_NOATIME))
 	{
 		die("MS_STRICTATIME malfunctions\n");
 	}
-	if (!test_unpriv_remount_atime(MS_NOATIME|MS_NODEV,
-				       MS_STRICTATIME|MS_NODEV))
+	if (!test_unpriv_remount_atime(MS_NOATIME,
+				       MS_STRICTATIME))
 	{
 		die("MS_RELATIME malfunctions\n");
 	}
-	if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODIRATIME|MS_NODEV,
-				       MS_NOATIME|MS_NODEV))
+	if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODIRATIME,
+				       MS_NOATIME))
 	{
 		die("MS_RELATIME malfunctions\n");
 	}
-	if (!test_unpriv_remount_atime(MS_STRICTATIME|MS_NODIRATIME|MS_NODEV,
-				       MS_NOATIME|MS_NODEV))
+	if (!test_unpriv_remount_atime(MS_STRICTATIME|MS_NODIRATIME,
+				       MS_NOATIME))
 	{
 		die("MS_RELATIME malfunctions\n");
 	}
-	if (!test_unpriv_remount_atime(MS_NOATIME|MS_NODIRATIME|MS_NODEV,
-				       MS_STRICTATIME|MS_NODEV))
+	if (!test_unpriv_remount_atime(MS_NOATIME|MS_NODIRATIME,
+				       MS_STRICTATIME))
 	{
 		die("MS_RELATIME malfunctions\n");
 	}
-	if (!test_unpriv_remount(MS_STRICTATIME|MS_NODEV, MS_NODEV,
-				 MS_NOATIME|MS_NODEV))
+	if (!test_unpriv_remount("ramfs", NULL, MS_STRICTATIME, 0, MS_NOATIME))
 	{
 		die("Default atime malfunctions\n");
 	}
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
  2014-08-13 10:24                 ` Eric W. Biederman
@ 2014-08-13 17:03                     ` Andy Lutomirski
  -1 siblings, 0 replies; 67+ messages in thread
From: Andy Lutomirski @ 2014-08-13 17:03 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux FS Devel, Kenton Varda, Linux Containers, Linus Torvalds,
	Linux Kernel Mailing List

On Wed, Aug 13, 2014 at 3:24 AM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> Kenton Varda <kenton-AuYgBwuPrUQTaNkGU808tA@public.gmane.org> writes:
>
>> On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman
> I have a proposed fix as the end of this email.  Could you please
> test it?
>
> If I can get a Tested-by and possibly a Reviewed-by responses I would
> appreciate it.

Will test later today.

>
>> The problem is that users like us had no idea that nodev was being
>> silently added in the first place, and thus didn't know that we needed
>> to specify it in remounts.
>
> Agreed.  And frankly that is extremely reasonable and it would in fact
> break anything doing a traditional tracking of mounts with /etc/mtab.
> So it is a pretty siginificant discontinuity.
>
> I am not prepared to remove the the implicit adding of nodev from the
> implementation as that would just break your code and probably others
> code in a different way.

I'm having trouble figuring out what would break.  The only way to end
up with a device node on one of these filesystems would be to pass an
fd to the file system out of the userns and have some globally
privileged code put the device node there.

>
> What we can do and that will work cleanly long term is make remount
> match mount and implicitly add nodev.
>
>> We create the tmpfs, put some things in it,
>> and then want to remount it read-only for the sandbox. It seems
>> reasonable to expect that a newly-created tmpfs would have exactly the
>> flags I gave it when I created it, not silently get an additional flag
>> that I then need to pass on remount.
>
> A reasonable expectation yes.  At the same time it is also a very
> reasonable security requirement to not allow devices on filesystems
> that the global root does not control.

But the magic nodev isn't needed for that; the checks in mknod are the
right place for this -- they're sufficient, and they're required
anyway.

>
> Compared to Andy's suggestion of relaxing the nodev restriction this
> will also work for filesystems like fuse and nfs when we allow begin to
> allow unprivileged mounts of those filesystems, and it allows to sleep
> better knowing the code is extremely paranoid.

If we allow fuse and nfs, we need to deal with nosuid, too.  I want to
avoid using nosuid for this; I want to change the suid logic to check
the namespace, since I think that suid *should* work in a namespace.

Also, that implicit nodev may be a problem down the road if we ever
add device namespaces.

...

>
> From: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> Date: Wed, 13 Aug 2014 01:33:38 -0700
> Subject: [PATCH] mnt: Implicitly add MNT_NODEV on remount as we do on mount
>
> Now that remount is properly enforcing the rule that you can't
> remove nodev at least sandstorm.io is breaking when performing
> a remount.
>
> It turns out that there is an easy intuitive solution implicitly
> add nodev on remount under the same conditions that we do on mount.
>
> Update the regression test for remounts to verify that this new
> addition does not regress either.
>
> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> ---
>  fs/namespace.c                                     |  9 ++++
>  .../selftests/mount/unprivileged-remount-test.c    | 51 ++++++++++++----------
>  2 files changed, 36 insertions(+), 24 deletions(-)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 7886176232c1..0f158300c866 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1934,6 +1934,15 @@ static int do_remount(struct path *path, int flags, int mnt_flags,
>         if (path->dentry != path->mnt->mnt_root)
>                 return -EINVAL;
>
> +       /* Only in special cases allow devices from mounts created
> +        * outside the initial user namespace.

I don't like this comment at all.  If we stick with this approach, I
think that this should say:

Certain mounts are forced to have MNT_NODEV set.  For these mounts,
user code might not realize that nodev is set, so force MS_NODEV on
remount attempts to prevent confusing remount failures.

> +        */
> +       if ((mnt->mnt_ns->user_ns != &init_user_ns) &&
> +           !(sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT)) {
> +               flags |= MS_NODEV;
> +               mnt_flags |= MNT_NODEV;
> +       }
> +
>         /* Don't allow changing of locked mnt flags.
>          *
>          * No locks need to be held here while testing the various

I will still advocate a different and much less magical approach.
I'll send a patch later today.

--Andy

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
@ 2014-08-13 17:03                     ` Andy Lutomirski
  0 siblings, 0 replies; 67+ messages in thread
From: Andy Lutomirski @ 2014-08-13 17:03 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Kenton Varda, Linus Torvalds, Linux FS Devel, Linux Containers,
	Linux Kernel Mailing List

On Wed, Aug 13, 2014 at 3:24 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Kenton Varda <kenton@sandstorm.io> writes:
>
>> On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman
> I have a proposed fix as the end of this email.  Could you please
> test it?
>
> If I can get a Tested-by and possibly a Reviewed-by responses I would
> appreciate it.

Will test later today.

>
>> The problem is that users like us had no idea that nodev was being
>> silently added in the first place, and thus didn't know that we needed
>> to specify it in remounts.
>
> Agreed.  And frankly that is extremely reasonable and it would in fact
> break anything doing a traditional tracking of mounts with /etc/mtab.
> So it is a pretty siginificant discontinuity.
>
> I am not prepared to remove the the implicit adding of nodev from the
> implementation as that would just break your code and probably others
> code in a different way.

I'm having trouble figuring out what would break.  The only way to end
up with a device node on one of these filesystems would be to pass an
fd to the file system out of the userns and have some globally
privileged code put the device node there.

>
> What we can do and that will work cleanly long term is make remount
> match mount and implicitly add nodev.
>
>> We create the tmpfs, put some things in it,
>> and then want to remount it read-only for the sandbox. It seems
>> reasonable to expect that a newly-created tmpfs would have exactly the
>> flags I gave it when I created it, not silently get an additional flag
>> that I then need to pass on remount.
>
> A reasonable expectation yes.  At the same time it is also a very
> reasonable security requirement to not allow devices on filesystems
> that the global root does not control.

But the magic nodev isn't needed for that; the checks in mknod are the
right place for this -- they're sufficient, and they're required
anyway.

>
> Compared to Andy's suggestion of relaxing the nodev restriction this
> will also work for filesystems like fuse and nfs when we allow begin to
> allow unprivileged mounts of those filesystems, and it allows to sleep
> better knowing the code is extremely paranoid.

If we allow fuse and nfs, we need to deal with nosuid, too.  I want to
avoid using nosuid for this; I want to change the suid logic to check
the namespace, since I think that suid *should* work in a namespace.

Also, that implicit nodev may be a problem down the road if we ever
add device namespaces.

...

>
> From: "Eric W. Biederman" <ebiederm@xmission.com>
> Date: Wed, 13 Aug 2014 01:33:38 -0700
> Subject: [PATCH] mnt: Implicitly add MNT_NODEV on remount as we do on mount
>
> Now that remount is properly enforcing the rule that you can't
> remove nodev at least sandstorm.io is breaking when performing
> a remount.
>
> It turns out that there is an easy intuitive solution implicitly
> add nodev on remount under the same conditions that we do on mount.
>
> Update the regression test for remounts to verify that this new
> addition does not regress either.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  fs/namespace.c                                     |  9 ++++
>  .../selftests/mount/unprivileged-remount-test.c    | 51 ++++++++++++----------
>  2 files changed, 36 insertions(+), 24 deletions(-)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 7886176232c1..0f158300c866 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1934,6 +1934,15 @@ static int do_remount(struct path *path, int flags, int mnt_flags,
>         if (path->dentry != path->mnt->mnt_root)
>                 return -EINVAL;
>
> +       /* Only in special cases allow devices from mounts created
> +        * outside the initial user namespace.

I don't like this comment at all.  If we stick with this approach, I
think that this should say:

Certain mounts are forced to have MNT_NODEV set.  For these mounts,
user code might not realize that nodev is set, so force MS_NODEV on
remount attempts to prevent confusing remount failures.

> +        */
> +       if ((mnt->mnt_ns->user_ns != &init_user_ns) &&
> +           !(sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT)) {
> +               flags |= MS_NODEV;
> +               mnt_flags |= MNT_NODEV;
> +       }
> +
>         /* Don't allow changing of locked mnt flags.
>          *
>          * No locks need to be held here while testing the various

I will still advocate a different and much less magical approach.
I'll send a patch later today.

--Andy

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH] fs: Remove implicit nodev for new mounts in non-root userns
       [not found]                     ` <CALCETrWT_p1-5nkiAjWoeta19fkO3rDiJe9_mhRVqF8x1zXv2A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-08-14  0:03                       ` Andy Lutomirski
  0 siblings, 0 replies; 67+ messages in thread
From: Andy Lutomirski @ 2014-08-14  0:03 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Serge Hallyn,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA, Andy Lutomirski, Kenton Varda,
	Linux FS Devel, Linus Torvalds

Currently, creating a new mount (as opposed to bindmount) in a
non-root userns will implicitly set nodev unless the fs is devpts.
Something like this will be necessary for file systems that allow
the mounter to create device nodes without using mknod (e.g. FUSE
if/when that is allowed), but none of the currently allowed
filesystems do this.

Implicitly adding nodev is problematic, though.  It will make it
unsafe to ever remove the implicit addition, since userspace might
start to rely on it.

This fixes a minor regression in:

    9566d6742852 mnt: Correct permission checks in do_remount

Prior to that commit, MNT_NODEV wasn't enforced for remounts, so
there is existing user code that creates a new mount in a userns
without MS_NODEV and then expects a remount with matching options to
work.  That commit broke code that does this.

Fortunately, since the implicit nodev has no effect on any existing
filesystems, we can still safely remove it.

This replaces the implicit nodev with an explicit nodev requirement:
anyone who mounts a filesystem without FS_USERNS_DEV_MOUNT will get
-EPERM unless they set nodev.  If they set nodev, that setting will
be locked.

As an added benefit, if anything like device namespaces is ever
added, then user code will be able to opt out of nodev by clearing
nodev.

To keep existing code working, this adds FS_USERNS_DEV_MOUNT to all
FS_USERNS_MOUNT filesystems.  All of the current filesystems with
FS_USERNS_MOUNT set are safe.

I confirmed that this is compatible with Sandstorm's revision
1bf0c4847b.  That revision of Sandstorm does not work without this
fix if 9566d6742852 is applied.

Cc: Kenton Varda <kenton-AuYgBwuPrUQTaNkGU808tA@public.gmane.org>
Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Signed-off-by: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
---
 fs/namespace.c   | 16 ++++++++++++----
 fs/proc/root.c   |  2 +-
 fs/ramfs/inode.c |  2 +-
 fs/sysfs/mount.c |  2 +-
 ipc/mqueue.c     |  2 +-
 mm/shmem.c       |  4 ++--
 6 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 0acabea58319..835fa9e8307e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2154,12 +2154,20 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
 			put_filesystem(type);
 			return -EPERM;
 		}
-		/* Only in special cases allow devices from mounts
-		 * created outside the initial user namespace.
+
+		/*
+		 * If a filesystem might allow the mounter to put
+		 * device nodes on it without the checks in mknod,
+		 * then require MS_NODEV to mount it.
 		 */
 		if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) {
-			flags |= MS_NODEV;
-			mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV;
+			if (!(mnt_flags & MNT_NODEV)) {
+				put_filesystem(type);
+				return -EPERM;
+			}
+
+			/* Do not allow nodev to be cleared. */
+			mnt_flags |= MNT_LOCK_NODEV;
 		}
 	}
 
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 094e44d4a6be..2313b280729e 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -159,7 +159,7 @@ static struct file_system_type proc_fs_type = {
 	.name		= "proc",
 	.mount		= proc_mount,
 	.kill_sb	= proc_kill_sb,
-	.fs_flags	= FS_USERNS_MOUNT,
+	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
 };
 
 void __init proc_root_init(void)
diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c
index d365b1c4eb3c..b95b7302d4cc 100644
--- a/fs/ramfs/inode.c
+++ b/fs/ramfs/inode.c
@@ -261,7 +261,7 @@ static struct file_system_type ramfs_fs_type = {
 	.name		= "ramfs",
 	.mount		= ramfs_mount,
 	.kill_sb	= ramfs_kill_sb,
-	.fs_flags	= FS_USERNS_MOUNT,
+	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
 };
 
 int __init init_ramfs_fs(void)
diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index 8a49486bf30c..56ba59317e24 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -58,7 +58,7 @@ static struct file_system_type sysfs_fs_type = {
 	.name		= "sysfs",
 	.mount		= sysfs_mount,
 	.kill_sb	= sysfs_kill_sb,
-	.fs_flags	= FS_USERNS_MOUNT,
+	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
 };
 
 int __init sysfs_init(void)
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 4fcf39af1776..56abbc848d4c 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -1394,7 +1394,7 @@ static struct file_system_type mqueue_fs_type = {
 	.name = "mqueue",
 	.mount = mqueue_mount,
 	.kill_sb = kill_litter_super,
-	.fs_flags = FS_USERNS_MOUNT,
+	.fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
 };
 
 int mq_init_ns(struct ipc_namespace *ns)
diff --git a/mm/shmem.c b/mm/shmem.c
index a42add14331c..f4a708a8f9e3 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3149,7 +3149,7 @@ static struct file_system_type shmem_fs_type = {
 	.name		= "tmpfs",
 	.mount		= shmem_mount,
 	.kill_sb	= kill_litter_super,
-	.fs_flags	= FS_USERNS_MOUNT,
+	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
 };
 
 int __init shmem_init(void)
@@ -3208,7 +3208,7 @@ static struct file_system_type shmem_fs_type = {
 	.name		= "tmpfs",
 	.mount		= ramfs_mount,
 	.kill_sb	= kill_litter_super,
-	.fs_flags	= FS_USERNS_MOUNT,
+	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
 };
 
 int __init shmem_init(void)
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH] fs: Remove implicit nodev for new mounts in non-root userns
  2014-08-13 17:03                     ` Andy Lutomirski
  (?)
  (?)
@ 2014-08-14  0:03                     ` Andy Lutomirski
  2014-08-15 19:05                       ` Serge Hallyn
                                         ` (3 more replies)
  -1 siblings, 4 replies; 67+ messages in thread
From: Andy Lutomirski @ 2014-08-14  0:03 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linux Containers, Linux FS Devel, Linus Torvalds,
	Serge Hallyn, Andy Lutomirski, Kenton Varda, stable

Currently, creating a new mount (as opposed to bindmount) in a
non-root userns will implicitly set nodev unless the fs is devpts.
Something like this will be necessary for file systems that allow
the mounter to create device nodes without using mknod (e.g. FUSE
if/when that is allowed), but none of the currently allowed
filesystems do this.

Implicitly adding nodev is problematic, though.  It will make it
unsafe to ever remove the implicit addition, since userspace might
start to rely on it.

This fixes a minor regression in:

    9566d6742852 mnt: Correct permission checks in do_remount

Prior to that commit, MNT_NODEV wasn't enforced for remounts, so
there is existing user code that creates a new mount in a userns
without MS_NODEV and then expects a remount with matching options to
work.  That commit broke code that does this.

Fortunately, since the implicit nodev has no effect on any existing
filesystems, we can still safely remove it.

This replaces the implicit nodev with an explicit nodev requirement:
anyone who mounts a filesystem without FS_USERNS_DEV_MOUNT will get
-EPERM unless they set nodev.  If they set nodev, that setting will
be locked.

As an added benefit, if anything like device namespaces is ever
added, then user code will be able to opt out of nodev by clearing
nodev.

To keep existing code working, this adds FS_USERNS_DEV_MOUNT to all
FS_USERNS_MOUNT filesystems.  All of the current filesystems with
FS_USERNS_MOUNT set are safe.

I confirmed that this is compatible with Sandstorm's revision
1bf0c4847b.  That revision of Sandstorm does not work without this
fix if 9566d6742852 is applied.

Cc: Kenton Varda <kenton@sandstorm.io>
Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
 fs/namespace.c   | 16 ++++++++++++----
 fs/proc/root.c   |  2 +-
 fs/ramfs/inode.c |  2 +-
 fs/sysfs/mount.c |  2 +-
 ipc/mqueue.c     |  2 +-
 mm/shmem.c       |  4 ++--
 6 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 0acabea58319..835fa9e8307e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2154,12 +2154,20 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
 			put_filesystem(type);
 			return -EPERM;
 		}
-		/* Only in special cases allow devices from mounts
-		 * created outside the initial user namespace.
+
+		/*
+		 * If a filesystem might allow the mounter to put
+		 * device nodes on it without the checks in mknod,
+		 * then require MS_NODEV to mount it.
 		 */
 		if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) {
-			flags |= MS_NODEV;
-			mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV;
+			if (!(mnt_flags & MNT_NODEV)) {
+				put_filesystem(type);
+				return -EPERM;
+			}
+
+			/* Do not allow nodev to be cleared. */
+			mnt_flags |= MNT_LOCK_NODEV;
 		}
 	}
 
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 094e44d4a6be..2313b280729e 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -159,7 +159,7 @@ static struct file_system_type proc_fs_type = {
 	.name		= "proc",
 	.mount		= proc_mount,
 	.kill_sb	= proc_kill_sb,
-	.fs_flags	= FS_USERNS_MOUNT,
+	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
 };
 
 void __init proc_root_init(void)
diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c
index d365b1c4eb3c..b95b7302d4cc 100644
--- a/fs/ramfs/inode.c
+++ b/fs/ramfs/inode.c
@@ -261,7 +261,7 @@ static struct file_system_type ramfs_fs_type = {
 	.name		= "ramfs",
 	.mount		= ramfs_mount,
 	.kill_sb	= ramfs_kill_sb,
-	.fs_flags	= FS_USERNS_MOUNT,
+	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
 };
 
 int __init init_ramfs_fs(void)
diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index 8a49486bf30c..56ba59317e24 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -58,7 +58,7 @@ static struct file_system_type sysfs_fs_type = {
 	.name		= "sysfs",
 	.mount		= sysfs_mount,
 	.kill_sb	= sysfs_kill_sb,
-	.fs_flags	= FS_USERNS_MOUNT,
+	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
 };
 
 int __init sysfs_init(void)
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 4fcf39af1776..56abbc848d4c 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -1394,7 +1394,7 @@ static struct file_system_type mqueue_fs_type = {
 	.name = "mqueue",
 	.mount = mqueue_mount,
 	.kill_sb = kill_litter_super,
-	.fs_flags = FS_USERNS_MOUNT,
+	.fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
 };
 
 int mq_init_ns(struct ipc_namespace *ns)
diff --git a/mm/shmem.c b/mm/shmem.c
index a42add14331c..f4a708a8f9e3 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3149,7 +3149,7 @@ static struct file_system_type shmem_fs_type = {
 	.name		= "tmpfs",
 	.mount		= shmem_mount,
 	.kill_sb	= kill_litter_super,
-	.fs_flags	= FS_USERNS_MOUNT,
+	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
 };
 
 int __init shmem_init(void)
@@ -3208,7 +3208,7 @@ static struct file_system_type shmem_fs_type = {
 	.name		= "tmpfs",
 	.mount		= ramfs_mount,
 	.kill_sb	= kill_litter_super,
-	.fs_flags	= FS_USERNS_MOUNT,
+	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
 };
 
 int __init shmem_init(void)
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
       [not found]                 ` <87tx5ghekp.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
  2014-08-13 17:03                     ` Andy Lutomirski
@ 2014-08-15 18:41                   ` Andy Lutomirski
  1 sibling, 0 replies; 67+ messages in thread
From: Andy Lutomirski @ 2014-08-15 18:41 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux FS Devel, Kenton Varda, Linux Containers, Linus Torvalds,
	Linux Kernel Mailing List

On Wed, Aug 13, 2014 at 3:24 AM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> Kenton Varda <kenton-AuYgBwuPrUQTaNkGU808tA@public.gmane.org> writes:
>
>> On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman
>> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:

> From: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> Date: Wed, 13 Aug 2014 01:33:38 -0700
> Subject: [PATCH] mnt: Implicitly add MNT_NODEV on remount as we do on mount
>
> Now that remount is properly enforcing the rule that you can't
> remove nodev at least sandstorm.io is breaking when performing
> a remount.
>
> It turns out that there is an easy intuitive solution implicitly
> add nodev on remount under the same conditions that we do on mount.
>
> Update the regression test for remounts to verify that this new
> addition does not regress either.

Tested-but-disliked-by: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>

Quoting Alan Cox [1]:

Lying to apps generally ends up like children lying to parents - the
lie gets more complicated to keep up each case you find until it
breaks.

The kernel was unnecessarily fudging mount flags, and it broke,
because the fudge didn't go far enough.  This extends the fudge.
Let's just delete it instead.

--Andy

[1] https://lwn.net/Articles/608435/

>
> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> ---
>  fs/namespace.c                                     |  9 ++++
>  .../selftests/mount/unprivileged-remount-test.c    | 51 ++++++++++++----------
>  2 files changed, 36 insertions(+), 24 deletions(-)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 7886176232c1..0f158300c866 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1934,6 +1934,15 @@ static int do_remount(struct path *path, int flags, int mnt_flags,
>         if (path->dentry != path->mnt->mnt_root)
>                 return -EINVAL;
>
> +       /* Only in special cases allow devices from mounts created
> +        * outside the initial user namespace.
> +        */
> +       if ((mnt->mnt_ns->user_ns != &init_user_ns) &&
> +           !(sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT)) {
> +               flags |= MS_NODEV;
> +               mnt_flags |= MNT_NODEV;
> +       }
> +
>         /* Don't allow changing of locked mnt flags.
>          *
>          * No locks need to be held here while testing the various
> diff --git a/tools/testing/selftests/mount/unprivileged-remount-test.c b/tools/testing/selftests/mount/unprivileged-remount-test.c
> index 1b3ff2fda4d0..6d408c155e0f 100644
> --- a/tools/testing/selftests/mount/unprivileged-remount-test.c
> +++ b/tools/testing/selftests/mount/unprivileged-remount-test.c
> @@ -118,7 +118,8 @@ static void create_and_enter_userns(void)
>  }
>
>  static
> -bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags)
> +bool test_unpriv_remount(const char *fstype, const char *mount_options,
> +                        int mount_flags, int remount_flags, int invalid_flags)
>  {
>         pid_t child;
>
> @@ -151,9 +152,11 @@ bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags)
>                         strerror(errno));
>         }
>
> -       if (mount("testing", "/tmp", "ramfs", mount_flags, NULL) != 0) {
> -               die("mount of /tmp failed: %s\n",
> -                       strerror(errno));
> +       if (mount("testing", "/tmp", fstype, mount_flags, mount_options) != 0) {
> +               die("mount of %s with options '%s' on /tmp failed: %s\n",
> +                   fstype,
> +                   mount_options? mount_options : "",
> +                   strerror(errno));
>         }
>
>         create_and_enter_userns();
> @@ -181,60 +184,60 @@ bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags)
>
>  static bool test_unpriv_remount_simple(int mount_flags)
>  {
> -       return test_unpriv_remount(mount_flags, mount_flags, 0);
> +       return test_unpriv_remount("ramfs", NULL, mount_flags, mount_flags, 0);
>  }
>
>  static bool test_unpriv_remount_atime(int mount_flags, int invalid_flags)
>  {
> -       return test_unpriv_remount(mount_flags, mount_flags, invalid_flags);
> +       return test_unpriv_remount("ramfs", NULL, mount_flags, mount_flags,
> +                                  invalid_flags);
>  }
>
>  int main(int argc, char **argv)
>  {
> -       if (!test_unpriv_remount_simple(MS_RDONLY|MS_NODEV)) {
> +       if (!test_unpriv_remount_simple(MS_RDONLY)) {
>                 die("MS_RDONLY malfunctions\n");
>         }
> -       if (!test_unpriv_remount_simple(MS_NODEV)) {
> +       if (!test_unpriv_remount("devpts", "newinstance", MS_NODEV, MS_NODEV, 0)) {
>                 die("MS_NODEV malfunctions\n");
>         }
> -       if (!test_unpriv_remount_simple(MS_NOSUID|MS_NODEV)) {
> +       if (!test_unpriv_remount_simple(MS_NOSUID)) {
>                 die("MS_NOSUID malfunctions\n");
>         }
> -       if (!test_unpriv_remount_simple(MS_NOEXEC|MS_NODEV)) {
> +       if (!test_unpriv_remount_simple(MS_NOEXEC)) {
>                 die("MS_NOEXEC malfunctions\n");
>         }
> -       if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODEV,
> -                                      MS_NOATIME|MS_NODEV))
> +       if (!test_unpriv_remount_atime(MS_RELATIME,
> +                                      MS_NOATIME))
>         {
>                 die("MS_RELATIME malfunctions\n");
>         }
> -       if (!test_unpriv_remount_atime(MS_STRICTATIME|MS_NODEV,
> -                                      MS_NOATIME|MS_NODEV))
> +       if (!test_unpriv_remount_atime(MS_STRICTATIME,
> +                                      MS_NOATIME))
>         {
>                 die("MS_STRICTATIME malfunctions\n");
>         }
> -       if (!test_unpriv_remount_atime(MS_NOATIME|MS_NODEV,
> -                                      MS_STRICTATIME|MS_NODEV))
> +       if (!test_unpriv_remount_atime(MS_NOATIME,
> +                                      MS_STRICTATIME))
>         {
>                 die("MS_RELATIME malfunctions\n");
>         }
> -       if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODIRATIME|MS_NODEV,
> -                                      MS_NOATIME|MS_NODEV))
> +       if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODIRATIME,
> +                                      MS_NOATIME))
>         {
>                 die("MS_RELATIME malfunctions\n");
>         }
> -       if (!test_unpriv_remount_atime(MS_STRICTATIME|MS_NODIRATIME|MS_NODEV,
> -                                      MS_NOATIME|MS_NODEV))
> +       if (!test_unpriv_remount_atime(MS_STRICTATIME|MS_NODIRATIME,
> +                                      MS_NOATIME))
>         {
>                 die("MS_RELATIME malfunctions\n");
>         }
> -       if (!test_unpriv_remount_atime(MS_NOATIME|MS_NODIRATIME|MS_NODEV,
> -                                      MS_STRICTATIME|MS_NODEV))
> +       if (!test_unpriv_remount_atime(MS_NOATIME|MS_NODIRATIME,
> +                                      MS_STRICTATIME))
>         {
>                 die("MS_RELATIME malfunctions\n");
>         }
> -       if (!test_unpriv_remount(MS_STRICTATIME|MS_NODEV, MS_NODEV,
> -                                MS_NOATIME|MS_NODEV))
> +       if (!test_unpriv_remount("ramfs", NULL, MS_STRICTATIME, 0, MS_NOATIME))
>         {
>                 die("Default atime malfunctions\n");
>         }
> --
> 1.9.1
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
  2014-08-13 10:24                 ` Eric W. Biederman
  (?)
  (?)
@ 2014-08-15 18:41                 ` Andy Lutomirski
  -1 siblings, 0 replies; 67+ messages in thread
From: Andy Lutomirski @ 2014-08-15 18:41 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Kenton Varda, Linus Torvalds, Linux FS Devel, Linux Containers,
	Linux Kernel Mailing List

On Wed, Aug 13, 2014 at 3:24 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Kenton Varda <kenton@sandstorm.io> writes:
>
>> On Tue, Aug 12, 2014 at 9:17 PM, Eric W. Biederman
>> <ebiederm@xmission.com> wrote:

> From: "Eric W. Biederman" <ebiederm@xmission.com>
> Date: Wed, 13 Aug 2014 01:33:38 -0700
> Subject: [PATCH] mnt: Implicitly add MNT_NODEV on remount as we do on mount
>
> Now that remount is properly enforcing the rule that you can't
> remove nodev at least sandstorm.io is breaking when performing
> a remount.
>
> It turns out that there is an easy intuitive solution implicitly
> add nodev on remount under the same conditions that we do on mount.
>
> Update the regression test for remounts to verify that this new
> addition does not regress either.

Tested-but-disliked-by: Andy Lutomirski <luto@amacapital.net>

Quoting Alan Cox [1]:

Lying to apps generally ends up like children lying to parents - the
lie gets more complicated to keep up each case you find until it
breaks.

The kernel was unnecessarily fudging mount flags, and it broke,
because the fudge didn't go far enough.  This extends the fudge.
Let's just delete it instead.

--Andy

[1] https://lwn.net/Articles/608435/

>
> Cc: stable@vger.kernel.org
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  fs/namespace.c                                     |  9 ++++
>  .../selftests/mount/unprivileged-remount-test.c    | 51 ++++++++++++----------
>  2 files changed, 36 insertions(+), 24 deletions(-)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 7886176232c1..0f158300c866 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1934,6 +1934,15 @@ static int do_remount(struct path *path, int flags, int mnt_flags,
>         if (path->dentry != path->mnt->mnt_root)
>                 return -EINVAL;
>
> +       /* Only in special cases allow devices from mounts created
> +        * outside the initial user namespace.
> +        */
> +       if ((mnt->mnt_ns->user_ns != &init_user_ns) &&
> +           !(sb->s_type->fs_flags & FS_USERNS_DEV_MOUNT)) {
> +               flags |= MS_NODEV;
> +               mnt_flags |= MNT_NODEV;
> +       }
> +
>         /* Don't allow changing of locked mnt flags.
>          *
>          * No locks need to be held here while testing the various
> diff --git a/tools/testing/selftests/mount/unprivileged-remount-test.c b/tools/testing/selftests/mount/unprivileged-remount-test.c
> index 1b3ff2fda4d0..6d408c155e0f 100644
> --- a/tools/testing/selftests/mount/unprivileged-remount-test.c
> +++ b/tools/testing/selftests/mount/unprivileged-remount-test.c
> @@ -118,7 +118,8 @@ static void create_and_enter_userns(void)
>  }
>
>  static
> -bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags)
> +bool test_unpriv_remount(const char *fstype, const char *mount_options,
> +                        int mount_flags, int remount_flags, int invalid_flags)
>  {
>         pid_t child;
>
> @@ -151,9 +152,11 @@ bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags)
>                         strerror(errno));
>         }
>
> -       if (mount("testing", "/tmp", "ramfs", mount_flags, NULL) != 0) {
> -               die("mount of /tmp failed: %s\n",
> -                       strerror(errno));
> +       if (mount("testing", "/tmp", fstype, mount_flags, mount_options) != 0) {
> +               die("mount of %s with options '%s' on /tmp failed: %s\n",
> +                   fstype,
> +                   mount_options? mount_options : "",
> +                   strerror(errno));
>         }
>
>         create_and_enter_userns();
> @@ -181,60 +184,60 @@ bool test_unpriv_remount(int mount_flags, int remount_flags, int invalid_flags)
>
>  static bool test_unpriv_remount_simple(int mount_flags)
>  {
> -       return test_unpriv_remount(mount_flags, mount_flags, 0);
> +       return test_unpriv_remount("ramfs", NULL, mount_flags, mount_flags, 0);
>  }
>
>  static bool test_unpriv_remount_atime(int mount_flags, int invalid_flags)
>  {
> -       return test_unpriv_remount(mount_flags, mount_flags, invalid_flags);
> +       return test_unpriv_remount("ramfs", NULL, mount_flags, mount_flags,
> +                                  invalid_flags);
>  }
>
>  int main(int argc, char **argv)
>  {
> -       if (!test_unpriv_remount_simple(MS_RDONLY|MS_NODEV)) {
> +       if (!test_unpriv_remount_simple(MS_RDONLY)) {
>                 die("MS_RDONLY malfunctions\n");
>         }
> -       if (!test_unpriv_remount_simple(MS_NODEV)) {
> +       if (!test_unpriv_remount("devpts", "newinstance", MS_NODEV, MS_NODEV, 0)) {
>                 die("MS_NODEV malfunctions\n");
>         }
> -       if (!test_unpriv_remount_simple(MS_NOSUID|MS_NODEV)) {
> +       if (!test_unpriv_remount_simple(MS_NOSUID)) {
>                 die("MS_NOSUID malfunctions\n");
>         }
> -       if (!test_unpriv_remount_simple(MS_NOEXEC|MS_NODEV)) {
> +       if (!test_unpriv_remount_simple(MS_NOEXEC)) {
>                 die("MS_NOEXEC malfunctions\n");
>         }
> -       if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODEV,
> -                                      MS_NOATIME|MS_NODEV))
> +       if (!test_unpriv_remount_atime(MS_RELATIME,
> +                                      MS_NOATIME))
>         {
>                 die("MS_RELATIME malfunctions\n");
>         }
> -       if (!test_unpriv_remount_atime(MS_STRICTATIME|MS_NODEV,
> -                                      MS_NOATIME|MS_NODEV))
> +       if (!test_unpriv_remount_atime(MS_STRICTATIME,
> +                                      MS_NOATIME))
>         {
>                 die("MS_STRICTATIME malfunctions\n");
>         }
> -       if (!test_unpriv_remount_atime(MS_NOATIME|MS_NODEV,
> -                                      MS_STRICTATIME|MS_NODEV))
> +       if (!test_unpriv_remount_atime(MS_NOATIME,
> +                                      MS_STRICTATIME))
>         {
>                 die("MS_RELATIME malfunctions\n");
>         }
> -       if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODIRATIME|MS_NODEV,
> -                                      MS_NOATIME|MS_NODEV))
> +       if (!test_unpriv_remount_atime(MS_RELATIME|MS_NODIRATIME,
> +                                      MS_NOATIME))
>         {
>                 die("MS_RELATIME malfunctions\n");
>         }
> -       if (!test_unpriv_remount_atime(MS_STRICTATIME|MS_NODIRATIME|MS_NODEV,
> -                                      MS_NOATIME|MS_NODEV))
> +       if (!test_unpriv_remount_atime(MS_STRICTATIME|MS_NODIRATIME,
> +                                      MS_NOATIME))
>         {
>                 die("MS_RELATIME malfunctions\n");
>         }
> -       if (!test_unpriv_remount_atime(MS_NOATIME|MS_NODIRATIME|MS_NODEV,
> -                                      MS_STRICTATIME|MS_NODEV))
> +       if (!test_unpriv_remount_atime(MS_NOATIME|MS_NODIRATIME,
> +                                      MS_STRICTATIME))
>         {
>                 die("MS_RELATIME malfunctions\n");
>         }
> -       if (!test_unpriv_remount(MS_STRICTATIME|MS_NODEV, MS_NODEV,
> -                                MS_NOATIME|MS_NODEV))
> +       if (!test_unpriv_remount("ramfs", NULL, MS_STRICTATIME, 0, MS_NOATIME))
>         {
>                 die("Default atime malfunctions\n");
>         }
> --
> 1.9.1
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns
       [not found]                       ` <2686c32f00b14148379e8cfee9c028c794d4aa1a.1407974494.git.luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
@ 2014-08-15 19:05                         ` Serge Hallyn
  2014-08-15 20:16                         ` Serge Hallyn
  2014-08-28  1:35                         ` Andy Lutomirski
  2 siblings, 0 replies; 67+ messages in thread
From: Serge Hallyn @ 2014-08-15 19:05 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA, Kenton Varda, Eric W. Biederman,
	Linux FS Devel, Linus Torvalds

Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org):
> Currently, creating a new mount (as opposed to bindmount) in a
> non-root userns will implicitly set nodev unless the fs is devpts.
> Something like this will be necessary for file systems that allow
> the mounter to create device nodes without using mknod (e.g. FUSE
> if/when that is allowed), but none of the currently allowed
> filesystems do this.

Hi,

Sorry, I'm probably thinking stupidly, but I don't see this restriction
being the case

serge@sl:~$ mount | grep tmp
[...]
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
serge@sl:~$ sudo mknod /run/kvm c 10 232
[sudo] password for serge: 
serge@sl:~$ echo $?
0
serge@sl:~$ ls -l /run/kvm
crw-r--r-- 1 root root 10, 232 Aug 15 14:04 /run/kvm

But you seem to be saying I shouldn't be allowed to create a device inside
a tmpfs.  What am I overlooking?

> Implicitly adding nodev is problematic, though.  It will make it
> unsafe to ever remove the implicit addition, since userspace might
> start to rely on it.
> 
> This fixes a minor regression in:
> 
>     9566d6742852 mnt: Correct permission checks in do_remount
> 
> Prior to that commit, MNT_NODEV wasn't enforced for remounts, so
> there is existing user code that creates a new mount in a userns
> without MS_NODEV and then expects a remount with matching options to
> work.  That commit broke code that does this.
> 
> Fortunately, since the implicit nodev has no effect on any existing
> filesystems, we can still safely remove it.
> 
> This replaces the implicit nodev with an explicit nodev requirement:
> anyone who mounts a filesystem without FS_USERNS_DEV_MOUNT will get
> -EPERM unless they set nodev.  If they set nodev, that setting will
> be locked.
> 
> As an added benefit, if anything like device namespaces is ever
> added, then user code will be able to opt out of nodev by clearing
> nodev.
> 
> To keep existing code working, this adds FS_USERNS_DEV_MOUNT to all
> FS_USERNS_MOUNT filesystems.  All of the current filesystems with
> FS_USERNS_MOUNT set are safe.
> 
> I confirmed that this is compatible with Sandstorm's revision
> 1bf0c4847b.  That revision of Sandstorm does not work without this
> fix if 9566d6742852 is applied.
> 
> Cc: Kenton Varda <kenton-AuYgBwuPrUQTaNkGU808tA@public.gmane.org>
> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Signed-off-by: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
> ---
>  fs/namespace.c   | 16 ++++++++++++----
>  fs/proc/root.c   |  2 +-
>  fs/ramfs/inode.c |  2 +-
>  fs/sysfs/mount.c |  2 +-
>  ipc/mqueue.c     |  2 +-
>  mm/shmem.c       |  4 ++--
>  6 files changed, 18 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 0acabea58319..835fa9e8307e 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2154,12 +2154,20 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
>  			put_filesystem(type);
>  			return -EPERM;
>  		}
> -		/* Only in special cases allow devices from mounts
> -		 * created outside the initial user namespace.
> +
> +		/*
> +		 * If a filesystem might allow the mounter to put
> +		 * device nodes on it without the checks in mknod,
> +		 * then require MS_NODEV to mount it.
>  		 */
>  		if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) {
> -			flags |= MS_NODEV;
> -			mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV;
> +			if (!(mnt_flags & MNT_NODEV)) {
> +				put_filesystem(type);
> +				return -EPERM;
> +			}
> +
> +			/* Do not allow nodev to be cleared. */
> +			mnt_flags |= MNT_LOCK_NODEV;
>  		}
>  	}
>  
> diff --git a/fs/proc/root.c b/fs/proc/root.c
> index 094e44d4a6be..2313b280729e 100644
> --- a/fs/proc/root.c
> +++ b/fs/proc/root.c
> @@ -159,7 +159,7 @@ static struct file_system_type proc_fs_type = {
>  	.name		= "proc",
>  	.mount		= proc_mount,
>  	.kill_sb	= proc_kill_sb,
> -	.fs_flags	= FS_USERNS_MOUNT,
> +	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>  
>  void __init proc_root_init(void)
> diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c
> index d365b1c4eb3c..b95b7302d4cc 100644
> --- a/fs/ramfs/inode.c
> +++ b/fs/ramfs/inode.c
> @@ -261,7 +261,7 @@ static struct file_system_type ramfs_fs_type = {
>  	.name		= "ramfs",
>  	.mount		= ramfs_mount,
>  	.kill_sb	= ramfs_kill_sb,
> -	.fs_flags	= FS_USERNS_MOUNT,
> +	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>  
>  int __init init_ramfs_fs(void)
> diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
> index 8a49486bf30c..56ba59317e24 100644
> --- a/fs/sysfs/mount.c
> +++ b/fs/sysfs/mount.c
> @@ -58,7 +58,7 @@ static struct file_system_type sysfs_fs_type = {
>  	.name		= "sysfs",
>  	.mount		= sysfs_mount,
>  	.kill_sb	= sysfs_kill_sb,
> -	.fs_flags	= FS_USERNS_MOUNT,
> +	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>  
>  int __init sysfs_init(void)
> diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> index 4fcf39af1776..56abbc848d4c 100644
> --- a/ipc/mqueue.c
> +++ b/ipc/mqueue.c
> @@ -1394,7 +1394,7 @@ static struct file_system_type mqueue_fs_type = {
>  	.name = "mqueue",
>  	.mount = mqueue_mount,
>  	.kill_sb = kill_litter_super,
> -	.fs_flags = FS_USERNS_MOUNT,
> +	.fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>  
>  int mq_init_ns(struct ipc_namespace *ns)
> diff --git a/mm/shmem.c b/mm/shmem.c
> index a42add14331c..f4a708a8f9e3 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -3149,7 +3149,7 @@ static struct file_system_type shmem_fs_type = {
>  	.name		= "tmpfs",
>  	.mount		= shmem_mount,
>  	.kill_sb	= kill_litter_super,
> -	.fs_flags	= FS_USERNS_MOUNT,
> +	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>  
>  int __init shmem_init(void)
> @@ -3208,7 +3208,7 @@ static struct file_system_type shmem_fs_type = {
>  	.name		= "tmpfs",
>  	.mount		= ramfs_mount,
>  	.kill_sb	= kill_litter_super,
> -	.fs_flags	= FS_USERNS_MOUNT,
> +	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>  
>  int __init shmem_init(void)
> -- 
> 1.9.3
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns
  2014-08-14  0:03                     ` Andy Lutomirski
@ 2014-08-15 19:05                       ` Serge Hallyn
  2014-08-15 19:16                           ` Andy Lutomirski
       [not found]                       ` <2686c32f00b14148379e8cfee9c028c794d4aa1a.1407974494.git.luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
                                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 67+ messages in thread
From: Serge Hallyn @ 2014-08-15 19:05 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Eric W. Biederman, linux-kernel, Linux Containers,
	Linux FS Devel, Linus Torvalds, Kenton Varda, stable

Quoting Andy Lutomirski (luto@amacapital.net):
> Currently, creating a new mount (as opposed to bindmount) in a
> non-root userns will implicitly set nodev unless the fs is devpts.
> Something like this will be necessary for file systems that allow
> the mounter to create device nodes without using mknod (e.g. FUSE
> if/when that is allowed), but none of the currently allowed
> filesystems do this.

Hi,

Sorry, I'm probably thinking stupidly, but I don't see this restriction
being the case

serge@sl:~$ mount | grep tmp
[...]
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
serge@sl:~$ sudo mknod /run/kvm c 10 232
[sudo] password for serge: 
serge@sl:~$ echo $?
0
serge@sl:~$ ls -l /run/kvm
crw-r--r-- 1 root root 10, 232 Aug 15 14:04 /run/kvm

But you seem to be saying I shouldn't be allowed to create a device inside
a tmpfs.  What am I overlooking?

> Implicitly adding nodev is problematic, though.  It will make it
> unsafe to ever remove the implicit addition, since userspace might
> start to rely on it.
> 
> This fixes a minor regression in:
> 
>     9566d6742852 mnt: Correct permission checks in do_remount
> 
> Prior to that commit, MNT_NODEV wasn't enforced for remounts, so
> there is existing user code that creates a new mount in a userns
> without MS_NODEV and then expects a remount with matching options to
> work.  That commit broke code that does this.
> 
> Fortunately, since the implicit nodev has no effect on any existing
> filesystems, we can still safely remove it.
> 
> This replaces the implicit nodev with an explicit nodev requirement:
> anyone who mounts a filesystem without FS_USERNS_DEV_MOUNT will get
> -EPERM unless they set nodev.  If they set nodev, that setting will
> be locked.
> 
> As an added benefit, if anything like device namespaces is ever
> added, then user code will be able to opt out of nodev by clearing
> nodev.
> 
> To keep existing code working, this adds FS_USERNS_DEV_MOUNT to all
> FS_USERNS_MOUNT filesystems.  All of the current filesystems with
> FS_USERNS_MOUNT set are safe.
> 
> I confirmed that this is compatible with Sandstorm's revision
> 1bf0c4847b.  That revision of Sandstorm does not work without this
> fix if 9566d6742852 is applied.
> 
> Cc: Kenton Varda <kenton@sandstorm.io>
> Cc: stable@vger.kernel.org
> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
> ---
>  fs/namespace.c   | 16 ++++++++++++----
>  fs/proc/root.c   |  2 +-
>  fs/ramfs/inode.c |  2 +-
>  fs/sysfs/mount.c |  2 +-
>  ipc/mqueue.c     |  2 +-
>  mm/shmem.c       |  4 ++--
>  6 files changed, 18 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 0acabea58319..835fa9e8307e 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2154,12 +2154,20 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
>  			put_filesystem(type);
>  			return -EPERM;
>  		}
> -		/* Only in special cases allow devices from mounts
> -		 * created outside the initial user namespace.
> +
> +		/*
> +		 * If a filesystem might allow the mounter to put
> +		 * device nodes on it without the checks in mknod,
> +		 * then require MS_NODEV to mount it.
>  		 */
>  		if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) {
> -			flags |= MS_NODEV;
> -			mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV;
> +			if (!(mnt_flags & MNT_NODEV)) {
> +				put_filesystem(type);
> +				return -EPERM;
> +			}
> +
> +			/* Do not allow nodev to be cleared. */
> +			mnt_flags |= MNT_LOCK_NODEV;
>  		}
>  	}
>  
> diff --git a/fs/proc/root.c b/fs/proc/root.c
> index 094e44d4a6be..2313b280729e 100644
> --- a/fs/proc/root.c
> +++ b/fs/proc/root.c
> @@ -159,7 +159,7 @@ static struct file_system_type proc_fs_type = {
>  	.name		= "proc",
>  	.mount		= proc_mount,
>  	.kill_sb	= proc_kill_sb,
> -	.fs_flags	= FS_USERNS_MOUNT,
> +	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>  
>  void __init proc_root_init(void)
> diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c
> index d365b1c4eb3c..b95b7302d4cc 100644
> --- a/fs/ramfs/inode.c
> +++ b/fs/ramfs/inode.c
> @@ -261,7 +261,7 @@ static struct file_system_type ramfs_fs_type = {
>  	.name		= "ramfs",
>  	.mount		= ramfs_mount,
>  	.kill_sb	= ramfs_kill_sb,
> -	.fs_flags	= FS_USERNS_MOUNT,
> +	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>  
>  int __init init_ramfs_fs(void)
> diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
> index 8a49486bf30c..56ba59317e24 100644
> --- a/fs/sysfs/mount.c
> +++ b/fs/sysfs/mount.c
> @@ -58,7 +58,7 @@ static struct file_system_type sysfs_fs_type = {
>  	.name		= "sysfs",
>  	.mount		= sysfs_mount,
>  	.kill_sb	= sysfs_kill_sb,
> -	.fs_flags	= FS_USERNS_MOUNT,
> +	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>  
>  int __init sysfs_init(void)
> diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> index 4fcf39af1776..56abbc848d4c 100644
> --- a/ipc/mqueue.c
> +++ b/ipc/mqueue.c
> @@ -1394,7 +1394,7 @@ static struct file_system_type mqueue_fs_type = {
>  	.name = "mqueue",
>  	.mount = mqueue_mount,
>  	.kill_sb = kill_litter_super,
> -	.fs_flags = FS_USERNS_MOUNT,
> +	.fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>  
>  int mq_init_ns(struct ipc_namespace *ns)
> diff --git a/mm/shmem.c b/mm/shmem.c
> index a42add14331c..f4a708a8f9e3 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -3149,7 +3149,7 @@ static struct file_system_type shmem_fs_type = {
>  	.name		= "tmpfs",
>  	.mount		= shmem_mount,
>  	.kill_sb	= kill_litter_super,
> -	.fs_flags	= FS_USERNS_MOUNT,
> +	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>  
>  int __init shmem_init(void)
> @@ -3208,7 +3208,7 @@ static struct file_system_type shmem_fs_type = {
>  	.name		= "tmpfs",
>  	.mount		= ramfs_mount,
>  	.kill_sb	= kill_litter_super,
> -	.fs_flags	= FS_USERNS_MOUNT,
> +	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>  
>  int __init shmem_init(void)
> -- 
> 1.9.3
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns
  2014-08-15 19:05                       ` Serge Hallyn
@ 2014-08-15 19:16                           ` Andy Lutomirski
  0 siblings, 0 replies; 67+ messages in thread
From: Andy Lutomirski @ 2014-08-15 19:16 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA, stable,
	Kenton Varda, Eric W. Biederman, Linux FS Devel, Linus Torvalds

On Fri, Aug 15, 2014 at 12:05 PM, Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> wrote:
> Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org):
>> Currently, creating a new mount (as opposed to bindmount) in a
>> non-root userns will implicitly set nodev unless the fs is devpts.
>> Something like this will be necessary for file systems that allow
>> the mounter to create device nodes without using mknod (e.g. FUSE
>> if/when that is allowed), but none of the currently allowed
>> filesystems do this.
>
> Hi,
>
> Sorry, I'm probably thinking stupidly, but I don't see this restriction
> being the case
>
> serge@sl:~$ mount | grep tmp
> [...]
> tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
> serge@sl:~$ sudo mknod /run/kvm c 10 232
> [sudo] password for serge:
> serge@sl:~$ echo $?
> 0
> serge@sl:~$ ls -l /run/kvm
> crw-r--r-- 1 root root 10, 232 Aug 15 14:04 /run/kvm
>
> But you seem to be saying I shouldn't be allowed to create a device inside
> a tmpfs.  What am I overlooking?

I assume you're in the root userns.  This patch is unnecessary, and
has no effect, if you're in the root userns.

The code in Sandstorm that's currently broken in Linus' tree runs in a
new userns with a matching mount ns.  It does (copied verbatim):

      KJ_SYSCALL(mount("sandstorm-dev", "dev", "tmpfs", MS_NOSUID | MS_NOEXEC,
                       "size=1m,nr_inodes=16,mode=755"));
      makeCharDeviceNode("null", "null", 1, 3);
      makeCharDeviceNode("zero", "zero", 1, 5);
      makeCharDeviceNode("random", "urandom", 1, 9);
      makeCharDeviceNode("urandom", "urandom", 1, 9);
      KJ_SYSCALL(mount("dev", "dev", nullptr,
                       MS_REMOUNT | MS_BIND | MS_NOSUID | MS_NOEXEC |
MS_RDONLY, nullptr));

makeCharDeviceNode is a helper that creates an empty file and mounts a
device node over it.  This code needs the fs to be read/write, but
Sandstorm wants to make /dev read-only when it's done.

In Linus' tree, the remount fails with -EPERM because the mount is
secretly nodev.  It was always secretly nodev, but no one noticed
because of CVE-2014-5207, which caused that remount to succeed.

(Yay for programs that inadvertently exploited a serious security
vulnerability for their normal function.)

--Andy

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns
@ 2014-08-15 19:16                           ` Andy Lutomirski
  0 siblings, 0 replies; 67+ messages in thread
From: Andy Lutomirski @ 2014-08-15 19:16 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: Eric W. Biederman, linux-kernel, Linux Containers,
	Linux FS Devel, Linus Torvalds, Kenton Varda, stable

On Fri, Aug 15, 2014 at 12:05 PM, Serge Hallyn <serge.hallyn@ubuntu.com> wrote:
> Quoting Andy Lutomirski (luto@amacapital.net):
>> Currently, creating a new mount (as opposed to bindmount) in a
>> non-root userns will implicitly set nodev unless the fs is devpts.
>> Something like this will be necessary for file systems that allow
>> the mounter to create device nodes without using mknod (e.g. FUSE
>> if/when that is allowed), but none of the currently allowed
>> filesystems do this.
>
> Hi,
>
> Sorry, I'm probably thinking stupidly, but I don't see this restriction
> being the case
>
> serge@sl:~$ mount | grep tmp
> [...]
> tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
> serge@sl:~$ sudo mknod /run/kvm c 10 232
> [sudo] password for serge:
> serge@sl:~$ echo $?
> 0
> serge@sl:~$ ls -l /run/kvm
> crw-r--r-- 1 root root 10, 232 Aug 15 14:04 /run/kvm
>
> But you seem to be saying I shouldn't be allowed to create a device inside
> a tmpfs.  What am I overlooking?

I assume you're in the root userns.  This patch is unnecessary, and
has no effect, if you're in the root userns.

The code in Sandstorm that's currently broken in Linus' tree runs in a
new userns with a matching mount ns.  It does (copied verbatim):

      KJ_SYSCALL(mount("sandstorm-dev", "dev", "tmpfs", MS_NOSUID | MS_NOEXEC,
                       "size=1m,nr_inodes=16,mode=755"));
      makeCharDeviceNode("null", "null", 1, 3);
      makeCharDeviceNode("zero", "zero", 1, 5);
      makeCharDeviceNode("random", "urandom", 1, 9);
      makeCharDeviceNode("urandom", "urandom", 1, 9);
      KJ_SYSCALL(mount("dev", "dev", nullptr,
                       MS_REMOUNT | MS_BIND | MS_NOSUID | MS_NOEXEC |
MS_RDONLY, nullptr));

makeCharDeviceNode is a helper that creates an empty file and mounts a
device node over it.  This code needs the fs to be read/write, but
Sandstorm wants to make /dev read-only when it's done.

In Linus' tree, the remount fails with -EPERM because the mount is
secretly nodev.  It was always secretly nodev, but no one noticed
because of CVE-2014-5207, which caused that remount to succeed.

(Yay for programs that inadvertently exploited a serious security
vulnerability for their normal function.)

--Andy

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns
  2014-08-15 19:16                           ` Andy Lutomirski
@ 2014-08-15 19:37                               ` Serge Hallyn
  -1 siblings, 0 replies; 67+ messages in thread
From: Serge Hallyn @ 2014-08-15 19:37 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA, stable,
	Kenton Varda, Eric W. Biederman, Linux FS Devel, Linus Torvalds

Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org):
> On Fri, Aug 15, 2014 at 12:05 PM, Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> wrote:
> > Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org):
> >> Currently, creating a new mount (as opposed to bindmount) in a
> >> non-root userns will implicitly set nodev unless the fs is devpts.
> >> Something like this will be necessary for file systems that allow
> >> the mounter to create device nodes without using mknod (e.g. FUSE
> >> if/when that is allowed), but none of the currently allowed
> >> filesystems do this.
> >
> > Hi,
> >
> > Sorry, I'm probably thinking stupidly, but I don't see this restriction
> > being the case
> >
> > serge@sl:~$ mount | grep tmp
> > [...]
> > tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
> > serge@sl:~$ sudo mknod /run/kvm c 10 232
> > [sudo] password for serge:
> > serge@sl:~$ echo $?
> > 0
> > serge@sl:~$ ls -l /run/kvm
> > crw-r--r-- 1 root root 10, 232 Aug 15 14:04 /run/kvm
> >
> > But you seem to be saying I shouldn't be allowed to create a device inside
> > a tmpfs.  What am I overlooking?
> 
> I assume you're in the root userns.  This patch is unnecessary, and
> has no effect, if you're in the root userns.

Right, but I thought you were justifying adding FS_USERNS_DEV_MOUNT by saying
that you cannot mknod in those filesystems.  But I see you actually said
"without using mknod".  I guess I don't understand that caveat.

-serge

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns
@ 2014-08-15 19:37                               ` Serge Hallyn
  0 siblings, 0 replies; 67+ messages in thread
From: Serge Hallyn @ 2014-08-15 19:37 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, linux-kernel, stable, Kenton Varda,
	Eric W. Biederman, Linux FS Devel, Linus Torvalds

Quoting Andy Lutomirski (luto@amacapital.net):
> On Fri, Aug 15, 2014 at 12:05 PM, Serge Hallyn <serge.hallyn@ubuntu.com> wrote:
> > Quoting Andy Lutomirski (luto@amacapital.net):
> >> Currently, creating a new mount (as opposed to bindmount) in a
> >> non-root userns will implicitly set nodev unless the fs is devpts.
> >> Something like this will be necessary for file systems that allow
> >> the mounter to create device nodes without using mknod (e.g. FUSE
> >> if/when that is allowed), but none of the currently allowed
> >> filesystems do this.
> >
> > Hi,
> >
> > Sorry, I'm probably thinking stupidly, but I don't see this restriction
> > being the case
> >
> > serge@sl:~$ mount | grep tmp
> > [...]
> > tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
> > serge@sl:~$ sudo mknod /run/kvm c 10 232
> > [sudo] password for serge:
> > serge@sl:~$ echo $?
> > 0
> > serge@sl:~$ ls -l /run/kvm
> > crw-r--r-- 1 root root 10, 232 Aug 15 14:04 /run/kvm
> >
> > But you seem to be saying I shouldn't be allowed to create a device inside
> > a tmpfs.  What am I overlooking?
> 
> I assume you're in the root userns.  This patch is unnecessary, and
> has no effect, if you're in the root userns.

Right, but I thought you were justifying adding FS_USERNS_DEV_MOUNT by saying
that you cannot mknod in those filesystems.  But I see you actually said
"without using mknod".  I guess I don't understand that caveat.

-serge

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns
  2014-08-15 19:37                               ` Serge Hallyn
@ 2014-08-15 19:56                                 ` Andy Lutomirski
  -1 siblings, 0 replies; 67+ messages in thread
From: Andy Lutomirski @ 2014-08-15 19:56 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA, stable,
	Kenton Varda, Eric W. Biederman, Linux FS Devel, Linus Torvalds

On Fri, Aug 15, 2014 at 12:37 PM, Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> wrote:
> Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org):
>> On Fri, Aug 15, 2014 at 12:05 PM, Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> wrote:
>> > Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org):
>> >> Currently, creating a new mount (as opposed to bindmount) in a
>> >> non-root userns will implicitly set nodev unless the fs is devpts.
>> >> Something like this will be necessary for file systems that allow
>> >> the mounter to create device nodes without using mknod (e.g. FUSE
>> >> if/when that is allowed), but none of the currently allowed
>> >> filesystems do this.
>> >
>> > Hi,
>> >
>> > Sorry, I'm probably thinking stupidly, but I don't see this restriction
>> > being the case
>> >
>> > serge@sl:~$ mount | grep tmp
>> > [...]
>> > tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
>> > serge@sl:~$ sudo mknod /run/kvm c 10 232
>> > [sudo] password for serge:
>> > serge@sl:~$ echo $?
>> > 0
>> > serge@sl:~$ ls -l /run/kvm
>> > crw-r--r-- 1 root root 10, 232 Aug 15 14:04 /run/kvm
>> >
>> > But you seem to be saying I shouldn't be allowed to create a device inside
>> > a tmpfs.  What am I overlooking?
>>
>> I assume you're in the root userns.  This patch is unnecessary, and
>> has no effect, if you're in the root userns.
>
> Right, but I thought you were justifying adding FS_USERNS_DEV_MOUNT by saying
> that you cannot mknod in those filesystems.  But I see you actually said
> "without using mknod".  I guess I don't understand that caveat.

IIUC, there are two ways that a user could put a device node into
their filesystem.

The obvious way is using mknod.  But mknod has its own perfectly valid
permission checks, and it doesn't need any special handling at mount
time.

The less obvious way is to mount a filesystem that already contains a
device node or to mount a filesystem that gives some other means of
inserting a device node (e.g. a network filesystem or FUSE).  Those
might allow inserting device nodes without passing a global capability
check, so unprivileged users in a userns must not be allowed to mount
such a filesystem without MNT_NODEV | MNT_LOCK_NODEV.

Fortunately, none of the existing FS_USERNS_MOUNT filesystems have
that property.  FUSE will, but we don't support FUSE in a userns yet
(unfortunately -- it would be a *very* useful feature.)

I think that, if we ever allow FUSE in a userns, we should return
-EPERM when trying to mount it unless the user specifies MS_NODEV,
which is what this patch does.  I don't think there's any reason to
play complicated games to allow programs to get away with omitting
MS_NODEV.

--Andy

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns
@ 2014-08-15 19:56                                 ` Andy Lutomirski
  0 siblings, 0 replies; 67+ messages in thread
From: Andy Lutomirski @ 2014-08-15 19:56 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: Linux Containers, linux-kernel, stable, Kenton Varda,
	Eric W. Biederman, Linux FS Devel, Linus Torvalds

On Fri, Aug 15, 2014 at 12:37 PM, Serge Hallyn <serge.hallyn@ubuntu.com> wrote:
> Quoting Andy Lutomirski (luto@amacapital.net):
>> On Fri, Aug 15, 2014 at 12:05 PM, Serge Hallyn <serge.hallyn@ubuntu.com> wrote:
>> > Quoting Andy Lutomirski (luto@amacapital.net):
>> >> Currently, creating a new mount (as opposed to bindmount) in a
>> >> non-root userns will implicitly set nodev unless the fs is devpts.
>> >> Something like this will be necessary for file systems that allow
>> >> the mounter to create device nodes without using mknod (e.g. FUSE
>> >> if/when that is allowed), but none of the currently allowed
>> >> filesystems do this.
>> >
>> > Hi,
>> >
>> > Sorry, I'm probably thinking stupidly, but I don't see this restriction
>> > being the case
>> >
>> > serge@sl:~$ mount | grep tmp
>> > [...]
>> > tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
>> > serge@sl:~$ sudo mknod /run/kvm c 10 232
>> > [sudo] password for serge:
>> > serge@sl:~$ echo $?
>> > 0
>> > serge@sl:~$ ls -l /run/kvm
>> > crw-r--r-- 1 root root 10, 232 Aug 15 14:04 /run/kvm
>> >
>> > But you seem to be saying I shouldn't be allowed to create a device inside
>> > a tmpfs.  What am I overlooking?
>>
>> I assume you're in the root userns.  This patch is unnecessary, and
>> has no effect, if you're in the root userns.
>
> Right, but I thought you were justifying adding FS_USERNS_DEV_MOUNT by saying
> that you cannot mknod in those filesystems.  But I see you actually said
> "without using mknod".  I guess I don't understand that caveat.

IIUC, there are two ways that a user could put a device node into
their filesystem.

The obvious way is using mknod.  But mknod has its own perfectly valid
permission checks, and it doesn't need any special handling at mount
time.

The less obvious way is to mount a filesystem that already contains a
device node or to mount a filesystem that gives some other means of
inserting a device node (e.g. a network filesystem or FUSE).  Those
might allow inserting device nodes without passing a global capability
check, so unprivileged users in a userns must not be allowed to mount
such a filesystem without MNT_NODEV | MNT_LOCK_NODEV.

Fortunately, none of the existing FS_USERNS_MOUNT filesystems have
that property.  FUSE will, but we don't support FUSE in a userns yet
(unfortunately -- it would be a *very* useful feature.)

I think that, if we ever allow FUSE in a userns, we should return
-EPERM when trying to mount it unless the user specifies MS_NODEV,
which is what this patch does.  I don't think there's any reason to
play complicated games to allow programs to get away with omitting
MS_NODEV.

--Andy

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns
  2014-08-15 19:56                                 ` Andy Lutomirski
@ 2014-08-15 20:16                                     ` Serge Hallyn
  -1 siblings, 0 replies; 67+ messages in thread
From: Serge Hallyn @ 2014-08-15 20:16 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA, stable,
	Kenton Varda, Eric W. Biederman, Linux FS Devel, Linus Torvalds

Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org):
> On Fri, Aug 15, 2014 at 12:37 PM, Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> wrote:
> > Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org):
> >> On Fri, Aug 15, 2014 at 12:05 PM, Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org> wrote:
> >> > Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org):
> >> >> Currently, creating a new mount (as opposed to bindmount) in a
> >> >> non-root userns will implicitly set nodev unless the fs is devpts.
> >> >> Something like this will be necessary for file systems that allow
> >> >> the mounter to create device nodes without using mknod (e.g. FUSE
> >> >> if/when that is allowed), but none of the currently allowed
> >> >> filesystems do this.
> >> >
> >> > Hi,
> >> >
> >> > Sorry, I'm probably thinking stupidly, but I don't see this restriction
> >> > being the case
> >> >
> >> > serge@sl:~$ mount | grep tmp
> >> > [...]
> >> > tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
> >> > serge@sl:~$ sudo mknod /run/kvm c 10 232
> >> > [sudo] password for serge:
> >> > serge@sl:~$ echo $?
> >> > 0
> >> > serge@sl:~$ ls -l /run/kvm
> >> > crw-r--r-- 1 root root 10, 232 Aug 15 14:04 /run/kvm
> >> >
> >> > But you seem to be saying I shouldn't be allowed to create a device inside
> >> > a tmpfs.  What am I overlooking?
> >>
> >> I assume you're in the root userns.  This patch is unnecessary, and
> >> has no effect, if you're in the root userns.
> >
> > Right, but I thought you were justifying adding FS_USERNS_DEV_MOUNT by saying
> > that you cannot mknod in those filesystems.  But I see you actually said
> > "without using mknod".  I guess I don't understand that caveat.
> 
> IIUC, there are two ways that a user could put a device node into
> their filesystem.
> 
> The obvious way is using mknod.  But mknod has its own perfectly valid
> permission checks, and it doesn't need any special handling at mount
> time.
> 
> The less obvious way is to mount a filesystem that already contains a
> device node or to mount a filesystem that gives some other means of
> inserting a device node (e.g. a network filesystem or FUSE).  Those
> might allow inserting device nodes without passing a global capability
> check, so unprivileged users in a userns must not be allowed to mount
> such a filesystem without MNT_NODEV | MNT_LOCK_NODEV.
> 
> Fortunately, none of the existing FS_USERNS_MOUNT filesystems have
> that property.  FUSE will, but we don't support FUSE in a userns yet
> (unfortunately -- it would be a *very* useful feature.)
> 
> I think that, if we ever allow FUSE in a userns, we should return

Which, btw, I'm hoping we'll be allowing soon.

> -EPERM when trying to mount it unless the user specifies MS_NODEV,

In either case we can think that through when the time comes.

> which is what this patch does.  I don't think there's any reason to
> play complicated games to allow programs to get away with omitting
> MS_NODEV.

Thanks, Andy.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns
@ 2014-08-15 20:16                                     ` Serge Hallyn
  0 siblings, 0 replies; 67+ messages in thread
From: Serge Hallyn @ 2014-08-15 20:16 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, linux-kernel, stable, Kenton Varda,
	Eric W. Biederman, Linux FS Devel, Linus Torvalds

Quoting Andy Lutomirski (luto@amacapital.net):
> On Fri, Aug 15, 2014 at 12:37 PM, Serge Hallyn <serge.hallyn@ubuntu.com> wrote:
> > Quoting Andy Lutomirski (luto@amacapital.net):
> >> On Fri, Aug 15, 2014 at 12:05 PM, Serge Hallyn <serge.hallyn@ubuntu.com> wrote:
> >> > Quoting Andy Lutomirski (luto@amacapital.net):
> >> >> Currently, creating a new mount (as opposed to bindmount) in a
> >> >> non-root userns will implicitly set nodev unless the fs is devpts.
> >> >> Something like this will be necessary for file systems that allow
> >> >> the mounter to create device nodes without using mknod (e.g. FUSE
> >> >> if/when that is allowed), but none of the currently allowed
> >> >> filesystems do this.
> >> >
> >> > Hi,
> >> >
> >> > Sorry, I'm probably thinking stupidly, but I don't see this restriction
> >> > being the case
> >> >
> >> > serge@sl:~$ mount | grep tmp
> >> > [...]
> >> > tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
> >> > serge@sl:~$ sudo mknod /run/kvm c 10 232
> >> > [sudo] password for serge:
> >> > serge@sl:~$ echo $?
> >> > 0
> >> > serge@sl:~$ ls -l /run/kvm
> >> > crw-r--r-- 1 root root 10, 232 Aug 15 14:04 /run/kvm
> >> >
> >> > But you seem to be saying I shouldn't be allowed to create a device inside
> >> > a tmpfs.  What am I overlooking?
> >>
> >> I assume you're in the root userns.  This patch is unnecessary, and
> >> has no effect, if you're in the root userns.
> >
> > Right, but I thought you were justifying adding FS_USERNS_DEV_MOUNT by saying
> > that you cannot mknod in those filesystems.  But I see you actually said
> > "without using mknod".  I guess I don't understand that caveat.
> 
> IIUC, there are two ways that a user could put a device node into
> their filesystem.
> 
> The obvious way is using mknod.  But mknod has its own perfectly valid
> permission checks, and it doesn't need any special handling at mount
> time.
> 
> The less obvious way is to mount a filesystem that already contains a
> device node or to mount a filesystem that gives some other means of
> inserting a device node (e.g. a network filesystem or FUSE).  Those
> might allow inserting device nodes without passing a global capability
> check, so unprivileged users in a userns must not be allowed to mount
> such a filesystem without MNT_NODEV | MNT_LOCK_NODEV.
> 
> Fortunately, none of the existing FS_USERNS_MOUNT filesystems have
> that property.  FUSE will, but we don't support FUSE in a userns yet
> (unfortunately -- it would be a *very* useful feature.)
> 
> I think that, if we ever allow FUSE in a userns, we should return

Which, btw, I'm hoping we'll be allowing soon.

> -EPERM when trying to mount it unless the user specifies MS_NODEV,

In either case we can think that through when the time comes.

> which is what this patch does.  I don't think there's any reason to
> play complicated games to allow programs to get away with omitting
> MS_NODEV.

Thanks, Andy.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns
       [not found]                       ` <2686c32f00b14148379e8cfee9c028c794d4aa1a.1407974494.git.luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
  2014-08-15 19:05                         ` Serge Hallyn
@ 2014-08-15 20:16                         ` Serge Hallyn
  2014-08-28  1:35                         ` Andy Lutomirski
  2 siblings, 0 replies; 67+ messages in thread
From: Serge Hallyn @ 2014-08-15 20:16 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA, Kenton Varda, Eric W. Biederman,
	Linux FS Devel, Linus Torvalds

Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org):
> Currently, creating a new mount (as opposed to bindmount) in a
> non-root userns will implicitly set nodev unless the fs is devpts.
> Something like this will be necessary for file systems that allow
> the mounter to create device nodes without using mknod (e.g. FUSE
> if/when that is allowed), but none of the currently allowed
> filesystems do this.
> 
> Implicitly adding nodev is problematic, though.  It will make it
> unsafe to ever remove the implicit addition, since userspace might
> start to rely on it.
> 
> This fixes a minor regression in:
> 
>     9566d6742852 mnt: Correct permission checks in do_remount
> 
> Prior to that commit, MNT_NODEV wasn't enforced for remounts, so
> there is existing user code that creates a new mount in a userns
> without MS_NODEV and then expects a remount with matching options to
> work.  That commit broke code that does this.
> 
> Fortunately, since the implicit nodev has no effect on any existing
> filesystems, we can still safely remove it.
> 
> This replaces the implicit nodev with an explicit nodev requirement:
> anyone who mounts a filesystem without FS_USERNS_DEV_MOUNT will get
> -EPERM unless they set nodev.  If they set nodev, that setting will
> be locked.
> 
> As an added benefit, if anything like device namespaces is ever
> added, then user code will be able to opt out of nodev by clearing
> nodev.
> 
> To keep existing code working, this adds FS_USERNS_DEV_MOUNT to all
> FS_USERNS_MOUNT filesystems.  All of the current filesystems with
> FS_USERNS_MOUNT set are safe.
> 
> I confirmed that this is compatible with Sandstorm's revision
> 1bf0c4847b.  That revision of Sandstorm does not work without this
> fix if 9566d6742852 is applied.
> 

Acked-by: Serge E. Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>

This seems like the best alternative by far.

> Cc: Kenton Varda <kenton-AuYgBwuPrUQTaNkGU808tA@public.gmane.org>
> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Signed-off-by: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
> ---
>  fs/namespace.c   | 16 ++++++++++++----
>  fs/proc/root.c   |  2 +-
>  fs/ramfs/inode.c |  2 +-
>  fs/sysfs/mount.c |  2 +-
>  ipc/mqueue.c     |  2 +-
>  mm/shmem.c       |  4 ++--
>  6 files changed, 18 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 0acabea58319..835fa9e8307e 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2154,12 +2154,20 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
>  			put_filesystem(type);
>  			return -EPERM;
>  		}
> -		/* Only in special cases allow devices from mounts
> -		 * created outside the initial user namespace.
> +
> +		/*
> +		 * If a filesystem might allow the mounter to put
> +		 * device nodes on it without the checks in mknod,
> +		 * then require MS_NODEV to mount it.
>  		 */
>  		if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) {
> -			flags |= MS_NODEV;
> -			mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV;
> +			if (!(mnt_flags & MNT_NODEV)) {
> +				put_filesystem(type);
> +				return -EPERM;
> +			}
> +
> +			/* Do not allow nodev to be cleared. */
> +			mnt_flags |= MNT_LOCK_NODEV;
>  		}
>  	}
>  
> diff --git a/fs/proc/root.c b/fs/proc/root.c
> index 094e44d4a6be..2313b280729e 100644
> --- a/fs/proc/root.c
> +++ b/fs/proc/root.c
> @@ -159,7 +159,7 @@ static struct file_system_type proc_fs_type = {
>  	.name		= "proc",
>  	.mount		= proc_mount,
>  	.kill_sb	= proc_kill_sb,
> -	.fs_flags	= FS_USERNS_MOUNT,
> +	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>  
>  void __init proc_root_init(void)
> diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c
> index d365b1c4eb3c..b95b7302d4cc 100644
> --- a/fs/ramfs/inode.c
> +++ b/fs/ramfs/inode.c
> @@ -261,7 +261,7 @@ static struct file_system_type ramfs_fs_type = {
>  	.name		= "ramfs",
>  	.mount		= ramfs_mount,
>  	.kill_sb	= ramfs_kill_sb,
> -	.fs_flags	= FS_USERNS_MOUNT,
> +	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>  
>  int __init init_ramfs_fs(void)
> diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
> index 8a49486bf30c..56ba59317e24 100644
> --- a/fs/sysfs/mount.c
> +++ b/fs/sysfs/mount.c
> @@ -58,7 +58,7 @@ static struct file_system_type sysfs_fs_type = {
>  	.name		= "sysfs",
>  	.mount		= sysfs_mount,
>  	.kill_sb	= sysfs_kill_sb,
> -	.fs_flags	= FS_USERNS_MOUNT,
> +	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>  
>  int __init sysfs_init(void)
> diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> index 4fcf39af1776..56abbc848d4c 100644
> --- a/ipc/mqueue.c
> +++ b/ipc/mqueue.c
> @@ -1394,7 +1394,7 @@ static struct file_system_type mqueue_fs_type = {
>  	.name = "mqueue",
>  	.mount = mqueue_mount,
>  	.kill_sb = kill_litter_super,
> -	.fs_flags = FS_USERNS_MOUNT,
> +	.fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>  
>  int mq_init_ns(struct ipc_namespace *ns)
> diff --git a/mm/shmem.c b/mm/shmem.c
> index a42add14331c..f4a708a8f9e3 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -3149,7 +3149,7 @@ static struct file_system_type shmem_fs_type = {
>  	.name		= "tmpfs",
>  	.mount		= shmem_mount,
>  	.kill_sb	= kill_litter_super,
> -	.fs_flags	= FS_USERNS_MOUNT,
> +	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>  
>  int __init shmem_init(void)
> @@ -3208,7 +3208,7 @@ static struct file_system_type shmem_fs_type = {
>  	.name		= "tmpfs",
>  	.mount		= ramfs_mount,
>  	.kill_sb	= kill_litter_super,
> -	.fs_flags	= FS_USERNS_MOUNT,
> +	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>  
>  int __init shmem_init(void)
> -- 
> 1.9.3
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns
  2014-08-14  0:03                     ` Andy Lutomirski
  2014-08-15 19:05                       ` Serge Hallyn
       [not found]                       ` <2686c32f00b14148379e8cfee9c028c794d4aa1a.1407974494.git.luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
@ 2014-08-15 20:16                       ` Serge Hallyn
  2014-08-28  1:35                       ` Andy Lutomirski
  3 siblings, 0 replies; 67+ messages in thread
From: Serge Hallyn @ 2014-08-15 20:16 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Eric W. Biederman, linux-kernel, Linux Containers,
	Linux FS Devel, Linus Torvalds, Kenton Varda, stable

Quoting Andy Lutomirski (luto@amacapital.net):
> Currently, creating a new mount (as opposed to bindmount) in a
> non-root userns will implicitly set nodev unless the fs is devpts.
> Something like this will be necessary for file systems that allow
> the mounter to create device nodes without using mknod (e.g. FUSE
> if/when that is allowed), but none of the currently allowed
> filesystems do this.
> 
> Implicitly adding nodev is problematic, though.  It will make it
> unsafe to ever remove the implicit addition, since userspace might
> start to rely on it.
> 
> This fixes a minor regression in:
> 
>     9566d6742852 mnt: Correct permission checks in do_remount
> 
> Prior to that commit, MNT_NODEV wasn't enforced for remounts, so
> there is existing user code that creates a new mount in a userns
> without MS_NODEV and then expects a remount with matching options to
> work.  That commit broke code that does this.
> 
> Fortunately, since the implicit nodev has no effect on any existing
> filesystems, we can still safely remove it.
> 
> This replaces the implicit nodev with an explicit nodev requirement:
> anyone who mounts a filesystem without FS_USERNS_DEV_MOUNT will get
> -EPERM unless they set nodev.  If they set nodev, that setting will
> be locked.
> 
> As an added benefit, if anything like device namespaces is ever
> added, then user code will be able to opt out of nodev by clearing
> nodev.
> 
> To keep existing code working, this adds FS_USERNS_DEV_MOUNT to all
> FS_USERNS_MOUNT filesystems.  All of the current filesystems with
> FS_USERNS_MOUNT set are safe.
> 
> I confirmed that this is compatible with Sandstorm's revision
> 1bf0c4847b.  That revision of Sandstorm does not work without this
> fix if 9566d6742852 is applied.
> 

Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>

This seems like the best alternative by far.

> Cc: Kenton Varda <kenton@sandstorm.io>
> Cc: stable@vger.kernel.org
> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
> ---
>  fs/namespace.c   | 16 ++++++++++++----
>  fs/proc/root.c   |  2 +-
>  fs/ramfs/inode.c |  2 +-
>  fs/sysfs/mount.c |  2 +-
>  ipc/mqueue.c     |  2 +-
>  mm/shmem.c       |  4 ++--
>  6 files changed, 18 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 0acabea58319..835fa9e8307e 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2154,12 +2154,20 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
>  			put_filesystem(type);
>  			return -EPERM;
>  		}
> -		/* Only in special cases allow devices from mounts
> -		 * created outside the initial user namespace.
> +
> +		/*
> +		 * If a filesystem might allow the mounter to put
> +		 * device nodes on it without the checks in mknod,
> +		 * then require MS_NODEV to mount it.
>  		 */
>  		if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) {
> -			flags |= MS_NODEV;
> -			mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV;
> +			if (!(mnt_flags & MNT_NODEV)) {
> +				put_filesystem(type);
> +				return -EPERM;
> +			}
> +
> +			/* Do not allow nodev to be cleared. */
> +			mnt_flags |= MNT_LOCK_NODEV;
>  		}
>  	}
>  
> diff --git a/fs/proc/root.c b/fs/proc/root.c
> index 094e44d4a6be..2313b280729e 100644
> --- a/fs/proc/root.c
> +++ b/fs/proc/root.c
> @@ -159,7 +159,7 @@ static struct file_system_type proc_fs_type = {
>  	.name		= "proc",
>  	.mount		= proc_mount,
>  	.kill_sb	= proc_kill_sb,
> -	.fs_flags	= FS_USERNS_MOUNT,
> +	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>  
>  void __init proc_root_init(void)
> diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c
> index d365b1c4eb3c..b95b7302d4cc 100644
> --- a/fs/ramfs/inode.c
> +++ b/fs/ramfs/inode.c
> @@ -261,7 +261,7 @@ static struct file_system_type ramfs_fs_type = {
>  	.name		= "ramfs",
>  	.mount		= ramfs_mount,
>  	.kill_sb	= ramfs_kill_sb,
> -	.fs_flags	= FS_USERNS_MOUNT,
> +	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>  
>  int __init init_ramfs_fs(void)
> diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
> index 8a49486bf30c..56ba59317e24 100644
> --- a/fs/sysfs/mount.c
> +++ b/fs/sysfs/mount.c
> @@ -58,7 +58,7 @@ static struct file_system_type sysfs_fs_type = {
>  	.name		= "sysfs",
>  	.mount		= sysfs_mount,
>  	.kill_sb	= sysfs_kill_sb,
> -	.fs_flags	= FS_USERNS_MOUNT,
> +	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>  
>  int __init sysfs_init(void)
> diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> index 4fcf39af1776..56abbc848d4c 100644
> --- a/ipc/mqueue.c
> +++ b/ipc/mqueue.c
> @@ -1394,7 +1394,7 @@ static struct file_system_type mqueue_fs_type = {
>  	.name = "mqueue",
>  	.mount = mqueue_mount,
>  	.kill_sb = kill_litter_super,
> -	.fs_flags = FS_USERNS_MOUNT,
> +	.fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>  
>  int mq_init_ns(struct ipc_namespace *ns)
> diff --git a/mm/shmem.c b/mm/shmem.c
> index a42add14331c..f4a708a8f9e3 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -3149,7 +3149,7 @@ static struct file_system_type shmem_fs_type = {
>  	.name		= "tmpfs",
>  	.mount		= shmem_mount,
>  	.kill_sb	= kill_litter_super,
> -	.fs_flags	= FS_USERNS_MOUNT,
> +	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>  
>  int __init shmem_init(void)
> @@ -3208,7 +3208,7 @@ static struct file_system_type shmem_fs_type = {
>  	.name		= "tmpfs",
>  	.mount		= ramfs_mount,
>  	.kill_sb	= kill_litter_super,
> -	.fs_flags	= FS_USERNS_MOUNT,
> +	.fs_flags	= FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>  
>  int __init shmem_init(void)
> -- 
> 1.9.3
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
  2014-08-06  0:57 ` Eric W. Biederman
@ 2014-08-20 15:06     ` Richard Weinberger
  -1 siblings, 0 replies; 67+ messages in thread
From: Richard Weinberger @ 2014-08-20 15:06 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: libvir-list-H+wXaHxf7aLQT0dZR+AlfA, Linux Containers, LKML,
	linux-fsdevel, Linus Torvalds

On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>
> Linus,
>
> Please pull the for-linus branch from the git tree:
>
>    git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus
>
>    HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts
>
> This is a bunch of small changes built against 3.16-rc6.  The most
> significant change for users is the first patch which makes setns
> drmatically faster by removing unneded rcu handling.
>
> The next chunk of changes are so that "mount -o remount,.." will not
> allow the user namespace root to drop flags on a mount set by the system
> wide root.  Aks this forces read-only mounts to stay read-only, no-dev
> mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to
> stay no exec and it prevents unprivileged users from messing with a
> mounts atime settings.  I have included my test case as the last patch
> in this series so people performing backports can verify this change
> works correctly.
>
> The next change fixes a bug in NFS that was discovered while auditing
> nsproxy users for the first optimization.  Today you can oops the kernel
> by reading /proc/fs/nfsfs/{servers,volumes} if you are clever with pid
> namespaces.  I rebased and fixed the build of the !CONFIG_NFS_FS case
> yesterday when a build bot caught my typo.  Given that no one to my
> knowledge bases anything on my tree fixing the typo in place seems more
> responsible that requiring a typo-fix to be backported as well.
>
> The last change is a small semantic cleanup introducing
> /proc/thread-self and pointing /proc/mounts and /proc/net at it.  This
> prevents several kinds of problemantic corner cases.  It is a
> user-visible change so it has a minute chance of causing regressions so
> the change to /proc/mounts and /proc/net are individual one line commits
> that can be trivially reverted.  Unfortunately I lost and could not find
> the email of the original reporter so he is not credited.  From at least
> one perspective this change to /proc/net is a refgression fix to allow
> pthread /proc/net uses that were broken by the introduction of the network
> namespace.
>
> Eric
>
> Eric W. Biederman (11):
>       namespaces: Use task_lock and not rcu to protect nsproxy
>       mnt: Only change user settable mount flags in remount
>       mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount
>       mnt: Correct permission checks in do_remount

This commit breaks libvirt-lxc.
libvirt does in lxcContainerMountBasicFS():

        /*
         * We can't immediately set the MS_RDONLY flag when mounting filesystems
         * because (in at least some kernel versions) this will propagate back
         * to the original mount in the host OS, turning it readonly too. Thus
         * we mount the filesystem in read-write mode initially, and then do a
         * separate read-only bind mount on top of that.
         */
        bindOverReadonly = !!(mnt_mflags & MS_RDONLY);

        VIR_DEBUG("Mount %s on %s type=%s flags=%x",
                  mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY);
        if (mount(mnt_src, mnt->dst, mnt->type, mnt_mflags &
~MS_RDONLY, NULL) < 0) {

^^^^ Here it fails for sysfs because with user namespaces we bind the
existing /sys into the container
and would have to read out all existing mount flags from the current /sys mount.
Otherwise mount() fails with EPERM.
On my test system /sys is mounted with
"rw,nosuid,nodev,noexec,relatime" and libvirt
misses the realtime...

            virReportSystemError(errno,
                                 _("Failed to mount %s on %s type %s flags=%x"),
                                 mnt_src, mnt->dst, NULLSTR(mnt->type),
                                 mnt_mflags & ~MS_RDONLY);
            goto cleanup;
        }

        if (bindOverReadonly &&
            mount(mnt_src, mnt->dst, NULL,
                  MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) {

^^^ Here it fails because now we'd have to specify all flags as used
for the first
mount. For the procfs case MS_NOSUID|MS_NOEXEC|MS_NODEV.
See lxcBasicMounts[].
In this case the fix is easy, add mnt_mflags to the mount flags.

         virReportSystemError(errno,
                                 _("Failed to re-mount %s on %s flags=%x"),
                                 mnt_src, mnt->dst,
                                 MS_BIND|MS_REMOUNT|MS_RDONLY);
            goto cleanup;
        }


-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
@ 2014-08-20 15:06     ` Richard Weinberger
  0 siblings, 0 replies; 67+ messages in thread
From: Richard Weinberger @ 2014-08-20 15:06 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Linux Containers, linux-fsdevel, LKML,
	libvir-list, Daniel P. Berrange

On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Linus,
>
> Please pull the for-linus branch from the git tree:
>
>    git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus
>
>    HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts
>
> This is a bunch of small changes built against 3.16-rc6.  The most
> significant change for users is the first patch which makes setns
> drmatically faster by removing unneded rcu handling.
>
> The next chunk of changes are so that "mount -o remount,.." will not
> allow the user namespace root to drop flags on a mount set by the system
> wide root.  Aks this forces read-only mounts to stay read-only, no-dev
> mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to
> stay no exec and it prevents unprivileged users from messing with a
> mounts atime settings.  I have included my test case as the last patch
> in this series so people performing backports can verify this change
> works correctly.
>
> The next change fixes a bug in NFS that was discovered while auditing
> nsproxy users for the first optimization.  Today you can oops the kernel
> by reading /proc/fs/nfsfs/{servers,volumes} if you are clever with pid
> namespaces.  I rebased and fixed the build of the !CONFIG_NFS_FS case
> yesterday when a build bot caught my typo.  Given that no one to my
> knowledge bases anything on my tree fixing the typo in place seems more
> responsible that requiring a typo-fix to be backported as well.
>
> The last change is a small semantic cleanup introducing
> /proc/thread-self and pointing /proc/mounts and /proc/net at it.  This
> prevents several kinds of problemantic corner cases.  It is a
> user-visible change so it has a minute chance of causing regressions so
> the change to /proc/mounts and /proc/net are individual one line commits
> that can be trivially reverted.  Unfortunately I lost and could not find
> the email of the original reporter so he is not credited.  From at least
> one perspective this change to /proc/net is a refgression fix to allow
> pthread /proc/net uses that were broken by the introduction of the network
> namespace.
>
> Eric
>
> Eric W. Biederman (11):
>       namespaces: Use task_lock and not rcu to protect nsproxy
>       mnt: Only change user settable mount flags in remount
>       mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount
>       mnt: Correct permission checks in do_remount

This commit breaks libvirt-lxc.
libvirt does in lxcContainerMountBasicFS():

        /*
         * We can't immediately set the MS_RDONLY flag when mounting filesystems
         * because (in at least some kernel versions) this will propagate back
         * to the original mount in the host OS, turning it readonly too. Thus
         * we mount the filesystem in read-write mode initially, and then do a
         * separate read-only bind mount on top of that.
         */
        bindOverReadonly = !!(mnt_mflags & MS_RDONLY);

        VIR_DEBUG("Mount %s on %s type=%s flags=%x",
                  mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY);
        if (mount(mnt_src, mnt->dst, mnt->type, mnt_mflags &
~MS_RDONLY, NULL) < 0) {

^^^^ Here it fails for sysfs because with user namespaces we bind the
existing /sys into the container
and would have to read out all existing mount flags from the current /sys mount.
Otherwise mount() fails with EPERM.
On my test system /sys is mounted with
"rw,nosuid,nodev,noexec,relatime" and libvirt
misses the realtime...

            virReportSystemError(errno,
                                 _("Failed to mount %s on %s type %s flags=%x"),
                                 mnt_src, mnt->dst, NULLSTR(mnt->type),
                                 mnt_mflags & ~MS_RDONLY);
            goto cleanup;
        }

        if (bindOverReadonly &&
            mount(mnt_src, mnt->dst, NULL,
                  MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) {

^^^ Here it fails because now we'd have to specify all flags as used
for the first
mount. For the procfs case MS_NOSUID|MS_NOEXEC|MS_NODEV.
See lxcBasicMounts[].
In this case the fix is easy, add mnt_mflags to the mount flags.

         virReportSystemError(errno,
                                 _("Failed to re-mount %s on %s flags=%x"),
                                 mnt_src, mnt->dst,
                                 MS_BIND|MS_REMOUNT|MS_RDONLY);
            goto cleanup;
        }


-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
  2014-08-20 15:06     ` Richard Weinberger
@ 2014-08-21  4:53         ` Eric W. Biederman
  -1 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2014-08-21  4:53 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: libvir-list-H+wXaHxf7aLQT0dZR+AlfA, Linux Containers, LKML,
	linux-fsdevel, Linus Torvalds

Richard Weinberger <richard.weinberger-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>
> This commit breaks libvirt-lxc.
> libvirt does in lxcContainerMountBasicFS():

The bugs fixed are security issues, so if we have to break a small
number of userspace applications we will.  Anything that we can
reasonably do to avoid regressions will be done.

Could you please look at my user-namespace.git#for-next branch I have a
fix for at least one regresion causing issue in there.  I think it may
fix your issues but I am not fully certain more comments below.

>         /*
>          * We can't immediately set the MS_RDONLY flag when mounting filesystems
>          * because (in at least some kernel versions) this will propagate back
>          * to the original mount in the host OS, turning it readonly too. Thus
>          * we mount the filesystem in read-write mode initially, and then do a
>          * separate read-only bind mount on top of that.
>          */
>         bindOverReadonly = !!(mnt_mflags & MS_RDONLY);
>
>         VIR_DEBUG("Mount %s on %s type=%s flags=%x",
>                   mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY);
>         if (mount(mnt_src, mnt->dst, mnt->type, mnt_mflags &
> ~MS_RDONLY, NULL) < 0) {
>
> ^^^^ Here it fails for sysfs because with user namespaces we bind the
> existing /sys into the container
> and would have to read out all existing mount flags from the current /sys mount.
> Otherwise mount() fails with EPERM.
> On my test system /sys is mounted with
> "rw,nosuid,nodev,noexec,relatime" and libvirt
> misses the realtime...

Not specifying any atime flags to mount should be safe as that asks for
the default atime flags which for remount I have made the default atime
flags the existing atime flags.  So I am scratching my head a little on
this one.

>
>             virReportSystemError(errno,
>                                  _("Failed to mount %s on %s type %s flags=%x"),
>                                  mnt_src, mnt->dst, NULLSTR(mnt->type),
>                                  mnt_mflags & ~MS_RDONLY);
>             goto cleanup;
>         }
>
>         if (bindOverReadonly &&
>             mount(mnt_src, mnt->dst, NULL,
>                   MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) {
>
> ^^^ Here it fails because now we'd have to specify all flags as used
> for the first
> mount. For the procfs case MS_NOSUID|MS_NOEXEC|MS_NODEV.
> See lxcBasicMounts[].
> In this case the fix is easy, add mnt_mflags to the mount flags.

That has always been a bug in general because remount has always
required specifying the complete set of mount flags you want to have.

That fact that flags such as nosuid are now properly locked so you can
not change them if you are not the global root user just makes this
obvious.

Andy Lutermorski has observed that statvfs will return the mount flags
making reading them simple.

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
@ 2014-08-21  4:53         ` Eric W. Biederman
  0 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2014-08-21  4:53 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Linus Torvalds, Linux Containers, linux-fsdevel, LKML,
	libvir-list, Daniel P. Berrange

Richard Weinberger <richard.weinberger@gmail.com> writes:

> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> This commit breaks libvirt-lxc.
> libvirt does in lxcContainerMountBasicFS():

The bugs fixed are security issues, so if we have to break a small
number of userspace applications we will.  Anything that we can
reasonably do to avoid regressions will be done.

Could you please look at my user-namespace.git#for-next branch I have a
fix for at least one regresion causing issue in there.  I think it may
fix your issues but I am not fully certain more comments below.

>         /*
>          * We can't immediately set the MS_RDONLY flag when mounting filesystems
>          * because (in at least some kernel versions) this will propagate back
>          * to the original mount in the host OS, turning it readonly too. Thus
>          * we mount the filesystem in read-write mode initially, and then do a
>          * separate read-only bind mount on top of that.
>          */
>         bindOverReadonly = !!(mnt_mflags & MS_RDONLY);
>
>         VIR_DEBUG("Mount %s on %s type=%s flags=%x",
>                   mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY);
>         if (mount(mnt_src, mnt->dst, mnt->type, mnt_mflags &
> ~MS_RDONLY, NULL) < 0) {
>
> ^^^^ Here it fails for sysfs because with user namespaces we bind the
> existing /sys into the container
> and would have to read out all existing mount flags from the current /sys mount.
> Otherwise mount() fails with EPERM.
> On my test system /sys is mounted with
> "rw,nosuid,nodev,noexec,relatime" and libvirt
> misses the realtime...

Not specifying any atime flags to mount should be safe as that asks for
the default atime flags which for remount I have made the default atime
flags the existing atime flags.  So I am scratching my head a little on
this one.

>
>             virReportSystemError(errno,
>                                  _("Failed to mount %s on %s type %s flags=%x"),
>                                  mnt_src, mnt->dst, NULLSTR(mnt->type),
>                                  mnt_mflags & ~MS_RDONLY);
>             goto cleanup;
>         }
>
>         if (bindOverReadonly &&
>             mount(mnt_src, mnt->dst, NULL,
>                   MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) {
>
> ^^^ Here it fails because now we'd have to specify all flags as used
> for the first
> mount. For the procfs case MS_NOSUID|MS_NOEXEC|MS_NODEV.
> See lxcBasicMounts[].
> In this case the fix is easy, add mnt_mflags to the mount flags.

That has always been a bug in general because remount has always
required specifying the complete set of mount flags you want to have.

That fact that flags such as nosuid are now properly locked so you can
not change them if you are not the global root user just makes this
obvious.

Andy Lutermorski has observed that statvfs will return the mount flags
making reading them simple.

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
       [not found]         ` <87vbpm4f4y.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
@ 2014-08-21  6:29           ` Richard Weinberger
  2014-08-21 13:12             ` Christoph Hellwig
  1 sibling, 0 replies; 67+ messages in thread
From: Richard Weinberger @ 2014-08-21  6:29 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: libvir-list-H+wXaHxf7aLQT0dZR+AlfA, Linux Containers, LKML,
	linux-fsdevel, Linus Torvalds

Am 21.08.2014 06:53, schrieb Eric W. Biederman:
> The bugs fixed are security issues, so if we have to break a small
> number of userspace applications we will.  Anything that we can
> reasonably do to avoid regressions will be done.
> 
> Could you please look at my user-namespace.git#for-next branch I have a
> fix for at least one regresion causing issue in there.  I think it may
> fix your issues but I am not fully certain more comments below.

I'll run this on my LXC testbed today.

>>         /*
>>          * We can't immediately set the MS_RDONLY flag when mounting filesystems
>>          * because (in at least some kernel versions) this will propagate back
>>          * to the original mount in the host OS, turning it readonly too. Thus
>>          * we mount the filesystem in read-write mode initially, and then do a
>>          * separate read-only bind mount on top of that.
>>          */
>>         bindOverReadonly = !!(mnt_mflags & MS_RDONLY);
>>
>>         VIR_DEBUG("Mount %s on %s type=%s flags=%x",
>>                   mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY);
>>         if (mount(mnt_src, mnt->dst, mnt->type, mnt_mflags &
>> ~MS_RDONLY, NULL) < 0) {
>>
>> ^^^^ Here it fails for sysfs because with user namespaces we bind the
>> existing /sys into the container
>> and would have to read out all existing mount flags from the current /sys mount.
>> Otherwise mount() fails with EPERM.
>> On my test system /sys is mounted with
>> "rw,nosuid,nodev,noexec,relatime" and libvirt
>> misses the realtime...
> 
> Not specifying any atime flags to mount should be safe as that asks for
> the default atime flags which for remount I have made the default atime
> flags the existing atime flags.  So I am scratching my head a little on
> this one.

Okay, let me find out why exactly libvirt gets a EPERM here.
Maybe there are more odds hidden.

>>
>>             virReportSystemError(errno,
>>                                  _("Failed to mount %s on %s type %s flags=%x"),
>>                                  mnt_src, mnt->dst, NULLSTR(mnt->type),
>>                                  mnt_mflags & ~MS_RDONLY);
>>             goto cleanup;
>>         }
>>
>>         if (bindOverReadonly &&
>>             mount(mnt_src, mnt->dst, NULL,
>>                   MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) {
>>
>> ^^^ Here it fails because now we'd have to specify all flags as used
>> for the first
>> mount. For the procfs case MS_NOSUID|MS_NOEXEC|MS_NODEV.
>> See lxcBasicMounts[].
>> In this case the fix is easy, add mnt_mflags to the mount flags.
> 
> That has always been a bug in general because remount has always
> required specifying the complete set of mount flags you want to have.
> 
> That fact that flags such as nosuid are now properly locked so you can
> not change them if you are not the global root user just makes this
> obvious.
> 
> Andy Lutermorski has observed that statvfs will return the mount flags
> making reading them simple.

Thanks for the clarification, I'll create a fix for libvirt.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
  2014-08-21  4:53         ` Eric W. Biederman
  (?)
  (?)
@ 2014-08-21  6:29         ` Richard Weinberger
       [not found]           ` <53F591E7.3010509-/L3Ra7n9ekc@public.gmane.org>
  -1 siblings, 1 reply; 67+ messages in thread
From: Richard Weinberger @ 2014-08-21  6:29 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Linux Containers, linux-fsdevel, LKML,
	libvir-list, Daniel P. Berrange

Am 21.08.2014 06:53, schrieb Eric W. Biederman:
> The bugs fixed are security issues, so if we have to break a small
> number of userspace applications we will.  Anything that we can
> reasonably do to avoid regressions will be done.
> 
> Could you please look at my user-namespace.git#for-next branch I have a
> fix for at least one regresion causing issue in there.  I think it may
> fix your issues but I am not fully certain more comments below.

I'll run this on my LXC testbed today.

>>         /*
>>          * We can't immediately set the MS_RDONLY flag when mounting filesystems
>>          * because (in at least some kernel versions) this will propagate back
>>          * to the original mount in the host OS, turning it readonly too. Thus
>>          * we mount the filesystem in read-write mode initially, and then do a
>>          * separate read-only bind mount on top of that.
>>          */
>>         bindOverReadonly = !!(mnt_mflags & MS_RDONLY);
>>
>>         VIR_DEBUG("Mount %s on %s type=%s flags=%x",
>>                   mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY);
>>         if (mount(mnt_src, mnt->dst, mnt->type, mnt_mflags &
>> ~MS_RDONLY, NULL) < 0) {
>>
>> ^^^^ Here it fails for sysfs because with user namespaces we bind the
>> existing /sys into the container
>> and would have to read out all existing mount flags from the current /sys mount.
>> Otherwise mount() fails with EPERM.
>> On my test system /sys is mounted with
>> "rw,nosuid,nodev,noexec,relatime" and libvirt
>> misses the realtime...
> 
> Not specifying any atime flags to mount should be safe as that asks for
> the default atime flags which for remount I have made the default atime
> flags the existing atime flags.  So I am scratching my head a little on
> this one.

Okay, let me find out why exactly libvirt gets a EPERM here.
Maybe there are more odds hidden.

>>
>>             virReportSystemError(errno,
>>                                  _("Failed to mount %s on %s type %s flags=%x"),
>>                                  mnt_src, mnt->dst, NULLSTR(mnt->type),
>>                                  mnt_mflags & ~MS_RDONLY);
>>             goto cleanup;
>>         }
>>
>>         if (bindOverReadonly &&
>>             mount(mnt_src, mnt->dst, NULL,
>>                   MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) {
>>
>> ^^^ Here it fails because now we'd have to specify all flags as used
>> for the first
>> mount. For the procfs case MS_NOSUID|MS_NOEXEC|MS_NODEV.
>> See lxcBasicMounts[].
>> In this case the fix is easy, add mnt_mflags to the mount flags.
> 
> That has always been a bug in general because remount has always
> required specifying the complete set of mount flags you want to have.
> 
> That fact that flags such as nosuid are now properly locked so you can
> not change them if you are not the global root user just makes this
> obvious.
> 
> Andy Lutermorski has observed that statvfs will return the mount flags
> making reading them simple.

Thanks for the clarification, I'll create a fix for libvirt.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
  2014-08-21  6:29         ` Richard Weinberger
@ 2014-08-21  7:24               ` Richard Weinberger
  0 siblings, 0 replies; 67+ messages in thread
From: Richard Weinberger @ 2014-08-21  7:24 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: libvir-list-H+wXaHxf7aLQT0dZR+AlfA, Linux Containers, LKML,
	linux-fsdevel, Linus Torvalds

Am 21.08.2014 08:29, schrieb Richard Weinberger:
> Am 21.08.2014 06:53, schrieb Eric W. Biederman:
>> The bugs fixed are security issues, so if we have to break a small
>> number of userspace applications we will.  Anything that we can
>> reasonably do to avoid regressions will be done.
>>
>> Could you please look at my user-namespace.git#for-next branch I have a
>> fix for at least one regresion causing issue in there.  I think it may
>> fix your issues but I am not fully certain more comments below.
> 
> I'll run this on my LXC testbed today.

Looks good. With these patches applied libvirt works again. :)

Thanks,
//richard

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
@ 2014-08-21  7:24               ` Richard Weinberger
  0 siblings, 0 replies; 67+ messages in thread
From: Richard Weinberger @ 2014-08-21  7:24 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Linux Containers, linux-fsdevel, LKML,
	libvir-list, Daniel P. Berrange

Am 21.08.2014 08:29, schrieb Richard Weinberger:
> Am 21.08.2014 06:53, schrieb Eric W. Biederman:
>> The bugs fixed are security issues, so if we have to break a small
>> number of userspace applications we will.  Anything that we can
>> reasonably do to avoid regressions will be done.
>>
>> Could you please look at my user-namespace.git#for-next branch I have a
>> fix for at least one regresion causing issue in there.  I think it may
>> fix your issues but I am not fully certain more comments below.
> 
> I'll run this on my LXC testbed today.

Looks good. With these patches applied libvirt works again. :)

Thanks,
//richard

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
  2014-08-21  4:53         ` Eric W. Biederman
@ 2014-08-21 13:12             ` Christoph Hellwig
  -1 siblings, 0 replies; 67+ messages in thread
From: Christoph Hellwig @ 2014-08-21 13:12 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Richard Weinberger, Linux Containers, LKML,
	libvir-list-H+wXaHxf7aLQT0dZR+AlfA, linux-fsdevel,
	Linus Torvalds

On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote:
> Richard Weinberger <richard.weinberger-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
> 
> > On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> >
> > This commit breaks libvirt-lxc.
> > libvirt does in lxcContainerMountBasicFS():
> 
> The bugs fixed are security issues, so if we have to break a small
> number of userspace applications we will.  Anything that we can
> reasonably do to avoid regressions will be done.

Can you explain the security issues in detail?  Breaking common
userspace like libvirt-lxc with just a little bit of handwaiving is
entirely unacceptable.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
@ 2014-08-21 13:12             ` Christoph Hellwig
  0 siblings, 0 replies; 67+ messages in thread
From: Christoph Hellwig @ 2014-08-21 13:12 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Richard Weinberger, Linus Torvalds, Linux Containers,
	linux-fsdevel, LKML, libvir-list, Daniel P. Berrange

On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote:
> Richard Weinberger <richard.weinberger@gmail.com> writes:
> 
> > On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> >
> > This commit breaks libvirt-lxc.
> > libvirt does in lxcContainerMountBasicFS():
> 
> The bugs fixed are security issues, so if we have to break a small
> number of userspace applications we will.  Anything that we can
> reasonably do to avoid regressions will be done.

Can you explain the security issues in detail?  Breaking common
userspace like libvirt-lxc with just a little bit of handwaiving is
entirely unacceptable.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
  2014-08-21 13:12             ` Christoph Hellwig
@ 2014-08-21 13:22                 ` Richard Weinberger
  -1 siblings, 0 replies; 67+ messages in thread
From: Richard Weinberger @ 2014-08-21 13:22 UTC (permalink / raw)
  To: Christoph Hellwig, Eric W. Biederman
  Cc: libvir-list-H+wXaHxf7aLQT0dZR+AlfA, Linux Containers, LKML,
	linux-fsdevel, Linus Torvalds

Am 21.08.2014 15:12, schrieb Christoph Hellwig:
> On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote:
>> Richard Weinberger <richard.weinberger-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>>
>>> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>>>
>>> This commit breaks libvirt-lxc.
>>> libvirt does in lxcContainerMountBasicFS():
>>
>> The bugs fixed are security issues, so if we have to break a small
>> number of userspace applications we will.  Anything that we can
>> reasonably do to avoid regressions will be done.
> 
> Can you explain the security issues in detail?  Breaking common
> userspace like libvirt-lxc with just a little bit of handwaiving is
> entirely unacceptable.

It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in
git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next
unbreaks libvirt-lxc.
I hope it hits Linus tree and -stable before the offending commit hits users.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
@ 2014-08-21 13:22                 ` Richard Weinberger
  0 siblings, 0 replies; 67+ messages in thread
From: Richard Weinberger @ 2014-08-21 13:22 UTC (permalink / raw)
  To: Christoph Hellwig, Eric W. Biederman
  Cc: Linus Torvalds, Linux Containers, linux-fsdevel, LKML,
	libvir-list, Daniel P. Berrange

Am 21.08.2014 15:12, schrieb Christoph Hellwig:
> On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote:
>> Richard Weinberger <richard.weinberger@gmail.com> writes:
>>
>>> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>>
>>> This commit breaks libvirt-lxc.
>>> libvirt does in lxcContainerMountBasicFS():
>>
>> The bugs fixed are security issues, so if we have to break a small
>> number of userspace applications we will.  Anything that we can
>> reasonably do to avoid regressions will be done.
> 
> Can you explain the security issues in detail?  Breaking common
> userspace like libvirt-lxc with just a little bit of handwaiving is
> entirely unacceptable.

It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in
git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next
unbreaks libvirt-lxc.
I hope it hits Linus tree and -stable before the offending commit hits users.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
  2014-08-21 13:12             ` Christoph Hellwig
@ 2014-08-21 13:43                 ` Eric W. Biederman
  -1 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2014-08-21 13:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Richard Weinberger, Linux Containers, LKML,
	libvir-list-H+wXaHxf7aLQT0dZR+AlfA, linux-fsdevel,
	Linus Torvalds

Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> writes:

> On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote:
>> Richard Weinberger <richard.weinberger-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>> 
>> > On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>> >
>> > This commit breaks libvirt-lxc.
>> > libvirt does in lxcContainerMountBasicFS():
>> 
>> The bugs fixed are security issues, so if we have to break a small
>> number of userspace applications we will.  Anything that we can
>> reasonably do to avoid regressions will be done.
>
> Can you explain the security issues in detail?  Breaking common
> userspace like libvirt-lxc with just a little bit of handwaiving is
> entirely unacceptable.

The biggies ability for an unprivileged user to clear nosuid, nodev,
read-only, noexec with remount.

Apparently part of what libvirt-lxc clearing all mount flags except for
read-only on some filesystem it remounts read-only.  Which if root
mounted the filesystem with nosuid, nodev, or noexec is not supportable.

So if it isn't the implicit setting of nodev on mount but not on remount
causing problems (which I have the trivial fix for) there is a strong
chance I need to break libvirt-lxc.

I am several bugs deep already where fixing one bug reveals yet another
bug.  It is possible there is something else that that I have not
discovered that I am missing.  I am more to happy to investigate to see
if it possible to avoid breaking libvirt-lxc.

Apologies for being a little vague being at the conference I only have
access to email about an hour a day.

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
@ 2014-08-21 13:43                 ` Eric W. Biederman
  0 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2014-08-21 13:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Richard Weinberger, Linus Torvalds, Linux Containers,
	linux-fsdevel, LKML, libvir-list, Daniel P. Berrange

Christoph Hellwig <hch@infradead.org> writes:

> On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote:
>> Richard Weinberger <richard.weinberger@gmail.com> writes:
>> 
>> > On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> >
>> > This commit breaks libvirt-lxc.
>> > libvirt does in lxcContainerMountBasicFS():
>> 
>> The bugs fixed are security issues, so if we have to break a small
>> number of userspace applications we will.  Anything that we can
>> reasonably do to avoid regressions will be done.
>
> Can you explain the security issues in detail?  Breaking common
> userspace like libvirt-lxc with just a little bit of handwaiving is
> entirely unacceptable.

The biggies ability for an unprivileged user to clear nosuid, nodev,
read-only, noexec with remount.

Apparently part of what libvirt-lxc clearing all mount flags except for
read-only on some filesystem it remounts read-only.  Which if root
mounted the filesystem with nosuid, nodev, or noexec is not supportable.

So if it isn't the implicit setting of nodev on mount but not on remount
causing problems (which I have the trivial fix for) there is a strong
chance I need to break libvirt-lxc.

I am several bugs deep already where fixing one bug reveals yet another
bug.  It is possible there is something else that that I have not
discovered that I am missing.  I am more to happy to investigate to see
if it possible to avoid breaking libvirt-lxc.

Apologies for being a little vague being at the conference I only have
access to email about an hour a day.

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
  2014-08-21  7:24               ` Richard Weinberger
@ 2014-08-21 13:54                   ` Eric W. Biederman
  -1 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2014-08-21 13:54 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: libvir-list-H+wXaHxf7aLQT0dZR+AlfA, Linux Containers, LKML,
	linux-fsdevel, Linus Torvalds

Richard Weinberger <richard-/L3Ra7n9ekc@public.gmane.org> writes:

> Am 21.08.2014 08:29, schrieb Richard Weinberger:
>> Am 21.08.2014 06:53, schrieb Eric W. Biederman:
>>> The bugs fixed are security issues, so if we have to break a small
>>> number of userspace applications we will.  Anything that we can
>>> reasonably do to avoid regressions will be done.
>>>
>>> Could you please look at my user-namespace.git#for-next branch I have a
>>> fix for at least one regresion causing issue in there.  I think it may
>>> fix your issues but I am not fully certain more comments below.
>> 
>> I'll run this on my LXC testbed today.
>
> Looks good. With these patches applied libvirt works again. :)

Darn I read my email in the wrong order.  I am glad to hear that my
changes were enough to fix libvirt-lxc.

I will aim at pushing this to Linus after the conference is over and I
can trust myself to think clearly.

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
@ 2014-08-21 13:54                   ` Eric W. Biederman
  0 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2014-08-21 13:54 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Linus Torvalds, Linux Containers, linux-fsdevel, LKML,
	libvir-list, Daniel P. Berrange

Richard Weinberger <richard@nod.at> writes:

> Am 21.08.2014 08:29, schrieb Richard Weinberger:
>> Am 21.08.2014 06:53, schrieb Eric W. Biederman:
>>> The bugs fixed are security issues, so if we have to break a small
>>> number of userspace applications we will.  Anything that we can
>>> reasonably do to avoid regressions will be done.
>>>
>>> Could you please look at my user-namespace.git#for-next branch I have a
>>> fix for at least one regresion causing issue in there.  I think it may
>>> fix your issues but I am not fully certain more comments below.
>> 
>> I'll run this on my LXC testbed today.
>
> Looks good. With these patches applied libvirt works again. :)

Darn I read my email in the wrong order.  I am glad to hear that my
changes were enough to fix libvirt-lxc.

I will aim at pushing this to Linus after the conference is over and I
can trust myself to think clearly.

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
  2014-08-21 13:22                 ` Richard Weinberger
@ 2014-08-21 14:09                     ` Eric W. Biederman
  -1 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2014-08-21 14:09 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: libvir-list-H+wXaHxf7aLQT0dZR+AlfA, Linux Containers, LKML,
	Christoph Hellwig, linux-fsdevel, Linus Torvalds

Richard Weinberger <richard-/L3Ra7n9ekc@public.gmane.org> writes:

> Am 21.08.2014 15:12, schrieb Christoph Hellwig:
>> On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote:
>>> Richard Weinberger <richard.weinberger-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>>>
>>>> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>>>>
>>>> This commit breaks libvirt-lxc.
>>>> libvirt does in lxcContainerMountBasicFS():
>>>
>>> The bugs fixed are security issues, so if we have to break a small
>>> number of userspace applications we will.  Anything that we can
>>> reasonably do to avoid regressions will be done.
>> 
>> Can you explain the security issues in detail?  Breaking common
>> userspace like libvirt-lxc with just a little bit of handwaiving is
>> entirely unacceptable.
>
> It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in
> git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next
> unbreaks libvirt-lxc.
> I hope it hits Linus tree and -stable before the offending commit hits users.

I plan to send the pull request to Linus as soon as I have caught my
breath (from all of the conferences this week) that I can be certain I
am thinking clearly and not rushing things.

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
@ 2014-08-21 14:09                     ` Eric W. Biederman
  0 siblings, 0 replies; 67+ messages in thread
From: Eric W. Biederman @ 2014-08-21 14:09 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Christoph Hellwig, Linus Torvalds, Linux Containers,
	linux-fsdevel, LKML, libvir-list, Daniel P. Berrange

Richard Weinberger <richard@nod.at> writes:

> Am 21.08.2014 15:12, schrieb Christoph Hellwig:
>> On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote:
>>> Richard Weinberger <richard.weinberger@gmail.com> writes:
>>>
>>>> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>>>
>>>> This commit breaks libvirt-lxc.
>>>> libvirt does in lxcContainerMountBasicFS():
>>>
>>> The bugs fixed are security issues, so if we have to break a small
>>> number of userspace applications we will.  Anything that we can
>>> reasonably do to avoid regressions will be done.
>> 
>> Can you explain the security issues in detail?  Breaking common
>> userspace like libvirt-lxc with just a little bit of handwaiving is
>> entirely unacceptable.
>
> It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in
> git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next
> unbreaks libvirt-lxc.
> I hope it hits Linus tree and -stable before the offending commit hits users.

I plan to send the pull request to Linus as soon as I have caught my
breath (from all of the conferences this week) that I can be certain I
am thinking clearly and not rushing things.

Eric

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns
       [not found]                       ` <2686c32f00b14148379e8cfee9c028c794d4aa1a.1407974494.git.luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
  2014-08-15 19:05                         ` Serge Hallyn
  2014-08-15 20:16                         ` Serge Hallyn
@ 2014-08-28  1:35                         ` Andy Lutomirski
  2 siblings, 0 replies; 67+ messages in thread
From: Andy Lutomirski @ 2014-08-28  1:35 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, Serge Hallyn,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, stable, Andy Lutomirski,
	Kenton Varda, Linux FS Devel, Linus Torvalds

On Wed, Aug 13, 2014 at 5:03 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
> Currently, creating a new mount (as opposed to bindmount) in a
> non-root userns will implicitly set nodev unless the fs is devpts.
> Something like this will be necessary for file systems that allow
> the mounter to create device nodes without using mknod (e.g. FUSE
> if/when that is allowed), but none of the currently allowed
> filesystems do this.
>

Any thoughts, Eric?

> Implicitly adding nodev is problematic, though.  It will make it
> unsafe to ever remove the implicit addition, since userspace might
> start to rely on it.
>
> This fixes a minor regression in:
>
>     9566d6742852 mnt: Correct permission checks in do_remount
>
> Prior to that commit, MNT_NODEV wasn't enforced for remounts, so
> there is existing user code that creates a new mount in a userns
> without MS_NODEV and then expects a remount with matching options to
> work.  That commit broke code that does this.
>
> Fortunately, since the implicit nodev has no effect on any existing
> filesystems, we can still safely remove it.
>
> This replaces the implicit nodev with an explicit nodev requirement:
> anyone who mounts a filesystem without FS_USERNS_DEV_MOUNT will get
> -EPERM unless they set nodev.  If they set nodev, that setting will
> be locked.
>
> As an added benefit, if anything like device namespaces is ever
> added, then user code will be able to opt out of nodev by clearing
> nodev.
>
> To keep existing code working, this adds FS_USERNS_DEV_MOUNT to all
> FS_USERNS_MOUNT filesystems.  All of the current filesystems with
> FS_USERNS_MOUNT set are safe.
>
> I confirmed that this is compatible with Sandstorm's revision
> 1bf0c4847b.  That revision of Sandstorm does not work without this
> fix if 9566d6742852 is applied.
>
> Cc: Kenton Varda <kenton-AuYgBwuPrUQTaNkGU808tA@public.gmane.org>
> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Signed-off-by: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
> ---
>  fs/namespace.c   | 16 ++++++++++++----
>  fs/proc/root.c   |  2 +-
>  fs/ramfs/inode.c |  2 +-
>  fs/sysfs/mount.c |  2 +-
>  ipc/mqueue.c     |  2 +-
>  mm/shmem.c       |  4 ++--
>  6 files changed, 18 insertions(+), 10 deletions(-)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 0acabea58319..835fa9e8307e 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2154,12 +2154,20 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
>                         put_filesystem(type);
>                         return -EPERM;
>                 }
> -               /* Only in special cases allow devices from mounts
> -                * created outside the initial user namespace.
> +
> +               /*
> +                * If a filesystem might allow the mounter to put
> +                * device nodes on it without the checks in mknod,
> +                * then require MS_NODEV to mount it.
>                  */
>                 if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) {
> -                       flags |= MS_NODEV;
> -                       mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV;
> +                       if (!(mnt_flags & MNT_NODEV)) {
> +                               put_filesystem(type);
> +                               return -EPERM;
> +                       }
> +
> +                       /* Do not allow nodev to be cleared. */
> +                       mnt_flags |= MNT_LOCK_NODEV;
>                 }
>         }
>
> diff --git a/fs/proc/root.c b/fs/proc/root.c
> index 094e44d4a6be..2313b280729e 100644
> --- a/fs/proc/root.c
> +++ b/fs/proc/root.c
> @@ -159,7 +159,7 @@ static struct file_system_type proc_fs_type = {
>         .name           = "proc",
>         .mount          = proc_mount,
>         .kill_sb        = proc_kill_sb,
> -       .fs_flags       = FS_USERNS_MOUNT,
> +       .fs_flags       = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>
>  void __init proc_root_init(void)
> diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c
> index d365b1c4eb3c..b95b7302d4cc 100644
> --- a/fs/ramfs/inode.c
> +++ b/fs/ramfs/inode.c
> @@ -261,7 +261,7 @@ static struct file_system_type ramfs_fs_type = {
>         .name           = "ramfs",
>         .mount          = ramfs_mount,
>         .kill_sb        = ramfs_kill_sb,
> -       .fs_flags       = FS_USERNS_MOUNT,
> +       .fs_flags       = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>
>  int __init init_ramfs_fs(void)
> diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
> index 8a49486bf30c..56ba59317e24 100644
> --- a/fs/sysfs/mount.c
> +++ b/fs/sysfs/mount.c
> @@ -58,7 +58,7 @@ static struct file_system_type sysfs_fs_type = {
>         .name           = "sysfs",
>         .mount          = sysfs_mount,
>         .kill_sb        = sysfs_kill_sb,
> -       .fs_flags       = FS_USERNS_MOUNT,
> +       .fs_flags       = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>
>  int __init sysfs_init(void)
> diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> index 4fcf39af1776..56abbc848d4c 100644
> --- a/ipc/mqueue.c
> +++ b/ipc/mqueue.c
> @@ -1394,7 +1394,7 @@ static struct file_system_type mqueue_fs_type = {
>         .name = "mqueue",
>         .mount = mqueue_mount,
>         .kill_sb = kill_litter_super,
> -       .fs_flags = FS_USERNS_MOUNT,
> +       .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>
>  int mq_init_ns(struct ipc_namespace *ns)
> diff --git a/mm/shmem.c b/mm/shmem.c
> index a42add14331c..f4a708a8f9e3 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -3149,7 +3149,7 @@ static struct file_system_type shmem_fs_type = {
>         .name           = "tmpfs",
>         .mount          = shmem_mount,
>         .kill_sb        = kill_litter_super,
> -       .fs_flags       = FS_USERNS_MOUNT,
> +       .fs_flags       = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>
>  int __init shmem_init(void)
> @@ -3208,7 +3208,7 @@ static struct file_system_type shmem_fs_type = {
>         .name           = "tmpfs",
>         .mount          = ramfs_mount,
>         .kill_sb        = kill_litter_super,
> -       .fs_flags       = FS_USERNS_MOUNT,
> +       .fs_flags       = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>
>  int __init shmem_init(void)
> --
> 1.9.3
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] fs: Remove implicit nodev for new mounts in non-root userns
  2014-08-14  0:03                     ` Andy Lutomirski
                                         ` (2 preceding siblings ...)
  2014-08-15 20:16                       ` Serge Hallyn
@ 2014-08-28  1:35                       ` Andy Lutomirski
  3 siblings, 0 replies; 67+ messages in thread
From: Andy Lutomirski @ 2014-08-28  1:35 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linux Containers, Linux FS Devel, Linus Torvalds,
	Serge Hallyn, Andy Lutomirski, Kenton Varda, stable

On Wed, Aug 13, 2014 at 5:03 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> Currently, creating a new mount (as opposed to bindmount) in a
> non-root userns will implicitly set nodev unless the fs is devpts.
> Something like this will be necessary for file systems that allow
> the mounter to create device nodes without using mknod (e.g. FUSE
> if/when that is allowed), but none of the currently allowed
> filesystems do this.
>

Any thoughts, Eric?

> Implicitly adding nodev is problematic, though.  It will make it
> unsafe to ever remove the implicit addition, since userspace might
> start to rely on it.
>
> This fixes a minor regression in:
>
>     9566d6742852 mnt: Correct permission checks in do_remount
>
> Prior to that commit, MNT_NODEV wasn't enforced for remounts, so
> there is existing user code that creates a new mount in a userns
> without MS_NODEV and then expects a remount with matching options to
> work.  That commit broke code that does this.
>
> Fortunately, since the implicit nodev has no effect on any existing
> filesystems, we can still safely remove it.
>
> This replaces the implicit nodev with an explicit nodev requirement:
> anyone who mounts a filesystem without FS_USERNS_DEV_MOUNT will get
> -EPERM unless they set nodev.  If they set nodev, that setting will
> be locked.
>
> As an added benefit, if anything like device namespaces is ever
> added, then user code will be able to opt out of nodev by clearing
> nodev.
>
> To keep existing code working, this adds FS_USERNS_DEV_MOUNT to all
> FS_USERNS_MOUNT filesystems.  All of the current filesystems with
> FS_USERNS_MOUNT set are safe.
>
> I confirmed that this is compatible with Sandstorm's revision
> 1bf0c4847b.  That revision of Sandstorm does not work without this
> fix if 9566d6742852 is applied.
>
> Cc: Kenton Varda <kenton@sandstorm.io>
> Cc: stable@vger.kernel.org
> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
> ---
>  fs/namespace.c   | 16 ++++++++++++----
>  fs/proc/root.c   |  2 +-
>  fs/ramfs/inode.c |  2 +-
>  fs/sysfs/mount.c |  2 +-
>  ipc/mqueue.c     |  2 +-
>  mm/shmem.c       |  4 ++--
>  6 files changed, 18 insertions(+), 10 deletions(-)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 0acabea58319..835fa9e8307e 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2154,12 +2154,20 @@ static int do_new_mount(struct path *path, const char *fstype, int flags,
>                         put_filesystem(type);
>                         return -EPERM;
>                 }
> -               /* Only in special cases allow devices from mounts
> -                * created outside the initial user namespace.
> +
> +               /*
> +                * If a filesystem might allow the mounter to put
> +                * device nodes on it without the checks in mknod,
> +                * then require MS_NODEV to mount it.
>                  */
>                 if (!(type->fs_flags & FS_USERNS_DEV_MOUNT)) {
> -                       flags |= MS_NODEV;
> -                       mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV;
> +                       if (!(mnt_flags & MNT_NODEV)) {
> +                               put_filesystem(type);
> +                               return -EPERM;
> +                       }
> +
> +                       /* Do not allow nodev to be cleared. */
> +                       mnt_flags |= MNT_LOCK_NODEV;
>                 }
>         }
>
> diff --git a/fs/proc/root.c b/fs/proc/root.c
> index 094e44d4a6be..2313b280729e 100644
> --- a/fs/proc/root.c
> +++ b/fs/proc/root.c
> @@ -159,7 +159,7 @@ static struct file_system_type proc_fs_type = {
>         .name           = "proc",
>         .mount          = proc_mount,
>         .kill_sb        = proc_kill_sb,
> -       .fs_flags       = FS_USERNS_MOUNT,
> +       .fs_flags       = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>
>  void __init proc_root_init(void)
> diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c
> index d365b1c4eb3c..b95b7302d4cc 100644
> --- a/fs/ramfs/inode.c
> +++ b/fs/ramfs/inode.c
> @@ -261,7 +261,7 @@ static struct file_system_type ramfs_fs_type = {
>         .name           = "ramfs",
>         .mount          = ramfs_mount,
>         .kill_sb        = ramfs_kill_sb,
> -       .fs_flags       = FS_USERNS_MOUNT,
> +       .fs_flags       = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>
>  int __init init_ramfs_fs(void)
> diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
> index 8a49486bf30c..56ba59317e24 100644
> --- a/fs/sysfs/mount.c
> +++ b/fs/sysfs/mount.c
> @@ -58,7 +58,7 @@ static struct file_system_type sysfs_fs_type = {
>         .name           = "sysfs",
>         .mount          = sysfs_mount,
>         .kill_sb        = sysfs_kill_sb,
> -       .fs_flags       = FS_USERNS_MOUNT,
> +       .fs_flags       = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>
>  int __init sysfs_init(void)
> diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> index 4fcf39af1776..56abbc848d4c 100644
> --- a/ipc/mqueue.c
> +++ b/ipc/mqueue.c
> @@ -1394,7 +1394,7 @@ static struct file_system_type mqueue_fs_type = {
>         .name = "mqueue",
>         .mount = mqueue_mount,
>         .kill_sb = kill_litter_super,
> -       .fs_flags = FS_USERNS_MOUNT,
> +       .fs_flags = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>
>  int mq_init_ns(struct ipc_namespace *ns)
> diff --git a/mm/shmem.c b/mm/shmem.c
> index a42add14331c..f4a708a8f9e3 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -3149,7 +3149,7 @@ static struct file_system_type shmem_fs_type = {
>         .name           = "tmpfs",
>         .mount          = shmem_mount,
>         .kill_sb        = kill_litter_super,
> -       .fs_flags       = FS_USERNS_MOUNT,
> +       .fs_flags       = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>
>  int __init shmem_init(void)
> @@ -3208,7 +3208,7 @@ static struct file_system_type shmem_fs_type = {
>         .name           = "tmpfs",
>         .mount          = ramfs_mount,
>         .kill_sb        = kill_litter_super,
> -       .fs_flags       = FS_USERNS_MOUNT,
> +       .fs_flags       = FS_USERNS_MOUNT | FS_USERNS_DEV_MOUNT,
>  };
>
>  int __init shmem_init(void)
> --
> 1.9.3
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
  2014-08-21 14:09                     ` Eric W. Biederman
@ 2014-09-03 21:18                         ` Richard Weinberger
  -1 siblings, 0 replies; 67+ messages in thread
From: Richard Weinberger @ 2014-09-03 21:18 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: libvir-list-H+wXaHxf7aLQT0dZR+AlfA, Richard Weinberger,
	Linux Containers, LKML, Christoph Hellwig, linux-fsdevel,
	Linus Torvalds

On Thu, Aug 21, 2014 at 4:09 PM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>> It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in
>> git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next
>> unbreaks libvirt-lxc.
>> I hope it hits Linus tree and -stable before the offending commit hits users.
>
> I plan to send the pull request to Linus as soon as I have caught my
> breath (from all of the conferences this week) that I can be certain I
> am thinking clearly and not rushing things.

*kind reminder* :-)

-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
@ 2014-09-03 21:18                         ` Richard Weinberger
  0 siblings, 0 replies; 67+ messages in thread
From: Richard Weinberger @ 2014-09-03 21:18 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Richard Weinberger, Christoph Hellwig, Linus Torvalds,
	Linux Containers, linux-fsdevel, LKML, libvir-list,
	Daniel P. Berrange

On Thu, Aug 21, 2014 at 4:09 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
>> It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in
>> git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next
>> unbreaks libvirt-lxc.
>> I hope it hits Linus tree and -stable before the offending commit hits users.
>
> I plan to send the pull request to Linus as soon as I have caught my
> breath (from all of the conferences this week) that I can be certain I
> am thinking clearly and not rushing things.

*kind reminder* :-)

-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
  2014-08-21 14:09                     ` Eric W. Biederman
@ 2014-11-25 23:15                         ` Richard Weinberger
  -1 siblings, 0 replies; 67+ messages in thread
From: Richard Weinberger @ 2014-11-25 23:15 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: libvir-list-H+wXaHxf7aLQT0dZR+AlfA, Richard Weinberger,
	Linux Containers, LKML, Christoph Hellwig, linux-fsdevel,
	Linus Torvalds

Eric,

On Thu, Aug 21, 2014 at 4:09 PM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> Richard Weinberger <richard-/L3Ra7n9ekc@public.gmane.org> writes:
>
>> Am 21.08.2014 15:12, schrieb Christoph Hellwig:
>>> On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote:
>>>> Richard Weinberger <richard.weinberger-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>>>>
>>>>> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>>>>>
>>>>> This commit breaks libvirt-lxc.
>>>>> libvirt does in lxcContainerMountBasicFS():
>>>>
>>>> The bugs fixed are security issues, so if we have to break a small
>>>> number of userspace applications we will.  Anything that we can
>>>> reasonably do to avoid regressions will be done.
>>>
>>> Can you explain the security issues in detail?  Breaking common
>>> userspace like libvirt-lxc with just a little bit of handwaiving is
>>> entirely unacceptable.
>>
>> It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in
>> git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next
>> unbreaks libvirt-lxc.
>> I hope it hits Linus tree and -stable before the offending commit hits users.
>
> I plan to send the pull request to Linus as soon as I have caught my
> breath (from all of the conferences this week) that I can be certain I
> am thinking clearly and not rushing things.

Today I've upgraded my LXC testbed to the most recent kernel and found
libvirt-lxc broken again (sic!).
Remounting /proc/sys/ is failing.
Investigating into the issue showed that commit "mnt: Implicitly add
MNT_NODEV on remount as we do on mount"
is not mainline.
Why did you left out this patch? In my previous mails I explicitly
stated that exactly this commit unbreaks libvirt-lxc.

Now the userspace breaking changes are mainline and hit users hard. :-(

-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
@ 2014-11-25 23:15                         ` Richard Weinberger
  0 siblings, 0 replies; 67+ messages in thread
From: Richard Weinberger @ 2014-11-25 23:15 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Richard Weinberger, Christoph Hellwig, Linus Torvalds,
	Linux Containers, linux-fsdevel, LKML, libvir-list,
	Daniel P. Berrange

Eric,

On Thu, Aug 21, 2014 at 4:09 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Richard Weinberger <richard@nod.at> writes:
>
>> Am 21.08.2014 15:12, schrieb Christoph Hellwig:
>>> On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote:
>>>> Richard Weinberger <richard.weinberger@gmail.com> writes:
>>>>
>>>>> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>>>>
>>>>> This commit breaks libvirt-lxc.
>>>>> libvirt does in lxcContainerMountBasicFS():
>>>>
>>>> The bugs fixed are security issues, so if we have to break a small
>>>> number of userspace applications we will.  Anything that we can
>>>> reasonably do to avoid regressions will be done.
>>>
>>> Can you explain the security issues in detail?  Breaking common
>>> userspace like libvirt-lxc with just a little bit of handwaiving is
>>> entirely unacceptable.
>>
>> It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in
>> git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next
>> unbreaks libvirt-lxc.
>> I hope it hits Linus tree and -stable before the offending commit hits users.
>
> I plan to send the pull request to Linus as soon as I have caught my
> breath (from all of the conferences this week) that I can be certain I
> am thinking clearly and not rushing things.

Today I've upgraded my LXC testbed to the most recent kernel and found
libvirt-lxc broken again (sic!).
Remounting /proc/sys/ is failing.
Investigating into the issue showed that commit "mnt: Implicitly add
MNT_NODEV on remount as we do on mount"
is not mainline.
Why did you left out this patch? In my previous mails I explicitly
stated that exactly this commit unbreaks libvirt-lxc.

Now the userspace breaking changes are mainline and hit users hard. :-(

-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
  2014-11-25 23:15                         ` Richard Weinberger
@ 2014-11-29 16:58                             ` Richard Weinberger
  -1 siblings, 0 replies; 67+ messages in thread
From: Richard Weinberger @ 2014-11-29 16:58 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: libvir-list-H+wXaHxf7aLQT0dZR+AlfA, Linux Containers, LKML,
	Christoph Hellwig, linux-fsdevel, Linus Torvalds

Am 26.11.2014 um 00:15 schrieb Richard Weinberger:
> Eric,
> 
> On Thu, Aug 21, 2014 at 4:09 PM, Eric W. Biederman
> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>> Richard Weinberger <richard-/L3Ra7n9ekc@public.gmane.org> writes:
>>
>>> Am 21.08.2014 15:12, schrieb Christoph Hellwig:
>>>> On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote:
>>>>> Richard Weinberger <richard.weinberger-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>>>>>
>>>>>> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>>>>>>
>>>>>> This commit breaks libvirt-lxc.
>>>>>> libvirt does in lxcContainerMountBasicFS():
>>>>>
>>>>> The bugs fixed are security issues, so if we have to break a small
>>>>> number of userspace applications we will.  Anything that we can
>>>>> reasonably do to avoid regressions will be done.
>>>>
>>>> Can you explain the security issues in detail?  Breaking common
>>>> userspace like libvirt-lxc with just a little bit of handwaiving is
>>>> entirely unacceptable.
>>>
>>> It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in
>>> git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next
>>> unbreaks libvirt-lxc.
>>> I hope it hits Linus tree and -stable before the offending commit hits users.
>>
>> I plan to send the pull request to Linus as soon as I have caught my
>> breath (from all of the conferences this week) that I can be certain I
>> am thinking clearly and not rushing things.
> 
> Today I've upgraded my LXC testbed to the most recent kernel and found
> libvirt-lxc broken again (sic!).
> Remounting /proc/sys/ is failing.
> Investigating into the issue showed that commit "mnt: Implicitly add
> MNT_NODEV on remount as we do on mount"
> is not mainline.
> Why did you left out this patch? In my previous mails I explicitly
> stated that exactly this commit unbreaks libvirt-lxc.
> 
> Now the userspace breaking changes are mainline and hit users hard. :-(

*kind ping*
...to make sure that this issue doesn't get lost.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [GIT PULL] namespace updates for v3.17-rc1
@ 2014-11-29 16:58                             ` Richard Weinberger
  0 siblings, 0 replies; 67+ messages in thread
From: Richard Weinberger @ 2014-11-29 16:58 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Christoph Hellwig, Linus Torvalds, Linux Containers,
	linux-fsdevel, LKML, libvir-list, Daniel P. Berrange

Am 26.11.2014 um 00:15 schrieb Richard Weinberger:
> Eric,
> 
> On Thu, Aug 21, 2014 at 4:09 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> Richard Weinberger <richard@nod.at> writes:
>>
>>> Am 21.08.2014 15:12, schrieb Christoph Hellwig:
>>>> On Wed, Aug 20, 2014 at 09:53:49PM -0700, Eric W. Biederman wrote:
>>>>> Richard Weinberger <richard.weinberger@gmail.com> writes:
>>>>>
>>>>>> On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>>>>>
>>>>>> This commit breaks libvirt-lxc.
>>>>>> libvirt does in lxcContainerMountBasicFS():
>>>>>
>>>>> The bugs fixed are security issues, so if we have to break a small
>>>>> number of userspace applications we will.  Anything that we can
>>>>> reasonably do to avoid regressions will be done.
>>>>
>>>> Can you explain the security issues in detail?  Breaking common
>>>> userspace like libvirt-lxc with just a little bit of handwaiving is
>>>> entirely unacceptable.
>>>
>>> It looks like commit 87b47932f40a11280584bce260cbdb3b5f9e8b7d in
>>> git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git for-next
>>> unbreaks libvirt-lxc.
>>> I hope it hits Linus tree and -stable before the offending commit hits users.
>>
>> I plan to send the pull request to Linus as soon as I have caught my
>> breath (from all of the conferences this week) that I can be certain I
>> am thinking clearly and not rushing things.
> 
> Today I've upgraded my LXC testbed to the most recent kernel and found
> libvirt-lxc broken again (sic!).
> Remounting /proc/sys/ is failing.
> Investigating into the issue showed that commit "mnt: Implicitly add
> MNT_NODEV on remount as we do on mount"
> is not mainline.
> Why did you left out this patch? In my previous mails I explicitly
> stated that exactly this commit unbreaks libvirt-lxc.
> 
> Now the userspace breaking changes are mainline and hit users hard. :-(

*kind ping*
...to make sure that this issue doesn't get lost.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 67+ messages in thread

end of thread, other threads:[~2014-11-29 16:58 UTC | newest]

Thread overview: 67+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-06  0:57 [GIT PULL] namespace updates for v3.17-rc1 Eric W. Biederman
2014-08-06  0:57 ` Eric W. Biederman
     [not found] ` <87fvhav3ic.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2014-08-06  4:46   ` Stephen Rothwell
2014-08-06  4:46     ` Stephen Rothwell
2014-08-06  4:46     ` Stephen Rothwell
     [not found]     ` <20140806144643.45e5dab8-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org>
2014-08-06  5:16       ` Eric W. Biederman
2014-08-06  5:16         ` Eric W. Biederman
     [not found]         ` <87lhr2tcyx.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2014-08-06  6:06           ` Stephen Rothwell
2014-08-06  6:06             ` Stephen Rothwell
2014-08-06  6:06             ` Stephen Rothwell
     [not found]             ` <20140806160608.218b6944-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org>
2014-08-06  6:30               ` Eric W. Biederman
2014-08-06  6:30                 ` Eric W. Biederman
2014-08-07 13:28               ` Theodore Ts'o
2014-08-07 13:28                 ` Theodore Ts'o
2014-08-13  2:46   ` Andy Lutomirski
2014-08-13  2:46     ` Andy Lutomirski
     [not found]     ` <53EAD180.4010906-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
2014-08-13  4:17       ` Eric W. Biederman
2014-08-13  4:17         ` Eric W. Biederman
     [not found]         ` <87sil1nhut.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2014-08-13  4:38           ` Andy Lutomirski
2014-08-13  4:38             ` Andy Lutomirski
2014-08-13  4:45           ` Kenton Varda
     [not found]             ` <CAOP=4widH1rMZ1O=hzAT+M_8exdzRPA8pJ+wH29AQ9L0ogu9nw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-08-13 10:24               ` Eric W. Biederman
2014-08-13 10:24                 ` Eric W. Biederman
     [not found]                 ` <87tx5ghekp.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2014-08-13 17:03                   ` Andy Lutomirski
2014-08-13 17:03                     ` Andy Lutomirski
     [not found]                     ` <CALCETrWT_p1-5nkiAjWoeta19fkO3rDiJe9_mhRVqF8x1zXv2A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-08-14  0:03                       ` [PATCH] fs: Remove implicit nodev for new mounts in non-root userns Andy Lutomirski
2014-08-14  0:03                     ` Andy Lutomirski
2014-08-15 19:05                       ` Serge Hallyn
2014-08-15 19:16                         ` Andy Lutomirski
2014-08-15 19:16                           ` Andy Lutomirski
     [not found]                           ` <CALCETrVKq1Fxnsd9jKDi5_fcKfCJxBZ1w-zGXD3FR-pF-jLsmQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-08-15 19:37                             ` Serge Hallyn
2014-08-15 19:37                               ` Serge Hallyn
2014-08-15 19:56                               ` Andy Lutomirski
2014-08-15 19:56                                 ` Andy Lutomirski
     [not found]                                 ` <CALCETrWB0qBiyfJbapFnjxoNyNvS+aHvgc_eob3fC1j=cv+v5w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-08-15 20:16                                   ` Serge Hallyn
2014-08-15 20:16                                     ` Serge Hallyn
     [not found]                       ` <2686c32f00b14148379e8cfee9c028c794d4aa1a.1407974494.git.luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
2014-08-15 19:05                         ` Serge Hallyn
2014-08-15 20:16                         ` Serge Hallyn
2014-08-28  1:35                         ` Andy Lutomirski
2014-08-15 20:16                       ` Serge Hallyn
2014-08-28  1:35                       ` Andy Lutomirski
2014-08-15 18:41                   ` [GIT PULL] namespace updates for v3.17-rc1 Andy Lutomirski
2014-08-15 18:41                 ` Andy Lutomirski
2014-08-20 15:06   ` Richard Weinberger
2014-08-20 15:06     ` Richard Weinberger
     [not found]     ` <CAFLxGvwi-iJRyfwv8v9fcRkiSu2d-az8W55xMPbp_d8wQKmwjg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-08-21  4:53       ` Eric W. Biederman
2014-08-21  4:53         ` Eric W. Biederman
     [not found]         ` <87vbpm4f4y.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2014-08-21  6:29           ` Richard Weinberger
2014-08-21 13:12           ` Christoph Hellwig
2014-08-21 13:12             ` Christoph Hellwig
     [not found]             ` <20140821131257.GA4264-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2014-08-21 13:22               ` Richard Weinberger
2014-08-21 13:22                 ` Richard Weinberger
     [not found]                 ` <53F5F2AD.5010607-/L3Ra7n9ekc@public.gmane.org>
2014-08-21 14:09                   ` Eric W. Biederman
2014-08-21 14:09                     ` Eric W. Biederman
     [not found]                     ` <87k362vsr5.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2014-09-03 21:18                       ` Richard Weinberger
2014-09-03 21:18                         ` Richard Weinberger
2014-11-25 23:15                       ` Richard Weinberger
2014-11-25 23:15                         ` Richard Weinberger
     [not found]                         ` <CAFLxGvzyhHC+QF-bFfp-yNBpCkS3JJ+RAr+5iCj0k_su9wJbGw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-11-29 16:58                           ` Richard Weinberger
2014-11-29 16:58                             ` Richard Weinberger
2014-08-21 13:43               ` Eric W. Biederman
2014-08-21 13:43                 ` Eric W. Biederman
2014-08-21  6:29         ` Richard Weinberger
     [not found]           ` <53F591E7.3010509-/L3Ra7n9ekc@public.gmane.org>
2014-08-21  7:24             ` Richard Weinberger
2014-08-21  7:24               ` Richard Weinberger
     [not found]               ` <53F59EC7.6060107-/L3Ra7n9ekc@public.gmane.org>
2014-08-21 13:54                 ` Eric W. Biederman
2014-08-21 13:54                   ` Eric W. Biederman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.