* [REVIEW][PATCH 0/43] Completing the user namespace
@ 2012-04-08  5:10 ` Eric W. Biederman
  0 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:10 UTC
  To: linux-kernel
  Cc: linux-fsdevel, linux-security-module, Linux Containers,
	Serge E. Hallyn, Andrew Morton, Linus Torvalds, Al Viro,
	Cyrill Gorcunov


This is a course correction for the user namespace, so that we can reach
an inexpensive, maintainable, and reasonably complete implementation.

If anyone can think of a reason why the user namespace should not
evolve in the direction taken in this patchset, please let me know.

There is not an obvious maintainer for the scope of what this patchset
covers so I intend to host this tree myself and to place it in
linux-next after this round of review.

Highlights.
- The kernel will now fail to build if you attempt to compile in
  code whose permission checks have not been updated to be user
  namespace safe.

- All uids from child user namespaces are mapped into the initial user
  namespace before they are processed, removing the need for an
  additional check that the uids being compared come from the same
  user namespace.

- With the user namespaces compiled out the performance is as good as or
  better than it is today.

- For most operations absolutely nothing changes, performance-wise or
  operationally, with the user namespace enabled.

- The worst case performance I could come up with was timing 1 billion
  cache cold stat operations with the user namespace code enabled.  This
  went from 156s to 164s on my laptop (or 156ns to 164ns per stat
  operation).

- (uid_t)-1 and (gid_t)-1 are reserved as internal error values.
  Most uid/gid setting system calls treat these values specially anyway,
  so attempting to use -1 as a uid would likely cause entertaining
  failures in userspace.

- If setuid is called with a uid that cannot be mapped, setuid fails.
  I have looked at sendmail, login, ssh and every other program I could
  think of that would call setuid and they all check for and handle
  the case where setuid fails.

- If stat or a similar system call is called from a context in which we
  cannot map a uid, we lie and return overflowuid.  The LFS experience
  suggests not lying and returning an error code might be better, but
  the historical precedent with uids is different and I can't figure out
  what would break by lying about a uid we can't map.

- Capabilities are localized to the current user namespace making
  it safe to give the initial user in a user namespace all capabilities.
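
The mapping behaviour described in the highlights above can be modelled
in a few lines of userspace C.  This is only a sketch under stated
assumptions, not the code in this patchset: every demo_* name and the
extent layout are hypothetical and chosen for illustration, and 65534 is
used as the overflowuid because that is the kernel's default value.  It
shows a namespace uid being mapped into the initial namespace, a
setuid-style call failing when no mapping exists, and a stat-style call
substituting overflowuid instead of failing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All demo_* names are hypothetical; this is a userspace model only. */
typedef uint32_t demo_uid_t;

#define DEMO_INVALID_UID ((demo_uid_t)-1)     /* reserved error value */
#define DEMO_OVERFLOWUID ((demo_uid_t)65534)  /* assumed overflowuid */

/*
 * One contiguous range: ids [first, first + count) inside the namespace
 * correspond to [lower_first, lower_first + count) in the initial
 * namespace.
 */
struct demo_extent {
	demo_uid_t first;
	demo_uid_t lower_first;
	demo_uid_t count;
};

struct demo_userns {
	const struct demo_extent *extents;
	size_t nr_extents;
};

/* Map a namespace uid to an initial-namespace uid, or return the error value. */
static demo_uid_t demo_map_down(const struct demo_userns *ns, demo_uid_t uid)
{
	assert(ns != NULL);
	if (uid == DEMO_INVALID_UID)
		return DEMO_INVALID_UID;
	for (size_t i = 0; i < ns->nr_extents; i++) {
		const struct demo_extent *e = &ns->extents[i];
		if (uid >= e->first && uid - e->first < e->count)
			return e->lower_first + (uid - e->first);
	}
	return DEMO_INVALID_UID;
}

/* Map an initial-namespace uid back into the namespace, or return the error value. */
static demo_uid_t demo_map_up(const struct demo_userns *ns, demo_uid_t kuid)
{
	assert(ns != NULL);
	if (kuid == DEMO_INVALID_UID)
		return DEMO_INVALID_UID;
	for (size_t i = 0; i < ns->nr_extents; i++) {
		const struct demo_extent *e = &ns->extents[i];
		if (kuid >= e->lower_first && kuid - e->lower_first < e->count)
			return e->first + (kuid - e->lower_first);
	}
	return DEMO_INVALID_UID;
}

/* setuid-style: fail, leaving the credential untouched, if uid has no mapping. */
static int demo_setuid(const struct demo_userns *ns, demo_uid_t uid,
		       demo_uid_t *cred_uid)
{
	demo_uid_t kuid = demo_map_down(ns, uid);

	if (kuid == DEMO_INVALID_UID)
		return -1;
	*cred_uid = kuid;
	return 0;
}

/* stat-style: never fail, report the overflow uid for unmappable owners. */
static demo_uid_t demo_stat_uid(const struct demo_userns *ns, demo_uid_t kuid)
{
	demo_uid_t uid = demo_map_up(ns, kuid);

	return uid == DEMO_INVALID_UID ? DEMO_OVERFLOWUID : uid;
}
```

With a single extent { 0, 100000, 1000 }, demo_setuid() for uid 5 stores
100005, demo_setuid() for uid 5000 fails, and demo_stat_uid() for a file
owned by initial-namespace uid 0 reports 65534.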

This patchset covers all the modifications needed to convert the core
kernel, plus enough other bits to make a bootable result.

These patches are against linux-3.4-rc1 and are also available at:
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git master

An essentially complete conversion of the entire kernel is available at:
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git userns-always-map-user-v26
I have reviewed the additional patches less stringently.  The diffstat
for the additional changes is:
 211 files changed, 1496 insertions(+), 979 deletions(-)

Eric W. Biederman (43):
      vfs: Don't allow a user namespace root to make device nodes
      userns: Kill bogus declaration of function release_uids
      userns: Replace netlink uses of cap_raised with capable.
      userns: Remove unnecessary cast to struct user_struct when copying cred->user.
      cred: Add forward declaration of init_user_ns in all cases.
      userns: Use cred->user_ns instead of cred->user->user_ns
      cred: Refcount the user_ns pointed to by the cred.
      userns: Add an explicit reference to the parent user namespace
      mqueue: Explicitly capture the user namespace to send the notification to.
      userns: Deprecate and rename the user_namespace reference in the user_struct
      userns: Start out with a full set of capabilities.
      userns: Replace the hard to write inode_userns with inode_capable.
      userns: Add kuid_t and kgid_t and associated infrastructure in uidgid.h
      userns: Add a Kconfig option to enforce strict kuid and kgid type checks
      userns: Disassociate user_struct from the user_namespace.
      userns: Simplify the user_namespace by making userns->creator a kuid.
      userns: Rework the user_namespace adding uid/gid mapping support
      userns: Convert group_info values from gid_t to kgid_t.
      userns: Store uid and gid values in struct cred with kuid_t and kgid_t types
      userns: Replace user_ns_map_uid and user_ns_map_gid with from_kuid and from_kgid
      userns: Convert sched_set_affinity and sched_set_scheduler's permission checks
      userns: Convert capabilities related permsion checks
      userns: Convert setting and getting uid and gid system calls to use kuid and kgid
      userns: Convert ptrace, kill, set_priority permission checks to work with kuids and kgids
      userns: Store uid and gid types in vfs structures with kuid_t and kgid_t types
      userns: Convert in_group_p and in_egroup_p to use kgid_t
      userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
      userns: Convert user specfied uids and gids in chown into kuids and kgid
      userns: Convert stat to return values mapped from kuids and kgids
      userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
      userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
      userns: signal remove unnecessary map_cred_ns
      userns: Convert binary formats to use kuid/kgid where appropriate
      userns: Convert devpts to use kuid/kgid where appropriate
      userns: Convert ext2 to use kuid/kgid where appropriate.
      userns: Convert ext3 to use kuid/kgid where appropriate
      userns: Convert ext4 to user kuid/kgid where appropriate
      userns: Convert proc to use kuid/kgid where appropriate
      userns: Convert sysctl permission checks to use kuid and kgids.
      userns: Convert sysfs to use kgid/kuid where appropriate
      userns: Convert tmpfs to use kuid and kgid where appropriate
      userns: Convert cgroup permission checks to use uid_eq
      userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq


 arch/arm/kernel/sys_oabi-compat.c      |    4 +-
 arch/parisc/hpux/fs.c                  |    4 +-
 arch/s390/kernel/compat_linux.c        |   17 +-
 arch/sparc/kernel/sys_sparc32.c        |    4 +-
 arch/x86/ia32/sys_ia32.c               |    4 +-
 arch/x86/mm/fault.c                    |    2 +-
 drivers/block/drbd/drbd_nl.c           |    2 +-
 drivers/md/dm-log-userspace-transfer.c |    2 +-
 drivers/video/uvesafb.c                |    2 +-
 fs/attr.c                              |    8 +-
 fs/binfmt_elf.c                        |   12 +-
 fs/binfmt_elf_fdpic.c                  |   12 +-
 fs/compat.c                            |    4 +-
 fs/devpts/inode.c                      |   24 +-
 fs/ecryptfs/messaging.c                |    2 +-
 fs/exec.c                              |   15 +-
 fs/ext2/balloc.c                       |    5 +-
 fs/ext2/ext2.h                         |    8 +-
 fs/ext2/inode.c                        |   20 +-
 fs/ext2/super.c                        |   31 ++-
 fs/ext3/balloc.c                       |    5 +-
 fs/ext3/ext3.h                         |    8 +-
 fs/ext3/inode.c                        |   32 +-
 fs/ext3/super.c                        |   35 ++-
 fs/ext4/balloc.c                       |    4 +-
 fs/ext4/ext4.h                         |    4 +-
 fs/ext4/ialloc.c                       |    4 +-
 fs/ext4/inode.c                        |   34 +-
 fs/ext4/migrate.c                      |    4 +-
 fs/ext4/super.c                        |   38 ++-
 fs/fcntl.c                             |    6 +-
 fs/inode.c                             |   10 +-
 fs/ioprio.c                            |   18 +-
 fs/locks.c                             |    2 +-
 fs/namei.c                             |   29 +-
 fs/nfsd/auth.c                         |    5 +-
 fs/open.c                              |   16 +-
 fs/proc/array.c                        |   15 +-
 fs/proc/base.c                         |   93 +++++-
 fs/proc/inode.c                        |    4 +-
 fs/proc/proc_sysctl.c                  |    4 +-
 fs/proc/root.c                         |    2 +-
 fs/stat.c                              |    8 +-
 fs/sysfs/inode.c                       |    4 +-
 include/linux/capability.h             |    2 +
 include/linux/cred.h                   |   33 +-
 include/linux/fs.h                     |   42 ++-
 include/linux/pid_namespace.h          |    2 +-
 include/linux/proc_fs.h                |    4 +-
 include/linux/quotaops.h               |    4 +-
 include/linux/sched.h                  |    9 +-
 include/linux/shmem_fs.h               |    4 +-
 include/linux/stat.h                   |    5 +-
 include/linux/uidgid.h                 |  200 +++++++++++
 include/linux/user_namespace.h         |   39 +-
 include/trace/events/ext3.h            |    4 +-
 include/trace/events/ext4.h            |    4 +-
 init/Kconfig                           |   12 +-
 ipc/mqueue.c                           |   10 +-
 ipc/namespace.c                        |    2 +-
 kernel/capability.c                    |   21 ++
 kernel/cgroup.c                        |    6 +-
 kernel/cred.c                          |   44 ++-
 kernel/exit.c                          |    6 +-
 kernel/groups.c                        |   50 ++--
 kernel/ptrace.c                        |   15 +-
 kernel/sched/core.c                    |    7 +-
 kernel/signal.c                        |   51 +--
 kernel/sys.c                           |  266 ++++++++++-----
 kernel/timer.c                         |    8 +-
 kernel/uid16.c                         |   48 ++-
 kernel/user.c                          |   51 ++-
 kernel/user_namespace.c                |  594 ++++++++++++++++++++++++++++----
 kernel/utsname.c                       |    2 +-
 mm/mempolicy.c                         |    4 +-
 mm/migrate.c                           |    4 +-
 mm/oom_kill.c                          |    4 +-
 mm/shmem.c                             |   22 +-
 net/core/sock.c                        |    4 +-
 net/ipv4/ping.c                        |   11 +-
 net/sunrpc/auth_generic.c              |    4 +-
 net/sunrpc/auth_gss/svcauth_gss.c      |    7 +-
 net/sunrpc/auth_unix.c                 |   15 +-
 net/sunrpc/svcauth_unix.c              |   18 +-
 security/commoncap.c                   |   63 ++--
 security/keys/key.c                    |    2 +-
 security/keys/permission.c             |    5 +-
 security/keys/process_keys.c           |    2 +-
 88 files changed, 1670 insertions(+), 606 deletions(-)



 mm/mempolicy.c                         |    4 +-
 mm/migrate.c                           |    4 +-
 mm/oom_kill.c                          |    4 +-
 mm/shmem.c                             |   22 +-
 net/core/sock.c                        |    4 +-
 net/ipv4/ping.c                        |   11 +-
 net/sunrpc/auth_generic.c              |    4 +-
 net/sunrpc/auth_gss/svcauth_gss.c      |    7 +-
 net/sunrpc/auth_unix.c                 |   15 +-
 net/sunrpc/svcauth_unix.c              |   18 +-
 security/commoncap.c                   |   63 ++--
 security/keys/key.c                    |    2 +-
 security/keys/permission.c             |    5 +-
 security/keys/process_keys.c           |    2 +-
 88 files changed, 1670 insertions(+), 606 deletions(-)

^ permalink raw reply	[flat|nested] 227+ messages in thread

* [PATCH 01/43] vfs: Don't allow a user namespace root to make device nodes
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:14     ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:14 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

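Safely making device nodes in a container is solvable, but simply
having the capability in a user namespace is not sufficient to make
this work.  Go back to requiring CAP_MKNOD in the initial user
namespace by calling capable() instead of ns_capable().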

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/namei.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 1898198..701954d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2560,8 +2560,7 @@ int vfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t dev)
 	if (error)
 		return error;
 
-	if ((S_ISCHR(mode) || S_ISBLK(mode)) &&
-	    !ns_capable(inode_userns(dir), CAP_MKNOD))
+	if ((S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD))
 		return -EPERM;
 
 	if (!dir->i_op->mknod)
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread
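
The effect of the vfs_mknod() change above can be pictured with a small
userspace model.  This is only a sketch: every toy_* name is made up for
illustration and is not a kernel symbol, the capability is reduced to one
bit in a mask, and the rule that a task privileged in a parent namespace
is also privileged in its children is deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects; none of these names exist in the kernel. */
struct toy_user_ns {
	const struct toy_user_ns *parent;	/* NULL for the initial namespace */
};

struct toy_task {
	const struct toy_user_ns *user_ns;	/* namespace the task lives in */
	unsigned long long effective;		/* one bit per capability */
};

#define TOY_CAP_MKNOD 27	/* bit index chosen for this sketch */

const struct toy_user_ns toy_init_user_ns = { NULL };

/* Namespace-relative check: is the bit raised, and is the task in 'target'? */
bool toy_ns_capable(const struct toy_task *t,
		    const struct toy_user_ns *target, int cap)
{
	return t->user_ns == target && ((t->effective >> cap) & 1ULL) != 0;
}

/* Global check: the same test, always against the initial namespace. */
bool toy_capable(const struct toy_task *t, int cap)
{
	return toy_ns_capable(t, &toy_init_user_ns, cap);
}

/* Before the patch: privilege in the directory's namespace was enough. */
bool toy_mknod_dev_allowed_old(const struct toy_task *t,
			       const struct toy_user_ns *dir_ns)
{
	return toy_ns_capable(t, dir_ns, TOY_CAP_MKNOD);
}

/* After the patch: only privilege in the initial namespace counts. */
bool toy_mknod_dev_allowed_new(const struct toy_task *t)
{
	return toy_capable(t, TOY_CAP_MKNOD);
}
```

In this model a "container root" task that holds the bit inside a child
namespace passes the old check for a directory in that namespace but
fails the new one, while a task holding the bit in the initial namespace
passes the new check.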

* [PATCH 02/43] userns: Kill bogus declaration of function release_uids
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:14     ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:14 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

There is no release_uids function, so remove its declaration from sched.h.

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 include/linux/sched.h |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 81a173c..720ce8d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2184,7 +2184,6 @@ static inline struct user_struct *get_uid(struct user_struct *u)
 	return u;
 }
 extern void free_uid(struct user_struct *);
-extern void release_uids(struct user_namespace *ns);
 
 #include <asm/current.h>
 
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 03/43] userns: Replace netlink uses of cap_raised with capable.
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:14     ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:14 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Philipp Reisner, Vasiliy Kulikov,
	David Howells, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

In 2009 Philipp Reisner noticed that a few users of the netlink connector
interface needed a capability check and added the idiom
cap_raised(nsp->eff_cap, CAP_SYS_ADMIN) to a few of them, on the premise
that netlink was asynchronous.

In 2011 Patrick McHardy noticed we were being silly because netlink is
synchronous, removed eff_cap from struct netlink_skb_parms, and changed
the idiom to cap_raised(current_cap(), CAP_SYS_ADMIN).

Looking at those spots with a fresh eye we should be calling
capable(CAP_SYS_ADMIN).  The only reason I can see for not calling
capable is that it once appeared we were not in the same task as the
caller which would have made calling capable() impossible.

In the initial user_namespace the only differences between
cap_raised(current_cap(), CAP_SYS_ADMIN) and capable(CAP_SYS_ADMIN)
are a few sanity checks and the fact that capable(CAP_SYS_ADMIN)
sets PF_SUPERPRIV if we use the capability.

Since we are going to be using root privilege, setting PF_SUPERPRIV
seems the right thing to do.

The motivation for this patch is that in a child user namespace
cap_raised(current_cap(), ...) tests your capabilities with respect to
that child user namespace, not your capabilities in the initial user
namespace, and thus will allow processes that should be unprivileged to
use the kernel services that are only protected with
cap_raised(current_cap(), ...).

To fix possible user_namespace issues and to just clean up the code
replace cap_raised(current_cap(), CAP_SYS_ADMIN) with
capable(CAP_SYS_ADMIN).

Acked-by: Serge E. Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Reviewed-by: James Morris <james.l.morris-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Acked-by: Andrew G. Morgan <morgan-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Patrick McHardy <kaber-dcUjhNyLwpNeoWH0uzbU5w@public.gmane.org>
Cc: Philipp Reisner <philipp.reisner-63ez5xqkn6DQT0dZR+AlfA@public.gmane.org>
Cc: Serge E. Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Cc: Vasiliy Kulikov <segoon-cxoSlKxDwOJWk0Htik3J/w@public.gmane.org>
Cc: David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 drivers/block/drbd/drbd_nl.c           |    2 +-
 drivers/md/dm-log-userspace-transfer.c |    2 +-
 drivers/video/uvesafb.c                |    2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index abfaaca..946166e 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -2297,7 +2297,7 @@ static void drbd_connector_callback(struct cn_msg *req, struct netlink_skb_parms
 		return;
 	}
 
-	if (!cap_raised(current_cap(), CAP_SYS_ADMIN)) {
+	if (!capable(CAP_SYS_ADMIN)) {
 		retcode = ERR_PERM;
 		goto fail;
 	}
diff --git a/drivers/md/dm-log-userspace-transfer.c b/drivers/md/dm-log-userspace-transfer.c
index 1f23e04..08d9a20 100644
--- a/drivers/md/dm-log-userspace-transfer.c
+++ b/drivers/md/dm-log-userspace-transfer.c
@@ -134,7 +134,7 @@ static void cn_ulog_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp)
 {
 	struct dm_ulog_request *tfr = (struct dm_ulog_request *)(msg + 1);
 
-	if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
+	if (!capable(CAP_SYS_ADMIN))
 		return;
 
 	spin_lock(&receiving_list_lock);
diff --git a/drivers/video/uvesafb.c b/drivers/video/uvesafb.c
index 260cca7..9f7d27a 100644
--- a/drivers/video/uvesafb.c
+++ b/drivers/video/uvesafb.c
@@ -73,7 +73,7 @@ static void uvesafb_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *ns
 	struct uvesafb_task *utask;
 	struct uvesafb_ktask *task;
 
-	if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
+	if (!capable(CAP_SYS_ADMIN))
 		return;
 
 	if (msg->seq >= UVESAFB_TASKS_MAX)
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread
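
The two differences described in the changelog above (the namespace the
test applies to, and the PF_SUPERPRIV side effect) can be sketched in
userspace as follows.  All demo_* names, the boolean namespace flag, and
the constant values are assumptions made for this illustration; the real
capable() does more than this.

```c
#include <stdbool.h>

#define DEMO_CAP_SYS_ADMIN 21		/* bit index used by this sketch */
#define DEMO_PF_SUPERPRIV  0x100u	/* "used super-user privileges" flag */

/* Toy task: just enough state to show the two checks side by side. */
struct demo_task {
	bool in_init_user_ns;		/* false for a child user namespace */
	unsigned long long effective;	/* one bit per capability */
	unsigned int flags;		/* per-task flags, e.g. SUPERPRIV */
};

/* Pure bitmask test, the shape of cap_raised(current_cap(), cap). */
bool demo_cap_raised(const struct demo_task *t, int cap)
{
	return ((t->effective >> cap) & 1ULL) != 0;
}

/* Shape of capable(cap): initial namespace only, and record the use. */
bool demo_capable(struct demo_task *t, int cap)
{
	if (!t->in_init_user_ns || !demo_cap_raised(t, cap))
		return false;
	t->flags |= DEMO_PF_SUPERPRIV;	/* privilege was actually used */
	return true;
}
```

With a task that has the bit raised inside a child namespace,
demo_cap_raised() says yes while demo_capable() says no and leaves the
flags untouched, which is the gap the three connector callbacks were
exposed to.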

* [PATCH 04/43] userns: Remove unnecessary cast to struct user_struct when copying cred->user.
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

In struct cred the user member is, and has always been, declared
struct user_struct *user.  At most a constant struct cred will have a
constant pointer to a non-constant user_struct, so remove this
unnecessary cast.

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 kernel/sys.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index e7006eb..f7a4351 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -209,7 +209,7 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
 			} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
 			break;
 		case PRIO_USER:
-			user = (struct user_struct *) cred->user;
+			user = cred->user;
 			if (!who)
 				who = cred->uid;
 			else if ((who != cred->uid) &&
@@ -274,7 +274,7 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
 			} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
 			break;
 		case PRIO_USER:
-			user = (struct user_struct *) cred->user;
+			user = cred->user;
 			if (!who)
 				who = cred->uid;
 			else if ((who != cred->uid) &&
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread
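
Why the cast removed above was never needed can be checked with a few
lines of standalone C.  The *_demo types below are stand-ins invented for
this sketch, not the kernel structures: accessing a pointer member
through a const-qualified struct yields a const pointer, but the object
it points to keeps its own (non-const) type.

```c
/* Stand-in for struct user_struct; the field is made up for the sketch. */
struct user_struct_demo {
	int processes;
};

/* Stand-in for struct cred: a plain pointer to a non-const user struct. */
struct cred_demo {
	struct user_struct_demo *user;
};

/*
 * 'cred' is const, so cred->user cannot be reassigned through it, but
 * the value read from it is still a 'struct user_struct_demo *'.  The
 * assignment below therefore compiles cleanly without any cast.
 */
int bump_processes(const struct cred_demo *cred)
{
	struct user_struct_demo *user = cred->user;

	user->processes++;
	return user->processes;
}
```

Calling bump_processes() on a cred_demo whose user has processes == 1
returns 2 and updates the pointed-to object, with no qualifier warning
from the compiler.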

* [PATCH 05/43] cred: Add forward declaration of init_user_ns in all cases.
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 include/linux/cred.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/cred.h b/include/linux/cred.h
index adadf71..d12c4e4 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -357,11 +357,11 @@ static inline void put_cred(const struct cred *_cred)
 #define current_user()		(current_cred_xxx(user))
 #define current_security()	(current_cred_xxx(security))
 
+extern struct user_namespace init_user_ns;
 #ifdef CONFIG_USER_NS
 #define current_user_ns()	(current_cred_xxx(user_ns))
 #define task_user_ns(task)	(task_cred_xxx((task), user_ns))
 #else
-extern struct user_namespace init_user_ns;
 #define current_user_ns()	(&init_user_ns)
 #define task_user_ns(task)	(&init_user_ns)
 #endif
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 06/43] userns: Use cred->user_ns instead of cred->user->user_ns
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Optimize performance and prepare for the removal of the user_ns reference
from user_struct.  Remove the slow long walk through cred->user->user_ns and
instead go straight to cred->user_ns.

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
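For illustration only, and not part of the patch: a minimal userspace
sketch of the two access paths the description contrasts. The struct
layouts below are trimmed, hypothetical stand-ins (only the two pointers
matter), and the helper names cred_ns_via_user(), cred_ns_direct(),
cred_check_ns() and same_user_ns() are invented for this note. The
sketch assumes cred->user_ns always mirrors cred->user->user_ns, which
is what makes the substitution safe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily trimmed stand-ins for the kernel structures. */
struct user_namespace {
	int id;
};

struct user_struct {
	struct user_namespace *user_ns;	/* reference the series plans to drop */
};

struct cred {
	struct user_struct *user;
	struct user_namespace *user_ns;	/* direct pointer used after this patch */
};

/* Before: two dependent loads (cred -> user -> user_ns). */
static inline struct user_namespace *cred_ns_via_user(const struct cred *c)
{
	return c->user->user_ns;
}

/* After: a single load (cred -> user_ns). */
static inline struct user_namespace *cred_ns_direct(const struct cred *c)
{
	return c->user_ns;
}

/* The substitution is only valid while both paths agree. */
static inline void cred_check_ns(const struct cred *c)
{
	assert(cred_ns_via_user(c) == cred_ns_direct(c));
}

/* Same-namespace test in the style of the checks touched below. */
static inline int same_user_ns(const struct cred *a, const struct cred *b)
{
	return cred_ns_direct(a) == cred_ns_direct(b);
}
```

Every hunk below performs the same substitution, so the permission
checks keep their meaning and only the number of pointer dereferences
changes.
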
 fs/ecryptfs/messaging.c      |    2 +-
 ipc/namespace.c              |    2 +-
 kernel/ptrace.c              |    4 ++--
 kernel/sched/core.c          |    2 +-
 kernel/signal.c              |    4 ++--
 kernel/sys.c                 |    8 ++++----
 kernel/user_namespace.c      |    4 ++--
 kernel/utsname.c             |    2 +-
 security/commoncap.c         |   14 +++++++-------
 security/keys/key.c          |    2 +-
 security/keys/permission.c   |    2 +-
 security/keys/process_keys.c |    2 +-
 12 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/fs/ecryptfs/messaging.c b/fs/ecryptfs/messaging.c
index ab22480..a750f95 100644
--- a/fs/ecryptfs/messaging.c
+++ b/fs/ecryptfs/messaging.c
@@ -303,7 +303,7 @@ int ecryptfs_process_response(struct ecryptfs_message *msg, uid_t euid,
 		mutex_unlock(&ecryptfs_daemon_hash_mux);
 		goto wake_up;
 	}
-	tsk_user_ns = __task_cred(msg_ctx->task)->user->user_ns;
+	tsk_user_ns = __task_cred(msg_ctx->task)->user_ns;
 	ctx_euid = task_euid(msg_ctx->task);
 	rc = ecryptfs_find_daemon_by_euid(&daemon, ctx_euid, tsk_user_ns);
 	rcu_read_unlock();
diff --git a/ipc/namespace.c b/ipc/namespace.c
index ce0a647..f362298c 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -46,7 +46,7 @@ static struct ipc_namespace *create_ipc_ns(struct task_struct *tsk,
 	ipcns_notify(IPCNS_CREATED);
 	register_ipcns_notifier(ns);
 
-	ns->user_ns = get_user_ns(task_cred_xxx(tsk, user)->user_ns);
+	ns->user_ns = get_user_ns(task_cred_xxx(tsk, user_ns));
 
 	return ns;
 }
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index ee8d49b..24e0a5a 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -198,7 +198,7 @@ int __ptrace_may_access(struct task_struct *task, unsigned int mode)
 		return 0;
 	rcu_read_lock();
 	tcred = __task_cred(task);
-	if (cred->user->user_ns == tcred->user->user_ns &&
+	if (cred->user_ns == tcred->user_ns &&
 	    (cred->uid == tcred->euid &&
 	     cred->uid == tcred->suid &&
 	     cred->uid == tcred->uid  &&
@@ -206,7 +206,7 @@ int __ptrace_may_access(struct task_struct *task, unsigned int mode)
 	     cred->gid == tcred->sgid &&
 	     cred->gid == tcred->gid))
 		goto ok;
-	if (ptrace_has_cap(tcred->user->user_ns, mode))
+	if (ptrace_has_cap(tcred->user_ns, mode))
 		goto ok;
 	rcu_read_unlock();
 	return -EPERM;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4603b9d..96bff85 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4042,7 +4042,7 @@ static bool check_same_owner(struct task_struct *p)
 
 	rcu_read_lock();
 	pcred = __task_cred(p);
-	if (cred->user->user_ns == pcred->user->user_ns)
+	if (cred->user_ns == pcred->user_ns)
 		match = (cred->euid == pcred->euid ||
 			 cred->euid == pcred->uid);
 	else
diff --git a/kernel/signal.c b/kernel/signal.c
index 17afcaf..e2c5d84 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -767,14 +767,14 @@ static int kill_ok_by_cred(struct task_struct *t)
 	const struct cred *cred = current_cred();
 	const struct cred *tcred = __task_cred(t);
 
-	if (cred->user->user_ns == tcred->user->user_ns &&
+	if (cred->user_ns == tcred->user_ns &&
 	    (cred->euid == tcred->suid ||
 	     cred->euid == tcred->uid ||
 	     cred->uid  == tcred->suid ||
 	     cred->uid  == tcred->uid))
 		return 1;
 
-	if (ns_capable(tcred->user->user_ns, CAP_KILL))
+	if (ns_capable(tcred->user_ns, CAP_KILL))
 		return 1;
 
 	return 0;
diff --git a/kernel/sys.c b/kernel/sys.c
index f7a4351..82d8714 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -133,11 +133,11 @@ static bool set_one_prio_perm(struct task_struct *p)
 {
 	const struct cred *cred = current_cred(), *pcred = __task_cred(p);
 
-	if (pcred->user->user_ns == cred->user->user_ns &&
+	if (pcred->user_ns == cred->user_ns &&
 	    (pcred->uid  == cred->euid ||
 	     pcred->euid == cred->euid))
 		return true;
-	if (ns_capable(pcred->user->user_ns, CAP_SYS_NICE))
+	if (ns_capable(pcred->user_ns, CAP_SYS_NICE))
 		return true;
 	return false;
 }
@@ -1498,7 +1498,7 @@ static int check_prlimit_permission(struct task_struct *task)
 		return 0;
 
 	tcred = __task_cred(task);
-	if (cred->user->user_ns == tcred->user->user_ns &&
+	if (cred->user_ns == tcred->user_ns &&
 	    (cred->uid == tcred->euid &&
 	     cred->uid == tcred->suid &&
 	     cred->uid == tcred->uid  &&
@@ -1506,7 +1506,7 @@ static int check_prlimit_permission(struct task_struct *task)
 	     cred->gid == tcred->sgid &&
 	     cred->gid == tcred->gid))
 		return 0;
-	if (ns_capable(tcred->user->user_ns, CAP_SYS_RESOURCE))
+	if (ns_capable(tcred->user_ns, CAP_SYS_RESOURCE))
 		return 0;
 
 	return -EPERM;
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 3b906e9..f084083 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -90,7 +90,7 @@ uid_t user_ns_map_uid(struct user_namespace *to, const struct cred *cred, uid_t
 {
 	struct user_namespace *tmp;
 
-	if (likely(to == cred->user->user_ns))
+	if (likely(to == cred->user_ns))
 		return uid;
 
 
@@ -112,7 +112,7 @@ gid_t user_ns_map_gid(struct user_namespace *to, const struct cred *cred, gid_t
 {
 	struct user_namespace *tmp;
 
-	if (likely(to == cred->user->user_ns))
+	if (likely(to == cred->user_ns))
 		return gid;
 
 	/* Is cred->user the creator of the target user_ns
diff --git a/kernel/utsname.c b/kernel/utsname.c
index 405caf9..679d97a 100644
--- a/kernel/utsname.c
+++ b/kernel/utsname.c
@@ -43,7 +43,7 @@ static struct uts_namespace *clone_uts_ns(struct task_struct *tsk,
 
 	down_read(&uts_sem);
 	memcpy(&ns->name, &old_ns->name, sizeof(ns->name));
-	ns->user_ns = get_user_ns(task_cred_xxx(tsk, user)->user_ns);
+	ns->user_ns = get_user_ns(task_cred_xxx(tsk, user_ns));
 	up_read(&uts_sem);
 	return ns;
 }
diff --git a/security/commoncap.c b/security/commoncap.c
index 0cf4b53..8b3e10e 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -81,7 +81,7 @@ int cap_capable(const struct cred *cred, struct user_namespace *targ_ns,
 			return 0;
 
 		/* Do we have the necessary capabilities? */
-		if (targ_ns == cred->user->user_ns)
+		if (targ_ns == cred->user_ns)
 			return cap_raised(cred->cap_effective, cap) ? 0 : -EPERM;
 
 		/* Have we tried all of the parent namespaces? */
@@ -136,10 +136,10 @@ int cap_ptrace_access_check(struct task_struct *child, unsigned int mode)
 	rcu_read_lock();
 	cred = current_cred();
 	child_cred = __task_cred(child);
-	if (cred->user->user_ns == child_cred->user->user_ns &&
+	if (cred->user_ns == child_cred->user_ns &&
 	    cap_issubset(child_cred->cap_permitted, cred->cap_permitted))
 		goto out;
-	if (ns_capable(child_cred->user->user_ns, CAP_SYS_PTRACE))
+	if (ns_capable(child_cred->user_ns, CAP_SYS_PTRACE))
 		goto out;
 	ret = -EPERM;
 out:
@@ -168,10 +168,10 @@ int cap_ptrace_traceme(struct task_struct *parent)
 	rcu_read_lock();
 	cred = __task_cred(parent);
 	child_cred = current_cred();
-	if (cred->user->user_ns == child_cred->user->user_ns &&
+	if (cred->user_ns == child_cred->user_ns &&
 	    cap_issubset(child_cred->cap_permitted, cred->cap_permitted))
 		goto out;
-	if (has_ns_capability(parent, child_cred->user->user_ns, CAP_SYS_PTRACE))
+	if (has_ns_capability(parent, child_cred->user_ns, CAP_SYS_PTRACE))
 		goto out;
 	ret = -EPERM;
 out:
@@ -214,7 +214,7 @@ static inline int cap_inh_is_capped(void)
 	/* they are so limited unless the current task has the CAP_SETPCAP
 	 * capability
 	 */
-	if (cap_capable(current_cred(), current_cred()->user->user_ns,
+	if (cap_capable(current_cred(), current_cred()->user_ns,
 			CAP_SETPCAP, SECURITY_CAP_AUDIT) == 0)
 		return 0;
 	return 1;
@@ -866,7 +866,7 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
 		    || ((new->securebits & SECURE_ALL_LOCKS & ~arg2))	/*[2]*/
 		    || (arg2 & ~(SECURE_ALL_LOCKS | SECURE_ALL_BITS))	/*[3]*/
 		    || (cap_capable(current_cred(),
-				    current_cred()->user->user_ns, CAP_SETPCAP,
+				    current_cred()->user_ns, CAP_SETPCAP,
 				    SECURITY_CAP_AUDIT) != 0)		/*[4]*/
 			/*
 			 * [1] no changing of bits that are locked
diff --git a/security/keys/key.c b/security/keys/key.c
index 06783cf..7e60347 100644
--- a/security/keys/key.c
+++ b/security/keys/key.c
@@ -253,7 +253,7 @@ struct key *key_alloc(struct key_type *type, const char *desc,
 	quotalen = desclen + type->def_datalen;
 
 	/* get hold of the key tracking for this user */
-	user = key_user_lookup(uid, cred->user->user_ns);
+	user = key_user_lookup(uid, cred->user_ns);
 	if (!user)
 		goto no_memory_1;
 
diff --git a/security/keys/permission.c b/security/keys/permission.c
index c35b522..e146cbd 100644
--- a/security/keys/permission.c
+++ b/security/keys/permission.c
@@ -36,7 +36,7 @@ int key_task_permission(const key_ref_t key_ref, const struct cred *cred,
 
 	key = key_ref_to_ptr(key_ref);
 
-	if (key->user->user_ns != cred->user->user_ns)
+	if (key->user->user_ns != cred->user_ns)
 		goto use_other_perms;
 
 	/* use the second 8-bits of permissions for keys the caller owns */
diff --git a/security/keys/process_keys.c b/security/keys/process_keys.c
index be7ecb2..70febff 100644
--- a/security/keys/process_keys.c
+++ b/security/keys/process_keys.c
@@ -858,7 +858,7 @@ void key_replace_session_keyring(void)
 	new-> sgid	= old-> sgid;
 	new->fsgid	= old->fsgid;
 	new->user	= get_uid(old->user);
-	new->user_ns	= new->user->user_ns;
+	new->user_ns	= old->user_ns;
 	new->group_info	= get_group_info(old->group_info);
 
 	new->securebits	= old->securebits;
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 07/43] cred: Refcount the user_ns pointed to by the cred.
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` "Eric W. Beiderman
  -1 siblings, 0 replies; 227+ messages in thread
From: "Eric W. Beiderman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

struct user_struct will shortly lose its user_ns reference,
so make the cred user_ns reference a proper reference, complete
with reference counting.
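
To make the ownership rule concrete, here is a minimal userspace sketch of
the pattern, not the kernel code.  The names user_ns_model, cred_model,
get_ns, put_ns, cred_copy and cred_free are hypothetical stand-ins for
struct user_namespace, struct cred, get_user_ns(), put_user_ns(),
prepare_creds() and put_cred_rcu().  A plain int replaces the kref, and the
model never frees the namespace itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; these are not the kernel definitions. */
struct user_ns_model {
	int refcount;			/* plain counter instead of a kref */
};

struct cred_model {
	struct user_ns_model *user_ns;	/* counted reference owned by the cred */
};

/* Take a reference; tolerates NULL and returns its argument. */
static struct user_ns_model *get_ns(struct user_ns_model *ns)
{
	if (ns)
		ns->refcount++;
	return ns;
}

/* Drop a reference; the model only decrements and never frees. */
static void put_ns(struct user_ns_model *ns)
{
	if (!ns)
		return;
	assert(ns->refcount > 0);
	ns->refcount--;
}

/* Like the copy paths in the patch: the new cred takes its own reference. */
static struct cred_model *cred_copy(const struct cred_model *old)
{
	struct cred_model *new = malloc(sizeof(*new));

	if (!new)
		return NULL;
	new->user_ns = get_ns(old->user_ns);
	return new;
}

/* Like the free path in the patch: destroying a cred drops that reference. */
static void cred_free(struct cred_model *cred)
{
	put_ns(cred->user_ns);
	free(cred);
}
```

The point is the pairing: every path that builds a cred takes a reference
and the one path that frees a cred drops it, so the namespace no longer
depends on cred->user to keep it pinned.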

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 include/linux/cred.h         |    2 +-
 kernel/cred.c                |    8 +++-----
 kernel/user_namespace.c      |    8 +++++---
 security/keys/process_keys.c |    2 +-
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/linux/cred.h b/include/linux/cred.h
index d12c4e4..2c60ec8 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -146,7 +146,7 @@ struct cred {
 	void		*security;	/* subjective LSM security */
 #endif
 	struct user_struct *user;	/* real user ID subscription */
-	struct user_namespace *user_ns; /* cached user->user_ns */
+	struct user_namespace *user_ns; /* user_ns the caps and keyrings are relative to. */
 	struct group_info *group_info;	/* supplementary groups for euid/fsgid */
 	struct rcu_head	rcu;		/* RCU deletion hook */
 };
diff --git a/kernel/cred.c b/kernel/cred.c
index 97b36ee..7a0d806 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -148,6 +148,7 @@ static void put_cred_rcu(struct rcu_head *rcu)
 	if (cred->group_info)
 		put_group_info(cred->group_info);
 	free_uid(cred->user);
+	put_user_ns(cred->user_ns);
 	kmem_cache_free(cred_jar, cred);
 }
 
@@ -303,6 +304,7 @@ struct cred *prepare_creds(void)
 	set_cred_subscribers(new, 0);
 	get_group_info(new->group_info);
 	get_uid(new->user);
+	get_user_ns(new->user_ns);
 
 #ifdef CONFIG_KEYS
 	key_get(new->thread_keyring);
@@ -412,11 +414,6 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
 			goto error_put;
 	}
 
-	/* cache user_ns in cred.  Doesn't need a refcount because it will
-	 * stay pinned by cred->user
-	 */
-	new->user_ns = new->user->user_ns;
-
 #ifdef CONFIG_KEYS
 	/* new threads get their own thread keyrings if their parent already
 	 * had one */
@@ -676,6 +673,7 @@ struct cred *prepare_kernel_cred(struct task_struct *daemon)
 	atomic_set(&new->usage, 1);
 	set_cred_subscribers(new, 0);
 	get_uid(new->user);
+	get_user_ns(new->user_ns);
 	get_group_info(new->group_info);
 
 #ifdef CONFIG_KEYS
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index f084083..58bb878 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -24,7 +24,7 @@ static struct kmem_cache *user_ns_cachep __read_mostly;
  */
 int create_user_ns(struct cred *new)
 {
-	struct user_namespace *ns;
+	struct user_namespace *ns, *parent_ns = new->user_ns;
 	struct user_struct *root_user;
 	int n;
 
@@ -57,8 +57,10 @@ int create_user_ns(struct cred *new)
 #endif
 	/* tgcred will be cleared in our caller bc CLONE_THREAD won't be set */
 
-	/* root_user holds a reference to ns, our reference can be dropped */
-	put_user_ns(ns);
+	/* Leave the reference to our user_ns with the new cred */
+	new->user_ns = ns;
+
+	put_user_ns(parent_ns);
 
 	return 0;
 }
diff --git a/security/keys/process_keys.c b/security/keys/process_keys.c
index 70febff..447fb76 100644
--- a/security/keys/process_keys.c
+++ b/security/keys/process_keys.c
@@ -858,7 +858,7 @@ void key_replace_session_keyring(void)
 	new-> sgid	= old-> sgid;
 	new->fsgid	= old->fsgid;
 	new->user	= get_uid(old->user);
-	new->user_ns	= new->user_ns;
+	new->user_ns	= get_user_ns(new->user_ns);
 	new->group_info	= get_group_info(old->group_info);
 
 	new->securebits	= old->securebits;
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 08/43] userns: Add an explicit reference to the parent user namespace
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` "Eric W. Beiderman
  -1 siblings, 0 replies; 227+ messages in thread
From: "Eric W. Beiderman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

I am about to remove the struct user_namespace reference from struct user_struct,
so keep explicit track of the parent user namespace.

Take advantage of this new reference and replace instances of user_ns->creator->user_ns
with user_ns->parent.
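
As a rough illustration of what walking ->parent looks like, here is a
userspace sketch under simplifying assumptions.  The type ns_model, the
integer creator_id (standing in for the creator user_struct pointer) and the
helper created_ns_or_ancestor are hypothetical names for this example only.
The loop has the same shape as the ones in user_ns_map_uid() and
user_ns_map_gid() after this change, and it assumes every chain of parents
ends at the initial namespace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in; this is not the kernel structure. */
struct ns_model {
	struct ns_model *parent;	/* NULL only for the initial namespace */
	int creator_id;			/* stands in for ns->creator */
};

/*
 * Return true if creator_id created the target namespace or one of its
 * ancestors below the initial namespace.  The walk follows ->parent
 * directly instead of going through the creator's user_struct.
 */
static bool created_ns_or_ancestor(const struct ns_model *target,
				   const struct ns_model *init_ns,
				   int creator_id)
{
	const struct ns_model *tmp;

	assert(init_ns != NULL);
	for (tmp = target; tmp && tmp != init_ns; tmp = tmp->parent) {
		if (tmp->creator_id == creator_id)
			return true;
	}
	return false;
}
```

With an explicit parent pointer the walk no longer depends on struct
user_struct knowing which namespace it lives in, which is what allows that
field to be removed later in the series.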

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 include/linux/user_namespace.h |    1 +
 kernel/user_namespace.c        |   13 ++++++-------
 security/commoncap.c           |    2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index faf4679..dc2d85a 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -12,6 +12,7 @@
 struct user_namespace {
 	struct kref		kref;
 	struct hlist_head	uidhash_table[UIDHASH_SZ];
+	struct user_namespace	*parent;
 	struct user_struct	*creator;
 	struct work_struct	destroyer;
 };
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 58bb878..c15e533 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -45,6 +45,7 @@ int create_user_ns(struct cred *new)
 	}
 
 	/* set the new root user in the credentials under preparation */
+	ns->parent = parent_ns;
 	ns->creator = new->user;
 	new->user = root_user;
 	new->uid = new->euid = new->suid = new->fsuid = 0;
@@ -60,8 +61,6 @@ int create_user_ns(struct cred *new)
 	/* Leave the reference to our user_ns with the new cred */
 	new->user_ns = ns;
 
-	put_user_ns(parent_ns);
-
 	return 0;
 }
 
@@ -72,10 +71,12 @@ int create_user_ns(struct cred *new)
  */
 static void free_user_ns_work(struct work_struct *work)
 {
-	struct user_namespace *ns =
+	struct user_namespace *parent, *ns =
 		container_of(work, struct user_namespace, destroyer);
+	parent = ns->parent;
 	free_uid(ns->creator);
 	kmem_cache_free(user_ns_cachep, ns);
+	put_user_ns(parent);
 }
 
 void free_user_ns(struct kref *kref)
@@ -99,8 +100,7 @@ uid_t user_ns_map_uid(struct user_namespace *to, const struct cred *cred, uid_t
 	/* Is cred->user the creator of the target user_ns
 	 * or the creator of one of it's parents?
 	 */
-	for ( tmp = to; tmp != &init_user_ns;
-	      tmp = tmp->creator->user_ns ) {
+	for ( tmp = to; tmp != &init_user_ns; tmp = tmp->parent ) {
 		if (cred->user == tmp->creator) {
 			return (uid_t)0;
 		}
@@ -120,8 +120,7 @@ gid_t user_ns_map_gid(struct user_namespace *to, const struct cred *cred, gid_t
 	/* Is cred->user the creator of the target user_ns
 	 * or the creator of one of it's parents?
 	 */
-	for ( tmp = to; tmp != &init_user_ns;
-	      tmp = tmp->creator->user_ns ) {
+	for ( tmp = to; tmp != &init_user_ns; tmp = tmp->parent ) {
 		if (cred->user == tmp->creator) {
 			return (gid_t)0;
 		}
diff --git a/security/commoncap.c b/security/commoncap.c
index 8b3e10e..435d074 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -92,7 +92,7 @@ int cap_capable(const struct cred *cred, struct user_namespace *targ_ns,
 		 *If you have a capability in a parent user ns, then you have
 		 * it over all children user namespaces as well.
 		 */
-		targ_ns = targ_ns->creator->user_ns;
+		targ_ns = targ_ns->parent;
 	}
 
 	/* We never get here */
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 09/43] mqueue: Explicitly capture the user namespace to send the notification to.
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` "Eric W. Beiderman
  -1 siblings, 0 replies; 227+ messages in thread
From: "Eric W. Beiderman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Stop relying on user->user_ns, which is going away, and instead capture
the user_namespace of the process we are supposed to notify.
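
The lifetime rule this introduces can be sketched in userspace as follows.
This is only a model: ns_ref_model, mq_info_model, mq_register and
mq_unregister are hypothetical names, a plain int stands in for the kref,
and locking is left out entirely.  The namespace is captured when the
notification is registered and released when the registration goes away,
whether through delivery or explicit removal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel definitions. */
struct ns_ref_model {
	int refcount;
};

struct mq_info_model {
	struct ns_ref_model *notify_user_ns;	/* NULL when nobody is registered */
};

/* Registration: pin the registering process's namespace. */
static void mq_register(struct mq_info_model *info,
			struct ns_ref_model *current_ns)
{
	assert(info->notify_user_ns == NULL);
	current_ns->refcount++;
	info->notify_user_ns = current_ns;
}

/* Delivery or removal: drop the pinned namespace and clear the slot. */
static void mq_unregister(struct mq_info_model *info)
{
	if (info->notify_user_ns) {
		assert(info->notify_user_ns->refcount > 0);
		info->notify_user_ns->refcount--;
	}
	info->notify_user_ns = NULL;
}
```

Clearing the pointer after the put keeps a second unregister harmless, which
matches how notify_owner is already handled next to it.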

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 ipc/mqueue.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 28bd64d..b53cf34 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -66,6 +66,7 @@ struct mqueue_inode_info {
 
 	struct sigevent notify;
 	struct pid* notify_owner;
+	struct user_namespace *notify_user_ns;
 	struct user_struct *user;	/* user who created, for accounting */
 	struct sock *notify_sock;
 	struct sk_buff *notify_cookie;
@@ -139,6 +140,7 @@ static struct inode *mqueue_get_inode(struct super_block *sb,
 		INIT_LIST_HEAD(&info->e_wait_q[0].list);
 		INIT_LIST_HEAD(&info->e_wait_q[1].list);
 		info->notify_owner = NULL;
+		info->notify_user_ns = NULL;
 		info->qsize = 0;
 		info->user = NULL;	/* set when all is ok */
 		memset(&info->attr, 0, sizeof(info->attr));
@@ -536,7 +538,7 @@ static void __do_notify(struct mqueue_inode_info *info)
 			rcu_read_lock();
 			sig_i.si_pid = task_tgid_nr_ns(current,
 						ns_of_pid(info->notify_owner));
-			sig_i.si_uid = user_ns_map_uid(info->user->user_ns,
+			sig_i.si_uid = user_ns_map_uid(info->notify_user_ns,
 						current_cred(), current_uid());
 			rcu_read_unlock();
 
@@ -550,7 +552,9 @@ static void __do_notify(struct mqueue_inode_info *info)
 		}
 		/* after notification unregisters process */
 		put_pid(info->notify_owner);
+		put_user_ns(info->notify_user_ns);
 		info->notify_owner = NULL;
+		info->notify_user_ns = NULL;
 	}
 	wake_up(&info->wait_q);
 }
@@ -575,7 +579,9 @@ static void remove_notification(struct mqueue_inode_info *info)
 		netlink_sendskb(info->notify_sock, info->notify_cookie);
 	}
 	put_pid(info->notify_owner);
+	put_user_ns(info->notify_user_ns);
 	info->notify_owner = NULL;
+	info->notify_user_ns = NULL;
 }
 
 static int mq_attr_ok(struct ipc_namespace *ipc_ns, struct mq_attr *attr)
@@ -1140,6 +1146,7 @@ retry:
 		}
 
 		info->notify_owner = get_pid(task_tgid(current));
+		info->notify_user_ns = get_user_ns(current_user_ns());
 		inode->i_atime = inode->i_ctime = CURRENT_TIME;
 	}
 	spin_unlock(&info->lock);
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 10/43] userns: Deprecate and rename the user_namespace reference in the user_struct
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

With a user_ns reference in struct cred, the only remaining use of the user
namespace reference in struct user_struct is to keep the uid hash table alive.

The user_namespace reference in struct user_struct will be going away soon, and
I have removed all other references to it.  Rename the field from user_ns to
_user_ns so that the compiler can verify nothing follows the user struct to the
user namespace anymore.
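
The rename trick described above can be illustrated outside the kernel.
This is only a sketch: the demo_* types and the helper are hypothetical
stand-ins, not the real struct user_struct or any kernel function.  It
shows the one remaining user converted to the new field name, and in a
comment the kind of stale access that would now fail to build.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real definitions. */
struct demo_user_namespace {
	int id;
};

struct demo_user_struct {
	unsigned int uid;
	/* Renamed from user_ns: any caller still spelling the old name stops compiling. */
	struct demo_user_namespace *_user_ns;
};

/* The one remaining user, updated to the new field name. */
int demo_hash_owner_id(const struct demo_user_struct *up)
{
	assert(up != NULL && up->_user_ns != NULL);
	return up->_user_ns->id;
}

/*
 * A caller that was missed during the conversion would look like this and
 * would be rejected at build time, since the struct has no member user_ns:
 *
 *	return up->user_ns->id;
 */
```

The point of the design is that the compiler, rather than a grep, finds
every leftover user before the field is finally deleted.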

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 include/linux/sched.h |    2 +-
 kernel/user.c         |    6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 720ce8d..6867ae9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -729,7 +729,7 @@ struct user_struct {
 	/* Hash table maintenance information */
 	struct hlist_node uidhash_node;
 	uid_t uid;
-	struct user_namespace *user_ns;
+	struct user_namespace *_user_ns; /* Don't use will be removed soon */
 
 #ifdef CONFIG_PERF_EVENTS
 	atomic_long_t locked_vm;
diff --git a/kernel/user.c b/kernel/user.c
index 71dd236..d65fec0 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -58,7 +58,7 @@ struct user_struct root_user = {
 	.files		= ATOMIC_INIT(0),
 	.sigpending	= ATOMIC_INIT(0),
 	.locked_shm     = 0,
-	.user_ns	= &init_user_ns,
+	._user_ns	= &init_user_ns,
 };
 
 /*
@@ -72,7 +72,7 @@ static void uid_hash_insert(struct user_struct *up, struct hlist_head *hashent)
 static void uid_hash_remove(struct user_struct *up)
 {
 	hlist_del_init(&up->uidhash_node);
-	put_user_ns(up->user_ns);
+	put_user_ns(up->_user_ns); /* It is safe to free the uid hash table now */
 }
 
 static struct user_struct *uid_hash_find(uid_t uid, struct hlist_head *hashent)
@@ -153,7 +153,7 @@ struct user_struct *alloc_uid(struct user_namespace *ns, uid_t uid)
 		new->uid = uid;
 		atomic_set(&new->__count, 1);
 
-		new->user_ns = get_user_ns(ns);
+		new->_user_ns = get_user_ns(ns);
 
 		/*
 		 * Before adding this, check whether we raced
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 11/43] userns: Start out with a full set of capabilities.
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 kernel/user_namespace.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index c15e533..e216e1e 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -11,6 +11,7 @@
 #include <linux/user_namespace.h>
 #include <linux/highuid.h>
 #include <linux/cred.h>
+#include <linux/securebits.h>
 
 static struct kmem_cache *user_ns_cachep __read_mostly;
 
@@ -52,6 +53,14 @@ int create_user_ns(struct cred *new)
 	new->gid = new->egid = new->sgid = new->fsgid = 0;
 	put_group_info(new->group_info);
 	new->group_info = get_group_info(&init_groups);
+	/* Start with the same capabilities as init but useless for doing
+	 * anything as the capabilities are bound to the new user namespace.
+	 */
+	new->securebits = SECUREBITS_DEFAULT;
+	new->cap_inheritable = CAP_EMPTY_SET;
+	new->cap_permitted = CAP_FULL_SET;
+	new->cap_effective = CAP_FULL_SET;
+	new->cap_bset = CAP_FULL_SET;
 #ifdef CONFIG_KEYS
 	key_put(new->request_key_auth);
 	new->request_key_auth = NULL;
-- 
1.7.2.5
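
The comment added in this patch says the full capability sets are
"useless for doing anything" because they are bound to the new user
namespace.  A toy model of that idea follows.  It is a simplified sketch,
not the kernel's check: the demo_* names are hypothetical, and it assumes
only that a capability applies to the holder's own namespace and its
descendants, ignoring every other rule the kernel applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CAP_FULL_SET (~(uint64_t)0)

/* Hypothetical namespace node; a NULL parent marks the initial namespace. */
struct demo_userns {
	const struct demo_userns *parent;
};

/* Hypothetical credential: the namespace it is bound to plus effective caps. */
struct demo_cred {
	const struct demo_userns *ns;
	uint64_t cap_effective;
};

/* Mirrors the spirit of the patch: full sets, but bound to the new namespace. */
void demo_enter_new_userns(struct demo_cred *cred, const struct demo_userns *new_ns)
{
	assert(cred != NULL && new_ns != NULL);
	cred->ns = new_ns;
	cred->cap_effective = DEMO_CAP_FULL_SET;
}

/* A capability counts only if the target is the cred's namespace or below it. */
bool demo_ns_capable(const struct demo_cred *cred,
		     const struct demo_userns *target, unsigned int cap)
{
	assert(cred != NULL && cap < 64);
	for (const struct demo_userns *ns = target; ns != NULL; ns = ns->parent)
		if (ns == cred->ns)
			return (cred->cap_effective >> cap) & 1u;
	return false;
}
```

In this model a task that has just entered a child namespace passes every
check targeted at the child and fails every check targeted at the parent,
even though all of its capability bits are set.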

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 12/43] userns: Replace the hard to write inode_userns with inode_capable.
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

This represents a change in strategy for handling user namespaces.
Instead of tagging everything explicitly with a user namespace and bulking
up all of the comparisons of uids and gids in the kernel, all uids and gids
in use will have a mapping to flat kuid and kgid spaces respectively.  This
allows much more of the existing logic to be preserved and in general
allows for faster code.
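
As a rough illustration of the flat-space idea, the sketch below maps an
id seen inside a namespace to a single flat id and then compares owners
with a plain integer test.  Everything named demo_* or DEMO_* is
hypothetical and the single contiguous range is an assumption made for
brevity; only the use of (uid_t)-1 as a reserved error value comes from
the cover letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reserved error value, in the spirit of (uid_t)-1. */
#define DEMO_INVALID_ID ((uint32_t)-1)

/* Hypothetical: one contiguous range of ids inside a namespace. */
struct demo_extent {
	uint32_t first;		/* first id as seen inside the namespace */
	uint32_t flat_first;	/* where that id lands in the flat space */
	uint32_t count;		/* number of ids in the range */
};

/* Translate a namespace-local id to the flat space, or fail. */
uint32_t demo_to_flat(const struct demo_extent *map, uint32_t id)
{
	assert(map != NULL);
	if (id == DEMO_INVALID_ID)
		return DEMO_INVALID_ID;
	if (id < map->first || id - map->first >= map->count)
		return DEMO_INVALID_ID;
	return map->flat_first + (id - map->first);
}

/* Once both ids are flat, no namespace comparison is needed. */
bool demo_same_owner(uint32_t flat_a, uint32_t flat_b)
{
	return flat_a != DEMO_INVALID_ID && flat_a == flat_b;
}
```

With a child range of 65536 ids starting at flat id 100000, local id 1000
becomes flat id 101000, and it compares equal to id 101000 as seen from a
namespace that maps ids one to one.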

In this new and improved world we allow someone to utilize capabilities
over an inode if the inode's owner maps into the capability holder's user
namespace and the user has capabilities in their user namespace, which
is simple and efficient.
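
The rule stated above can be sketched as a two-part test.  This is a
model of the rule as worded in this message, not the inode_capable()
added by the diff below, which at this stage only accepts the initial
user namespace.  The demo_* types are hypothetical, and a namespace is
assumed to own one contiguous range of flat ids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the range of flat ids that map into a namespace. */
struct demo_ns {
	uint32_t flat_first;
	uint32_t count;
};

/* Hypothetical task: its namespace and its effective capability bits. */
struct demo_task {
	const struct demo_ns *ns;
	uint64_t cap_effective;
};

/* Does the inode's flat owner id have a mapping in this namespace? */
bool demo_owner_maps_into(const struct demo_ns *ns, uint32_t flat_owner)
{
	assert(ns != NULL);
	return flat_owner >= ns->flat_first &&
	       flat_owner - ns->flat_first < ns->count;
}

/* Both conditions must hold: capability raised, and owner mapped. */
bool demo_inode_capable(const struct demo_task *task, uint32_t flat_owner,
			unsigned int cap)
{
	assert(task != NULL && cap < 64);
	bool has_cap = (task->cap_effective >> cap) & 1u;
	return has_cap && demo_owner_maps_into(task->ns, flat_owner);
}
```

A task holding a capability in a namespace that covers flat ids 100000
through 165535 may use it on an inode owned by flat id 100500, but not on
one owned by flat id 0, which has no mapping in that namespace.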

Moving the fs uid comparisons to be comparisons in a flat kuid space
follows in later patches, something that is only significant if you
are using user namespaces.

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/inode.c                 |    6 ++----
 fs/namei.c                 |   18 +++++-------------
 include/linux/capability.h |    2 ++
 include/linux/fs.h         |    6 ------
 kernel/capability.c        |   19 +++++++++++++++++++
 5 files changed, 28 insertions(+), 23 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 9f4f5fe..f0c4ace 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1732,11 +1732,9 @@ EXPORT_SYMBOL(inode_init_owner);
  */
 bool inode_owner_or_capable(const struct inode *inode)
 {
-	struct user_namespace *ns = inode_userns(inode);
-
-	if (current_user_ns() == ns && current_fsuid() == inode->i_uid)
+	if (current_fsuid() == inode->i_uid)
 		return true;
-	if (ns_capable(ns, CAP_FOWNER))
+	if (inode_capable(inode, CAP_FOWNER))
 		return true;
 	return false;
 }
diff --git a/fs/namei.c b/fs/namei.c
index 701954d..941c436 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -228,9 +228,6 @@ static int acl_permission_check(struct inode *inode, int mask)
 {
 	unsigned int mode = inode->i_mode;
 
-	if (current_user_ns() != inode_userns(inode))
-		goto other_perms;
-
 	if (likely(current_fsuid() == inode->i_uid))
 		mode >>= 6;
 	else {
@@ -244,7 +241,6 @@ static int acl_permission_check(struct inode *inode, int mask)
 			mode >>= 3;
 	}
 
-other_perms:
 	/*
 	 * If the DACs are ok we don't need any capability check.
 	 */
@@ -280,10 +276,10 @@ int generic_permission(struct inode *inode, int mask)
 
 	if (S_ISDIR(inode->i_mode)) {
 		/* DACs are overridable for directories */
-		if (ns_capable(inode_userns(inode), CAP_DAC_OVERRIDE))
+		if (inode_capable(inode, CAP_DAC_OVERRIDE))
 			return 0;
 		if (!(mask & MAY_WRITE))
-			if (ns_capable(inode_userns(inode), CAP_DAC_READ_SEARCH))
+			if (inode_capable(inode, CAP_DAC_READ_SEARCH))
 				return 0;
 		return -EACCES;
 	}
@@ -293,7 +289,7 @@ int generic_permission(struct inode *inode, int mask)
 	 * at least one exec bit set.
 	 */
 	if (!(mask & MAY_EXEC) || (inode->i_mode & S_IXUGO))
-		if (ns_capable(inode_userns(inode), CAP_DAC_OVERRIDE))
+		if (inode_capable(inode, CAP_DAC_OVERRIDE))
 			return 0;
 
 	/*
@@ -301,7 +297,7 @@ int generic_permission(struct inode *inode, int mask)
 	 */
 	mask &= MAY_READ | MAY_WRITE | MAY_EXEC;
 	if (mask == MAY_READ)
-		if (ns_capable(inode_userns(inode), CAP_DAC_READ_SEARCH))
+		if (inode_capable(inode, CAP_DAC_READ_SEARCH))
 			return 0;
 
 	return -EACCES;
@@ -1964,15 +1960,11 @@ static inline int check_sticky(struct inode *dir, struct inode *inode)
 
 	if (!(dir->i_mode & S_ISVTX))
 		return 0;
-	if (current_user_ns() != inode_userns(inode))
-		goto other_userns;
 	if (inode->i_uid == fsuid)
 		return 0;
 	if (dir->i_uid == fsuid)
 		return 0;
-
-other_userns:
-	return !ns_capable(inode_userns(inode), CAP_FOWNER);
+	return !inode_capable(inode, CAP_FOWNER);
 }
 
 /*
diff --git a/include/linux/capability.h b/include/linux/capability.h
index 12d52de..a76eca9 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -374,6 +374,7 @@ struct cpu_vfs_cap_data {
 
 #ifdef __KERNEL__
 
+struct inode;
 struct dentry;
 struct user_namespace;
 
@@ -548,6 +549,7 @@ extern bool has_ns_capability_noaudit(struct task_struct *t,
 extern bool capable(int cap);
 extern bool ns_capable(struct user_namespace *ns, int cap);
 extern bool nsown_capable(int cap);
+extern bool inode_capable(const struct inode *inode, int cap);
 
 /* audit system wants to get cap info from files as well */
 extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 135693e..a6c5efb 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1522,12 +1522,6 @@ enum {
 #define vfs_check_frozen(sb, level) \
 	wait_event((sb)->s_wait_unfrozen, ((sb)->s_frozen < (level)))
 
-/*
- * until VFS tracks user namespaces for inodes, just make all files
- * belong to init_user_ns
- */
-extern struct user_namespace init_user_ns;
-#define inode_userns(inode) (&init_user_ns)
 extern bool inode_owner_or_capable(const struct inode *inode);
 
 /* not quite ready to be deprecated, but... */
diff --git a/kernel/capability.c b/kernel/capability.c
index 3f1adb6..cc5f071 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -419,3 +419,22 @@ bool nsown_capable(int cap)
 {
 	return ns_capable(current_user_ns(), cap);
 }
+
+/**
+ * inode_capable - Check superior capability over inode
+ * @inode: The inode in question
+ * @cap: The capability in question
+ *
+ * Return true if the current task has the given superior capability
+ * targeted at its own user namespace and the given inode is owned
+ * by the current user namespace or a child namespace.
+ *
+ * Currently inodes can only be owned by the initial user namespace.
+ *
+ */
+bool inode_capable(const struct inode *inode, int cap)
+{
+	struct user_namespace *ns = current_user_ns();
+
+	return ns_capable(ns, cap) && (ns == &init_user_ns);
+}
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 12/43] userns: Replace the hard to write inode_userns with inode_capable.
@ 2012-04-08  5:15     ` Eric W. Biederman
  0 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

This represents a change in strategy for handling user namespaces.
Instead of tagging everything explicitly with a user namespace and bulking
up all of the comparisons of uids and gids in the kernel, all uids and gids
in use will have a mapping to flat kuid and kgid spaces respectively.  This
allows much more of the existing logic to be preserved and in general
allows for faster code.

In this new and improved world we allow someone to utilize capabilities
over an inode if the inode's owner maps into the capability holder's user
namespace and the user has capabilities in their user namespace, which
is simple and efficient.

Moving the fs uid comparisons to be comparisons in a flat kuid space
follows in later patches, something that is only significant if you
are using user namespaces.

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/inode.c                 |    6 ++----
 fs/namei.c                 |   18 +++++-------------
 include/linux/capability.h |    2 ++
 include/linux/fs.h         |    6 ------
 kernel/capability.c        |   19 +++++++++++++++++++
 5 files changed, 28 insertions(+), 23 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 9f4f5fe..f0c4ace 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1732,11 +1732,9 @@ EXPORT_SYMBOL(inode_init_owner);
  */
 bool inode_owner_or_capable(const struct inode *inode)
 {
-	struct user_namespace *ns = inode_userns(inode);
-
-	if (current_user_ns() == ns && current_fsuid() == inode->i_uid)
+	if (current_fsuid() == inode->i_uid)
 		return true;
-	if (ns_capable(ns, CAP_FOWNER))
+	if (inode_capable(inode, CAP_FOWNER))
 		return true;
 	return false;
 }
diff --git a/fs/namei.c b/fs/namei.c
index 701954d..941c436 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -228,9 +228,6 @@ static int acl_permission_check(struct inode *inode, int mask)
 {
 	unsigned int mode = inode->i_mode;
 
-	if (current_user_ns() != inode_userns(inode))
-		goto other_perms;
-
 	if (likely(current_fsuid() == inode->i_uid))
 		mode >>= 6;
 	else {
@@ -244,7 +241,6 @@ static int acl_permission_check(struct inode *inode, int mask)
 			mode >>= 3;
 	}
 
-other_perms:
 	/*
 	 * If the DACs are ok we don't need any capability check.
 	 */
@@ -280,10 +276,10 @@ int generic_permission(struct inode *inode, int mask)
 
 	if (S_ISDIR(inode->i_mode)) {
 		/* DACs are overridable for directories */
-		if (ns_capable(inode_userns(inode), CAP_DAC_OVERRIDE))
+		if (inode_capable(inode, CAP_DAC_OVERRIDE))
 			return 0;
 		if (!(mask & MAY_WRITE))
-			if (ns_capable(inode_userns(inode), CAP_DAC_READ_SEARCH))
+			if (inode_capable(inode, CAP_DAC_READ_SEARCH))
 				return 0;
 		return -EACCES;
 	}
@@ -293,7 +289,7 @@ int generic_permission(struct inode *inode, int mask)
 	 * at least one exec bit set.
 	 */
 	if (!(mask & MAY_EXEC) || (inode->i_mode & S_IXUGO))
-		if (ns_capable(inode_userns(inode), CAP_DAC_OVERRIDE))
+		if (inode_capable(inode, CAP_DAC_OVERRIDE))
 			return 0;
 
 	/*
@@ -301,7 +297,7 @@ int generic_permission(struct inode *inode, int mask)
 	 */
 	mask &= MAY_READ | MAY_WRITE | MAY_EXEC;
 	if (mask == MAY_READ)
-		if (ns_capable(inode_userns(inode), CAP_DAC_READ_SEARCH))
+		if (inode_capable(inode, CAP_DAC_READ_SEARCH))
 			return 0;
 
 	return -EACCES;
@@ -1964,15 +1960,11 @@ static inline int check_sticky(struct inode *dir, struct inode *inode)
 
 	if (!(dir->i_mode & S_ISVTX))
 		return 0;
-	if (current_user_ns() != inode_userns(inode))
-		goto other_userns;
 	if (inode->i_uid == fsuid)
 		return 0;
 	if (dir->i_uid == fsuid)
 		return 0;
-
-other_userns:
-	return !ns_capable(inode_userns(inode), CAP_FOWNER);
+	return !inode_capable(inode, CAP_FOWNER);
 }
 
 /*
diff --git a/include/linux/capability.h b/include/linux/capability.h
index 12d52de..a76eca9 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -374,6 +374,7 @@ struct cpu_vfs_cap_data {
 
 #ifdef __KERNEL__
 
+struct inode;
 struct dentry;
 struct user_namespace;
 
@@ -548,6 +549,7 @@ extern bool has_ns_capability_noaudit(struct task_struct *t,
 extern bool capable(int cap);
 extern bool ns_capable(struct user_namespace *ns, int cap);
 extern bool nsown_capable(int cap);
+extern bool inode_capable(const struct inode *inode, int cap);
 
 /* audit system wants to get cap info from files as well */
 extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 135693e..a6c5efb 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1522,12 +1522,6 @@ enum {
 #define vfs_check_frozen(sb, level) \
 	wait_event((sb)->s_wait_unfrozen, ((sb)->s_frozen < (level)))
 
-/*
- * until VFS tracks user namespaces for inodes, just make all files
- * belong to init_user_ns
- */
-extern struct user_namespace init_user_ns;
-#define inode_userns(inode) (&init_user_ns)
 extern bool inode_owner_or_capable(const struct inode *inode);
 
 /* not quite ready to be deprecated, but... */
diff --git a/kernel/capability.c b/kernel/capability.c
index 3f1adb6..cc5f071 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -419,3 +419,22 @@ bool nsown_capable(int cap)
 {
 	return ns_capable(current_user_ns(), cap);
 }
+
+/**
+ * inode_capable - Check superior capability over inode
+ * @inode: The inode in question
+ * @cap: The capability in question
+ *
+ * Return true if the current task has the given superior capability
+ * targeted at its own user namespace and the given inode is owned
+ * by the current user namespace or a child namespace.
+ *
+ * Currently inodes can only be owned by the initial user namespace.
+ *
+ */
+bool inode_capable(const struct inode *inode, int cap)
+{
+	struct user_namespace *ns = current_user_ns();
+
+	return ns_capable(ns, cap) && (ns == &init_user_ns);
+}
-- 
1.7.2.5
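
The effect of the new inode_capable() helper can be illustrated outside the
kernel.  What follows is only a minimal userspace sketch under stated
assumptions: struct task_model, ns_capable_model() and inode_capable_model()
are hypothetical stand-ins, not kernel code, and ns_capable_model() is
deliberately simplified so that a task is only capable in the namespace it
runs in.  It shows that a capability held in a child user namespace grants
nothing over inodes while all inodes belong to the initial user namespace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these types are the kernel's. */
struct user_namespace {
	struct user_namespace *parent;
};

static struct user_namespace init_user_ns = { .parent = NULL };
static struct user_namespace child_ns = { .parent = &init_user_ns };

struct task_model {
	struct user_namespace *user_ns;	/* namespace the task runs in */
	unsigned long caps;		/* capability bits held in user_ns */
};

/* Capability numbers as found in include/linux/capability.h. */
#define CAP_DAC_OVERRIDE 1
#define CAP_FOWNER       3

/* Simplified stand-in for ns_capable(): only the task's own namespace counts. */
static bool ns_capable_model(const struct task_model *t,
			     const struct user_namespace *ns, int cap)
{
	return t->user_ns == ns && (t->caps & (1UL << cap)) != 0;
}

/* Mirrors the patch: capable in its own namespace AND that namespace is initial. */
static bool inode_capable_model(const struct task_model *t, int cap)
{
	const struct user_namespace *ns = t->user_ns;

	return ns_capable_model(t, ns, cap) && ns == &init_user_ns;
}

/* Two example tasks holding CAP_FOWNER in different namespaces. */
static const struct task_model host_root = { &init_user_ns, 1UL << CAP_FOWNER };
static const struct task_model container_root = { &child_ns, 1UL << CAP_FOWNER };
```

In this model inode_capable_model(&host_root, CAP_FOWNER) is true, while the
same call for container_root is false even though that task holds the
capability in its own namespace.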

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 13/43] userns: Add kuid_t and kgid_t and associated infrastructure in uidgid.h
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` "Eric W. Beiderman
  -1 siblings, 0 replies; 227+ messages in thread
From: "Eric W. Beiderman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Start distinguishing between internal kernel uids and gids and
values that userspace can use.  This is done by introducing two
new types: kuid_t and kgid_t.  These types and their associated
functions and infrastructure are declared in the new header
uidgid.h.

Ultimately there will be a different implementation of the mapping
functions for use with user namespaces.  But to keep it simple
we introduce the mapping functions first to separate the meat
from the mechanical code conversions.

Export overflowuid and overflowgid so we can use from_kuid_munged
and from_kgid_munged in modular code.

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 include/linux/uidgid.h |  176 ++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/sys.c           |    2 -
 2 files changed, 176 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/uidgid.h

diff --git a/include/linux/uidgid.h b/include/linux/uidgid.h
new file mode 100644
index 0000000..a0addb8
--- /dev/null
+++ b/include/linux/uidgid.h
@@ -0,0 +1,176 @@
+#ifndef _LINUX_UIDGID_H
+#define _LINUX_UIDGID_H
+
+/*
+ * A set of types for the internal kernel types representing uids and gids.
+ *
+ * The types defined in this header allow distinguishing which uids and gids in
+ * the kernel are values used by userspace and which uid and gid values are
+ * the internal kernel values.  With the addition of user namespaces the values
+ * can be different.  Using the type system makes it possible for the compiler
+ * to detect when we overlook these differences.
+ *
+ */
+#include <linux/types.h>
+#include <linux/highuid.h>
+
+struct user_namespace;
+extern struct user_namespace init_user_ns;
+
+#if defined(NOTYET)
+
+typedef struct {
+	uid_t val;
+} kuid_t;
+
+
+typedef struct {
+	gid_t val;
+} kgid_t;
+
+#define KUIDT_INIT(value) (kuid_t){ value }
+#define KGIDT_INIT(value) (kgid_t){ value }
+
+static inline uid_t __kuid_val(kuid_t uid)
+{
+	return uid.val;
+}
+
+static inline gid_t __kgid_val(kgid_t gid)
+{
+	return gid.val;
+}
+
+#else
+
+typedef uid_t kuid_t;
+typedef gid_t kgid_t;
+
+static inline uid_t __kuid_val(kuid_t uid)
+{
+	return uid;
+}
+
+static inline gid_t __kgid_val(kgid_t gid)
+{
+	return gid;
+}
+
+#define KUIDT_INIT(value) ((kuid_t) value )
+#define KGIDT_INIT(value) ((kgid_t) value )
+
+#endif
+
+#define GLOBAL_ROOT_UID KUIDT_INIT(0)
+#define GLOBAL_ROOT_GID KGIDT_INIT(0)
+
+#define INVALID_UID KUIDT_INIT(-1)
+#define INVALID_GID KGIDT_INIT(-1)
+
+static inline bool uid_eq(kuid_t left, kuid_t right)
+{
+	return __kuid_val(left) == __kuid_val(right);
+}
+
+static inline bool gid_eq(kgid_t left, kgid_t right)
+{
+	return __kgid_val(left) == __kgid_val(right);
+}
+
+static inline bool uid_gt(kuid_t left, kuid_t right)
+{
+	return __kuid_val(left) > __kuid_val(right);
+}
+
+static inline bool gid_gt(kgid_t left, kgid_t right)
+{
+	return __kgid_val(left) > __kgid_val(right);
+}
+
+static inline bool uid_gte(kuid_t left, kuid_t right)
+{
+	return __kuid_val(left) >= __kuid_val(right);
+}
+
+static inline bool gid_gte(kgid_t left, kgid_t right)
+{
+	return __kgid_val(left) >= __kgid_val(right);
+}
+
+static inline bool uid_lt(kuid_t left, kuid_t right)
+{
+	return __kuid_val(left) < __kuid_val(right);
+}
+
+static inline bool gid_lt(kgid_t left, kgid_t right)
+{
+	return __kgid_val(left) < __kgid_val(right);
+}
+
+static inline bool uid_lte(kuid_t left, kuid_t right)
+{
+	return __kuid_val(left) <= __kuid_val(right);
+}
+
+static inline bool gid_lte(kgid_t left, kgid_t right)
+{
+	return __kgid_val(left) <= __kgid_val(right);
+}
+
+static inline bool uid_valid(kuid_t uid)
+{
+	return !uid_eq(uid, INVALID_UID);
+}
+
+static inline bool gid_valid(kgid_t gid)
+{
+	return !gid_eq(gid, INVALID_GID);
+}
+
+static inline kuid_t make_kuid(struct user_namespace *from, uid_t uid)
+{
+	return KUIDT_INIT(uid);
+}
+
+static inline kgid_t make_kgid(struct user_namespace *from, gid_t gid)
+{
+	return KGIDT_INIT(gid);
+}
+
+static inline uid_t from_kuid(struct user_namespace *to, kuid_t kuid)
+{
+	return __kuid_val(kuid);
+}
+
+static inline gid_t from_kgid(struct user_namespace *to, kgid_t kgid)
+{
+	return __kgid_val(kgid);
+}
+
+static inline uid_t from_kuid_munged(struct user_namespace *to, kuid_t kuid)
+{
+	uid_t uid = from_kuid(to, kuid);
+	if (uid == (uid_t)-1)
+		uid = overflowuid;
+	return uid;
+}
+
+static inline gid_t from_kgid_munged(struct user_namespace *to, kgid_t kgid)
+{
+	gid_t gid = from_kgid(to, kgid);
+	if (gid == (gid_t)-1)
+		gid = overflowgid;
+	return gid;
+}
+
+static inline bool kuid_has_mapping(struct user_namespace *ns, kuid_t uid)
+{
+	return true;
+}
+
+static inline bool kgid_has_mapping(struct user_namespace *ns, kgid_t gid)
+{
+	return true;
+}
+
+#endif /* _LINUX_UIDGID_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 82d8714..7185241 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -93,10 +93,8 @@
 int overflowuid = DEFAULT_OVERFLOWUID;
 int overflowgid = DEFAULT_OVERFLOWGID;
 
-#ifdef CONFIG_UID16
 EXPORT_SYMBOL(overflowuid);
 EXPORT_SYMBOL(overflowgid);
-#endif
 
 /*
  * the same as above, but for filesystems which can only store a 16-bit
-- 
1.7.2.5
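
To make the point of the struct-wrapped variant concrete, here is a minimal
userspace sketch of the strict definitions.  It is an illustration under
assumptions, not the header itself: kuid_val() is renamed from __kuid_val()
to stay out of the reserved identifier space, overflowuid_model is a
hypothetical constant standing in for the kernel's overflowuid, and the
namespace argument is ignored just as in the identity mapping above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

struct user_namespace;		/* opaque; unused by the identity mapping */

/* Strict variant: a one-member struct, so a kuid_t is not an integer. */
typedef struct {
	uid_t val;
} kuid_t;

#define KUIDT_INIT(value) ((kuid_t){ (value) })
#define INVALID_UID KUIDT_INIT((uid_t)-1)

/* Hypothetical stand-in for the kernel's overflowuid. */
static const uid_t overflowuid_model = 65534;

static inline uid_t kuid_val(kuid_t uid)
{
	return uid.val;
}

static inline bool uid_eq(kuid_t left, kuid_t right)
{
	return kuid_val(left) == kuid_val(right);
}

static inline bool uid_valid(kuid_t uid)
{
	return !uid_eq(uid, INVALID_UID);
}

static inline kuid_t make_kuid(struct user_namespace *from, uid_t uid)
{
	(void)from;		/* identity mapping for now */
	return KUIDT_INIT(uid);
}

static inline uid_t from_kuid(struct user_namespace *to, kuid_t kuid)
{
	(void)to;		/* identity mapping for now */
	return kuid_val(kuid);
}

/* Never hand (uid_t)-1 to userspace; substitute the overflow uid instead. */
static inline uid_t from_kuid_munged(struct user_namespace *to, kuid_t kuid)
{
	uid_t uid = from_kuid(to, kuid);

	if (uid == (uid_t)-1)
		uid = overflowuid_model;
	return uid;
}

/*
 * With the struct wrapper these no longer compile, which is the point:
 *
 *	kuid_t k = 1000;	   error: invalid initializer
 *	if (k == 1000) ...	   error: invalid operands to binary ==
 */
```

Code that goes through make_kuid(), uid_eq() and from_kuid() builds with
either definition of kuid_t, so a conversion can be done once and checked
with the strict variant.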

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 14/43] userns: Add a Kconfig option to enforce strict kuid and kgid type checks
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` "Eric W. Beiderman
  -1 siblings, 0 replies; 227+ messages in thread
From: "Eric W. Beiderman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Make it easy to switch between strong mandatory type checks and
relaxed type checks, so that the code can be tested with the strong
type checks enabled and then built with them disabled so that the
resulting code can be used.

Require strong mandatory type checks when enabling the user namespace.
It is very easy to make a typo and use the wrong type, allowing
conversions to/from userspace values to be bypassed by accident;
the strong type checks prevent this.

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 include/linux/uidgid.h |    2 +-
 init/Kconfig           |   12 +++++++++++-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/uidgid.h b/include/linux/uidgid.h
index a0addb8..5398568 100644
--- a/include/linux/uidgid.h
+++ b/include/linux/uidgid.h
@@ -17,7 +17,7 @@
 struct user_namespace;
 extern struct user_namespace init_user_ns;
 
-#if defined(NOTYET)
+#ifdef CONFIG_UIDGID_STRICT_TYPE_CHECKS
 
 typedef struct {
 	uid_t val;
diff --git a/init/Kconfig b/init/Kconfig
index 72f33fa..86cf760 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -828,7 +828,8 @@ config IPC_NS
 config USER_NS
 	bool "User namespace (EXPERIMENTAL)"
 	depends on EXPERIMENTAL
-	default y
+	select UIDGID_STRICT_TYPE_CHECKS
+	default n
 	help
 	  This allows containers, i.e. vservers, to use user namespaces
 	  to provide different user info for different servers.
@@ -852,6 +853,15 @@ config NET_NS
 
 endif # NAMESPACES
 
+config UIDGID_STRICT_TYPE_CHECKS
+	bool "Require conversions between uid/gids and their internal representation"
+	default n
+	help
+	 While the necessary conversions are being added to all subsystems, leaving this
+	 option disabled allows the code to continue to build for unconverted subsystems.
+
+	 Say Y here if you want the strict type checking enabled.
+
 config SCHED_AUTOGROUP
 	bool "Automatic process group scheduling"
 	select EVENTFD
-- 
1.7.2.5
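
The switch this option provides can be sketched in userspace as follows.
This is a hedged illustration only: UIDGID_STRICT_MODEL is a hypothetical
macro standing in for CONFIG_UIDGID_STRICT_TYPE_CHECKS, kuid_val() stands in
for __kuid_val(), and is_global_root() and is_global_root_unconverted() are
made-up examples of converted and unconverted code.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical stand-in for CONFIG_UIDGID_STRICT_TYPE_CHECKS: build with
 * -DUIDGID_STRICT_MODEL to select the struct-wrapped type.
 */
#ifdef UIDGID_STRICT_MODEL

typedef struct {
	uid_t val;
} kuid_t;

#define KUIDT_INIT(value) ((kuid_t){ (value) })

static inline uid_t kuid_val(kuid_t uid)
{
	return uid.val;
}

#else

typedef uid_t kuid_t;

#define KUIDT_INIT(value) ((kuid_t)(value))

static inline uid_t kuid_val(kuid_t uid)
{
	return uid;
}

#endif

#define GLOBAL_ROOT_UID KUIDT_INIT(0)

static inline bool uid_eq(kuid_t left, kuid_t right)
{
	return kuid_val(left) == kuid_val(right);
}

/* Converted code: uses only the helpers, so it builds in both modes. */
static inline bool is_global_root(kuid_t uid)
{
	return uid_eq(uid, GLOBAL_ROOT_UID);
}

/* Unconverted code: compares against a raw integer, relaxed mode only. */
#ifndef UIDGID_STRICT_MODEL
static inline bool is_global_root_unconverted(kuid_t uid)
{
	return uid == 0;
}
#endif
```

Building the sketch with and without -DUIDGID_STRICT_MODEL mirrors the
intended workflow: the strict build flags every place that still treats a
kuid_t as an integer, and the relaxed build keeps such places compiling until
they are converted.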

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 15/43] userns: Disassociate user_struct from the user_namespace.
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` "Eric W. Beiderman
  -1 siblings, 0 replies; 227+ messages in thread
From: "Eric W. Beiderman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Modify alloc_uid to take a kuid and make the user hash table global.
Stop holding a reference to the user namespace in struct user_struct.

This simplifies the code and makes the per user accounting not
care about which user namespace a uid happens to appear in.

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/ioprio.c                    |   18 ++++++++++++++----
 include/linux/sched.h          |    8 ++++----
 include/linux/user_namespace.h |    4 ----
 kernel/sys.c                   |   34 +++++++++++++++++++++++-----------
 kernel/user.c                  |   28 +++++++++++++---------------
 kernel/user_namespace.c        |    6 +-----
 6 files changed, 55 insertions(+), 43 deletions(-)

diff --git a/fs/ioprio.c b/fs/ioprio.c
index 0f1b951..8e35e96 100644
--- a/fs/ioprio.c
+++ b/fs/ioprio.c
@@ -65,6 +65,7 @@ SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
 	struct task_struct *p, *g;
 	struct user_struct *user;
 	struct pid *pgrp;
+	kuid_t uid;
 	int ret;
 
 	switch (class) {
@@ -110,16 +111,21 @@ SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
 			} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
 			break;
 		case IOPRIO_WHO_USER:
+			uid = make_kuid(current_user_ns(), who);
+			if (!uid_valid(uid))
+				break;
 			if (!who)
 				user = current_user();
 			else
-				user = find_user(who);
+				user = find_user(uid);
 
 			if (!user)
 				break;
 
 			do_each_thread(g, p) {
-				if (__task_cred(p)->uid != who)
+				const struct cred *tcred = __task_cred(p);
+				kuid_t tcred_uid = make_kuid(tcred->user_ns, tcred->uid);
+				if (!uid_eq(tcred_uid, uid))
 					continue;
 				ret = set_task_ioprio(p, ioprio);
 				if (ret)
@@ -174,6 +180,7 @@ SYSCALL_DEFINE2(ioprio_get, int, which, int, who)
 	struct task_struct *g, *p;
 	struct user_struct *user;
 	struct pid *pgrp;
+	kuid_t uid;
 	int ret = -ESRCH;
 	int tmpio;
 
@@ -203,16 +210,19 @@ SYSCALL_DEFINE2(ioprio_get, int, which, int, who)
 			} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
 			break;
 		case IOPRIO_WHO_USER:
+			uid = make_kuid(current_user_ns(), who);
 			if (!who)
 				user = current_user();
 			else
-				user = find_user(who);
+				user = find_user(uid);
 
 			if (!user)
 				break;
 
 			do_each_thread(g, p) {
-				if (__task_cred(p)->uid != user->uid)
+				const struct cred *tcred = __task_cred(p);
+				kuid_t tcred_uid = make_kuid(tcred->user_ns, tcred->uid);
+				if (!uid_eq(tcred_uid, user->uid))
 					continue;
 				tmpio = get_task_ioprio(p);
 				if (tmpio < 0)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6867ae9..5fdc1eb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -90,6 +90,7 @@ struct sched_param {
 #include <linux/latencytop.h>
 #include <linux/cred.h>
 #include <linux/llist.h>
+#include <linux/uidgid.h>
 
 #include <asm/processor.h>
 
@@ -728,8 +729,7 @@ struct user_struct {
 
 	/* Hash table maintenance information */
 	struct hlist_node uidhash_node;
-	uid_t uid;
-	struct user_namespace *_user_ns; /* Don't use will be removed soon */
+	kuid_t uid;
 
 #ifdef CONFIG_PERF_EVENTS
 	atomic_long_t locked_vm;
@@ -738,7 +738,7 @@ struct user_struct {
 
 extern int uids_sysfs_init(void);
 
-extern struct user_struct *find_user(uid_t);
+extern struct user_struct *find_user(kuid_t);
 
 extern struct user_struct root_user;
 #define INIT_USER (&root_user)
@@ -2177,7 +2177,7 @@ extern struct task_struct *find_task_by_pid_ns(pid_t nr,
 extern void __set_special_pids(struct pid *pid);
 
 /* per-UID process charging. */
-extern struct user_struct * alloc_uid(struct user_namespace *, uid_t);
+extern struct user_struct * alloc_uid(kuid_t);
 static inline struct user_struct *get_uid(struct user_struct *u)
 {
 	atomic_inc(&u->__count);
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index dc2d85a..d767508 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -6,12 +6,8 @@
 #include <linux/sched.h>
 #include <linux/err.h>
 
-#define UIDHASH_BITS	(CONFIG_BASE_SMALL ? 3 : 7)
-#define UIDHASH_SZ	(1 << UIDHASH_BITS)
-
 struct user_namespace {
 	struct kref		kref;
-	struct hlist_head	uidhash_table[UIDHASH_SZ];
 	struct user_namespace	*parent;
 	struct user_struct	*creator;
 	struct work_struct	destroyer;
diff --git a/kernel/sys.c b/kernel/sys.c
index 7185241..f0c43b4 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -175,6 +175,8 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
 	const struct cred *cred = current_cred();
 	int error = -EINVAL;
 	struct pid *pgrp;
+	kuid_t cred_uid;
+	kuid_t uid;
 
 	if (which > PRIO_USER || which < PRIO_PROCESS)
 		goto out;
@@ -207,18 +209,22 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
 			} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
 			break;
 		case PRIO_USER:
+			cred_uid = make_kuid(cred->user_ns, cred->uid);
+			uid = make_kuid(cred->user_ns, who);
 			user = cred->user;
 			if (!who)
-				who = cred->uid;
-			else if ((who != cred->uid) &&
-				 !(user = find_user(who)))
+				uid = cred_uid;
+			else if (!uid_eq(uid, cred_uid) &&
+				 !(user = find_user(uid)))
 				goto out_unlock;	/* No processes for this user */
 
 			do_each_thread(g, p) {
-				if (__task_cred(p)->uid == who)
+				const struct cred *tcred = __task_cred(p);
+				kuid_t tcred_uid = make_kuid(tcred->user_ns, tcred->uid);
+				if (uid_eq(tcred_uid, uid))
 					error = set_one_prio(p, niceval, error);
 			} while_each_thread(g, p);
-			if (who != cred->uid)
+			if (!uid_eq(uid, cred_uid))
 				free_uid(user);		/* For find_user() */
 			break;
 	}
@@ -242,6 +248,8 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
 	const struct cred *cred = current_cred();
 	long niceval, retval = -ESRCH;
 	struct pid *pgrp;
+	kuid_t cred_uid;
+	kuid_t uid;
 
 	if (which > PRIO_USER || which < PRIO_PROCESS)
 		return -EINVAL;
@@ -272,21 +280,25 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
 			} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
 			break;
 		case PRIO_USER:
+			cred_uid = make_kuid(cred->user_ns, cred->uid);
+			uid = make_kuid(cred->user_ns, who);
 			user = cred->user;
 			if (!who)
-				who = cred->uid;
-			else if ((who != cred->uid) &&
-				 !(user = find_user(who)))
+				uid = cred_uid;
+			else if (!uid_eq(uid, cred_uid) &&
+				 !(user = find_user(uid)))
 				goto out_unlock;	/* No processes for this user */
 
 			do_each_thread(g, p) {
-				if (__task_cred(p)->uid == who) {
+				const struct cred *tcred = __task_cred(p);
+				kuid_t tcred_uid = make_kuid(tcred->user_ns, tcred->uid);
+				if (uid_eq(tcred_uid, uid)) {
 					niceval = 20 - task_nice(p);
 					if (niceval > retval)
 						retval = niceval;
 				}
 			} while_each_thread(g, p);
-			if (who != cred->uid)
+			if (!uid_eq(uid, cred_uid))
 				free_uid(user);		/* for find_user() */
 			break;
 	}
@@ -629,7 +641,7 @@ static int set_user(struct cred *new)
 {
 	struct user_struct *new_user;
 
-	new_user = alloc_uid(current_user_ns(), new->uid);
+	new_user = alloc_uid(make_kuid(new->user_ns, new->uid));
 	if (!new_user)
 		return -EAGAIN;
 
diff --git a/kernel/user.c b/kernel/user.c
index d65fec0..025077e 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -34,11 +34,14 @@ EXPORT_SYMBOL_GPL(init_user_ns);
  * when changing user ID's (ie setuid() and friends).
  */
 
+#define UIDHASH_BITS	(CONFIG_BASE_SMALL ? 3 : 7)
+#define UIDHASH_SZ	(1 << UIDHASH_BITS)
 #define UIDHASH_MASK		(UIDHASH_SZ - 1)
 #define __uidhashfn(uid)	(((uid >> UIDHASH_BITS) + uid) & UIDHASH_MASK)
-#define uidhashentry(ns, uid)	((ns)->uidhash_table + __uidhashfn((uid)))
+#define uidhashentry(uid)	(uidhash_table + __uidhashfn((__kuid_val(uid))))
 
 static struct kmem_cache *uid_cachep;
+struct hlist_head uidhash_table[UIDHASH_SZ];
 
 /*
  * The uidhash_lock is mostly taken from process context, but it is
@@ -58,7 +61,7 @@ struct user_struct root_user = {
 	.files		= ATOMIC_INIT(0),
 	.sigpending	= ATOMIC_INIT(0),
 	.locked_shm     = 0,
-	._user_ns	= &init_user_ns,
+	.uid		= GLOBAL_ROOT_UID,
 };
 
 /*
@@ -72,16 +75,15 @@ static void uid_hash_insert(struct user_struct *up, struct hlist_head *hashent)
 static void uid_hash_remove(struct user_struct *up)
 {
 	hlist_del_init(&up->uidhash_node);
-	put_user_ns(up->_user_ns); /* It is safe to free the uid hash table now */
 }
 
-static struct user_struct *uid_hash_find(uid_t uid, struct hlist_head *hashent)
+static struct user_struct *uid_hash_find(kuid_t uid, struct hlist_head *hashent)
 {
 	struct user_struct *user;
 	struct hlist_node *h;
 
 	hlist_for_each_entry(user, h, hashent, uidhash_node) {
-		if (user->uid == uid) {
+		if (uid_eq(user->uid, uid)) {
 			atomic_inc(&user->__count);
 			return user;
 		}
@@ -110,14 +112,13 @@ static void free_user(struct user_struct *up, unsigned long flags)
  *
  * If the user_struct could not be found, return NULL.
  */
-struct user_struct *find_user(uid_t uid)
+struct user_struct *find_user(kuid_t uid)
 {
 	struct user_struct *ret;
 	unsigned long flags;
-	struct user_namespace *ns = current_user_ns();
 
 	spin_lock_irqsave(&uidhash_lock, flags);
-	ret = uid_hash_find(uid, uidhashentry(ns, uid));
+	ret = uid_hash_find(uid, uidhashentry(uid));
 	spin_unlock_irqrestore(&uidhash_lock, flags);
 	return ret;
 }
@@ -136,9 +137,9 @@ void free_uid(struct user_struct *up)
 		local_irq_restore(flags);
 }
 
-struct user_struct *alloc_uid(struct user_namespace *ns, uid_t uid)
+struct user_struct *alloc_uid(kuid_t uid)
 {
-	struct hlist_head *hashent = uidhashentry(ns, uid);
+	struct hlist_head *hashent = uidhashentry(uid);
 	struct user_struct *up, *new;
 
 	spin_lock_irq(&uidhash_lock);
@@ -153,8 +154,6 @@ struct user_struct *alloc_uid(struct user_namespace *ns, uid_t uid)
 		new->uid = uid;
 		atomic_set(&new->__count, 1);
 
-		new->_user_ns = get_user_ns(ns);
-
 		/*
 		 * Before adding this, check whether we raced
 		 * on adding the same user already..
@@ -162,7 +161,6 @@ struct user_struct *alloc_uid(struct user_namespace *ns, uid_t uid)
 		spin_lock_irq(&uidhash_lock);
 		up = uid_hash_find(uid, hashent);
 		if (up) {
-			put_user_ns(ns);
 			key_put(new->uid_keyring);
 			key_put(new->session_keyring);
 			kmem_cache_free(uid_cachep, new);
@@ -187,11 +185,11 @@ static int __init uid_cache_init(void)
 			0, SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
 
 	for(n = 0; n < UIDHASH_SZ; ++n)
-		INIT_HLIST_HEAD(init_user_ns.uidhash_table + n);
+		INIT_HLIST_HEAD(uidhash_table + n);
 
 	/* Insert the root user immediately (init already runs as root) */
 	spin_lock_irq(&uidhash_lock);
-	uid_hash_insert(&root_user, uidhashentry(&init_user_ns, 0));
+	uid_hash_insert(&root_user, uidhashentry(GLOBAL_ROOT_UID));
 	spin_unlock_irq(&uidhash_lock);
 
 	return 0;
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index e216e1e..898e973 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -27,7 +27,6 @@ int create_user_ns(struct cred *new)
 {
 	struct user_namespace *ns, *parent_ns = new->user_ns;
 	struct user_struct *root_user;
-	int n;
 
 	ns = kmem_cache_alloc(user_ns_cachep, GFP_KERNEL);
 	if (!ns)
@@ -35,11 +34,8 @@ int create_user_ns(struct cred *new)
 
 	kref_init(&ns->kref);
 
-	for (n = 0; n < UIDHASH_SZ; ++n)
-		INIT_HLIST_HEAD(ns->uidhash_table + n);
-
 	/* Alloc new root user.  */
-	root_user = alloc_uid(ns, 0);
+	root_user = alloc_uid(make_kuid(ns, 0));
 	if (!root_user) {
 		kmem_cache_free(user_ns_cachep, ns);
 		return -ENOMEM;
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 15/43] userns: Disassociate user_struct from the user_namespace.
@ 2012-04-08  5:15     ` Eric W. Biederman
  0 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, linux-security-module, Linux Containers,
	Andrew Morton, Linus Torvalds, Al Viro, Cyrill Gorcunov,
	Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

Modify alloc_uid to take a kuid_t and make the user hash table global.
Stop holding a reference to the user namespace in struct user_struct.

This simplifies the code and makes per-user accounting independent of
which user namespace a uid happens to appear in.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 fs/ioprio.c                    |   18 ++++++++++++++----
 include/linux/sched.h          |    8 ++++----
 include/linux/user_namespace.h |    4 ----
 kernel/sys.c                   |   34 +++++++++++++++++++++++-----------
 kernel/user.c                  |   28 +++++++++++++---------------
 kernel/user_namespace.c        |    6 +-----
 6 files changed, 55 insertions(+), 43 deletions(-)
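
A note for readers, not part of the patch: the block below is a minimal
userspace sketch of the idea in the changelog, assuming a namespace maps
one contiguous range of uids onto global values.  The struct names
(ns_sketch, user_sketch), the single-extent mapping, the two example
namespaces and the singly linked buckets are assumptions made for
illustration; only the names make_kuid, uid_eq, uid_valid, find_user,
alloc_uid and the hash function are taken from the patch.  It tries to
show why one global table is enough once every uid is converted to a
kuid_t before lookup.

```c
/* Userspace sketch only; this is not the kernel implementation. */
#include <stdlib.h>

/* Wrapping the value in a struct makes a bare == on uids a compile error. */
typedef struct { unsigned int val; } kuid_t;
#define INVALID_KUID_SKETCH ((kuid_t){ (unsigned int)-1 })

/* Hypothetical namespace: uids [first, first + count) map to lower_first... */
struct ns_sketch { unsigned int first, lower_first, count; };

/* Example namespaces (assumed values): the initial one and a child. */
static const struct ns_sketch init_ns_sketch  = { 0, 0, 4294967295u };
static const struct ns_sketch child_ns_sketch = { 0, 100000, 65536 };

static kuid_t make_kuid(const struct ns_sketch *ns, unsigned int uid)
{
	/* Unsigned wraparound also rejects uid < ns->first. */
	if (uid - ns->first >= ns->count)
		return INVALID_KUID_SKETCH;
	return (kuid_t){ ns->lower_first + (uid - ns->first) };
}

static int uid_eq(kuid_t a, kuid_t b) { return a.val == b.val; }
static int uid_valid(kuid_t uid) { return uid.val != (unsigned int)-1; }

#define UIDHASH_BITS 7
#define UIDHASH_SZ   (1 << UIDHASH_BITS)
#define UIDHASH_MASK (UIDHASH_SZ - 1)

struct user_sketch {
	kuid_t uid;
	int count;                 /* reference count */
	struct user_sketch *next;  /* bucket chain */
};

/* One table for every namespace; locking is left out of the sketch. */
static struct user_sketch *uidhash_table[UIDHASH_SZ];

static unsigned int uidhashfn(kuid_t uid)
{
	return ((uid.val >> UIDHASH_BITS) + uid.val) & UIDHASH_MASK;
}

/* Look up an existing entry and take a reference, or return NULL. */
static struct user_sketch *find_user(kuid_t uid)
{
	struct user_sketch *up;

	for (up = uidhash_table[uidhashfn(uid)]; up; up = up->next) {
		if (uid_eq(up->uid, uid)) {
			up->count++;
			return up;
		}
	}
	return NULL;
}

/* Return the entry for uid, creating it on first use. */
static struct user_sketch *alloc_uid(kuid_t uid)
{
	struct user_sketch **bucket = &uidhash_table[uidhashfn(uid)];
	struct user_sketch *up = find_user(uid);

	if (up)
		return up;
	up = calloc(1, sizeof(*up));
	if (!up)
		return NULL;
	up->uid = uid;
	up->count = 1;
	up->next = *bucket;
	*bucket = up;
	return up;
}
```

In this sketch, uid 0 in the child namespace and uid 100000 in the
initial namespace convert to the same kuid_t, so alloc_uid hands back
the same entry for both and the accounting is shared.  A uid outside
the child's range converts to the invalid value, which is the case the
uid_valid check in ioprio_set guards against.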

diff --git a/fs/ioprio.c b/fs/ioprio.c
index 0f1b951..8e35e96 100644
--- a/fs/ioprio.c
+++ b/fs/ioprio.c
@@ -65,6 +65,7 @@ SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
 	struct task_struct *p, *g;
 	struct user_struct *user;
 	struct pid *pgrp;
+	kuid_t uid;
 	int ret;
 
 	switch (class) {
@@ -110,16 +111,21 @@ SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
 			} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
 			break;
 		case IOPRIO_WHO_USER:
+			uid = make_kuid(current_user_ns(), who);
+			if (!uid_valid(uid))
+				break;
 			if (!who)
 				user = current_user();
 			else
-				user = find_user(who);
+				user = find_user(uid);
 
 			if (!user)
 				break;
 
 			do_each_thread(g, p) {
-				if (__task_cred(p)->uid != who)
+				const struct cred *tcred = __task_cred(p);
+				kuid_t tcred_uid = make_kuid(tcred->user_ns, tcred->uid);
+				if (!uid_eq(tcred_uid, uid))
 					continue;
 				ret = set_task_ioprio(p, ioprio);
 				if (ret)
@@ -174,6 +180,7 @@ SYSCALL_DEFINE2(ioprio_get, int, which, int, who)
 	struct task_struct *g, *p;
 	struct user_struct *user;
 	struct pid *pgrp;
+	kuid_t uid;
 	int ret = -ESRCH;
 	int tmpio;
 
@@ -203,16 +210,19 @@ SYSCALL_DEFINE2(ioprio_get, int, which, int, who)
 			} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
 			break;
 		case IOPRIO_WHO_USER:
+			uid = make_kuid(current_user_ns(), who);
 			if (!who)
 				user = current_user();
 			else
-				user = find_user(who);
+				user = find_user(uid);
 
 			if (!user)
 				break;
 
 			do_each_thread(g, p) {
-				if (__task_cred(p)->uid != user->uid)
+				const struct cred *tcred = __task_cred(p);
+				kuid_t tcred_uid = make_kuid(tcred->user_ns, tcred->uid);
+				if (!uid_eq(tcred_uid, user->uid))
 					continue;
 				tmpio = get_task_ioprio(p);
 				if (tmpio < 0)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6867ae9..5fdc1eb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -90,6 +90,7 @@ struct sched_param {
 #include <linux/latencytop.h>
 #include <linux/cred.h>
 #include <linux/llist.h>
+#include <linux/uidgid.h>
 
 #include <asm/processor.h>
 
@@ -728,8 +729,7 @@ struct user_struct {
 
 	/* Hash table maintenance information */
 	struct hlist_node uidhash_node;
-	uid_t uid;
-	struct user_namespace *_user_ns; /* Don't use will be removed soon */
+	kuid_t uid;
 
 #ifdef CONFIG_PERF_EVENTS
 	atomic_long_t locked_vm;
@@ -738,7 +738,7 @@ struct user_struct {
 
 extern int uids_sysfs_init(void);
 
-extern struct user_struct *find_user(uid_t);
+extern struct user_struct *find_user(kuid_t);
 
 extern struct user_struct root_user;
 #define INIT_USER (&root_user)
@@ -2177,7 +2177,7 @@ extern struct task_struct *find_task_by_pid_ns(pid_t nr,
 extern void __set_special_pids(struct pid *pid);
 
 /* per-UID process charging. */
-extern struct user_struct * alloc_uid(struct user_namespace *, uid_t);
+extern struct user_struct * alloc_uid(kuid_t);
 static inline struct user_struct *get_uid(struct user_struct *u)
 {
 	atomic_inc(&u->__count);
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index dc2d85a..d767508 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -6,12 +6,8 @@
 #include <linux/sched.h>
 #include <linux/err.h>
 
-#define UIDHASH_BITS	(CONFIG_BASE_SMALL ? 3 : 7)
-#define UIDHASH_SZ	(1 << UIDHASH_BITS)
-
 struct user_namespace {
 	struct kref		kref;
-	struct hlist_head	uidhash_table[UIDHASH_SZ];
 	struct user_namespace	*parent;
 	struct user_struct	*creator;
 	struct work_struct	destroyer;
diff --git a/kernel/sys.c b/kernel/sys.c
index 7185241..f0c43b4 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -175,6 +175,8 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
 	const struct cred *cred = current_cred();
 	int error = -EINVAL;
 	struct pid *pgrp;
+	kuid_t cred_uid;
+	kuid_t uid;
 
 	if (which > PRIO_USER || which < PRIO_PROCESS)
 		goto out;
@@ -207,18 +209,22 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
 			} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
 			break;
 		case PRIO_USER:
+			cred_uid = make_kuid(cred->user_ns, cred->uid);
+			uid = make_kuid(cred->user_ns, who);
 			user = cred->user;
 			if (!who)
-				who = cred->uid;
-			else if ((who != cred->uid) &&
-				 !(user = find_user(who)))
+				uid = cred_uid;
+			else if (!uid_eq(uid, cred_uid) &&
+				 !(user = find_user(uid)))
 				goto out_unlock;	/* No processes for this user */
 
 			do_each_thread(g, p) {
-				if (__task_cred(p)->uid == who)
+				const struct cred *tcred = __task_cred(p);
+				kuid_t tcred_uid = make_kuid(tcred->user_ns, tcred->uid);
+				if (uid_eq(tcred_uid, uid))
 					error = set_one_prio(p, niceval, error);
 			} while_each_thread(g, p);
-			if (who != cred->uid)
+			if (!uid_eq(uid, cred_uid))
 				free_uid(user);		/* For find_user() */
 			break;
 	}
@@ -242,6 +248,8 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
 	const struct cred *cred = current_cred();
 	long niceval, retval = -ESRCH;
 	struct pid *pgrp;
+	kuid_t cred_uid;
+	kuid_t uid;
 
 	if (which > PRIO_USER || which < PRIO_PROCESS)
 		return -EINVAL;
@@ -272,21 +280,25 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
 			} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
 			break;
 		case PRIO_USER:
+			cred_uid = make_kuid(cred->user_ns, cred->uid);
+			uid = make_kuid(cred->user_ns, who);
 			user = cred->user;
 			if (!who)
-				who = cred->uid;
-			else if ((who != cred->uid) &&
-				 !(user = find_user(who)))
+				uid = cred_uid;
+			else if (!uid_eq(uid, cred_uid) &&
+				 !(user = find_user(uid)))
 				goto out_unlock;	/* No processes for this user */
 
 			do_each_thread(g, p) {
-				if (__task_cred(p)->uid == who) {
+				const struct cred *tcred = __task_cred(p);
+				kuid_t tcred_uid = make_kuid(tcred->user_ns, tcred->uid);
+				if (uid_eq(tcred_uid, uid)) {
 					niceval = 20 - task_nice(p);
 					if (niceval > retval)
 						retval = niceval;
 				}
 			} while_each_thread(g, p);
-			if (who != cred->uid)
+			if (!uid_eq(uid, cred_uid))
 				free_uid(user);		/* for find_user() */
 			break;
 	}
@@ -629,7 +641,7 @@ static int set_user(struct cred *new)
 {
 	struct user_struct *new_user;
 
-	new_user = alloc_uid(current_user_ns(), new->uid);
+	new_user = alloc_uid(make_kuid(new->user_ns, new->uid));
 	if (!new_user)
 		return -EAGAIN;
 
diff --git a/kernel/user.c b/kernel/user.c
index d65fec0..025077e 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -34,11 +34,14 @@ EXPORT_SYMBOL_GPL(init_user_ns);
  * when changing user ID's (ie setuid() and friends).
  */
 
+#define UIDHASH_BITS	(CONFIG_BASE_SMALL ? 3 : 7)
+#define UIDHASH_SZ	(1 << UIDHASH_BITS)
 #define UIDHASH_MASK		(UIDHASH_SZ - 1)
 #define __uidhashfn(uid)	(((uid >> UIDHASH_BITS) + uid) & UIDHASH_MASK)
-#define uidhashentry(ns, uid)	((ns)->uidhash_table + __uidhashfn((uid)))
+#define uidhashentry(uid)	(uidhash_table + __uidhashfn((__kuid_val(uid))))
 
 static struct kmem_cache *uid_cachep;
+struct hlist_head uidhash_table[UIDHASH_SZ];
 
 /*
  * The uidhash_lock is mostly taken from process context, but it is
@@ -58,7 +61,7 @@ struct user_struct root_user = {
 	.files		= ATOMIC_INIT(0),
 	.sigpending	= ATOMIC_INIT(0),
 	.locked_shm     = 0,
-	._user_ns	= &init_user_ns,
+	.uid		= GLOBAL_ROOT_UID,
 };
 
 /*
@@ -72,16 +75,15 @@ static void uid_hash_insert(struct user_struct *up, struct hlist_head *hashent)
 static void uid_hash_remove(struct user_struct *up)
 {
 	hlist_del_init(&up->uidhash_node);
-	put_user_ns(up->_user_ns); /* It is safe to free the uid hash table now */
 }
 
-static struct user_struct *uid_hash_find(uid_t uid, struct hlist_head *hashent)
+static struct user_struct *uid_hash_find(kuid_t uid, struct hlist_head *hashent)
 {
 	struct user_struct *user;
 	struct hlist_node *h;
 
 	hlist_for_each_entry(user, h, hashent, uidhash_node) {
-		if (user->uid == uid) {
+		if (uid_eq(user->uid, uid)) {
 			atomic_inc(&user->__count);
 			return user;
 		}
@@ -110,14 +112,13 @@ static void free_user(struct user_struct *up, unsigned long flags)
  *
  * If the user_struct could not be found, return NULL.
  */
-struct user_struct *find_user(uid_t uid)
+struct user_struct *find_user(kuid_t uid)
 {
 	struct user_struct *ret;
 	unsigned long flags;
-	struct user_namespace *ns = current_user_ns();
 
 	spin_lock_irqsave(&uidhash_lock, flags);
-	ret = uid_hash_find(uid, uidhashentry(ns, uid));
+	ret = uid_hash_find(uid, uidhashentry(uid));
 	spin_unlock_irqrestore(&uidhash_lock, flags);
 	return ret;
 }
@@ -136,9 +137,9 @@ void free_uid(struct user_struct *up)
 		local_irq_restore(flags);
 }
 
-struct user_struct *alloc_uid(struct user_namespace *ns, uid_t uid)
+struct user_struct *alloc_uid(kuid_t uid)
 {
-	struct hlist_head *hashent = uidhashentry(ns, uid);
+	struct hlist_head *hashent = uidhashentry(uid);
 	struct user_struct *up, *new;
 
 	spin_lock_irq(&uidhash_lock);
@@ -153,8 +154,6 @@ struct user_struct *alloc_uid(struct user_namespace *ns, uid_t uid)
 		new->uid = uid;
 		atomic_set(&new->__count, 1);
 
-		new->_user_ns = get_user_ns(ns);
-
 		/*
 		 * Before adding this, check whether we raced
 		 * on adding the same user already..
@@ -162,7 +161,6 @@ struct user_struct *alloc_uid(struct user_namespace *ns, uid_t uid)
 		spin_lock_irq(&uidhash_lock);
 		up = uid_hash_find(uid, hashent);
 		if (up) {
-			put_user_ns(ns);
 			key_put(new->uid_keyring);
 			key_put(new->session_keyring);
 			kmem_cache_free(uid_cachep, new);
@@ -187,11 +185,11 @@ static int __init uid_cache_init(void)
 			0, SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
 
 	for(n = 0; n < UIDHASH_SZ; ++n)
-		INIT_HLIST_HEAD(init_user_ns.uidhash_table + n);
+		INIT_HLIST_HEAD(uidhash_table + n);
 
 	/* Insert the root user immediately (init already runs as root) */
 	spin_lock_irq(&uidhash_lock);
-	uid_hash_insert(&root_user, uidhashentry(&init_user_ns, 0));
+	uid_hash_insert(&root_user, uidhashentry(GLOBAL_ROOT_UID));
 	spin_unlock_irq(&uidhash_lock);
 
 	return 0;
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index e216e1e..898e973 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -27,7 +27,6 @@ int create_user_ns(struct cred *new)
 {
 	struct user_namespace *ns, *parent_ns = new->user_ns;
 	struct user_struct *root_user;
-	int n;
 
 	ns = kmem_cache_alloc(user_ns_cachep, GFP_KERNEL);
 	if (!ns)
@@ -35,11 +34,8 @@ int create_user_ns(struct cred *new)
 
 	kref_init(&ns->kref);
 
-	for (n = 0; n < UIDHASH_SZ; ++n)
-		INIT_HLIST_HEAD(ns->uidhash_table + n);
-
 	/* Alloc new root user.  */
-	root_user = alloc_uid(ns, 0);
+	root_user = alloc_uid(make_kuid(ns, 0));
 	if (!root_user) {
 		kmem_cache_free(user_ns_cachep, ns);
 		return -ENOMEM;
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 16/43] userns: Simplify the user_namespace by making userns->creator a kuid.
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

- Transform userns->creator from a user_struct reference to a simple
  kuid_t, kgid_t pair.

  In cap_capable this allows the check to see if we are the creator of
  a namespace to become the classic suser-style euid permission check.

  This allows us to remove the need for a struct cred in the mapping
  functions and still be able to display the user namespace creator's
  uid and gid as 0.

- Remove the now unnecessary delayed_work in free_user_ns.

  All that is left for free_user_ns to do is to call kmem_cache_free
  and put_user_ns.  Those functions can be called in any context
  so call them directly from free_user_ns removing the need for delayed work.
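
  As a rough userspace illustration of the first point (not kernel code,
  and not part of this patch), the owner check reduces to comparing two
  wrapped integers.  Every sk_* name below is a hypothetical stand-in
  invented for this sketch; a NULL parent stands in for init_user_ns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for kuid_t: a uid wrapped in a struct so it
 * cannot be mixed up with plain integers by accident. */
typedef struct { uint32_t val; } sk_kuid_t;

static inline bool sk_uid_eq(sk_kuid_t a, sk_kuid_t b)
{
	return a.val == b.val;
}

/* Hypothetical stand-in for struct user_namespace after this patch. */
struct sk_user_ns {
	struct sk_user_ns *parent;	/* NULL marks the initial namespace */
	sk_kuid_t owner;		/* kuid of the creator */
};

/* Owner check in the style of the cap_capable hunk: the initial
 * namespace never grants anything through ownership. */
bool sk_is_ns_owner(const struct sk_user_ns *ns, sk_kuid_t euid)
{
	assert(ns != NULL);
	return ns->parent != NULL && sk_uid_eq(ns->owner, euid);
}

/* Walk towards the initial namespace, as the uid/gid mapping helpers
 * do, and report whether uid owns ns or one of its ancestors. */
bool sk_owns_ns_or_ancestor(const struct sk_user_ns *ns, sk_kuid_t uid)
{
	const struct sk_user_ns *tmp;

	assert(ns != NULL);
	for (tmp = ns; tmp->parent != NULL; tmp = tmp->parent) {
		if (sk_uid_eq(tmp->owner, uid))
			return true;
	}
	return false;
}
```

  No reference count is held on the owner in this model, which is why
  freeing a namespace no longer has to call back into free_uid.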

Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 include/linux/user_namespace.h |    4 ++--
 kernel/user.c                  |    7 ++++---
 kernel/user_namespace.c        |   39 ++++++++++++++++++---------------------
 security/commoncap.c           |    5 +++--
 4 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index d767508..8a391bd 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -9,8 +9,8 @@
 struct user_namespace {
 	struct kref		kref;
 	struct user_namespace	*parent;
-	struct user_struct	*creator;
-	struct work_struct	destroyer;
+	kuid_t			owner;
+	kgid_t			group;
 };
 
 extern struct user_namespace init_user_ns;
diff --git a/kernel/user.c b/kernel/user.c
index 025077e..cff3856 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -25,7 +25,8 @@ struct user_namespace init_user_ns = {
 	.kref = {
 		.refcount	= ATOMIC_INIT(3),
 	},
-	.creator = &root_user,
+	.owner = GLOBAL_ROOT_UID,
+	.group = GLOBAL_ROOT_GID,
 };
 EXPORT_SYMBOL_GPL(init_user_ns);
 
@@ -54,9 +55,9 @@ struct hlist_head uidhash_table[UIDHASH_SZ];
  */
 static DEFINE_SPINLOCK(uidhash_lock);
 
-/* root_user.__count is 2, 1 for init task cred, 1 for init_user_ns->user_ns */
+/* root_user.__count is 1, for init task cred */
 struct user_struct root_user = {
-	.__count	= ATOMIC_INIT(2),
+	.__count	= ATOMIC_INIT(1),
 	.processes	= ATOMIC_INIT(1),
 	.files		= ATOMIC_INIT(0),
 	.sigpending	= ATOMIC_INIT(0),
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 898e973..f69741a 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -27,6 +27,16 @@ int create_user_ns(struct cred *new)
 {
 	struct user_namespace *ns, *parent_ns = new->user_ns;
 	struct user_struct *root_user;
+	kuid_t owner = make_kuid(new->user_ns, new->euid);
+	kgid_t group = make_kgid(new->user_ns, new->egid);
+
+	/* The creator needs a mapping in the parent user namespace
+	 * or else we won't be able to reasonably tell userspace who
+	 * created a user_namespace.
+	 */
+	if (!kuid_has_mapping(parent_ns, owner) ||
+	    !kgid_has_mapping(parent_ns, group))
+		return -EPERM;
 
 	ns = kmem_cache_alloc(user_ns_cachep, GFP_KERNEL);
 	if (!ns)
@@ -43,7 +53,9 @@ int create_user_ns(struct cred *new)
 
 	/* set the new root user in the credentials under preparation */
 	ns->parent = parent_ns;
-	ns->creator = new->user;
+	ns->owner = owner;
+	ns->group = group;
+	free_uid(new->user);
 	new->user = root_user;
 	new->uid = new->euid = new->suid = new->fsuid = 0;
 	new->gid = new->egid = new->sgid = new->fsgid = 0;
@@ -69,29 +81,15 @@ int create_user_ns(struct cred *new)
 	return 0;
 }
 
-/*
- * Deferred destructor for a user namespace.  This is required because
- * free_user_ns() may be called with uidhash_lock held, but we need to call
- * back to free_uid() which will want to take the lock again.
- */
-static void free_user_ns_work(struct work_struct *work)
+void free_user_ns(struct kref *kref)
 {
 	struct user_namespace *parent, *ns =
-		container_of(work, struct user_namespace, destroyer);
+		container_of(kref, struct user_namespace, kref);
+
 	parent = ns->parent;
-	free_uid(ns->creator);
 	kmem_cache_free(user_ns_cachep, ns);
 	put_user_ns(parent);
 }
-
-void free_user_ns(struct kref *kref)
-{
-	struct user_namespace *ns =
-		container_of(kref, struct user_namespace, kref);
-
-	INIT_WORK(&ns->destroyer, free_user_ns_work);
-	schedule_work(&ns->destroyer);
-}
 EXPORT_SYMBOL(free_user_ns);
 
 uid_t user_ns_map_uid(struct user_namespace *to, const struct cred *cred, uid_t uid)
@@ -101,12 +99,11 @@ uid_t user_ns_map_uid(struct user_namespace *to, const struct cred *cred, uid_t
 	if (likely(to == cred->user_ns))
 		return uid;
 
-
 	/* Is cred->user the creator of the target user_ns
 	 * or the creator of one of it's parents?
 	 */
 	for ( tmp = to; tmp != &init_user_ns; tmp = tmp->parent ) {
-		if (cred->user == tmp->creator) {
+		if (uid_eq(cred->user->uid, tmp->owner)) {
 			return (uid_t)0;
 		}
 	}
@@ -126,7 +123,7 @@ gid_t user_ns_map_gid(struct user_namespace *to, const struct cred *cred, gid_t
 	 * or the creator of one of it's parents?
 	 */
 	for ( tmp = to; tmp != &init_user_ns; tmp = tmp->parent ) {
-		if (cred->user == tmp->creator) {
+		if (uid_eq(cred->user->uid, tmp->owner)) {
 			return (gid_t)0;
 		}
 	}
diff --git a/security/commoncap.c b/security/commoncap.c
index 435d074..f2399d8 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -76,8 +76,9 @@ int cap_capable(const struct cred *cred, struct user_namespace *targ_ns,
 		int cap, int audit)
 {
 	for (;;) {
-		/* The creator of the user namespace has all caps. */
-		if (targ_ns != &init_user_ns && targ_ns->creator == cred->user)
+		/* The owner of the user namespace has all caps. */
+		if (targ_ns != &init_user_ns && uid_eq(targ_ns->owner,
+						       make_kuid(cred->user_ns, cred->euid)))
 			return 0;
 
 		/* Do we have the necessary capabilities? */
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 17/43] userns: Rework the user_namespace adding uid/gid mapping support
       [not found] ` <m11unyn70b.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
@ 2012-04-08  5:15     ` Eric W. Biederman
  2012-04-08  5:14     ` Eric W. Biederman
                       ` (44 subsequent siblings)
  45 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

- Convert the old uid mapping functions into compatibility wrappers.
- Add a uid/gid mapping layer from user space uids and gids to kernel
  internal uids and gids that is extent based for simplicity and speed.
  * Working with the number space after mapping uids/gids into their kernel
    internal version adds only mapping complexity over what we have today,
    leaving the kernel code easy to understand and test.
- Add proc files /proc/self/uid_map and /proc/self/gid_map.
  These files display the mapping and allow a mapping to be added
  if a mapping does not exist.
- Allow entering the user namespace without a uid or gid mapping.
  Since we are starting with an existing user, our uids and gids
  still have global mappings, so they are still valid and useful; they
  just don't have local mappings.  The requirement for things to work is
  a global uid and gid, so it is odd but perfectly fine not to have a
  local uid and gid mapping.
  Not requiring local uid and gid mappings greatly simplifies
  the logic of setting up the uid and gid mappings by allowing
  the mappings to be set after the namespace is created, which makes the
  slight weirdness worth it.
- Make the mappings in the initial user namespace to the global
  uid/gid space explicit.  Today it is an identity mapping
  but in the future we may want to twist this for debugging, similar
  to what we do with jiffies.
- Document the memory ordering requirements of setting the uid and
  gid mappings.  We only allow the mappings to be set once
  and there are no pointers involved, so the requirements are
  trivial but a little atypical.
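
To make the extent-based layer above concrete, here is a minimal
userspace sketch of the lookup in both directions.  It is an
illustration only: the struct layout mirrors the uid_gid_map added by
this patch, but the names id_map, id_extent, sketch_map_id_down and
sketch_map_id_up are invented for the example, and the range test is
written as a subtraction rather than copied from the kernel helpers.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_MAX_EXTENTS 5		/* same limit as UID_GID_MAP_MAX_EXTENTS */
#define INVALID_ID ((uint32_t)-1)	/* reserved "no mapping" value */

struct id_extent {
	uint32_t first;		/* first id as seen inside the namespace */
	uint32_t lower_first;	/* first id in the kernel-internal space */
	uint32_t count;		/* number of consecutive ids covered */
};

struct id_map {
	uint32_t nr_extents;
	struct id_extent extent[MAP_MAX_EXTENTS];
};

/* 4 + 5 * 12 bytes: the whole map fits in one 64-byte cache line. */
static_assert(sizeof(struct id_map) == 64, "id_map should be 64 bytes");

/* Namespace-local id -> kernel-internal id, or INVALID_ID if unmapped. */
uint32_t sketch_map_id_down(const struct id_map *map, uint32_t id)
{
	for (uint32_t i = 0; i < map->nr_extents; i++) {
		const struct id_extent *e = &map->extent[i];

		/* id - first < count avoids overflow in first + count */
		if (id >= e->first && id - e->first < e->count)
			return e->lower_first + (id - e->first);
	}
	return INVALID_ID;
}

/* Kernel-internal id -> namespace-local id, or INVALID_ID if unmapped. */
uint32_t sketch_map_id_up(const struct id_map *map, uint32_t id)
{
	for (uint32_t i = 0; i < map->nr_extents; i++) {
		const struct id_extent *e = &map->extent[i];

		if (id >= e->lower_first && id - e->lower_first < e->count)
			return e->first + (id - e->lower_first);
	}
	return INVALID_ID;
}
```

With a single extent of first 0, lower_first 100000 and count 65536,
local id 1000 maps down to 101000 and 101000 maps back up to 1000,
while anything outside the extent yields the reserved value.  At most
five extents are scanned, so the lookup cost stays small and bounded.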

Performance:

In this scheme the performance of the permission checks is expected to
stay the same, as the actual machine instructions should remain the same.

The worst case I could think of is ls -l on a large directory, where
all of the stat results need to be translated from kuids and
kgids to uids and gids.  So I benchmarked that case on my laptop
with a dual-core, hyperthreaded Intel i5-2520M cpu with 3MB of cpu cache.

My benchmark consisted of going to single user mode, where nothing else
was running, and on an ext4 filesystem opening 1,000,000 files, then
looping through all of the files 1000 times and calling fstat on the
individual files.  This was to ensure I was benchmarking stat times
where the inodes were in the kernel's cache, but the inode values were
not in the processor's cache.  My results:

v3.4-rc1:         ~= 156ns (unmodified v3.4-rc1 with user namespace support disabled)
v3.4-rc1-userns-: ~= 155ns (v3.4-rc1 with my user namespace patches and user namespace support disabled)
v3.4-rc1-userns+: ~= 164ns (v3.4-rc1 with my user namespace patches and user namespace support enabled)

All of the configurations ran in roughly 120ns when I performed tests
that ran in the cpu cache.
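
The benchmark program itself is not included in this message.  A
minimal sketch of the measurement loop described above could look like
the following; the function name bench_fstat_ns, the file naming
scheme and the use of CLOCK_MONOTONIC are assumptions made for
illustration, and running it with 1,000,000 files would first require
raising the open file limit.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Open nfiles scratch files in dir, fstat() each of them rounds times,
 * and return the average cost of one fstat() in nanoseconds.
 * Returns -1.0 on any error.  Hypothetical helper, not the original. */
double bench_fstat_ns(const char *dir, int nfiles, int rounds)
{
	char path[4096];
	struct stat st;
	struct timespec t0, t1;
	double result = -1.0;
	int opened = 0;
	int *fds;

	assert(dir != NULL);
	if (nfiles <= 0 || rounds <= 0)
		return -1.0;

	fds = malloc((size_t)nfiles * sizeof(*fds));
	if (!fds)
		return -1.0;

	/* Open everything up front so the timed loop only calls fstat(). */
	for (; opened < nfiles; opened++) {
		snprintf(path, sizeof(path), "%s/fstat-bench-%ld-%d",
			 dir, (long)getpid(), opened);
		fds[opened] = open(path, O_CREAT | O_RDWR, 0600);
		if (fds[opened] < 0)
			goto out;
		unlink(path);	/* the open descriptor keeps the inode alive */
	}

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int r = 0; r < rounds; r++) {
		for (int i = 0; i < nfiles; i++) {
			if (fstat(fds[i], &st) != 0)
				goto out;
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	result = ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
		  (double)(t1.tv_nsec - t0.tv_nsec)) /
		 ((double)nfiles * (double)rounds);
out:
	while (opened > 0)
		close(fds[--opened]);
	free(fds);
	return result;
}
```

Walking a file set much larger than the cpu cache on every round is
what keeps the inodes in the kernel's cache but out of the processor's
cache; a small file set would measure the in-cache case instead.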

So in summary the performance impact is:
1ns improvement in the worst case with user namespace support compiled out.
8ns aka 5% slowdown in the worst case with user namespace support compiled in.

Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/proc/base.c                 |   77 ++++++
 include/linux/uidgid.h         |   24 ++
 include/linux/user_namespace.h |   30 ++-
 kernel/user.c                  |   16 ++
 kernel/user_namespace.c        |  545 +++++++++++++++++++++++++++++++++++++---
 5 files changed, 644 insertions(+), 48 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 1c8b280..2ee514c 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -81,6 +81,7 @@
 #include <linux/oom.h>
 #include <linux/elf.h>
 #include <linux/pid_namespace.h>
+#include <linux/user_namespace.h>
 #include <linux/fs_struct.h>
 #include <linux/slab.h>
 #include <linux/flex_array.h>
@@ -2943,6 +2944,74 @@ static int proc_tgid_io_accounting(struct task_struct *task, char *buffer)
 }
 #endif /* CONFIG_TASK_IO_ACCOUNTING */
 
+#ifdef CONFIG_USER_NS
+static int proc_id_map_open(struct inode *inode, struct file *file,
+	struct seq_operations *seq_ops)
+{
+	struct user_namespace *ns = NULL;
+	struct task_struct *task;
+	struct seq_file *seq;
+	int ret = -EINVAL;
+
+	task = get_proc_task(inode);
+	if (task) {
+		rcu_read_lock();
+		ns = get_user_ns(task_cred_xxx(task, user_ns));
+		rcu_read_unlock();
+		put_task_struct(task);
+	}
+	if (!ns)
+		goto err;
+
+	ret = seq_open(file, seq_ops);
+	if (ret)
+		goto err_put_ns;
+
+	seq = file->private_data;
+	seq->private = ns;
+
+	return 0;
+err_put_ns:
+	put_user_ns(ns);
+err:
+	return ret;
+}
+
+static int proc_id_map_release(struct inode *inode, struct file *file)
+{
+	struct seq_file *seq = file->private_data;
+	struct user_namespace *ns = seq->private;
+	put_user_ns(ns);
+	return seq_release(inode, file);
+}
+
+static int proc_uid_map_open(struct inode *inode, struct file *file)
+{
+	return proc_id_map_open(inode, file, &proc_uid_seq_operations);
+}
+
+static int proc_gid_map_open(struct inode *inode, struct file *file)
+{
+	return proc_id_map_open(inode, file, &proc_gid_seq_operations);
+}
+
+static const struct file_operations proc_uid_map_operations = {
+	.open		= proc_uid_map_open,
+	.write		= proc_uid_map_write,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= proc_id_map_release,
+};
+
+static const struct file_operations proc_gid_map_operations = {
+	.open		= proc_gid_map_open,
+	.write		= proc_gid_map_write,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= proc_id_map_release,
+};
+#endif /* CONFIG_USER_NS */
+
 static int proc_pid_personality(struct seq_file *m, struct pid_namespace *ns,
 				struct pid *pid, struct task_struct *task)
 {
@@ -3045,6 +3114,10 @@ static const struct pid_entry tgid_base_stuff[] = {
 #ifdef CONFIG_HARDWALL
 	INF("hardwall",   S_IRUGO, proc_pid_hardwall),
 #endif
+#ifdef CONFIG_USER_NS
+	REG("uid_map",    S_IRUGO|S_IWUSR, proc_uid_map_operations),
+	REG("gid_map",    S_IRUGO|S_IWUSR, proc_gid_map_operations),
+#endif
 };
 
 static int proc_tgid_base_readdir(struct file * filp,
@@ -3400,6 +3473,10 @@ static const struct pid_entry tid_base_stuff[] = {
 #ifdef CONFIG_HARDWALL
 	INF("hardwall",   S_IRUGO, proc_pid_hardwall),
 #endif
+#ifdef CONFIG_USER_NS
+	REG("uid_map",    S_IRUGO|S_IWUSR, proc_uid_map_operations),
+	REG("gid_map",    S_IRUGO|S_IWUSR, proc_gid_map_operations),
+#endif
 };
 
 static int proc_tid_base_readdir(struct file * filp,
diff --git a/include/linux/uidgid.h b/include/linux/uidgid.h
index 5398568..8e522cbc 100644
--- a/include/linux/uidgid.h
+++ b/include/linux/uidgid.h
@@ -127,6 +127,28 @@ static inline bool gid_valid(kgid_t gid)
 	return !gid_eq(gid, INVALID_GID);
 }
 
+#ifdef CONFIG_USER_NS
+
+extern kuid_t make_kuid(struct user_namespace *from, uid_t uid);
+extern kgid_t make_kgid(struct user_namespace *from, gid_t gid);
+
+extern uid_t from_kuid(struct user_namespace *to, kuid_t uid);
+extern gid_t from_kgid(struct user_namespace *to, kgid_t gid);
+extern uid_t from_kuid_munged(struct user_namespace *to, kuid_t uid);
+extern gid_t from_kgid_munged(struct user_namespace *to, kgid_t gid);
+
+static inline bool kuid_has_mapping(struct user_namespace *ns, kuid_t uid)
+{
+	return from_kuid(ns, uid) != (uid_t) -1;
+}
+
+static inline bool kgid_has_mapping(struct user_namespace *ns, kgid_t gid)
+{
+	return from_kgid(ns, gid) != (gid_t) -1;
+}
+
+#else
+
 static inline kuid_t make_kuid(struct user_namespace *from, uid_t uid)
 {
 	return KUIDT_INIT(uid);
@@ -173,4 +195,6 @@ static inline bool kgid_has_mapping(struct user_namespace *ns, kgid_t gid)
 	return true;
 }
 
+#endif /* CONFIG_USER_NS */
+
 #endif /* _LINUX_UIDGID_H */
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 8a391bd..4c9846d 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -6,7 +6,20 @@
 #include <linux/sched.h>
 #include <linux/err.h>
 
+#define UID_GID_MAP_MAX_EXTENTS 5
+
+struct uid_gid_map {	/* 64 bytes -- 1 cache line */
+	u32 nr_extents;
+	struct uid_gid_extent {
+		u32 first;
+		u32 lower_first;
+		u32 count;
+	} extent[UID_GID_MAP_MAX_EXTENTS];
+};
+
 struct user_namespace {
+	struct uid_gid_map	uid_map;
+	struct uid_gid_map	gid_map;
 	struct kref		kref;
 	struct user_namespace	*parent;
 	kuid_t			owner;
@@ -33,9 +46,11 @@ static inline void put_user_ns(struct user_namespace *ns)
 		kref_put(&ns->kref, free_user_ns);
 }
 
-uid_t user_ns_map_uid(struct user_namespace *to, const struct cred *cred, uid_t uid);
-gid_t user_ns_map_gid(struct user_namespace *to, const struct cred *cred, gid_t gid);
-
+struct seq_operations;
+extern struct seq_operations proc_uid_seq_operations;
+extern struct seq_operations proc_gid_seq_operations;
+extern ssize_t proc_uid_map_write(struct file *, const char __user *, size_t, loff_t *);
+extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, loff_t *);
 #else
 
 static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
@@ -52,17 +67,18 @@ static inline void put_user_ns(struct user_namespace *ns)
 {
 }
 
+#endif
+
 static inline uid_t user_ns_map_uid(struct user_namespace *to,
 	const struct cred *cred, uid_t uid)
 {
-	return uid;
+	return from_kuid_munged(to, make_kuid(cred->user_ns, uid));
 }
+
 static inline gid_t user_ns_map_gid(struct user_namespace *to,
 	const struct cred *cred, gid_t gid)
 {
-	return gid;
+	return from_kgid_munged(to, make_kgid(cred->user_ns, gid));
 }
 
-#endif
-
 #endif /* _LINUX_USER_H */
diff --git a/kernel/user.c b/kernel/user.c
index cff3856..f9e420e 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -22,6 +22,22 @@
  * and 1 for... ?
  */
 struct user_namespace init_user_ns = {
+	.uid_map = {
+		.nr_extents = 1,
+		.extent[0] = {
+			.first = 0,
+			.lower_first = 0,
+			.count = 4294967295,
+		},
+	},
+	.gid_map = {
+		.nr_extents = 1,
+		.extent[0] = {
+			.first = 0,
+			.lower_first = 0,
+			.count = 4294967295,
+		},
+	},
 	.kref = {
 		.refcount	= ATOMIC_INIT(3),
 	},
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index f69741a..9991bac 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -12,9 +12,19 @@
 #include <linux/highuid.h>
 #include <linux/cred.h>
 #include <linux/securebits.h>
+#include <linux/keyctl.h>
+#include <linux/key-type.h>
+#include <keys/user-type.h>
+#include <linux/seq_file.h>
+#include <linux/fs.h>
+#include <linux/uaccess.h>
+#include <linux/ctype.h>
 
 static struct kmem_cache *user_ns_cachep __read_mostly;
 
+static bool new_idmap_permitted(struct user_namespace *ns, int cap_setid,
+				struct uid_gid_map *map);
+
 /*
  * Create a new user namespace, deriving the creator from the user in the
  * passed credentials, and replacing that user with the new root user for the
@@ -26,7 +36,6 @@ static struct kmem_cache *user_ns_cachep __read_mostly;
 int create_user_ns(struct cred *new)
 {
 	struct user_namespace *ns, *parent_ns = new->user_ns;
-	struct user_struct *root_user;
 	kuid_t owner = make_kuid(new->user_ns, new->euid);
 	kgid_t group = make_kgid(new->user_ns, new->egid);
 
@@ -38,29 +47,15 @@ int create_user_ns(struct cred *new)
 	    !kgid_has_mapping(parent_ns, group))
 		return -EPERM;
 
-	ns = kmem_cache_alloc(user_ns_cachep, GFP_KERNEL);
+	ns = kmem_cache_zalloc(user_ns_cachep, GFP_KERNEL);
 	if (!ns)
 		return -ENOMEM;
 
 	kref_init(&ns->kref);
-
-	/* Alloc new root user.  */
-	root_user = alloc_uid(make_kuid(ns, 0));
-	if (!root_user) {
-		kmem_cache_free(user_ns_cachep, ns);
-		return -ENOMEM;
-	}
-
-	/* set the new root user in the credentials under preparation */
 	ns->parent = parent_ns;
 	ns->owner = owner;
 	ns->group = group;
-	free_uid(new->user);
-	new->user = root_user;
-	new->uid = new->euid = new->suid = new->fsuid = 0;
-	new->gid = new->egid = new->sgid = new->fsgid = 0;
-	put_group_info(new->group_info);
-	new->group_info = get_group_info(&init_groups);
+
 	/* Start with the same capabilities as init but useless for doing
 	 * anything as the capabilities are bound to the new user namespace.
 	 */
@@ -92,44 +87,512 @@ void free_user_ns(struct kref *kref)
 }
 EXPORT_SYMBOL(free_user_ns);
 
-uid_t user_ns_map_uid(struct user_namespace *to, const struct cred *cred, uid_t uid)
+static u32 map_id_range_down(struct uid_gid_map *map, u32 id, u32 count)
 {
-	struct user_namespace *tmp;
+	unsigned idx, extents;
+	u32 first, last, id2;
 
-	if (likely(to == cred->user_ns))
-		return uid;
+	id2 = id + count - 1;
 
-	/* Is cred->user the creator of the target user_ns
-	 * or the creator of one of it's parents?
-	 */
-	for ( tmp = to; tmp != &init_user_ns; tmp = tmp->parent ) {
-		if (uid_eq(cred->user->uid, tmp->owner)) {
-			return (uid_t)0;
-		}
+	/* Find the matching extent */
+	extents = map->nr_extents;
+	smp_read_barrier_depends();
+	for (idx = 0; idx < extents; idx++) {
+		first = map->extent[idx].first;
+		last = first + map->extent[idx].count - 1;
+		if (id >= first && id <= last &&
+		    (id2 >= first && id2 <= last))
+			break;
+	}
+	/* Map the id or note failure */
+	if (idx < extents)
+		id = (id - first) + map->extent[idx].lower_first;
+	else
+		id = (u32) -1;
+
+	return id;
+}
+
+static u32 map_id_down(struct uid_gid_map *map, u32 id)
+{
+	unsigned idx, extents;
+	u32 first, last;
+
+	/* Find the matching extent */
+	extents = map->nr_extents;
+	smp_read_barrier_depends();
+	for (idx = 0; idx < extents; idx++) {
+		first = map->extent[idx].first;
+		last = first + map->extent[idx].count - 1;
+		if (id >= first && id <= last)
+			break;
+	}
+	/* Map the id or note failure */
+	if (idx < extents)
+		id = (id - first) + map->extent[idx].lower_first;
+	else
+		id = (u32) -1;
+
+	return id;
+}
+
+static u32 map_id_up(struct uid_gid_map *map, u32 id)
+{
+	unsigned idx, extents;
+	u32 first, last;
+
+	/* Find the matching extent */
+	extents = map->nr_extents;
+	smp_read_barrier_depends();
+	for (idx = 0; idx < extents; idx++) {
+		first = map->extent[idx].lower_first;
+		last = first + map->extent[idx].count - 1;
+		if (id >= first && id <= last)
+			break;
 	}
+	/* Map the id or note failure */
+	if (idx < extents)
+		id = (id - first) + map->extent[idx].first;
+	else
+		id = (u32) -1;
+
+	return id;
+}
+
+/**
+ *	make_kuid - Map a user-namespace uid pair into a kuid.
+ *	@ns:  User namespace that the uid is in
+ *	@uid: User identifier
+ *
+ *	Maps a user-namespace uid pair into a kernel internal kuid,
+ *	and returns that kuid.
+ *
+ *	When there is no mapping defined for the user-namespace uid
+ *	pair INVALID_UID is returned.  Callers are expected to test
+ *	for and handle INVALID_UID being returned.  INVALID_UID
+ *	may be tested for using uid_valid().
+ */
+kuid_t make_kuid(struct user_namespace *ns, uid_t uid)
+{
+	/* Map the uid to a global kernel uid */
+	return KUIDT_INIT(map_id_down(&ns->uid_map, uid));
+}
+EXPORT_SYMBOL(make_kuid);
+
+/**
+ *	from_kuid - Create a uid from a kuid user-namespace pair.
+ *	@targ: The user namespace we want a uid in.
+ *	@kuid: The kernel internal uid to start with.
+ *
+ *	Map @kuid into the user-namespace specified by @targ and
+ *	return the resulting uid.
+ *
+ *	There is always a mapping into the initial user_namespace.
+ *
+ *	If @kuid has no mapping in @targ (uid_t)-1 is returned.
+ */
+uid_t from_kuid(struct user_namespace *targ, kuid_t kuid)
+{
+	/* Map the uid from a global kernel uid */
+	return map_id_up(&targ->uid_map, __kuid_val(kuid));
+}
+EXPORT_SYMBOL(from_kuid);
+
+/**
+ *	from_kuid_munged - Create a uid from a kuid user-namespace pair.
+ *	@targ: The user namespace we want a uid in.
+ *	@kuid: The kernel internal uid to start with.
+ *
+ *	Map @kuid into the user-namespace specified by @targ and
+ *	return the resulting uid.
+ *
+ *	There is always a mapping into the initial user_namespace.
+ *
+ *	Unlike from_kuid from_kuid_munged never fails and always
+ *	returns a valid uid.  This makes from_kuid_munged appropriate
+ *	for use in syscalls like stat and getuid where failing the
+ *	system call and failing to provide a valid uid are not
+ *	options.
+ *
+ *	If @kuid has no mapping in @targ overflowuid is returned.
+ */
+uid_t from_kuid_munged(struct user_namespace *targ, kuid_t kuid)
+{
+	uid_t uid;
+	uid = from_kuid(targ, kuid);
+
+	if (uid == (uid_t) -1)
+		uid = overflowuid;
+	return uid;
+}
+EXPORT_SYMBOL(from_kuid_munged);
+
+/**
+ *	make_kgid - Map a user-namespace gid pair into a kgid.
+ *	@ns:  User namespace that the gid is in
+ *	@gid: Group identifier
+ *
+ *	Maps a user-namespace gid pair into a kernel internal kgid,
+ *	and returns that kgid.
+ *
+ *	When there is no mapping defined for the user-namespace gid
+ *	pair INVALID_GID is returned.  Callers are expected to test
+ *	for and handle INVALID_GID being returned.  INVALID_GID may be
+ *	tested for using gid_valid().
+ */
+kgid_t make_kgid(struct user_namespace *ns, gid_t gid)
+{
+	/* Map the gid to a global kernel gid */
+	return KGIDT_INIT(map_id_down(&ns->gid_map, gid));
+}
+EXPORT_SYMBOL(make_kgid);
+
+/**
+ *	from_kgid - Create a gid from a kgid user-namespace pair.
+ *	@targ: The user namespace we want a gid in.
+ *	@kgid: The kernel internal gid to start with.
+ *
+ *	Map @kgid into the user-namespace specified by @targ and
+ *	return the resulting gid.
+ *
+ *	There is always a mapping into the initial user_namespace.
+ *
+ *	If @kgid has no mapping in @targ (gid_t)-1 is returned.
+ */
+gid_t from_kgid(struct user_namespace *targ, kgid_t kgid)
+{
+	/* Map the gid from a global kernel gid */
+	return map_id_up(&targ->gid_map, __kgid_val(kgid));
+}
+EXPORT_SYMBOL(from_kgid);
+
+/**
+ *	from_kgid_munged - Create a gid from a kgid user-namespace pair.
+ *	@targ: The user namespace we want a gid in.
+ *	@kgid: The kernel internal gid to start with.
+ *
+ *	Map @kgid into the user-namespace specified by @targ and
+ *	return the resulting gid.
+ *
+ *	There is always a mapping into the initial user_namespace.
+ *
+ *	Unlike from_kgid from_kgid_munged never fails and always
+ *	returns a valid gid.  This makes from_kgid_munged appropriate
+ *	for use in syscalls like stat and getgid where failing the
+ *	system call and failing to provide a valid gid are not options.
+ *
+ *	If @kgid has no mapping in @targ overflowgid is returned.
+ */
+gid_t from_kgid_munged(struct user_namespace *targ, kgid_t kgid)
+{
+	gid_t gid;
+	gid = from_kgid(targ, kgid);
+
+	if (gid == (gid_t) -1)
+		gid = overflowgid;
+	return gid;
+}
+EXPORT_SYMBOL(from_kgid_munged);
+
+static int uid_m_show(struct seq_file *seq, void *v)
+{
+	struct user_namespace *ns = seq->private;
+	struct uid_gid_extent *extent = v;
+	struct user_namespace *lower_ns;
+	uid_t lower;
 
-	/* No useful relationship so no mapping */
-	return overflowuid;
+	lower_ns = current_user_ns();
+	if ((lower_ns == ns) && lower_ns->parent)
+		lower_ns = lower_ns->parent;
+
+	lower = from_kuid(lower_ns, KUIDT_INIT(extent->lower_first));
+
+	seq_printf(seq, "%10u %10u %10u\n",
+		extent->first,
+		lower,
+		extent->count);
+
+	return 0;
 }
 
-gid_t user_ns_map_gid(struct user_namespace *to, const struct cred *cred, gid_t gid)
+static int gid_m_show(struct seq_file *seq, void *v)
 {
-	struct user_namespace *tmp;
+	struct user_namespace *ns = seq->private;
+	struct uid_gid_extent *extent = v;
+	struct user_namespace *lower_ns;
+	gid_t lower;
 
-	if (likely(to == cred->user_ns))
-		return gid;
+	lower_ns = current_user_ns();
+	if ((lower_ns == ns) && lower_ns->parent)
+		lower_ns = lower_ns->parent;
 
-	/* Is cred->user the creator of the target user_ns
-	 * or the creator of one of it's parents?
+	lower = from_kgid(lower_ns, KGIDT_INIT(extent->lower_first));
+
+	seq_printf(seq, "%10u %10u %10u\n",
+		extent->first,
+		lower,
+		extent->count);
+
+	return 0;
+}
+
+static void *m_start(struct seq_file *seq, loff_t *ppos, struct uid_gid_map *map)
+{
+	struct uid_gid_extent *extent = NULL;
+	loff_t pos = *ppos;
+
+	if (pos < map->nr_extents)
+		extent = &map->extent[pos];
+
+	return extent;
+}
+
+static void *uid_m_start(struct seq_file *seq, loff_t *ppos)
+{
+	struct user_namespace *ns = seq->private;
+
+	return m_start(seq, ppos, &ns->uid_map);
+}
+
+static void *gid_m_start(struct seq_file *seq, loff_t *ppos)
+{
+	struct user_namespace *ns = seq->private;
+
+	return m_start(seq, ppos, &ns->gid_map);
+}
+
+static void *m_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	(*pos)++;
+	return seq->op->start(seq, pos);
+}
+
+static void m_stop(struct seq_file *seq, void *v)
+{
+	return;
+}
+
+struct seq_operations proc_uid_seq_operations = {
+	.start = uid_m_start,
+	.stop = m_stop,
+	.next = m_next,
+	.show = uid_m_show,
+};
+
+struct seq_operations proc_gid_seq_operations = {
+	.start = gid_m_start,
+	.stop = m_stop,
+	.next = m_next,
+	.show = gid_m_show,
+};
+
+static DEFINE_MUTEX(id_map_mutex);
+
+static ssize_t map_write(struct file *file, const char __user *buf,
+			 size_t count, loff_t *ppos,
+			 int cap_setid,
+			 struct uid_gid_map *map,
+			 struct uid_gid_map *parent_map)
+{
+	struct seq_file *seq = file->private_data;
+	struct user_namespace *ns = seq->private;
+	struct uid_gid_map new_map;
+	unsigned idx;
+	struct uid_gid_extent *extent, *last = NULL;
+	unsigned long page = 0;
+	char *kbuf, *pos, *next_line;
+	ssize_t ret = -EINVAL;
+
+	/*
+	 * The id_map_mutex serializes all writes to any given map.
+	 *
+	 * Any map is only ever written once.
+	 *
+	 * An id map fits within 1 cache line on most architectures.
+	 *
+	 * On read nothing needs to be done unless you are on an
+	 * architecture with a crazy cache coherency model like alpha.
+	 *
+	 * There is a one time data dependency between reading the
+	 * count of the extents and the values of the extents.  The
+	 * desired behavior is to see the values of the extents that
+	 * were written before the count of the extents.
+	 *
+	 * To achieve this smp_wmb() is used to guarantee the write
+	 * order and smp_read_barrier_depends() is used to guarantee that
+	 * we don't have crazy architectures returning stale data.
+	 *
+	 */
+	mutex_lock(&id_map_mutex);
+
+	ret = -EPERM;
+	/* Only allow one successful write to the map */
+	if (map->nr_extents != 0)
+		goto out;
+
+	/* Require the appropriate privilege CAP_SETUID or CAP_SETGID
+	 * over the user namespace in order to set the id mapping.
 	 */
-	for ( tmp = to; tmp != &init_user_ns; tmp = tmp->parent ) {
-		if (uid_eq(cred->user->uid, tmp->owner)) {
-			return (gid_t)0;
+	if (!ns_capable(ns, cap_setid))
+		goto out;
+
+	/* Get a buffer */
+	ret = -ENOMEM;
+	page = __get_free_page(GFP_TEMPORARY);
+	kbuf = (char *) page;
+	if (!page)
+		goto out;
+
+	/* Only allow <= page size writes at the beginning of the file */
+	ret = -EINVAL;
+	if ((*ppos != 0) || (count >= PAGE_SIZE))
+		goto out;
+
+	/* Slurp in the user data */
+	ret = -EFAULT;
+	if (copy_from_user(kbuf, buf, count))
+		goto out;
+	kbuf[count] = '\0';
+
+	/* Parse the user data */
+	ret = -EINVAL;
+	pos = kbuf;
+	new_map.nr_extents = 0;
+	for (;pos; pos = next_line) {
+		extent = &new_map.extent[new_map.nr_extents];
+
+		/* Find the end of line and ensure I don't look past it */
+		next_line = strchr(pos, '\n');
+		if (next_line) {
+			*next_line = '\0';
+			next_line++;
+			if (*next_line == '\0')
+				next_line = NULL;
 		}
+
+		pos = skip_spaces(pos);
+		extent->first = simple_strtoul(pos, &pos, 10);
+		if (!isspace(*pos))
+			goto out;
+
+		pos = skip_spaces(pos);
+		extent->lower_first = simple_strtoul(pos, &pos, 10);
+		if (!isspace(*pos))
+			goto out;
+
+		pos = skip_spaces(pos);
+		extent->count = simple_strtoul(pos, &pos, 10);
+		if (*pos && !isspace(*pos))
+			goto out;
+
+		/* Verify there is not trailing junk on the line */
+		pos = skip_spaces(pos);
+		if (*pos != '\0')
+			goto out;
+
+		/* Verify we have been given valid starting values */
+		if ((extent->first == (u32) -1) ||
+		    (extent->lower_first == (u32) -1 ))
+			goto out;
+
+		/* Verify count is not zero and does not cause the extent to wrap */
+		if ((extent->first + extent->count) <= extent->first)
+			goto out;
+		if ((extent->lower_first + extent->count) <= extent->lower_first)
+			goto out;
+
+		/* For now only accept extents that are strictly in order */
+		if (last &&
+		    (((last->first + last->count) > extent->first) ||
+		     ((last->lower_first + last->count) > extent->lower_first)))
+			goto out;
+
+		new_map.nr_extents++;
+		last = extent;
+
+		/* Fail if the file contains too many extents */
+		if ((new_map.nr_extents == UID_GID_MAP_MAX_EXTENTS) &&
+		    (next_line != NULL))
+			goto out;
 	}
+	/* Be very certain the new map actually exists */
+	if (new_map.nr_extents == 0)
+		goto out;
+
+	ret = -EPERM;
+	/* Validate the user is allowed to use user id's mapped to. */
+	if (!new_idmap_permitted(ns, cap_setid, &new_map))
+		goto out;
+
+	/* Map the lower ids from the parent user namespace to the
+	 * kernel global id space.
+	 */
+	for (idx = 0; idx < new_map.nr_extents; idx++) {
+		u32 lower_first;
+		extent = &new_map.extent[idx];
+
+		lower_first = map_id_range_down(parent_map,
+						extent->lower_first,
+						extent->count);
+
+		/* Fail if we can not map the specified extent to
+		 * the kernel global id space.
+		 */
+		if (lower_first == (u32) -1)
+			goto out;
+
+		extent->lower_first = lower_first;
+	}
+
+	/* Install the map */
+	memcpy(map->extent, new_map.extent,
+		new_map.nr_extents*sizeof(new_map.extent[0]));
+	smp_wmb();
+	map->nr_extents = new_map.nr_extents;
+
+	*ppos = count;
+	ret = count;
+out:
+	mutex_unlock(&id_map_mutex);
+	if (page)
+		free_page(page);
+	return ret;
+}
+
+ssize_t proc_uid_map_write(struct file *file, const char __user *buf, size_t size, loff_t *ppos)
+{
+	struct seq_file *seq = file->private_data;
+	struct user_namespace *ns = seq->private;
+
+	if (!ns->parent)
+		return -EPERM;
+
+	return map_write(file, buf, size, ppos, CAP_SETUID,
+			 &ns->uid_map, &ns->parent->uid_map);
+}
+
+ssize_t proc_gid_map_write(struct file *file, const char __user *buf, size_t size, loff_t *ppos)
+{
+	struct seq_file *seq = file->private_data;
+	struct user_namespace *ns = seq->private;
+
+	if (!ns->parent)
+		return -EPERM;
+
+	return map_write(file, buf, size, ppos, CAP_SETGID,
+			 &ns->gid_map, &ns->parent->gid_map);
+}
+
+static bool new_idmap_permitted(struct user_namespace *ns, int cap_setid,
+				struct uid_gid_map *new_map)
+{
+	/* Allow the specified ids if we have the appropriate capability
+	 * (CAP_SETUID or CAP_SETGID) over the parent user namespace.
+	 */
+	if (ns_capable(ns->parent, cap_setid))
+		return true;
 
-	/* No useful relationship so no mapping */
-	return overflowgid;
+	return false;
 }
 
 static __init int user_namespaces_init(void)
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 17/43] userns: Rework the user_namespace adding uid/gid mapping support
@ 2012-04-08  5:15     ` "Eric W. Beiderman
  0 siblings, 0 replies; 227+ messages in thread
From: "Eric W. Beiderman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

- Convert the old uid mapping functions into compatibility wrappers
- Add a uid/gid mapping layer from user space uids and gids to kernel
  internal uids and gids that is extent based for simplicity and speed.
  * Working with number space after mapping uids/gids into their kernel
    internal version adds only mapping complexity over what we have today,
    leaving the kernel code easy to understand and test.
- Add proc files /proc/self/uid_map /proc/self/gid_map
  These files display the mapping and allow a mapping to be added
  if a mapping does not exist.
- Allow entering the user namespace without a uid or gid mapping.
  Since we are starting with an existing user our uids and gids
  still have global mappings, so they are still valid and useful; they
  just don't have local mappings.  The requirement for things to work
  is a global uid and gid, so it is odd but perfectly fine not to have
  a local uid and gid mapping.
  Not requiring local uid and gid mappings greatly simplifies
  the logic of setting up the uid and gid mappings by allowing
  the mappings to be set after the namespace is created, which makes the
  slight weirdness worth it.
- Make the mappings in the initial user namespace to the global
  uid/gid space explicit.  Today it is an identity mapping
  but in the future we may want to twist this for debugging, similar
  to what we do with jiffies.
- Document the memory ordering requirements of setting the uid and
  gid mappings.  We only allow the mappings to be set once
  and there are no pointers involved so the requirements are
  trivial but a little atypical.
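
As a rough illustration of the extent lookup and the write-once ordering
described in the list above, here is a small userspace model.  It is only
a sketch and not the kernel code: every demo_* name is made up for this
example, C11 release/acquire atomics stand in for smp_wmb() and
smp_read_barrier_depends() (acquire is stronger than what the kernel side
needs), a single writer is assumed in place of a lock, and 65534 is
assumed as the overflow id.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_EXTENTS 5
#define DEMO_INVALID_ID  ((uint32_t)-1)	/* reserved "no mapping" value */
#define DEMO_OVERFLOW_ID 65534u		/* assumed overflow id */

struct demo_extent {
	uint32_t first;		/* first id as seen inside the namespace */
	uint32_t lower_first;	/* first id in the global id space */
	uint32_t count;		/* number of consecutive ids covered */
};

struct demo_map {
	_Atomic uint32_t nr_extents;	/* stays 0 until the map is published */
	struct demo_extent extent[DEMO_MAX_EXTENTS];
};

/* Publish a map exactly once: copy the extents, then release-store the
 * count so that a reader who sees the count also sees the extents.
 * Assumes a single writer (the kernel serializes writers with a mutex). */
static int demo_map_install(struct demo_map *map,
			    const struct demo_extent *ext, uint32_t n)
{
	if (n == 0 || n > DEMO_MAX_EXTENTS)
		return -1;
	if (atomic_load_explicit(&map->nr_extents, memory_order_relaxed) != 0)
		return -1;	/* only one successful write is allowed */
	memcpy(map->extent, ext, n * sizeof(ext[0]));
	atomic_store_explicit(&map->nr_extents, n, memory_order_release);
	return 0;
}

/* Namespace-local id -> global id, or DEMO_INVALID_ID if unmapped. */
static uint32_t demo_map_down(struct demo_map *map, uint32_t id)
{
	uint32_t n = atomic_load_explicit(&map->nr_extents,
					  memory_order_acquire);

	for (uint32_t i = 0; i < n; i++) {
		uint32_t first = map->extent[i].first;
		uint32_t last = first + map->extent[i].count - 1;

		if (id >= first && id <= last)
			return (id - first) + map->extent[i].lower_first;
	}
	return DEMO_INVALID_ID;
}

/* Global id -> namespace-local id, or DEMO_INVALID_ID if unmapped. */
static uint32_t demo_map_up(struct demo_map *map, uint32_t id)
{
	uint32_t n = atomic_load_explicit(&map->nr_extents,
					  memory_order_acquire);

	for (uint32_t i = 0; i < n; i++) {
		uint32_t first = map->extent[i].lower_first;
		uint32_t last = first + map->extent[i].count - 1;

		if (id >= first && id <= last)
			return (id - first) + map->extent[i].first;
	}
	return DEMO_INVALID_ID;
}

/* Like demo_map_up() but never fails: unmapped ids become the overflow id. */
static uint32_t demo_map_up_munged(struct demo_map *map, uint32_t id)
{
	uint32_t local = demo_map_up(map, id);

	return local == DEMO_INVALID_ID ? DEMO_OVERFLOW_ID : local;
}
```

With a single extent { first = 0, lower_first = 100000, count = 1000 }
installed, local id 5 maps down to 100005, global id 100999 maps up to
999, and any global id outside 100000..100999 has no local mapping.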

Performance:

In this scheme the performance of the permission checks is expected to
stay the same, as the actual machine instructions should remain the same.

The worst case I could think of is ls -l on a large directory where
all of the stat results need to be translated from kuids and
kgids to uids and gids.  So I benchmarked that case on my laptop
with a dual core hyperthreaded Intel i5-2520M cpu with 3M of cpu cache.

My benchmark consisted of going to single user mode where nothing else
was running.  On an ext4 filesystem I opened 1,000,000 files, looped
through all of the files 1000 times, and called fstat on the
individual files.  This was to ensure I was benchmarking stat times
where the inodes were in the kernel's cache, but the inode values were
not in the processor's cache.  My results:

v3.4-rc1:         ~= 156ns (unmodified v3.4-rc1 with user namespace support disabled)
v3.4-rc1-userns-: ~= 155ns (v3.4-rc1 with my user namespace patches and user namespace support disabled)
v3.4-rc1-userns+: ~= 164ns (v3.4-rc1 with my user namespace patches and user namespace support enabled)

All of the configurations ran in roughly 120ns when I performed tests
that ran in the cpu cache.
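
A benchmark of the shape described above could look roughly like the
following.  This is a hypothetical sketch, not the program that produced
the numbers in this message: demo_bench_fstat() and the /tmp file
template are made up for illustration, and opening on the order of
1,000,000 files at once would normally require raising the per-process
file descriptor limit first.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/*
 * Open nfiles temporary files, then call fstat() on each of them
 * rounds times.  Returns the mean wall-clock cost of one fstat() call
 * in nanoseconds, or a negative value if setup fails.
 */
static double demo_bench_fstat(int nfiles, int rounds)
{
	struct timespec t0, t1;
	struct stat st;
	double ns = -1.0;
	int opened = 0;
	int *fds;

	if (nfiles <= 0 || rounds <= 0)
		return -1.0;
	fds = malloc((size_t)nfiles * sizeof(*fds));
	if (!fds)
		return -1.0;

	/* Create the files and keep them open; the names are not needed. */
	for (; opened < nfiles; opened++) {
		char path[] = "/tmp/demo-fstat-XXXXXX";

		fds[opened] = mkstemp(path);
		if (fds[opened] < 0)
			goto out;
		unlink(path);
	}

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int r = 0; r < rounds; r++)
		for (int i = 0; i < nfiles; i++)
			if (fstat(fds[i], &st) != 0)
				goto out;
	clock_gettime(CLOCK_MONOTONIC, &t1);

	ns = ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
	      (double)(t1.tv_nsec - t0.tv_nsec)) /
	     ((double)nfiles * (double)rounds);
out:
	for (int i = 0; i < opened; i++)
		close(fds[i]);
	free(fds);
	return ns;
}
```

Walking a file set much larger than the cpu cache on every round is what
keeps the inode data out of the processor's cache between visits; a small
nfiles value measures the in-cache case instead.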

So in summary the performance impact is:
1ns improvement in the worst case with user namespace support compiled out.
8ns aka 5% slowdown in the worst case with user namespace support compiled in.

Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/proc/base.c                 |   77 ++++++
 include/linux/uidgid.h         |   24 ++
 include/linux/user_namespace.h |   30 ++-
 kernel/user.c                  |   16 ++
 kernel/user_namespace.c        |  545 +++++++++++++++++++++++++++++++++++++---
 5 files changed, 644 insertions(+), 48 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 1c8b280..2ee514c 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -81,6 +81,7 @@
 #include <linux/oom.h>
 #include <linux/elf.h>
 #include <linux/pid_namespace.h>
+#include <linux/user_namespace.h>
 #include <linux/fs_struct.h>
 #include <linux/slab.h>
 #include <linux/flex_array.h>
@@ -2943,6 +2944,74 @@ static int proc_tgid_io_accounting(struct task_struct *task, char *buffer)
 }
 #endif /* CONFIG_TASK_IO_ACCOUNTING */
 
+#ifdef CONFIG_USER_NS
+static int proc_id_map_open(struct inode *inode, struct file *file,
+	struct seq_operations *seq_ops)
+{
+	struct user_namespace *ns = NULL;
+	struct task_struct *task;
+	struct seq_file *seq;
+	int ret = -EINVAL;
+
+	task = get_proc_task(inode);
+	if (task) {
+		rcu_read_lock();
+		ns = get_user_ns(task_cred_xxx(task, user_ns));
+		rcu_read_unlock();
+		put_task_struct(task);
+	}
+	if (!ns)
+		goto err;
+
+	ret = seq_open(file, seq_ops);
+	if (ret)
+		goto err_put_ns;
+
+	seq = file->private_data;
+	seq->private = ns;
+
+	return 0;
+err_put_ns:
+	put_user_ns(ns);
+err:
+	return ret;
+}
+
+static int proc_id_map_release(struct inode *inode, struct file *file)
+{
+	struct seq_file *seq = file->private_data;
+	struct user_namespace *ns = seq->private;
+	put_user_ns(ns);
+	return seq_release(inode, file);
+}
+
+static int proc_uid_map_open(struct inode *inode, struct file *file)
+{
+	return proc_id_map_open(inode, file, &proc_uid_seq_operations);
+}
+
+static int proc_gid_map_open(struct inode *inode, struct file *file)
+{
+	return proc_id_map_open(inode, file, &proc_gid_seq_operations);
+}
+
+static const struct file_operations proc_uid_map_operations = {
+	.open		= proc_uid_map_open,
+	.write		= proc_uid_map_write,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= proc_id_map_release,
+};
+
+static const struct file_operations proc_gid_map_operations = {
+	.open		= proc_gid_map_open,
+	.write		= proc_gid_map_write,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= proc_id_map_release,
+};
+#endif /* CONFIG_USER_NS */
+
 static int proc_pid_personality(struct seq_file *m, struct pid_namespace *ns,
 				struct pid *pid, struct task_struct *task)
 {
@@ -3045,6 +3114,10 @@ static const struct pid_entry tgid_base_stuff[] = {
 #ifdef CONFIG_HARDWALL
 	INF("hardwall",   S_IRUGO, proc_pid_hardwall),
 #endif
+#ifdef CONFIG_USER_NS
+	REG("uid_map",    S_IRUGO|S_IWUSR, proc_uid_map_operations),
+	REG("gid_map",    S_IRUGO|S_IWUSR, proc_gid_map_operations),
+#endif
 };
 
 static int proc_tgid_base_readdir(struct file * filp,
@@ -3400,6 +3473,10 @@ static const struct pid_entry tid_base_stuff[] = {
 #ifdef CONFIG_HARDWALL
 	INF("hardwall",   S_IRUGO, proc_pid_hardwall),
 #endif
+#ifdef CONFIG_USER_NS
+	REG("uid_map",    S_IRUGO|S_IWUSR, proc_uid_map_operations),
+	REG("gid_map",    S_IRUGO|S_IWUSR, proc_gid_map_operations),
+#endif
 };
 
 static int proc_tid_base_readdir(struct file * filp,
diff --git a/include/linux/uidgid.h b/include/linux/uidgid.h
index 5398568..8e522cbc 100644
--- a/include/linux/uidgid.h
+++ b/include/linux/uidgid.h
@@ -127,6 +127,28 @@ static inline bool gid_valid(kgid_t gid)
 	return !gid_eq(gid, INVALID_GID);
 }
 
+#ifdef CONFIG_USER_NS
+
+extern kuid_t make_kuid(struct user_namespace *from, uid_t uid);
+extern kgid_t make_kgid(struct user_namespace *from, gid_t gid);
+
+extern uid_t from_kuid(struct user_namespace *to, kuid_t uid);
+extern gid_t from_kgid(struct user_namespace *to, kgid_t gid);
+extern uid_t from_kuid_munged(struct user_namespace *to, kuid_t uid);
+extern gid_t from_kgid_munged(struct user_namespace *to, kgid_t gid);
+
+static inline bool kuid_has_mapping(struct user_namespace *ns, kuid_t uid)
+{
+	return from_kuid(ns, uid) != (uid_t) -1;
+}
+
+static inline bool kgid_has_mapping(struct user_namespace *ns, kgid_t gid)
+{
+	return from_kgid(ns, gid) != (gid_t) -1;
+}
+
+#else
+
 static inline kuid_t make_kuid(struct user_namespace *from, uid_t uid)
 {
 	return KUIDT_INIT(uid);
@@ -173,4 +195,6 @@ static inline bool kgid_has_mapping(struct user_namespace *ns, kgid_t gid)
 	return true;
 }
 
+#endif /* CONFIG_USER_NS */
+
 #endif /* _LINUX_UIDGID_H */
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 8a391bd..4c9846d 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -6,7 +6,20 @@
 #include <linux/sched.h>
 #include <linux/err.h>
 
+#define UID_GID_MAP_MAX_EXTENTS 5
+
+struct uid_gid_map {	/* 64 bytes -- 1 cache line */
+	u32 nr_extents;
+	struct uid_gid_extent {
+		u32 first;
+		u32 lower_first;
+		u32 count;
+	} extent[UID_GID_MAP_MAX_EXTENTS];
+};
+
 struct user_namespace {
+	struct uid_gid_map	uid_map;
+	struct uid_gid_map	gid_map;
 	struct kref		kref;
 	struct user_namespace	*parent;
 	kuid_t			owner;
@@ -33,9 +46,11 @@ static inline void put_user_ns(struct user_namespace *ns)
 		kref_put(&ns->kref, free_user_ns);
 }
 
-uid_t user_ns_map_uid(struct user_namespace *to, const struct cred *cred, uid_t uid);
-gid_t user_ns_map_gid(struct user_namespace *to, const struct cred *cred, gid_t gid);
-
+struct seq_operations;
+extern struct seq_operations proc_uid_seq_operations;
+extern struct seq_operations proc_gid_seq_operations;
+extern ssize_t proc_uid_map_write(struct file *, const char __user *, size_t, loff_t *);
+extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, loff_t *);
 #else
 
 static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
@@ -52,17 +67,18 @@ static inline void put_user_ns(struct user_namespace *ns)
 {
 }
 
+#endif
+
 static inline uid_t user_ns_map_uid(struct user_namespace *to,
 	const struct cred *cred, uid_t uid)
 {
-	return uid;
+	return from_kuid_munged(to, make_kuid(cred->user_ns, uid));
 }
+
 static inline gid_t user_ns_map_gid(struct user_namespace *to,
 	const struct cred *cred, gid_t gid)
 {
-	return gid;
+	return from_kgid_munged(to, make_kgid(cred->user_ns, gid));
 }
 
-#endif
-
 #endif /* _LINUX_USER_H */
diff --git a/kernel/user.c b/kernel/user.c
index cff3856..f9e420e 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -22,6 +22,22 @@
  * and 1 for... ?
  */
 struct user_namespace init_user_ns = {
+	.uid_map = {
+		.nr_extents = 1,
+		.extent[0] = {
+			.first = 0,
+			.lower_first = 0,
+			.count = 4294967295,
+		},
+	},
+	.gid_map = {
+		.nr_extents = 1,
+		.extent[0] = {
+			.first = 0,
+			.lower_first = 0,
+			.count = 4294967295,
+		},
+	},
 	.kref = {
 		.refcount	= ATOMIC_INIT(3),
 	},
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index f69741a..9991bac 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -12,9 +12,19 @@
 #include <linux/highuid.h>
 #include <linux/cred.h>
 #include <linux/securebits.h>
+#include <linux/keyctl.h>
+#include <linux/key-type.h>
+#include <keys/user-type.h>
+#include <linux/seq_file.h>
+#include <linux/fs.h>
+#include <linux/uaccess.h>
+#include <linux/ctype.h>
 
 static struct kmem_cache *user_ns_cachep __read_mostly;
 
+static bool new_idmap_permitted(struct user_namespace *ns, int cap_setid,
+				struct uid_gid_map *map);
+
 /*
  * Create a new user namespace, deriving the creator from the user in the
  * passed credentials, and replacing that user with the new root user for the
@@ -26,7 +36,6 @@ static struct kmem_cache *user_ns_cachep __read_mostly;
 int create_user_ns(struct cred *new)
 {
 	struct user_namespace *ns, *parent_ns = new->user_ns;
-	struct user_struct *root_user;
 	kuid_t owner = make_kuid(new->user_ns, new->euid);
 	kgid_t group = make_kgid(new->user_ns, new->egid);
 
@@ -38,29 +47,15 @@ int create_user_ns(struct cred *new)
 	    !kgid_has_mapping(parent_ns, group))
 		return -EPERM;
 
-	ns = kmem_cache_alloc(user_ns_cachep, GFP_KERNEL);
+	ns = kmem_cache_zalloc(user_ns_cachep, GFP_KERNEL);
 	if (!ns)
 		return -ENOMEM;
 
 	kref_init(&ns->kref);
-
-	/* Alloc new root user.  */
-	root_user = alloc_uid(make_kuid(ns, 0));
-	if (!root_user) {
-		kmem_cache_free(user_ns_cachep, ns);
-		return -ENOMEM;
-	}
-
-	/* set the new root user in the credentials under preparation */
 	ns->parent = parent_ns;
 	ns->owner = owner;
 	ns->group = group;
-	free_uid(new->user);
-	new->user = root_user;
-	new->uid = new->euid = new->suid = new->fsuid = 0;
-	new->gid = new->egid = new->sgid = new->fsgid = 0;
-	put_group_info(new->group_info);
-	new->group_info = get_group_info(&init_groups);
+
 	/* Start with the same capabilities as init but useless for doing
 	 * anything as the capabilities are bound to the new user namespace.
 	 */
@@ -92,44 +87,512 @@ void free_user_ns(struct kref *kref)
 }
 EXPORT_SYMBOL(free_user_ns);
 
-uid_t user_ns_map_uid(struct user_namespace *to, const struct cred *cred, uid_t uid)
+static u32 map_id_range_down(struct uid_gid_map *map, u32 id, u32 count)
 {
-	struct user_namespace *tmp;
+	unsigned idx, extents;
+	u32 first, last, id2;
 
-	if (likely(to == cred->user_ns))
-		return uid;
+	id2 = id + count - 1;
 
-	/* Is cred->user the creator of the target user_ns
-	 * or the creator of one of it's parents?
-	 */
-	for ( tmp = to; tmp != &init_user_ns; tmp = tmp->parent ) {
-		if (uid_eq(cred->user->uid, tmp->owner)) {
-			return (uid_t)0;
-		}
+	/* Find the matching extent */
+	extents = map->nr_extents;
+	smp_read_barrier_depends();
+	for (idx = 0; idx < extents; idx++) {
+		first = map->extent[idx].first;
+		last = first + map->extent[idx].count - 1;
+		if (id >= first && id <= last &&
+		    (id2 >= first && id2 <= last))
+			break;
+	}
+	/* Map the id or note failure */
+	if (idx < extents)
+		id = (id - first) + map->extent[idx].lower_first;
+	else
+		id = (u32) -1;
+
+	return id;
+}
+
+static u32 map_id_down(struct uid_gid_map *map, u32 id)
+{
+	unsigned idx, extents;
+	u32 first, last;
+
+	/* Find the matching extent */
+	extents = map->nr_extents;
+	smp_read_barrier_depends();
+	for (idx = 0; idx < extents; idx++) {
+		first = map->extent[idx].first;
+		last = first + map->extent[idx].count - 1;
+		if (id >= first && id <= last)
+			break;
+	}
+	/* Map the id or note failure */
+	if (idx < extents)
+		id = (id - first) + map->extent[idx].lower_first;
+	else
+		id = (u32) -1;
+
+	return id;
+}
+
+static u32 map_id_up(struct uid_gid_map *map, u32 id)
+{
+	unsigned idx, extents;
+	u32 first, last;
+
+	/* Find the matching extent */
+	extents = map->nr_extents;
+	smp_read_barrier_depends();
+	for (idx = 0; idx < extents; idx++) {
+		first = map->extent[idx].lower_first;
+		last = first + map->extent[idx].count - 1;
+		if (id >= first && id <= last)
+			break;
 	}
+	/* Map the id or note failure */
+	if (idx < extents)
+		id = (id - first) + map->extent[idx].first;
+	else
+		id = (u32) -1;
+
+	return id;
+}
+
+/**
+ *	make_kuid - Map a user-namespace uid pair into a kuid.
+ *	@ns:  User namespace that the uid is in
+ *	@uid: User identifier
+ *
+ *	Maps a user-namespace uid pair into a kernel internal kuid,
+ *	and returns that kuid.
+ *
+ *	When there is no mapping defined for the user-namespace uid
+ *	pair INVALID_UID is returned.  Callers are expected to test
+ *	for and handle INVALID_UID being returned.  INVALID_UID
+ *	may be tested for using uid_valid().
+ */
+kuid_t make_kuid(struct user_namespace *ns, uid_t uid)
+{
+	/* Map the uid to a global kernel uid */
+	return KUIDT_INIT(map_id_down(&ns->uid_map, uid));
+}
+EXPORT_SYMBOL(make_kuid);
+
+/**
+ *	from_kuid - Create a uid from a kuid user-namespace pair.
+ *	@targ: The user namespace we want a uid in.
+ *	@kuid: The kernel internal uid to start with.
+ *
+ *	Map @kuid into the user-namespace specified by @targ and
+ *	return the resulting uid.
+ *
+ *	There is always a mapping into the initial user_namespace.
+ *
+ *	If @kuid has no mapping in @targ (uid_t)-1 is returned.
+ */
+uid_t from_kuid(struct user_namespace *targ, kuid_t kuid)
+{
+	/* Map the uid from a global kernel uid */
+	return map_id_up(&targ->uid_map, __kuid_val(kuid));
+}
+EXPORT_SYMBOL(from_kuid);
+
+/**
+ *	from_kuid_munged - Create a uid from a kuid user-namespace pair.
+ *	@targ: The user namespace we want a uid in.
+ *	@kuid: The kernel internal uid to start with.
+ *
+ *	Map @kuid into the user-namespace specified by @targ and
+ *	return the resulting uid.
+ *
+ *	There is always a mapping into the initial user_namespace.
+ *
+ *	Unlike from_kuid from_kuid_munged never fails and always
+ *	returns a valid uid.  This makes from_kuid_munged appropriate
+ *	for use in syscalls like stat and getuid where failing the
+ *	system call and failing to provide a valid uid are not
+ *	options.
+ *
+ *	If @kuid has no mapping in @targ overflowuid is returned.
+ */
+uid_t from_kuid_munged(struct user_namespace *targ, kuid_t kuid)
+{
+	uid_t uid;
+	uid = from_kuid(targ, kuid);
+
+	if (uid == (uid_t) -1)
+		uid = overflowuid;
+	return uid;
+}
+EXPORT_SYMBOL(from_kuid_munged);
+
+/**
+ *	make_kgid - Map a user-namespace gid pair into a kgid.
+ *	@ns:  User namespace that the gid is in
+ *	@gid: Group identifier
+ *
+ *	Maps a user-namespace gid pair into a kernel internal kgid,
+ *	and returns that kgid.
+ *
+ *	When there is no mapping defined for the user-namespace gid
+ *	pair INVALID_GID is returned.  Callers are expected to test
+ *	for and handle INVALID_GID being returned.  INVALID_GID may be
+ *	tested for using gid_valid().
+ */
+kgid_t make_kgid(struct user_namespace *ns, gid_t gid)
+{
+	/* Map the gid to a global kernel gid */
+	return KGIDT_INIT(map_id_down(&ns->gid_map, gid));
+}
+EXPORT_SYMBOL(make_kgid);
+
+/**
+ *	from_kgid - Create a gid from a kgid user-namespace pair.
+ *	@targ: The user namespace we want a gid in.
+ *	@kgid: The kernel internal gid to start with.
+ *
+ *	Map @kgid into the user-namespace specified by @targ and
+ *	return the resulting gid.
+ *
+ *	There is always a mapping into the initial user_namespace.
+ *
+ *	If @kgid has no mapping in @targ (gid_t)-1 is returned.
+ */
+gid_t from_kgid(struct user_namespace *targ, kgid_t kgid)
+{
+	/* Map the gid from a global kernel gid */
+	return map_id_up(&targ->gid_map, __kgid_val(kgid));
+}
+EXPORT_SYMBOL(from_kgid);
+
+/**
+ *	from_kgid_munged - Create a gid from a kgid user-namespace pair.
+ *	@targ: The user namespace we want a gid in.
+ *	@kgid: The kernel internal gid to start with.
+ *
+ *	Map @kgid into the user-namespace specified by @targ and
+ *	return the resulting gid.
+ *
+ *	There is always a mapping into the initial user_namespace.
+ *
+ *	Unlike from_kgid, from_kgid_munged never fails and always
+ *	returns a valid gid.  This makes from_kgid_munged appropriate
+ *	for use in syscalls like stat and getgid where failing the
+ *	system call and failing to provide a valid gid are not options.
+ *
+ *	If @kgid has no mapping in @targ overflowgid is returned.
+ */
+gid_t from_kgid_munged(struct user_namespace *targ, kgid_t kgid)
+{
+	gid_t gid;
+	gid = from_kgid(targ, kgid);
+
+	if (gid == (gid_t) -1)
+		gid = overflowgid;
+	return gid;
+}
+EXPORT_SYMBOL(from_kgid_munged);
+
+static int uid_m_show(struct seq_file *seq, void *v)
+{
+	struct user_namespace *ns = seq->private;
+	struct uid_gid_extent *extent = v;
+	struct user_namespace *lower_ns;
+	uid_t lower;
 
-	/* No useful relationship so no mapping */
-	return overflowuid;
+	lower_ns = current_user_ns();
+	if ((lower_ns == ns) && lower_ns->parent)
+		lower_ns = lower_ns->parent;
+
+	lower = from_kuid(lower_ns, KUIDT_INIT(extent->lower_first));
+
+	seq_printf(seq, "%10u %10u %10u\n",
+		extent->first,
+		lower,
+		extent->count);
+
+	return 0;
 }
 
-gid_t user_ns_map_gid(struct user_namespace *to, const struct cred *cred, gid_t gid)
+static int gid_m_show(struct seq_file *seq, void *v)
 {
-	struct user_namespace *tmp;
+	struct user_namespace *ns = seq->private;
+	struct uid_gid_extent *extent = v;
+	struct user_namespace *lower_ns;
+	gid_t lower;
 
-	if (likely(to == cred->user_ns))
-		return gid;
+	lower_ns = current_user_ns();
+	if ((lower_ns == ns) && lower_ns->parent)
+		lower_ns = lower_ns->parent;
 
-	/* Is cred->user the creator of the target user_ns
-	 * or the creator of one of it's parents?
+	lower = from_kgid(lower_ns, KGIDT_INIT(extent->lower_first));
+
+	seq_printf(seq, "%10u %10u %10u\n",
+		extent->first,
+		lower,
+		extent->count);
+
+	return 0;
+}
+
+static void *m_start(struct seq_file *seq, loff_t *ppos, struct uid_gid_map *map)
+{
+	struct uid_gid_extent *extent = NULL;
+	loff_t pos = *ppos;
+
+	if (pos < map->nr_extents)
+		extent = &map->extent[pos];
+
+	return extent;
+}
+
+static void *uid_m_start(struct seq_file *seq, loff_t *ppos)
+{
+	struct user_namespace *ns = seq->private;
+
+	return m_start(seq, ppos, &ns->uid_map);
+}
+
+static void *gid_m_start(struct seq_file *seq, loff_t *ppos)
+{
+	struct user_namespace *ns = seq->private;
+
+	return m_start(seq, ppos, &ns->gid_map);
+}
+
+static void *m_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	(*pos)++;
+	return seq->op->start(seq, pos);
+}
+
+static void m_stop(struct seq_file *seq, void *v)
+{
+	return;
+}
+
+struct seq_operations proc_uid_seq_operations = {
+	.start = uid_m_start,
+	.stop = m_stop,
+	.next = m_next,
+	.show = uid_m_show,
+};
+
+struct seq_operations proc_gid_seq_operations = {
+	.start = gid_m_start,
+	.stop = m_stop,
+	.next = m_next,
+	.show = gid_m_show,
+};
+
+static DEFINE_MUTEX(id_map_mutex);
+
+static ssize_t map_write(struct file *file, const char __user *buf,
+			 size_t count, loff_t *ppos,
+			 int cap_setid,
+			 struct uid_gid_map *map,
+			 struct uid_gid_map *parent_map)
+{
+	struct seq_file *seq = file->private_data;
+	struct user_namespace *ns = seq->private;
+	struct uid_gid_map new_map;
+	unsigned idx;
+	struct uid_gid_extent *extent, *last = NULL;
+	unsigned long page = 0;
+	char *kbuf, *pos, *next_line;
+	ssize_t ret = -EINVAL;
+
+	/*
+	 * The id_map_mutex serializes all writes to any given map.
+	 *
+	 * Any map is only ever written once.
+	 *
+	 * An id map fits within 1 cache line on most architectures.
+	 *
+	 * On read nothing needs to be done unless you are on an
+	 * architecture with a crazy cache coherency model like alpha.
+	 *
+	 * There is a one time data dependency between reading the
+	 * count of the extents and the values of the extents.  The
+	 * desired behavior is to see the values of the extents that
+	 * were written before the count of the extents.
+	 *
+	 * To achieve this smp_wmb() is used to guarantee the write
+	 * order and smp_read_barrier_depends() is used to guarantee that
+	 * we don't have crazy architectures returning stale data.
+	 *
+	 */
+	mutex_lock(&id_map_mutex);
+
+	ret = -EPERM;
+	/* Only allow one successful write to the map */
+	if (map->nr_extents != 0)
+		goto out;
+
+	/* Require the appropriate privilege CAP_SETUID or CAP_SETGID
+	 * over the user namespace in order to set the id mapping.
 	 */
-	for ( tmp = to; tmp != &init_user_ns; tmp = tmp->parent ) {
-		if (uid_eq(cred->user->uid, tmp->owner)) {
-			return (gid_t)0;
+	if (!ns_capable(ns, cap_setid))
+		goto out;
+
+	/* Get a buffer */
+	ret = -ENOMEM;
+	page = __get_free_page(GFP_TEMPORARY);
+	kbuf = (char *) page;
+	if (!page)
+		goto out;
+
+	/* Only allow < page size writes at the beginning of the file */
+	ret = -EINVAL;
+	if ((*ppos != 0) || (count >= PAGE_SIZE))
+		goto out;
+
+	/* Slurp in the user data */
+	ret = -EFAULT;
+	if (copy_from_user(kbuf, buf, count))
+		goto out;
+	kbuf[count] = '\0';
+
+	/* Parse the user data */
+	ret = -EINVAL;
+	pos = kbuf;
+	new_map.nr_extents = 0;
+	for (;pos; pos = next_line) {
+		extent = &new_map.extent[new_map.nr_extents];
+
+		/* Find the end of line and ensure I don't look past it */
+		next_line = strchr(pos, '\n');
+		if (next_line) {
+			*next_line = '\0';
+			next_line++;
+			if (*next_line == '\0')
+				next_line = NULL;
 		}
+
+		pos = skip_spaces(pos);
+		extent->first = simple_strtoul(pos, &pos, 10);
+		if (!isspace(*pos))
+			goto out;
+
+		pos = skip_spaces(pos);
+		extent->lower_first = simple_strtoul(pos, &pos, 10);
+		if (!isspace(*pos))
+			goto out;
+
+		pos = skip_spaces(pos);
+		extent->count = simple_strtoul(pos, &pos, 10);
+		if (*pos && !isspace(*pos))
+			goto out;
+
+		/* Verify there is not trailing junk on the line */
+		pos = skip_spaces(pos);
+		if (*pos != '\0')
+			goto out;
+
+		/* Verify we have been given valid starting values */
+		if ((extent->first == (u32) -1) ||
+		    (extent->lower_first == (u32) -1 ))
+			goto out;
+
+		/* Verify count is not zero and does not cause the extent to wrap */
+		if ((extent->first + extent->count) <= extent->first)
+			goto out;
+		if ((extent->lower_first + extent->count) <= extent->lower_first)
+			goto out;
+
+		/* For now only accept extents that are strictly in order */
+		if (last &&
+		    (((last->first + last->count) > extent->first) ||
+		     ((last->lower_first + last->count) > extent->lower_first)))
+			goto out;
+
+		new_map.nr_extents++;
+		last = extent;
+
+		/* Fail if the file contains too many extents */
+		if ((new_map.nr_extents == UID_GID_MAP_MAX_EXTENTS) &&
+		    (next_line != NULL))
+			goto out;
 	}
+	/* Be very certain the new map actually exists */
+	if (new_map.nr_extents == 0)
+		goto out;
+
+	ret = -EPERM;
+	/* Validate that the user is allowed to use the ids being mapped to. */
+	if (!new_idmap_permitted(ns, cap_setid, &new_map))
+		goto out;
+
+	/* Map the lower ids from the parent user namespace to the
+	 * kernel global id space.
+	 */
+	for (idx = 0; idx < new_map.nr_extents; idx++) {
+		u32 lower_first;
+		extent = &new_map.extent[idx];
+
+		lower_first = map_id_range_down(parent_map,
+						extent->lower_first,
+						extent->count);
+
+		/* Fail if we can not map the specified extent to
+		 * the kernel global id space.
+		 */
+		if (lower_first == (u32) -1)
+			goto out;
+
+		extent->lower_first = lower_first;
+	}
+
+	/* Install the map */
+	memcpy(map->extent, new_map.extent,
+		new_map.nr_extents*sizeof(new_map.extent[0]));
+	smp_wmb();
+	map->nr_extents = new_map.nr_extents;
+
+	*ppos = count;
+	ret = count;
+out:
+	mutex_unlock(&id_map_mutex);
+	if (page)
+		free_page(page);
+	return ret;
+}
+
+ssize_t proc_uid_map_write(struct file *file, const char __user *buf, size_t size, loff_t *ppos)
+{
+	struct seq_file *seq = file->private_data;
+	struct user_namespace *ns = seq->private;
+
+	if (!ns->parent)
+		return -EPERM;
+
+	return map_write(file, buf, size, ppos, CAP_SETUID,
+			 &ns->uid_map, &ns->parent->uid_map);
+}
+
+ssize_t proc_gid_map_write(struct file *file, const char __user *buf, size_t size, loff_t *ppos)
+{
+	struct seq_file *seq = file->private_data;
+	struct user_namespace *ns = seq->private;
+
+	if (!ns->parent)
+		return -EPERM;
+
+	return map_write(file, buf, size, ppos, CAP_SETGID,
+			 &ns->gid_map, &ns->parent->gid_map);
+}
+
+static bool new_idmap_permitted(struct user_namespace *ns, int cap_setid,
+				struct uid_gid_map *new_map)
+{
+	/* Allow the specified ids if we have the appropriate capability
+	 * (CAP_SETUID or CAP_SETGID) over the parent user namespace.
+	 */
+	if (ns_capable(ns->parent, cap_setid))
+		return true;
 
-	/* No useful relationship so no mapping */
-	return overflowgid;
+	return false;
 }
 
 static __init int user_namespaces_init(void)
-- 
1.7.2.5
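
The per-extent checks that map_write() applies while parsing can be
tried outside the kernel.  What follows is only a userspace sketch, not
the patch's code: struct extent, extent_ok() and map_down() are
hypothetical names, and the lookup in map_down() is an assumption about
what map_id_down() does, since that helper is not part of this hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of one id-map extent. */
struct extent {
	uint32_t first;       /* first id as seen inside the namespace */
	uint32_t lower_first; /* first id in the lower (parent) id space */
	uint32_t count;       /* number of consecutive ids covered */
};

/* Apply the same per-extent checks map_write() performs while parsing. */
static bool extent_ok(const struct extent *last, const struct extent *e)
{
	/* (u32)-1 is reserved as the "no mapping" value. */
	if (e->first == UINT32_MAX || e->lower_first == UINT32_MAX)
		return false;

	/* Reject a zero count and any range that wraps past 2^32. */
	if ((uint32_t)(e->first + e->count) <= e->first)
		return false;
	if ((uint32_t)(e->lower_first + e->count) <= e->lower_first)
		return false;

	/* Extents must be strictly in order and must not overlap. */
	if (last &&
	    ((uint32_t)(last->first + last->count) > e->first ||
	     (uint32_t)(last->lower_first + last->count) > e->lower_first))
		return false;

	return true;
}

/* Assumed lookup: translate an in-namespace id, or return UINT32_MAX. */
static uint32_t map_down(const struct extent *map, size_t n, uint32_t id)
{
	assert(map != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (id >= map[i].first && id - map[i].first < map[i].count)
			return map[i].lower_first + (id - map[i].first);
	}
	return UINT32_MAX;
}
```

With the extents "0 100000 65536" and "65536 200000 10", id 65537 would
translate to 200001 in this sketch and id 70000 would have no mapping.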

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 18/43] userns: Convert group_info values from gid_t to kgid_t.
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` "Eric W. Beiderman
  -1 siblings, 0 replies; 227+ messages in thread
From: "Eric W. Beiderman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

As a first step toward converting struct cred to hold only kuid_t and
kgid_t values, convert the group values stored in group_info to always
be kgid_t values.  Unless user namespaces are used, this change should
have no effect.

Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 arch/s390/kernel/compat_linux.c   |   13 ++++++++-
 fs/nfsd/auth.c                    |    5 ++-
 fs/proc/array.c                   |    5 +++-
 include/linux/cred.h              |    9 ++++---
 kernel/groups.c                   |   48 +++++++++++++++++++-----------------
 kernel/uid16.c                    |   14 +++++++++-
 net/ipv4/ping.c                   |   11 ++++++--
 net/sunrpc/auth_generic.c         |    4 +-
 net/sunrpc/auth_gss/svcauth_gss.c |    7 ++++-
 net/sunrpc/auth_unix.c            |   15 ++++++++---
 net/sunrpc/svcauth_unix.c         |   18 ++++++++++---
 security/keys/permission.c        |    3 +-
 12 files changed, 103 insertions(+), 49 deletions(-)

diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
index ab64bdb..5baac18 100644
--- a/arch/s390/kernel/compat_linux.c
+++ b/arch/s390/kernel/compat_linux.c
@@ -173,11 +173,14 @@ asmlinkage long sys32_setfsgid16(u16 gid)
 
 static int groups16_to_user(u16 __user *grouplist, struct group_info *group_info)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	int i;
 	u16 group;
+	kgid_t kgid;
 
 	for (i = 0; i < group_info->ngroups; i++) {
-		group = (u16)GROUP_AT(group_info, i);
+		kgid = GROUP_AT(group_info, i);
+		group = (u16)from_kgid_munged(user_ns, kgid);
 		if (put_user(group, grouplist+i))
 			return -EFAULT;
 	}
@@ -187,13 +190,20 @@ static int groups16_to_user(u16 __user *grouplist, struct group_info *group_info
 
 static int groups16_from_user(struct group_info *group_info, u16 __user *grouplist)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	int i;
 	u16 group;
+	kgid_t kgid;
 
 	for (i = 0; i < group_info->ngroups; i++) {
 		if (get_user(group, grouplist+i))
 			return  -EFAULT;
-		GROUP_AT(group_info, i) = (gid_t)group;
+
+		kgid = make_kgid(user_ns, (gid_t)group);
+		if (!gid_valid(kgid))
+			return -EINVAL;
+
+		GROUP_AT(group_info, i) = kgid;
 	}
 
 	return 0;
diff --git a/fs/nfsd/auth.c b/fs/nfsd/auth.c
index 79717a4..204438c 100644
--- a/fs/nfsd/auth.c
+++ b/fs/nfsd/auth.c
@@ -1,6 +1,7 @@
 /* Copyright (C) 1995, 1996 Olaf Kirch <okir-pn4DOG8n3UYbFoVRYvo4fw@public.gmane.org> */
 
 #include <linux/sched.h>
+#include <linux/user_namespace.h>
 #include "nfsd.h"
 #include "auth.h"
 
@@ -56,8 +57,8 @@ int nfsd_setuser(struct svc_rqst *rqstp, struct svc_export *exp)
 			goto oom;
 
 		for (i = 0; i < rqgi->ngroups; i++) {
-			if (!GROUP_AT(rqgi, i))
-				GROUP_AT(gi, i) = exp->ex_anon_gid;
+			if (gid_eq(GLOBAL_ROOT_GID, GROUP_AT(rqgi, i)))
+				GROUP_AT(gi, i) = make_kgid(&init_user_ns, exp->ex_anon_gid);
 			else
 				GROUP_AT(gi, i) = GROUP_AT(rqgi, i);
 		}
diff --git a/fs/proc/array.c b/fs/proc/array.c
index f9bd395..36a0a91 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -81,6 +81,7 @@
 #include <linux/pid_namespace.h>
 #include <linux/ptrace.h>
 #include <linux/tracehook.h>
+#include <linux/user_namespace.h>
 
 #include <asm/pgtable.h>
 #include <asm/processor.h>
@@ -161,6 +162,7 @@ static inline const char *get_task_state(struct task_struct *tsk)
 static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
 				struct pid *pid, struct task_struct *p)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	struct group_info *group_info;
 	int g;
 	struct fdtable *fdt = NULL;
@@ -205,7 +207,8 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
 	task_unlock(p);
 
 	for (g = 0; g < min(group_info->ngroups, NGROUPS_SMALL); g++)
-		seq_printf(m, "%d ", GROUP_AT(group_info, g));
+		seq_printf(m, "%d ",
+			   from_kgid_munged(user_ns, GROUP_AT(group_info, g)));
 	put_cred(cred);
 
 	seq_putc(m, '\n');
diff --git a/include/linux/cred.h b/include/linux/cred.h
index 2c60ec8..0ab3cda 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -17,6 +17,7 @@
 #include <linux/key.h>
 #include <linux/selinux.h>
 #include <linux/atomic.h>
+#include <linux/uidgid.h>
 
 struct user_struct;
 struct cred;
@@ -26,14 +27,14 @@ struct inode;
  * COW Supplementary groups list
  */
 #define NGROUPS_SMALL		32
-#define NGROUPS_PER_BLOCK	((unsigned int)(PAGE_SIZE / sizeof(gid_t)))
+#define NGROUPS_PER_BLOCK	((unsigned int)(PAGE_SIZE / sizeof(kgid_t)))
 
 struct group_info {
 	atomic_t	usage;
 	int		ngroups;
 	int		nblocks;
-	gid_t		small_block[NGROUPS_SMALL];
-	gid_t		*blocks[0];
+	kgid_t		small_block[NGROUPS_SMALL];
+	kgid_t		*blocks[0];
 };
 
 /**
@@ -66,7 +67,7 @@ extern struct group_info init_groups;
 extern void groups_free(struct group_info *);
 extern int set_current_groups(struct group_info *);
 extern int set_groups(struct cred *, struct group_info *);
-extern int groups_search(const struct group_info *, gid_t);
+extern int groups_search(const struct group_info *, kgid_t);
 
 /* access the groups "array" with this macro */
 #define GROUP_AT(gi, i) \
diff --git a/kernel/groups.c b/kernel/groups.c
index 99b53d1..84156f2 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -31,7 +31,7 @@ struct group_info *groups_alloc(int gidsetsize)
 		group_info->blocks[0] = group_info->small_block;
 	else {
 		for (i = 0; i < nblocks; i++) {
-			gid_t *b;
+			kgid_t *b;
 			b = (void *)__get_free_page(GFP_USER);
 			if (!b)
 				goto out_undo_partial_alloc;
@@ -66,18 +66,15 @@ EXPORT_SYMBOL(groups_free);
 static int groups_to_user(gid_t __user *grouplist,
 			  const struct group_info *group_info)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	int i;
 	unsigned int count = group_info->ngroups;
 
-	for (i = 0; i < group_info->nblocks; i++) {
-		unsigned int cp_count = min(NGROUPS_PER_BLOCK, count);
-		unsigned int len = cp_count * sizeof(*grouplist);
-
-		if (copy_to_user(grouplist, group_info->blocks[i], len))
+	for (i = 0; i < count; i++) {
+		gid_t gid;
+		gid = from_kgid_munged(user_ns, GROUP_AT(group_info, i));
+		if (put_user(gid, grouplist+i))
 			return -EFAULT;
-
-		grouplist += NGROUPS_PER_BLOCK;
-		count -= cp_count;
 	}
 	return 0;
 }
@@ -86,18 +83,21 @@ static int groups_to_user(gid_t __user *grouplist,
 static int groups_from_user(struct group_info *group_info,
     gid_t __user *grouplist)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	int i;
 	unsigned int count = group_info->ngroups;
 
-	for (i = 0; i < group_info->nblocks; i++) {
-		unsigned int cp_count = min(NGROUPS_PER_BLOCK, count);
-		unsigned int len = cp_count * sizeof(*grouplist);
-
-		if (copy_from_user(group_info->blocks[i], grouplist, len))
+	for (i = 0; i < count; i++) {
+		gid_t gid;
+		kgid_t kgid;
+		if (get_user(gid, grouplist+i))
 			return -EFAULT;
 
-		grouplist += NGROUPS_PER_BLOCK;
-		count -= cp_count;
+		kgid = make_kgid(user_ns, gid);
+		if (!gid_valid(kgid))
+			return -EINVAL;
+
+		GROUP_AT(group_info, i) = kgid;
 	}
 	return 0;
 }
@@ -117,9 +117,9 @@ static void groups_sort(struct group_info *group_info)
 		for (base = 0; base < max; base++) {
 			int left = base;
 			int right = left + stride;
-			gid_t tmp = GROUP_AT(group_info, right);
+			kgid_t tmp = GROUP_AT(group_info, right);
 
-			while (left >= 0 && GROUP_AT(group_info, left) > tmp) {
+			while (left >= 0 && gid_gt(GROUP_AT(group_info, left), tmp)) {
 				GROUP_AT(group_info, right) =
 				    GROUP_AT(group_info, left);
 				right = left;
@@ -132,7 +132,7 @@ static void groups_sort(struct group_info *group_info)
 }
 
 /* a simple bsearch */
-int groups_search(const struct group_info *group_info, gid_t grp)
+int groups_search(const struct group_info *group_info, kgid_t grp)
 {
 	unsigned int left, right;
 
@@ -143,9 +143,9 @@ int groups_search(const struct group_info *group_info, gid_t grp)
 	right = group_info->ngroups;
 	while (left < right) {
 		unsigned int mid = (left+right)/2;
-		if (grp > GROUP_AT(group_info, mid))
+		if (gid_gt(grp, GROUP_AT(group_info, mid)))
 			left = mid + 1;
-		else if (grp < GROUP_AT(group_info, mid))
+		else if (gid_lt(grp, GROUP_AT(group_info, mid)))
 			right = mid;
 		else
 			return 1;
@@ -262,7 +262,8 @@ int in_group_p(gid_t grp)
 	int retval = 1;
 
 	if (grp != cred->fsgid)
-		retval = groups_search(cred->group_info, grp);
+		retval = groups_search(cred->group_info,
+				       make_kgid(cred->user_ns, grp));
 	return retval;
 }
 
@@ -274,7 +275,8 @@ int in_egroup_p(gid_t grp)
 	int retval = 1;
 
 	if (grp != cred->egid)
-		retval = groups_search(cred->group_info, grp);
+		retval = groups_search(cred->group_info,
+				       make_kgid(cred->user_ns, grp));
 	return retval;
 }
 
diff --git a/kernel/uid16.c b/kernel/uid16.c
index 51c6e89..e530bc3 100644
--- a/kernel/uid16.c
+++ b/kernel/uid16.c
@@ -134,11 +134,14 @@ SYSCALL_DEFINE1(setfsgid16, old_gid_t, gid)
 static int groups16_to_user(old_gid_t __user *grouplist,
     struct group_info *group_info)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	int i;
 	old_gid_t group;
+	kgid_t kgid;
 
 	for (i = 0; i < group_info->ngroups; i++) {
-		group = high2lowgid(GROUP_AT(group_info, i));
+		kgid = GROUP_AT(group_info, i);
+		group = high2lowgid(from_kgid_munged(user_ns, kgid));
 		if (put_user(group, grouplist+i))
 			return -EFAULT;
 	}
@@ -149,13 +152,20 @@ static int groups16_to_user(old_gid_t __user *grouplist,
 static int groups16_from_user(struct group_info *group_info,
     old_gid_t __user *grouplist)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	int i;
 	old_gid_t group;
+	kgid_t kgid;
 
 	for (i = 0; i < group_info->ngroups; i++) {
 		if (get_user(group, grouplist+i))
 			return  -EFAULT;
-		GROUP_AT(group_info, i) = low2highgid(group);
+
+		kgid = make_kgid(user_ns, low2highgid(group));
+		if (!gid_valid(kgid))
+			return -EINVAL;
+
+		GROUP_AT(group_info, i) = kgid;
 	}
 
 	return 0;
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 50009c7..9d3044f 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -205,17 +205,22 @@ static int ping_init_sock(struct sock *sk)
 	gid_t range[2];
 	struct group_info *group_info = get_current_groups();
 	int i, j, count = group_info->ngroups;
+	kgid_t low, high;
 
 	inet_get_ping_group_range_net(net, range, range+1);
+	low = make_kgid(&init_user_ns, range[0]);
+	high = make_kgid(&init_user_ns, range[1]);
+	if (!gid_valid(low) || !gid_valid(high) || gid_lt(high, low))
+		return -EACCES;
+
 	if (range[0] <= group && group <= range[1])
 		return 0;
 
 	for (i = 0; i < group_info->nblocks; i++) {
 		int cp_count = min_t(int, NGROUPS_PER_BLOCK, count);
-
 		for (j = 0; j < cp_count; j++) {
-			group = group_info->blocks[i][j];
-			if (range[0] <= group && group <= range[1])
+			kgid_t gid = group_info->blocks[i][j];
+			if (gid_lte(low, gid) && gid_lte(gid, high))
 				return 0;
 		}
 
diff --git a/net/sunrpc/auth_generic.c b/net/sunrpc/auth_generic.c
index 75762f3..6ed6f20 100644
--- a/net/sunrpc/auth_generic.c
+++ b/net/sunrpc/auth_generic.c
@@ -160,8 +160,8 @@ generic_match(struct auth_cred *acred, struct rpc_cred *cred, int flags)
 	if (gcred->acred.group_info->ngroups != acred->group_info->ngroups)
 		goto out_nomatch;
 	for (i = 0; i < gcred->acred.group_info->ngroups; i++) {
-		if (GROUP_AT(gcred->acred.group_info, i) !=
-				GROUP_AT(acred->group_info, i))
+		if (!gid_eq(GROUP_AT(gcred->acred.group_info, i),
+				GROUP_AT(acred->group_info, i)))
 			goto out_nomatch;
 	}
 out_match:
diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c
index 1600cfb..28b62db 100644
--- a/net/sunrpc/auth_gss/svcauth_gss.c
+++ b/net/sunrpc/auth_gss/svcauth_gss.c
@@ -41,6 +41,7 @@
 #include <linux/types.h>
 #include <linux/module.h>
 #include <linux/pagemap.h>
+#include <linux/user_namespace.h>
 
 #include <linux/sunrpc/auth_gss.h>
 #include <linux/sunrpc/gss_err.h>
@@ -470,9 +471,13 @@ static int rsc_parse(struct cache_detail *cd,
 		status = -EINVAL;
 		for (i=0; i<N; i++) {
 			gid_t gid;
+			kgid_t kgid;
 			if (get_int(&mesg, &gid))
 				goto out;
-			GROUP_AT(rsci.cred.cr_group_info, i) = gid;
+			kgid = make_kgid(&init_user_ns, gid);
+			if (!gid_valid(kgid))
+				goto out;
+			GROUP_AT(rsci.cred.cr_group_info, i) = kgid;
 		}
 
 		/* mech name */
diff --git a/net/sunrpc/auth_unix.c b/net/sunrpc/auth_unix.c
index e50502d..52c5abd 100644
--- a/net/sunrpc/auth_unix.c
+++ b/net/sunrpc/auth_unix.c
@@ -12,6 +12,7 @@
 #include <linux/module.h>
 #include <linux/sunrpc/clnt.h>
 #include <linux/sunrpc/auth.h>
+#include <linux/user_namespace.h>
 
 #define NFS_NGROUPS	16
 
@@ -78,8 +79,11 @@ unx_create_cred(struct rpc_auth *auth, struct auth_cred *acred, int flags)
 		groups = NFS_NGROUPS;
 
 	cred->uc_gid = acred->gid;
-	for (i = 0; i < groups; i++)
-		cred->uc_gids[i] = GROUP_AT(acred->group_info, i);
+	for (i = 0; i < groups; i++) {
+		gid_t gid;
+		gid = from_kgid(&init_user_ns, GROUP_AT(acred->group_info, i));
+		cred->uc_gids[i] = gid;
+	}
 	if (i < NFS_NGROUPS)
 		cred->uc_gids[i] = NOGROUP;
 
@@ -126,9 +130,12 @@ unx_match(struct auth_cred *acred, struct rpc_cred *rcred, int flags)
 		groups = acred->group_info->ngroups;
 	if (groups > NFS_NGROUPS)
 		groups = NFS_NGROUPS;
-	for (i = 0; i < groups ; i++)
-		if (cred->uc_gids[i] != GROUP_AT(acred->group_info, i))
+	for (i = 0; i < groups ; i++) {
+		gid_t gid;
+		gid = from_kgid(&init_user_ns, GROUP_AT(acred->group_info, i));
+		if (cred->uc_gids[i] != gid)
 			return 0;
+	}
 	if (groups < NFS_NGROUPS &&
 	    cred->uc_gids[groups] != NOGROUP)
 		return 0;
diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
index 521d8f7..71ec853 100644
--- a/net/sunrpc/svcauth_unix.c
+++ b/net/sunrpc/svcauth_unix.c
@@ -14,6 +14,7 @@
 #include <net/sock.h>
 #include <net/ipv6.h>
 #include <linux/kernel.h>
+#include <linux/user_namespace.h>
 #define RPCDBG_FACILITY	RPCDBG_AUTH
 
 #include <linux/sunrpc/clnt.h>
@@ -530,11 +531,15 @@ static int unix_gid_parse(struct cache_detail *cd,
 
 	for (i = 0 ; i < gids ; i++) {
 		int gid;
+		kgid_t kgid;
 		rv = get_int(&mesg, &gid);
 		err = -EINVAL;
 		if (rv)
 			goto out;
-		GROUP_AT(ug.gi, i) = gid;
+		kgid = make_kgid(&init_user_ns, gid);
+		if (!gid_valid(kgid))
+			goto out;
+		GROUP_AT(ug.gi, i) = kgid;
 	}
 
 	ugp = unix_gid_lookup(cd, uid);
@@ -563,6 +568,7 @@ static int unix_gid_show(struct seq_file *m,
 			 struct cache_detail *cd,
 			 struct cache_head *h)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	struct unix_gid *ug;
 	int i;
 	int glen;
@@ -580,7 +586,7 @@ static int unix_gid_show(struct seq_file *m,
 
 	seq_printf(m, "%u %d:", ug->uid, glen);
 	for (i = 0; i < glen; i++)
-		seq_printf(m, " %d", GROUP_AT(ug->gi, i));
+		seq_printf(m, " %d", from_kgid_munged(user_ns, GROUP_AT(ug->gi, i)));
 	seq_printf(m, "\n");
 	return 0;
 }
@@ -831,8 +837,12 @@ svcauth_unix_accept(struct svc_rqst *rqstp, __be32 *authp)
 	cred->cr_group_info = groups_alloc(slen);
 	if (cred->cr_group_info == NULL)
 		return SVC_CLOSE;
-	for (i = 0; i < slen; i++)
-		GROUP_AT(cred->cr_group_info, i) = svc_getnl(argv);
+	for (i = 0; i < slen; i++) {
+		kgid_t kgid = make_kgid(&init_user_ns, svc_getnl(argv));
+		if (!gid_valid(kgid))
+			goto badcred;
+		GROUP_AT(cred->cr_group_info, i) = kgid;
+	}
 	if (svc_getu32(argv) != htonl(RPC_AUTH_NULL) || svc_getu32(argv) != 0) {
 		*authp = rpc_autherr_badverf;
 		return SVC_DENIED;
diff --git a/security/keys/permission.c b/security/keys/permission.c
index e146cbd..5442900 100644
--- a/security/keys/permission.c
+++ b/security/keys/permission.c
@@ -53,7 +53,8 @@ int key_task_permission(const key_ref_t key_ref, const struct cred *cred,
 			goto use_these_perms;
 		}
 
-		ret = groups_search(cred->group_info, key->gid);
+		ret = groups_search(cred->group_info,
+				    make_kgid(current_user_ns(), key->gid));
 		if (ret) {
 			kperm = key->perm >> 8;
 			goto use_these_perms;
-- 
1.7.2.5
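
The groups_search() hunk above swaps the < and > operators for
gid_lt() and gid_gt() because a struct-typed id cannot be compared
directly.  A small userspace sketch of the same binary search follows;
the demo_* names are hypothetical and the flat array stands in for the
blocked storage that GROUP_AT() hides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t val; } demo_kgid_t;

static bool demo_gid_gt(demo_kgid_t a, demo_kgid_t b) { return a.val > b.val; }
static bool demo_gid_lt(demo_kgid_t a, demo_kgid_t b) { return a.val < b.val; }

/* Binary search over a sorted array; returns 1 if grp is present, else 0. */
static int demo_groups_search(const demo_kgid_t *groups, size_t n,
			      demo_kgid_t grp)
{
	size_t left = 0, right = n;

	assert(groups != NULL || n == 0);
	while (left < right) {
		size_t mid = left + (right - left) / 2;

		if (demo_gid_gt(grp, groups[mid]))
			left = mid + 1;
		else if (demo_gid_lt(grp, groups[mid]))
			right = mid;
		else
			return 1;
	}
	return 0;
}
```

The search only works on a sorted list, which is why groups_sort() is
converted to the same helpers in this patch.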

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 18/43] userns: Convert group_info values from gid_t to kgid_t.
@ 2012-04-08  5:15     ` "Eric W. Beiderman
  0 siblings, 0 replies; 227+ messages in thread
From: "Eric W. Beiderman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, linux-security-module, Linux Containers,
	Andrew Morton, Linus Torvalds, Al Viro, Cyrill Gorcunov,
	Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

As a first step toward converting struct cred to hold only kuid_t and
kgid_t values, convert the group values stored in group_info to always
be kgid_t values.  Unless user namespaces are used, this change should
have no effect.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 arch/s390/kernel/compat_linux.c   |   13 ++++++++-
 fs/nfsd/auth.c                    |    5 ++-
 fs/proc/array.c                   |    5 +++-
 include/linux/cred.h              |    9 ++++---
 kernel/groups.c                   |   48 +++++++++++++++++++-----------------
 kernel/uid16.c                    |   14 +++++++++-
 net/ipv4/ping.c                   |   11 ++++++--
 net/sunrpc/auth_generic.c         |    4 +-
 net/sunrpc/auth_gss/svcauth_gss.c |    7 ++++-
 net/sunrpc/auth_unix.c            |   15 ++++++++---
 net/sunrpc/svcauth_unix.c         |   18 ++++++++++---
 security/keys/permission.c        |    3 +-
 12 files changed, 103 insertions(+), 49 deletions(-)

diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
index ab64bdb..5baac18 100644
--- a/arch/s390/kernel/compat_linux.c
+++ b/arch/s390/kernel/compat_linux.c
@@ -173,11 +173,14 @@ asmlinkage long sys32_setfsgid16(u16 gid)
 
 static int groups16_to_user(u16 __user *grouplist, struct group_info *group_info)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	int i;
 	u16 group;
+	kgid_t kgid;
 
 	for (i = 0; i < group_info->ngroups; i++) {
-		group = (u16)GROUP_AT(group_info, i);
+		kgid = GROUP_AT(group_info, i);
+		group = (u16)from_kgid_munged(user_ns, kgid);
 		if (put_user(group, grouplist+i))
 			return -EFAULT;
 	}
@@ -187,13 +190,20 @@ static int groups16_to_user(u16 __user *grouplist, struct group_info *group_info
 
 static int groups16_from_user(struct group_info *group_info, u16 __user *grouplist)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	int i;
 	u16 group;
+	kgid_t kgid;
 
 	for (i = 0; i < group_info->ngroups; i++) {
 		if (get_user(group, grouplist+i))
 			return  -EFAULT;
-		GROUP_AT(group_info, i) = (gid_t)group;
+
+		kgid = make_kgid(user_ns, (gid_t)group);
+		if (!gid_valid(kgid))
+			return -EINVAL;
+
+		GROUP_AT(group_info, i) = kgid;
 	}
 
 	return 0;
diff --git a/fs/nfsd/auth.c b/fs/nfsd/auth.c
index 79717a4..204438c 100644
--- a/fs/nfsd/auth.c
+++ b/fs/nfsd/auth.c
@@ -1,6 +1,7 @@
 /* Copyright (C) 1995, 1996 Olaf Kirch <okir@monad.swb.de> */
 
 #include <linux/sched.h>
+#include <linux/user_namespace.h>
 #include "nfsd.h"
 #include "auth.h"
 
@@ -56,8 +57,8 @@ int nfsd_setuser(struct svc_rqst *rqstp, struct svc_export *exp)
 			goto oom;
 
 		for (i = 0; i < rqgi->ngroups; i++) {
-			if (!GROUP_AT(rqgi, i))
-				GROUP_AT(gi, i) = exp->ex_anon_gid;
+			if (gid_eq(GLOBAL_ROOT_GID, GROUP_AT(rqgi, i)))
+				GROUP_AT(gi, i) = make_kgid(&init_user_ns, exp->ex_anon_gid);
 			else
 				GROUP_AT(gi, i) = GROUP_AT(rqgi, i);
 		}
diff --git a/fs/proc/array.c b/fs/proc/array.c
index f9bd395..36a0a91 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -81,6 +81,7 @@
 #include <linux/pid_namespace.h>
 #include <linux/ptrace.h>
 #include <linux/tracehook.h>
+#include <linux/user_namespace.h>
 
 #include <asm/pgtable.h>
 #include <asm/processor.h>
@@ -161,6 +162,7 @@ static inline const char *get_task_state(struct task_struct *tsk)
 static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
 				struct pid *pid, struct task_struct *p)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	struct group_info *group_info;
 	int g;
 	struct fdtable *fdt = NULL;
@@ -205,7 +207,8 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
 	task_unlock(p);
 
 	for (g = 0; g < min(group_info->ngroups, NGROUPS_SMALL); g++)
-		seq_printf(m, "%d ", GROUP_AT(group_info, g));
+		seq_printf(m, "%d ",
+			   from_kgid_munged(user_ns, GROUP_AT(group_info, g)));
 	put_cred(cred);
 
 	seq_putc(m, '\n');
diff --git a/include/linux/cred.h b/include/linux/cred.h
index 2c60ec8..0ab3cda 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -17,6 +17,7 @@
 #include <linux/key.h>
 #include <linux/selinux.h>
 #include <linux/atomic.h>
+#include <linux/uidgid.h>
 
 struct user_struct;
 struct cred;
@@ -26,14 +27,14 @@ struct inode;
  * COW Supplementary groups list
  */
 #define NGROUPS_SMALL		32
-#define NGROUPS_PER_BLOCK	((unsigned int)(PAGE_SIZE / sizeof(gid_t)))
+#define NGROUPS_PER_BLOCK	((unsigned int)(PAGE_SIZE / sizeof(kgid_t)))
 
 struct group_info {
 	atomic_t	usage;
 	int		ngroups;
 	int		nblocks;
-	gid_t		small_block[NGROUPS_SMALL];
-	gid_t		*blocks[0];
+	kgid_t		small_block[NGROUPS_SMALL];
+	kgid_t		*blocks[0];
 };
 
 /**
@@ -66,7 +67,7 @@ extern struct group_info init_groups;
 extern void groups_free(struct group_info *);
 extern int set_current_groups(struct group_info *);
 extern int set_groups(struct cred *, struct group_info *);
-extern int groups_search(const struct group_info *, gid_t);
+extern int groups_search(const struct group_info *, kgid_t);
 
 /* access the groups "array" with this macro */
 #define GROUP_AT(gi, i) \
diff --git a/kernel/groups.c b/kernel/groups.c
index 99b53d1..84156f2 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -31,7 +31,7 @@ struct group_info *groups_alloc(int gidsetsize)
 		group_info->blocks[0] = group_info->small_block;
 	else {
 		for (i = 0; i < nblocks; i++) {
-			gid_t *b;
+			kgid_t *b;
 			b = (void *)__get_free_page(GFP_USER);
 			if (!b)
 				goto out_undo_partial_alloc;
@@ -66,18 +66,15 @@ EXPORT_SYMBOL(groups_free);
 static int groups_to_user(gid_t __user *grouplist,
 			  const struct group_info *group_info)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	int i;
 	unsigned int count = group_info->ngroups;
 
-	for (i = 0; i < group_info->nblocks; i++) {
-		unsigned int cp_count = min(NGROUPS_PER_BLOCK, count);
-		unsigned int len = cp_count * sizeof(*grouplist);
-
-		if (copy_to_user(grouplist, group_info->blocks[i], len))
+	for (i = 0; i < count; i++) {
+		gid_t gid;
+		gid = from_kgid_munged(user_ns, GROUP_AT(group_info, i));
+		if (put_user(gid, grouplist+i))
 			return -EFAULT;
-
-		grouplist += NGROUPS_PER_BLOCK;
-		count -= cp_count;
 	}
 	return 0;
 }
@@ -86,18 +83,21 @@ static int groups_to_user(gid_t __user *grouplist,
 static int groups_from_user(struct group_info *group_info,
     gid_t __user *grouplist)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	int i;
 	unsigned int count = group_info->ngroups;
 
-	for (i = 0; i < group_info->nblocks; i++) {
-		unsigned int cp_count = min(NGROUPS_PER_BLOCK, count);
-		unsigned int len = cp_count * sizeof(*grouplist);
-
-		if (copy_from_user(group_info->blocks[i], grouplist, len))
+	for (i = 0; i < count; i++) {
+		gid_t gid;
+		kgid_t kgid;
+		if (get_user(gid, grouplist+i))
 			return -EFAULT;
 
-		grouplist += NGROUPS_PER_BLOCK;
-		count -= cp_count;
+		kgid = make_kgid(user_ns, gid);
+		if (!gid_valid(kgid))
+			return -EINVAL;
+
+		GROUP_AT(group_info, i) = kgid;
 	}
 	return 0;
 }
@@ -117,9 +117,9 @@ static void groups_sort(struct group_info *group_info)
 		for (base = 0; base < max; base++) {
 			int left = base;
 			int right = left + stride;
-			gid_t tmp = GROUP_AT(group_info, right);
+			kgid_t tmp = GROUP_AT(group_info, right);
 
-			while (left >= 0 && GROUP_AT(group_info, left) > tmp) {
+			while (left >= 0 && gid_gt(GROUP_AT(group_info, left), tmp)) {
 				GROUP_AT(group_info, right) =
 				    GROUP_AT(group_info, left);
 				right = left;
@@ -132,7 +132,7 @@ static void groups_sort(struct group_info *group_info)
 }
 
 /* a simple bsearch */
-int groups_search(const struct group_info *group_info, gid_t grp)
+int groups_search(const struct group_info *group_info, kgid_t grp)
 {
 	unsigned int left, right;
 
@@ -143,9 +143,9 @@ int groups_search(const struct group_info *group_info, gid_t grp)
 	right = group_info->ngroups;
 	while (left < right) {
 		unsigned int mid = (left+right)/2;
-		if (grp > GROUP_AT(group_info, mid))
+		if (gid_gt(grp, GROUP_AT(group_info, mid)))
 			left = mid + 1;
-		else if (grp < GROUP_AT(group_info, mid))
+		else if (gid_lt(grp, GROUP_AT(group_info, mid)))
 			right = mid;
 		else
 			return 1;
@@ -262,7 +262,8 @@ int in_group_p(gid_t grp)
 	int retval = 1;
 
 	if (grp != cred->fsgid)
-		retval = groups_search(cred->group_info, grp);
+		retval = groups_search(cred->group_info,
+				       make_kgid(cred->user_ns, grp));
 	return retval;
 }
 
@@ -274,7 +275,8 @@ int in_egroup_p(gid_t grp)
 	int retval = 1;
 
 	if (grp != cred->egid)
-		retval = groups_search(cred->group_info, grp);
+		retval = groups_search(cred->group_info,
+				       make_kgid(cred->user_ns, grp));
 	return retval;
 }
 
diff --git a/kernel/uid16.c b/kernel/uid16.c
index 51c6e89..e530bc3 100644
--- a/kernel/uid16.c
+++ b/kernel/uid16.c
@@ -134,11 +134,14 @@ SYSCALL_DEFINE1(setfsgid16, old_gid_t, gid)
 static int groups16_to_user(old_gid_t __user *grouplist,
     struct group_info *group_info)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	int i;
 	old_gid_t group;
+	kgid_t kgid;
 
 	for (i = 0; i < group_info->ngroups; i++) {
-		group = high2lowgid(GROUP_AT(group_info, i));
+		kgid = GROUP_AT(group_info, i);
+		group = high2lowgid(from_kgid_munged(user_ns, kgid));
 		if (put_user(group, grouplist+i))
 			return -EFAULT;
 	}
@@ -149,13 +152,20 @@ static int groups16_to_user(old_gid_t __user *grouplist,
 static int groups16_from_user(struct group_info *group_info,
     old_gid_t __user *grouplist)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	int i;
 	old_gid_t group;
+	kgid_t kgid;
 
 	for (i = 0; i < group_info->ngroups; i++) {
 		if (get_user(group, grouplist+i))
 			return  -EFAULT;
-		GROUP_AT(group_info, i) = low2highgid(group);
+
+		kgid = make_kgid(user_ns, low2highgid(group));
+		if (!gid_valid(kgid))
+			return -EINVAL;
+
+		GROUP_AT(group_info, i) = kgid;
 	}
 
 	return 0;
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 50009c7..9d3044f 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -205,17 +205,22 @@ static int ping_init_sock(struct sock *sk)
 	gid_t range[2];
 	struct group_info *group_info = get_current_groups();
 	int i, j, count = group_info->ngroups;
+	kgid_t low, high;
 
 	inet_get_ping_group_range_net(net, range, range+1);
+	low = make_kgid(&init_user_ns, range[0]);
+	high = make_kgid(&init_user_ns, range[1]);
+	if (!gid_valid(low) || !gid_valid(high) || gid_lt(high, low))
+		return -EACCES;
+
 	if (range[0] <= group && group <= range[1])
 		return 0;
 
 	for (i = 0; i < group_info->nblocks; i++) {
 		int cp_count = min_t(int, NGROUPS_PER_BLOCK, count);
-
 		for (j = 0; j < cp_count; j++) {
-			group = group_info->blocks[i][j];
-			if (range[0] <= group && group <= range[1])
+			kgid_t gid = group_info->blocks[i][j];
+			if (gid_lte(low, gid) && gid_lte(gid, high))
 				return 0;
 		}
 
diff --git a/net/sunrpc/auth_generic.c b/net/sunrpc/auth_generic.c
index 75762f3..6ed6f20 100644
--- a/net/sunrpc/auth_generic.c
+++ b/net/sunrpc/auth_generic.c
@@ -160,8 +160,8 @@ generic_match(struct auth_cred *acred, struct rpc_cred *cred, int flags)
 	if (gcred->acred.group_info->ngroups != acred->group_info->ngroups)
 		goto out_nomatch;
 	for (i = 0; i < gcred->acred.group_info->ngroups; i++) {
-		if (GROUP_AT(gcred->acred.group_info, i) !=
-				GROUP_AT(acred->group_info, i))
+		if (!gid_eq(GROUP_AT(gcred->acred.group_info, i),
+				GROUP_AT(acred->group_info, i)))
 			goto out_nomatch;
 	}
 out_match:
diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c
index 1600cfb..28b62db 100644
--- a/net/sunrpc/auth_gss/svcauth_gss.c
+++ b/net/sunrpc/auth_gss/svcauth_gss.c
@@ -41,6 +41,7 @@
 #include <linux/types.h>
 #include <linux/module.h>
 #include <linux/pagemap.h>
+#include <linux/user_namespace.h>
 
 #include <linux/sunrpc/auth_gss.h>
 #include <linux/sunrpc/gss_err.h>
@@ -470,9 +471,13 @@ static int rsc_parse(struct cache_detail *cd,
 		status = -EINVAL;
 		for (i=0; i<N; i++) {
 			gid_t gid;
+			kgid_t kgid;
 			if (get_int(&mesg, &gid))
 				goto out;
-			GROUP_AT(rsci.cred.cr_group_info, i) = gid;
+			kgid = make_kgid(&init_user_ns, gid);
+			if (!gid_valid(kgid))
+				goto out;
+			GROUP_AT(rsci.cred.cr_group_info, i) = kgid;
 		}
 
 		/* mech name */
diff --git a/net/sunrpc/auth_unix.c b/net/sunrpc/auth_unix.c
index e50502d..52c5abd 100644
--- a/net/sunrpc/auth_unix.c
+++ b/net/sunrpc/auth_unix.c
@@ -12,6 +12,7 @@
 #include <linux/module.h>
 #include <linux/sunrpc/clnt.h>
 #include <linux/sunrpc/auth.h>
+#include <linux/user_namespace.h>
 
 #define NFS_NGROUPS	16
 
@@ -78,8 +79,11 @@ unx_create_cred(struct rpc_auth *auth, struct auth_cred *acred, int flags)
 		groups = NFS_NGROUPS;
 
 	cred->uc_gid = acred->gid;
-	for (i = 0; i < groups; i++)
-		cred->uc_gids[i] = GROUP_AT(acred->group_info, i);
+	for (i = 0; i < groups; i++) {
+		gid_t gid;
+		gid = from_kgid(&init_user_ns, GROUP_AT(acred->group_info, i));
+		cred->uc_gids[i] = gid;
+	}
 	if (i < NFS_NGROUPS)
 		cred->uc_gids[i] = NOGROUP;
 
@@ -126,9 +130,12 @@ unx_match(struct auth_cred *acred, struct rpc_cred *rcred, int flags)
 		groups = acred->group_info->ngroups;
 	if (groups > NFS_NGROUPS)
 		groups = NFS_NGROUPS;
-	for (i = 0; i < groups ; i++)
-		if (cred->uc_gids[i] != GROUP_AT(acred->group_info, i))
+	for (i = 0; i < groups ; i++) {
+		gid_t gid;
+		gid = from_kgid(&init_user_ns, GROUP_AT(acred->group_info, i));
+		if (cred->uc_gids[i] != gid)
 			return 0;
+	}
 	if (groups < NFS_NGROUPS &&
 	    cred->uc_gids[groups] != NOGROUP)
 		return 0;
diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
index 521d8f7..71ec853 100644
--- a/net/sunrpc/svcauth_unix.c
+++ b/net/sunrpc/svcauth_unix.c
@@ -14,6 +14,7 @@
 #include <net/sock.h>
 #include <net/ipv6.h>
 #include <linux/kernel.h>
+#include <linux/user_namespace.h>
 #define RPCDBG_FACILITY	RPCDBG_AUTH
 
 #include <linux/sunrpc/clnt.h>
@@ -530,11 +531,15 @@ static int unix_gid_parse(struct cache_detail *cd,
 
 	for (i = 0 ; i < gids ; i++) {
 		int gid;
+		kgid_t kgid;
 		rv = get_int(&mesg, &gid);
 		err = -EINVAL;
 		if (rv)
 			goto out;
-		GROUP_AT(ug.gi, i) = gid;
+		kgid = make_kgid(&init_user_ns, gid);
+		if (!gid_valid(kgid))
+			goto out;
+		GROUP_AT(ug.gi, i) = kgid;
 	}
 
 	ugp = unix_gid_lookup(cd, uid);
@@ -563,6 +568,7 @@ static int unix_gid_show(struct seq_file *m,
 			 struct cache_detail *cd,
 			 struct cache_head *h)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	struct unix_gid *ug;
 	int i;
 	int glen;
@@ -580,7 +586,7 @@ static int unix_gid_show(struct seq_file *m,
 
 	seq_printf(m, "%u %d:", ug->uid, glen);
 	for (i = 0; i < glen; i++)
-		seq_printf(m, " %d", GROUP_AT(ug->gi, i));
+		seq_printf(m, " %d", from_kgid_munged(user_ns, GROUP_AT(ug->gi, i)));
 	seq_printf(m, "\n");
 	return 0;
 }
@@ -831,8 +837,12 @@ svcauth_unix_accept(struct svc_rqst *rqstp, __be32 *authp)
 	cred->cr_group_info = groups_alloc(slen);
 	if (cred->cr_group_info == NULL)
 		return SVC_CLOSE;
-	for (i = 0; i < slen; i++)
-		GROUP_AT(cred->cr_group_info, i) = svc_getnl(argv);
+	for (i = 0; i < slen; i++) {
+		kgid_t kgid = make_kgid(&init_user_ns, svc_getnl(argv));
+		if (!gid_valid(kgid))
+			goto badcred;
+		GROUP_AT(cred->cr_group_info, i) = kgid;
+	}
 	if (svc_getu32(argv) != htonl(RPC_AUTH_NULL) || svc_getu32(argv) != 0) {
 		*authp = rpc_autherr_badverf;
 		return SVC_DENIED;
diff --git a/security/keys/permission.c b/security/keys/permission.c
index e146cbd..5442900 100644
--- a/security/keys/permission.c
+++ b/security/keys/permission.c
@@ -53,7 +53,8 @@ int key_task_permission(const key_ref_t key_ref, const struct cred *cred,
 			goto use_these_perms;
 		}
 
-		ret = groups_search(cred->group_info, key->gid);
+		ret = groups_search(cred->group_info,
+				    make_kgid(current_user_ns(), key->gid));
 		if (ret) {
 			kperm = key->perm >> 8;
 			goto use_these_perms;
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 18/43] userns: Convert group_info values from gid_t to kgid_t.
@ 2012-04-08  5:15     ` Eric W. Biederman
  0 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

As a first step to converting struct cred to be all kuid_t and kgid_t
values, convert the group values stored in group_info to always be
kgid_t values.  Unless user namespaces are used, this change should
have no effect.

Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 arch/s390/kernel/compat_linux.c   |   13 ++++++++-
 fs/nfsd/auth.c                    |    5 ++-
 fs/proc/array.c                   |    5 +++-
 include/linux/cred.h              |    9 ++++---
 kernel/groups.c                   |   48 +++++++++++++++++++-----------------
 kernel/uid16.c                    |   14 +++++++++-
 net/ipv4/ping.c                   |   11 ++++++--
 net/sunrpc/auth_generic.c         |    4 +-
 net/sunrpc/auth_gss/svcauth_gss.c |    7 ++++-
 net/sunrpc/auth_unix.c            |   15 ++++++++---
 net/sunrpc/svcauth_unix.c         |   18 ++++++++++---
 security/keys/permission.c        |    3 +-
 12 files changed, 103 insertions(+), 49 deletions(-)
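
The conversion pattern used throughout this patch (make_kgid() on the
way in, gid_valid() to reject unmapped values, from_kgid_munged() on the
way out) can be modelled in userspace.  What follows is only a rough
sketch under stated assumptions, not the kernel implementation: every
toy_* name is hypothetical, the namespace is reduced to a single
contiguous mapping extent, and 65534 is used as the overflow gid.

```c
/* Userspace model only: every toy_* name is hypothetical, not a kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t toy_gid_t;			/* gid as seen inside a namespace */
typedef struct { toy_gid_t val; } toy_kgid_t;	/* kernel-internal gid, wrapped so
						 * it cannot be mixed with raw gids */

#define TOY_INVALID_GID		((toy_gid_t)-1)		/* reserved error value */
#define TOY_OVERFLOW_GID	((toy_gid_t)65534)	/* reported when unmapped */

/* Assumption: one extent, ns-local [first, first + count) maps to
 * kernel [lower_first, lower_first + count). */
struct toy_ns {
	toy_gid_t first;
	toy_gid_t lower_first;
	uint32_t  count;
};

static bool toy_gid_valid(toy_kgid_t k)
{
	return k.val != TOY_INVALID_GID;
}

/* Namespace-local gid -> kernel gid; invalid if the gid has no mapping. */
static toy_kgid_t toy_make_kgid(const struct toy_ns *ns, toy_gid_t gid)
{
	toy_kgid_t k = { TOY_INVALID_GID };

	if (gid >= ns->first && gid - ns->first < ns->count)
		k.val = ns->lower_first + (gid - ns->first);
	return k;
}

/* Kernel gid -> namespace-local gid; unmapped values become the overflow gid. */
static toy_gid_t toy_from_kgid_munged(const struct toy_ns *ns, toy_kgid_t k)
{
	if (k.val >= ns->lower_first && k.val - ns->lower_first < ns->count)
		return ns->first + (k.val - ns->lower_first);
	return TOY_OVERFLOW_GID;
}

/* Same shape as groups_from_user(): convert each entry and reject the
 * whole list if any entry has no mapping.  Returns 0 or -1. */
static int toy_groups_from_user(const struct toy_ns *ns, const toy_gid_t *in,
				toy_kgid_t *out, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		toy_kgid_t k = toy_make_kgid(ns, in[i]);

		if (!toy_gid_valid(k))
			return -1;	/* the patch returns -EINVAL here */
		out[i] = k;
	}
	return 0;
}

/* Small self-check: gids 0..999 in the namespace map to 100000..100999. */
static void toy_demo(void)
{
	struct toy_ns ns = { 0, 100000, 1000 };
	toy_gid_t in[2] = { 0, 999 };
	toy_gid_t bad[1] = { 1000 };
	toy_kgid_t out[2];

	assert(toy_groups_from_user(&ns, in, out, 2) == 0);
	assert(toy_from_kgid_munged(&ns, out[1]) == 999);
	assert(toy_groups_from_user(&ns, bad, out, 1) == -1);
}
```

Wrapping the value in a struct is what lets the compiler catch a raw
gid being stored where a kernel gid is expected; the arithmetic itself
is deliberately trivial in this sketch.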

diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
index ab64bdb..5baac18 100644
--- a/arch/s390/kernel/compat_linux.c
+++ b/arch/s390/kernel/compat_linux.c
@@ -173,11 +173,14 @@ asmlinkage long sys32_setfsgid16(u16 gid)
 
 static int groups16_to_user(u16 __user *grouplist, struct group_info *group_info)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	int i;
 	u16 group;
+	kgid_t kgid;
 
 	for (i = 0; i < group_info->ngroups; i++) {
-		group = (u16)GROUP_AT(group_info, i);
+		kgid = GROUP_AT(group_info, i);
+		group = (u16)from_kgid_munged(user_ns, kgid);
 		if (put_user(group, grouplist+i))
 			return -EFAULT;
 	}
@@ -187,13 +190,20 @@ static int groups16_to_user(u16 __user *grouplist, struct group_info *group_info
 
 static int groups16_from_user(struct group_info *group_info, u16 __user *grouplist)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	int i;
 	u16 group;
+	kgid_t kgid;
 
 	for (i = 0; i < group_info->ngroups; i++) {
 		if (get_user(group, grouplist+i))
 			return  -EFAULT;
-		GROUP_AT(group_info, i) = (gid_t)group;
+
+		kgid = make_kgid(user_ns, (gid_t)group);
+		if (!gid_valid(kgid))
+			return -EINVAL;
+
+		GROUP_AT(group_info, i) = kgid;
 	}
 
 	return 0;
diff --git a/fs/nfsd/auth.c b/fs/nfsd/auth.c
index 79717a4..204438c 100644
--- a/fs/nfsd/auth.c
+++ b/fs/nfsd/auth.c
@@ -1,6 +1,7 @@
 /* Copyright (C) 1995, 1996 Olaf Kirch <okir-pn4DOG8n3UYbFoVRYvo4fw@public.gmane.org> */
 
 #include <linux/sched.h>
+#include <linux/user_namespace.h>
 #include "nfsd.h"
 #include "auth.h"
 
@@ -56,8 +57,8 @@ int nfsd_setuser(struct svc_rqst *rqstp, struct svc_export *exp)
 			goto oom;
 
 		for (i = 0; i < rqgi->ngroups; i++) {
-			if (!GROUP_AT(rqgi, i))
-				GROUP_AT(gi, i) = exp->ex_anon_gid;
+			if (gid_eq(GLOBAL_ROOT_GID, GROUP_AT(rqgi, i)))
+				GROUP_AT(gi, i) = make_kgid(&init_user_ns, exp->ex_anon_gid);
 			else
 				GROUP_AT(gi, i) = GROUP_AT(rqgi, i);
 		}
diff --git a/fs/proc/array.c b/fs/proc/array.c
index f9bd395..36a0a91 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -81,6 +81,7 @@
 #include <linux/pid_namespace.h>
 #include <linux/ptrace.h>
 #include <linux/tracehook.h>
+#include <linux/user_namespace.h>
 
 #include <asm/pgtable.h>
 #include <asm/processor.h>
@@ -161,6 +162,7 @@ static inline const char *get_task_state(struct task_struct *tsk)
 static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
 				struct pid *pid, struct task_struct *p)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	struct group_info *group_info;
 	int g;
 	struct fdtable *fdt = NULL;
@@ -205,7 +207,8 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
 	task_unlock(p);
 
 	for (g = 0; g < min(group_info->ngroups, NGROUPS_SMALL); g++)
-		seq_printf(m, "%d ", GROUP_AT(group_info, g));
+		seq_printf(m, "%d ",
+			   from_kgid_munged(user_ns, GROUP_AT(group_info, g)));
 	put_cred(cred);
 
 	seq_putc(m, '\n');
diff --git a/include/linux/cred.h b/include/linux/cred.h
index 2c60ec8..0ab3cda 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -17,6 +17,7 @@
 #include <linux/key.h>
 #include <linux/selinux.h>
 #include <linux/atomic.h>
+#include <linux/uidgid.h>
 
 struct user_struct;
 struct cred;
@@ -26,14 +27,14 @@ struct inode;
  * COW Supplementary groups list
  */
 #define NGROUPS_SMALL		32
-#define NGROUPS_PER_BLOCK	((unsigned int)(PAGE_SIZE / sizeof(gid_t)))
+#define NGROUPS_PER_BLOCK	((unsigned int)(PAGE_SIZE / sizeof(kgid_t)))
 
 struct group_info {
 	atomic_t	usage;
 	int		ngroups;
 	int		nblocks;
-	gid_t		small_block[NGROUPS_SMALL];
-	gid_t		*blocks[0];
+	kgid_t		small_block[NGROUPS_SMALL];
+	kgid_t		*blocks[0];
 };
 
 /**
@@ -66,7 +67,7 @@ extern struct group_info init_groups;
 extern void groups_free(struct group_info *);
 extern int set_current_groups(struct group_info *);
 extern int set_groups(struct cred *, struct group_info *);
-extern int groups_search(const struct group_info *, gid_t);
+extern int groups_search(const struct group_info *, kgid_t);
 
 /* access the groups "array" with this macro */
 #define GROUP_AT(gi, i) \
diff --git a/kernel/groups.c b/kernel/groups.c
index 99b53d1..84156f2 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -31,7 +31,7 @@ struct group_info *groups_alloc(int gidsetsize)
 		group_info->blocks[0] = group_info->small_block;
 	else {
 		for (i = 0; i < nblocks; i++) {
-			gid_t *b;
+			kgid_t *b;
 			b = (void *)__get_free_page(GFP_USER);
 			if (!b)
 				goto out_undo_partial_alloc;
@@ -66,18 +66,15 @@ EXPORT_SYMBOL(groups_free);
 static int groups_to_user(gid_t __user *grouplist,
 			  const struct group_info *group_info)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	int i;
 	unsigned int count = group_info->ngroups;
 
-	for (i = 0; i < group_info->nblocks; i++) {
-		unsigned int cp_count = min(NGROUPS_PER_BLOCK, count);
-		unsigned int len = cp_count * sizeof(*grouplist);
-
-		if (copy_to_user(grouplist, group_info->blocks[i], len))
+	for (i = 0; i < count; i++) {
+		gid_t gid;
+		gid = from_kgid_munged(user_ns, GROUP_AT(group_info, i));
+		if (put_user(gid, grouplist+i))
 			return -EFAULT;
-
-		grouplist += NGROUPS_PER_BLOCK;
-		count -= cp_count;
 	}
 	return 0;
 }
@@ -86,18 +83,21 @@ static int groups_to_user(gid_t __user *grouplist,
 static int groups_from_user(struct group_info *group_info,
     gid_t __user *grouplist)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	int i;
 	unsigned int count = group_info->ngroups;
 
-	for (i = 0; i < group_info->nblocks; i++) {
-		unsigned int cp_count = min(NGROUPS_PER_BLOCK, count);
-		unsigned int len = cp_count * sizeof(*grouplist);
-
-		if (copy_from_user(group_info->blocks[i], grouplist, len))
+	for (i = 0; i < count; i++) {
+		gid_t gid;
+		kgid_t kgid;
+		if (get_user(gid, grouplist+i))
 			return -EFAULT;
 
-		grouplist += NGROUPS_PER_BLOCK;
-		count -= cp_count;
+		kgid = make_kgid(user_ns, gid);
+		if (!gid_valid(kgid))
+			return -EINVAL;
+
+		GROUP_AT(group_info, i) = kgid;
 	}
 	return 0;
 }
@@ -117,9 +117,9 @@ static void groups_sort(struct group_info *group_info)
 		for (base = 0; base < max; base++) {
 			int left = base;
 			int right = left + stride;
-			gid_t tmp = GROUP_AT(group_info, right);
+			kgid_t tmp = GROUP_AT(group_info, right);
 
-			while (left >= 0 && GROUP_AT(group_info, left) > tmp) {
+			while (left >= 0 && gid_gt(GROUP_AT(group_info, left), tmp)) {
 				GROUP_AT(group_info, right) =
 				    GROUP_AT(group_info, left);
 				right = left;
@@ -132,7 +132,7 @@ static void groups_sort(struct group_info *group_info)
 }
 
 /* a simple bsearch */
-int groups_search(const struct group_info *group_info, gid_t grp)
+int groups_search(const struct group_info *group_info, kgid_t grp)
 {
 	unsigned int left, right;
 
@@ -143,9 +143,9 @@ int groups_search(const struct group_info *group_info, gid_t grp)
 	right = group_info->ngroups;
 	while (left < right) {
 		unsigned int mid = (left+right)/2;
-		if (grp > GROUP_AT(group_info, mid))
+		if (gid_gt(grp, GROUP_AT(group_info, mid)))
 			left = mid + 1;
-		else if (grp < GROUP_AT(group_info, mid))
+		else if (gid_lt(grp, GROUP_AT(group_info, mid)))
 			right = mid;
 		else
 			return 1;
@@ -262,7 +262,8 @@ int in_group_p(gid_t grp)
 	int retval = 1;
 
 	if (grp != cred->fsgid)
-		retval = groups_search(cred->group_info, grp);
+		retval = groups_search(cred->group_info,
+				       make_kgid(cred->user_ns, grp));
 	return retval;
 }
 
@@ -274,7 +275,8 @@ int in_egroup_p(gid_t grp)
 	int retval = 1;
 
 	if (grp != cred->egid)
-		retval = groups_search(cred->group_info, grp);
+		retval = groups_search(cred->group_info,
+				       make_kgid(cred->user_ns, grp));
 	return retval;
 }
 
diff --git a/kernel/uid16.c b/kernel/uid16.c
index 51c6e89..e530bc3 100644
--- a/kernel/uid16.c
+++ b/kernel/uid16.c
@@ -134,11 +134,14 @@ SYSCALL_DEFINE1(setfsgid16, old_gid_t, gid)
 static int groups16_to_user(old_gid_t __user *grouplist,
     struct group_info *group_info)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	int i;
 	old_gid_t group;
+	kgid_t kgid;
 
 	for (i = 0; i < group_info->ngroups; i++) {
-		group = high2lowgid(GROUP_AT(group_info, i));
+		kgid = GROUP_AT(group_info, i);
+		group = high2lowgid(from_kgid_munged(user_ns, kgid));
 		if (put_user(group, grouplist+i))
 			return -EFAULT;
 	}
@@ -149,13 +152,20 @@ static int groups16_to_user(old_gid_t __user *grouplist,
 static int groups16_from_user(struct group_info *group_info,
     old_gid_t __user *grouplist)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	int i;
 	old_gid_t group;
+	kgid_t kgid;
 
 	for (i = 0; i < group_info->ngroups; i++) {
 		if (get_user(group, grouplist+i))
 			return  -EFAULT;
-		GROUP_AT(group_info, i) = low2highgid(group);
+
+		kgid = make_kgid(user_ns, low2highgid(group));
+		if (!gid_valid(kgid))
+			return -EINVAL;
+
+		GROUP_AT(group_info, i) = kgid;
 	}
 
 	return 0;
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 50009c7..9d3044f 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -205,17 +205,22 @@ static int ping_init_sock(struct sock *sk)
 	gid_t range[2];
 	struct group_info *group_info = get_current_groups();
 	int i, j, count = group_info->ngroups;
+	kgid_t low, high;
 
 	inet_get_ping_group_range_net(net, range, range+1);
+	low = make_kgid(&init_user_ns, range[0]);
+	high = make_kgid(&init_user_ns, range[1]);
+	if (!gid_valid(low) || !gid_valid(high) || gid_lt(high, low))
+		return -EACCES;
+
 	if (range[0] <= group && group <= range[1])
 		return 0;
 
 	for (i = 0; i < group_info->nblocks; i++) {
 		int cp_count = min_t(int, NGROUPS_PER_BLOCK, count);
-
 		for (j = 0; j < cp_count; j++) {
-			group = group_info->blocks[i][j];
-			if (range[0] <= group && group <= range[1])
+			kgid_t gid = group_info->blocks[i][j];
+			if (gid_lte(low, gid) && gid_lte(gid, high))
 				return 0;
 		}
 
diff --git a/net/sunrpc/auth_generic.c b/net/sunrpc/auth_generic.c
index 75762f3..6ed6f20 100644
--- a/net/sunrpc/auth_generic.c
+++ b/net/sunrpc/auth_generic.c
@@ -160,8 +160,8 @@ generic_match(struct auth_cred *acred, struct rpc_cred *cred, int flags)
 	if (gcred->acred.group_info->ngroups != acred->group_info->ngroups)
 		goto out_nomatch;
 	for (i = 0; i < gcred->acred.group_info->ngroups; i++) {
-		if (GROUP_AT(gcred->acred.group_info, i) !=
-				GROUP_AT(acred->group_info, i))
+		if (!gid_eq(GROUP_AT(gcred->acred.group_info, i),
+				GROUP_AT(acred->group_info, i)))
 			goto out_nomatch;
 	}
 out_match:
diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c
index 1600cfb..28b62db 100644
--- a/net/sunrpc/auth_gss/svcauth_gss.c
+++ b/net/sunrpc/auth_gss/svcauth_gss.c
@@ -41,6 +41,7 @@
 #include <linux/types.h>
 #include <linux/module.h>
 #include <linux/pagemap.h>
+#include <linux/user_namespace.h>
 
 #include <linux/sunrpc/auth_gss.h>
 #include <linux/sunrpc/gss_err.h>
@@ -470,9 +471,13 @@ static int rsc_parse(struct cache_detail *cd,
 		status = -EINVAL;
 		for (i=0; i<N; i++) {
 			gid_t gid;
+			kgid_t kgid;
 			if (get_int(&mesg, &gid))
 				goto out;
-			GROUP_AT(rsci.cred.cr_group_info, i) = gid;
+			kgid = make_kgid(&init_user_ns, gid);
+			if (!gid_valid(kgid))
+				goto out;
+			GROUP_AT(rsci.cred.cr_group_info, i) = kgid;
 		}
 
 		/* mech name */
diff --git a/net/sunrpc/auth_unix.c b/net/sunrpc/auth_unix.c
index e50502d..52c5abd 100644
--- a/net/sunrpc/auth_unix.c
+++ b/net/sunrpc/auth_unix.c
@@ -12,6 +12,7 @@
 #include <linux/module.h>
 #include <linux/sunrpc/clnt.h>
 #include <linux/sunrpc/auth.h>
+#include <linux/user_namespace.h>
 
 #define NFS_NGROUPS	16
 
@@ -78,8 +79,11 @@ unx_create_cred(struct rpc_auth *auth, struct auth_cred *acred, int flags)
 		groups = NFS_NGROUPS;
 
 	cred->uc_gid = acred->gid;
-	for (i = 0; i < groups; i++)
-		cred->uc_gids[i] = GROUP_AT(acred->group_info, i);
+	for (i = 0; i < groups; i++) {
+		gid_t gid;
+		gid = from_kgid(&init_user_ns, GROUP_AT(acred->group_info, i));
+		cred->uc_gids[i] = gid;
+	}
 	if (i < NFS_NGROUPS)
 		cred->uc_gids[i] = NOGROUP;
 
@@ -126,9 +130,12 @@ unx_match(struct auth_cred *acred, struct rpc_cred *rcred, int flags)
 		groups = acred->group_info->ngroups;
 	if (groups > NFS_NGROUPS)
 		groups = NFS_NGROUPS;
-	for (i = 0; i < groups ; i++)
-		if (cred->uc_gids[i] != GROUP_AT(acred->group_info, i))
+	for (i = 0; i < groups ; i++) {
+		gid_t gid;
+		gid = from_kgid(&init_user_ns, GROUP_AT(acred->group_info, i));
+		if (cred->uc_gids[i] != gid)
 			return 0;
+	}
 	if (groups < NFS_NGROUPS &&
 	    cred->uc_gids[groups] != NOGROUP)
 		return 0;
diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
index 521d8f7..71ec853 100644
--- a/net/sunrpc/svcauth_unix.c
+++ b/net/sunrpc/svcauth_unix.c
@@ -14,6 +14,7 @@
 #include <net/sock.h>
 #include <net/ipv6.h>
 #include <linux/kernel.h>
+#include <linux/user_namespace.h>
 #define RPCDBG_FACILITY	RPCDBG_AUTH
 
 #include <linux/sunrpc/clnt.h>
@@ -530,11 +531,15 @@ static int unix_gid_parse(struct cache_detail *cd,
 
 	for (i = 0 ; i < gids ; i++) {
 		int gid;
+		kgid_t kgid;
 		rv = get_int(&mesg, &gid);
 		err = -EINVAL;
 		if (rv)
 			goto out;
-		GROUP_AT(ug.gi, i) = gid;
+		kgid = make_kgid(&init_user_ns, gid);
+		if (!gid_valid(kgid))
+			goto out;
+		GROUP_AT(ug.gi, i) = kgid;
 	}
 
 	ugp = unix_gid_lookup(cd, uid);
@@ -563,6 +568,7 @@ static int unix_gid_show(struct seq_file *m,
 			 struct cache_detail *cd,
 			 struct cache_head *h)
 {
+	struct user_namespace *user_ns = current_user_ns();
 	struct unix_gid *ug;
 	int i;
 	int glen;
@@ -580,7 +586,7 @@ static int unix_gid_show(struct seq_file *m,
 
 	seq_printf(m, "%u %d:", ug->uid, glen);
 	for (i = 0; i < glen; i++)
-		seq_printf(m, " %d", GROUP_AT(ug->gi, i));
+		seq_printf(m, " %d", from_kgid_munged(user_ns, GROUP_AT(ug->gi, i)));
 	seq_printf(m, "\n");
 	return 0;
 }
@@ -831,8 +837,12 @@ svcauth_unix_accept(struct svc_rqst *rqstp, __be32 *authp)
 	cred->cr_group_info = groups_alloc(slen);
 	if (cred->cr_group_info == NULL)
 		return SVC_CLOSE;
-	for (i = 0; i < slen; i++)
-		GROUP_AT(cred->cr_group_info, i) = svc_getnl(argv);
+	for (i = 0; i < slen; i++) {
+		kgid_t kgid = make_kgid(&init_user_ns, svc_getnl(argv));
+		if (!gid_valid(kgid))
+			goto badcred;
+		GROUP_AT(cred->cr_group_info, i) = kgid;
+	}
 	if (svc_getu32(argv) != htonl(RPC_AUTH_NULL) || svc_getu32(argv) != 0) {
 		*authp = rpc_autherr_badverf;
 		return SVC_DENIED;
diff --git a/security/keys/permission.c b/security/keys/permission.c
index e146cbd..5442900 100644
--- a/security/keys/permission.c
+++ b/security/keys/permission.c
@@ -53,7 +53,8 @@ int key_task_permission(const key_ref_t key_ref, const struct cred *cred,
 			goto use_these_perms;
 		}
 
-		ret = groups_search(cred->group_info, key->gid);
+		ret = groups_search(cred->group_info,
+				    make_kgid(current_user_ns(), key->gid));
 		if (ret) {
 			kperm = key->perm >> 8;
 			goto use_these_perms;
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 19/43] userns: Store uid and gid values in struct cred with kuid_t and kgid_t types
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

cred.h and a few trivial users of struct cred are changed.  The rest of the users
of struct cred are left for other patches, as there are too many changes to make
in one go and still leave the change reviewable.  If the user namespace is disabled
and CONFIG_UIDGID_STRICT_TYPE_CHECKS is disabled, the code will continue to compile
and behave correctly.

Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 arch/x86/mm/fault.c            |    2 +-
 fs/ioprio.c                    |    8 ++------
 include/linux/cred.h           |   16 ++++++++--------
 include/linux/user_namespace.h |    8 ++++----
 kernel/cred.c                  |   36 ++++++++++++++++++++++--------------
 kernel/signal.c                |   14 ++++++++------
 kernel/sys.c                   |   26 +++++++++-----------------
 kernel/user_namespace.c        |    4 ++--
 mm/oom_kill.c                  |    4 ++--
 security/commoncap.c           |    3 +--
 10 files changed, 59 insertions(+), 62 deletions(-)
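
The reason the comparisons in commit_creds() move from != to uid_eq()
and gid_eq() can be illustrated with a small userspace model.  This is
only a sketch under the assumption that the ids are wrapped in distinct
struct types; every toy_* name is hypothetical and none of this is the
kernel's own code.

```c
/* Userspace model only: every toy_* name is hypothetical, not a kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Distinct struct types: passing a gid where a uid is expected, or writing
 * "a != b" on two of these, is rejected by the compiler. */
typedef struct { uint32_t val; } toy_kuid_t;
typedef struct { uint32_t val; } toy_kgid_t;

static inline bool toy_uid_eq(toy_kuid_t a, toy_kuid_t b)
{
	return a.val == b.val;
}

static inline bool toy_gid_eq(toy_kgid_t a, toy_kgid_t b)
{
	return a.val == b.val;
}

struct toy_cred {
	toy_kuid_t uid, suid, euid, fsuid;
	toy_kgid_t gid, sgid, egid, fsgid;
};

/* Same shape as the "send notifications" tests in commit_creds(). */
static bool toy_any_uid_changed(const struct toy_cred *new_cred,
				const struct toy_cred *old_cred)
{
	return !toy_uid_eq(new_cred->uid,   old_cred->uid)  ||
	       !toy_uid_eq(new_cred->euid,  old_cred->euid) ||
	       !toy_uid_eq(new_cred->suid,  old_cred->suid) ||
	       !toy_uid_eq(new_cred->fsuid, old_cred->fsuid);
}

static bool toy_any_gid_changed(const struct toy_cred *new_cred,
				const struct toy_cred *old_cred)
{
	return !toy_gid_eq(new_cred->gid,   old_cred->gid)  ||
	       !toy_gid_eq(new_cred->egid,  old_cred->egid) ||
	       !toy_gid_eq(new_cred->sgid,  old_cred->sgid) ||
	       !toy_gid_eq(new_cred->fsgid, old_cred->fsgid);
}

/* Small self-check: changing only egid trips the gid test, not the uid test. */
static void toy_demo(void)
{
	struct toy_cred old_cred = { .uid = { 0 } };	/* all ids zero */
	struct toy_cred new_cred = old_cred;

	assert(!toy_any_uid_changed(&new_cred, &old_cred));
	new_cred.egid.val = 1000;
	assert(toy_any_gid_changed(&new_cred, &old_cred));
	assert(!toy_any_uid_changed(&new_cred, &old_cred));
}
```

With plain integer typedefs a stray uid-versus-gid comparison compiles
silently; with the struct wrappers it becomes a build error, which is
the property the strict type checks rely on.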

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 3ecfd1a..76dcd9d 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -582,7 +582,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code,
 		pte_t *pte = lookup_address(address, &level);
 
 		if (pte && pte_present(*pte) && !pte_exec(*pte))
-			printk(nx_warning, current_uid());
+			printk(nx_warning, from_kuid(&init_user_ns, current_uid()));
 	}
 
 	printk(KERN_ALERT "BUG: unable to handle kernel ");
diff --git a/fs/ioprio.c b/fs/ioprio.c
index 8e35e96..2072e41 100644
--- a/fs/ioprio.c
+++ b/fs/ioprio.c
@@ -123,9 +123,7 @@ SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
 				break;
 
 			do_each_thread(g, p) {
-				const struct cred *tcred = __task_cred(p);
-				kuid_t tcred_uid = make_kuid(tcred->user_ns, tcred->uid);
-				if (!uid_eq(tcred_uid, uid))
+				if (!uid_eq(task_uid(p), uid))
 					continue;
 				ret = set_task_ioprio(p, ioprio);
 				if (ret)
@@ -220,9 +218,7 @@ SYSCALL_DEFINE2(ioprio_get, int, which, int, who)
 				break;
 
 			do_each_thread(g, p) {
-				const struct cred *tcred = __task_cred(p);
-				kuid_t tcred_uid = make_kuid(tcred->user_ns, tcred->uid);
-				if (!uid_eq(tcred_uid, user->uid))
+				if (!uid_eq(task_uid(p), user->uid))
 					continue;
 				tmpio = get_task_ioprio(p);
 				if (tmpio < 0)
diff --git a/include/linux/cred.h b/include/linux/cred.h
index 0ab3cda..fac0579 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -123,14 +123,14 @@ struct cred {
 #define CRED_MAGIC	0x43736564
 #define CRED_MAGIC_DEAD	0x44656144
 #endif
-	uid_t		uid;		/* real UID of the task */
-	gid_t		gid;		/* real GID of the task */
-	uid_t		suid;		/* saved UID of the task */
-	gid_t		sgid;		/* saved GID of the task */
-	uid_t		euid;		/* effective UID of the task */
-	gid_t		egid;		/* effective GID of the task */
-	uid_t		fsuid;		/* UID for VFS ops */
-	gid_t		fsgid;		/* GID for VFS ops */
+	kuid_t		uid;		/* real UID of the task */
+	kgid_t		gid;		/* real GID of the task */
+	kuid_t		suid;		/* saved UID of the task */
+	kgid_t		sgid;		/* saved GID of the task */
+	kuid_t		euid;		/* effective UID of the task */
+	kgid_t		egid;		/* effective GID of the task */
+	kuid_t		fsuid;		/* UID for VFS ops */
+	kgid_t		fsgid;		/* GID for VFS ops */
 	unsigned	securebits;	/* SUID-less security management */
 	kernel_cap_t	cap_inheritable; /* caps our children can inherit */
 	kernel_cap_t	cap_permitted;	/* caps we're permitted */
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 4c9846d..a2c6145 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -70,15 +70,15 @@ static inline void put_user_ns(struct user_namespace *ns)
 #endif
 
 static inline uid_t user_ns_map_uid(struct user_namespace *to,
-	const struct cred *cred, uid_t uid)
+	const struct cred *cred, kuid_t uid)
 {
-	return from_kuid_munged(to, make_kuid(cred->user_ns, uid));
+	return from_kuid_munged(to, uid);
 }
 
 static inline gid_t user_ns_map_gid(struct user_namespace *to,
-	const struct cred *cred, gid_t gid)
+	const struct cred *cred, kgid_t gid)
 {
-	return from_kgid_munged(to, make_kgid(cred->user_ns, gid));
+	return from_kgid_munged(to, gid);
 }
 
 #endif /* _LINUX_USER_H */
diff --git a/kernel/cred.c b/kernel/cred.c
index 7a0d806..eddc5e2 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -49,6 +49,14 @@ struct cred init_cred = {
 	.subscribers		= ATOMIC_INIT(2),
 	.magic			= CRED_MAGIC,
 #endif
+	.uid			= GLOBAL_ROOT_UID,
+	.gid			= GLOBAL_ROOT_GID,
+	.suid			= GLOBAL_ROOT_UID,
+	.sgid			= GLOBAL_ROOT_GID,
+	.euid			= GLOBAL_ROOT_UID,
+	.egid			= GLOBAL_ROOT_GID,
+	.fsuid			= GLOBAL_ROOT_UID,
+	.fsgid			= GLOBAL_ROOT_GID,
 	.securebits		= SECUREBITS_DEFAULT,
 	.cap_inheritable	= CAP_EMPTY_SET,
 	.cap_permitted		= CAP_FULL_SET,
@@ -488,10 +496,10 @@ int commit_creds(struct cred *new)
 	get_cred(new); /* we will require a ref for the subj creds too */
 
 	/* dumpability changes */
-	if (old->euid != new->euid ||
-	    old->egid != new->egid ||
-	    old->fsuid != new->fsuid ||
-	    old->fsgid != new->fsgid ||
+	if (!uid_eq(old->euid, new->euid) ||
+	    !gid_eq(old->egid, new->egid) ||
+	    !uid_eq(old->fsuid, new->fsuid) ||
+	    !gid_eq(old->fsgid, new->fsgid) ||
 	    !cap_issubset(new->cap_permitted, old->cap_permitted)) {
 		if (task->mm)
 			set_dumpable(task->mm, suid_dumpable);
@@ -500,9 +508,9 @@ int commit_creds(struct cred *new)
 	}
 
 	/* alter the thread keyring */
-	if (new->fsuid != old->fsuid)
+	if (!uid_eq(new->fsuid, old->fsuid))
 		key_fsuid_changed(task);
-	if (new->fsgid != old->fsgid)
+	if (!gid_eq(new->fsgid, old->fsgid))
 		key_fsgid_changed(task);
 
 	/* do it
@@ -519,16 +527,16 @@ int commit_creds(struct cred *new)
 	alter_cred_subscribers(old, -2);
 
 	/* send notifications */
-	if (new->uid   != old->uid  ||
-	    new->euid  != old->euid ||
-	    new->suid  != old->suid ||
-	    new->fsuid != old->fsuid)
+	if (!uid_eq(new->uid,   old->uid)  ||
+	    !uid_eq(new->euid,  old->euid) ||
+	    !uid_eq(new->suid,  old->suid) ||
+	    !uid_eq(new->fsuid, old->fsuid))
 		proc_id_connector(task, PROC_EVENT_UID);
 
-	if (new->gid   != old->gid  ||
-	    new->egid  != old->egid ||
-	    new->sgid  != old->sgid ||
-	    new->fsgid != old->fsgid)
+	if (!gid_eq(new->gid,   old->gid)  ||
+	    !gid_eq(new->egid,  old->egid) ||
+	    !gid_eq(new->sgid,  old->sgid) ||
+	    !gid_eq(new->fsgid, old->fsgid))
 		proc_id_connector(task, PROC_EVENT_GID);
 
 	/* release the old obj and subj refs both */
diff --git a/kernel/signal.c b/kernel/signal.c
index e2c5d84..2734dc9 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1038,8 +1038,10 @@ static inline void userns_fixup_signal_uid(struct siginfo *info, struct task_str
 	if (SI_FROMKERNEL(info))
 		return;
 
-	info->si_uid = user_ns_map_uid(task_cred_xxx(t, user_ns),
-					current_cred(), info->si_uid);
+	rcu_read_lock();
+	info->si_uid = from_kuid_munged(task_cred_xxx(t, user_ns),
+					make_kuid(current_user_ns(), info->si_uid));
+	rcu_read_unlock();
 }
 #else
 static inline void userns_fixup_signal_uid(struct siginfo *info, struct task_struct *t)
@@ -1106,7 +1108,7 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
 			q->info.si_code = SI_USER;
 			q->info.si_pid = task_tgid_nr_ns(current,
 							task_active_pid_ns(t));
-			q->info.si_uid = current_uid();
+			q->info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
 			break;
 		case (unsigned long) SEND_SIG_PRIV:
 			q->info.si_signo = sig;
@@ -1973,7 +1975,7 @@ static void ptrace_do_notify(int signr, int exit_code, int why)
 	info.si_signo = signr;
 	info.si_code = exit_code;
 	info.si_pid = task_pid_vnr(current);
-	info.si_uid = current_uid();
+	info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
 
 	/* Let the debugger run.  */
 	ptrace_stop(exit_code, why, 1, &info);
@@ -2828,7 +2830,7 @@ SYSCALL_DEFINE2(kill, pid_t, pid, int, sig)
 	info.si_errno = 0;
 	info.si_code = SI_USER;
 	info.si_pid = task_tgid_vnr(current);
-	info.si_uid = current_uid();
+	info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
 
 	return kill_something_info(sig, &info, pid);
 }
@@ -2871,7 +2873,7 @@ static int do_tkill(pid_t tgid, pid_t pid, int sig)
 	info.si_errno = 0;
 	info.si_code = SI_TKILL;
 	info.si_pid = task_tgid_vnr(current);
-	info.si_uid = current_uid();
+	info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
 
 	return do_send_specific(tgid, pid, sig, &info);
 }
diff --git a/kernel/sys.c b/kernel/sys.c
index f0c43b4..3996281 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -175,7 +175,6 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
 	const struct cred *cred = current_cred();
 	int error = -EINVAL;
 	struct pid *pgrp;
-	kuid_t cred_uid;
 	kuid_t uid;
 
 	if (which > PRIO_USER || which < PRIO_PROCESS)
@@ -209,22 +208,19 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
 			} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
 			break;
 		case PRIO_USER:
-			cred_uid = make_kuid(cred->user_ns, cred->uid);
 			uid = make_kuid(cred->user_ns, who);
 			user = cred->user;
 			if (!who)
-				uid = cred_uid;
-			else if (!uid_eq(uid, cred_uid) &&
+				uid = cred->uid;
+			else if (!uid_eq(uid, cred->uid) &&
 				 !(user = find_user(uid)))
 				goto out_unlock;	/* No processes for this user */
 
 			do_each_thread(g, p) {
-				const struct cred *tcred = __task_cred(p);
-				kuid_t tcred_uid = make_kuid(tcred->user_ns, tcred->uid);
-				if (uid_eq(tcred_uid, uid))
+				if (uid_eq(task_uid(p), uid))
 					error = set_one_prio(p, niceval, error);
 			} while_each_thread(g, p);
-			if (!uid_eq(uid, cred_uid))
+			if (!uid_eq(uid, cred->uid))
 				free_uid(user);		/* For find_user() */
 			break;
 	}
@@ -248,7 +244,6 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
 	const struct cred *cred = current_cred();
 	long niceval, retval = -ESRCH;
 	struct pid *pgrp;
-	kuid_t cred_uid;
 	kuid_t uid;
 
 	if (which > PRIO_USER || which < PRIO_PROCESS)
@@ -280,25 +275,22 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
 			} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
 			break;
 		case PRIO_USER:
-			cred_uid = make_kuid(cred->user_ns, cred->uid);
 			uid = make_kuid(cred->user_ns, who);
 			user = cred->user;
 			if (!who)
-				uid = cred_uid;
-			else if (!uid_eq(uid, cred_uid) &&
+				uid = cred->uid;
+			else if (!uid_eq(uid, cred->uid) &&
 				 !(user = find_user(uid)))
 				goto out_unlock;	/* No processes for this user */
 
 			do_each_thread(g, p) {
-				const struct cred *tcred = __task_cred(p);
-				kuid_t tcred_uid = make_kuid(tcred->user_ns, tcred->uid);
-				if (uid_eq(tcred_uid, uid)) {
+				if (uid_eq(task_uid(p), uid)) {
 					niceval = 20 - task_nice(p);
 					if (niceval > retval)
 						retval = niceval;
 				}
 			} while_each_thread(g, p);
-			if (!uid_eq(uid, cred_uid))
+			if (!uid_eq(uid, cred->uid))
 				free_uid(user);		/* for find_user() */
 			break;
 	}
@@ -641,7 +633,7 @@ static int set_user(struct cred *new)
 {
 	struct user_struct *new_user;
 
-	new_user = alloc_uid(make_kuid(new->user_ns, new->uid));
+	new_user = alloc_uid(new->uid);
 	if (!new_user)
 		return -EAGAIN;
 
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 9991bac..0683dbf 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -36,8 +36,8 @@ static bool new_idmap_permitted(struct user_namespace *ns, int cap_setid,
 int create_user_ns(struct cred *new)
 {
 	struct user_namespace *ns, *parent_ns = new->user_ns;
-	kuid_t owner = make_kuid(new->user_ns, new->euid);
-	kgid_t group = make_kgid(new->user_ns, new->egid);
+	kuid_t owner = new->euid;
+	kgid_t group = new->egid;
 
 	/* The creator needs a mapping in the parent user namespace
 	 * or else we won't be able to reasonably tell userspace who
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 46bf2ed5..9f09a1f 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -410,8 +410,8 @@ static void dump_tasks(const struct mem_cgroup *memcg, const nodemask_t *nodemas
 		}
 
 		pr_info("[%5d] %5d %5d %8lu %8lu %3u     %3d         %5d %s\n",
-			task->pid, task_uid(task), task->tgid,
-			task->mm->total_vm, get_mm_rss(task->mm),
+			task->pid, from_kuid(&init_user_ns, task_uid(task)),
+			task->tgid, task->mm->total_vm, get_mm_rss(task->mm),
 			task_cpu(task), task->signal->oom_adj,
 			task->signal->oom_score_adj, task->comm);
 		task_unlock(task);
diff --git a/security/commoncap.c b/security/commoncap.c
index f2399d8..dbd465a 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -77,8 +77,7 @@ int cap_capable(const struct cred *cred, struct user_namespace *targ_ns,
 {
 	for (;;) {
 		/* The owner of the user namespace has all caps. */
-		if (targ_ns != &init_user_ns && uid_eq(targ_ns->owner,
-						       make_kuid(cred->user_ns, cred->euid)))
+		if (targ_ns != &init_user_ns && uid_eq(targ_ns->owner, cred->euid))
 			return 0;
 
 		/* Do we have the necessary capabilities? */
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 19/43] userns: Store uid and gid values in struct cred with kuid_t and kgid_t types
@ 2012-04-08  5:15     ` Eric W. Biederman
  0 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

cred.h and a few trivial users of struct cred are changed.  The rest of the users
of struct cred are left for other patches, as there are too many changes to make
in one go while keeping the change reviewable.  If both the user namespace and
CONFIG_UIDGID_STRICT_TYPE_CHECKS are disabled, the code will continue to compile
and behave correctly.
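
For readers unfamiliar with the typed ids, here is a minimal userspace sketch of
the idea, not the kernel implementation.  Every model_* name is hypothetical, the
namespace is simplified to a single mapping extent, and 65534 is assumed as the
overflow id.  It shows why struct cred fields can no longer be compared with !=
or handed to userspace without an explicit conversion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only; every model_* name is hypothetical. */
typedef uint32_t model_uid_t;

/*
 * Wrapping the integer in a struct makes "a != b" and implicit
 * conversion to or from a plain integer a compile error, so callers
 * must go through an equality helper or an explicit mapping.
 */
typedef struct { model_uid_t val; } model_kuid_t;

#define MODEL_NO_UID       ((model_uid_t)-1)    /* reserved error value */
#define MODEL_OVERFLOW_UID ((model_uid_t)65534) /* assumed fallback id */

/* A namespace with one contiguous mapping extent (a simplification). */
struct model_user_ns {
	model_uid_t first;       /* first id as seen inside the namespace */
	model_uid_t lower_first; /* matching id in the global id space */
	model_uid_t count;       /* number of ids in the extent */
};

static inline bool model_uid_eq(model_kuid_t a, model_kuid_t b)
{
	return a.val == b.val;
}

static inline bool model_uid_valid(model_kuid_t kuid)
{
	return kuid.val != MODEL_NO_UID;
}

/* Namespace-local id -> global id; the result is invalid if unmapped. */
static inline model_kuid_t model_make_kuid(const struct model_user_ns *ns,
					   model_uid_t uid)
{
	model_kuid_t kuid = { MODEL_NO_UID };

	assert(ns != NULL);
	if (uid >= ns->first && uid - ns->first < ns->count)
		kuid.val = ns->lower_first + (uid - ns->first);
	return kuid;
}

/* Global id -> namespace-local id; MODEL_NO_UID if unmapped. */
static inline model_uid_t model_from_kuid(const struct model_user_ns *ns,
					  model_kuid_t kuid)
{
	assert(ns != NULL);
	if (kuid.val >= ns->lower_first &&
	    kuid.val - ns->lower_first < ns->count)
		return ns->first + (kuid.val - ns->lower_first);
	return MODEL_NO_UID;
}

/* Same mapping, but unmapped ids are reported as the overflow id. */
static inline model_uid_t model_from_kuid_munged(const struct model_user_ns *ns,
						 model_kuid_t kuid)
{
	model_uid_t uid = model_from_kuid(ns, kuid);

	return uid == MODEL_NO_UID ? MODEL_OVERFLOW_UID : uid;
}
```

In this model the stored value is always the global one, which is why the patch
can drop the make_kuid(cred->user_ns, cred->uid) calls and use cred->uid
directly.  Conversion happens only at the edges: the plain from_kuid form suits
kernel log output against the initial namespace, and the munged form suits
values such as si_uid that must always be reported to userspace.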

Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 arch/x86/mm/fault.c            |    2 +-
 fs/ioprio.c                    |    8 ++------
 include/linux/cred.h           |   16 ++++++++--------
 include/linux/user_namespace.h |    8 ++++----
 kernel/cred.c                  |   36 ++++++++++++++++++++++--------------
 kernel/signal.c                |   14 ++++++++------
 kernel/sys.c                   |   26 +++++++++-----------------
 kernel/user_namespace.c        |    4 ++--
 mm/oom_kill.c                  |    4 ++--
 security/commoncap.c           |    3 +--
 10 files changed, 59 insertions(+), 62 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 3ecfd1a..76dcd9d 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -582,7 +582,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code,
 		pte_t *pte = lookup_address(address, &level);
 
 		if (pte && pte_present(*pte) && !pte_exec(*pte))
-			printk(nx_warning, current_uid());
+			printk(nx_warning, from_kuid(&init_user_ns, current_uid()));
 	}
 
 	printk(KERN_ALERT "BUG: unable to handle kernel ");
diff --git a/fs/ioprio.c b/fs/ioprio.c
index 8e35e96..2072e41 100644
--- a/fs/ioprio.c
+++ b/fs/ioprio.c
@@ -123,9 +123,7 @@ SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
 				break;
 
 			do_each_thread(g, p) {
-				const struct cred *tcred = __task_cred(p);
-				kuid_t tcred_uid = make_kuid(tcred->user_ns, tcred->uid);
-				if (!uid_eq(tcred_uid, uid))
+				if (!uid_eq(task_uid(p), uid))
 					continue;
 				ret = set_task_ioprio(p, ioprio);
 				if (ret)
@@ -220,9 +218,7 @@ SYSCALL_DEFINE2(ioprio_get, int, which, int, who)
 				break;
 
 			do_each_thread(g, p) {
-				const struct cred *tcred = __task_cred(p);
-				kuid_t tcred_uid = make_kuid(tcred->user_ns, tcred->uid);
-				if (!uid_eq(tcred_uid, user->uid))
+				if (!uid_eq(task_uid(p), user->uid))
 					continue;
 				tmpio = get_task_ioprio(p);
 				if (tmpio < 0)
diff --git a/include/linux/cred.h b/include/linux/cred.h
index 0ab3cda..fac0579 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -123,14 +123,14 @@ struct cred {
 #define CRED_MAGIC	0x43736564
 #define CRED_MAGIC_DEAD	0x44656144
 #endif
-	uid_t		uid;		/* real UID of the task */
-	gid_t		gid;		/* real GID of the task */
-	uid_t		suid;		/* saved UID of the task */
-	gid_t		sgid;		/* saved GID of the task */
-	uid_t		euid;		/* effective UID of the task */
-	gid_t		egid;		/* effective GID of the task */
-	uid_t		fsuid;		/* UID for VFS ops */
-	gid_t		fsgid;		/* GID for VFS ops */
+	kuid_t		uid;		/* real UID of the task */
+	kgid_t		gid;		/* real GID of the task */
+	kuid_t		suid;		/* saved UID of the task */
+	kgid_t		sgid;		/* saved GID of the task */
+	kuid_t		euid;		/* effective UID of the task */
+	kgid_t		egid;		/* effective GID of the task */
+	kuid_t		fsuid;		/* UID for VFS ops */
+	kgid_t		fsgid;		/* GID for VFS ops */
 	unsigned	securebits;	/* SUID-less security management */
 	kernel_cap_t	cap_inheritable; /* caps our children can inherit */
 	kernel_cap_t	cap_permitted;	/* caps we're permitted */
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 4c9846d..a2c6145 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -70,15 +70,15 @@ static inline void put_user_ns(struct user_namespace *ns)
 #endif
 
 static inline uid_t user_ns_map_uid(struct user_namespace *to,
-	const struct cred *cred, uid_t uid)
+	const struct cred *cred, kuid_t uid)
 {
-	return from_kuid_munged(to, make_kuid(cred->user_ns, uid));
+	return from_kuid_munged(to, uid);
 }
 
 static inline gid_t user_ns_map_gid(struct user_namespace *to,
-	const struct cred *cred, gid_t gid)
+	const struct cred *cred, kgid_t gid)
 {
-	return from_kgid_munged(to, make_kgid(cred->user_ns, gid));
+	return from_kgid_munged(to, gid);
 }
 
 #endif /* _LINUX_USER_H */
diff --git a/kernel/cred.c b/kernel/cred.c
index 7a0d806..eddc5e2 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -49,6 +49,14 @@ struct cred init_cred = {
 	.subscribers		= ATOMIC_INIT(2),
 	.magic			= CRED_MAGIC,
 #endif
+	.uid			= GLOBAL_ROOT_UID,
+	.gid			= GLOBAL_ROOT_GID,
+	.suid			= GLOBAL_ROOT_UID,
+	.sgid			= GLOBAL_ROOT_GID,
+	.euid			= GLOBAL_ROOT_UID,
+	.egid			= GLOBAL_ROOT_GID,
+	.fsuid			= GLOBAL_ROOT_UID,
+	.fsgid			= GLOBAL_ROOT_GID,
 	.securebits		= SECUREBITS_DEFAULT,
 	.cap_inheritable	= CAP_EMPTY_SET,
 	.cap_permitted		= CAP_FULL_SET,
@@ -488,10 +496,10 @@ int commit_creds(struct cred *new)
 	get_cred(new); /* we will require a ref for the subj creds too */
 
 	/* dumpability changes */
-	if (old->euid != new->euid ||
-	    old->egid != new->egid ||
-	    old->fsuid != new->fsuid ||
-	    old->fsgid != new->fsgid ||
+	if (!uid_eq(old->euid, new->euid) ||
+	    !gid_eq(old->egid, new->egid) ||
+	    !uid_eq(old->fsuid, new->fsuid) ||
+	    !gid_eq(old->fsgid, new->fsgid) ||
 	    !cap_issubset(new->cap_permitted, old->cap_permitted)) {
 		if (task->mm)
 			set_dumpable(task->mm, suid_dumpable);
@@ -500,9 +508,9 @@ int commit_creds(struct cred *new)
 	}
 
 	/* alter the thread keyring */
-	if (new->fsuid != old->fsuid)
+	if (!uid_eq(new->fsuid, old->fsuid))
 		key_fsuid_changed(task);
-	if (new->fsgid != old->fsgid)
+	if (!gid_eq(new->fsgid, old->fsgid))
 		key_fsgid_changed(task);
 
 	/* do it
@@ -519,16 +527,16 @@ int commit_creds(struct cred *new)
 	alter_cred_subscribers(old, -2);
 
 	/* send notifications */
-	if (new->uid   != old->uid  ||
-	    new->euid  != old->euid ||
-	    new->suid  != old->suid ||
-	    new->fsuid != old->fsuid)
+	if (!uid_eq(new->uid,   old->uid)  ||
+	    !uid_eq(new->euid,  old->euid) ||
+	    !uid_eq(new->suid,  old->suid) ||
+	    !uid_eq(new->fsuid, old->fsuid))
 		proc_id_connector(task, PROC_EVENT_UID);
 
-	if (new->gid   != old->gid  ||
-	    new->egid  != old->egid ||
-	    new->sgid  != old->sgid ||
-	    new->fsgid != old->fsgid)
+	if (!gid_eq(new->gid,   old->gid)  ||
+	    !gid_eq(new->egid,  old->egid) ||
+	    !gid_eq(new->sgid,  old->sgid) ||
+	    !gid_eq(new->fsgid, old->fsgid))
 		proc_id_connector(task, PROC_EVENT_GID);
 
 	/* release the old obj and subj refs both */
diff --git a/kernel/signal.c b/kernel/signal.c
index e2c5d84..2734dc9 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1038,8 +1038,10 @@ static inline void userns_fixup_signal_uid(struct siginfo *info, struct task_str
 	if (SI_FROMKERNEL(info))
 		return;
 
-	info->si_uid = user_ns_map_uid(task_cred_xxx(t, user_ns),
-					current_cred(), info->si_uid);
+	rcu_read_lock();
+	info->si_uid = from_kuid_munged(task_cred_xxx(t, user_ns),
+					make_kuid(current_user_ns(), info->si_uid));
+	rcu_read_unlock();
 }
 #else
 static inline void userns_fixup_signal_uid(struct siginfo *info, struct task_struct *t)
@@ -1106,7 +1108,7 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
 			q->info.si_code = SI_USER;
 			q->info.si_pid = task_tgid_nr_ns(current,
 							task_active_pid_ns(t));
-			q->info.si_uid = current_uid();
+			q->info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
 			break;
 		case (unsigned long) SEND_SIG_PRIV:
 			q->info.si_signo = sig;
@@ -1973,7 +1975,7 @@ static void ptrace_do_notify(int signr, int exit_code, int why)
 	info.si_signo = signr;
 	info.si_code = exit_code;
 	info.si_pid = task_pid_vnr(current);
-	info.si_uid = current_uid();
+	info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
 
 	/* Let the debugger run.  */
 	ptrace_stop(exit_code, why, 1, &info);
@@ -2828,7 +2830,7 @@ SYSCALL_DEFINE2(kill, pid_t, pid, int, sig)
 	info.si_errno = 0;
 	info.si_code = SI_USER;
 	info.si_pid = task_tgid_vnr(current);
-	info.si_uid = current_uid();
+	info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
 
 	return kill_something_info(sig, &info, pid);
 }
@@ -2871,7 +2873,7 @@ static int do_tkill(pid_t tgid, pid_t pid, int sig)
 	info.si_errno = 0;
 	info.si_code = SI_TKILL;
 	info.si_pid = task_tgid_vnr(current);
-	info.si_uid = current_uid();
+	info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
 
 	return do_send_specific(tgid, pid, sig, &info);
 }
diff --git a/kernel/sys.c b/kernel/sys.c
index f0c43b4..3996281 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -175,7 +175,6 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
 	const struct cred *cred = current_cred();
 	int error = -EINVAL;
 	struct pid *pgrp;
-	kuid_t cred_uid;
 	kuid_t uid;
 
 	if (which > PRIO_USER || which < PRIO_PROCESS)
@@ -209,22 +208,19 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
 			} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
 			break;
 		case PRIO_USER:
-			cred_uid = make_kuid(cred->user_ns, cred->uid);
 			uid = make_kuid(cred->user_ns, who);
 			user = cred->user;
 			if (!who)
-				uid = cred_uid;
-			else if (!uid_eq(uid, cred_uid) &&
+				uid = cred->uid;
+			else if (!uid_eq(uid, cred->uid) &&
 				 !(user = find_user(uid)))
 				goto out_unlock;	/* No processes for this user */
 
 			do_each_thread(g, p) {
-				const struct cred *tcred = __task_cred(p);
-				kuid_t tcred_uid = make_kuid(tcred->user_ns, tcred->uid);
-				if (uid_eq(tcred_uid, uid))
+				if (uid_eq(task_uid(p), uid))
 					error = set_one_prio(p, niceval, error);
 			} while_each_thread(g, p);
-			if (!uid_eq(uid, cred_uid))
+			if (!uid_eq(uid, cred->uid))
 				free_uid(user);		/* For find_user() */
 			break;
 	}
@@ -248,7 +244,6 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
 	const struct cred *cred = current_cred();
 	long niceval, retval = -ESRCH;
 	struct pid *pgrp;
-	kuid_t cred_uid;
 	kuid_t uid;
 
 	if (which > PRIO_USER || which < PRIO_PROCESS)
@@ -280,25 +275,22 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
 			} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
 			break;
 		case PRIO_USER:
-			cred_uid = make_kuid(cred->user_ns, cred->uid);
 			uid = make_kuid(cred->user_ns, who);
 			user = cred->user;
 			if (!who)
-				uid = cred_uid;
-			else if (!uid_eq(uid, cred_uid) &&
+				uid = cred->uid;
+			else if (!uid_eq(uid, cred->uid) &&
 				 !(user = find_user(uid)))
 				goto out_unlock;	/* No processes for this user */
 
 			do_each_thread(g, p) {
-				const struct cred *tcred = __task_cred(p);
-				kuid_t tcred_uid = make_kuid(tcred->user_ns, tcred->uid);
-				if (uid_eq(tcred_uid, uid)) {
+				if (uid_eq(task_uid(p), uid)) {
 					niceval = 20 - task_nice(p);
 					if (niceval > retval)
 						retval = niceval;
 				}
 			} while_each_thread(g, p);
-			if (!uid_eq(uid, cred_uid))
+			if (!uid_eq(uid, cred->uid))
 				free_uid(user);		/* for find_user() */
 			break;
 	}
@@ -641,7 +633,7 @@ static int set_user(struct cred *new)
 {
 	struct user_struct *new_user;
 
-	new_user = alloc_uid(make_kuid(new->user_ns, new->uid));
+	new_user = alloc_uid(new->uid);
 	if (!new_user)
 		return -EAGAIN;
 
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 9991bac..0683dbf 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -36,8 +36,8 @@ static bool new_idmap_permitted(struct user_namespace *ns, int cap_setid,
 int create_user_ns(struct cred *new)
 {
 	struct user_namespace *ns, *parent_ns = new->user_ns;
-	kuid_t owner = make_kuid(new->user_ns, new->euid);
-	kgid_t group = make_kgid(new->user_ns, new->egid);
+	kuid_t owner = new->euid;
+	kgid_t group = new->egid;
 
 	/* The creator needs a mapping in the parent user namespace
 	 * or else we won't be able to reasonably tell userspace who
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 46bf2ed5..9f09a1f 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -410,8 +410,8 @@ static void dump_tasks(const struct mem_cgroup *memcg, const nodemask_t *nodemas
 		}
 
 		pr_info("[%5d] %5d %5d %8lu %8lu %3u     %3d         %5d %s\n",
-			task->pid, task_uid(task), task->tgid,
-			task->mm->total_vm, get_mm_rss(task->mm),
+			task->pid, from_kuid(&init_user_ns, task_uid(task)),
+			task->tgid, task->mm->total_vm, get_mm_rss(task->mm),
 			task_cpu(task), task->signal->oom_adj,
 			task->signal->oom_score_adj, task->comm);
 		task_unlock(task);
diff --git a/security/commoncap.c b/security/commoncap.c
index f2399d8..dbd465a 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -77,8 +77,7 @@ int cap_capable(const struct cred *cred, struct user_namespace *targ_ns,
 {
 	for (;;) {
 		/* The owner of the user namespace has all caps. */
-		if (targ_ns != &init_user_ns && uid_eq(targ_ns->owner,
-						       make_kuid(cred->user_ns, cred->euid)))
+		if (targ_ns != &init_user_ns && uid_eq(targ_ns->owner, cred->euid))
 			return 0;
 
 		/* Do we have the necessary capabilities? */
-- 
1.7.2.5
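
For readers following the kernel/sys.c hunks above, here is a small userspace model of the PRIO_USER lookup. It is only a sketch: the single-extent struct user_namespace, the prio_user_target() and count_tasks_owned_by() helpers, and the example id ranges are assumptions made for illustration, not the kernel's implementation. It shows why the extra make_kuid() calls can go once cred->uid is already a kuid_t: only the userspace-supplied "who" still needs converting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define INVALID_ID ((unsigned int)-1)

/* Global (kernel-internal) uid, wrapped so it cannot be mixed with plain ints. */
typedef struct { unsigned int val; } kuid_t;

/* Hypothetical one-extent mapping: ids [first, first + count) inside the
 * namespace correspond to global ids starting at lower_first. */
struct user_namespace {
	unsigned int first;
	unsigned int lower_first;
	unsigned int count;
};

struct cred {
	const struct user_namespace *user_ns;
	kuid_t uid;			/* already stored as a global kuid */
};

bool uid_eq(kuid_t a, kuid_t b)
{
	return a.val == b.val;
}

/* Map a namespace-local uid to a global kuid, or to the invalid value. */
kuid_t make_kuid(const struct user_namespace *ns, unsigned int uid)
{
	kuid_t k = { INVALID_ID };

	assert(ns != NULL);
	if (uid >= ns->first && uid - ns->first < ns->count)
		k.val = ns->lower_first + (uid - ns->first);
	return k;
}

/* who == 0 means "the caller's own uid"; anything else is converted. */
kuid_t prio_user_target(const struct cred *cred, unsigned int who)
{
	return who ? make_kuid(cred->user_ns, who) : cred->uid;
}

/* Stand-in for the do_each_thread() loop: count tasks owned by target. */
size_t count_tasks_owned_by(const kuid_t *task_uids, size_t n, kuid_t target)
{
	size_t i, hits = 0;

	for (i = 0; i < n; i++)
		if (uid_eq(task_uids[i], target))
			hits++;
	return hits;
}
```

With a namespace that maps local uids 0..65535 onto global ids starting at 100000, a request for who = 1001 only matches tasks whose kuid is 101001. A task in the initial namespace that happens to have uid 1001 is not matched.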

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 20/43] userns: Replace user_ns_map_uid and user_ns_map_gid with from_kuid and from_kgid
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` "Eric W. Beiderman
  -1 siblings, 0 replies; 227+ messages in thread
From: "Eric W. Beiderman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

These functions are no longer needed; replace them with their more useful equivalents.

Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 include/linux/user_namespace.h |   12 ------------
 ipc/mqueue.c                   |    3 +--
 kernel/signal.c                |    2 +-
 net/core/sock.c                |    4 ++--
 4 files changed, 4 insertions(+), 17 deletions(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index a2c6145..4e72922 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -69,16 +69,4 @@ static inline void put_user_ns(struct user_namespace *ns)
 
 #endif
 
-static inline uid_t user_ns_map_uid(struct user_namespace *to,
-	const struct cred *cred, kuid_t uid)
-{
-	return from_kuid_munged(to, uid);
-}
-
-static inline gid_t user_ns_map_gid(struct user_namespace *to,
-	const struct cred *cred, kgid_t gid)
-{
-	return from_kgid_munged(to, gid);
-}
-
 #endif /* _LINUX_USER_H */
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index b53cf34..b6a0d46 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -538,8 +538,7 @@ static void __do_notify(struct mqueue_inode_info *info)
 			rcu_read_lock();
 			sig_i.si_pid = task_tgid_nr_ns(current,
 						ns_of_pid(info->notify_owner));
-			sig_i.si_uid = user_ns_map_uid(info->notify_user_ns,
-						current_cred(), current_uid());
+			sig_i.si_uid = from_kuid_munged(info->notify_user_ns, current_uid());
 			rcu_read_unlock();
 
 			kill_pid_info(info->notify.sigev_signo,
diff --git a/kernel/signal.c b/kernel/signal.c
index 2734dc9..d630327 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1026,7 +1026,7 @@ static inline int legacy_queue(struct sigpending *signals, int sig)
 static inline uid_t map_cred_ns(const struct cred *cred,
 				struct user_namespace *ns)
 {
-	return user_ns_map_uid(ns, cred, cred->uid);
+	return from_kuid_munged(ns, cred->uid);
 }
 
 #ifdef CONFIG_USER_NS
diff --git a/net/core/sock.c b/net/core/sock.c
index b2e14c0..e1ec8ba 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -821,8 +821,8 @@ void cred_to_ucred(struct pid *pid, const struct cred *cred,
 	if (cred) {
 		struct user_namespace *current_ns = current_user_ns();
 
-		ucred->uid = user_ns_map_uid(current_ns, cred, cred->euid);
-		ucred->gid = user_ns_map_gid(current_ns, cred, cred->egid);
+		ucred->uid = from_kuid(current_ns, cred->euid);
+		ucred->gid = from_kgid(current_ns, cred->egid);
 	}
 }
 EXPORT_SYMBOL_GPL(cred_to_ucred);
-- 
1.7.2.5
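
The call sites in this patch end up using two different helpers: from_kuid_munged() in ipc/mqueue.c and kernel/signal.c, and from_kuid() in net/core/sock.c. The following userspace sketch models how the two differ when a kuid has no mapping in the target namespace. The single-extent struct user_namespace and the OVERFLOW_UID constant (65534, the kernel's default overflow uid) are assumptions of this model; the real mapping code is more general.

```c
#include <assert.h>
#include <stddef.h>

#define INVALID_ID   ((unsigned int)-1)
#define OVERFLOW_UID 65534u	/* assumed default overflow uid */

typedef struct { unsigned int val; } kuid_t;

/* Hypothetical one-extent mapping between namespace-local and global ids. */
struct user_namespace {
	unsigned int first;
	unsigned int lower_first;
	unsigned int count;
};

/* Namespace-local uid -> global kuid (invalid if unmapped). */
kuid_t make_kuid(const struct user_namespace *ns, unsigned int uid)
{
	kuid_t k = { INVALID_ID };

	assert(ns != NULL);
	if (uid >= ns->first && uid - ns->first < ns->count)
		k.val = ns->lower_first + (uid - ns->first);
	return k;
}

/* Global kuid -> namespace-local uid; (uid_t)-1 signals "no mapping". */
unsigned int from_kuid(const struct user_namespace *ns, kuid_t k)
{
	assert(ns != NULL);
	if (k.val >= ns->lower_first && k.val - ns->lower_first < ns->count)
		return ns->first + (k.val - ns->lower_first);
	return INVALID_ID;
}

/* Same lookup, but never fails: unmapped ids become the overflow uid. */
unsigned int from_kuid_munged(const struct user_namespace *ns, kuid_t k)
{
	unsigned int uid = from_kuid(ns, k);

	return uid == INVALID_ID ? OVERFLOW_UID : uid;
}
```

In this model a caller that must always hand some uid to userspace (for example si_uid) would pick the munged variant, while a caller that wants to detect the unmapped case would check the plain variant against (uid_t)-1.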

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 20/43] userns: Replace user_ns_map_uid and user_ns_map_gid with from_kuid and from_kgid
@ 2012-04-08  5:15     ` "Eric W. Beiderman
  0 siblings, 0 replies; 227+ messages in thread
From: "Eric W. Beiderman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, linux-security-module, Linux Containers,
	Andrew Morton, Linus Torvalds, Al Viro, Cyrill Gorcunov,
	Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

These functions are no longer needed; replace them with their more useful equivalents.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/user_namespace.h |   12 ------------
 ipc/mqueue.c                   |    3 +--
 kernel/signal.c                |    2 +-
 net/core/sock.c                |    4 ++--
 4 files changed, 4 insertions(+), 17 deletions(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index a2c6145..4e72922 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -69,16 +69,4 @@ static inline void put_user_ns(struct user_namespace *ns)
 
 #endif
 
-static inline uid_t user_ns_map_uid(struct user_namespace *to,
-	const struct cred *cred, kuid_t uid)
-{
-	return from_kuid_munged(to, uid);
-}
-
-static inline gid_t user_ns_map_gid(struct user_namespace *to,
-	const struct cred *cred, kgid_t gid)
-{
-	return from_kgid_munged(to, gid);
-}
-
 #endif /* _LINUX_USER_H */
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index b53cf34..b6a0d46 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -538,8 +538,7 @@ static void __do_notify(struct mqueue_inode_info *info)
 			rcu_read_lock();
 			sig_i.si_pid = task_tgid_nr_ns(current,
 						ns_of_pid(info->notify_owner));
-			sig_i.si_uid = user_ns_map_uid(info->notify_user_ns,
-						current_cred(), current_uid());
+			sig_i.si_uid = from_kuid_munged(info->notify_user_ns, current_uid());
 			rcu_read_unlock();
 
 			kill_pid_info(info->notify.sigev_signo,
diff --git a/kernel/signal.c b/kernel/signal.c
index 2734dc9..d630327 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1026,7 +1026,7 @@ static inline int legacy_queue(struct sigpending *signals, int sig)
 static inline uid_t map_cred_ns(const struct cred *cred,
 				struct user_namespace *ns)
 {
-	return user_ns_map_uid(ns, cred, cred->uid);
+	return from_kuid_munged(ns, cred->uid);
 }
 
 #ifdef CONFIG_USER_NS
diff --git a/net/core/sock.c b/net/core/sock.c
index b2e14c0..e1ec8ba 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -821,8 +821,8 @@ void cred_to_ucred(struct pid *pid, const struct cred *cred,
 	if (cred) {
 		struct user_namespace *current_ns = current_user_ns();
 
-		ucred->uid = user_ns_map_uid(current_ns, cred, cred->euid);
-		ucred->gid = user_ns_map_gid(current_ns, cred, cred->egid);
+		ucred->uid = from_kuid(current_ns, cred->euid);
+		ucred->gid = from_kgid(current_ns, cred->egid);
 	}
 }
 EXPORT_SYMBOL_GPL(cred_to_ucred);
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 21/43] userns: Convert sched_set_affinity and sched_set_scheduler's permission checks
       [not found] ` <m11unyn70b.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
                     ` (19 preceding siblings ...)
  2012-04-08  5:15     ` "Eric W. Beiderman
@ 2012-04-08  5:15   ` "Eric W. Beiderman
  2012-04-08  5:15     ` "Eric W. Beiderman
                     ` (24 subsequent siblings)
  45 siblings, 0 replies; 227+ messages in thread
From: "Eric W. Beiderman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

- Compare kuids with uid_eq
- kuids are unique across all user namespaces, so there is no longer
  a need for a user_namespace comparison.

Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 kernel/sched/core.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 96bff85..b189fec 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4042,11 +4042,8 @@ static bool check_same_owner(struct task_struct *p)
 
 	rcu_read_lock();
 	pcred = __task_cred(p);
-	if (cred->user_ns == pcred->user_ns)
-		match = (cred->euid == pcred->euid ||
-			 cred->euid == pcred->uid);
-	else
-		match = false;
+	match = (uid_eq(cred->euid, pcred->euid) ||
+		 uid_eq(cred->euid, pcred->uid));
 	rcu_read_unlock();
 	return match;
 }
-- 
1.7.2.5
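
To illustrate why the user_ns comparison can be dropped, here is a userspace sketch of check_same_owner() on top of a toy mapping. The single-extent struct user_namespace and the two-argument check_same_owner() signature are assumptions of this model (the kernel version reads current's credentials itself). The point is that two tasks with the same uid number in different namespaces get different kuids, so uid_eq() alone already returns false for them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define INVALID_ID ((unsigned int)-1)

typedef struct { unsigned int val; } kuid_t;

/* Hypothetical one-extent mapping between namespace-local and global ids. */
struct user_namespace {
	unsigned int first;
	unsigned int lower_first;
	unsigned int count;
};

struct cred {
	kuid_t uid;
	kuid_t euid;
};

bool uid_eq(kuid_t a, kuid_t b)
{
	return a.val == b.val;
}

/* Namespace-local uid -> global kuid (invalid if unmapped). */
kuid_t make_kuid(const struct user_namespace *ns, unsigned int uid)
{
	kuid_t k = { INVALID_ID };

	assert(ns != NULL);
	if (uid >= ns->first && uid - ns->first < ns->count)
		k.val = ns->lower_first + (uid - ns->first);
	return k;
}

/* Mirrors the new body of check_same_owner(): no namespace test needed. */
bool check_same_owner(const struct cred *cred, const struct cred *pcred)
{
	return uid_eq(cred->euid, pcred->euid) ||
	       uid_eq(cred->euid, pcred->uid);
}
```

For example, if namespace A maps its uids onto global ids from 100000 and namespace B onto global ids from 200000, uid 1000 in A becomes kuid 101000 and uid 1000 in B becomes kuid 201000, and the check fails without ever looking at the namespaces.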

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 21/43] userns: Convert sched_set_affinity and sched_set_scheduler's permission checks
  2012-04-08  5:10 ` Eric W. Biederman
                   ` (2 preceding siblings ...)
  (?)
@ 2012-04-08  5:15 ` "Eric W. Beiderman
       [not found]   ` <1333862139-31737-21-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2012-04-18 18:50   ` Serge E. Hallyn
  -1 siblings, 2 replies; 227+ messages in thread
From: "Eric W. Beiderman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, linux-security-module, Linux Containers,
	Andrew Morton, Linus Torvalds, Al Viro, Cyrill Gorcunov,
	Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

- Compare kuids with uid_eq
- kuids are unique across all user namespaces, so there is no longer
  a need for a user_namespace comparison.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 kernel/sched/core.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 96bff85..b189fec 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4042,11 +4042,8 @@ static bool check_same_owner(struct task_struct *p)
 
 	rcu_read_lock();
 	pcred = __task_cred(p);
-	if (cred->user_ns == pcred->user_ns)
-		match = (cred->euid == pcred->euid ||
-			 cred->euid == pcred->uid);
-	else
-		match = false;
+	match = (uid_eq(cred->euid, pcred->euid) ||
+		 uid_eq(cred->euid, pcred->uid));
 	rcu_read_unlock();
 	return match;
 }
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 22/43] userns: Convert capabilities related permission checks
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` "Eric W. Beiderman
  -1 siblings, 0 replies; 227+ messages in thread
From: "Eric W. Beiderman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

- Use uid_eq when comparing kuids
  Use gid_eq when comparing kgids
- Use make_kuid(user_ns, 0) to talk about the user_namespace root uid
  Use make_kgid(user_ns, 0) to talk about the user_namespace root gid

Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/open.c            |    3 ++-
 security/commoncap.c |   43 ++++++++++++++++++++++++++++---------------
 2 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 5720854..92335f6 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -316,7 +316,8 @@ SYSCALL_DEFINE3(faccessat, int, dfd, const char __user *, filename, int, mode)
 
 	if (!issecure(SECURE_NO_SETUID_FIXUP)) {
 		/* Clear the capabilities if we switch to a non-root user */
-		if (override_cred->uid)
+		kuid_t root_uid = make_kuid(override_cred->user_ns, 0);
+		if (!uid_eq(override_cred->uid, root_uid))
 			cap_clear(override_cred->cap_effective);
 		else
 			override_cred->cap_effective =
diff --git a/security/commoncap.c b/security/commoncap.c
index dbd465a..9bf8df8 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -472,19 +472,24 @@ int cap_bprm_set_creds(struct linux_binprm *bprm)
 	struct cred *new = bprm->cred;
 	bool effective, has_cap = false;
 	int ret;
+	kuid_t root_uid;
+	kgid_t root_gid;
 
 	effective = false;
 	ret = get_file_caps(bprm, &effective, &has_cap);
 	if (ret < 0)
 		return ret;
 
+	root_uid = make_kuid(new->user_ns, 0);
+	root_gid = make_kgid(new->user_ns, 0);
+
 	if (!issecure(SECURE_NOROOT)) {
 		/*
 		 * If the legacy file capability is set, then don't set privs
 		 * for a setuid root binary run by a non-root user.  Do set it
 		 * for a root user just to cause least surprise to an admin.
 		 */
-		if (has_cap && new->uid != 0 && new->euid == 0) {
+		if (has_cap && !uid_eq(new->uid, root_uid) && uid_eq(new->euid, root_uid)) {
 			warn_setuid_and_fcaps_mixed(bprm->filename);
 			goto skip;
 		}
@@ -495,12 +500,12 @@ int cap_bprm_set_creds(struct linux_binprm *bprm)
 		 *
 		 * If only the real uid is 0, we do not set the effective bit.
 		 */
-		if (new->euid == 0 || new->uid == 0) {
+		if (uid_eq(new->euid, root_uid) || uid_eq(new->uid, root_uid)) {
 			/* pP' = (cap_bset & ~0) | (pI & ~0) */
 			new->cap_permitted = cap_combine(old->cap_bset,
 							 old->cap_inheritable);
 		}
-		if (new->euid == 0)
+		if (uid_eq(new->euid, root_uid))
 			effective = true;
 	}
 skip:
@@ -508,8 +513,8 @@ skip:
 	/* Don't let someone trace a set[ug]id/setpcap binary with the revised
 	 * credentials unless they have the appropriate permit
 	 */
-	if ((new->euid != old->uid ||
-	     new->egid != old->gid ||
+	if ((!uid_eq(new->euid, old->uid) ||
+	     !gid_eq(new->egid, old->gid) ||
 	     !cap_issubset(new->cap_permitted, old->cap_permitted)) &&
 	    bprm->unsafe & ~LSM_UNSAFE_PTRACE_CAP) {
 		/* downgrade; they get no more than they had, and maybe less */
@@ -544,7 +549,7 @@ skip:
 	 */
 	if (!cap_isclear(new->cap_effective)) {
 		if (!cap_issubset(CAP_FULL_SET, new->cap_effective) ||
-		    new->euid != 0 || new->uid != 0 ||
+		    !uid_eq(new->euid, root_uid) || !uid_eq(new->uid, root_uid) ||
 		    issecure(SECURE_NOROOT)) {
 			ret = audit_log_bprm_fcaps(bprm, new, old);
 			if (ret < 0)
@@ -569,16 +574,17 @@ skip:
 int cap_bprm_secureexec(struct linux_binprm *bprm)
 {
 	const struct cred *cred = current_cred();
+	kuid_t root_uid = make_kuid(cred->user_ns, 0);
 
-	if (cred->uid != 0) {
+	if (!uid_eq(cred->uid, root_uid)) {
 		if (bprm->cap_effective)
 			return 1;
 		if (!cap_isclear(cred->cap_permitted))
 			return 1;
 	}
 
-	return (cred->euid != cred->uid ||
-		cred->egid != cred->gid);
+	return (!uid_eq(cred->euid, cred->uid) ||
+		!gid_eq(cred->egid, cred->gid));
 }
 
 /**
@@ -668,15 +674,21 @@ int cap_inode_removexattr(struct dentry *dentry, const char *name)
  */
 static inline void cap_emulate_setxuid(struct cred *new, const struct cred *old)
 {
-	if ((old->uid == 0 || old->euid == 0 || old->suid == 0) &&
-	    (new->uid != 0 && new->euid != 0 && new->suid != 0) &&
+	kuid_t root_uid = make_kuid(old->user_ns, 0);
+
+	if ((uid_eq(old->uid, root_uid) ||
+	     uid_eq(old->euid, root_uid) ||
+	     uid_eq(old->suid, root_uid)) &&
+	    (!uid_eq(new->uid, root_uid) &&
+	     !uid_eq(new->euid, root_uid) &&
+	     !uid_eq(new->suid, root_uid)) &&
 	    !issecure(SECURE_KEEP_CAPS)) {
 		cap_clear(new->cap_permitted);
 		cap_clear(new->cap_effective);
 	}
-	if (old->euid == 0 && new->euid != 0)
+	if (uid_eq(old->euid, root_uid) && !uid_eq(new->euid, root_uid))
 		cap_clear(new->cap_effective);
-	if (old->euid != 0 && new->euid == 0)
+	if (!uid_eq(old->euid, root_uid) && uid_eq(new->euid, root_uid))
 		new->cap_effective = new->cap_permitted;
 }
 
@@ -709,11 +721,12 @@ int cap_task_fix_setuid(struct cred *new, const struct cred *old, int flags)
 		 *          if not, we might be a bit too harsh here.
 		 */
 		if (!issecure(SECURE_NO_SETUID_FIXUP)) {
-			if (old->fsuid == 0 && new->fsuid != 0)
+			kuid_t root_uid = make_kuid(old->user_ns, 0);
+			if (uid_eq(old->fsuid, root_uid) && !uid_eq(new->fsuid, root_uid))
 				new->cap_effective =
 					cap_drop_fs_set(new->cap_effective);
 
-			if (old->fsuid != 0 && new->fsuid == 0)
+			if (!uid_eq(old->fsuid, root_uid) && uid_eq(new->fsuid, root_uid))
 				new->cap_effective =
 					cap_raise_fs_set(new->cap_effective,
 							 new->cap_permitted);
-- 
1.7.2.5
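
The root_uid pattern used throughout this patch can be modelled in userspace as follows. This is a sketch under stated assumptions: the single-extent struct user_namespace, the unsigned long capability bitmasks, and the keep_caps parameter (standing in for issecure(SECURE_KEEP_CAPS)) are simplifications introduced here, and the parameters are named new_cred/old_cred rather than new/old. The control flow follows the cap_emulate_setxuid() hunk above: "root" is whatever kuid the namespace maps uid 0 to, not the literal value 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define INVALID_ID ((unsigned int)-1)

typedef struct { unsigned int val; } kuid_t;

/* Hypothetical one-extent mapping between namespace-local and global ids. */
struct user_namespace {
	unsigned int first;
	unsigned int lower_first;
	unsigned int count;
};

struct cred {
	const struct user_namespace *user_ns;
	kuid_t uid, euid, suid;
	unsigned long cap_permitted, cap_effective;	/* simplified cap sets */
};

bool uid_eq(kuid_t a, kuid_t b)
{
	return a.val == b.val;
}

/* Namespace-local uid -> global kuid (invalid if unmapped). */
kuid_t make_kuid(const struct user_namespace *ns, unsigned int uid)
{
	kuid_t k = { INVALID_ID };

	assert(ns != NULL);
	if (uid >= ns->first && uid - ns->first < ns->count)
		k.val = ns->lower_first + (uid - ns->first);
	return k;
}

void cap_emulate_setxuid(struct cred *new_cred, const struct cred *old_cred,
			 bool keep_caps)
{
	/* Root of the namespace the old credentials live in. */
	kuid_t root_uid = make_kuid(old_cred->user_ns, 0);

	/* All of uid/euid/suid left root: drop permitted and effective. */
	if ((uid_eq(old_cred->uid, root_uid) ||
	     uid_eq(old_cred->euid, root_uid) ||
	     uid_eq(old_cred->suid, root_uid)) &&
	    (!uid_eq(new_cred->uid, root_uid) &&
	     !uid_eq(new_cred->euid, root_uid) &&
	     !uid_eq(new_cred->suid, root_uid)) &&
	    !keep_caps) {
		new_cred->cap_permitted = 0;
		new_cred->cap_effective = 0;
	}
	/* euid moved away from root: clear the effective set. */
	if (uid_eq(old_cred->euid, root_uid) && !uid_eq(new_cred->euid, root_uid))
		new_cred->cap_effective = 0;
	/* euid moved to root: effective becomes permitted. */
	if (!uid_eq(old_cred->euid, root_uid) && uid_eq(new_cred->euid, root_uid))
		new_cred->cap_effective = new_cred->cap_permitted;
}
```

In a namespace whose uid 0 maps to global id 100000, a task switching all of its uids from 0 to 1000 loses both capability sets in this model, exactly as a task in the initial namespace would when leaving uid 0.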

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 22/43] userns: Convert capabilities related permission checks
@ 2012-04-08  5:15     ` "Eric W. Beiderman
  0 siblings, 0 replies; 227+ messages in thread
From: "Eric W. Beiderman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, linux-security-module, Linux Containers,
	Andrew Morton, Linus Torvalds, Al Viro, Cyrill Gorcunov,
	Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

- Use uid_eq when comparing kuids
  Use gid_eq when comparing kgids
- Use make_kuid(user_ns, 0) to talk about the user_namespace root uid
  Use make_kgid(user_ns, 0) to talk about the user_namespace root gid

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 fs/open.c            |    3 ++-
 security/commoncap.c |   43 ++++++++++++++++++++++++++++---------------
 2 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 5720854..92335f6 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -316,7 +316,8 @@ SYSCALL_DEFINE3(faccessat, int, dfd, const char __user *, filename, int, mode)
 
 	if (!issecure(SECURE_NO_SETUID_FIXUP)) {
 		/* Clear the capabilities if we switch to a non-root user */
-		if (override_cred->uid)
+		kuid_t root_uid = make_kuid(override_cred->user_ns, 0);
+		if (!uid_eq(override_cred->uid, root_uid))
 			cap_clear(override_cred->cap_effective);
 		else
 			override_cred->cap_effective =
diff --git a/security/commoncap.c b/security/commoncap.c
index dbd465a..9bf8df8 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -472,19 +472,24 @@ int cap_bprm_set_creds(struct linux_binprm *bprm)
 	struct cred *new = bprm->cred;
 	bool effective, has_cap = false;
 	int ret;
+	kuid_t root_uid;
+	kgid_t root_gid;
 
 	effective = false;
 	ret = get_file_caps(bprm, &effective, &has_cap);
 	if (ret < 0)
 		return ret;
 
+	root_uid = make_kuid(new->user_ns, 0);
+	root_gid = make_kgid(new->user_ns, 0);
+
 	if (!issecure(SECURE_NOROOT)) {
 		/*
 		 * If the legacy file capability is set, then don't set privs
 		 * for a setuid root binary run by a non-root user.  Do set it
 		 * for a root user just to cause least surprise to an admin.
 		 */
-		if (has_cap && new->uid != 0 && new->euid == 0) {
+		if (has_cap && !uid_eq(new->uid, root_uid) && uid_eq(new->euid, root_uid)) {
 			warn_setuid_and_fcaps_mixed(bprm->filename);
 			goto skip;
 		}
@@ -495,12 +500,12 @@ int cap_bprm_set_creds(struct linux_binprm *bprm)
 		 *
 		 * If only the real uid is 0, we do not set the effective bit.
 		 */
-		if (new->euid == 0 || new->uid == 0) {
+		if (uid_eq(new->euid, root_uid) || uid_eq(new->uid, root_uid)) {
 			/* pP' = (cap_bset & ~0) | (pI & ~0) */
 			new->cap_permitted = cap_combine(old->cap_bset,
 							 old->cap_inheritable);
 		}
-		if (new->euid == 0)
+		if (uid_eq(new->euid, root_uid))
 			effective = true;
 	}
 skip:
@@ -508,8 +513,8 @@ skip:
 	/* Don't let someone trace a set[ug]id/setpcap binary with the revised
 	 * credentials unless they have the appropriate permit
 	 */
-	if ((new->euid != old->uid ||
-	     new->egid != old->gid ||
+	if ((!uid_eq(new->euid, old->uid) ||
+	     !gid_eq(new->egid, old->gid) ||
 	     !cap_issubset(new->cap_permitted, old->cap_permitted)) &&
 	    bprm->unsafe & ~LSM_UNSAFE_PTRACE_CAP) {
 		/* downgrade; they get no more than they had, and maybe less */
@@ -544,7 +549,7 @@ skip:
 	 */
 	if (!cap_isclear(new->cap_effective)) {
 		if (!cap_issubset(CAP_FULL_SET, new->cap_effective) ||
-		    new->euid != 0 || new->uid != 0 ||
+		    !uid_eq(new->euid, root_uid) || !uid_eq(new->uid, root_uid) ||
 		    issecure(SECURE_NOROOT)) {
 			ret = audit_log_bprm_fcaps(bprm, new, old);
 			if (ret < 0)
@@ -569,16 +574,17 @@ skip:
 int cap_bprm_secureexec(struct linux_binprm *bprm)
 {
 	const struct cred *cred = current_cred();
+	kuid_t root_uid = make_kuid(cred->user_ns, 0);
 
-	if (cred->uid != 0) {
+	if (!uid_eq(cred->uid, root_uid)) {
 		if (bprm->cap_effective)
 			return 1;
 		if (!cap_isclear(cred->cap_permitted))
 			return 1;
 	}
 
-	return (cred->euid != cred->uid ||
-		cred->egid != cred->gid);
+	return (!uid_eq(cred->euid, cred->uid) ||
+		!gid_eq(cred->egid, cred->gid));
 }
 
 /**
@@ -668,15 +674,21 @@ int cap_inode_removexattr(struct dentry *dentry, const char *name)
  */
 static inline void cap_emulate_setxuid(struct cred *new, const struct cred *old)
 {
-	if ((old->uid == 0 || old->euid == 0 || old->suid == 0) &&
-	    (new->uid != 0 && new->euid != 0 && new->suid != 0) &&
+	kuid_t root_uid = make_kuid(old->user_ns, 0);
+
+	if ((uid_eq(old->uid, root_uid) ||
+	     uid_eq(old->euid, root_uid) ||
+	     uid_eq(old->suid, root_uid)) &&
+	    (!uid_eq(new->uid, root_uid) &&
+	     !uid_eq(new->euid, root_uid) &&
+	     !uid_eq(new->suid, root_uid)) &&
 	    !issecure(SECURE_KEEP_CAPS)) {
 		cap_clear(new->cap_permitted);
 		cap_clear(new->cap_effective);
 	}
-	if (old->euid == 0 && new->euid != 0)
+	if (uid_eq(old->euid, root_uid) && !uid_eq(new->euid, root_uid))
 		cap_clear(new->cap_effective);
-	if (old->euid != 0 && new->euid == 0)
+	if (!uid_eq(old->euid, root_uid) && uid_eq(new->euid, root_uid))
 		new->cap_effective = new->cap_permitted;
 }
 
@@ -709,11 +721,12 @@ int cap_task_fix_setuid(struct cred *new, const struct cred *old, int flags)
 		 *          if not, we might be a bit too harsh here.
 		 */
 		if (!issecure(SECURE_NO_SETUID_FIXUP)) {
-			if (old->fsuid == 0 && new->fsuid != 0)
+			kuid_t root_uid = make_kuid(old->user_ns, 0);
+			if (uid_eq(old->fsuid, root_uid) && !uid_eq(new->fsuid, root_uid))
 				new->cap_effective =
 					cap_drop_fs_set(new->cap_effective);
 
-			if (old->fsuid != 0 && new->fsuid == 0)
+			if (!uid_eq(old->fsuid, root_uid) && uid_eq(new->fsuid, root_uid))
 				new->cap_effective =
 					cap_raise_fs_set(new->cap_effective,
 							 new->cap_permitted);
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 22/43] userns: Convert capabilities related permission checks
@ 2012-04-08  5:15     ` "Eric W. Beiderman
  0 siblings, 0 replies; 227+ messages in thread
From: "Eric W. Beiderman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

- Use uid_eq when comparing kuids
  Use gid_eq when comparing kgids
- Use make_kuid(user_ns, 0) to talk about the user_namespace root uid
  Use make_kgid(user_ns, 0) to talk about the user_namespace root gid

Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/open.c            |    3 ++-
 security/commoncap.c |   43 ++++++++++++++++++++++++++++---------------
 2 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 5720854..92335f6 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -316,7 +316,8 @@ SYSCALL_DEFINE3(faccessat, int, dfd, const char __user *, filename, int, mode)
 
 	if (!issecure(SECURE_NO_SETUID_FIXUP)) {
 		/* Clear the capabilities if we switch to a non-root user */
-		if (override_cred->uid)
+		kuid_t root_uid = make_kuid(override_cred->user_ns, 0);
+		if (!uid_eq(override_cred->uid, root_uid))
 			cap_clear(override_cred->cap_effective);
 		else
 			override_cred->cap_effective =
diff --git a/security/commoncap.c b/security/commoncap.c
index dbd465a..9bf8df8 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -472,19 +472,24 @@ int cap_bprm_set_creds(struct linux_binprm *bprm)
 	struct cred *new = bprm->cred;
 	bool effective, has_cap = false;
 	int ret;
+	kuid_t root_uid;
+	kgid_t root_gid;
 
 	effective = false;
 	ret = get_file_caps(bprm, &effective, &has_cap);
 	if (ret < 0)
 		return ret;
 
+	root_uid = make_kuid(new->user_ns, 0);
+	root_gid = make_kgid(new->user_ns, 0);
+
 	if (!issecure(SECURE_NOROOT)) {
 		/*
 		 * If the legacy file capability is set, then don't set privs
 		 * for a setuid root binary run by a non-root user.  Do set it
 		 * for a root user just to cause least surprise to an admin.
 		 */
-		if (has_cap && new->uid != 0 && new->euid == 0) {
+		if (has_cap && !uid_eq(new->uid, root_uid) && uid_eq(new->euid, root_uid)) {
 			warn_setuid_and_fcaps_mixed(bprm->filename);
 			goto skip;
 		}
@@ -495,12 +500,12 @@ int cap_bprm_set_creds(struct linux_binprm *bprm)
 		 *
 		 * If only the real uid is 0, we do not set the effective bit.
 		 */
-		if (new->euid == 0 || new->uid == 0) {
+		if (uid_eq(new->euid, root_uid) || uid_eq(new->uid, root_uid)) {
 			/* pP' = (cap_bset & ~0) | (pI & ~0) */
 			new->cap_permitted = cap_combine(old->cap_bset,
 							 old->cap_inheritable);
 		}
-		if (new->euid == 0)
+		if (uid_eq(new->euid, root_uid))
 			effective = true;
 	}
 skip:
@@ -508,8 +513,8 @@ skip:
 	/* Don't let someone trace a set[ug]id/setpcap binary with the revised
 	 * credentials unless they have the appropriate permit
 	 */
-	if ((new->euid != old->uid ||
-	     new->egid != old->gid ||
+	if ((!uid_eq(new->euid, old->uid) ||
+	     !gid_eq(new->egid, old->gid) ||
 	     !cap_issubset(new->cap_permitted, old->cap_permitted)) &&
 	    bprm->unsafe & ~LSM_UNSAFE_PTRACE_CAP) {
 		/* downgrade; they get no more than they had, and maybe less */
@@ -544,7 +549,7 @@ skip:
 	 */
 	if (!cap_isclear(new->cap_effective)) {
 		if (!cap_issubset(CAP_FULL_SET, new->cap_effective) ||
-		    new->euid != 0 || new->uid != 0 ||
+		    !uid_eq(new->euid, root_uid) || !uid_eq(new->uid, root_uid) ||
 		    issecure(SECURE_NOROOT)) {
 			ret = audit_log_bprm_fcaps(bprm, new, old);
 			if (ret < 0)
@@ -569,16 +574,17 @@ skip:
 int cap_bprm_secureexec(struct linux_binprm *bprm)
 {
 	const struct cred *cred = current_cred();
+	kuid_t root_uid = make_kuid(cred->user_ns, 0);
 
-	if (cred->uid != 0) {
+	if (!uid_eq(cred->uid, root_uid)) {
 		if (bprm->cap_effective)
 			return 1;
 		if (!cap_isclear(cred->cap_permitted))
 			return 1;
 	}
 
-	return (cred->euid != cred->uid ||
-		cred->egid != cred->gid);
+	return (!uid_eq(cred->euid, cred->uid) ||
+		!gid_eq(cred->egid, cred->gid));
 }
 
 /**
@@ -668,15 +674,21 @@ int cap_inode_removexattr(struct dentry *dentry, const char *name)
  */
 static inline void cap_emulate_setxuid(struct cred *new, const struct cred *old)
 {
-	if ((old->uid == 0 || old->euid == 0 || old->suid == 0) &&
-	    (new->uid != 0 && new->euid != 0 && new->suid != 0) &&
+	kuid_t root_uid = make_kuid(old->user_ns, 0);
+
+	if ((uid_eq(old->uid, root_uid) ||
+	     uid_eq(old->euid, root_uid) ||
+	     uid_eq(old->suid, root_uid)) &&
+	    (!uid_eq(new->uid, root_uid) &&
+	     !uid_eq(new->euid, root_uid) &&
+	     !uid_eq(new->suid, root_uid)) &&
 	    !issecure(SECURE_KEEP_CAPS)) {
 		cap_clear(new->cap_permitted);
 		cap_clear(new->cap_effective);
 	}
-	if (old->euid == 0 && new->euid != 0)
+	if (uid_eq(old->euid, root_uid) && !uid_eq(new->euid, root_uid))
 		cap_clear(new->cap_effective);
-	if (old->euid != 0 && new->euid == 0)
+	if (!uid_eq(old->euid, root_uid) && uid_eq(new->euid, root_uid))
 		new->cap_effective = new->cap_permitted;
 }
 
@@ -709,11 +721,12 @@ int cap_task_fix_setuid(struct cred *new, const struct cred *old, int flags)
 		 *          if not, we might be a bit too harsh here.
 		 */
 		if (!issecure(SECURE_NO_SETUID_FIXUP)) {
-			if (old->fsuid == 0 && new->fsuid != 0)
+			kuid_t root_uid = make_kuid(old->user_ns, 0);
+			if (uid_eq(old->fsuid, root_uid) && !uid_eq(new->fsuid, root_uid))
 				new->cap_effective =
 					cap_drop_fs_set(new->cap_effective);
 
-			if (old->fsuid != 0 && new->fsuid == 0)
+			if (!uid_eq(old->fsuid, root_uid) && uid_eq(new->fsuid, root_uid))
 				new->cap_effective =
 					cap_raise_fs_set(new->cap_effective,
 							 new->cap_permitted);
-- 
1.7.2.5
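
The commoncap.c conversion above replaces literal comparisons against 0 with comparisons against the kuid that uid 0 of the credential's user namespace maps to. A minimal userspace sketch of that idea follows. It assumes a hypothetical single-extent mapping: `struct ns_map`, `make_kuid_sketch()` and `is_ns_root()` are illustrative stand-ins, not the kernel's `struct user_namespace` or `make_kuid()` implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for the kernel type; not the kernel definition. */
typedef struct { uint32_t val; } kuid_t;
#define INVALID_KUID ((kuid_t){ UINT32_MAX })

/*
 * Hypothetical one-extent mapping: ids [first, first + count) inside the
 * namespace correspond to [lower_first, lower_first + count) globally.
 */
struct ns_map {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

static bool uid_eq(kuid_t a, kuid_t b)
{
	return a.val == b.val;
}

/* Map a namespace-local uid to a global kuid, or INVALID_KUID if unmapped. */
static kuid_t make_kuid_sketch(const struct ns_map *ns, uint32_t uid)
{
	assert(ns);
	if (uid < ns->first || uid - ns->first >= ns->count)
		return INVALID_KUID;
	return (kuid_t){ ns->lower_first + (uid - ns->first) };
}

/* "Is this kuid root as seen from ns?" replaces the old "uid == 0" test. */
static bool is_ns_root(const struct ns_map *ns, kuid_t kuid)
{
	kuid_t root_uid = make_kuid_sketch(ns, 0);

	return uid_eq(kuid, root_uid);
}
```

With a child mapping of { 0, 100000, 65536 }, global kuid 100000 is root inside the child namespace while global kuid 0 is not.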


* [PATCH 23/43] userns: Convert setting and getting uid and gid system calls to use kuid and kgid
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Convert the following system calls to use kuid_t and kgid_t:
setregid, setgid, setreuid, setuid,
setresuid, getresuid, setresgid, getresgid, setfsuid, setfsgid,
getuid, geteuid, getgid, getegid,
waitpid, waitid, and wait4.

Convert userspace uids and gids into kuids and kgids before
placing them on struct cred.  Convert struct cred kuids and
kgids back into userspace uids and gids when returning them
to userspace.

Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 kernel/exit.c  |    6 +-
 kernel/sys.c   |  216 ++++++++++++++++++++++++++++++++++++++-----------------
 kernel/timer.c |    8 +-
 kernel/uid16.c |   34 ++++++---
 4 files changed, 178 insertions(+), 86 deletions(-)
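
The two directions described in the changelog can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel code: `struct ns_map` and the `_sketch` helpers are hypothetical, the mapping is a single extent, and 65534 is used as the overflow id because that is the kernel's default overflowuid.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for the kernel-internal, global id type. */
typedef struct { uint32_t val; } kuid_t;

#define OVERFLOW_UID 65534u	/* default overflowuid; assumed unchanged */

/* Hypothetical one-extent mapping between namespace-local and global ids. */
struct ns_map {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

/* userspace -> kernel: a val of (uint32_t)-1 marks "no mapping". */
static kuid_t make_kuid_sketch(const struct ns_map *ns, uint32_t uid)
{
	kuid_t k = { UINT32_MAX };

	assert(ns);
	if (uid >= ns->first && uid - ns->first < ns->count)
		k.val = ns->lower_first + (uid - ns->first);
	return k;
}

/* kernel -> userspace: ids with no mapping are reported as the overflow id. */
static uint32_t from_kuid_munged_sketch(const struct ns_map *ns, kuid_t k)
{
	assert(ns);
	if (k.val >= ns->lower_first && k.val - ns->lower_first < ns->count)
		return ns->first + (k.val - ns->lower_first);
	return OVERFLOW_UID;
}
```

The "munged" direction never fails, which is why the get*id and wait* paths in this patch can return its result directly.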

diff --git a/kernel/exit.c b/kernel/exit.c
index d8bd3b42..789e3c5 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1214,7 +1214,7 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
 	unsigned long state;
 	int retval, status, traced;
 	pid_t pid = task_pid_vnr(p);
-	uid_t uid = __task_cred(p)->uid;
+	uid_t uid = from_kuid_munged(current_user_ns(), __task_cred(p)->uid);
 	struct siginfo __user *infop;
 
 	if (!likely(wo->wo_flags & WEXITED))
@@ -1427,7 +1427,7 @@ static int wait_task_stopped(struct wait_opts *wo,
 	if (!unlikely(wo->wo_flags & WNOWAIT))
 		*p_code = 0;
 
-	uid = task_uid(p);
+	uid = from_kuid_munged(current_user_ns(), __task_cred(p)->uid);
 unlock_sig:
 	spin_unlock_irq(&p->sighand->siglock);
 	if (!exit_code)
@@ -1500,7 +1500,7 @@ static int wait_task_continued(struct wait_opts *wo, struct task_struct *p)
 	}
 	if (!unlikely(wo->wo_flags & WNOWAIT))
 		p->signal->flags &= ~SIGNAL_STOP_CONTINUED;
-	uid = task_uid(p);
+	uid = from_kuid_munged(current_user_ns(), __task_cred(p)->uid);
 	spin_unlock_irq(&p->sighand->siglock);
 
 	pid = task_pid_vnr(p);
diff --git a/kernel/sys.c b/kernel/sys.c
index 3996281..aff09f2 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -555,9 +555,19 @@ void ctrl_alt_del(void)
  */
 SYSCALL_DEFINE2(setregid, gid_t, rgid, gid_t, egid)
 {
+	struct user_namespace *ns = current_user_ns();
 	const struct cred *old;
 	struct cred *new;
 	int retval;
+	kgid_t krgid, kegid;
+
+	krgid = make_kgid(ns, rgid);
+	kegid = make_kgid(ns, egid);
+
+	if ((rgid != (gid_t) -1) && !gid_valid(krgid))
+		return -EINVAL;
+	if ((egid != (gid_t) -1) && !gid_valid(kegid))
+		return -EINVAL;
 
 	new = prepare_creds();
 	if (!new)
@@ -566,25 +576,25 @@ SYSCALL_DEFINE2(setregid, gid_t, rgid, gid_t, egid)
 
 	retval = -EPERM;
 	if (rgid != (gid_t) -1) {
-		if (old->gid == rgid ||
-		    old->egid == rgid ||
+		if (gid_eq(old->gid, krgid) ||
+		    gid_eq(old->egid, krgid) ||
 		    nsown_capable(CAP_SETGID))
-			new->gid = rgid;
+			new->gid = krgid;
 		else
 			goto error;
 	}
 	if (egid != (gid_t) -1) {
-		if (old->gid == egid ||
-		    old->egid == egid ||
-		    old->sgid == egid ||
+		if (gid_eq(old->gid, kegid) ||
+		    gid_eq(old->egid, kegid) ||
+		    gid_eq(old->sgid, kegid) ||
 		    nsown_capable(CAP_SETGID))
-			new->egid = egid;
+			new->egid = kegid;
 		else
 			goto error;
 	}
 
 	if (rgid != (gid_t) -1 ||
-	    (egid != (gid_t) -1 && egid != old->gid))
+	    (egid != (gid_t) -1 && !gid_eq(kegid, old->gid)))
 		new->sgid = new->egid;
 	new->fsgid = new->egid;
 
@@ -602,9 +612,15 @@ error:
  */
 SYSCALL_DEFINE1(setgid, gid_t, gid)
 {
+	struct user_namespace *ns = current_user_ns();
 	const struct cred *old;
 	struct cred *new;
 	int retval;
+	kgid_t kgid;
+
+	kgid = make_kgid(ns, gid);
+	if (!gid_valid(kgid))
+		return -EINVAL;
 
 	new = prepare_creds();
 	if (!new)
@@ -613,9 +629,9 @@ SYSCALL_DEFINE1(setgid, gid_t, gid)
 
 	retval = -EPERM;
 	if (nsown_capable(CAP_SETGID))
-		new->gid = new->egid = new->sgid = new->fsgid = gid;
-	else if (gid == old->gid || gid == old->sgid)
-		new->egid = new->fsgid = gid;
+		new->gid = new->egid = new->sgid = new->fsgid = kgid;
+	else if (gid_eq(kgid, old->gid) || gid_eq(kgid, old->sgid))
+		new->egid = new->fsgid = kgid;
 	else
 		goto error;
 
@@ -672,9 +688,19 @@ static int set_user(struct cred *new)
  */
 SYSCALL_DEFINE2(setreuid, uid_t, ruid, uid_t, euid)
 {
+	struct user_namespace *ns = current_user_ns();
 	const struct cred *old;
 	struct cred *new;
 	int retval;
+	kuid_t kruid, keuid;
+
+	kruid = make_kuid(ns, ruid);
+	keuid = make_kuid(ns, euid);
+
+	if ((ruid != (uid_t) -1) && !uid_valid(kruid))
+		return -EINVAL;
+	if ((euid != (uid_t) -1) && !uid_valid(keuid))
+		return -EINVAL;
 
 	new = prepare_creds();
 	if (!new)
@@ -683,29 +709,29 @@ SYSCALL_DEFINE2(setreuid, uid_t, ruid, uid_t, euid)
 
 	retval = -EPERM;
 	if (ruid != (uid_t) -1) {
-		new->uid = ruid;
-		if (old->uid != ruid &&
-		    old->euid != ruid &&
+		new->uid = kruid;
+		if (!uid_eq(old->uid, kruid) &&
+		    !uid_eq(old->euid, kruid) &&
 		    !nsown_capable(CAP_SETUID))
 			goto error;
 	}
 
 	if (euid != (uid_t) -1) {
-		new->euid = euid;
-		if (old->uid != euid &&
-		    old->euid != euid &&
-		    old->suid != euid &&
+		new->euid = keuid;
+		if (!uid_eq(old->uid, keuid) &&
+		    !uid_eq(old->euid, keuid) &&
+		    !uid_eq(old->suid, keuid) &&
 		    !nsown_capable(CAP_SETUID))
 			goto error;
 	}
 
-	if (new->uid != old->uid) {
+	if (!uid_eq(new->uid, old->uid)) {
 		retval = set_user(new);
 		if (retval < 0)
 			goto error;
 	}
 	if (ruid != (uid_t) -1 ||
-	    (euid != (uid_t) -1 && euid != old->uid))
+	    (euid != (uid_t) -1 && !uid_eq(keuid, old->uid)))
 		new->suid = new->euid;
 	new->fsuid = new->euid;
 
@@ -733,9 +759,15 @@ error:
  */
 SYSCALL_DEFINE1(setuid, uid_t, uid)
 {
+	struct user_namespace *ns = current_user_ns();
 	const struct cred *old;
 	struct cred *new;
 	int retval;
+	kuid_t kuid;
+
+	kuid = make_kuid(ns, uid);
+	if (!uid_valid(kuid))
+		return -EINVAL;
 
 	new = prepare_creds();
 	if (!new)
@@ -744,17 +776,17 @@ SYSCALL_DEFINE1(setuid, uid_t, uid)
 
 	retval = -EPERM;
 	if (nsown_capable(CAP_SETUID)) {
-		new->suid = new->uid = uid;
-		if (uid != old->uid) {
+		new->suid = new->uid = kuid;
+		if (!uid_eq(kuid, old->uid)) {
 			retval = set_user(new);
 			if (retval < 0)
 				goto error;
 		}
-	} else if (uid != old->uid && uid != new->suid) {
+	} else if (!uid_eq(kuid, old->uid) && !uid_eq(kuid, new->suid)) {
 		goto error;
 	}
 
-	new->fsuid = new->euid = uid;
+	new->fsuid = new->euid = kuid;
 
 	retval = security_task_fix_setuid(new, old, LSM_SETID_ID);
 	if (retval < 0)
@@ -774,9 +806,24 @@ error:
  */
 SYSCALL_DEFINE3(setresuid, uid_t, ruid, uid_t, euid, uid_t, suid)
 {
+	struct user_namespace *ns = current_user_ns();
 	const struct cred *old;
 	struct cred *new;
 	int retval;
+	kuid_t kruid, keuid, ksuid;
+
+	kruid = make_kuid(ns, ruid);
+	keuid = make_kuid(ns, euid);
+	ksuid = make_kuid(ns, suid);
+
+	if ((ruid != (uid_t) -1) && !uid_valid(kruid))
+		return -EINVAL;
+
+	if ((euid != (uid_t) -1) && !uid_valid(keuid))
+		return -EINVAL;
+
+	if ((suid != (uid_t) -1) && !uid_valid(ksuid))
+		return -EINVAL;
 
 	new = prepare_creds();
 	if (!new)
@@ -786,29 +833,29 @@ SYSCALL_DEFINE3(setresuid, uid_t, ruid, uid_t, euid, uid_t, suid)
 
 	retval = -EPERM;
 	if (!nsown_capable(CAP_SETUID)) {
-		if (ruid != (uid_t) -1 && ruid != old->uid &&
-		    ruid != old->euid  && ruid != old->suid)
+		if (ruid != (uid_t) -1        && !uid_eq(kruid, old->uid) &&
+		    !uid_eq(kruid, old->euid) && !uid_eq(kruid, old->suid))
 			goto error;
-		if (euid != (uid_t) -1 && euid != old->uid &&
-		    euid != old->euid  && euid != old->suid)
+		if (euid != (uid_t) -1        && !uid_eq(keuid, old->uid) &&
+		    !uid_eq(keuid, old->euid) && !uid_eq(keuid, old->suid))
 			goto error;
-		if (suid != (uid_t) -1 && suid != old->uid &&
-		    suid != old->euid  && suid != old->suid)
+		if (suid != (uid_t) -1        && !uid_eq(ksuid, old->uid) &&
+		    !uid_eq(ksuid, old->euid) && !uid_eq(ksuid, old->suid))
 			goto error;
 	}
 
 	if (ruid != (uid_t) -1) {
-		new->uid = ruid;
-		if (ruid != old->uid) {
+		new->uid = kruid;
+		if (!uid_eq(kruid, old->uid)) {
 			retval = set_user(new);
 			if (retval < 0)
 				goto error;
 		}
 	}
 	if (euid != (uid_t) -1)
-		new->euid = euid;
+		new->euid = keuid;
 	if (suid != (uid_t) -1)
-		new->suid = suid;
+		new->suid = ksuid;
 	new->fsuid = new->euid;
 
 	retval = security_task_fix_setuid(new, old, LSM_SETID_RES);
@@ -822,14 +869,19 @@ error:
 	return retval;
 }
 
-SYSCALL_DEFINE3(getresuid, uid_t __user *, ruid, uid_t __user *, euid, uid_t __user *, suid)
+SYSCALL_DEFINE3(getresuid, uid_t __user *, ruidp, uid_t __user *, euidp, uid_t __user *, suidp)
 {
 	const struct cred *cred = current_cred();
 	int retval;
+	uid_t ruid, euid, suid;
+
+	ruid = from_kuid_munged(cred->user_ns, cred->uid);
+	euid = from_kuid_munged(cred->user_ns, cred->euid);
+	suid = from_kuid_munged(cred->user_ns, cred->suid);
 
-	if (!(retval   = put_user(cred->uid,  ruid)) &&
-	    !(retval   = put_user(cred->euid, euid)))
-		retval = put_user(cred->suid, suid);
+	if (!(retval   = put_user(ruid, ruidp)) &&
+	    !(retval   = put_user(euid, euidp)))
+		retval = put_user(suid, suidp);
 
 	return retval;
 }
@@ -839,9 +891,22 @@ SYSCALL_DEFINE3(getresuid, uid_t __user *, ruid, uid_t __user *, euid, uid_t __u
  */
 SYSCALL_DEFINE3(setresgid, gid_t, rgid, gid_t, egid, gid_t, sgid)
 {
+	struct user_namespace *ns = current_user_ns();
 	const struct cred *old;
 	struct cred *new;
 	int retval;
+	kgid_t krgid, kegid, ksgid;
+
+	krgid = make_kgid(ns, rgid);
+	kegid = make_kgid(ns, egid);
+	ksgid = make_kgid(ns, sgid);
+
+	if ((rgid != (gid_t) -1) && !gid_valid(krgid))
+		return -EINVAL;
+	if ((egid != (gid_t) -1) && !gid_valid(kegid))
+		return -EINVAL;
+	if ((sgid != (gid_t) -1) && !gid_valid(ksgid))
+		return -EINVAL;
 
 	new = prepare_creds();
 	if (!new)
@@ -850,23 +915,23 @@ SYSCALL_DEFINE3(setresgid, gid_t, rgid, gid_t, egid, gid_t, sgid)
 
 	retval = -EPERM;
 	if (!nsown_capable(CAP_SETGID)) {
-		if (rgid != (gid_t) -1 && rgid != old->gid &&
-		    rgid != old->egid  && rgid != old->sgid)
+		if (rgid != (gid_t) -1        && !gid_eq(krgid, old->gid) &&
+		    !gid_eq(krgid, old->egid) && !gid_eq(krgid, old->sgid))
 			goto error;
-		if (egid != (gid_t) -1 && egid != old->gid &&
-		    egid != old->egid  && egid != old->sgid)
+		if (egid != (gid_t) -1        && !gid_eq(kegid, old->gid) &&
+		    !gid_eq(kegid, old->egid) && !gid_eq(kegid, old->sgid))
 			goto error;
-		if (sgid != (gid_t) -1 && sgid != old->gid &&
-		    sgid != old->egid  && sgid != old->sgid)
+		if (sgid != (gid_t) -1        && !gid_eq(ksgid, old->gid) &&
+		    !gid_eq(ksgid, old->egid) && !gid_eq(ksgid, old->sgid))
 			goto error;
 	}
 
 	if (rgid != (gid_t) -1)
-		new->gid = rgid;
+		new->gid = krgid;
 	if (egid != (gid_t) -1)
-		new->egid = egid;
+		new->egid = kegid;
 	if (sgid != (gid_t) -1)
-		new->sgid = sgid;
+		new->sgid = ksgid;
 	new->fsgid = new->egid;
 
 	return commit_creds(new);
@@ -876,14 +941,19 @@ error:
 	return retval;
 }
 
-SYSCALL_DEFINE3(getresgid, gid_t __user *, rgid, gid_t __user *, egid, gid_t __user *, sgid)
+SYSCALL_DEFINE3(getresgid, gid_t __user *, rgidp, gid_t __user *, egidp, gid_t __user *, sgidp)
 {
 	const struct cred *cred = current_cred();
 	int retval;
+	gid_t rgid, egid, sgid;
+
+	rgid = from_kgid_munged(cred->user_ns, cred->gid);
+	egid = from_kgid_munged(cred->user_ns, cred->egid);
+	sgid = from_kgid_munged(cred->user_ns, cred->sgid);
 
-	if (!(retval   = put_user(cred->gid,  rgid)) &&
-	    !(retval   = put_user(cred->egid, egid)))
-		retval = put_user(cred->sgid, sgid);
+	if (!(retval   = put_user(rgid, rgidp)) &&
+	    !(retval   = put_user(egid, egidp)))
+		retval = put_user(sgid, sgidp);
 
 	return retval;
 }
@@ -900,18 +970,24 @@ SYSCALL_DEFINE1(setfsuid, uid_t, uid)
 	const struct cred *old;
 	struct cred *new;
 	uid_t old_fsuid;
+	kuid_t kuid;
+
+	old = current_cred();
+	old_fsuid = from_kuid_munged(old->user_ns, old->fsuid);
+
+	kuid = make_kuid(old->user_ns, uid);
+	if (!uid_valid(kuid))
+		return old_fsuid;
 
 	new = prepare_creds();
 	if (!new)
-		return current_fsuid();
-	old = current_cred();
-	old_fsuid = old->fsuid;
+		return old_fsuid;
 
-	if (uid == old->uid  || uid == old->euid  ||
-	    uid == old->suid || uid == old->fsuid ||
+	if (uid_eq(kuid, old->uid)  || uid_eq(kuid, old->euid)  ||
+	    uid_eq(kuid, old->suid) || uid_eq(kuid, old->fsuid) ||
 	    nsown_capable(CAP_SETUID)) {
-		if (uid != old_fsuid) {
-			new->fsuid = uid;
+		if (!uid_eq(kuid, old->fsuid)) {
+			new->fsuid = kuid;
 			if (security_task_fix_setuid(new, old, LSM_SETID_FS) == 0)
 				goto change_okay;
 		}
@@ -933,18 +1009,24 @@ SYSCALL_DEFINE1(setfsgid, gid_t, gid)
 	const struct cred *old;
 	struct cred *new;
 	gid_t old_fsgid;
+	kgid_t kgid;
+
+	old = current_cred();
+	old_fsgid = from_kgid_munged(old->user_ns, old->fsgid);
+
+	kgid = make_kgid(old->user_ns, gid);
+	if (!gid_valid(kgid))
+		return old_fsgid;
 
 	new = prepare_creds();
 	if (!new)
-		return current_fsgid();
-	old = current_cred();
-	old_fsgid = old->fsgid;
+		return old_fsgid;
 
-	if (gid == old->gid  || gid == old->egid  ||
-	    gid == old->sgid || gid == old->fsgid ||
+	if (gid_eq(kgid, old->gid)  || gid_eq(kgid, old->egid)  ||
+	    gid_eq(kgid, old->sgid) || gid_eq(kgid, old->fsgid) ||
 	    nsown_capable(CAP_SETGID)) {
-		if (gid != old_fsgid) {
-			new->fsgid = gid;
+		if (!gid_eq(kgid, old->fsgid)) {
+			new->fsgid = kgid;
 			goto change_okay;
 		}
 	}
@@ -1503,10 +1585,10 @@ static int check_prlimit_permission(struct task_struct *task)
 	if (cred->user_ns == tcred->user_ns &&
 	    (cred->uid == tcred->euid &&
 	     cred->uid == tcred->suid &&
-	     cred->uid == tcred->uid  &&
+	     cred->uid == tcred->uid &&
 	     cred->gid == tcred->egid &&
 	     cred->gid == tcred->sgid &&
-	     cred->gid == tcred->gid))
+		    cred->gid == tcred->gid))
 		return 0;
 	if (ns_capable(tcred->user_ns, CAP_SYS_RESOURCE))
 		return 0;
diff --git a/kernel/timer.c b/kernel/timer.c
index a297ffc..67316cb 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1427,25 +1427,25 @@ SYSCALL_DEFINE0(getppid)
 SYSCALL_DEFINE0(getuid)
 {
 	/* Only we change this so SMP safe */
-	return current_uid();
+	return from_kuid_munged(current_user_ns(), current_uid());
 }
 
 SYSCALL_DEFINE0(geteuid)
 {
 	/* Only we change this so SMP safe */
-	return current_euid();
+	return from_kuid_munged(current_user_ns(), current_euid());
 }
 
 SYSCALL_DEFINE0(getgid)
 {
 	/* Only we change this so SMP safe */
-	return current_gid();
+	return from_kgid_munged(current_user_ns(), current_gid());
 }
 
 SYSCALL_DEFINE0(getegid)
 {
 	/* Only we change this so SMP safe */
-	return  current_egid();
+	return from_kgid_munged(current_user_ns(), current_egid());
 }
 
 #endif
diff --git a/kernel/uid16.c b/kernel/uid16.c
index e530bc3..d7948eb 100644
--- a/kernel/uid16.c
+++ b/kernel/uid16.c
@@ -81,14 +81,19 @@ SYSCALL_DEFINE3(setresuid16, old_uid_t, ruid, old_uid_t, euid, old_uid_t, suid)
 	return ret;
 }
 
-SYSCALL_DEFINE3(getresuid16, old_uid_t __user *, ruid, old_uid_t __user *, euid, old_uid_t __user *, suid)
+SYSCALL_DEFINE3(getresuid16, old_uid_t __user *, ruidp, old_uid_t __user *, euidp, old_uid_t __user *, suidp)
 {
 	const struct cred *cred = current_cred();
 	int retval;
+	old_uid_t ruid, euid, suid;
 
-	if (!(retval   = put_user(high2lowuid(cred->uid),  ruid)) &&
-	    !(retval   = put_user(high2lowuid(cred->euid), euid)))
-		retval = put_user(high2lowuid(cred->suid), suid);
+	ruid = high2lowuid(from_kuid_munged(cred->user_ns, cred->uid));
+	euid = high2lowuid(from_kuid_munged(cred->user_ns, cred->euid));
+	suid = high2lowuid(from_kuid_munged(cred->user_ns, cred->suid));
+
+	if (!(retval   = put_user(ruid, ruidp)) &&
+	    !(retval   = put_user(euid, euidp)))
+		retval = put_user(suid, suidp);
 
 	return retval;
 }
@@ -103,14 +108,19 @@ SYSCALL_DEFINE3(setresgid16, old_gid_t, rgid, old_gid_t, egid, old_gid_t, sgid)
 }
 
 
-SYSCALL_DEFINE3(getresgid16, old_gid_t __user *, rgid, old_gid_t __user *, egid, old_gid_t __user *, sgid)
+SYSCALL_DEFINE3(getresgid16, old_gid_t __user *, rgidp, old_gid_t __user *, egidp, old_gid_t __user *, sgidp)
 {
 	const struct cred *cred = current_cred();
 	int retval;
+	old_gid_t rgid, egid, sgid;
+
+	rgid = high2lowgid(from_kgid_munged(cred->user_ns, cred->gid));
+	egid = high2lowgid(from_kgid_munged(cred->user_ns, cred->egid));
+	sgid = high2lowgid(from_kgid_munged(cred->user_ns, cred->sgid));
 
-	if (!(retval   = put_user(high2lowgid(cred->gid),  rgid)) &&
-	    !(retval   = put_user(high2lowgid(cred->egid), egid)))
-		retval = put_user(high2lowgid(cred->sgid), sgid);
+	if (!(retval   = put_user(rgid, rgidp)) &&
+	    !(retval   = put_user(egid, egidp)))
+		retval = put_user(sgid, sgidp);
 
 	return retval;
 }
@@ -221,20 +231,20 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
 
 SYSCALL_DEFINE0(getuid16)
 {
-	return high2lowuid(current_uid());
+	return high2lowuid(from_kuid_munged(current_user_ns(), current_uid()));
 }
 
 SYSCALL_DEFINE0(geteuid16)
 {
-	return high2lowuid(current_euid());
+	return high2lowuid(from_kuid_munged(current_user_ns(), current_euid()));
 }
 
 SYSCALL_DEFINE0(getgid16)
 {
-	return high2lowgid(current_gid());
+	return high2lowgid(from_kgid_munged(current_user_ns(), current_gid()));
 }
 
 SYSCALL_DEFINE0(getegid16)
 {
-	return high2lowgid(current_egid());
+	return high2lowgid(from_kgid_munged(current_user_ns(), current_egid()));
 }
-- 
1.7.2.5


* [PATCH 23/43] userns: Convert setting and getting uid and gid system calls to use kuid and kgid
@ 2012-04-08  5:15     ` Eric W. Biederman
  0 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, linux-security-module, Linux Containers,
	Andrew Morton, Linus Torvalds, Al Viro, Cyrill Gorcunov,
	Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

Convert the following system calls to use kuid_t and kgid_t:
setregid, setgid, setreuid, setuid,
setresuid, getresuid, setresgid, getresgid, setfsuid, setfsgid,
getuid, geteuid, getgid, getegid,
waitpid, waitid, and wait4.

Convert userspace uids and gids into kuids and kgids before
placing them on struct cred.  Convert struct cred kuids and
kgids back into userspace uids and gids when returning them
to userspace.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 kernel/exit.c  |    6 +-
 kernel/sys.c   |  216 ++++++++++++++++++++++++++++++++++++++-----------------
 kernel/timer.c |    8 +-
 kernel/uid16.c |   34 ++++++---
 4 files changed, 178 insertions(+), 86 deletions(-)
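
The setre*id and setres*id conversions in the diff below share one argument check: an id of -1 means "leave this field unchanged" and is skipped, while any other id that has no mapping in the caller's namespace is rejected with -EINVAL before credentials are prepared. A userspace sketch of that check, assuming a hypothetical single-extent mapping (`struct ns_map`, `make_kuid_sketch()` and `check_setreuid_args()` are illustrative names, not kernel functions):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define UID_UNCHANGED UINT32_MAX	/* (uid_t)-1: keep the current id */

/* Illustrative stand-in for the kernel-internal id type. */
typedef struct { uint32_t val; } kuid_t;

/* Hypothetical one-extent mapping between namespace-local and global ids. */
struct ns_map {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

static bool uid_valid(kuid_t k)
{
	return k.val != UINT32_MAX;
}

static kuid_t make_kuid_sketch(const struct ns_map *ns, uint32_t uid)
{
	kuid_t k = { UINT32_MAX };

	assert(ns);
	if (uid >= ns->first && uid - ns->first < ns->count)
		k.val = ns->lower_first + (uid - ns->first);
	return k;
}

/* Returns 0 if both arguments are usable, -EINVAL if a requested id is unmapped. */
static int check_setreuid_args(const struct ns_map *ns,
			       uint32_t ruid, uint32_t euid)
{
	kuid_t kruid = make_kuid_sketch(ns, ruid);
	kuid_t keuid = make_kuid_sketch(ns, euid);

	if (ruid != UID_UNCHANGED && !uid_valid(kruid))
		return -EINVAL;
	if (euid != UID_UNCHANGED && !uid_valid(keuid))
		return -EINVAL;
	return 0;
}
```

Converting first and testing the sentinel on the raw argument afterwards mirrors the order used in the patch: the unmapped result for -1 is simply never consulted.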

diff --git a/kernel/exit.c b/kernel/exit.c
index d8bd3b42..789e3c5 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1214,7 +1214,7 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
 	unsigned long state;
 	int retval, status, traced;
 	pid_t pid = task_pid_vnr(p);
-	uid_t uid = __task_cred(p)->uid;
+	uid_t uid = from_kuid_munged(current_user_ns(), __task_cred(p)->uid);
 	struct siginfo __user *infop;
 
 	if (!likely(wo->wo_flags & WEXITED))
@@ -1427,7 +1427,7 @@ static int wait_task_stopped(struct wait_opts *wo,
 	if (!unlikely(wo->wo_flags & WNOWAIT))
 		*p_code = 0;
 
-	uid = task_uid(p);
+	uid = from_kuid_munged(current_user_ns(), __task_cred(p)->uid);
 unlock_sig:
 	spin_unlock_irq(&p->sighand->siglock);
 	if (!exit_code)
@@ -1500,7 +1500,7 @@ static int wait_task_continued(struct wait_opts *wo, struct task_struct *p)
 	}
 	if (!unlikely(wo->wo_flags & WNOWAIT))
 		p->signal->flags &= ~SIGNAL_STOP_CONTINUED;
-	uid = task_uid(p);
+	uid = from_kuid_munged(current_user_ns(), __task_cred(p)->uid);
 	spin_unlock_irq(&p->sighand->siglock);
 
 	pid = task_pid_vnr(p);
diff --git a/kernel/sys.c b/kernel/sys.c
index 3996281..aff09f2 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -555,9 +555,19 @@ void ctrl_alt_del(void)
  */
 SYSCALL_DEFINE2(setregid, gid_t, rgid, gid_t, egid)
 {
+	struct user_namespace *ns = current_user_ns();
 	const struct cred *old;
 	struct cred *new;
 	int retval;
+	kgid_t krgid, kegid;
+
+	krgid = make_kgid(ns, rgid);
+	kegid = make_kgid(ns, egid);
+
+	if ((rgid != (gid_t) -1) && !gid_valid(krgid))
+		return -EINVAL;
+	if ((egid != (gid_t) -1) && !gid_valid(kegid))
+		return -EINVAL;
 
 	new = prepare_creds();
 	if (!new)
@@ -566,25 +576,25 @@ SYSCALL_DEFINE2(setregid, gid_t, rgid, gid_t, egid)
 
 	retval = -EPERM;
 	if (rgid != (gid_t) -1) {
-		if (old->gid == rgid ||
-		    old->egid == rgid ||
+		if (gid_eq(old->gid, krgid) ||
+		    gid_eq(old->egid, krgid) ||
 		    nsown_capable(CAP_SETGID))
-			new->gid = rgid;
+			new->gid = krgid;
 		else
 			goto error;
 	}
 	if (egid != (gid_t) -1) {
-		if (old->gid == egid ||
-		    old->egid == egid ||
-		    old->sgid == egid ||
+		if (gid_eq(old->gid, kegid) ||
+		    gid_eq(old->egid, kegid) ||
+		    gid_eq(old->sgid, kegid) ||
 		    nsown_capable(CAP_SETGID))
-			new->egid = egid;
+			new->egid = kegid;
 		else
 			goto error;
 	}
 
 	if (rgid != (gid_t) -1 ||
-	    (egid != (gid_t) -1 && egid != old->gid))
+	    (egid != (gid_t) -1 && !gid_eq(kegid, old->gid)))
 		new->sgid = new->egid;
 	new->fsgid = new->egid;
 
@@ -602,9 +612,15 @@ error:
  */
 SYSCALL_DEFINE1(setgid, gid_t, gid)
 {
+	struct user_namespace *ns = current_user_ns();
 	const struct cred *old;
 	struct cred *new;
 	int retval;
+	kgid_t kgid;
+
+	kgid = make_kgid(ns, gid);
+	if (!gid_valid(kgid))
+		return -EINVAL;
 
 	new = prepare_creds();
 	if (!new)
@@ -613,9 +629,9 @@ SYSCALL_DEFINE1(setgid, gid_t, gid)
 
 	retval = -EPERM;
 	if (nsown_capable(CAP_SETGID))
-		new->gid = new->egid = new->sgid = new->fsgid = gid;
-	else if (gid == old->gid || gid == old->sgid)
-		new->egid = new->fsgid = gid;
+		new->gid = new->egid = new->sgid = new->fsgid = kgid;
+	else if (gid_eq(kgid, old->gid) || gid_eq(kgid, old->sgid))
+		new->egid = new->fsgid = kgid;
 	else
 		goto error;
 
@@ -672,9 +688,19 @@ static int set_user(struct cred *new)
  */
 SYSCALL_DEFINE2(setreuid, uid_t, ruid, uid_t, euid)
 {
+	struct user_namespace *ns = current_user_ns();
 	const struct cred *old;
 	struct cred *new;
 	int retval;
+	kuid_t kruid, keuid;
+
+	kruid = make_kuid(ns, ruid);
+	keuid = make_kuid(ns, euid);
+
+	if ((ruid != (uid_t) -1) && !uid_valid(kruid))
+		return -EINVAL;
+	if ((euid != (uid_t) -1) && !uid_valid(keuid))
+		return -EINVAL;
 
 	new = prepare_creds();
 	if (!new)
@@ -683,29 +709,29 @@ SYSCALL_DEFINE2(setreuid, uid_t, ruid, uid_t, euid)
 
 	retval = -EPERM;
 	if (ruid != (uid_t) -1) {
-		new->uid = ruid;
-		if (old->uid != ruid &&
-		    old->euid != ruid &&
+		new->uid = kruid;
+		if (!uid_eq(old->uid, kruid) &&
+		    !uid_eq(old->euid, kruid) &&
 		    !nsown_capable(CAP_SETUID))
 			goto error;
 	}
 
 	if (euid != (uid_t) -1) {
-		new->euid = euid;
-		if (old->uid != euid &&
-		    old->euid != euid &&
-		    old->suid != euid &&
+		new->euid = keuid;
+		if (!uid_eq(old->uid, keuid) &&
+		    !uid_eq(old->euid, keuid) &&
+		    !uid_eq(old->suid, keuid) &&
 		    !nsown_capable(CAP_SETUID))
 			goto error;
 	}
 
-	if (new->uid != old->uid) {
+	if (!uid_eq(new->uid, old->uid)) {
 		retval = set_user(new);
 		if (retval < 0)
 			goto error;
 	}
 	if (ruid != (uid_t) -1 ||
-	    (euid != (uid_t) -1 && euid != old->uid))
+	    (euid != (uid_t) -1 && !uid_eq(keuid, old->uid)))
 		new->suid = new->euid;
 	new->fsuid = new->euid;
 
@@ -733,9 +759,15 @@ error:
  */
 SYSCALL_DEFINE1(setuid, uid_t, uid)
 {
+	struct user_namespace *ns = current_user_ns();
 	const struct cred *old;
 	struct cred *new;
 	int retval;
+	kuid_t kuid;
+
+	kuid = make_kuid(ns, uid);
+	if (!uid_valid(kuid))
+		return -EINVAL;
 
 	new = prepare_creds();
 	if (!new)
@@ -744,17 +776,17 @@ SYSCALL_DEFINE1(setuid, uid_t, uid)
 
 	retval = -EPERM;
 	if (nsown_capable(CAP_SETUID)) {
-		new->suid = new->uid = uid;
-		if (uid != old->uid) {
+		new->suid = new->uid = kuid;
+		if (!uid_eq(kuid, old->uid)) {
 			retval = set_user(new);
 			if (retval < 0)
 				goto error;
 		}
-	} else if (uid != old->uid && uid != new->suid) {
+	} else if (!uid_eq(kuid, old->uid) && !uid_eq(kuid, new->suid)) {
 		goto error;
 	}
 
-	new->fsuid = new->euid = uid;
+	new->fsuid = new->euid = kuid;
 
 	retval = security_task_fix_setuid(new, old, LSM_SETID_ID);
 	if (retval < 0)
@@ -774,9 +806,24 @@ error:
  */
 SYSCALL_DEFINE3(setresuid, uid_t, ruid, uid_t, euid, uid_t, suid)
 {
+	struct user_namespace *ns = current_user_ns();
 	const struct cred *old;
 	struct cred *new;
 	int retval;
+	kuid_t kruid, keuid, ksuid;
+
+	kruid = make_kuid(ns, ruid);
+	keuid = make_kuid(ns, euid);
+	ksuid = make_kuid(ns, suid);
+
+	if ((ruid != (uid_t) -1) && !uid_valid(kruid))
+		return -EINVAL;
+
+	if ((euid != (uid_t) -1) && !uid_valid(keuid))
+		return -EINVAL;
+
+	if ((suid != (uid_t) -1) && !uid_valid(ksuid))
+		return -EINVAL;
 
 	new = prepare_creds();
 	if (!new)
@@ -786,29 +833,29 @@ SYSCALL_DEFINE3(setresuid, uid_t, ruid, uid_t, euid, uid_t, suid)
 
 	retval = -EPERM;
 	if (!nsown_capable(CAP_SETUID)) {
-		if (ruid != (uid_t) -1 && ruid != old->uid &&
-		    ruid != old->euid  && ruid != old->suid)
+		if (ruid != (uid_t) -1        && !uid_eq(kruid, old->uid) &&
+		    !uid_eq(kruid, old->euid) && !uid_eq(kruid, old->suid))
 			goto error;
-		if (euid != (uid_t) -1 && euid != old->uid &&
-		    euid != old->euid  && euid != old->suid)
+		if (euid != (uid_t) -1        && !uid_eq(keuid, old->uid) &&
+		    !uid_eq(keuid, old->euid) && !uid_eq(keuid, old->suid))
 			goto error;
-		if (suid != (uid_t) -1 && suid != old->uid &&
-		    suid != old->euid  && suid != old->suid)
+		if (suid != (uid_t) -1        && !uid_eq(ksuid, old->uid) &&
+		    !uid_eq(ksuid, old->euid) && !uid_eq(ksuid, old->suid))
 			goto error;
 	}
 
 	if (ruid != (uid_t) -1) {
-		new->uid = ruid;
-		if (ruid != old->uid) {
+		new->uid = kruid;
+		if (!uid_eq(kruid, old->uid)) {
 			retval = set_user(new);
 			if (retval < 0)
 				goto error;
 		}
 	}
 	if (euid != (uid_t) -1)
-		new->euid = euid;
+		new->euid = keuid;
 	if (suid != (uid_t) -1)
-		new->suid = suid;
+		new->suid = ksuid;
 	new->fsuid = new->euid;
 
 	retval = security_task_fix_setuid(new, old, LSM_SETID_RES);
@@ -822,14 +869,19 @@ error:
 	return retval;
 }
 
-SYSCALL_DEFINE3(getresuid, uid_t __user *, ruid, uid_t __user *, euid, uid_t __user *, suid)
+SYSCALL_DEFINE3(getresuid, uid_t __user *, ruidp, uid_t __user *, euidp, uid_t __user *, suidp)
 {
 	const struct cred *cred = current_cred();
 	int retval;
+	uid_t ruid, euid, suid;
+
+	ruid = from_kuid_munged(cred->user_ns, cred->uid);
+	euid = from_kuid_munged(cred->user_ns, cred->euid);
+	suid = from_kuid_munged(cred->user_ns, cred->suid);
 
-	if (!(retval   = put_user(cred->uid,  ruid)) &&
-	    !(retval   = put_user(cred->euid, euid)))
-		retval = put_user(cred->suid, suid);
+	if (!(retval   = put_user(ruid, ruidp)) &&
+	    !(retval   = put_user(euid, euidp)))
+		retval = put_user(suid, suidp);
 
 	return retval;
 }
@@ -839,9 +891,22 @@ SYSCALL_DEFINE3(getresuid, uid_t __user *, ruid, uid_t __user *, euid, uid_t __u
  */
 SYSCALL_DEFINE3(setresgid, gid_t, rgid, gid_t, egid, gid_t, sgid)
 {
+	struct user_namespace *ns = current_user_ns();
 	const struct cred *old;
 	struct cred *new;
 	int retval;
+	kgid_t krgid, kegid, ksgid;
+
+	krgid = make_kgid(ns, rgid);
+	kegid = make_kgid(ns, egid);
+	ksgid = make_kgid(ns, sgid);
+
+	if ((rgid != (gid_t) -1) && !gid_valid(krgid))
+		return -EINVAL;
+	if ((egid != (gid_t) -1) && !gid_valid(kegid))
+		return -EINVAL;
+	if ((sgid != (gid_t) -1) && !gid_valid(ksgid))
+		return -EINVAL;
 
 	new = prepare_creds();
 	if (!new)
@@ -850,23 +915,23 @@ SYSCALL_DEFINE3(setresgid, gid_t, rgid, gid_t, egid, gid_t, sgid)
 
 	retval = -EPERM;
 	if (!nsown_capable(CAP_SETGID)) {
-		if (rgid != (gid_t) -1 && rgid != old->gid &&
-		    rgid != old->egid  && rgid != old->sgid)
+		if (rgid != (gid_t) -1        && !gid_eq(krgid, old->gid) &&
+		    !gid_eq(krgid, old->egid) && !gid_eq(krgid, old->sgid))
 			goto error;
-		if (egid != (gid_t) -1 && egid != old->gid &&
-		    egid != old->egid  && egid != old->sgid)
+		if (egid != (gid_t) -1        && !gid_eq(kegid, old->gid) &&
+		    !gid_eq(kegid, old->egid) && !gid_eq(kegid, old->sgid))
 			goto error;
-		if (sgid != (gid_t) -1 && sgid != old->gid &&
-		    sgid != old->egid  && sgid != old->sgid)
+		if (sgid != (gid_t) -1        && !gid_eq(ksgid, old->gid) &&
+		    !gid_eq(ksgid, old->egid) && !gid_eq(ksgid, old->sgid))
 			goto error;
 	}
 
 	if (rgid != (gid_t) -1)
-		new->gid = rgid;
+		new->gid = krgid;
 	if (egid != (gid_t) -1)
-		new->egid = egid;
+		new->egid = kegid;
 	if (sgid != (gid_t) -1)
-		new->sgid = sgid;
+		new->sgid = ksgid;
 	new->fsgid = new->egid;
 
 	return commit_creds(new);
@@ -876,14 +941,19 @@ error:
 	return retval;
 }
 
-SYSCALL_DEFINE3(getresgid, gid_t __user *, rgid, gid_t __user *, egid, gid_t __user *, sgid)
+SYSCALL_DEFINE3(getresgid, gid_t __user *, rgidp, gid_t __user *, egidp, gid_t __user *, sgidp)
 {
 	const struct cred *cred = current_cred();
 	int retval;
+	gid_t rgid, egid, sgid;
+
+	rgid = from_kgid_munged(cred->user_ns, cred->gid);
+	egid = from_kgid_munged(cred->user_ns, cred->egid);
+	sgid = from_kgid_munged(cred->user_ns, cred->sgid);
 
-	if (!(retval   = put_user(cred->gid,  rgid)) &&
-	    !(retval   = put_user(cred->egid, egid)))
-		retval = put_user(cred->sgid, sgid);
+	if (!(retval   = put_user(rgid, rgidp)) &&
+	    !(retval   = put_user(egid, egidp)))
+		retval = put_user(sgid, sgidp);
 
 	return retval;
 }
@@ -900,18 +970,24 @@ SYSCALL_DEFINE1(setfsuid, uid_t, uid)
 	const struct cred *old;
 	struct cred *new;
 	uid_t old_fsuid;
+	kuid_t kuid;
+
+	old = current_cred();
+	old_fsuid = from_kuid_munged(old->user_ns, old->fsuid);
+
+	kuid = make_kuid(old->user_ns, uid);
+	if (!uid_valid(kuid))
+		return old_fsuid;
 
 	new = prepare_creds();
 	if (!new)
-		return current_fsuid();
-	old = current_cred();
-	old_fsuid = old->fsuid;
+		return old_fsuid;
 
-	if (uid == old->uid  || uid == old->euid  ||
-	    uid == old->suid || uid == old->fsuid ||
+	if (uid_eq(kuid, old->uid)  || uid_eq(kuid, old->euid)  ||
+	    uid_eq(kuid, old->suid) || uid_eq(kuid, old->fsuid) ||
 	    nsown_capable(CAP_SETUID)) {
-		if (uid != old_fsuid) {
-			new->fsuid = uid;
+		if (!uid_eq(kuid, old->fsuid)) {
+			new->fsuid = kuid;
 			if (security_task_fix_setuid(new, old, LSM_SETID_FS) == 0)
 				goto change_okay;
 		}
@@ -933,18 +1009,24 @@ SYSCALL_DEFINE1(setfsgid, gid_t, gid)
 	const struct cred *old;
 	struct cred *new;
 	gid_t old_fsgid;
+	kgid_t kgid;
+
+	old = current_cred();
+	old_fsgid = from_kgid_munged(old->user_ns, old->fsgid);
+
+	kgid = make_kgid(old->user_ns, gid);
+	if (!gid_valid(kgid))
+		return old_fsgid;
 
 	new = prepare_creds();
 	if (!new)
-		return current_fsgid();
-	old = current_cred();
-	old_fsgid = old->fsgid;
+		return old_fsgid;
 
-	if (gid == old->gid  || gid == old->egid  ||
-	    gid == old->sgid || gid == old->fsgid ||
+	if (gid_eq(kgid, old->gid)  || gid_eq(kgid, old->egid)  ||
+	    gid_eq(kgid, old->sgid) || gid_eq(kgid, old->fsgid) ||
 	    nsown_capable(CAP_SETGID)) {
-		if (gid != old_fsgid) {
-			new->fsgid = gid;
+		if (!gid_eq(kgid, old->fsgid)) {
+			new->fsgid = kgid;
 			goto change_okay;
 		}
 	}
@@ -1503,10 +1585,10 @@ static int check_prlimit_permission(struct task_struct *task)
 	if (cred->user_ns == tcred->user_ns &&
 	    (cred->uid == tcred->euid &&
 	     cred->uid == tcred->suid &&
-	     cred->uid == tcred->uid  &&
+	     cred->uid == tcred->uid &&
 	     cred->gid == tcred->egid &&
 	     cred->gid == tcred->sgid &&
-	     cred->gid == tcred->gid))
+		    cred->gid == tcred->gid))
 		return 0;
 	if (ns_capable(tcred->user_ns, CAP_SYS_RESOURCE))
 		return 0;
diff --git a/kernel/timer.c b/kernel/timer.c
index a297ffc..67316cb 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1427,25 +1427,25 @@ SYSCALL_DEFINE0(getppid)
 SYSCALL_DEFINE0(getuid)
 {
 	/* Only we change this so SMP safe */
-	return current_uid();
+	return from_kuid_munged(current_user_ns(), current_uid());
 }
 
 SYSCALL_DEFINE0(geteuid)
 {
 	/* Only we change this so SMP safe */
-	return current_euid();
+	return from_kuid_munged(current_user_ns(), current_euid());
 }
 
 SYSCALL_DEFINE0(getgid)
 {
 	/* Only we change this so SMP safe */
-	return current_gid();
+	return from_kgid_munged(current_user_ns(), current_gid());
 }
 
 SYSCALL_DEFINE0(getegid)
 {
 	/* Only we change this so SMP safe */
-	return  current_egid();
+	return from_kgid_munged(current_user_ns(), current_egid());
 }
 
 #endif
diff --git a/kernel/uid16.c b/kernel/uid16.c
index e530bc3..d7948eb 100644
--- a/kernel/uid16.c
+++ b/kernel/uid16.c
@@ -81,14 +81,19 @@ SYSCALL_DEFINE3(setresuid16, old_uid_t, ruid, old_uid_t, euid, old_uid_t, suid)
 	return ret;
 }
 
-SYSCALL_DEFINE3(getresuid16, old_uid_t __user *, ruid, old_uid_t __user *, euid, old_uid_t __user *, suid)
+SYSCALL_DEFINE3(getresuid16, old_uid_t __user *, ruidp, old_uid_t __user *, euidp, old_uid_t __user *, suidp)
 {
 	const struct cred *cred = current_cred();
 	int retval;
+	old_uid_t ruid, euid, suid;
 
-	if (!(retval   = put_user(high2lowuid(cred->uid),  ruid)) &&
-	    !(retval   = put_user(high2lowuid(cred->euid), euid)))
-		retval = put_user(high2lowuid(cred->suid), suid);
+	ruid = high2lowuid(from_kuid_munged(cred->user_ns, cred->uid));
+	euid = high2lowuid(from_kuid_munged(cred->user_ns, cred->euid));
+	suid = high2lowuid(from_kuid_munged(cred->user_ns, cred->suid));
+
+	if (!(retval   = put_user(ruid, ruidp)) &&
+	    !(retval   = put_user(euid, euidp)))
+		retval = put_user(suid, suidp);
 
 	return retval;
 }
@@ -103,14 +108,19 @@ SYSCALL_DEFINE3(setresgid16, old_gid_t, rgid, old_gid_t, egid, old_gid_t, sgid)
 }
 
 
-SYSCALL_DEFINE3(getresgid16, old_gid_t __user *, rgid, old_gid_t __user *, egid, old_gid_t __user *, sgid)
+SYSCALL_DEFINE3(getresgid16, old_gid_t __user *, rgidp, old_gid_t __user *, egidp, old_gid_t __user *, sgidp)
 {
 	const struct cred *cred = current_cred();
 	int retval;
+	old_gid_t rgid, egid, sgid;
+
+	rgid = high2lowgid(from_kgid_munged(cred->user_ns, cred->gid));
+	egid = high2lowgid(from_kgid_munged(cred->user_ns, cred->egid));
+	sgid = high2lowgid(from_kgid_munged(cred->user_ns, cred->sgid));
 
-	if (!(retval   = put_user(high2lowgid(cred->gid),  rgid)) &&
-	    !(retval   = put_user(high2lowgid(cred->egid), egid)))
-		retval = put_user(high2lowgid(cred->sgid), sgid);
+	if (!(retval   = put_user(rgid, rgidp)) &&
+	    !(retval   = put_user(egid, egidp)))
+		retval = put_user(sgid, sgidp);
 
 	return retval;
 }
@@ -221,20 +231,20 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
 
 SYSCALL_DEFINE0(getuid16)
 {
-	return high2lowuid(current_uid());
+	return high2lowuid(from_kuid_munged(current_user_ns(), current_uid()));
 }
 
 SYSCALL_DEFINE0(geteuid16)
 {
-	return high2lowuid(current_euid());
+	return high2lowuid(from_kuid_munged(current_user_ns(), current_euid()));
 }
 
 SYSCALL_DEFINE0(getgid16)
 {
-	return high2lowgid(current_gid());
+	return high2lowgid(from_kgid_munged(current_user_ns(), current_gid()));
 }
 
 SYSCALL_DEFINE0(getegid16)
 {
-	return high2lowgid(current_egid());
+	return high2lowgid(from_kgid_munged(current_user_ns(), current_egid()));
 }
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 24/43] userns: Convert ptrace, kill, set_priority permission checks to work with kuids and kgids
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

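Update the permission checks to use the new uid_eq and gid_eq helpers
and remove the now unnecessary user_ns equality comparison.  Because
kuids and kgids are already mapped into the initial user namespace,
comparing them directly is sufficient.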

Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
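
A userspace sketch of why the user_ns comparison can go away.  The
demo_* names are hypothetical, and the example assumes that the two
namespaces involved map their ids onto disjoint ranges of the initial
namespace.  Under that assumption uid 0 in one container and uid 0 in
another carry different internal values, so plain equality of the
wrapped ids already tells the two users apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for kuid_t: an id that has already been
 * mapped into the initial user namespace. */
typedef struct { uint32_t val; } demo_kuid_t;

struct demo_cred {
	demo_kuid_t uid, euid, suid;
};

static inline bool demo_uid_eq(demo_kuid_t a, demo_kuid_t b)
{
	return a.val == b.val;
}

/* Models the uid part of kill_ok_by_cred(): the sender may signal the
 * target if its uid or euid matches the target's uid or suid. */
static inline bool demo_kill_ok_by_uid(const struct demo_cred *cred,
				       const struct demo_cred *tcred)
{
	assert(cred && tcred);
	return demo_uid_eq(cred->euid, tcred->suid) ||
	       demo_uid_eq(cred->euid, tcred->uid)  ||
	       demo_uid_eq(cred->uid,  tcred->suid) ||
	       demo_uid_eq(cred->uid,  tcred->uid);
}
```

Two tasks that both see themselves as uid 0, one stored as 100000 and
the other as 200000, fail this check without any namespace pointer
being consulted.  Two tasks stored as 100000 pass it.
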
 kernel/ptrace.c |   13 ++++++-------
 kernel/signal.c |   15 ++++++---------
 kernel/sys.c    |   18 ++++++++----------
 3 files changed, 20 insertions(+), 26 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 24e0a5a..a232bb5 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -198,13 +198,12 @@ int __ptrace_may_access(struct task_struct *task, unsigned int mode)
 		return 0;
 	rcu_read_lock();
 	tcred = __task_cred(task);
-	if (cred->user_ns == tcred->user_ns &&
-	    (cred->uid == tcred->euid &&
-	     cred->uid == tcred->suid &&
-	     cred->uid == tcred->uid  &&
-	     cred->gid == tcred->egid &&
-	     cred->gid == tcred->sgid &&
-	     cred->gid == tcred->gid))
+	if (uid_eq(cred->uid, tcred->euid) &&
+	    uid_eq(cred->uid, tcred->suid) &&
+	    uid_eq(cred->uid, tcred->uid)  &&
+	    gid_eq(cred->gid, tcred->egid) &&
+	    gid_eq(cred->gid, tcred->sgid) &&
+	    gid_eq(cred->gid, tcred->gid))
 		goto ok;
 	if (ptrace_has_cap(tcred->user_ns, mode))
 		goto ok;
diff --git a/kernel/signal.c b/kernel/signal.c
index d630327..9797939 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -767,11 +767,10 @@ static int kill_ok_by_cred(struct task_struct *t)
 	const struct cred *cred = current_cred();
 	const struct cred *tcred = __task_cred(t);
 
-	if (cred->user_ns == tcred->user_ns &&
-	    (cred->euid == tcred->suid ||
-	     cred->euid == tcred->uid ||
-	     cred->uid  == tcred->suid ||
-	     cred->uid  == tcred->uid))
+	if (uid_eq(cred->euid, tcred->suid) ||
+	    uid_eq(cred->euid, tcred->uid)  ||
+	    uid_eq(cred->uid,  tcred->suid) ||
+	    uid_eq(cred->uid,  tcred->uid))
 		return 1;
 
 	if (ns_capable(tcred->user_ns, CAP_KILL))
@@ -1389,10 +1388,8 @@ static int kill_as_cred_perm(const struct cred *cred,
 			     struct task_struct *target)
 {
 	const struct cred *pcred = __task_cred(target);
-	if (cred->user_ns != pcred->user_ns)
-		return 0;
-	if (cred->euid != pcred->suid && cred->euid != pcred->uid &&
-	    cred->uid  != pcred->suid && cred->uid  != pcred->uid)
+	if (!uid_eq(cred->euid, pcred->suid) && !uid_eq(cred->euid, pcred->uid) &&
+	    !uid_eq(cred->uid,  pcred->suid) && !uid_eq(cred->uid,  pcred->uid))
 		return 0;
 	return 1;
 }
diff --git a/kernel/sys.c b/kernel/sys.c
index aff09f2..f484077 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -131,9 +131,8 @@ static bool set_one_prio_perm(struct task_struct *p)
 {
 	const struct cred *cred = current_cred(), *pcred = __task_cred(p);
 
-	if (pcred->user_ns == cred->user_ns &&
-	    (pcred->uid  == cred->euid ||
-	     pcred->euid == cred->euid))
+	if (uid_eq(pcred->uid,  cred->euid) ||
+	    uid_eq(pcred->euid, cred->euid))
 		return true;
 	if (ns_capable(pcred->user_ns, CAP_SYS_NICE))
 		return true;
@@ -1582,13 +1581,12 @@ static int check_prlimit_permission(struct task_struct *task)
 		return 0;
 
 	tcred = __task_cred(task);
-	if (cred->user_ns == tcred->user_ns &&
-	    (cred->uid == tcred->euid &&
-	     cred->uid == tcred->suid &&
-	     cred->uid == tcred->uid &&
-	     cred->gid == tcred->egid &&
-	     cred->gid == tcred->sgid &&
-		    cred->gid == tcred->gid))
+	if (uid_eq(cred->uid, tcred->euid) &&
+	    uid_eq(cred->uid, tcred->suid) &&
+	    uid_eq(cred->uid, tcred->uid)  &&
+	    gid_eq(cred->gid, tcred->egid) &&
+	    gid_eq(cred->gid, tcred->sgid) &&
+	    gid_eq(cred->gid, tcred->gid))
 		return 0;
 	if (ns_capable(tcred->user_ns, CAP_SYS_RESOURCE))
 		return 0;
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 24/43] userns: Convert ptrace, kill, set_priority permission checks to work with kuids and kgids
@ 2012-04-08  5:15     ` "Eric W. Beiderman
  0 siblings, 0 replies; 227+ messages in thread
From: "Eric W. Beiderman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, linux-security-module, Linux Containers,
	Andrew Morton, Linus Torvalds, Al Viro, Cyrill Gorcunov,
	Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

Update the permission checks to use the new uid_eq and gid_eq helpers
and remove the now unnecessary user_ns equality comparison.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 kernel/ptrace.c |   13 ++++++-------
 kernel/signal.c |   15 ++++++---------
 kernel/sys.c    |   18 ++++++++----------
 3 files changed, 20 insertions(+), 26 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 24e0a5a..a232bb5 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -198,13 +198,12 @@ int __ptrace_may_access(struct task_struct *task, unsigned int mode)
 		return 0;
 	rcu_read_lock();
 	tcred = __task_cred(task);
-	if (cred->user_ns == tcred->user_ns &&
-	    (cred->uid == tcred->euid &&
-	     cred->uid == tcred->suid &&
-	     cred->uid == tcred->uid  &&
-	     cred->gid == tcred->egid &&
-	     cred->gid == tcred->sgid &&
-	     cred->gid == tcred->gid))
+	if (uid_eq(cred->uid, tcred->euid) &&
+	    uid_eq(cred->uid, tcred->suid) &&
+	    uid_eq(cred->uid, tcred->uid)  &&
+	    gid_eq(cred->gid, tcred->egid) &&
+	    gid_eq(cred->gid, tcred->sgid) &&
+	    gid_eq(cred->gid, tcred->gid))
 		goto ok;
 	if (ptrace_has_cap(tcred->user_ns, mode))
 		goto ok;
diff --git a/kernel/signal.c b/kernel/signal.c
index d630327..9797939 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -767,11 +767,10 @@ static int kill_ok_by_cred(struct task_struct *t)
 	const struct cred *cred = current_cred();
 	const struct cred *tcred = __task_cred(t);
 
-	if (cred->user_ns == tcred->user_ns &&
-	    (cred->euid == tcred->suid ||
-	     cred->euid == tcred->uid ||
-	     cred->uid  == tcred->suid ||
-	     cred->uid  == tcred->uid))
+	if (uid_eq(cred->euid, tcred->suid) ||
+	    uid_eq(cred->euid, tcred->uid)  ||
+	    uid_eq(cred->uid,  tcred->suid) ||
+	    uid_eq(cred->uid,  tcred->uid))
 		return 1;
 
 	if (ns_capable(tcred->user_ns, CAP_KILL))
@@ -1389,10 +1388,8 @@ static int kill_as_cred_perm(const struct cred *cred,
 			     struct task_struct *target)
 {
 	const struct cred *pcred = __task_cred(target);
-	if (cred->user_ns != pcred->user_ns)
-		return 0;
-	if (cred->euid != pcred->suid && cred->euid != pcred->uid &&
-	    cred->uid  != pcred->suid && cred->uid  != pcred->uid)
+	if (uid_eq(cred->euid, pcred->suid) && uid_eq(cred->euid, pcred->uid) &&
+	    uid_eq(cred->uid,  pcred->suid) && uid_eq(cred->uid,  pcred->uid))
 		return 0;
 	return 1;
 }
diff --git a/kernel/sys.c b/kernel/sys.c
index aff09f2..f484077 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -131,9 +131,8 @@ static bool set_one_prio_perm(struct task_struct *p)
 {
 	const struct cred *cred = current_cred(), *pcred = __task_cred(p);
 
-	if (pcred->user_ns == cred->user_ns &&
-	    (pcred->uid  == cred->euid ||
-	     pcred->euid == cred->euid))
+	if (uid_eq(pcred->uid,  cred->euid) ||
+	    uid_eq(pcred->euid, cred->euid))
 		return true;
 	if (ns_capable(pcred->user_ns, CAP_SYS_NICE))
 		return true;
@@ -1582,13 +1581,12 @@ static int check_prlimit_permission(struct task_struct *task)
 		return 0;
 
 	tcred = __task_cred(task);
-	if (cred->user_ns == tcred->user_ns &&
-	    (cred->uid == tcred->euid &&
-	     cred->uid == tcred->suid &&
-	     cred->uid == tcred->uid &&
-	     cred->gid == tcred->egid &&
-	     cred->gid == tcred->sgid &&
-		    cred->gid == tcred->gid))
+	if (uid_eq(cred->uid, tcred->euid) &&
+	    uid_eq(cred->uid, tcred->suid) &&
+	    uid_eq(cred->uid, tcred->uid)  &&
+	    gid_eq(cred->gid, tcred->egid) &&
+	    gid_eq(cred->gid, tcred->sgid) &&
+	    gid_eq(cred->gid, tcred->gid))
 		return 0;
 	if (ns_capable(tcred->user_ns, CAP_SYS_RESOURCE))
 		return 0;
-- 
1.7.2.5
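
Note on the dropped user_ns comparisons: the hunks above remove every
"cred->user_ns == tcred->user_ns" test. The reasoning, as I read the cover
letter, is that a kuid_t is already expressed in the initial user namespace,
so two equal kuids name the same user no matter which namespace each task
lives in. The following is a rough user-space sketch of that idea only. All
model_* names, the single-extent mapping, and the 100000 offset are
assumptions made for illustration; none of this is the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model; these are NOT the kernel definitions. */
typedef struct { uint32_t val; } model_kuid_t;

#define MODEL_INVALID_UID ((uint32_t)-1)

/*
 * One contiguous extent: ids [first, first + count) inside the namespace
 * correspond to global ids [lower_first, lower_first + count).
 */
struct model_user_ns {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

/* Translate a namespace-local uid into its global value, or the invalid id. */
static model_kuid_t model_make_kuid(const struct model_user_ns *ns, uint32_t uid)
{
	model_kuid_t k = { MODEL_INVALID_UID };

	if (uid >= ns->first && uid - ns->first < ns->count)
		k.val = ns->lower_first + (uid - ns->first);
	return k;
}

/* Equality on global values; no namespace pointer is needed here. */
static bool model_uid_eq(model_kuid_t a, model_kuid_t b)
{
	return a.val == b.val;
}

void model_userns_demo(void)
{
	struct model_user_ns init_ns   = { 0, 0, 4294967295u };
	struct model_user_ns container = { 0, 100000, 65536 };

	/* uid 0 names different users in the two namespaces ... */
	assert(!model_uid_eq(model_make_kuid(&init_ns, 0),
			     model_make_kuid(&container, 0)));
	/* ... while container uid 0 and host uid 100000 are the same user. */
	assert(model_uid_eq(model_make_kuid(&container, 0),
			    model_make_kuid(&init_ns, 100000)));
}
```

In this model the namespace is consulted once, when the id enters the
system, and every later comparison is a plain integer compare.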


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 25/43] userns: Store uid and gid types in vfs structures with kuid_t and kgid_t types
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

The conversion of all of the users is not done yet; there are too many to
change in one go while keeping the code reviewable. For now I change just the
header and a few trivial users, and rely on CONFIG_UIDGID_STRICT_TYPE_CHECKS
not being set to ensure that the code still compiles during the transition.

Helper functions i_uid_read, i_uid_write, i_gid_read, i_gid_write are added
so that in most cases filesystems can avoid the complexities of multiple user
namespaces and can concentrate on moving their raw numeric values into and
out of the vfs data structures.

Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
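
Note on the transition described above: the idea of relying on a strict
type-check option being unset can be sketched in user space roughly as
follows. MODEL_STRICT_TYPE_CHECKS and every model_* name are stand-ins
invented for this example; the real definitions live in the series'
uidgid.h and may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int model_uid_t;

#ifdef MODEL_STRICT_TYPE_CHECKS
/* Strict: a distinct struct type, so `a == b` or `k = 0` will not compile. */
typedef struct { model_uid_t val; } model_kuid_t;
#define MODEL_KUIDT_INIT(v) ((model_kuid_t){ (v) })
static inline model_uid_t model_kuid_val(model_kuid_t k) { return k.val; }
#else
/* Relaxed: a plain integer, so code that is not yet converted still builds. */
typedef model_uid_t model_kuid_t;
#define MODEL_KUIDT_INIT(v) ((model_kuid_t)(v))
static inline model_uid_t model_kuid_val(model_kuid_t k) { return k; }
#endif

/* Converted code compares through the helper and builds in both modes. */
static inline bool model_uid_eq(model_kuid_t a, model_kuid_t b)
{
	return model_kuid_val(a) == model_kuid_val(b);
}

void model_strict_demo(void)
{
	model_kuid_t owner  = MODEL_KUIDT_INIT(1000);
	model_kuid_t caller = MODEL_KUIDT_INIT(1000);
	model_kuid_t other  = MODEL_KUIDT_INIT(1001);

	assert(model_uid_eq(owner, caller));
	assert(!model_uid_eq(owner, other));
}
```

Building the same file with -DMODEL_STRICT_TYPE_CHECKS is what would flag
any remaining raw "==" comparison or integer assignment as a compile error.
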
 fs/inode.c         |    6 +++---
 include/linux/fs.h |   36 +++++++++++++++++++++++++++++++-----
 2 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index f0c4ace..deb72f6 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -135,8 +135,8 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
 	inode->i_fop = &empty_fops;
 	inode->__i_nlink = 1;
 	inode->i_opflags = 0;
-	inode->i_uid = 0;
-	inode->i_gid = 0;
+	i_uid_write(inode, 0);
+	i_gid_write(inode, 0);
 	atomic_set(&inode->i_writecount, 0);
 	inode->i_size = 0;
 	inode->i_blocks = 0;
@@ -1732,7 +1732,7 @@ EXPORT_SYMBOL(inode_init_owner);
  */
 bool inode_owner_or_capable(const struct inode *inode)
 {
-	if (current_fsuid() == inode->i_uid)
+	if (uid_eq(current_fsuid(), inode->i_uid))
 		return true;
 	if (inode_capable(inode, CAP_FOWNER))
 		return true;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a6c5efb..797eb26 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -402,6 +402,7 @@ struct inodes_stat_t {
 #include <linux/atomic.h>
 #include <linux/shrinker.h>
 #include <linux/migrate_mode.h>
+#include <linux/uidgid.h>
 
 #include <asm/byteorder.h>
 
@@ -469,8 +470,8 @@ typedef void (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 struct iattr {
 	unsigned int	ia_valid;
 	umode_t		ia_mode;
-	uid_t		ia_uid;
-	gid_t		ia_gid;
+	kuid_t		ia_uid;
+	kgid_t		ia_gid;
 	loff_t		ia_size;
 	struct timespec	ia_atime;
 	struct timespec	ia_mtime;
@@ -761,8 +762,8 @@ struct posix_acl;
 struct inode {
 	umode_t			i_mode;
 	unsigned short		i_opflags;
-	uid_t			i_uid;
-	gid_t			i_gid;
+	kuid_t			i_uid;
+	kgid_t			i_gid;
 	unsigned int		i_flags;
 
 #ifdef CONFIG_FS_POSIX_ACL
@@ -927,6 +928,31 @@ static inline void i_size_write(struct inode *inode, loff_t i_size)
 #endif
 }
 
+/* Helper functions so that in most cases filesystems will
+ * not need to deal directly with kuid_t and kgid_t and can
+ * instead deal with the raw numeric values that are stored
+ * in the filesystem.
+ */
+static inline uid_t i_uid_read(const struct inode *inode)
+{
+	return from_kuid(&init_user_ns, inode->i_uid);
+}
+
+static inline gid_t i_gid_read(const struct inode *inode)
+{
+	return from_kgid(&init_user_ns, inode->i_gid);
+}
+
+static inline void i_uid_write(struct inode *inode, uid_t uid)
+{
+	inode->i_uid = make_kuid(&init_user_ns, uid);
+}
+
+static inline void i_gid_write(struct inode *inode, gid_t gid)
+{
+	inode->i_gid = make_kgid(&init_user_ns, gid);
+}
+
 static inline unsigned iminor(const struct inode *inode)
 {
 	return MINOR(inode->i_rdev);
@@ -943,7 +969,7 @@ struct fown_struct {
 	rwlock_t lock;          /* protects pid, uid, euid fields */
 	struct pid *pid;	/* pid or -pgrp where SIGIO should be sent */
 	enum pid_type pid_type;	/* Kind of process group SIGIO should be sent to */
-	uid_t uid, euid;	/* uid/euid of process setting the owner */
+	kuid_t uid, euid;	/* uid/euid of process setting the owner */
 	int signum;		/* posix.1b rt signal to be delivered on IO */
 };
 
-- 
1.7.2.5
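
Note on the new inode helpers: a filesystem that only moves raw numeric ids
between disk and the vfs inode could use i_uid_read and i_uid_write along
these lines. This is a user-space sketch under stated assumptions: the
model_* types, the on-disk layout, and treating the initial user namespace
as an identity mapping are all illustrative, and byte-order handling is
left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for kuid_t and struct inode. */
typedef struct { uint32_t val; } model_kuid_t;

struct model_inode {
	model_kuid_t i_uid;
};

/* Hypothetical on-disk inode that stores the raw numeric owner. */
struct model_disk_inode {
	uint32_t di_uid;
};

/* Models i_uid_read(): kuid to raw value (identity mapping assumed). */
static inline uint32_t model_i_uid_read(const struct model_inode *inode)
{
	return inode->i_uid.val;
}

/* Models i_uid_write(): raw value to kuid (identity mapping assumed). */
static inline void model_i_uid_write(struct model_inode *inode, uint32_t uid)
{
	inode->i_uid.val = uid;
}

/* Filesystem side: load the owner from disk into the in-memory inode. */
void model_fs_read_inode(struct model_inode *inode,
			 const struct model_disk_inode *raw)
{
	model_i_uid_write(inode, raw->di_uid);
}

/* Filesystem side: store the in-memory owner back to disk. */
void model_fs_write_inode(const struct model_inode *inode,
			  struct model_disk_inode *raw)
{
	raw->di_uid = model_i_uid_read(inode);
}
```

The filesystem code above never touches the wrapped type directly, which is
the point the commit message makes about keeping filesystems simple.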

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 26/43] userns: Convert in_group_p and in_egroup_p to use kgid_t
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
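
Note on the interface change: with in_group_p taking a kgid_t, the caller
hands over an already-mapped group and the function no longer converts it
against cred->user_ns. A minimal user-space sketch of that shape follows.
The model_* names and the linear search standing in for groups_search are
assumptions for illustration, not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for kgid_t. */
typedef struct { unsigned int val; } model_kgid_t;

static inline bool model_gid_eq(model_kgid_t a, model_kgid_t b)
{
	return a.val == b.val;
}

/* Hypothetical, trimmed-down credentials: fsgid plus supplementary groups. */
struct model_cred {
	model_kgid_t fsgid;
	const model_kgid_t *groups;
	size_t ngroups;
};

/* Linear stand-in for groups_search(); takes the wrapped type directly. */
static bool model_groups_search(const struct model_cred *cred, model_kgid_t grp)
{
	for (size_t i = 0; i < cred->ngroups; i++)
		if (model_gid_eq(cred->groups[i], grp))
			return true;
	return false;
}

/* Same shape as the converted in_group_p(): no per-call id mapping. */
bool model_in_group_p(const struct model_cred *cred, model_kgid_t grp)
{
	if (model_gid_eq(grp, cred->fsgid))
		return true;
	return model_groups_search(cred, grp);
}
```
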
 include/linux/cred.h |    4 ++--
 kernel/groups.c      |   14 ++++++--------
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/include/linux/cred.h b/include/linux/cred.h
index fac0579..917dc5a 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -73,8 +73,8 @@ extern int groups_search(const struct group_info *, kgid_t);
 #define GROUP_AT(gi, i) \
 	((gi)->blocks[(i) / NGROUPS_PER_BLOCK][(i) % NGROUPS_PER_BLOCK])
 
-extern int in_group_p(gid_t);
-extern int in_egroup_p(gid_t);
+extern int in_group_p(kgid_t);
+extern int in_egroup_p(kgid_t);
 
 /*
  * The common credentials for a thread group
diff --git a/kernel/groups.c b/kernel/groups.c
index 84156f2..6b2588d 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -256,27 +256,25 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
 /*
  * Check whether we're fsgid/egid or in the supplemental group..
  */
-int in_group_p(gid_t grp)
+int in_group_p(kgid_t grp)
 {
 	const struct cred *cred = current_cred();
 	int retval = 1;
 
-	if (grp != cred->fsgid)
-		retval = groups_search(cred->group_info,
-				       make_kgid(cred->user_ns, grp));
+	if (!gid_eq(grp, cred->fsgid))
+		retval = groups_search(cred->group_info, grp);
 	return retval;
 }
 
 EXPORT_SYMBOL(in_group_p);
 
-int in_egroup_p(gid_t grp)
+int in_egroup_p(kgid_t grp)
 {
 	const struct cred *cred = current_cred();
 	int retval = 1;
 
-	if (grp != cred->egid)
-		retval = groups_search(cred->group_info,
-				       make_kgid(cred->user_ns, grp));
+	if (!gid_eq(grp, cred->egid))
+		retval = groups_search(cred->group_info, grp);
 	return retval;
 }
 
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 27/43] userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
       [not found] ` <m11unyn70b.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
                     ` (25 preceding siblings ...)
  2012-04-08  5:15     ` Eric W. Biederman
@ 2012-04-08  5:15   ` Eric W. Biederman
  2012-04-08  5:15     ` Eric W. Biederman
                     ` (18 subsequent siblings)
  45 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
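
Note on the conversion rule: each "a == b" on raw ids becomes uid_eq(a, b)
or gid_eq(a, b), and each "a != b" becomes the negated helper. The chgrp
test in inode_change_ok is a compact case to check, since it mixes both
kinds of id. The sketch below compares the removed raw-integer form against
the helper form over all small inputs. The model_* names are invented for
the example, and in_group and cap_chown are passed in as plain booleans
instead of calling in_group_p and capable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kuid_t and kgid_t. */
typedef struct { unsigned int val; } model_kuid_t;
typedef struct { unsigned int val; } model_kgid_t;

static inline bool model_uid_eq(model_kuid_t a, model_kuid_t b) { return a.val == b.val; }
static inline bool model_gid_eq(model_kgid_t a, model_kgid_t b) { return a.val == b.val; }

/* Raw-integer form, as in the removed lines of the fs/attr.c hunk. */
static bool model_chgrp_denied_old(unsigned int fsuid, unsigned int i_uid,
				   bool in_group, unsigned int new_gid,
				   unsigned int i_gid, bool cap_chown)
{
	return (fsuid != i_uid || (!in_group && new_gid != i_gid)) && !cap_chown;
}

/* Helper form: every `a != b` is written as a negated equality helper. */
static bool model_chgrp_denied_new(model_kuid_t fsuid, model_kuid_t i_uid,
				   bool in_group, model_kgid_t new_gid,
				   model_kgid_t i_gid, bool cap_chown)
{
	return (!model_uid_eq(fsuid, i_uid) ||
		(!in_group && !model_gid_eq(new_gid, i_gid))) && !cap_chown;
}

/* Counts input combinations on which the two forms disagree. */
int model_chgrp_mismatches(void)
{
	int bad = 0;

	for (unsigned int bits = 0; bits < 64; bits++) {
		unsigned int fsuid = bits & 1, i_uid = (bits >> 1) & 1;
		unsigned int new_gid = (bits >> 2) & 1, i_gid = (bits >> 3) & 1;
		bool in_group = (bits >> 4) & 1, cap = (bits >> 5) & 1;
		model_kuid_t kf = { fsuid }, ki = { i_uid };
		model_kgid_t gn = { new_gid }, gi = { i_gid };

		if (model_chgrp_denied_old(fsuid, i_uid, in_group, new_gid, i_gid, cap) !=
		    model_chgrp_denied_new(kf, ki, in_group, gn, gi, cap))
			bad++;
	}
	return bad;
}
```

A check of this kind is cheap to run against any converted condition where
a negation could be lost.
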
 fs/attr.c                |    8 ++++----
 fs/exec.c                |   10 +++++-----
 fs/fcntl.c               |    6 +++---
 fs/ioprio.c              |    4 ++--
 fs/locks.c               |    2 +-
 fs/namei.c               |    8 ++++----
 include/linux/quotaops.h |    4 ++--
 7 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/fs/attr.c b/fs/attr.c
index 73f69a6..2f094c6 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -47,14 +47,14 @@ int inode_change_ok(const struct inode *inode, struct iattr *attr)
 
 	/* Make sure a caller can chown. */
 	if ((ia_valid & ATTR_UID) &&
-	    (current_fsuid() != inode->i_uid ||
-	     attr->ia_uid != inode->i_uid) && !capable(CAP_CHOWN))
+	    (!uid_eq(current_fsuid(), inode->i_uid) ||
+	     !uid_eq(attr->ia_uid, inode->i_uid)) && !capable(CAP_CHOWN))
 		return -EPERM;
 
 	/* Make sure caller can chgrp. */
 	if ((ia_valid & ATTR_GID) &&
-	    (current_fsuid() != inode->i_uid ||
-	    (!in_group_p(attr->ia_gid) && attr->ia_gid != inode->i_gid)) &&
+	    (!uid_eq(current_fsuid(), inode->i_uid) ||
+	    (!in_group_p(attr->ia_gid) && !gid_eq(attr->ia_gid, inode->i_gid))) &&
 	    !capable(CAP_CHOWN))
 		return -EPERM;
 
diff --git a/fs/exec.c b/fs/exec.c
index 9a1d9f0..00ae2ef 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1139,7 +1139,7 @@ void setup_new_exec(struct linux_binprm * bprm)
 	/* This is the point of no return */
 	current->sas_ss_sp = current->sas_ss_size = 0;
 
-	if (current_euid() == current_uid() && current_egid() == current_gid())
+	if (uid_eq(current_euid(), current_uid()) && gid_eq(current_egid(), current_gid()))
 		set_dumpable(current->mm, 1);
 	else
 		set_dumpable(current->mm, suid_dumpable);
@@ -1153,8 +1153,8 @@ void setup_new_exec(struct linux_binprm * bprm)
 	current->mm->task_size = TASK_SIZE;
 
 	/* install the new credentials */
-	if (bprm->cred->uid != current_euid() ||
-	    bprm->cred->gid != current_egid()) {
+	if (!uid_eq(bprm->cred->uid, current_euid()) ||
+	    !gid_eq(bprm->cred->gid, current_egid())) {
 		current->pdeath_signal = 0;
 	} else {
 		would_dump(bprm, bprm->file);
@@ -2120,7 +2120,7 @@ void do_coredump(long signr, int exit_code, struct pt_regs *regs)
 	if (__get_dumpable(cprm.mm_flags) == 2) {
 		/* Setuid core dump mode */
 		flag = O_EXCL;		/* Stop rewrite attacks */
-		cred->fsuid = 0;	/* Dump root private */
+		cred->fsuid = GLOBAL_ROOT_UID;	/* Dump root private */
 	}
 
 	retval = coredump_wait(exit_code, &core_state);
@@ -2221,7 +2221,7 @@ void do_coredump(long signr, int exit_code, struct pt_regs *regs)
 		 * Dont allow local users get cute and trick others to coredump
 		 * into their pre-created files.
 		 */
-		if (inode->i_uid != current_fsuid())
+		if (!uid_eq(inode->i_uid, current_fsuid()))
 			goto close_fail;
 		if (!cprm.file->f_op || !cprm.file->f_op->write)
 			goto close_fail;
diff --git a/fs/fcntl.c b/fs/fcntl.c
index 75e7c1f..d078b75 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -532,9 +532,9 @@ static inline int sigio_perm(struct task_struct *p,
 
 	rcu_read_lock();
 	cred = __task_cred(p);
-	ret = ((fown->euid == 0 ||
-		fown->euid == cred->suid || fown->euid == cred->uid ||
-		fown->uid  == cred->suid || fown->uid  == cred->uid) &&
+	ret = ((uid_eq(fown->euid, GLOBAL_ROOT_UID) ||
+		uid_eq(fown->euid, cred->suid) || uid_eq(fown->euid, cred->uid) ||
+		uid_eq(fown->uid,  cred->suid) || uid_eq(fown->uid,  cred->uid)) &&
 	       !security_file_send_sigiotask(p, fown, sig));
 	rcu_read_unlock();
 	return ret;
diff --git a/fs/ioprio.c b/fs/ioprio.c
index 2072e41..5e6dbe89 100644
--- a/fs/ioprio.c
+++ b/fs/ioprio.c
@@ -37,8 +37,8 @@ int set_task_ioprio(struct task_struct *task, int ioprio)
 
 	rcu_read_lock();
 	tcred = __task_cred(task);
-	if (tcred->uid != cred->euid &&
-	    tcred->uid != cred->uid && !capable(CAP_SYS_NICE)) {
+	if (!uid_eq(tcred->uid, cred->euid) &&
+	    !uid_eq(tcred->uid, cred->uid) && !capable(CAP_SYS_NICE)) {
 		rcu_read_unlock();
 		return -EPERM;
 	}
diff --git a/fs/locks.c b/fs/locks.c
index 637694b..3e946cd 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1445,7 +1445,7 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
 	struct inode *inode = dentry->d_inode;
 	int error;
 
-	if ((current_fsuid() != inode->i_uid) && !capable(CAP_LEASE))
+	if ((!uid_eq(current_fsuid(), inode->i_uid)) && !capable(CAP_LEASE))
 		return -EACCES;
 	if (!S_ISREG(inode->i_mode))
 		return -EINVAL;
diff --git a/fs/namei.c b/fs/namei.c
index 941c436..86512b4 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -228,7 +228,7 @@ static int acl_permission_check(struct inode *inode, int mask)
 {
 	unsigned int mode = inode->i_mode;
 
-	if (likely(current_fsuid() == inode->i_uid))
+	if (likely(uid_eq(current_fsuid(), inode->i_uid)))
 		mode >>= 6;
 	else {
 		if (IS_POSIXACL(inode) && (mode & S_IRWXG)) {
@@ -1956,13 +1956,13 @@ static int user_path_parent(int dfd, const char __user *path,
  */
 static inline int check_sticky(struct inode *dir, struct inode *inode)
 {
-	uid_t fsuid = current_fsuid();
+	kuid_t fsuid = current_fsuid();
 
 	if (!(dir->i_mode & S_ISVTX))
 		return 0;
-	if (inode->i_uid == fsuid)
+	if (uid_eq(inode->i_uid, fsuid))
 		return 0;
-	if (dir->i_uid == fsuid)
+	if (uid_eq(dir->i_uid, fsuid))
 		return 0;
 	return !inode_capable(inode, CAP_FOWNER);
 }
diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h
index d93f95e..17b9773 100644
--- a/include/linux/quotaops.h
+++ b/include/linux/quotaops.h
@@ -22,8 +22,8 @@ static inline struct quota_info *sb_dqopt(struct super_block *sb)
 static inline bool is_quota_modification(struct inode *inode, struct iattr *ia)
 {
 	return (ia->ia_valid & ATTR_SIZE && ia->ia_size != inode->i_size) ||
-		(ia->ia_valid & ATTR_UID && ia->ia_uid != inode->i_uid) ||
-		(ia->ia_valid & ATTR_GID && ia->ia_gid != inode->i_gid);
+		(ia->ia_valid & ATTR_UID && !uid_eq(ia->ia_uid, inode->i_uid)) ||
+		(ia->ia_valid & ATTR_GID && !gid_eq(ia->ia_gid, inode->i_gid));
 }
 
 #if defined(CONFIG_QUOTA)
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 27/43] userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
  2012-04-08  5:10 ` Eric W. Biederman
                   ` (3 preceding siblings ...)
  (?)
@ 2012-04-08  5:15 ` Eric W. Biederman
       [not found]   ` <1333862139-31737-27-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  -1 siblings, 1 reply; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, linux-security-module, Linux Containers,
	Andrew Morton, Linus Torvalds, Al Viro, Cyrill Gorcunov,
	Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
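
Reader's note: the hunks below replace `==` and `!=` on raw ids with
uid_eq() and gid_eq(). The following is a minimal userspace sketch of why
that matters, assuming kuid_t is a one-member struct wrapper as in the
strict type-checking configuration of this series. `struct model_inode`,
`MODEL_S_ISVTX` and `model_check_sticky()` are hypothetical names used only
for illustration; they are not kernel definitions, and the capability check
is reduced to a boolean argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's kuid_t: wrapping the id in a struct
 * makes `a == b` on two kuids a compile error, so every comparison has to
 * go through uid_eq(). */
typedef struct { unsigned int val; } kuid_t;

static bool uid_eq(kuid_t left, kuid_t right)
{
	return left.val == right.val;
}

/* Hypothetical, reduced inode: only the fields the sticky check reads. */
struct model_inode {
	unsigned int mode;
	kuid_t uid;
};

#define MODEL_S_ISVTX 01000	/* sticky bit, same value as S_ISVTX */

/* Mirrors the shape of check_sticky(): returns 0 when the operation is
 * allowed and nonzero when it must be refused. has_fowner stands in for
 * the CAP_FOWNER capability test. */
static int model_check_sticky(const struct model_inode *dir,
			      const struct model_inode *inode,
			      kuid_t fsuid, bool has_fowner)
{
	assert(dir != NULL && inode != NULL);

	if (!(dir->mode & MODEL_S_ISVTX))
		return 0;
	if (uid_eq(inode->uid, fsuid))
		return 0;
	if (uid_eq(dir->uid, fsuid))
		return 0;
	return !has_fowner;
}
```

With the struct wrapper, an unconverted comparison such as
`inode->uid == fsuid` no longer compiles, so leftover raw comparisons are
caught at build time rather than at run time.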
 fs/attr.c                |    8 ++++----
 fs/exec.c                |   10 +++++-----
 fs/fcntl.c               |    6 +++---
 fs/ioprio.c              |    4 ++--
 fs/locks.c               |    2 +-
 fs/namei.c               |    8 ++++----
 include/linux/quotaops.h |    4 ++--
 7 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/fs/attr.c b/fs/attr.c
index 73f69a6..2f094c6 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -47,14 +47,14 @@ int inode_change_ok(const struct inode *inode, struct iattr *attr)
 
 	/* Make sure a caller can chown. */
 	if ((ia_valid & ATTR_UID) &&
-	    (current_fsuid() != inode->i_uid ||
-	     attr->ia_uid != inode->i_uid) && !capable(CAP_CHOWN))
+	    (!uid_eq(current_fsuid(), inode->i_uid) ||
+	     !uid_eq(attr->ia_uid, inode->i_uid)) && !capable(CAP_CHOWN))
 		return -EPERM;
 
 	/* Make sure caller can chgrp. */
 	if ((ia_valid & ATTR_GID) &&
-	    (current_fsuid() != inode->i_uid ||
-	    (!in_group_p(attr->ia_gid) && attr->ia_gid != inode->i_gid)) &&
+	    (!uid_eq(current_fsuid(), inode->i_uid) ||
+	    (!in_group_p(attr->ia_gid) && !gid_eq(attr->ia_gid, inode->i_gid))) &&
 	    !capable(CAP_CHOWN))
 		return -EPERM;
 
diff --git a/fs/exec.c b/fs/exec.c
index 9a1d9f0..00ae2ef 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1139,7 +1139,7 @@ void setup_new_exec(struct linux_binprm * bprm)
 	/* This is the point of no return */
 	current->sas_ss_sp = current->sas_ss_size = 0;
 
-	if (current_euid() == current_uid() && current_egid() == current_gid())
+	if (uid_eq(current_euid(), current_uid()) && gid_eq(current_egid(), current_gid()))
 		set_dumpable(current->mm, 1);
 	else
 		set_dumpable(current->mm, suid_dumpable);
@@ -1153,8 +1153,8 @@ void setup_new_exec(struct linux_binprm * bprm)
 	current->mm->task_size = TASK_SIZE;
 
 	/* install the new credentials */
-	if (bprm->cred->uid != current_euid() ||
-	    bprm->cred->gid != current_egid()) {
+	if (!uid_eq(bprm->cred->uid, current_euid()) ||
+	    !gid_eq(bprm->cred->gid, current_egid())) {
 		current->pdeath_signal = 0;
 	} else {
 		would_dump(bprm, bprm->file);
@@ -2120,7 +2120,7 @@ void do_coredump(long signr, int exit_code, struct pt_regs *regs)
 	if (__get_dumpable(cprm.mm_flags) == 2) {
 		/* Setuid core dump mode */
 		flag = O_EXCL;		/* Stop rewrite attacks */
-		cred->fsuid = 0;	/* Dump root private */
+		cred->fsuid = GLOBAL_ROOT_UID;	/* Dump root private */
 	}
 
 	retval = coredump_wait(exit_code, &core_state);
@@ -2221,7 +2221,7 @@ void do_coredump(long signr, int exit_code, struct pt_regs *regs)
 		 * Dont allow local users get cute and trick others to coredump
 		 * into their pre-created files.
 		 */
-		if (inode->i_uid != current_fsuid())
+		if (!uid_eq(inode->i_uid, current_fsuid()))
 			goto close_fail;
 		if (!cprm.file->f_op || !cprm.file->f_op->write)
 			goto close_fail;
diff --git a/fs/fcntl.c b/fs/fcntl.c
index 75e7c1f..d078b75 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -532,9 +532,9 @@ static inline int sigio_perm(struct task_struct *p,
 
 	rcu_read_lock();
 	cred = __task_cred(p);
-	ret = ((fown->euid == 0 ||
-		fown->euid == cred->suid || fown->euid == cred->uid ||
-		fown->uid  == cred->suid || fown->uid  == cred->uid) &&
+	ret = ((uid_eq(fown->euid, GLOBAL_ROOT_UID) ||
+		uid_eq(fown->euid, cred->suid) || uid_eq(fown->euid, cred->uid) ||
+		uid_eq(fown->uid,  cred->suid) || uid_eq(fown->uid,  cred->uid)) &&
 	       !security_file_send_sigiotask(p, fown, sig));
 	rcu_read_unlock();
 	return ret;
diff --git a/fs/ioprio.c b/fs/ioprio.c
index 2072e41..5e6dbe89 100644
--- a/fs/ioprio.c
+++ b/fs/ioprio.c
@@ -37,8 +37,8 @@ int set_task_ioprio(struct task_struct *task, int ioprio)
 
 	rcu_read_lock();
 	tcred = __task_cred(task);
-	if (tcred->uid != cred->euid &&
-	    tcred->uid != cred->uid && !capable(CAP_SYS_NICE)) {
+	if (!uid_eq(tcred->uid, cred->euid) &&
+	    !uid_eq(tcred->uid, cred->uid) && !capable(CAP_SYS_NICE)) {
 		rcu_read_unlock();
 		return -EPERM;
 	}
diff --git a/fs/locks.c b/fs/locks.c
index 637694b..3e946cd 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1445,7 +1445,7 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
 	struct inode *inode = dentry->d_inode;
 	int error;
 
-	if ((current_fsuid() != inode->i_uid) && !capable(CAP_LEASE))
+	if ((!uid_eq(current_fsuid(), inode->i_uid)) && !capable(CAP_LEASE))
 		return -EACCES;
 	if (!S_ISREG(inode->i_mode))
 		return -EINVAL;
diff --git a/fs/namei.c b/fs/namei.c
index 941c436..86512b4 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -228,7 +228,7 @@ static int acl_permission_check(struct inode *inode, int mask)
 {
 	unsigned int mode = inode->i_mode;
 
-	if (likely(current_fsuid() == inode->i_uid))
+	if (likely(uid_eq(current_fsuid(), inode->i_uid)))
 		mode >>= 6;
 	else {
 		if (IS_POSIXACL(inode) && (mode & S_IRWXG)) {
@@ -1956,13 +1956,13 @@ static int user_path_parent(int dfd, const char __user *path,
  */
 static inline int check_sticky(struct inode *dir, struct inode *inode)
 {
-	uid_t fsuid = current_fsuid();
+	kuid_t fsuid = current_fsuid();
 
 	if (!(dir->i_mode & S_ISVTX))
 		return 0;
-	if (inode->i_uid == fsuid)
+	if (uid_eq(inode->i_uid, fsuid))
 		return 0;
-	if (dir->i_uid == fsuid)
+	if (uid_eq(dir->i_uid, fsuid))
 		return 0;
 	return !inode_capable(inode, CAP_FOWNER);
 }
diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h
index d93f95e..17b9773 100644
--- a/include/linux/quotaops.h
+++ b/include/linux/quotaops.h
@@ -22,8 +22,8 @@ static inline struct quota_info *sb_dqopt(struct super_block *sb)
 static inline bool is_quota_modification(struct inode *inode, struct iattr *ia)
 {
 	return (ia->ia_valid & ATTR_SIZE && ia->ia_size != inode->i_size) ||
-		(ia->ia_valid & ATTR_UID && ia->ia_uid != inode->i_uid) ||
-		(ia->ia_valid & ATTR_GID && ia->ia_gid != inode->i_gid);
+		(ia->ia_valid & ATTR_UID && !uid_eq(ia->ia_uid, inode->i_uid)) ||
+		(ia->ia_valid & ATTR_GID && !gid_eq(ia->ia_gid, inode->i_gid));
 }
 
 #if defined(CONFIG_QUOTA)
-- 
1.7.2.5
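
Reader's note: the chgrp test in inode_change_ok() is the densest condition
this patch touches, and a dropped negation there silently changes who may
change a file's group. Going by the removed lines, the rule is: without
CAP_CHOWN the caller must own the file, and the new group must either be
one of the caller's groups or be the group the file already has. The
following is a minimal userspace sketch of that rule, written as the
positive "allowed" form. `struct model_caller`, `model_in_group()` and
`model_may_chgrp()` are hypothetical stand-ins (the real code uses
in_group_p() and capable()), and the typed ids are modelled as plain struct
wrappers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's typed ids (illustration only). */
typedef struct { unsigned int val; } kuid_t;
typedef struct { unsigned int val; } kgid_t;

static bool uid_eq(kuid_t a, kuid_t b) { return a.val == b.val; }
static bool gid_eq(kgid_t a, kgid_t b) { return a.val == b.val; }

/* Hypothetical caller credentials; groups[] plays the role of the set
 * that in_group_p() would consult. */
struct model_caller {
	kuid_t fsuid;
	const kgid_t *groups;
	size_t ngroups;
	bool cap_chown;
};

static bool model_in_group(const struct model_caller *c, kgid_t gid)
{
	for (size_t i = 0; i < c->ngroups; i++)
		if (gid_eq(c->groups[i], gid))
			return true;
	return false;
}

/* True when the group change would be allowed; the kernel check returns
 * -EPERM in exactly the cases where this returns false. */
static bool model_may_chgrp(const struct model_caller *c, kuid_t owner,
			    kgid_t cur_gid, kgid_t new_gid)
{
	assert(c != NULL);

	if (c->cap_chown)
		return true;
	if (!uid_eq(c->fsuid, owner))
		return false;
	return model_in_group(c, new_gid) || gid_eq(new_gid, cur_gid);
}
```

Negating the last line gives the "deny" form used in the kernel source:
not in the group and not equal to the current gid.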


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 28/43] userns: Convert user specified uids and gids in chown into kuids and kgids
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
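
Reader's note: chown_common() below translates the ids passed by userspace
into kernel ids with make_kuid()/make_kgid() and rejects values that have
no mapping, while still treating (uid_t)-1 as "leave unchanged". The
following is a minimal userspace sketch of that flow for the uid half,
assuming a namespace described by a single mapping extent. `struct
model_ns` and `model_chown_uid()` are hypothetical, and make_kuid() and
uid_valid() here are simplified stand-ins for the kernel helpers of the
same name.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t val; } kuid_t;

/* Hypothetical single-extent mapping: ids [first, first + count) inside
 * the namespace correspond to [lower_first, lower_first + count) in the
 * kernel's global id space. */
struct model_ns {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

/* Stand-in for make_kuid(): an unmapped id yields the invalid value -1. */
static kuid_t make_kuid(const struct model_ns *ns, uint32_t uid)
{
	kuid_t kuid = { UINT32_MAX };

	if (uid >= ns->first && uid - ns->first < ns->count)
		kuid.val = ns->lower_first + (uid - ns->first);
	return kuid;
}

static bool uid_valid(kuid_t uid)
{
	return uid.val != UINT32_MAX;
}

/* Mirrors the uid half of chown_common(): *set says whether the owner
 * would be changed, and *out receives the kernel id when it is. */
static int model_chown_uid(const struct model_ns *ns, uint32_t user,
			   bool *set, kuid_t *out)
{
	kuid_t uid;

	assert(ns != NULL && set != NULL && out != NULL);
	uid = make_kuid(ns, user);
	*set = false;
	if (user != UINT32_MAX) {	/* (uid_t)-1 means "do not change" */
		if (!uid_valid(uid))
			return -EINVAL;
		*set = true;
		*out = uid;
	}
	return 0;
}
```

Because -1 never maps to a valid kernel id, the "do not change" sentinel
has to be tested on the raw userspace value before the validity check,
which is the order the patch uses.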
 fs/open.c |   13 +++++++++++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 92335f6..e166801 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -506,15 +506,24 @@ static int chown_common(struct path *path, uid_t user, gid_t group)
 	struct inode *inode = path->dentry->d_inode;
 	int error;
 	struct iattr newattrs;
+	kuid_t uid;
+	kgid_t gid;
+
+	uid = make_kuid(current_user_ns(), user);
+	gid = make_kgid(current_user_ns(), group);
 
 	newattrs.ia_valid =  ATTR_CTIME;
 	if (user != (uid_t) -1) {
+		if (!uid_valid(uid))
+			return -EINVAL;
 		newattrs.ia_valid |= ATTR_UID;
-		newattrs.ia_uid = user;
+		newattrs.ia_uid = uid;
 	}
 	if (group != (gid_t) -1) {
+		if (!gid_valid(gid))
+			return -EINVAL;
 		newattrs.ia_valid |= ATTR_GID;
-		newattrs.ia_gid = group;
+		newattrs.ia_gid = gid;
 	}
 	if (!S_ISDIR(inode->i_mode))
 		newattrs.ia_valid |=
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 29/43] userns: Convert stat to return values mapped from kuids and kgids
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

- Store uids and gids with kuid_t and kgid_t in struct kstat
- Convert uids and gids to userspace usable values with
  from_kuid_munged and from_kgid_munged

Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
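
Reader's note: every stat copy-out path below goes through
from_kuid_munged()/from_kgid_munged(), so an owner that has no mapping in
the caller's namespace is reported as the overflow id instead of an error
value. The following is a minimal userspace sketch of the two conversions,
assuming a single-extent mapping. `struct model_ns` is hypothetical, and
`MODEL_OVERFLOW_UID` stands in for the kernel's overflowuid setting, using
its default of 65534.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t val; } kuid_t;

/* Hypothetical single-extent mapping, as seen from inside the namespace:
 * [first, first + count) maps onto [lower_first, lower_first + count). */
struct model_ns {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

#define MODEL_OVERFLOW_UID 65534u	/* default overflowuid */

/* Stand-in for from_kuid(): returns (uid_t)-1 when kuid has no mapping. */
static uint32_t from_kuid(const struct model_ns *ns, kuid_t kuid)
{
	assert(ns != NULL);
	if (kuid.val >= ns->lower_first &&
	    kuid.val - ns->lower_first < ns->count)
		return ns->first + (kuid.val - ns->lower_first);
	return UINT32_MAX;
}

/* Stand-in for from_kuid_munged(): never fails, because callers such as
 * stat have no way to report "unmapped" to userspace. */
static uint32_t from_kuid_munged(const struct model_ns *ns, kuid_t kuid)
{
	uint32_t uid = from_kuid(ns, kuid);

	return uid == UINT32_MAX ? MODEL_OVERFLOW_UID : uid;
}
```

The munged variant suits stat because the call should succeed even when
the file's owner is invisible to the caller. Paths that must refuse
unmapped ids use the plain variant and test for -1.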
 arch/arm/kernel/sys_oabi-compat.c |    4 ++--
 arch/parisc/hpux/fs.c             |    4 ++--
 arch/s390/kernel/compat_linux.c   |    4 ++--
 arch/sparc/kernel/sys_sparc32.c   |    4 ++--
 arch/x86/ia32/sys_ia32.c          |    4 ++--
 fs/compat.c                       |    4 ++--
 fs/stat.c                         |    8 ++++----
 include/linux/stat.h              |    5 +++--
 8 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c
index af0aaeb..3e94811 100644
--- a/arch/arm/kernel/sys_oabi-compat.c
+++ b/arch/arm/kernel/sys_oabi-compat.c
@@ -124,8 +124,8 @@ static long cp_oldabi_stat64(struct kstat *stat,
 	tmp.__st_ino = stat->ino;
 	tmp.st_mode = stat->mode;
 	tmp.st_nlink = stat->nlink;
-	tmp.st_uid = stat->uid;
-	tmp.st_gid = stat->gid;
+	tmp.st_uid = from_kuid_munged(current_user_ns(), stat->uid);
+	tmp.st_gid = from_kgid_munged(current_user_ns(), stat->gid);
 	tmp.st_rdev = huge_encode_dev(stat->rdev);
 	tmp.st_size = stat->size;
 	tmp.st_blocks = stat->blocks;
diff --git a/arch/parisc/hpux/fs.c b/arch/parisc/hpux/fs.c
index 0dc8543..c71eb6c 100644
--- a/arch/parisc/hpux/fs.c
+++ b/arch/parisc/hpux/fs.c
@@ -159,8 +159,8 @@ static int cp_hpux_stat(struct kstat *stat, struct hpux_stat64 __user *statbuf)
 	tmp.st_ino = stat->ino;
 	tmp.st_mode = stat->mode;
 	tmp.st_nlink = stat->nlink;
-	tmp.st_uid = stat->uid;
-	tmp.st_gid = stat->gid;
+	tmp.st_uid = from_kuid_munged(current_user_ns(), stat->uid);
+	tmp.st_gid = from_kgid_munged(current_user_ns(), stat->gid);
 	tmp.st_rdev = new_encode_dev(stat->rdev);
 	tmp.st_size = stat->size;
 	tmp.st_atime = stat->atime.tv_sec;
diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
index 5baac18..80ab23a 100644
--- a/arch/s390/kernel/compat_linux.c
+++ b/arch/s390/kernel/compat_linux.c
@@ -546,8 +546,8 @@ static int cp_stat64(struct stat64_emu31 __user *ubuf, struct kstat *stat)
 	tmp.__st_ino = (u32)stat->ino;
 	tmp.st_mode = stat->mode;
 	tmp.st_nlink = (unsigned int)stat->nlink;
-	tmp.st_uid = stat->uid;
-	tmp.st_gid = stat->gid;
+	tmp.st_uid = from_kuid_munged(current_user_ns(), stat->uid);
+	tmp.st_gid = from_kgid_munged(current_user_ns(), stat->gid);
 	tmp.st_rdev = huge_encode_dev(stat->rdev);
 	tmp.st_size = stat->size;
 	tmp.st_blksize = (u32)stat->blksize;
diff --git a/arch/sparc/kernel/sys_sparc32.c b/arch/sparc/kernel/sys_sparc32.c
index 29c478f..f739233 100644
--- a/arch/sparc/kernel/sys_sparc32.c
+++ b/arch/sparc/kernel/sys_sparc32.c
@@ -139,8 +139,8 @@ static int cp_compat_stat64(struct kstat *stat,
 	err |= put_user(stat->ino, &statbuf->st_ino);
 	err |= put_user(stat->mode, &statbuf->st_mode);
 	err |= put_user(stat->nlink, &statbuf->st_nlink);
-	err |= put_user(stat->uid, &statbuf->st_uid);
-	err |= put_user(stat->gid, &statbuf->st_gid);
+	err |= put_user(from_kuid_munged(current_user_ns(), stat->uid), &statbuf->st_uid);
+	err |= put_user(from_kgid_munged(current_user_ns(), stat->gid), &statbuf->st_gid);
 	err |= put_user(huge_encode_dev(stat->rdev), &statbuf->st_rdev);
 	err |= put_user(0, (unsigned long __user *) &statbuf->__pad3[0]);
 	err |= put_user(stat->size, &statbuf->st_size);
diff --git a/arch/x86/ia32/sys_ia32.c b/arch/x86/ia32/sys_ia32.c
index aec2202..d5c820a 100644
--- a/arch/x86/ia32/sys_ia32.c
+++ b/arch/x86/ia32/sys_ia32.c
@@ -71,8 +71,8 @@ static int cp_stat64(struct stat64 __user *ubuf, struct kstat *stat)
 {
 	typeof(ubuf->st_uid) uid = 0;
 	typeof(ubuf->st_gid) gid = 0;
-	SET_UID(uid, stat->uid);
-	SET_GID(gid, stat->gid);
+	SET_UID(uid, from_kuid_munged(current_user_ns(), stat->uid));
+	SET_GID(gid, from_kgid_munged(current_user_ns(), stat->gid));
 	if (!access_ok(VERIFY_WRITE, ubuf, sizeof(struct stat64)) ||
 	    __put_user(huge_encode_dev(stat->dev), &ubuf->st_dev) ||
 	    __put_user(stat->ino, &ubuf->__st_ino) ||
diff --git a/fs/compat.c b/fs/compat.c
index f2944ac..0781e61 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -144,8 +144,8 @@ static int cp_compat_stat(struct kstat *stat, struct compat_stat __user *ubuf)
 	tmp.st_nlink = stat->nlink;
 	if (tmp.st_nlink != stat->nlink)
 		return -EOVERFLOW;
-	SET_UID(tmp.st_uid, stat->uid);
-	SET_GID(tmp.st_gid, stat->gid);
+	SET_UID(tmp.st_uid, from_kuid_munged(current_user_ns(), stat->uid));
+	SET_GID(tmp.st_gid, from_kgid_munged(current_user_ns(), stat->gid));
 	tmp.st_rdev = old_encode_dev(stat->rdev);
 	if ((u64) stat->size > MAX_NON_LFS)
 		return -EOVERFLOW;
diff --git a/fs/stat.c b/fs/stat.c
index c733dc5..fca17f9 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -137,8 +137,8 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta
 	tmp.st_nlink = stat->nlink;
 	if (tmp.st_nlink != stat->nlink)
 		return -EOVERFLOW;
-	SET_UID(tmp.st_uid, stat->uid);
-	SET_GID(tmp.st_gid, stat->gid);
+	SET_UID(tmp.st_uid, from_kuid_munged(current_user_ns(), stat->uid));
+	SET_GID(tmp.st_gid, from_kgid_munged(current_user_ns(), stat->gid));
 	tmp.st_rdev = old_encode_dev(stat->rdev);
 #if BITS_PER_LONG == 32
 	if (stat->size > MAX_NON_LFS)
@@ -215,8 +215,8 @@ static int cp_new_stat(struct kstat *stat, struct stat __user *statbuf)
 	tmp.st_nlink = stat->nlink;
 	if (tmp.st_nlink != stat->nlink)
 		return -EOVERFLOW;
-	SET_UID(tmp.st_uid, stat->uid);
-	SET_GID(tmp.st_gid, stat->gid);
+	SET_UID(tmp.st_uid, from_kuid_munged(current_user_ns(), stat->uid));
+	SET_GID(tmp.st_gid, from_kgid_munged(current_user_ns(), stat->gid));
 #if BITS_PER_LONG == 32
 	tmp.st_rdev = old_encode_dev(stat->rdev);
 #else
diff --git a/include/linux/stat.h b/include/linux/stat.h
index 611c398..4613240 100644
--- a/include/linux/stat.h
+++ b/include/linux/stat.h
@@ -58,14 +58,15 @@
 
 #include <linux/types.h>
 #include <linux/time.h>
+#include <linux/uidgid.h>
 
 struct kstat {
 	u64		ino;
 	dev_t		dev;
 	umode_t		mode;
 	unsigned int	nlink;
-	uid_t		uid;
-	gid_t		gid;
+	kuid_t		uid;
+	kgid_t		gid;
 	dev_t		rdev;
 	loff_t		size;
 	struct timespec  atime;
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 30/43] userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
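The rule this patch adds can be modelled in userspace roughly as follows.
This is only a sketch under stated assumptions: struct model_ns, with a
single contiguous extent, and every model_* name are hypothetical stand-ins
for the kernel's id maps and for kuid_has_mapping()/kgid_has_mapping(); it
is not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_INVALID_ID ((uint32_t)-1)

/* Hypothetical stand-in for a namespace's id map: one contiguous extent. */
struct model_ns {
	uint32_t first;       /* first id as seen inside the namespace */
	uint32_t lower_first; /* kernel-global id that `first` maps to */
	uint32_t count;       /* number of consecutive ids mapped */
};

/* Translate a kernel-global id into the namespace, or MODEL_INVALID_ID. */
uint32_t model_from_kid(const struct model_ns *ns, uint32_t kid)
{
	assert(ns != NULL);
	if (kid >= ns->lower_first && kid - ns->lower_first < ns->count)
		return ns->first + (kid - ns->lower_first);
	return MODEL_INVALID_ID;
}

/* Models the kuid_has_mapping()/kgid_has_mapping() test used in the patch. */
bool model_has_mapping(const struct model_ns *ns, uint32_t kid)
{
	return model_from_kid(ns, kid) != MODEL_INVALID_ID;
}

/*
 * Models the set-uid branch of prepare_binprm(): refuse the exec when the
 * file owner has no name in the caller's namespace, otherwise adopt it.
 */
int model_apply_setuid(const struct model_ns *ns, uint32_t file_owner,
		       uint32_t *euid)
{
	if (!model_has_mapping(ns, file_owner))
		return -EPERM;
	*euid = file_owner;
	return 0;
}
```

With a namespace that maps ids 0..999 onto global ids 100000..100999, a
set-uid file owned by global id 100005 is accepted, while one owned by
global id 0 (real root) is refused with -EPERM and the caller's euid is left
untouched.
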
 fs/exec.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 00ae2ef..e001bdf 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1291,8 +1291,11 @@ int prepare_binprm(struct linux_binprm *bprm)
 	if (!(bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)) {
 		/* Set-uid? */
 		if (mode & S_ISUID) {
+			if (!kuid_has_mapping(bprm->cred->user_ns, inode->i_uid))
+				return -EPERM;
 			bprm->per_clear |= PER_CLEAR_ON_SETID;
 			bprm->cred->euid = inode->i_uid;
+
 		}
 
 		/* Set-gid? */
@@ -1302,6 +1305,8 @@ int prepare_binprm(struct linux_binprm *bprm)
 		 * executable.
 		 */
 		if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
+			if (!kgid_has_mapping(bprm->cred->user_ns, inode->i_gid))
+				return -EPERM;
 			bprm->per_clear |= PER_CLEAR_ON_SETID;
 			bprm->cred->egid = inode->i_gid;
 		}
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 31/43] userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
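The change in behaviour can be illustrated with a small userspace model.
All names here (struct model_ns, model_owner_maps, model_inode_capable_old,
model_inode_capable_new) are hypothetical, the id map is reduced to a single
extent, and the capability check is reduced to a boolean, so treat this as a
sketch of the idea rather than the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a namespace's uid map: one contiguous extent. */
struct model_ns {
	uint32_t first;       /* first uid as seen inside the namespace */
	uint32_t lower_first; /* kernel-global uid that `first` maps to */
	uint32_t count;       /* number of consecutive uids mapped */
};

/* True when the kernel-global owner uid has a name inside `ns`. */
bool model_owner_maps(const struct model_ns *ns, uint32_t owner_kuid)
{
	assert(ns != NULL);
	return owner_kuid >= ns->lower_first &&
	       owner_kuid - ns->lower_first < ns->count;
}

/* Old rule: only a caller in the initial namespace could ever pass. */
bool model_inode_capable_old(const struct model_ns *ns,
			     const struct model_ns *init_ns, bool has_cap)
{
	return has_cap && ns == init_ns;
}

/* New rule: the caller passes when the inode's owner maps into its namespace. */
bool model_inode_capable_new(const struct model_ns *ns, bool has_cap,
			     uint32_t owner_kuid)
{
	return has_cap && model_owner_maps(ns, owner_kuid);
}
```

In this model a capable task in a child namespace covering global uids
100000..165535 fails the old check unconditionally, passes the new check for
an inode owned by global uid 100010, and still fails it for an inode owned
by global uid 0.
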
 kernel/capability.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/capability.c b/kernel/capability.c
index cc5f071..493d972 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -429,12 +429,14 @@ bool nsown_capable(int cap)
  * targeted at it's own user namespace and that the given inode is owned
  * by the current user namespace or a child namespace.
  *
- * Currently inodes can only be owned by the initial user namespace.
+ * Currently we check to see if an inode is owned by the current
+ * user namespace by seeing if the inode's owner maps into the
+ * current user namespace.
  *
  */
 bool inode_capable(const struct inode *inode, int cap)
 {
 	struct user_namespace *ns = current_user_ns();
 
-	return ns_capable(ns, cap) && (ns == &init_user_ns);
+	return ns_capable(ns, cap) && kuid_has_mapping(ns, inode->i_uid);
 }
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 32/43] userns: signal remove unnecessary map_cred_ns
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

map_cred_ns is a light wrapper around from_kuid_munged with the order of the
arguments reversed.  Replace its callers with direct calls to
from_kuid_munged and remove map_cred_ns.

Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
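Why the wrapper is redundant can be seen in a small userspace model. The
names below (struct model_ns, struct model_cred, model_from_kuid_munged,
model_map_cred_ns) are hypothetical, the uid map is reduced to one extent,
and 65534 is assumed as the overflow uid reported for unmapped ids; this is
a sketch, not the code in kernel/signal.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_OVERFLOW_UID 65534u /* assumed overflow uid for unmapped ids */

/* Hypothetical stand-in for a namespace's uid map: one contiguous extent. */
struct model_ns {
	uint32_t first;       /* first uid as seen inside the namespace */
	uint32_t lower_first; /* kernel-global uid that `first` maps to */
	uint32_t count;       /* number of consecutive uids mapped */
};

/* Hypothetical stand-in for the one credential field used here. */
struct model_cred {
	uint32_t uid; /* kernel-global uid */
};

/* Map a kernel-global uid into `ns`; unmapped uids become the overflow uid. */
uint32_t model_from_kuid_munged(const struct model_ns *ns, uint32_t kuid)
{
	assert(ns != NULL);
	if (kuid >= ns->lower_first && kuid - ns->lower_first < ns->count)
		return ns->first + (kuid - ns->lower_first);
	return MODEL_OVERFLOW_UID;
}

/* Shape of the removed helper: the same work, arguments in the other order. */
uint32_t model_map_cred_ns(const struct model_cred *cred,
			   const struct model_ns *ns)
{
	return model_from_kuid_munged(ns, cred->uid);
}
```

Both calls give the same si_uid in this model, so the direct call loses
nothing: a child with global uid 100007 is reported to a parent whose
namespace maps 100000..100999 as uid 7, and a child whose uid is not mapped
there is reported as the overflow uid.
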
 kernel/signal.c |   20 +++++---------------
 1 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 9797939..6aca310 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1019,15 +1019,6 @@ static inline int legacy_queue(struct sigpending *signals, int sig)
 	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
 }
 
-/*
- * map the uid in struct cred into user namespace *ns
- */
-static inline uid_t map_cred_ns(const struct cred *cred,
-				struct user_namespace *ns)
-{
-	return from_kuid_munged(ns, cred->uid);
-}
-
 #ifdef CONFIG_USER_NS
 static inline void userns_fixup_signal_uid(struct siginfo *info, struct task_struct *t)
 {
@@ -1677,8 +1668,8 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 	 */
 	rcu_read_lock();
 	info.si_pid = task_pid_nr_ns(tsk, tsk->parent->nsproxy->pid_ns);
-	info.si_uid = map_cred_ns(__task_cred(tsk),
-			task_cred_xxx(tsk->parent, user_ns));
+	info.si_uid = from_kuid_munged(task_cred_xxx(tsk->parent, user_ns),
+				       task_uid(tsk));
 	rcu_read_unlock();
 
 	info.si_utime = cputime_to_clock_t(tsk->utime + tsk->signal->utime);
@@ -1761,8 +1752,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	 */
 	rcu_read_lock();
 	info.si_pid = task_pid_nr_ns(tsk, parent->nsproxy->pid_ns);
-	info.si_uid = map_cred_ns(__task_cred(tsk),
-			task_cred_xxx(parent, user_ns));
+	info.si_uid = from_kuid_munged(task_cred_xxx(parent, user_ns), task_uid(tsk));
 	rcu_read_unlock();
 
 	info.si_utime = cputime_to_clock_t(tsk->utime);
@@ -2180,8 +2170,8 @@ static int ptrace_signal(int signr, siginfo_t *info,
 		info->si_code = SI_USER;
 		rcu_read_lock();
 		info->si_pid = task_pid_vnr(current->parent);
-		info->si_uid = map_cred_ns(__task_cred(current->parent),
-				current_user_ns());
+		info->si_uid = from_kuid_munged(current_user_ns(),
+						task_uid(current->parent));
 		rcu_read_unlock();
 	}
 
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 33/43] userns: Convert binary formats to use kuid/kgid where appropriate
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/binfmt_elf.c       |   12 ++++++------
 fs/binfmt_elf_fdpic.c |   12 ++++++------
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 48ffb3d..efc6731 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -228,10 +228,10 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
 	NEW_AUX_ENT(AT_BASE, interp_load_addr);
 	NEW_AUX_ENT(AT_FLAGS, 0);
 	NEW_AUX_ENT(AT_ENTRY, exec->e_entry);
-	NEW_AUX_ENT(AT_UID, cred->uid);
-	NEW_AUX_ENT(AT_EUID, cred->euid);
-	NEW_AUX_ENT(AT_GID, cred->gid);
-	NEW_AUX_ENT(AT_EGID, cred->egid);
+	NEW_AUX_ENT(AT_UID, from_kuid_munged(cred->user_ns, cred->uid));
+	NEW_AUX_ENT(AT_EUID, from_kuid_munged(cred->user_ns, cred->euid));
+	NEW_AUX_ENT(AT_GID, from_kgid_munged(cred->user_ns, cred->gid));
+	NEW_AUX_ENT(AT_EGID, from_kgid_munged(cred->user_ns, cred->egid));
  	NEW_AUX_ENT(AT_SECURE, security_bprm_secureexec(bprm));
 	NEW_AUX_ENT(AT_RANDOM, (elf_addr_t)(unsigned long)u_rand_bytes);
 	NEW_AUX_ENT(AT_EXECFN, bprm->exec);
@@ -1367,8 +1367,8 @@ static int fill_psinfo(struct elf_prpsinfo *psinfo, struct task_struct *p,
 	psinfo->pr_flag = p->flags;
 	rcu_read_lock();
 	cred = __task_cred(p);
-	SET_UID(psinfo->pr_uid, cred->uid);
-	SET_GID(psinfo->pr_gid, cred->gid);
+	SET_UID(psinfo->pr_uid, from_kuid_munged(cred->user_ns, cred->uid));
+	SET_GID(psinfo->pr_gid, from_kgid_munged(cred->user_ns, cred->gid));
 	rcu_read_unlock();
 	strncpy(psinfo->pr_fname, p->comm, sizeof(psinfo->pr_fname));
 	
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index 9bd5612..82bf0ed 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -631,10 +631,10 @@ static int create_elf_fdpic_tables(struct linux_binprm *bprm,
 	NEW_AUX_ENT(AT_BASE,	interp_params->elfhdr_addr);
 	NEW_AUX_ENT(AT_FLAGS,	0);
 	NEW_AUX_ENT(AT_ENTRY,	exec_params->entry_addr);
-	NEW_AUX_ENT(AT_UID,	(elf_addr_t) cred->uid);
-	NEW_AUX_ENT(AT_EUID,	(elf_addr_t) cred->euid);
-	NEW_AUX_ENT(AT_GID,	(elf_addr_t) cred->gid);
-	NEW_AUX_ENT(AT_EGID,	(elf_addr_t) cred->egid);
+	NEW_AUX_ENT(AT_UID,	(elf_addr_t) from_kuid_munged(cred->user_ns, cred->uid));
+	NEW_AUX_ENT(AT_EUID,	(elf_addr_t) from_kuid_munged(cred->user_ns, cred->euid));
+	NEW_AUX_ENT(AT_GID,	(elf_addr_t) from_kgid_munged(cred->user_ns, cred->gid));
+	NEW_AUX_ENT(AT_EGID,	(elf_addr_t) from_kgid_munged(cred->user_ns, cred->egid));
 	NEW_AUX_ENT(AT_SECURE,	security_bprm_secureexec(bprm));
 	NEW_AUX_ENT(AT_EXECFN,	bprm->exec);
 
@@ -1431,8 +1431,8 @@ static int fill_psinfo(struct elf_prpsinfo *psinfo, struct task_struct *p,
 	psinfo->pr_flag = p->flags;
 	rcu_read_lock();
 	cred = __task_cred(p);
-	SET_UID(psinfo->pr_uid, cred->uid);
-	SET_GID(psinfo->pr_gid, cred->gid);
+	SET_UID(psinfo->pr_uid, from_kuid_munged(cred->user_ns, cred->uid));
+	SET_GID(psinfo->pr_gid, from_kgid_munged(cred->user_ns, cred->gid));
 	rcu_read_unlock();
 	strncpy(psinfo->pr_fname, p->comm, sizeof(psinfo->pr_fname));
 
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 34/43] userns: Convert devpts to use kuid/kgid where appropriate
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
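The uid= and gid= handling in this patch follows a convert-then-validate
pattern, which the following userspace model tries to show. The names
(struct model_ns, struct model_opts, model_make_kid, model_id_valid,
model_set_uid_option) are hypothetical, and the id map is reduced to a
single extent, so this is a sketch of the pattern rather than the devpts
code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_INVALID_ID ((uint32_t)-1)

/* Hypothetical stand-in for a namespace's uid map: one contiguous extent. */
struct model_ns {
	uint32_t first;       /* first uid as seen inside the namespace */
	uint32_t lower_first; /* kernel-global uid that `first` maps to */
	uint32_t count;       /* number of consecutive uids mapped */
};

/* Hypothetical subset of the parsed mount options. */
struct model_opts {
	int setuid;   /* non-zero once a uid= option was accepted */
	uint32_t uid; /* stored in kernel-global form */
};

/* Translate a namespace-local id to a kernel-global one, or MODEL_INVALID_ID. */
uint32_t model_make_kid(const struct model_ns *ns, uint32_t id)
{
	assert(ns != NULL);
	if (id >= ns->first && id - ns->first < ns->count)
		return ns->lower_first + (id - ns->first);
	return MODEL_INVALID_ID;
}

bool model_id_valid(uint32_t kid)
{
	return kid != MODEL_INVALID_ID;
}

/* Models the Opt_uid case: convert in the mounter's namespace, then check. */
int model_set_uid_option(const struct model_ns *ns, int option,
			 struct model_opts *opts)
{
	uint32_t kid = model_make_kid(ns, (uint32_t)option);

	if (!model_id_valid(kid))
		return -EINVAL;
	opts->uid = kid;
	opts->setuid = 1;
	return 0;
}
```

In this model, mounting with uid=5 from a namespace that maps 0..999 onto
100000..100999 stores the global id 100005, while uid=5000 has no mapping
and is rejected with -EINVAL before the options are touched.
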
 fs/devpts/inode.c |   24 ++++++++++++++++--------
 1 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index 10f5e0b..979c1e3 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -98,8 +98,8 @@ static struct vfsmount *devpts_mnt;
 struct pts_mount_opts {
 	int setuid;
 	int setgid;
-	uid_t   uid;
-	gid_t   gid;
+	kuid_t   uid;
+	kgid_t   gid;
 	umode_t mode;
 	umode_t ptmxmode;
 	int newinstance;
@@ -158,11 +158,13 @@ static inline struct super_block *pts_sb_from_inode(struct inode *inode)
 static int parse_mount_options(char *data, int op, struct pts_mount_opts *opts)
 {
 	char *p;
+	kuid_t uid;
+	kgid_t gid;
 
 	opts->setuid  = 0;
 	opts->setgid  = 0;
-	opts->uid     = 0;
-	opts->gid     = 0;
+	opts->uid     = GLOBAL_ROOT_UID;
+	opts->gid     = GLOBAL_ROOT_GID;
 	opts->mode    = DEVPTS_DEFAULT_MODE;
 	opts->ptmxmode = DEVPTS_DEFAULT_PTMX_MODE;
 	opts->max     = NR_UNIX98_PTY_MAX;
@@ -184,13 +186,19 @@ static int parse_mount_options(char *data, int op, struct pts_mount_opts *opts)
 		case Opt_uid:
 			if (match_int(&args[0], &option))
 				return -EINVAL;
-			opts->uid = option;
+			uid = make_kuid(current_user_ns(), option);
+			if (!uid_valid(uid))
+				return -EINVAL;
+			opts->uid = uid;
 			opts->setuid = 1;
 			break;
 		case Opt_gid:
 			if (match_int(&args[0], &option))
 				return -EINVAL;
-			opts->gid = option;
+			gid = make_kgid(current_user_ns(), option);
+			if (!gid_valid(gid))
+				return -EINVAL;
+			opts->gid = gid;
 			opts->setgid = 1;
 			break;
 		case Opt_mode:
@@ -315,9 +323,9 @@ static int devpts_show_options(struct seq_file *seq, struct dentry *root)
 	struct pts_mount_opts *opts = &fsi->mount_opts;
 
 	if (opts->setuid)
-		seq_printf(seq, ",uid=%u", opts->uid);
+		seq_printf(seq, ",uid=%u", from_kuid_munged(&init_user_ns, opts->uid));
 	if (opts->setgid)
-		seq_printf(seq, ",gid=%u", opts->gid);
+		seq_printf(seq, ",gid=%u", from_kgid_munged(&init_user_ns, opts->gid));
 	seq_printf(seq, ",mode=%03o", opts->mode);
 #ifdef CONFIG_DEVPTS_MULTIPLE_INSTANCES
 	seq_printf(seq, ",ptmxmode=%03o", opts->ptmxmode);
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 34/43] userns: Convert devpts to use kuid/kgid where appropriate
@ 2012-04-08  5:15     ` "Eric W. Beiderman
  0 siblings, 0 replies; 227+ messages in thread
From: "Eric W. Beiderman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/devpts/inode.c |   24 ++++++++++++++++--------
 1 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index 10f5e0b..979c1e3 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -98,8 +98,8 @@ static struct vfsmount *devpts_mnt;
 struct pts_mount_opts {
 	int setuid;
 	int setgid;
-	uid_t   uid;
-	gid_t   gid;
+	kuid_t   uid;
+	kgid_t   gid;
 	umode_t mode;
 	umode_t ptmxmode;
 	int newinstance;
@@ -158,11 +158,13 @@ static inline struct super_block *pts_sb_from_inode(struct inode *inode)
 static int parse_mount_options(char *data, int op, struct pts_mount_opts *opts)
 {
 	char *p;
+	kuid_t uid;
+	kgid_t gid;
 
 	opts->setuid  = 0;
 	opts->setgid  = 0;
-	opts->uid     = 0;
-	opts->gid     = 0;
+	opts->uid     = GLOBAL_ROOT_UID;
+	opts->gid     = GLOBAL_ROOT_GID;
 	opts->mode    = DEVPTS_DEFAULT_MODE;
 	opts->ptmxmode = DEVPTS_DEFAULT_PTMX_MODE;
 	opts->max     = NR_UNIX98_PTY_MAX;
@@ -184,13 +186,19 @@ static int parse_mount_options(char *data, int op, struct pts_mount_opts *opts)
 		case Opt_uid:
 			if (match_int(&args[0], &option))
 				return -EINVAL;
-			opts->uid = option;
+			uid = make_kuid(current_user_ns(), option);
+			if (!uid_valid(uid))
+				return -EINVAL;
+			opts->uid = uid;
 			opts->setuid = 1;
 			break;
 		case Opt_gid:
 			if (match_int(&args[0], &option))
 				return -EINVAL;
-			opts->gid = option;
+			gid = make_kgid(current_user_ns(), option);
+			if (!gid_valid(gid))
+				return -EINVAL;
+			opts->gid = gid;
 			opts->setgid = 1;
 			break;
 		case Opt_mode:
@@ -315,9 +323,9 @@ static int devpts_show_options(struct seq_file *seq, struct dentry *root)
 	struct pts_mount_opts *opts = &fsi->mount_opts;
 
 	if (opts->setuid)
-		seq_printf(seq, ",uid=%u", opts->uid);
+		seq_printf(seq, ",uid=%u", from_kuid_munged(&init_user_ns, opts->uid));
 	if (opts->setgid)
-		seq_printf(seq, ",gid=%u", opts->gid);
+		seq_printf(seq, ",gid=%u", from_kgid_munged(&init_user_ns, opts->gid));
 	seq_printf(seq, ",mode=%03o", opts->mode);
 #ifdef CONFIG_DEVPTS_MULTIPLE_INSTANCES
 	seq_printf(seq, ",ptmxmode=%03o", opts->ptmxmode);
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 35/43] userns: Convert ext2 to use kuid/kgid where appropriate.
  2012-04-08  5:10 ` Eric W. Biederman
  (?)
@ 2012-04-08  5:15     ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro,
	linux-fsdevel, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 fs/ext2/balloc.c |    5 +++--
 fs/ext2/ext2.h   |    8 ++++----
 fs/ext2/inode.c  |   20 ++++++++++++--------
 fs/ext2/super.c  |   31 +++++++++++++++++++++++--------
 4 files changed, 42 insertions(+), 22 deletions(-)

diff --git a/fs/ext2/balloc.c b/fs/ext2/balloc.c
index a8cbe1b..030c6d2 100644
--- a/fs/ext2/balloc.c
+++ b/fs/ext2/balloc.c
@@ -1193,8 +1193,9 @@ static int ext2_has_free_blocks(struct ext2_sb_info *sbi)
 	free_blocks = percpu_counter_read_positive(&sbi->s_freeblocks_counter);
 	root_blocks = le32_to_cpu(sbi->s_es->s_r_blocks_count);
 	if (free_blocks < root_blocks + 1 && !capable(CAP_SYS_RESOURCE) &&
-		sbi->s_resuid != current_fsuid() &&
-		(sbi->s_resgid == 0 || !in_group_p (sbi->s_resgid))) {
+		!uid_eq(sbi->s_resuid, current_fsuid()) &&
+		(gid_eq(sbi->s_resgid, GLOBAL_ROOT_GID) ||
+		 !in_group_p (sbi->s_resgid))) {
 		return 0;
 	}
 	return 1;
diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
index 0b2b4db..d9a17d0 100644
--- a/fs/ext2/ext2.h
+++ b/fs/ext2/ext2.h
@@ -82,8 +82,8 @@ struct ext2_sb_info {
 	struct buffer_head ** s_group_desc;
 	unsigned long  s_mount_opt;
 	unsigned long s_sb_block;
-	uid_t s_resuid;
-	gid_t s_resgid;
+	kuid_t s_resuid;
+	kgid_t s_resgid;
 	unsigned short s_mount_state;
 	unsigned short s_pad;
 	int s_addr_per_block_bits;
@@ -637,8 +637,8 @@ static inline void verify_offsets(void)
  */
 struct ext2_mount_options {
 	unsigned long s_mount_opt;
-	uid_t s_resuid;
-	gid_t s_resgid;
+	kuid_t s_resuid;
+	kgid_t s_resgid;
 };
 
 /*
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 740cad8..f9fa95f 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -1293,6 +1293,8 @@ struct inode *ext2_iget (struct super_block *sb, unsigned long ino)
 	struct inode *inode;
 	long ret = -EIO;
 	int n;
+	uid_t i_uid;
+	gid_t i_gid;
 
 	inode = iget_locked(sb, ino);
 	if (!inode)
@@ -1310,12 +1312,14 @@ struct inode *ext2_iget (struct super_block *sb, unsigned long ino)
 	}
 
 	inode->i_mode = le16_to_cpu(raw_inode->i_mode);
-	inode->i_uid = (uid_t)le16_to_cpu(raw_inode->i_uid_low);
-	inode->i_gid = (gid_t)le16_to_cpu(raw_inode->i_gid_low);
+	i_uid = (uid_t)le16_to_cpu(raw_inode->i_uid_low);
+	i_gid = (gid_t)le16_to_cpu(raw_inode->i_gid_low);
 	if (!(test_opt (inode->i_sb, NO_UID32))) {
-		inode->i_uid |= le16_to_cpu(raw_inode->i_uid_high) << 16;
-		inode->i_gid |= le16_to_cpu(raw_inode->i_gid_high) << 16;
+		i_uid |= le16_to_cpu(raw_inode->i_uid_high) << 16;
+		i_gid |= le16_to_cpu(raw_inode->i_gid_high) << 16;
 	}
+	i_uid_write(inode, i_uid);
+	i_gid_write(inode, i_gid);
 	set_nlink(inode, le16_to_cpu(raw_inode->i_links_count));
 	inode->i_size = le32_to_cpu(raw_inode->i_size);
 	inode->i_atime.tv_sec = (signed)le32_to_cpu(raw_inode->i_atime);
@@ -1413,8 +1417,8 @@ static int __ext2_write_inode(struct inode *inode, int do_sync)
 	struct ext2_inode_info *ei = EXT2_I(inode);
 	struct super_block *sb = inode->i_sb;
 	ino_t ino = inode->i_ino;
-	uid_t uid = inode->i_uid;
-	gid_t gid = inode->i_gid;
+	uid_t uid = i_uid_read(inode);
+	gid_t gid = i_gid_read(inode);
 	struct buffer_head * bh;
 	struct ext2_inode * raw_inode = ext2_get_inode(sb, ino, &bh);
 	int n;
@@ -1529,8 +1533,8 @@ int ext2_setattr(struct dentry *dentry, struct iattr *iattr)
 
 	if (is_quota_modification(inode, iattr))
 		dquot_initialize(inode);
-	if ((iattr->ia_valid & ATTR_UID && iattr->ia_uid != inode->i_uid) ||
-	    (iattr->ia_valid & ATTR_GID && iattr->ia_gid != inode->i_gid)) {
+	if ((iattr->ia_valid & ATTR_UID && !uid_eq(iattr->ia_uid, inode->i_uid)) ||
+	    (iattr->ia_valid & ATTR_GID && !gid_eq(iattr->ia_gid, inode->i_gid))) {
 		error = dquot_transfer(inode, iattr);
 		if (error)
 			return error;
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index e1025c7..38f8160 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -228,13 +228,15 @@ static int ext2_show_options(struct seq_file *seq, struct dentry *root)
 		seq_puts(seq, ",grpid");
 	if (!test_opt(sb, GRPID) && (def_mount_opts & EXT2_DEFM_BSDGROUPS))
 		seq_puts(seq, ",nogrpid");
-	if (sbi->s_resuid != EXT2_DEF_RESUID ||
+	if (!uid_eq(sbi->s_resuid, make_kuid(&init_user_ns, EXT2_DEF_RESUID)) ||
 	    le16_to_cpu(es->s_def_resuid) != EXT2_DEF_RESUID) {
-		seq_printf(seq, ",resuid=%u", sbi->s_resuid);
+		seq_printf(seq, ",resuid=%u",
+				from_kuid_munged(&init_user_ns, sbi->s_resuid));
 	}
-	if (sbi->s_resgid != EXT2_DEF_RESGID ||
+	if (!gid_eq(sbi->s_resgid, make_kgid(&init_user_ns, EXT2_DEF_RESGID)) ||
 	    le16_to_cpu(es->s_def_resgid) != EXT2_DEF_RESGID) {
-		seq_printf(seq, ",resgid=%u", sbi->s_resgid);
+		seq_printf(seq, ",resgid=%u",
+				from_kgid_munged(&init_user_ns, sbi->s_resgid));
 	}
 	if (test_opt(sb, ERRORS_RO)) {
 		int def_errors = le16_to_cpu(es->s_errors);
@@ -436,6 +438,8 @@ static int parse_options(char *options, struct super_block *sb)
 	struct ext2_sb_info *sbi = EXT2_SB(sb);
 	substring_t args[MAX_OPT_ARGS];
 	int option;
+	kuid_t uid;
+	kgid_t gid;
 
 	if (!options)
 		return 1;
@@ -462,12 +466,23 @@ static int parse_options(char *options, struct super_block *sb)
 		case Opt_resuid:
 			if (match_int(&args[0], &option))
 				return 0;
-			sbi->s_resuid = option;
+			uid = make_kuid(current_user_ns(), option);
+			if (!uid_valid(uid)) {
+				ext2_msg(sb, KERN_ERR, "Invalid uid value %d", option);
+				return 0;
+
+			}
+			sbi->s_resuid = uid;
 			break;
 		case Opt_resgid:
 			if (match_int(&args[0], &option))
 				return 0;
-			sbi->s_resgid = option;
+			gid = make_kgid(current_user_ns(), option);
+			if (!gid_valid(gid)) {
+				ext2_msg(sb, KERN_ERR, "Invalid gid value %d", option);
+				return 0;
+			}
+			sbi->s_resgid = gid;
 			break;
 		case Opt_sb:
 			/* handled by get_sb_block() instead of here */
@@ -841,8 +856,8 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
 	else
 		set_opt(sbi->s_mount_opt, ERRORS_RO);
 
-	sbi->s_resuid = le16_to_cpu(es->s_def_resuid);
-	sbi->s_resgid = le16_to_cpu(es->s_def_resgid);
+	sbi->s_resuid = make_kuid(&init_user_ns, le16_to_cpu(es->s_def_resuid));
+	sbi->s_resgid = make_kgid(&init_user_ns, le16_to_cpu(es->s_def_resgid));
 	
 	set_opt(sbi->s_mount_opt, RESERVATION);
 
-- 
1.7.2.5

* [PATCH 36/43] userns: Convert ext3 to use kuid/kgid where appropriate
       [not found] ` <m11unyn70b.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
                     ` (34 preceding siblings ...)
  2012-04-08  5:15     ` Eric W. Biederman
@ 2012-04-08  5:15   ` Eric W. Biederman
  2012-04-08  5:15   ` [PATCH 37/43] userns: Convert ext4 to user " Eric W. Biederman
                     ` (9 subsequent siblings)
  45 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro,
	linux-fsdevel, Andrew Morton,
	Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 fs/ext3/balloc.c            |    5 +++--
 fs/ext3/ext3.h              |    8 ++++----
 fs/ext3/inode.c             |   32 ++++++++++++++++++++------------
 fs/ext3/super.c             |   35 +++++++++++++++++++++++++----------
 include/trace/events/ext3.h |    4 ++--
 5 files changed, 54 insertions(+), 30 deletions(-)

diff --git a/fs/ext3/balloc.c b/fs/ext3/balloc.c
index baac1b1..25cd608 100644
--- a/fs/ext3/balloc.c
+++ b/fs/ext3/balloc.c
@@ -1439,8 +1439,9 @@ static int ext3_has_free_blocks(struct ext3_sb_info *sbi, int use_reservation)
 	free_blocks = percpu_counter_read_positive(&sbi->s_freeblocks_counter);
 	root_blocks = le32_to_cpu(sbi->s_es->s_r_blocks_count);
 	if (free_blocks < root_blocks + 1 && !capable(CAP_SYS_RESOURCE) &&
-		!use_reservation && sbi->s_resuid != current_fsuid() &&
-		(sbi->s_resgid == 0 || !in_group_p (sbi->s_resgid))) {
+		!use_reservation && !uid_eq(sbi->s_resuid, current_fsuid()) &&
+		(gid_eq(sbi->s_resgid, GLOBAL_ROOT_GID) ||
+		 !in_group_p (sbi->s_resgid))) {
 		return 0;
 	}
 	return 1;
diff --git a/fs/ext3/ext3.h b/fs/ext3/ext3.h
index b6515fd..7977973 100644
--- a/fs/ext3/ext3.h
+++ b/fs/ext3/ext3.h
@@ -243,8 +243,8 @@ struct ext3_new_group_data {
  */
 struct ext3_mount_options {
 	unsigned long s_mount_opt;
-	uid_t s_resuid;
-	gid_t s_resgid;
+	kuid_t s_resuid;
+	kgid_t s_resgid;
 	unsigned long s_commit_interval;
 #ifdef CONFIG_QUOTA
 	int s_jquota_fmt;
@@ -637,8 +637,8 @@ struct ext3_sb_info {
 	struct buffer_head ** s_group_desc;
 	unsigned long  s_mount_opt;
 	ext3_fsblk_t s_sb_block;
-	uid_t s_resuid;
-	gid_t s_resgid;
+	kuid_t s_resuid;
+	kgid_t s_resgid;
 	unsigned short s_mount_state;
 	unsigned short s_pad;
 	int s_addr_per_block_bits;
diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index 10d7812..a09790a 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -2891,6 +2891,8 @@ struct inode *ext3_iget(struct super_block *sb, unsigned long ino)
 	transaction_t *transaction;
 	long ret;
 	int block;
+	uid_t i_uid;
+	gid_t i_gid;
 
 	inode = iget_locked(sb, ino);
 	if (!inode)
@@ -2907,12 +2909,14 @@ struct inode *ext3_iget(struct super_block *sb, unsigned long ino)
 	bh = iloc.bh;
 	raw_inode = ext3_raw_inode(&iloc);
 	inode->i_mode = le16_to_cpu(raw_inode->i_mode);
-	inode->i_uid = (uid_t)le16_to_cpu(raw_inode->i_uid_low);
-	inode->i_gid = (gid_t)le16_to_cpu(raw_inode->i_gid_low);
+	i_uid = (uid_t)le16_to_cpu(raw_inode->i_uid_low);
+	i_gid = (gid_t)le16_to_cpu(raw_inode->i_gid_low);
 	if(!(test_opt (inode->i_sb, NO_UID32))) {
-		inode->i_uid |= le16_to_cpu(raw_inode->i_uid_high) << 16;
-		inode->i_gid |= le16_to_cpu(raw_inode->i_gid_high) << 16;
+		i_uid |= le16_to_cpu(raw_inode->i_uid_high) << 16;
+		i_gid |= le16_to_cpu(raw_inode->i_gid_high) << 16;
 	}
+	i_uid_write(inode, i_uid);
+	i_gid_write(inode, i_gid);
 	set_nlink(inode, le16_to_cpu(raw_inode->i_links_count));
 	inode->i_size = le32_to_cpu(raw_inode->i_size);
 	inode->i_atime.tv_sec = (signed)le32_to_cpu(raw_inode->i_atime);
@@ -3068,6 +3072,8 @@ static int ext3_do_update_inode(handle_t *handle,
 	struct ext3_inode_info *ei = EXT3_I(inode);
 	struct buffer_head *bh = iloc->bh;
 	int err = 0, rc, block;
+	uid_t i_uid;
+	gid_t i_gid;
 
 again:
 	/* we can't allow multiple procs in here at once, its a bit racey */
@@ -3080,27 +3086,29 @@ again:
 
 	ext3_get_inode_flags(ei);
 	raw_inode->i_mode = cpu_to_le16(inode->i_mode);
+	i_uid = i_uid_read(inode);
+	i_gid = i_gid_read(inode);
 	if(!(test_opt(inode->i_sb, NO_UID32))) {
-		raw_inode->i_uid_low = cpu_to_le16(low_16_bits(inode->i_uid));
-		raw_inode->i_gid_low = cpu_to_le16(low_16_bits(inode->i_gid));
+		raw_inode->i_uid_low = cpu_to_le16(low_16_bits(i_uid));
+		raw_inode->i_gid_low = cpu_to_le16(low_16_bits(i_gid));
 /*
  * Fix up interoperability with old kernels. Otherwise, old inodes get
  * re-used with the upper 16 bits of the uid/gid intact
  */
 		if(!ei->i_dtime) {
 			raw_inode->i_uid_high =
-				cpu_to_le16(high_16_bits(inode->i_uid));
+				cpu_to_le16(high_16_bits(i_uid));
 			raw_inode->i_gid_high =
-				cpu_to_le16(high_16_bits(inode->i_gid));
+				cpu_to_le16(high_16_bits(i_gid));
 		} else {
 			raw_inode->i_uid_high = 0;
 			raw_inode->i_gid_high = 0;
 		}
 	} else {
 		raw_inode->i_uid_low =
-			cpu_to_le16(fs_high2lowuid(inode->i_uid));
+			cpu_to_le16(fs_high2lowuid(i_uid));
 		raw_inode->i_gid_low =
-			cpu_to_le16(fs_high2lowgid(inode->i_gid));
+			cpu_to_le16(fs_high2lowgid(i_gid));
 		raw_inode->i_uid_high = 0;
 		raw_inode->i_gid_high = 0;
 	}
@@ -3262,8 +3270,8 @@ int ext3_setattr(struct dentry *dentry, struct iattr *attr)
 
 	if (is_quota_modification(inode, attr))
 		dquot_initialize(inode);
-	if ((ia_valid & ATTR_UID && attr->ia_uid != inode->i_uid) ||
-		(ia_valid & ATTR_GID && attr->ia_gid != inode->i_gid)) {
+	if ((ia_valid & ATTR_UID && !uid_eq(attr->ia_uid, inode->i_uid)) ||
+	    (ia_valid & ATTR_GID && !gid_eq(attr->ia_gid, inode->i_gid))) {
 		handle_t *handle;
 
 		/* (user+group)*(old+new) structure, inode write (sb,
diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index cf0b592..94ef7e6 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -617,13 +617,15 @@ static int ext3_show_options(struct seq_file *seq, struct dentry *root)
 		seq_puts(seq, ",grpid");
 	if (!test_opt(sb, GRPID) && (def_mount_opts & EXT3_DEFM_BSDGROUPS))
 		seq_puts(seq, ",nogrpid");
-	if (sbi->s_resuid != EXT3_DEF_RESUID ||
+	if (!uid_eq(sbi->s_resuid, make_kuid(&init_user_ns, EXT3_DEF_RESUID)) ||
 	    le16_to_cpu(es->s_def_resuid) != EXT3_DEF_RESUID) {
-		seq_printf(seq, ",resuid=%u", sbi->s_resuid);
+		seq_printf(seq, ",resuid=%u",
+				from_kuid_munged(&init_user_ns, sbi->s_resuid));
 	}
-	if (sbi->s_resgid != EXT3_DEF_RESGID ||
+	if (!gid_eq(sbi->s_resgid, make_kgid(&init_user_ns, EXT3_DEF_RESGID)) ||
 	    le16_to_cpu(es->s_def_resgid) != EXT3_DEF_RESGID) {
-		seq_printf(seq, ",resgid=%u", sbi->s_resgid);
+		seq_printf(seq, ",resgid=%u",
+				from_kgid_munged(&init_user_ns, sbi->s_resgid));
 	}
 	if (test_opt(sb, ERRORS_RO)) {
 		int def_errors = le16_to_cpu(es->s_errors);
@@ -967,6 +969,8 @@ static int parse_options (char *options, struct super_block *sb,
 	substring_t args[MAX_OPT_ARGS];
 	int data_opt = 0;
 	int option;
+	kuid_t uid;
+	kgid_t gid;
 #ifdef CONFIG_QUOTA
 	int qfmt;
 #endif
@@ -1000,12 +1004,23 @@ static int parse_options (char *options, struct super_block *sb,
 		case Opt_resuid:
 			if (match_int(&args[0], &option))
 				return 0;
-			sbi->s_resuid = option;
+			uid = make_kuid(current_user_ns(), option);
+			if (!uid_valid(uid)) {
+				ext3_msg(sb, KERN_ERR, "Invalid uid value %d", option);
+				return 0;
+
+			}
+			sbi->s_resuid = uid;
 			break;
 		case Opt_resgid:
 			if (match_int(&args[0], &option))
 				return 0;
-			sbi->s_resgid = option;
+			gid = make_kgid(current_user_ns(), option);
+			if (!gid_valid(gid)) {
+				ext3_msg(sb, KERN_ERR, "Invalid gid value %d", option);
+				return -1;
+			}
+			sbi->s_resgid = gid;
 			break;
 		case Opt_sb:
 			/* handled by get_sb_block() instead of here */
@@ -1651,8 +1666,8 @@ static int ext3_fill_super (struct super_block *sb, void *data, int silent)
 	}
 	sb->s_fs_info = sbi;
 	sbi->s_mount_opt = 0;
-	sbi->s_resuid = EXT3_DEF_RESUID;
-	sbi->s_resgid = EXT3_DEF_RESGID;
+	sbi->s_resuid = make_kuid(&init_user_ns, EXT3_DEF_RESUID);
+	sbi->s_resgid = make_kgid(&init_user_ns, EXT3_DEF_RESGID);
 	sbi->s_sb_block = sb_block;
 
 	blocksize = sb_min_blocksize(sb, EXT3_MIN_BLOCK_SIZE);
@@ -1716,8 +1731,8 @@ static int ext3_fill_super (struct super_block *sb, void *data, int silent)
 	else
 		set_opt(sbi->s_mount_opt, ERRORS_RO);
 
-	sbi->s_resuid = le16_to_cpu(es->s_def_resuid);
-	sbi->s_resgid = le16_to_cpu(es->s_def_resgid);
+	sbi->s_resuid = make_kuid(&init_user_ns, le16_to_cpu(es->s_def_resuid));
+	sbi->s_resgid = make_kgid(&init_user_ns, le16_to_cpu(es->s_def_resgid));
 
 	/* enable barriers by default */
 	set_opt(sbi->s_mount_opt, BARRIER);
diff --git a/include/trace/events/ext3.h b/include/trace/events/ext3.h
index 7b53c05..15d11a3 100644
--- a/include/trace/events/ext3.h
+++ b/include/trace/events/ext3.h
@@ -24,8 +24,8 @@ TRACE_EVENT(ext3_free_inode,
 		__entry->dev	= inode->i_sb->s_dev;
 		__entry->ino	= inode->i_ino;
 		__entry->mode	= inode->i_mode;
-		__entry->uid	= inode->i_uid;
-		__entry->gid	= inode->i_gid;
+		__entry->uid	= i_uid_read(inode);
+		__entry->gid	= i_gid_read(inode);
 		__entry->blocks	= inode->i_blocks;
 	),
 
-- 
1.7.2.5
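
For anyone reading the inode.c hunks above without the ext3 on-disk layout in mind: the uid is stored as two 16-bit halves, and the patch only changes where the 32-bit value comes from (i_uid_read()) and goes to (i_uid_write()). The packing itself can be modelled in userspace roughly as below. This is a sketch under assumptions, not the kernel code: all names ending in _model are made up for illustration, byte-order conversion is left out, the i_dtime special case is ignored, and 65534 is assumed as the 16-bit overflow uid.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; cpu_to_le16()/le16_to_cpu() are omitted. */
struct raw_inode_model {
	uint16_t i_uid_low;
	uint16_t i_uid_high;
};

#define FS_OVERFLOWUID_MODEL 65534u	/* assumed 16-bit overflow uid */

/*
 * Split a 32-bit uid across the two 16-bit fields, or clamp it when
 * the filesystem only keeps 16-bit ids (the NO_UID32 case).
 */
void store_uid_model(struct raw_inode_model *raw, uint32_t uid, int no_uid32)
{
	if (!no_uid32) {
		raw->i_uid_low = (uint16_t)(uid & 0xFFFFu);
		raw->i_uid_high = (uint16_t)(uid >> 16);
	} else {
		/* Ids that do not fit in 16 bits become the overflow uid. */
		raw->i_uid_low = (uint16_t)((uid & ~0xFFFFu) ?
					    FS_OVERFLOWUID_MODEL : uid);
		raw->i_uid_high = 0;
	}
}

/* Reassemble the 32-bit uid; the high half is ignored under NO_UID32. */
uint32_t load_uid_model(const struct raw_inode_model *raw, int no_uid32)
{
	uint32_t uid = raw->i_uid_low;

	if (!no_uid32)
		uid |= (uint32_t)raw->i_uid_high << 16;
	return uid;
}

/* Usage: a 32-bit id round-trips, a large id under NO_UID32 is clamped. */
void example_roundtrip_model(void)
{
	struct raw_inode_model raw;

	store_uid_model(&raw, 100000u, 0);
	assert(load_uid_model(&raw, 0) == 100000u);

	store_uid_model(&raw, 100000u, 1);
	assert(load_uid_model(&raw, 1) == FS_OVERFLOWUID_MODEL);
}
```

Doing the arithmetic on a plain local integer and converting once at the end is the same shape as the i_uid/i_gid locals the patch introduces in ext3_iget() and ext3_do_update_inode().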


* [PATCH 37/43] userns: Convert ext4 to use kuid/kgid where appropriate
       [not found] ` <m11unyn70b.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
                     ` (35 preceding siblings ...)
  2012-04-08  5:15   ` [PATCH 36/43] userns: Convert ext3 " Eric W. Biederman
@ 2012-04-08  5:15   ` Eric W. Biederman
  2012-04-08  5:15   ` [PATCH 38/43] userns: Convert proc to use " Eric W. Biederman
                     ` (8 subsequent siblings)
  45 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linux Containers, Cyrill Gorcunov, linux-security-module, Al Viro,
	linux-fsdevel, Andrew Morton, Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 fs/ext4/balloc.c            |    4 ++--
 fs/ext4/ext4.h              |    4 ++--
 fs/ext4/ialloc.c            |    4 ++--
 fs/ext4/inode.c             |   34 ++++++++++++++++++++--------------
 fs/ext4/migrate.c           |    4 ++--
 fs/ext4/super.c             |   38 ++++++++++++++++++++++++++------------
 include/trace/events/ext4.h |    4 ++--
 7 files changed, 56 insertions(+), 36 deletions(-)
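
For readers new to the series, the pattern applied throughout this patch can be modelled in userspace roughly as follows. It is only a sketch under loud assumptions: every name ending in _model is made up for illustration and is not the kernel API, a namespace is reduced to a single mapping extent, and 65534 is assumed as the overflow uid. It tries to show why a struct-wrapped id pushes callers through uid_eq()/uid_valid() style helpers, and why resuid= is rejected when the mounter's namespace cannot map the value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only: every *_model name is hypothetical. */

/* Wrapping the id in a struct turns "kuid == 0" into a compile error. */
typedef struct { uint32_t val; } kuid_model;

#define INVALID_ID_MODEL   ((uint32_t)-1)	/* (uid_t)-1 is reserved */
#define OVERFLOW_UID_MODEL 65534u		/* assumed overflow uid */

/*
 * Hypothetical single-extent namespace: ids [first, first + count)
 * map onto global ids [lower_first, lower_first + count).
 */
struct userns_model {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

bool uid_eq_model(kuid_model a, kuid_model b)
{
	return a.val == b.val;
}

bool uid_valid_model(kuid_model k)
{
	return k.val != INVALID_ID_MODEL;
}

/* Map a namespace-local uid to a global one, or to the invalid value. */
kuid_model make_kuid_model(const struct userns_model *ns, uint32_t uid)
{
	kuid_model k = { INVALID_ID_MODEL };

	if (uid >= ns->first && uid - ns->first < ns->count)
		k.val = ns->lower_first + (uid - ns->first);
	return k;
}

/* Map back for display; unmapped ids are shown as the overflow uid. */
uint32_t from_kuid_munged_model(const struct userns_model *ns, kuid_model k)
{
	if (uid_valid_model(k) && k.val >= ns->lower_first &&
	    k.val - ns->lower_first < ns->count)
		return ns->first + (k.val - ns->lower_first);
	return OVERFLOW_UID_MODEL;
}

/*
 * Mirrors the resuid= handling in this patch: reject ids the caller's
 * namespace cannot map.  Returns 1 on success and -1 on failure.
 */
int parse_resuid_model(const struct userns_model *ns, int arg, kuid_model *out)
{
	kuid_model k = make_kuid_model(ns, (uint32_t)arg);

	if (!uid_valid_model(k))
		return -1;
	*out = k;
	return 1;
}

/* Usage: uid 1000 in a child namespace is global uid 101000. */
void example_usage_model(void)
{
	struct userns_model child = { 0, 100000, 65536 };
	kuid_model resuid;

	assert(parse_resuid_model(&child, 1000, &resuid) == 1);
	assert(resuid.val == 101000);
	assert(parse_resuid_model(&child, 70000, &resuid) == -1);
}
```

The design point is that the conversion happens once, at the boundary where a number enters from userspace or disk, and everything in between compares opaque values.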

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 4bbd07a..c45c411 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -461,8 +461,8 @@ static int ext4_has_free_clusters(struct ext4_sb_info *sbi,
 		return 1;
 
 	/* Hm, nope.  Are (enough) root reserved clusters available? */
-	if (sbi->s_resuid == current_fsuid() ||
-	    ((sbi->s_resgid != 0) && in_group_p(sbi->s_resgid)) ||
+	if (uid_eq(sbi->s_resuid, current_fsuid()) ||
+	    (!gid_eq(sbi->s_resgid, GLOBAL_ROOT_GID) && in_group_p(sbi->s_resgid)) ||
 	    capable(CAP_SYS_RESOURCE) ||
 		(flags & EXT4_MB_USE_ROOT_BLOCKS)) {
 
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index ab2594a..0b4aeb2 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1153,8 +1153,8 @@ struct ext4_sb_info {
 	unsigned int s_mount_flags;
 	unsigned int s_def_mount_opt;
 	ext4_fsblk_t s_sb_block;
-	uid_t s_resuid;
-	gid_t s_resgid;
+	kuid_t s_resuid;
+	kgid_t s_resgid;
 	unsigned short s_mount_state;
 	unsigned short s_pad;
 	int s_addr_per_block_bits;
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 409c2ee..9f9acac 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -808,8 +808,8 @@ got:
 	}
 	if (owner) {
 		inode->i_mode = mode;
-		inode->i_uid = owner[0];
-		inode->i_gid = owner[1];
+		i_uid_write(inode, owner[0]);
+		i_gid_write(inode, owner[1]);
 	} else if (test_opt(sb, GRPID)) {
 		inode->i_mode = mode;
 		inode->i_uid = current_fsuid();
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c77b0bd..07eaf56 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3630,6 +3630,8 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
 	journal_t *journal = EXT4_SB(sb)->s_journal;
 	long ret;
 	int block;
+	uid_t i_uid;
+	gid_t i_gid;
 
 	inode = iget_locked(sb, ino);
 	if (!inode)
@@ -3645,12 +3647,14 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
 		goto bad_inode;
 	raw_inode = ext4_raw_inode(&iloc);
 	inode->i_mode = le16_to_cpu(raw_inode->i_mode);
-	inode->i_uid = (uid_t)le16_to_cpu(raw_inode->i_uid_low);
-	inode->i_gid = (gid_t)le16_to_cpu(raw_inode->i_gid_low);
+	i_uid = (uid_t)le16_to_cpu(raw_inode->i_uid_low);
+	i_gid = (gid_t)le16_to_cpu(raw_inode->i_gid_low);
 	if (!(test_opt(inode->i_sb, NO_UID32))) {
-		inode->i_uid |= le16_to_cpu(raw_inode->i_uid_high) << 16;
-		inode->i_gid |= le16_to_cpu(raw_inode->i_gid_high) << 16;
+		i_uid |= le16_to_cpu(raw_inode->i_uid_high) << 16;
+		i_gid |= le16_to_cpu(raw_inode->i_gid_high) << 16;
 	}
+	i_uid_write(inode, i_uid);
+	i_gid_write(inode, i_gid);
 	set_nlink(inode, le16_to_cpu(raw_inode->i_links_count));
 
 	ext4_clear_state_flags(ei);	/* Only relevant on 32-bit archs */
@@ -3870,6 +3874,8 @@ static int ext4_do_update_inode(handle_t *handle,
 	struct ext4_inode_info *ei = EXT4_I(inode);
 	struct buffer_head *bh = iloc->bh;
 	int err = 0, rc, block;
+	uid_t i_uid;
+	gid_t i_gid;
 
 	/* For fields not not tracking in the in-memory inode,
 	 * initialise them to zero for new inodes. */
@@ -3878,27 +3884,27 @@ static int ext4_do_update_inode(handle_t *handle,
 
 	ext4_get_inode_flags(ei);
 	raw_inode->i_mode = cpu_to_le16(inode->i_mode);
+	i_uid = i_uid_read(inode);
+	i_gid = i_gid_read(inode);
 	if (!(test_opt(inode->i_sb, NO_UID32))) {
-		raw_inode->i_uid_low = cpu_to_le16(low_16_bits(inode->i_uid));
-		raw_inode->i_gid_low = cpu_to_le16(low_16_bits(inode->i_gid));
+		raw_inode->i_uid_low = cpu_to_le16(low_16_bits(i_uid));
+		raw_inode->i_gid_low = cpu_to_le16(low_16_bits(i_gid));
 /*
  * Fix up interoperability with old kernels. Otherwise, old inodes get
  * re-used with the upper 16 bits of the uid/gid intact
  */
 		if (!ei->i_dtime) {
 			raw_inode->i_uid_high =
-				cpu_to_le16(high_16_bits(inode->i_uid));
+				cpu_to_le16(high_16_bits(i_uid));
 			raw_inode->i_gid_high =
-				cpu_to_le16(high_16_bits(inode->i_gid));
+				cpu_to_le16(high_16_bits(i_gid));
 		} else {
 			raw_inode->i_uid_high = 0;
 			raw_inode->i_gid_high = 0;
 		}
 	} else {
-		raw_inode->i_uid_low =
-			cpu_to_le16(fs_high2lowuid(inode->i_uid));
-		raw_inode->i_gid_low =
-			cpu_to_le16(fs_high2lowgid(inode->i_gid));
+		raw_inode->i_uid_low = cpu_to_le16(fs_high2lowuid(i_uid));
+		raw_inode->i_gid_low = cpu_to_le16(fs_high2lowgid(i_gid));
 		raw_inode->i_uid_high = 0;
 		raw_inode->i_gid_high = 0;
 	}
@@ -4084,8 +4090,8 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
 
 	if (is_quota_modification(inode, attr))
 		dquot_initialize(inode);
-	if ((ia_valid & ATTR_UID && attr->ia_uid != inode->i_uid) ||
-		(ia_valid & ATTR_GID && attr->ia_gid != inode->i_gid)) {
+	if ((ia_valid & ATTR_UID && !uid_eq(attr->ia_uid, inode->i_uid)) ||
+	    (ia_valid & ATTR_GID && !gid_eq(attr->ia_gid, inode->i_gid))) {
 		handle_t *handle;
 
 		/* (user+group)*(old+new) structure, inode write (sb,
diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
index f39f80f..f1bb32e 100644
--- a/fs/ext4/migrate.c
+++ b/fs/ext4/migrate.c
@@ -466,8 +466,8 @@ int ext4_ext_migrate(struct inode *inode)
 	}
 	goal = (((inode->i_ino - 1) / EXT4_INODES_PER_GROUP(inode->i_sb)) *
 		EXT4_INODES_PER_GROUP(inode->i_sb)) + 1;
-	owner[0] = inode->i_uid;
-	owner[1] = inode->i_gid;
+	owner[0] = i_uid_read(inode);
+	owner[1] = i_gid_read(inode);
 	tmp_inode = ext4_new_inode(handle, inode->i_sb->s_root->d_inode,
 				   S_IFREG, NULL, goal, owner);
 	if (IS_ERR(tmp_inode)) {
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index ceebaf8..9d8eba0 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1448,6 +1448,8 @@ static int handle_mount_opt(struct super_block *sb, char *opt, int token,
 {
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	const struct mount_opts *m;
+	kuid_t uid;
+	kgid_t gid;
 	int arg = 0;
 
 	if (args->from && match_int(args, &arg))
@@ -1464,10 +1466,20 @@ static int handle_mount_opt(struct super_block *sb, char *opt, int token,
 			 "Ignoring removed %s option", opt);
 		return 1;
 	case Opt_resuid:
-		sbi->s_resuid = arg;
+		uid = make_kuid(current_user_ns(), arg);
+		if (!uid_valid(uid)) {
+			ext4_msg(sb, KERN_ERR, "Invalid uid value %d", arg);
+			return -1;
+		}
+		sbi->s_resuid = uid;
 		return 1;
 	case Opt_resgid:
-		sbi->s_resgid = arg;
+		gid = make_kgid(current_user_ns(), arg);
+		if (!gid_valid(gid)) {
+			ext4_msg(sb, KERN_ERR, "Invalid gid value %d", arg);
+			return -1;
+		}
+		sbi->s_resgid = gid;
 		return 1;
 	case Opt_abort:
 		sbi->s_mount_flags |= EXT4_MF_FS_ABORTED;
@@ -1732,12 +1744,14 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb,
 		SEQ_OPTS_PRINT("%s", token2str(m->token));
 	}
 
-	if (nodefs || sbi->s_resuid != EXT4_DEF_RESUID ||
+	if (nodefs || !uid_eq(sbi->s_resuid, make_kuid(&init_user_ns, EXT4_DEF_RESUID)) ||
 	    le16_to_cpu(es->s_def_resuid) != EXT4_DEF_RESUID)
-		SEQ_OPTS_PRINT("resuid=%u", sbi->s_resuid);
-	if (nodefs || sbi->s_resgid != EXT4_DEF_RESGID ||
+		SEQ_OPTS_PRINT("resuid=%u",
+				from_kuid_munged(&init_user_ns, sbi->s_resuid));
+	if (nodefs || !gid_eq(sbi->s_resgid, make_kgid(&init_user_ns, EXT4_DEF_RESGID)) ||
 	    le16_to_cpu(es->s_def_resgid) != EXT4_DEF_RESGID)
-		SEQ_OPTS_PRINT("resgid=%u", sbi->s_resgid);
+		SEQ_OPTS_PRINT("resgid=%u",
+				from_kgid_munged(&init_user_ns, sbi->s_resgid));
 	def_errors = nodefs ? -1 : le16_to_cpu(es->s_errors);
 	if (test_opt(sb, ERRORS_RO) && def_errors != EXT4_ERRORS_RO)
 		SEQ_OPTS_PUTS("errors=remount-ro");
@@ -2996,8 +3010,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	}
 	sb->s_fs_info = sbi;
 	sbi->s_mount_opt = 0;
-	sbi->s_resuid = EXT4_DEF_RESUID;
-	sbi->s_resgid = EXT4_DEF_RESGID;
+	sbi->s_resuid = make_kuid(&init_user_ns, EXT4_DEF_RESUID);
+	sbi->s_resgid = make_kgid(&init_user_ns, EXT4_DEF_RESGID);
 	sbi->s_inode_readahead_blks = EXT4_DEF_INODE_READAHEAD_BLKS;
 	sbi->s_sb_block = sb_block;
 	if (sb->s_bdev->bd_part)
@@ -3076,8 +3090,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	if (def_mount_opts & EXT4_DEFM_DISCARD)
 		set_opt(sb, DISCARD);
 
-	sbi->s_resuid = le16_to_cpu(es->s_def_resuid);
-	sbi->s_resgid = le16_to_cpu(es->s_def_resgid);
+	sbi->s_resuid = make_kuid(&init_user_ns, le16_to_cpu(es->s_def_resuid));
+	sbi->s_resgid = make_kgid(&init_user_ns, le16_to_cpu(es->s_def_resgid));
 	sbi->s_commit_interval = JBD2_DEFAULT_MAX_COMMIT_AGE * HZ;
 	sbi->s_min_batch_time = EXT4_DEF_MIN_BATCH_TIME;
 	sbi->s_max_batch_time = EXT4_DEF_MAX_BATCH_TIME;
@@ -4229,8 +4243,8 @@ static int ext4_unfreeze(struct super_block *sb)
 struct ext4_mount_options {
 	unsigned long s_mount_opt;
 	unsigned long s_mount_opt2;
-	uid_t s_resuid;
-	gid_t s_resgid;
+	kuid_t s_resuid;
+	kgid_t s_resgid;
 	unsigned long s_commit_interval;
 	u32 s_min_batch_time, s_max_batch_time;
 #ifdef CONFIG_QUOTA
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 319538b..69d8a69 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -36,8 +36,8 @@ TRACE_EVENT(ext4_free_inode,
 		__entry->dev	= inode->i_sb->s_dev;
 		__entry->ino	= inode->i_ino;
 		__entry->mode	= inode->i_mode;
-		__entry->uid	= inode->i_uid;
-		__entry->gid	= inode->i_gid;
+		__entry->uid	= i_uid_read(inode);
+		__entry->gid	= i_gid_read(inode);
 		__entry->blocks	= inode->i_blocks;
 	),
 
-- 
1.7.2.5
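
A note on the balloc.c hunks: ext4 states the reserved-block test as "who is allowed", while ext3 states it as "who is denied", so the converted conditions look different even though the id checks are each other's negation. The sketch below is a userspace model of only that id logic, with plain integers instead of kuid_t/kgid_t; the struct, the function names, and the omission of the free-space, use_reservation and flags terms are all simplifications assumed for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical caller description; not a kernel structure. */
struct caller_model {
	unsigned fsuid;
	bool in_resgid;		/* result of an in_group_p()-style lookup */
	bool cap_sys_resource;
};

/* ext4 form: may the caller dip into the reserved blocks? */
bool allowed_ext4_form_model(unsigned resuid, unsigned resgid,
			     const struct caller_model *c)
{
	return resuid == c->fsuid ||
	       (resgid != 0 && c->in_resgid) ||
	       c->cap_sys_resource;
}

/* ext3 form: is the caller refused the reserved blocks? */
bool denied_ext3_form_model(unsigned resuid, unsigned resgid,
			    const struct caller_model *c)
{
	return !c->cap_sys_resource &&
	       resuid != c->fsuid &&
	       (resgid == 0 || !c->in_resgid);
}

/* Walk every combination of the four inputs and count mismatches. */
int count_disagreements_model(void)
{
	int bad = 0;

	for (unsigned bits = 0; bits < 16; bits++) {
		struct caller_model c = {
			.fsuid = 1000,
			.in_resgid = (bits & 1) != 0,
			.cap_sys_resource = (bits & 2) != 0,
		};
		unsigned resuid = (bits & 4) ? 1000 : 0;
		unsigned resgid = (bits & 8) ? 50 : 0;

		/* "allowed" must always be the opposite of "denied". */
		if (allowed_ext4_form_model(resuid, resgid, &c) ==
		    denied_ext3_form_model(resuid, resgid, &c))
			bad++;
	}
	return bad;
}

/* Usage: a resgid of 0 never grants access through group membership. */
void example_reserved_model(void)
{
	struct caller_model c = { 1000, true, false };

	assert(!allowed_ext4_form_model(0, 0, &c));
	assert(allowed_ext4_form_model(0, 50, &c));
}
```

In both filesystems a reserved gid of 0 means "no reserved group", which is why the converted code compares against GLOBAL_ROOT_GID rather than against a bare 0.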


* [PATCH 38/43] userns: Convert proc to use kuid/kgid where appropriate
       [not found] ` <m11unyn70b.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
                     ` (36 preceding siblings ...)
  2012-04-08  5:15   ` [PATCH 37/43] userns: Convert ext4 to user " "Eric W. Beiderman
@ 2012-04-08  5:15   ` "Eric W. Beiderman
  2012-04-08  5:15   ` [PATCH 39/43] userns: Convert sysctl permission checks to use kuid and kgids "Eric W. Beiderman
                     ` (7 subsequent siblings)
  45 siblings, 0 replies; 227+ messages in thread
From: "Eric W. Beiderman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linux Containers, Cyrill Gorcunov, linux-security-module, Al Viro,
	linux-fsdevel, Andrew Morton, Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
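
The from_kuid_munged()/from_kgid_munged() calls in the diff below turn a
kernel-internal id back into a number that can be printed for a given user
namespace. A minimal userspace sketch of that idea follows. It is not the
kernel implementation: the single-extent map, every model_* name, and the
overflow value 65534 (which I believe is the default overflowuid) are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model type; the real kuid_t is defined by the kernel. */
typedef struct { uint32_t val; } kuid_model_t;

/* One mapping extent (an assumption: real maps may hold several). */
struct id_extent {
	uint32_t first;        /* first id as seen inside the namespace */
	uint32_t lower_first;  /* first kernel-internal id it maps to   */
	uint32_t count;        /* number of ids covered by the extent   */
};

#define MODEL_INVALID_ID  ((uint32_t)-1)  /* reserved error value      */
#define MODEL_OVERFLOW_ID 65534u          /* assumed overflow id       */

/* Kernel-internal id -> id inside the namespace, or MODEL_INVALID_ID. */
static uint32_t model_from_kuid(const struct id_extent *map, kuid_model_t k)
{
	if (k.val >= map->lower_first && k.val - map->lower_first < map->count)
		return map->first + (k.val - map->lower_first);
	return MODEL_INVALID_ID;
}

/* Same lookup, but it never fails: unmapped ids become the overflow id. */
static uint32_t model_from_kuid_munged(const struct id_extent *map,
				       kuid_model_t k)
{
	uint32_t id = model_from_kuid(map, k);

	return id == MODEL_INVALID_ID ? MODEL_OVERFLOW_ID : id;
}

/* Small usage example: namespace ids 0..65535 map to 100000..165535. */
static void model_munged_demo(void)
{
	struct id_extent map = { 0, 100000, 65536 };
	kuid_model_t mapped = { 101000 }, unmapped = { 0 };

	assert(model_from_kuid_munged(&map, mapped) == 1000);
	assert(model_from_kuid_munged(&map, unmapped) == MODEL_OVERFLOW_ID);
}
```

The "munged" variant suits display paths such as /proc output, where
printing some id is preferable to failing the read.
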
 fs/proc/array.c               |   10 ++++++++--
 fs/proc/base.c                |   16 ++++++++--------
 fs/proc/inode.c               |    4 ++--
 fs/proc/root.c                |    2 +-
 include/linux/pid_namespace.h |    2 +-
 include/linux/proc_fs.h       |    4 ++--
 6 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 36a0a91..dc4c5a7 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -191,8 +191,14 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
 		task_tgid_nr_ns(p, ns),
 		pid_nr_ns(pid, ns),
 		ppid, tpid,
-		cred->uid, cred->euid, cred->suid, cred->fsuid,
-		cred->gid, cred->egid, cred->sgid, cred->fsgid);
+		from_kuid_munged(user_ns, cred->uid),
+		from_kuid_munged(user_ns, cred->euid),
+		from_kuid_munged(user_ns, cred->suid),
+		from_kuid_munged(user_ns, cred->fsuid),
+		from_kgid_munged(user_ns, cred->gid),
+		from_kgid_munged(user_ns, cred->egid),
+		from_kgid_munged(user_ns, cred->sgid),
+		from_kgid_munged(user_ns, cred->fsgid));
 
 	task_lock(p);
 	if (p->files)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 2ee514c..c479049 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1562,8 +1562,8 @@ int pid_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
 	generic_fillattr(inode, stat);
 
 	rcu_read_lock();
-	stat->uid = 0;
-	stat->gid = 0;
+	stat->uid = GLOBAL_ROOT_UID;
+	stat->gid = GLOBAL_ROOT_GID;
 	task = pid_task(proc_pid(inode), PIDTYPE_PID);
 	if (task) {
 		if (!has_pid_permissions(pid, task, 2)) {
@@ -1623,8 +1623,8 @@ int pid_revalidate(struct dentry *dentry, struct nameidata *nd)
 			inode->i_gid = cred->egid;
 			rcu_read_unlock();
 		} else {
-			inode->i_uid = 0;
-			inode->i_gid = 0;
+			inode->i_uid = GLOBAL_ROOT_UID;
+			inode->i_gid = GLOBAL_ROOT_GID;
 		}
 		inode->i_mode &= ~(S_ISUID | S_ISGID);
 		security_task_to_inode(task, inode);
@@ -1811,8 +1811,8 @@ static int tid_fd_revalidate(struct dentry *dentry, struct nameidata *nd)
 					inode->i_gid = cred->egid;
 					rcu_read_unlock();
 				} else {
-					inode->i_uid = 0;
-					inode->i_gid = 0;
+					inode->i_uid = GLOBAL_ROOT_UID;
+					inode->i_gid = GLOBAL_ROOT_GID;
 				}
 				inode->i_mode &= ~(S_ISUID | S_ISGID);
 				security_task_to_inode(task, inode);
@@ -2061,8 +2061,8 @@ static int map_files_d_revalidate(struct dentry *dentry, struct nameidata *nd)
 			inode->i_gid = cred->egid;
 			rcu_read_unlock();
 		} else {
-			inode->i_uid = 0;
-			inode->i_gid = 0;
+			inode->i_uid = GLOBAL_ROOT_UID;
+			inode->i_gid = GLOBAL_ROOT_GID;
 		}
 		security_task_to_inode(task, inode);
 		status = 1;
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 205c922..554ecc5 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -108,8 +108,8 @@ static int proc_show_options(struct seq_file *seq, struct dentry *root)
 	struct super_block *sb = root->d_sb;
 	struct pid_namespace *pid = sb->s_fs_info;
 
-	if (pid->pid_gid)
-		seq_printf(seq, ",gid=%lu", (unsigned long)pid->pid_gid);
+	if (!gid_eq(pid->pid_gid, GLOBAL_ROOT_GID))
+		seq_printf(seq, ",gid=%u", from_kgid_munged(&init_user_ns, pid->pid_gid));
 	if (pid->hide_pid != 0)
 		seq_printf(seq, ",hidepid=%u", pid->hide_pid);
 
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 46a15d8..df4e456 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -67,7 +67,7 @@ static int proc_parse_options(char *options, struct pid_namespace *pid)
 		case Opt_gid:
 			if (match_int(&args[0], &option))
 				return 0;
-			pid->pid_gid = option;
+			pid->pid_gid = make_kgid(current_user_ns(), option);
 			break;
 		case Opt_hidepid:
 			if (match_int(&args[0], &option))
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index b067bd8..00474b0 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -31,7 +31,7 @@ struct pid_namespace {
 #ifdef CONFIG_BSD_PROCESS_ACCT
 	struct bsd_acct_struct *bacct;
 #endif
-	gid_t pid_gid;
+	kgid_t pid_gid;
 	int hide_pid;
 	int reboot;	/* group exit code if this pidns was rebooted */
 };
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 85c5073..3fd2e87 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -52,8 +52,8 @@ struct proc_dir_entry {
 	unsigned int low_ino;
 	umode_t mode;
 	nlink_t nlink;
-	uid_t uid;
-	gid_t gid;
+	kuid_t uid;
+	kgid_t gid;
 	loff_t size;
 	const struct inode_operations *proc_iops;
 	/*
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 39/43] userns: Convert sysctl permission checks to use kuid and kgids.
       [not found] ` <m11unyn70b.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
                     ` (37 preceding siblings ...)
  2012-04-08  5:15   ` [PATCH 38/43] userns: Convert proc to use " "Eric W. Beiderman
@ 2012-04-08  5:15   ` "Eric W. Beiderman
  2012-04-08  5:15   ` [PATCH 40/43] userns: Convert sysfs to use kgid/kuid where appropriate "Eric W. Beiderman
                     ` (6 subsequent siblings)
  45 siblings, 0 replies; 227+ messages in thread
From: "Eric W. Beiderman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linux Containers, Cyrill Gorcunov, linux-security-module, Al Viro,
	linux-fsdevel, Andrew Morton, Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
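
The test_perm() change below keeps the classic owner/group/other mode
shifting and only swaps the raw comparisons against 0 for typed ones. A
userspace sketch of that logic, under stated assumptions: the model_* and
MODEL_* names are hypothetical, group membership is passed in as a flag
rather than looked up, and the -EACCES return on failure is assumed since
the hunk does not show it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for kuid_t, GLOBAL_ROOT_UID and uid_eq(). */
typedef struct { uint32_t val; } kuid_model_t;
static const kuid_model_t MODEL_GLOBAL_ROOT_UID = { 0 };

static bool model_uid_eq(kuid_model_t a, kuid_model_t b)
{
	return a.val == b.val;
}

/* Permission bits, with the values the kernel uses for MAY_*. */
#define MODEL_MAY_EXEC  1
#define MODEL_MAY_WRITE 2
#define MODEL_MAY_READ  4

/*
 * Pick the owner, group or other bits of a sysctl mode depending on who
 * is asking, then check that every requested bit in op is granted.
 */
static int model_test_perm(int mode, int op, kuid_model_t euid,
			   bool in_root_group)
{
	if (model_uid_eq(euid, MODEL_GLOBAL_ROOT_UID))
		mode >>= 6;		/* owner bits */
	else if (in_root_group)
		mode >>= 3;		/* group bits */
	if ((op & ~mode &
	     (MODEL_MAY_READ | MODEL_MAY_WRITE | MODEL_MAY_EXEC)) == 0)
		return 0;
	return -EACCES;			/* assumed failure value */
}

/* Usage example with a 0644 entry. */
static void model_test_perm_demo(void)
{
	kuid_model_t root = { 0 }, user = { 1000 };

	assert(model_test_perm(0644, MODEL_MAY_WRITE, root, false) == 0);
	assert(model_test_perm(0644, MODEL_MAY_WRITE, user, false) == -EACCES);
}
```

Because the euid is a struct here, writing `if (!euid)` as the old code
did would no longer compile, which is the point of the conversion.
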
 fs/proc/proc_sysctl.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 21d836f..3476bca 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -371,9 +371,9 @@ void register_sysctl_root(struct ctl_table_root *root)
 
 static int test_perm(int mode, int op)
 {
-	if (!current_euid())
+	if (uid_eq(current_euid(), GLOBAL_ROOT_UID))
 		mode >>= 6;
-	else if (in_egroup_p(0))
+	else if (in_egroup_p(GLOBAL_ROOT_GID))
 		mode >>= 3;
 	if ((op & ~mode & (MAY_READ|MAY_WRITE|MAY_EXEC)) == 0)
 		return 0;
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 40/43] userns: Convert sysfs to use kgid/kuid where appropriate
       [not found] ` <m11unyn70b.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
                     ` (38 preceding siblings ...)
  2012-04-08  5:15   ` [PATCH 39/43] userns: Convert sysctl permission checks to use kuid and kgids "Eric W. Beiderman
@ 2012-04-08  5:15   ` "Eric W. Beiderman
  2012-04-08  5:15   ` [PATCH 41/43] userns: Convert tmpfs to use kuid and kgid " "Eric W. Beiderman
                     ` (5 subsequent siblings)
  45 siblings, 0 replies; 227+ messages in thread
From: "Eric W. Beiderman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linux Containers, Cyrill Gorcunov, linux-security-module, Al Viro,
	linux-fsdevel, Andrew Morton, Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 fs/sysfs/inode.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index feb2d69..907c2b3 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -62,8 +62,8 @@ static struct sysfs_inode_attrs *sysfs_init_inode_attrs(struct sysfs_dirent *sd)
 
 	/* assign default attributes */
 	iattrs->ia_mode = sd->s_mode;
-	iattrs->ia_uid = 0;
-	iattrs->ia_gid = 0;
+	iattrs->ia_uid = GLOBAL_ROOT_UID;
+	iattrs->ia_gid = GLOBAL_ROOT_GID;
 	iattrs->ia_atime = iattrs->ia_mtime = iattrs->ia_ctime = CURRENT_TIME;
 
 	return attrs;
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 41/43] userns: Convert tmpfs to use kuid and kgid where appropriate
       [not found] ` <m11unyn70b.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
                     ` (39 preceding siblings ...)
  2012-04-08  5:15   ` [PATCH 40/43] userns: Convert sysfs to use kgid/kuid where appropriate "Eric W. Beiderman
@ 2012-04-08  5:15   ` "Eric W. Beiderman
  2012-04-08  5:15   ` [PATCH 42/43] userns: Convert cgroup permission checks to use uid_eq "Eric W. Beiderman
                     ` (4 subsequent siblings)
  45 siblings, 0 replies; 227+ messages in thread
From: "Eric W. Beiderman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linux Containers, Cyrill Gorcunov, linux-security-module, Al Viro,
	linux-fsdevel, Andrew Morton, Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
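
The uid= and gid= mount option handling below follows a three-step
pattern: parse the number, translate it from the mounter's user namespace
with make_kuid()/make_kgid(), and reject the option if no mapping exists.
A userspace sketch of that pattern follows. The single-extent map and all
model_* names are assumptions for illustration, and the explicit 32-bit
range check is an addition that the patch itself does not make.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of kuid_t and of one id-map extent. */
typedef struct { uint32_t val; } kuid_model_t;

struct id_extent {
	uint32_t first;        /* first id as seen inside the namespace */
	uint32_t lower_first;  /* first kernel-internal id it maps to   */
	uint32_t count;        /* number of ids covered by the extent   */
};

#define MODEL_INVALID_ID ((uint32_t)-1)  /* reserved error value */

/* Namespace id -> kernel-internal id; invalid if nothing maps it. */
static kuid_model_t model_make_kuid(const struct id_extent *map, uint32_t id)
{
	kuid_model_t k = { MODEL_INVALID_ID };

	if (id >= map->first && id - map->first < map->count)
		k.val = map->lower_first + (id - map->first);
	return k;
}

static bool model_uid_valid(kuid_model_t k)
{
	return k.val != MODEL_INVALID_ID;
}

/* Returns 0 and stores the translated id, or -1 for a bad value. */
static int model_parse_uid(const struct id_extent *map, const char *value,
			   kuid_model_t *out)
{
	char *rest;
	unsigned long raw = strtoul(value, &rest, 0);
	kuid_model_t k;

	if (*rest)			/* trailing garbage, as in the patch */
		return -1;
	if (raw > UINT32_MAX)		/* extra check, not in the patch */
		return -1;
	k = model_make_kuid(map, (uint32_t)raw);
	if (!model_uid_valid(k))	/* no mapping in this namespace */
		return -1;
	*out = k;
	return 0;
}

/* Usage example: namespace ids 0..65535 map to 100000..165535. */
static void model_parse_uid_demo(void)
{
	struct id_extent map = { 0, 100000, 65536 };
	kuid_model_t k = { 0 };

	assert(model_parse_uid(&map, "1000", &k) == 0 && k.val == 101000);
	assert(model_parse_uid(&map, "70000", &k) == -1);
}
```

Unlike the hunk below, the sketch stores the result only after validation;
in the patch the mount fails on bad_val, so the outcome is the same.
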
 include/linux/shmem_fs.h |    4 ++--
 mm/shmem.c               |   22 ++++++++++++++++------
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 79ab255..bef2cf0 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -28,8 +28,8 @@ struct shmem_sb_info {
 	unsigned long max_inodes;   /* How many inodes are allowed */
 	unsigned long free_inodes;  /* How many are left for allocation */
 	spinlock_t stat_lock;	    /* Serialize shmem_sb_info changes */
-	uid_t uid;		    /* Mount uid for root directory */
-	gid_t gid;		    /* Mount gid for root directory */
+	kuid_t uid;		    /* Mount uid for root directory */
+	kgid_t gid;		    /* Mount gid for root directory */
 	umode_t mode;		    /* Mount mode for root directory */
 	struct mempolicy *mpol;     /* default memory policy for mappings */
 };
diff --git a/mm/shmem.c b/mm/shmem.c
index f99ff3e..d7b433a 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2075,6 +2075,8 @@ static int shmem_parse_options(char *options, struct shmem_sb_info *sbinfo,
 			       bool remount)
 {
 	char *this_char, *value, *rest;
+	uid_t uid;
+	gid_t gid;
 
 	while (options != NULL) {
 		this_char = options;
@@ -2134,15 +2136,21 @@ static int shmem_parse_options(char *options, struct shmem_sb_info *sbinfo,
 		} else if (!strcmp(this_char,"uid")) {
 			if (remount)
 				continue;
-			sbinfo->uid = simple_strtoul(value, &rest, 0);
+			uid = simple_strtoul(value, &rest, 0);
 			if (*rest)
 				goto bad_val;
+			sbinfo->uid = make_kuid(current_user_ns(), uid);
+			if (!uid_valid(sbinfo->uid))
+				goto bad_val;
 		} else if (!strcmp(this_char,"gid")) {
 			if (remount)
 				continue;
-			sbinfo->gid = simple_strtoul(value, &rest, 0);
+			gid = simple_strtoul(value, &rest, 0);
 			if (*rest)
 				goto bad_val;
+			sbinfo->gid = make_kgid(current_user_ns(), gid);
+			if (!gid_valid(sbinfo->gid))
+				goto bad_val;
 		} else if (!strcmp(this_char,"mpol")) {
 			if (mpol_parse_str(value, &sbinfo->mpol, 1))
 				goto bad_val;
@@ -2210,10 +2218,12 @@ static int shmem_show_options(struct seq_file *seq, struct dentry *root)
 		seq_printf(seq, ",nr_inodes=%lu", sbinfo->max_inodes);
 	if (sbinfo->mode != (S_IRWXUGO | S_ISVTX))
 		seq_printf(seq, ",mode=%03ho", sbinfo->mode);
-	if (sbinfo->uid != 0)
-		seq_printf(seq, ",uid=%u", sbinfo->uid);
-	if (sbinfo->gid != 0)
-		seq_printf(seq, ",gid=%u", sbinfo->gid);
+	if (!uid_eq(sbinfo->uid, GLOBAL_ROOT_UID))
+		seq_printf(seq, ",uid=%u",
+				from_kuid_munged(&init_user_ns, sbinfo->uid));
+	if (!gid_eq(sbinfo->gid, GLOBAL_ROOT_GID))
+		seq_printf(seq, ",gid=%u",
+				from_kgid_munged(&init_user_ns, sbinfo->gid));
 	shmem_show_mpol(seq, sbinfo->mpol);
 	return 0;
 }
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 42/43] userns: Convert cgroup permission checks to use uid_eq
       [not found] ` <m11unyn70b.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
                     ` (40 preceding siblings ...)
  2012-04-08  5:15   ` [PATCH 41/43] userns: Convert tmpfs to use kuid and kgid " "Eric W. Beiderman
@ 2012-04-08  5:15   ` "Eric W. Beiderman
  2012-04-08  5:15   ` [PATCH 43/43] userns: Convert the move_pages, and migrate_pages " "Eric W. Beiderman
                     ` (3 subsequent siblings)
  45 siblings, 0 replies; 227+ messages in thread
From: "Eric W. Beiderman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linux Containers, Cyrill Gorcunov, linux-security-module, Al Viro,
	linux-fsdevel, Andrew Morton, Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 kernel/cgroup.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index ed64cca..c8329b0 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2160,9 +2160,9 @@ retry_find_task:
 		 * only need to check permissions on one of them.
 		 */
 		tcred = __task_cred(tsk);
-		if (cred->euid &&
-		    cred->euid != tcred->uid &&
-		    cred->euid != tcred->suid) {
+		if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
+		    !uid_eq(cred->euid, tcred->uid) &&
+		    !uid_eq(cred->euid, tcred->suid)) {
 			rcu_read_unlock();
 			ret = -EACCES;
 			goto out_unlock_cgroup;
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 43/43] userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq
       [not found] ` <m11unyn70b.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
                     ` (41 preceding siblings ...)
  2012-04-08  5:15   ` [PATCH 42/43] userns: Convert cgroup permission checks to use uid_eq "Eric W. Beiderman
@ 2012-04-08  5:15   ` "Eric W. Beiderman
  2012-04-08 14:54   ` [REVIEW][PATCH 0/43] Completing the user namespace Serge Hallyn
                     ` (2 subsequent siblings)
  45 siblings, 0 replies; 227+ messages in thread
From: "Eric W. Beiderman @ 2012-04-08  5:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linux Containers, Cyrill Gorcunov, linux-security-module, Al Viro,
	linux-fsdevel, Andrew Morton, Linus Torvalds, Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
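
The permission check converted below can be read as "deny unless the
caller's uid or euid matches the target's uid or suid, or the caller has
CAP_SYS_NICE". A userspace sketch of the same predicate, written in the
positive form, follows. The struct and the model_* names are hypothetical
and the capability is passed in as a flag rather than queried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for kuid_t, uid_eq() and struct cred. */
typedef struct { uint32_t val; } kuid_model_t;

static bool model_uid_eq(kuid_model_t a, kuid_model_t b)
{
	return a.val == b.val;
}

struct cred_model {
	kuid_model_t uid;   /* real uid      */
	kuid_model_t euid;  /* effective uid */
	kuid_model_t suid;  /* saved uid     */
};

/*
 * True when the caller may move the target's pages. This is the logical
 * negation of the condition in the hunks below, which test for denial.
 */
static bool model_may_move_pages(const struct cred_model *cred,
				 const struct cred_model *tcred,
				 bool has_cap_sys_nice)
{
	return model_uid_eq(cred->euid, tcred->suid) ||
	       model_uid_eq(cred->euid, tcred->uid)  ||
	       model_uid_eq(cred->uid,  tcred->suid) ||
	       model_uid_eq(cred->uid,  tcred->uid)  ||
	       has_cap_sys_nice;
}

/* Usage example with two unrelated users. */
static void model_may_move_pages_demo(void)
{
	struct cred_model alice = { { 1000 }, { 1000 }, { 1000 } };
	struct cred_model bob   = { { 2000 }, { 2000 }, { 2000 } };

	assert(model_may_move_pages(&alice, &alice, false));
	assert(!model_may_move_pages(&alice, &bob, false));
	assert(model_may_move_pages(&alice, &bob, true));
}
```

Since both sides of every comparison are already kernel-internal ids, no
namespace needs to be consulted here; only the comparison operator changes.
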
 mm/mempolicy.c |    4 ++--
 mm/migrate.c   |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index cfb6c86..7b44fc8 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1334,8 +1334,8 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
 	 * userid as the target process.
 	 */
 	tcred = __task_cred(task);
-	if (cred->euid != tcred->suid && cred->euid != tcred->uid &&
-	    cred->uid  != tcred->suid && cred->uid  != tcred->uid &&
+	if (!uid_eq(cred->euid, tcred->suid) && !uid_eq(cred->euid, tcred->uid) &&
+	    !uid_eq(cred->uid,  tcred->suid) && !uid_eq(cred->uid,  tcred->uid) &&
 	    !capable(CAP_SYS_NICE)) {
 		rcu_read_unlock();
 		err = -EPERM;
diff --git a/mm/migrate.c b/mm/migrate.c
index 51c08a0..1cf5252 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1371,8 +1371,8 @@ SYSCALL_DEFINE6(move_pages, pid_t, pid, unsigned long, nr_pages,
 	 * userid as the target process.
 	 */
 	tcred = __task_cred(task);
-	if (cred->euid != tcred->suid && cred->euid != tcred->uid &&
-	    cred->uid  != tcred->suid && cred->uid  != tcred->uid &&
+	if (!uid_eq(cred->euid, tcred->suid) && !uid_eq(cred->euid, tcred->uid) &&
+	    !uid_eq(cred->uid,  tcred->suid) && !uid_eq(cred->uid,  tcred->uid) &&
 	    !capable(CAP_SYS_NICE)) {
 		rcu_read_unlock();
 		err = -EPERM;
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 227+ messages in thread

* Re: [REVIEW][PATCH 0/43] Completing the user namespace
       [not found] ` <m11unyn70b.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
                     ` (42 preceding siblings ...)
  2012-04-08  5:15   ` [PATCH 43/43] userns: Convert the move_pages, and migrate_pages " Eric W. Biederman
@ 2012-04-08 14:54   ` Serge Hallyn
  2012-04-08 17:40   ` richard -rw- weinberger
  2012-05-11 23:20     ` Eric W. Biederman
  45 siblings, 0 replies; 227+ messages in thread
From: Serge Hallyn @ 2012-04-08 14:54 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

Thanks, Eric.

While this approach has limitations and administrative overhead which I'd
prefer it didn't, it also has huge benefits, especially faster id comparisons
and, most importantly, type safety enforcing that the right kinds of uids
are compared.

So I endorse this approach.

I've reviewed and acked many of the patches, some I still had questions on,
but the approach as a whole gets an ack from me.

thanks,
-serge

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [REVIEW][PATCH 0/43] Completing the user namespace
       [not found] ` <m11unyn70b.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
                     ` (43 preceding siblings ...)
  2012-04-08 14:54   ` [REVIEW][PATCH 0/43] Completing the user namespace Serge Hallyn
@ 2012-04-08 17:40   ` richard -rw- weinberger
  2012-05-11 23:20     ` Eric W. Biederman
  45 siblings, 0 replies; 227+ messages in thread
From: richard -rw- weinberger @ 2012-04-08 17:40 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Cyrill Gorcunov, linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	Al Viro, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds

On Sun, Apr 8, 2012 at 7:10 AM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> - Capabilities are localized to the current user namespace making
>  it safe to give the initial user in a user namespace all capabilities.
>

So, this makes LXC and friends ready for hostile environments?
IOW a root user (with all capabilities) sitting in his own namespace can no
longer harm the host?

-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [REVIEW][PATCH 0/43] Completing the user namespace
  2012-04-08 17:40   ` richard -rw- weinberger
@ 2012-04-08 21:30       ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08 21:30 UTC (permalink / raw)
  To: richard -rw- weinberger
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Cyrill Gorcunov, linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	Al Viro, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds

richard -rw- weinberger <richard.weinberger@gmail.com> writes:

> On Sun, Apr 8, 2012 at 7:10 AM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> - Capabilities are localized to the current user namespace making
>>  it safe to give the initial user in a user namespace all capabilities.
>>
>
> So, this makes LXC and friends ready for hostile environments?
> IOW a root user (with all capabilities) sitting in his own namespace can no
> longer harm the host?

The user namespace now restricts the root user in a container to being
able to do no more harm than any other user can do.  Additionally suid
executables can no longer lead to having all power on the system.  Which
means that the only privilege escalation attacks available from a
container require kernel bugs.

With my version of user namespaces you no longer have to worry about the
container root writing to files in /proc or /sys and changing the
behavior of the system.  Nor do you have to worry about messages passed
across unix domain sockets to d-bus having a trusted uid and being
allowed to do something nasty.

It allows for applications with no capabilities to use multiple
uids and to implement privilege separation.

I certainly see user namespaces like this as having the potential
to make linux systems more secure.

You will have to make your own threat assessment to decide if that is
enough of an improvement to start deploying containers in what you
consider hostile environments.



For me the big potential I see is that it makes possible the creation of
a container without privilege (today the uid mapping setup still
requires privilege), and it allows a lot of things that the existence of
suid root executables has prevented us from making unprivileged before.

After the core is settled we can start looking at patches to allow
unprivileged creation of other namespaces.  Unprivileged mounts.
Unprivileged use of the networking stack.  Bringing many of the
improvements that linux has seen over the years to unprivileged
users.

I also see great potential for April fools day jokes.  You log in and
try to fix something and discover you are not the root you thought you
were.  Does that count as a hostile environment?

Eric
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [REVIEW][PATCH 0/43] Completing the user namespace
  2012-04-08 21:30       ` Eric W. Biederman
@ 2012-04-08 22:04           ` richard -rw- weinberger
  -1 siblings, 0 replies; 227+ messages in thread
From: richard -rw- weinberger @ 2012-04-08 22:04 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Cyrill Gorcunov, linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	Al Viro, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds

On Sun, Apr 8, 2012 at 11:30 PM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> richard -rw- weinberger <richard.weinberger-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>
>> On Sun, Apr 8, 2012 at 7:10 AM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>>> - Capabilities are localized to the current user namespace making
>>>  it safe to give the initial user in a user namespace all capabilities.
>>>
>>
>> So, this makes LXC and friends ready for hostile environments?
>> IOW a root user (with all capabilities) sitting in his own namespace can no
>> longer harm the host?
>
> The user namespace now restricts the root user in a container to being
> able to do no more harm than any other user can do.  Additionally suid
> executables can no longer lead to having all power on the system.  Which
> means that the only privilege escalation attacks available from a
> container require kernel bugs.
>
> With my version of user namespaces you no longer have to worry about the
> container root writing to files in /proc or /sys and changing the
> behavior of the system.  Nor do you have to worry about messages passed
> across unix domain sockets to d-bus having a trusted uid and being
> allowed to do something nasty.
>
> It allows for applications with no capabilities to use multiple
> uids and to implement privilege separation.
>
> I certainly see user namespaces like this as having the potential
> to make linux systems more secure.
>
> You will have to make your own threat assessment to decide if that is
> enough of an improvement to start deploying containers in what you
> consider hostile environments.
>
>
>
> For me the big potential I see is that it makes possible the creation of
> a container without privilege (today the uid mapping setup still
> requires privilege), and it allows a lot of things that the existence of
> suid root executables has prevented us from making unprivileged before.
>
> After the core is settled we can start looking at patches to allow
> unprivileged creation of other namespaces.  Unprivileged mounts.
> Unprivileged use of the networking stack.  Bringing many of the
> improvements that linux has seen over the years to unprivileged
> users.
>
> I also see great potential for April fools day jokes.  You log in and
> try to fix something and discover you are not the root you thought you
> were.  Does that count as a hostile environment?
>

Yep. Sounds great!
I'll give your patch set a try within the next few days on my LXC testbed. :-)

-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [REVIEW][PATCH 0/43] Completing the user namespace
       [not found]           ` <CAFLxGvwHtA028V2XudM-5HXmXCPw5ENL5E_nHKZh_gbrsRV69g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-04-08 22:52             ` Eric W. Biederman
  0 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-08 22:52 UTC (permalink / raw)
  To: richard -rw- weinberger
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Cyrill Gorcunov, linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	Al Viro, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds

richard -rw- weinberger <richard.weinberger-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> On Sun, Apr 8, 2012 at 11:30 PM, Eric W. Biederman
> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>> richard -rw- weinberger <richard.weinberger-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>
> Yep. Sounds great!
> I'll give your patch set a try within the next few days on my LXC
> testbed. :-)

Sounds good.

The big practical detail to work out is all of the userspace bits for
the uid and gid mappings: assigning multiple uids to each user and
things like that.  I expect that ultimately means updating the shadow
package so that a bunch of uids and gids are reserved for each user
when their account is added.
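
As an illustration only (this is not code from the patchset or from the
shadow package), reserving a block of ids per account could be sketched
as below.  The base, the block size, and the names reserve_range and
struct id_range are all hypothetical.  The only constraint taken from
this thread is that (uid_t)-1 is reserved as an error value, so no
block may contain it.

```c
#include <stdint.h>

#define SUBID_BASE  100000u /* hypothetical: first id handed out */
#define SUBID_COUNT 65536u  /* hypothetical: ids reserved per account */

struct id_range {
	uint32_t start; /* first id of the reserved block */
	uint32_t count; /* number of ids in the block, 0 if none available */
};

/*
 * Return the block reserved for the n-th account created (n starts at 0).
 * Blocks never overlap because each one starts SUBID_COUNT after the last.
 */
struct id_range reserve_range(uint32_t account_index)
{
	struct id_range r = { 0, 0 };
	uint64_t start = (uint64_t)SUBID_BASE +
			 (uint64_t)account_index * SUBID_COUNT;

	/* The last id in the block must stay below (uid_t)-1. */
	if (start + SUBID_COUNT <= UINT32_MAX) {
		r.start = (uint32_t)start;
		r.count = SUBID_COUNT;
	}
	return r;
}
```

With these assumed constants the first account would get ids
100000-165535 and the second 165536-231071; a count of 0 signals that
the id space is exhausted.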

Eric

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [REVIEW][PATCH 0/43] Completing the user namespace
  2012-04-08  5:10 ` Eric W. Biederman
                   ` (14 preceding siblings ...)
  (?)
@ 2012-04-10 19:01 ` Andy Lutomirski
  2012-04-10 21:59   ` Eric W. Biederman
  -1 siblings, 1 reply; 227+ messages in thread
From: Andy Lutomirski @ 2012-04-10 19:01 UTC (permalink / raw)
  To: Eric W. Biederman, Markus Gutschke, Will Drewry
  Cc: Cyrill Gorcunov, linux-security-module, Al Viro, linux-fsdevel,
	Andrew Morton, Linus Torvalds

On 04/07/2012 10:10 PM, Eric W. Biederman wrote:
> 
> This is a course correction for the user namespace, so that we can reach
> an inexpensive, maintainable, and reasonably complete implementation.
> 
> If anyone can think of a reason why the user namespace should not
> evolve in the direction taken in this patchset please let me know.
> 
> There is not an obvious maintainer for the scope of what this patchset
> covers so I intend to host this tree myself and to place it in
> linux-next after this round of review.
> 
> Highlights.
> - The kernel will now fail to build if you attempt to compile in
>   code whose permission checks have not been updated to be user
>   namespace safe.
> 
> - All uids from child user namespaces are mapped into the initial user
>   namespace before they are processed.  Removing the need to add
>   an additional check to see if the user namespace of the compared
>   uids remains the same.

[...]

I haven't read enough of the details to figure out how the uid mapping
works (do all the child namespace uids map to the same parent uid?), so
I may be missing some details here.

As a bit of background, the no_new_privs mode introduced in the big
seccomp patchset will add a flag that any task can set to prevent it or
any of its children from gaining privileges by using execve.

How should this interact with pid namespaces?  As a first pass, I
imagine that the main PR_SET_NO_NEW_PRIVS(1) mode will prevent setuid
from working inside uid namespaces as well, but there may be interest in
weaker variants that allow setuid inside namespaces.

Any thoughts?

--Andy

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [REVIEW][PATCH 0/43] Completing the user namespace
  2012-04-10 19:01 ` Andy Lutomirski
@ 2012-04-10 21:59   ` Eric W. Biederman
  2012-04-10 22:15     ` Andrew Lutomirski
  0 siblings, 1 reply; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-10 21:59 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Markus Gutschke, Will Drewry, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

Andy Lutomirski <luto@MIT.EDU> writes:

> On 04/07/2012 10:10 PM, Eric W. Biederman wrote:
>> 
>> This is a course correction for the user namespace, so that we can reach
>> an inexpensive, maintainable, and reasonably complete implementation.
>> 
>> If anyone can think of a reason why the user namespace should not
>> evolve in the direction taken in this patchset please let me know.
>> 
>> There is not an obvious maintainer for the scope of what this patchset
>> covers so I intend to host this tree myself and to place it in
>> linux-next after this round of review.
>> 
>> Highlights.
>> - The kernel will now fail to build if you attempt to compile in
>>   code whose permission checks have not been updated to be user
>>   namespace safe.
>> 
>> - All uids from child user namespaces are mapped into the initial user
>>   namespace before they are processed.  Removing the need to add
>>   an additional check to see if the user namespace of the compared
>>   uids remains the same.
>
> [...]
>
> I haven't read enough of the details to figure out how the uid mapping
> works (do all the child namespace uids map to the same parent uid?), so
> I may be missing some details here.

You seem to be missing a detail or two.

What you want to look at are the functions make_kuid and from_kuid
in kernel/user_namespace.c.  You might also look at the patches that talk
about uidgid.h and introducing a mapping layer.

The implementation creates an incomplete but 1-1 mapping to the uids in
the initial user namespace.  Which means except for the change in
datatype (sigh) the existing permission checks don't need to be changed.
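
A minimal userspace sketch of what an incomplete but 1-1 mapping means
is given here.  It is not the kernel code: struct id_extent, map_id_down
and map_id_up are hypothetical names, and describing the mapping as
(first, lower_first, count) extents is an assumption made for the
example.  Ids outside every extent have no mapping at all, and ids
inside an extent map to exactly one id in the initial namespace and
back.

```c
#include <stddef.h>
#include <stdint.h>

#define INVALID_ID ((uint32_t)-1) /* (uid_t)-1 is reserved as the error value */

/* Hypothetical model of one contiguous piece of a mapping. */
struct id_extent {
	uint32_t first;       /* first id as seen inside the namespace */
	uint32_t lower_first; /* id it corresponds to in the initial namespace */
	uint32_t count;       /* number of consecutive ids covered */
};

/* Namespace id -> initial-namespace id, or INVALID_ID if unmapped. */
uint32_t map_id_down(const struct id_extent *map, size_t n, uint32_t id)
{
	for (size_t i = 0; i < n; i++) {
		if (id >= map[i].first && id - map[i].first < map[i].count)
			return map[i].lower_first + (id - map[i].first);
	}
	return INVALID_ID;
}

/* Initial-namespace id -> namespace id, or INVALID_ID if unmapped. */
uint32_t map_id_up(const struct id_extent *map, size_t n, uint32_t id)
{
	for (size_t i = 0; i < n; i++) {
		if (id >= map[i].lower_first &&
		    id - map[i].lower_first < map[i].count)
			return map[i].first + (id - map[i].lower_first);
	}
	return INVALID_ID;
}
```

With a single extent { 0, 100000, 1000 }, id 0 in the namespace
corresponds to 100000 in the initial namespace, while id 1000 has no
mapping and yields INVALID_ID.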

I change the data type from uid_t to kuid_t for everything internal
and make the two assignment incompatible, to force the use of
make_kuid and from_kuid at the boundaries with user space and with the
filesystems.  Unfortunately that means uid comparisons themselves
must change a little, i.e. uid_eq() instead of ==.  That grand
search and replace is probably the scariest bit of this patchset.
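
The effect of the type change can be sketched in userspace as follows.
This is a model, not the kernel headers: kuid_wrap, struct cred_model
and may_migrate are hypothetical names, and only kuid_t and uid_eq come
from the discussion above.  Wrapping the value in a struct is what
makes a plain uid_t and a kuid_t assignment incompatible, so == no
longer compiles and every comparison has to go through uid_eq.  The
check mirrors the condition converted in patch 43/43.

```c
#include <stdbool.h>
#include <sys/types.h>

/* A struct wrapper cannot be assigned from, or compared with, a raw uid_t. */
typedef struct {
	uid_t val;
} kuid_t;

/* Hypothetical helper for the example; the kernel uses make_kuid instead. */
static inline kuid_t kuid_wrap(uid_t uid)
{
	kuid_t k = { uid };
	return k;
}

static inline bool uid_eq(kuid_t left, kuid_t right)
{
	return left.val == right.val;
}

/* Hypothetical subset of the credentials used by the check. */
struct cred_model {
	kuid_t uid;
	kuid_t euid;
	kuid_t suid;
};

/*
 * Same shape as the migrate_pages/move_pages test: allowed when the
 * caller's uid or euid matches the target's uid or suid, or when the
 * caller has the capability (passed in as a flag here).
 */
bool may_migrate(const struct cred_model *cred,
		 const struct cred_model *tcred, bool has_cap_sys_nice)
{
	return uid_eq(cred->euid, tcred->suid) ||
	       uid_eq(cred->euid, tcred->uid) ||
	       uid_eq(cred->uid, tcred->suid) ||
	       uid_eq(cred->uid, tcred->uid) ||
	       has_cap_sys_nice;
}
```

Writing cred->euid == tcred->uid against these types is a compile
error in C, which is the property the series relies on to catch
unconverted permission checks.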

> As a bit of background, the no_new_privs mode introduced in the big
> seccomp patchset will add a flag that any task can set to prevent it or
> any of its children from gaining privileges by using execve.
>
> How should this interact with pid namespaces?

I assume you mean uid namespaces?  I don't expect pid namespaces will
have any effect.

> As a first pass, I
> imagine that the main PR_SET_NO_NEW_PRIVS(1) mode will prevent setuid
> from working inside uid namespaces as well, but there may be interest in
> weaker variants that allow setuid inside namespaces.

> Any thoughts?

It looks like a big strength of seccomp is reducing the attack surface
of the kernel.  The user namespace will actually increase that attack
surface by making much more of the functionality available.  So on that
level they are very different mechanisms.

My understanding of no_new_privs is that current_cred() including
the user, the user namespace and the security label will never change,
with the goal of making the security analysis simple.

I don't recall how seccomp filters are dealt with if you don't have
no_new_privs enabled.  If seccomp filters installed by root
are dropped when we change privilege levels it might be worth looking
at how to keep a seccomp filter installed as long as you stay in
a user namespace.


There are essentially two modes you can use the user namespace in:
with mappings setup (a privileged operation) and with no mappings.

With no mappings you can not create a new user namespace or change your
uids or gids, and suid exec fails (or possibly ignores the uid/gid change,
but I am starting with suid exec fails).  That makes user namespaces
similar to no_new_privs.

The emphasis is a bit different from no_new_privs as the user_namespace
does not need to guarantee that the lsm will not change security labels,
etc.

At a basic level of interaction I expect no_new_privs will need to fail
any change of the user namespace.  As changing the user namespace
changes current_cred(), and fundamentally allows more things to happen.

Overall I think the two mechanisms are complementary.  I actually don't
expect any fundamental conflicts in the code between user namespaces and
no_new_privs.  I do expect conflicts in the patches, because some of the
same code paths will be touched.  Both approaches change exec, and the
user namespace implementation lightly touches every permission
check in the kernel.

Where seccomp focuses on making things secure the user namespace focuses
on making things possible that were not before.  In particular the user
namespace makes it easily possible and generally safe to allow suid exec,
creation of namespaces and allowing capable calls with respect to
namespaces we have created.

With both mechanisms in play I would expect the implementation of
no_new_privs to be able to focus on limiting the attack surface and not
have to worry about allowing more kernel functionality to be used,
while the user namespace can then focus on increasing the functionality
exported to unprivileged users.

Eric

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [REVIEW][PATCH 0/43] Completing the user namespace
  2012-04-10 21:59   ` Eric W. Biederman
@ 2012-04-10 22:15     ` Andrew Lutomirski
  2012-04-10 23:01       ` Markus Gutschke
  2012-04-10 23:50       ` Eric W. Biederman
  0 siblings, 2 replies; 227+ messages in thread
From: Andrew Lutomirski @ 2012-04-10 22:15 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Markus Gutschke, Will Drewry, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

On Tue, Apr 10, 2012 at 2:59 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Andy Lutomirski <luto@MIT.EDU> writes:
>
>> On 04/07/2012 10:10 PM, Eric W. Biederman wrote:
>>>
>>> This is a course correction for the user namespace, so that we can reach
>>> an inexpensive, maintainable, and reasonably complete implementation.
>>>
>>> If anyone can think of a reason why the user namespace should not
>>> evolve in the direction taken in this patchset please let me know.
>>>
>>> There is not an obvious maintainer for the scope of what this patchset
>>> covers so I intend to host this tree myself and to place it in
>>> linux-next after this round of review.
>>>
>>> Highlights.
>>> - The kernel will now fail to build if you attempt to compile in
>>>   code whose permission checks have not been updated to be user
>>>   namespace safe.
>>>
>>> - All uids from child user namespaces are mapped into the initial user
>>>   namespace before they are processed.  Removing the need to add
>>>   an additional check to see if the user namespace of the compared
>>>   uids remains the same.
>>
>> [...]
>>
>> I haven't read enough of the details to figure out how the uid mapping
>> works (do all the child namespace uids map to the same parent uid?), so
>> I may be missing some details here.
>
> You seem to be missing a detail or two.
>
> What you want to look at are the functions make_kuid and from_kuid
> in kernel/user_namespace.c  You might look at the patches that talk
> about uidgid.h and introducing a mapping layer.
>
> The implementation creates an incomplete but 1-1 mapping to the uids in
> the initial user namespace.  Which means except for the change in
> datatype (sigh) the existing permission checks don't need to be changed.

I'll do my homework at the same time that I write up docs for
no_new_privs (i.e. maybe today).

>
> My understanding of no_new_privs is that current_cred() including
> the user, the user namespace and the security label will never change,
> with the goal of making the security analysis simple.

They can change but only if you already have the privilege to change
them yourself and then you do so.  For example, PR_SET_NO_NEW_PRIVS,
setuid, then drop caps is allowed and useful -- it's a race-free way
to make sure that a given uid never executes without no_new_privs set.
 I've implemented this as a pam module.

This still simplifies security analysis: the guarantee is that, if
no_new_privs is set, then a task's children cannot do anything that
the task could not do on its own.  Therefore it's safe for the task to
manipulate its own environment in whatever strange ways it wants,
because even if that gives it the ability to subvert its children,
there is no privilege gained.
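
For reference, a minimal sketch of setting and reading the flag,
assuming the prctl interface as it appears in <sys/prctl.h> on current
Linux systems (PR_SET_NO_NEW_PRIVS with argument 1 and the remaining
arguments zero).  The wrapper names set_no_new_privs and
get_no_new_privs are made up for the example, and the setuid and
capability-dropping steps of the sequence described above are left out
because they need privilege.

```c
#include <sys/prctl.h>

/* Fallbacks for older headers that predate the flag. */
#ifndef PR_SET_NO_NEW_PRIVS
#define PR_SET_NO_NEW_PRIVS 38
#endif
#ifndef PR_GET_NO_NEW_PRIVS
#define PR_GET_NO_NEW_PRIVS 39
#endif

/* Set the flag for the calling task; returns 0 on success, -1 on error. */
int set_no_new_privs(void)
{
	return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}

/* Returns 1 if the flag is set, 0 if not, -1 on error. */
int get_no_new_privs(void)
{
	return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

Once set, the flag is inherited across fork and execve and cannot be
cleared, so a task would call set_no_new_privs() before any setuid or
capability changes it wants covered by the guarantee.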

>
> I don't recall how seccomp filters are dealt with if you don't have
> no_new_privs enabled.  If seccomp filters installed by root
> are dropped when we change privilege levels it might be worth looking
> at how to keep a seccomp filter installed as long as you stay in
> a user namespace.
>

They're not dropped.  I think in the current implementation they can't
be dropped at all.

>
> There are essentially two modes you can use the user namespace in:
> with mappings setup (a privileged operation) and with no mappings.

>
> With no mappings you can not create a new user namespace or change your
> uids or gids, and suid exec fails (or possibly ignores the uid/gid change
> but I am starting with suid exec fails).  Making user namespaces similar
> to no_new_privs.
>
> The emphasis is a bit different from no_new_privs as the user_namespace
> does not need to guarantee that the lsm will not change security labels,
> etc.

Hmm.  Is this safe?  For example, if there's a program that LSM policy
grants extra privileges that malfunctions when run inside a user
namespace, can that be used to break out of LSM restrictions?

>
> At a basic level of interaction I expect no_new_privs will need to fail
> any change of the user namespace.  As changing the user namespace
> changes current_cred(), and fundamentally allows more things to happen.

If a user namespace has no visible effect on processes that aren't
descendents of whoever created it, then creating one in no_new_privs
mode should be safe.  On the other hand, it could be somewhat useless.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [REVIEW][PATCH 0/43] Completing the user namespace
  2012-04-10 22:15     ` Andrew Lutomirski
@ 2012-04-10 23:01       ` Markus Gutschke
  2012-04-11  0:04         ` Eric W. Biederman
  2012-04-10 23:50       ` Eric W. Biederman
  1 sibling, 1 reply; 227+ messages in thread
From: Markus Gutschke @ 2012-04-10 23:01 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Eric W. Biederman, Will Drewry, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

On Tue, Apr 10, 2012 at 15:15, Andrew Lutomirski <luto@mit.edu> wrote:
> On Tue, Apr 10, 2012 at 2:59 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> > With no mappings you can not create a new user namespace or change your
> > uid or gids, and suid exec fails (or possibly ignores the uid/gid change
> > but I am starting with suid exec fails).  Making user namespaces similar
> > to no_new_privs.
>
> Hmm.  Is this safe?  For example, if there's a program that LSM policy
> grants extra privileges that malfunctions when run inside a user
> namespace, can that be used to break out of LSM restrictions?

Is  creation without a mapping similar to some of the other CLONE_XXX
flags that essentially give you a new anonymous and ephemeral
namespace? Or does it just give you a 1:1 mapping to the parent's
namespace?

The former would conceivably be useful for sandboxing purposes. Every
so often, it is desirable to run a process as a user id that is
distinct from any other user id in the system. But this usually
requires the explicit creation of a new entry in /etc/passwd; and of
course it also takes a privileged user to switch to this new user id.
So, unprivileged processes can usually not switch to a dedicated user
id. I could see the benefit in being able to create an ephemeral
anonymous user id.

Of course, if the kernel provided for anonymous user ids, this would
have interesting semantics throughout the system. E.g. what happens if
the process attempts to create a new file in /tmp. Would that be
allowed? If so, who would be the owner of the file. Presumably, file
systems don't have any way to represent the fact the user id is
ephemeral. So, an application should be denied file system accesses
unless they obtained a file descriptor that was opened outside of the
namespace.

What happens if credentials are passed with SCM_CREDENTIALS? Do they
get translated? Does this work in both directions (i.e. passing in and
out of the namespace)?

What happens to permissions on files in /proc?

Can the creator of a namespace send signals to processes in the
namespace? How about the reverse?

But maybe, this is just too complicated and anonymous ephemeral user
ids are not really doable.


Markus

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [REVIEW][PATCH 0/43] Completing the user namespace
  2012-04-10 22:15     ` Andrew Lutomirski
  2012-04-10 23:01       ` Markus Gutschke
@ 2012-04-10 23:50       ` Eric W. Biederman
  2012-04-10 23:56         ` Andrew Lutomirski
  2012-04-11  4:16         ` Serge Hallyn
  1 sibling, 2 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-10 23:50 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Markus Gutschke, Will Drewry, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

Andrew Lutomirski <luto@mit.edu> writes:

> On Tue, Apr 10, 2012 at 2:59 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> Andy Lutomirski <luto@MIT.EDU> writes:
>>>
>>> [...]
>>>
>>> I haven't read enough of the details to figure out how the uid mapping
>>> works (do all the child namespace uids map to the same parent uid?), so
>>> I may be missing some details here.
>>
>> You seem to be missing a detail or two.
>>
>> What you want to look at are the functions make_kuid and from_kuid
>> in kernel/user_namespace.c  You might look at the patches that talk
>> about uidgid.h and introducing a mapping layer.
>>
>> The implementation creates an incomplete but 1-1 mapping to the uids in
>> the initial user namespace.  Which means except for the change in
>> datatype (sigh) the existing permission checks don't need to be changed.
>
> I'll do my homework at the same time that I write up docs for
> no_new_privs (i.e. maybe today).
>
>>
>> My understanding of no_new_privs is that current_cred() including
>> the user, the user namespace and the security label will never change,
>> with the goal of making the security analysis simple.
>
> They can change but only if you already have the privilege to change
> them yourself and then you do so.  For example, PR_SET_NO_NEW_PRIVS,
> setuid, then drop caps is allowed and useful -- it's a race-free way
> to make sure that a given uid never executes without no_new_privs set.
>  I've implemented this as a pam module.

Careful.  There is the security_task_fix_setuid call that will raise
your capabilities from cap->effective to cap->permitted if you call
setuid(0).  Which in the general case means you can regain all of the
root privileges if you only have CAP_SETUID.
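
To make that fixup concrete, here is a simplified userspace model of
the rule as I read it in security/commoncap.c, assuming default
securebits.  The struct and function names are invented for this
sketch and are not the kernel's:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for the fields of struct cred that matter here. */
struct cred_model {
	unsigned int uid, euid, suid;
	uint64_t cap_permitted;
	uint64_t cap_effective;
};

/*
 * Model of the capability fixup applied after a set*uid() call.
 * keep_caps stands in for the SECURE_KEEP_CAPS securebit.
 */
void model_fix_setuid(struct cred_model *new_cred,
		      const struct cred_model *old_cred, bool keep_caps)
{
	bool old_had_root = old_cred->uid == 0 || old_cred->euid == 0 ||
			    old_cred->suid == 0;
	bool new_has_root = new_cred->uid == 0 || new_cred->euid == 0 ||
			    new_cred->suid == 0;

	/* Every uid moved away from 0: drop all caps unless keep_caps. */
	if (old_had_root && !new_has_root && !keep_caps) {
		new_cred->cap_permitted = 0;
		new_cred->cap_effective = 0;
	}
	/* euid moved away from 0: the effective set is cleared. */
	if (old_cred->euid == 0 && new_cred->euid != 0)
		new_cred->cap_effective = 0;
	/* euid became 0: the effective set is refilled from permitted. */
	if (old_cred->euid != 0 && new_cred->euid == 0)
		new_cred->cap_effective = new_cred->cap_permitted;
}
```

In this model a task with only CAP_SETUID (bit 7) in its permitted set
and an empty effective set gets that bit back in effective as soon as
its euid becomes 0.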

> This still simplifies security analysis: the guarantee is that, if
> no_new_privs is set, then a task's children cannot do anything that
> the task could not do on its own.  Therefore it's safe for the task to
> manipulate its own environment in whatever strange ways it wants,
> because even if that gives it the ability to subvert its children,
> there is no privilege gained.

>> I don't recall how seccomp filters are dealt with if you don't have
>> no_new_privs enabled.  If seccomp filters installed by root
>> are dropped when we change privilege levels it might be worth looking
>> at how to keep a seccomp filter installed as long as you stay in
>> a user namespace.
>>
>
> They're not dropped.  I think in the current implementation they can't
> be dropped at all.

Which makes sense.   Is this why you need no_new_privs?  So you can't run
seccomp on higher privileged executables and confuse them into keeping
privileges when they should not?

>> There are essentially two modes you can use the user namespace in:
>> with mappings setup (a privileged operation) and with no mappings.
>
>>
>> With no mappings you can not create a new user namespace or change your
>> uid or gids, and suid exec fails (or possibly ignores the uid/gid change
>> but I am starting with suid exec fails).  Making user namespaces similar
>> to no_new_privs.
>>
>> The emphasis is a bit different from no_new_privs as the user_namespace
>> does not need to guarantee that the lsm will not change security labels,
>> etc.
>
> Hmm.  Is this safe?  For example, if there's a program that LSM policy
> grants extra privileges that malfunctions when run inside a user
> namespace, can that be used to break out of LSM restrictions?

I can't see how it would not be safe.

Except for the user namespace pointer the state the LSM and the rest of
the kernel sees is the same state the kernel sees.  Aka userspace sees
uid 0, the LSM does not.  So I don't know why a LSM would get confused.

Beyond that it is a bug for an LSM to grant permissions beyond the
core DAC model.  So the worst I can see is an LSM not grokking user
namespaces and getting confused and not restricting a process as
much as the designer of the LSM would like.

>> At a basic level of interaction I expect no_new_privs will need to fail
>> any change of the user namespace.  As changing the user namespace
>> changes current_cred(), and fundamentally allows more things to happen.
>
> If a user namespace has no visible effect on processes that aren't
> descendents of whoever created it, then creating one in no_new_privs
> mode should be safe.  On the other hand, it could be somewhat useless.

Creating a user namespace will allow a process access to more kernel
facilities.  Aka you can (or at least will be able to) create network
namespaces and mount namespaces and the like.  That increases the
surface of the kernel an attacker can hit.

So in a perfect kernel there are no effects on others.  In a scenario
where you are limiting how much of the kernel a user can use I think
you would want that.

Still given that you aren't doing the very restrictive current_cred()
must not change I don't know how it matters, and a bpf based seccomp can
pretty easily filter out new user namespace creation.  Shrug.
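
A rough sketch of such a filter, assuming the SECCOMP_MODE_FILTER
prctl interface, a little-endian machine, and an architecture where
the flags are the first argument of clone (x86-64, for example).  The
function name is made up, and a real filter would also check the
architecture field and decide what to do about setns:

```c
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Only visible from <sched.h> with _GNU_SOURCE, so keep a fallback. */
#ifndef CLONE_NEWUSER
#define CLONE_NEWUSER 0x10000000
#endif

/*
 * Hypothetical helper: make unshare() and clone() fail with EPERM when
 * CLONE_NEWUSER is requested, and allow everything else.
 * Returns 0 on success, -1 with errno set on failure.
 */
int deny_new_userns(void)
{
	struct sock_filter insns[] = {
		/* 0: load the syscall number */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, nr)),
		/* 1: unshare? go inspect the flags at 3 */
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_unshare, 1, 0),
		/* 2: clone? inspect the flags at 3, otherwise allow at 6 */
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_clone, 0, 3),
		/* 3: load the low 32 bits of the first argument */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, args[0])),
		/* 4: CLONE_NEWUSER set? deny at 5, otherwise allow at 6 */
		BPF_JUMP(BPF_JMP | BPF_JSET | BPF_K, CLONE_NEWUSER, 0, 1),
		/* 5: fail the call with EPERM */
		BPF_STMT(BPF_RET | BPF_K,
			 SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
		/* 6: let the call through */
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = (unsigned short)(sizeof(insns) / sizeof(insns[0])),
		.filter = insns,
	};

	/* An unprivileged task must set no_new_privs before filtering. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1UL, 0UL, 0UL, 0UL) != 0)
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

Once installed, the filter is inherited across fork and exec, so
everything the task starts afterwards is covered as well.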

Eric


^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [REVIEW][PATCH 0/43] Completing the user namespace
  2012-04-10 23:50       ` Eric W. Biederman
@ 2012-04-10 23:56         ` Andrew Lutomirski
  2012-04-11  1:01           ` Eric W. Biederman
  2012-04-11  4:16         ` Serge Hallyn
  1 sibling, 1 reply; 227+ messages in thread
From: Andrew Lutomirski @ 2012-04-10 23:56 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Markus Gutschke, Will Drewry, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

On Tue, Apr 10, 2012 at 4:50 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Andrew Lutomirski <luto@mit.edu> writes:
>
>> On Tue, Apr 10, 2012 at 2:59 PM, Eric W. Biederman
>> <ebiederm@xmission.com> wrote:
>>> Andy Lutomirski <luto@MIT.EDU> writes:
>>>
>>> My understanding of no_new_privs is that current_cred() including
>>> the user, the user namespace and the security label will never change,
>>> with the goal of making the security analysis simple.
>>
>> They can change but only if you already have the privilege to change
>> them yourself and then you do so.  For example, PR_SET_NO_NEW_PRIVS,
>> setuid, then drop caps is allowed and useful -- it's a race-free way
>> to make sure that a given uid never executes without no_new_privs set.
>>  I've implemented this as a pam module.
>
> Careful.  There is the security_task_fix_setuid call that will raise
> your capabilities from cap->effective to cap->permitted if you call
> setuid(0).  Which in the general case means you can regain all of the
> root privileges if you only have CAP_SETUID.
>

That's fine.  If you're running with CAP_SETUID and default
securebits, then you effectively have all capabilities already and
don't need to exploit setuid binaries to gain them.  no_new_privs
doesn't change that.  If you don't want to be able to gain all privs,
change securebits or drop CAP_SETUID.  seccomp reduces the kernel
attack surface; no_new_privs reduces the userspace attack surface.
But see below...


>
>>> I don't recall how seccomp filters are dealt with if you don't have
>>> no_new_privs enabled.  If seccomp filters installed by root
>>> are dropped when we change privilege levels it might be worth looking
>>> at how to keep a seccomp filter installed as long as you stay in
>>> a user namespace.
>>>
>>
>> They're not dropped.  I think in the current implementation they can't
>> be dropped at all.
>
> Which makes sense.   Is this why you need no_new_privs?  So you can't run
> seccomp on higher privileged executables and confuse them into keeping
> privileges when they should not?

Exactly.  seccomp is flexible enough that it's probably possible to
confuse many setuid executables with it.

>
>>> The emphasis is a bit different from no_new_privs as the user_namespace
>>> does not need to guarantee that the lsm will not change security labels,
>>> etc.
>>
>> Hmm.  Is this safe?  For example, if there's a program that LSM policy
>> grants extra privileges that malfunctions when run inside a user
>> namespace, can that be used to break out of LSM restrictions?
>
> I can't see how it would not be safe.
>
> Except for the user namespace pointer the state the LSM and the rest of
> the kernel sees is the same state the kernel sees.  Aka userspace sees
> uid 0, the LSM does not.  So I don't know why a LSM would get confused.
>
> Beyond that it is a bug for an LSM to grant permissions beyond the
> core DAC model.  So the worst I can see is an LSM not grokking user
> namespaces and getting confused and not restricting a process as
> much as the designer of the LSM would like.

Right.  Suppose you have some program that has extra restrictions
applied by an LSM.  It executes a helper (e.g. Apache's suidexec
thing, but I bet there are more examples) which is supposed to be very
careful not to leak privileges.  The LSM is set to restrict that
helper less than the parent process.  But that program was written
before user namespaces existed, and it has a bug (or missing feature)
that allows its parent to exploit it when run inside an unmapped user
namespace.  The parent can now escape from the LSM restrictions.

no_new_privs is designed to prevent exactly this issue.


>>
>> If a user namespace has no visible effect on processes that aren't
>> descendents of whoever created it, then creating one in no_new_privs
>> mode should be safe.  On the other hand, it could be somewhat useless.
>
> Creating a user namespace will allow a process access to more kernel
> facilities.  Aka you can (or at least will be able to) create network
> namespaces and mount namespaces and the like.  That increases the
> surface of the kernel an attacker can hit.
>
> So in a perfect kernel there are no effects on others.  In a scenario
> where you are limiting how much of the kernel a user can use I think
> you would want that.
>
> Still given that you aren't doing the very restrictive current_cred()
> must not change I don't know how it matters, and a bpf based seccomp can
> pretty easily filter out new user namespace creation.  Shrug.

I'm not worried about that.  I'm more interested in whether
unprivileged user namespace creation should require nnp and/or whether
someone might want a mode in which a task has nnp set but can
create a user namespace that allows setuid execution inside the
namespace in spite of the nnp setting.  The latter is probably rather
complicated to get right and depends on nonexistent filesystem
features.

--Andy

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [REVIEW][PATCH 0/43] Completing the user namespace
  2012-04-10 23:01       ` Markus Gutschke
@ 2012-04-11  0:04         ` Eric W. Biederman
  0 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-11  0:04 UTC (permalink / raw)
  To: Markus Gutschke
  Cc: Andrew Lutomirski, Will Drewry, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

Markus Gutschke <markus@chromium.org> writes:

> On Tue, Apr 10, 2012 at 15:15, Andrew Lutomirski <luto@mit.edu> wrote:
>> On Tue, Apr 10, 2012 at 2:59 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> > With no mappings you can not create a new user namespace or change your
>> > uid or gids, and suid exec fails (or possibly ignores the uid/gid change
>> > but I am starting with suid exec fails).  Making user namespaces similar
>> > to no_new_privs.
>>
>> Hmm.  Is this safe?  For example, if there's a program that LSM policy
>> grants extra privileges that malfunctions when run inside a user
>> namespace, can that be used to break out of LSM restrictions?
>
> Is  creation without a mapping similar to some of the other CLONE_XXX
> flags that essentially give you a new anonymous and ephemeral
> namespace? 

Close.  The primary purpose is to make it simpler to setup the mapping.
Strictly speaking you are still running with your original user id,
which just doesn't happen to map in a way that is useful for getuid(),
or stat, but still works for the permission checks.

> Or does it just give you a 1:1 mapping to the parent's namespace?

There is no mapping setup.  In the parent user namespace you see
an unchanged uid.  In the new user namespace you see your uid
as overflowuid aka 65534 aka nobody.
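
On a kernel that lets the caller create the namespace, this can be
observed from userspace.  The sketch below is only an illustration
(the helper name is invented): it forks a child, has the child enter a
new user namespace without writing any mapping, and reports the uid
the child then sees:

```c
#include <sched.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Only visible from <sched.h> with _GNU_SOURCE, so keep a fallback. */
#ifndef CLONE_NEWUSER
#define CLONE_NEWUSER 0x10000000
#endif

/*
 * Hypothetical helper.  Returns the uid seen inside a fresh, unmapped
 * user namespace, or -1 if the namespace could not be created (old
 * kernel, or creation not permitted for this caller).
 */
long uid_in_unmapped_userns(void)
{
	int fds[2];
	long uid = -1;
	pid_t pid;

	if (pipe(fds) != 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		long seen = -1;

		close(fds[0]);
		/* Raw syscall, so this builds with or without _GNU_SOURCE. */
		if (syscall(SYS_unshare, CLONE_NEWUSER) == 0)
			seen = (long)getuid();
		if (write(fds[1], &seen, sizeof(seen)) != (ssize_t)sizeof(seen))
			_exit(1);
		_exit(0);
	}

	/* Parent: read the child's answer, then reap the child. */
	close(fds[1]);
	if (read(fds[0], &uid, sizeof(uid)) != (ssize_t)sizeof(uid))
		uid = -1;
	close(fds[0]);
	waitpid(pid, NULL, 0);
	return uid;
}
```

With the default /proc/sys/kernel/overflowuid setting the value
reported is 65534; the parent's own getuid() is unaffected.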

> The former would conceivably be useful for sandboxing purposes. Every
> so often, it is desirable to run a process as a user id that is
> distinct from any other user id in the system. But this usually
> requires the explicit creation of a new entry in /etc/passwd; and of
> course it also takes a privileged user to switch to this new user id.
> So, unprivileged processes can usually not switch to a dedicated user
> id. I could see the benefit in being able to create an ephemeral
> anonymous user id.

You don't get an ephemeral/anonymous user id, but you do get sandboxed
into the user namespace.  Which ultimately will make other namespaces
usable.

> Of course, if the kernel provided for anonymous user ids, this would
> have interesting semantics throughout the system. E.g. what happens if
> the process attempts to create a new file in /tmp. Would that be
> allowed? If so, who would be the owner of the file. Presumably, file
> systems don't have any way to represent the fact the user id is
> ephemeral. So, an application should be denied file system accesses
> unless they obtained a file descriptor that was opened outside of the
> namespace.

Which is why I skipped that.  The entire purpose of this patchset is to
make it so that you always, always, always have a uid that maps into the
initial user namespace.  So as to avoid creating strange cases to
consider in the permission checks or the rest of the logic throughout
the kernel.

> What happens if credentials are passed with SCM_CREDENTIALS? Do they
> get translated? Does this work in both directions (i.e. passing in and
> out of the namespace)?
>
> What happens to permissions on files in /proc?
>
> Can the creator of a namespace send signals to processes in the
> namespace? How about the reverse?

The logic is the uid in the initial user namespace is translated into
whatever user namespaces you are in.

If the uid maps you get the mapped uid.  If the uid does not map
(the initial state) you get overflowuid.
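
As a toy model of that rule (not code from the patchset): the map is a
list of extents laid out like the lines of /proc/<pid>/uid_map, only
one level of nesting is modelled, and all names below are invented for
the sketch:

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_OVERFLOWUID 65534u            /* default overflowuid */
#define MODEL_INVALID_UID ((uint32_t)-1)    /* (uid_t)-1 is reserved */

/* One extent of the map. */
struct uid_extent {
	uint32_t first;       /* first uid as seen inside the namespace */
	uint32_t lower_first; /* first uid it corresponds to outside */
	uint32_t count;       /* number of uids in the range */
};

/* Outside uid -> uid reported inside; overflowuid if nothing maps. */
uint32_t model_uid_seen_inside(const struct uid_extent *map, size_t n,
			       uint32_t outside)
{
	for (size_t i = 0; i < n; i++) {
		if (outside >= map[i].lower_first &&
		    outside - map[i].lower_first < map[i].count)
			return map[i].first + (outside - map[i].lower_first);
	}
	return MODEL_OVERFLOWUID;
}

/* Inside uid -> outside uid; MODEL_INVALID_UID if nothing maps. */
uint32_t model_uid_to_outside(const struct uid_extent *map, size_t n,
			      uint32_t inside)
{
	for (size_t i = 0; i < n; i++) {
		if (inside >= map[i].first &&
		    inside - map[i].first < map[i].count)
			return map[i].lower_first + (inside - map[i].first);
	}
	return MODEL_INVALID_UID;
}
```

With an empty map every outside uid reads back as 65534, and every
inside uid fails to map, which is the case where a setuid call would
be refused.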

> But maybe, this is just too complicated and anonymous ephemeral user
> ids are not really doable.

Which is why the user namespace needs this course correction I am
putting it on.  Too much pain for too little gain in dealing with uids
that don't map in a useful way for the permission checks.

Adding the one extra constraint that uids always map to the initial
user namespace makes the code fast and simple, at very little cost in
flexibility.

Eric

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [REVIEW][PATCH 0/43] Completing the user namespace
  2012-04-11  1:01           ` Eric W. Biederman
@ 2012-04-11  1:00             ` Andrew Lutomirski
  2012-04-11  1:14               ` Eric W. Biederman
  2012-04-11  4:33             ` Serge Hallyn
  1 sibling, 1 reply; 227+ messages in thread
From: Andrew Lutomirski @ 2012-04-11  1:00 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Markus Gutschke, Will Drewry, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

On Tue, Apr 10, 2012 at 6:01 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Andrew Lutomirski <luto@mit.edu> writes:
>
>> On Tue, Apr 10, 2012 at 4:50 PM, Eric W. Biederman
>> <ebiederm@xmission.com> wrote:
>>> Andrew Lutomirski <luto@mit.edu> writes:
>>>
>>>> On Tue, Apr 10, 2012 at 2:59 PM, Eric W. Biederman
>>>> <ebiederm@xmission.com> wrote:
>>>>> Andy Lutomirski <luto@MIT.EDU> writes:
>>>>>
>>>>> My understanding of no_new_privs is that current_cred() including
>>>>> the user, the user namespace and the security label will never change,
>>>>> with the goal of making the security analysis simple.
>>>>
>>>> They can change but only if you already have the privilege to change
>>>> them yourself and then you do so.  For example, PR_SET_NO_NEW_PRIVS,
>>>> setuid, then drop caps is allowed and useful -- it's a race-free way
>>>> to make sure that a given uid never executes without no_new_privs set.
>>>>  I've implemented this as a pam module.
>>>
>>> Careful.  There is the security_task_fix_setuid call that will raise
>>> your capabilities from cap->effective to cap->permitted if you call
>>> setuid(0).  Which in the general case means you can regain all of the
>>> root privileges if you only have CAP_SETUID.
>>>
>>
>> That's fine.  If you're running with CAP_SETUID and default
>> securebits, then you effectively have all capabilities already and
>> don't need to exploit setuid binaries to gain them.  no_new_privs
>> doesn't change that.  If you don't want to be able to gain all privs,
>> change securebits or drop CAP_SETUID.  seccomp reduces the kernel
>> attack surface; no_new_privs reduces the userspace attack surface.
>> But see below...
>>
>>
>>>
>>>>> I don't recall how seccomp filters are dealt with if you don't have
>>>>> no_new_privs enabled.  If seccomp filters installed by root
>>>>> are dropped when we change privilege levels it might be worth looking
>>>>> at how to keep a seccomp filter installed as long as you stay in
>>>>> a user namespace.
>>>>>
>>>>
>>>> They're not dropped.  I think in the current implementation they can't
>>>> be dropped at all.
>>>
>>> Which makes sense.   Is this why you need no_new_privs?  So you can't run
>>> seccomp on higher privileged executables and confuse them into keeping
>>> privileges when they should not?
>>
>> Exactly.  seccomp is flexible enough that it's probably possible to
>> confuse many setuid executables with it.
>>
>>>
>>>>> The emphasis is a bit different from no_new_privs as the user_namespace
>>>>> does not need to guarantee that the lsm will not change security labels,
>>>>> etc.
>>>>
>>>> Hmm.  Is this safe?  For example, if there's a program that LSM policy
>>>> grants extra privileges that malfunctions when run inside a user
>>>> namespace, can that be used to break out of LSM restrictions?
>>>
>>> I can't see how it would not be safe.
>>>
>>> Except for the user namespace pointer the state the LSM and the rest of
>>> the kernel sees is the same state the kernel sees.  Aka userspace sees
>>> uid 0, the LSM does not.  So I don't know why a LSM would get confused.
>>>
>>> Beyond that it is a bug for an LSM to grant permissions beyond the
>>> core DAC model.  So the worst I can see is an LSM not grokking user
>>> namespaces and getting confused and not restricting a process as
>>> much as the designer of the LSM would like.
>>
>> Right.  Suppose you have some program that has extra restrictions
>> applied by an LSM.  It executes a helper (e.g. Apache's suidexec
>> thing, but I bet there are more examples) which is supposed to be very
>> careful not to leak privileges.  The LSM is set to restrict that
>> helper less than the parent process.  But that program was written
>> before user namespaces existed, and it has a bug (or missing feature)
>> that allows its parent to exploit it when run inside an unmapped user
>> namespace.  The parent can now escape from the LSM restrictions.
>>
>> no_new_privs is designed to prevent exactly this issue.
>
> Currently the suid exec will fail because the uid's don't map.
>
> I might switch that around to simply ignoring the change of uid
> on suid exec.  I have a patch in my devel tree that plays with
> that idea.  However, even though I hit that case once in testing
> (I think it was ping), I don't think running suid executables
> is particularly interesting.
>
> Certainly the application program won't care or break, because we are
> still bounded by the usual DAC security.
>
> I wonder a little if the lsm might change labels on exec of a
> non suid binary.  That case is more interesting in the unmapped
> unprivileged user namespace.
>
> But I just can't seem to care.  The LSM is the line behind which we hide
> the crazy.

Sounds like you're reinventing (something very similar to)
no_new_privs.  Why not just require no_new_privs as a prerequisite for
creating a user namespace if you're unprivileged?

--Andy

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [REVIEW][PATCH 0/43] Completing the user namespace
  2012-04-10 23:56         ` Andrew Lutomirski
@ 2012-04-11  1:01           ` Eric W. Biederman
  2012-04-11  1:00             ` Andrew Lutomirski
  2012-04-11  4:33             ` Serge Hallyn
  0 siblings, 2 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-11  1:01 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Markus Gutschke, Will Drewry, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

Andrew Lutomirski <luto@mit.edu> writes:

> On Tue, Apr 10, 2012 at 4:50 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> Andrew Lutomirski <luto@mit.edu> writes:
>>
>>> On Tue, Apr 10, 2012 at 2:59 PM, Eric W. Biederman
>>> <ebiederm@xmission.com> wrote:
>>>> Andy Lutomirski <luto@MIT.EDU> writes:
>>>>
>>>> My understanding of no_new_privs is that current_cred() including
>>>> the user, the user namespace and the security label will never change,
>>>> with the goal of making the security analysis simple.
>>>
>>> They can change but only if you already have the privilege to change
>>> them yourself and then you do so.  For example, PR_SET_NO_NEW_PRIVS,
>>> setuid, then drop caps is allowed and useful -- it's a race-free way
>>> to make sure that a given uid never executes without no_new_privs set.
>>>  I've implemented this as a pam module.
>>
>> Careful.  There is the security_task_fix_setuid call that will raise
>> your capabilities from cap->effective to cap->permitted if you call
>> setuid(0).  Which in the general case means you can regain all of the
>> root privileges if you only have CAP_SETUID.
>>
>
> That's fine.  If you're running with CAP_SETUID and default
> securebits, then you effectively have all capabilities already and
> don't need to exploit setuid binaries to gain them.  no_new_privs
> doesn't change that.  If you don't want to be able to gain all privs,
> change securebits or drop CAP_SETUID.  seccomp reduces the kernel
> attack surface; no_new_privs reduces the userspace attack surface.
> But see below...
>
>
>>
>>>> I don't recall how seccomp filters are dealt with if you don't have
>>>> no_new_privs enabled.  If seccomp filters installed by root
>>>> are dropped when we change privilege levels it might be worth looking
>>>> at how to keep a seccomp filter installed as long as you stay in
>>>> a user namespace.
>>>>
>>>
>>> They're not dropped.  I think in the current implementation they can't
>>> be dropped at all.
>>
>> Which makes sense.   Is this why you need no_new_privs?  So you can't run
>> seccomp on higher privileged executables and confuse them into keeping
>> privileges when they should not?
>
> Exactly.  seccomp is flexible enough that it's probably possible to
> confuse many setuid executables with it.
>
>>
>>>> The emphasis is a bit different from no_new_privs as the user_namespace
>>>> does not need to guarantee that the lsm will not change security labels,
>>>> etc.
>>>
>>> Hmm.  Is this safe?  For example, if there's a program that LSM policy
>>> grants extra privileges that malfunctions when run inside a user
>>> namespace, can that be used to break out of LSM restrictions?
>>
>> I can't see how it would not be safe.
>>
>> Except for the user namespace pointer the state the LSM and the rest of
>> the kernel sees is the same state the kernel sees.  Aka userspace sees
>> uid 0, the LSM does not.  So I don't know why a LSM would get confused.
>>
>> Beyond that it is a bug for an LSM to grant permissions beyond the
>> core DAC model.  So the worst I can see is an LSM not grokking user
>> namespaces and getting confused and not restricting a process as
>> much as the designer of the LSM would like.
>
> Right.  Suppose you have some program that has extra restrictions
> applied by an LSM.  It executes a helper (e.g. Apache's suidexec
> thing, but I bet there are more examples) which is supposed to be very
> careful not to leak privileges.  The LSM is set to restrict that
> helper less than the parent process.  But that program was written
> before user namespaces existed, and it has a bug (or missing feature)
> that allows its parent to exploit it when run inside an unmapped user
> namespace.  The parent can now escape from the LSM restrictions.
>
> no_new_privs is designed to prevent exactly this issue.

Currently the suid exec will fail because the uid's don't map.

I might switch that around to simply ignoring the change of uid
on suid exec.  I have a patch in my devel tree that plays with
that idea.  However, even though I hit that case once in testing
(I think it was ping), I don't think running suid executables
is particularly interesting.

Certainly the application program won't care or break, because we are
still bounded by the usual DAC security.

I wonder a little if the lsm might change labels on exec of a
non suid binary.  That case is more interesting in the unmapped
unprivileged user namespace.

But I just can't seem to care.  The LSM is the line behind which we hide
the crazy.

The only real difference is that I can create namespaces, which are my
process local environment.  Unprivileged users setting up their own
mount namespace will likely allow all kinds of ways to sneak through the
path based protections of apparmor and tomoyo.  As for smack and selinux
shrug.  I know selinux is at least a lot more path based than the
developers like to admit.  I know most of the /proc and /sys checks are
path based, although I don't think they depend on where you mount
things.  If you can somehow trigger a selinux labelling spree with a
different mount namespace selinux will likely do some very wrong things.
smack is simple so it will probably work as intended.

Shrug.  There is nothing special here with the unmapped uid case of
user namespaces.  This is all things that have to be dealt with in some
fashion, but I do believe that is for the LSM maintainers to worry
about.


>>> If a user namespace has no visible effect on processes that aren't
>>> descendents of whoever created it, then creating one in no_new_privs
>>> mode should be safe.  On the other hand, it could be somewhat useless.
>>
>> Creating a user namespace will allow a process access to more kernel
>> facilities.  Aka you can (or at least will be able to) create network
>> namespaces and mount namespaces and the like.  That increases the
>> surface of the kernel an attacker can hit.
>>
>> So in a perfect kernel there are no effects on others.  In a scenario
>> where you are limiting how much of the kernel a user can use I think
>> you would want that.
>>
>> Still given that you aren't doing the very restrictive current_cred()
>> must not change I don't know how it matters, and a bpf based seccomp can
>> pretty easily filter out new user namespace creation.  Shrug.
>
> I'm not worried about that.  I'm more interested in whether
> unprivileged user namespace creation should require nnp and/or whether
> someone might want a mode in which a task has nnp set but can
> create a user namespace that allows setuid execution inside the
> namespace in spite of the nnp setting.  The latter is probably rather
> complicated to get right and depends on nonexistent filesystem
> features.

Hmm.  If the goal is to avoid confusing lsms, I think when user
namespaces and no new privs meet it becomes sensible for no new privs
to deny user namespace fiddling.  No clone(CLONE_NEWUSER), no
unshare(CLONE_NEWUSER), no setns(CLONE_NEWUSER).  Otherwise it becomes
trivial to confuse path based lsms.

If the goal is to avoid confusing privileged executables with seccomp,
I don't think it matters.  The user namespace guarantees you can't get
additional privileges.

As for requiring no new privs for creating a user namespace, ick.  I
think that will just break things.  suid exec is otherwise safe in a
user namespace and it needs to be supported.  If the LSMs have problems
the LSMs need to figure out how to cope.

I do think no new privs makes sense inside a user namespace exactly
the same way it makes sense if you don't think about user namespaces.

So I expect a really tight security policy to use a user_namespace +
seccomp + no new privs.
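
For reference, the no new privs leg of such a policy is a single prctl
call.  The following is a minimal userspace sketch, assuming a kernel
that carries the no_new_privs work (Linux 3.5 or later) and a libc
that exposes prctl(); the helper names enable_no_new_privs() and
no_new_privs_is_set() are made up for illustration.  It does not
install a seccomp filter or create a user namespace.

```c
#include <stdio.h>
#include <sys/prctl.h>

/* Fallback values in case the libc headers predate the feature. */
#ifndef PR_SET_NO_NEW_PRIVS
#define PR_SET_NO_NEW_PRIVS 38
#endif
#ifndef PR_GET_NO_NEW_PRIVS
#define PR_GET_NO_NEW_PRIVS 39
#endif

/*
 * Hypothetical helper: set the no_new_privs bit on the calling thread.
 * Once set, the bit cannot be cleared and is inherited across fork and
 * exec.  Returns 0 on success, -1 on failure (errno is set by prctl).
 */
int enable_no_new_privs(void)
{
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) {
		perror("prctl(PR_SET_NO_NEW_PRIVS)");
		return -1;
	}
	return 0;
}

/* Hypothetical helper: 1 if the bit is set, 0 if not, -1 on error. */
int no_new_privs_is_set(void)
{
	return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

A sandbox launcher would typically call enable_no_new_privs() before
installing its seccomp filter, since an unprivileged task needs the
bit set for the filter to be accepted.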

Eric


^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [REVIEW][PATCH 0/43] Completing the user namespace
  2012-04-11  1:00             ` Andrew Lutomirski
@ 2012-04-11  1:14               ` Eric W. Biederman
  2012-04-11  1:22                 ` Andrew Lutomirski
  2012-04-11  4:37                 ` Serge Hallyn
  0 siblings, 2 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-11  1:14 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Markus Gutschke, Will Drewry, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

Andrew Lutomirski <luto@mit.edu> writes:

> On Tue, Apr 10, 2012 at 6:01 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:

> Sounds like you're reinventing (something very similar to)
> no_new_privs.  Why not just require no_new_privs as a prerequisite for
> creating a user namespace if you're unprivileged?

As I said in the part of my email you snipped, because no_new_privs will
break suid exec in the user namespace.

I am most definitely not going to require something that will make
implementing/using user namespaces almost pointless.

Eric

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [REVIEW][PATCH 0/43] Completing the user namespace
  2012-04-11  1:14               ` Eric W. Biederman
@ 2012-04-11  1:22                 ` Andrew Lutomirski
  2012-04-11  4:37                 ` Serge Hallyn
  1 sibling, 0 replies; 227+ messages in thread
From: Andrew Lutomirski @ 2012-04-11  1:22 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Markus Gutschke, Will Drewry, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

On Tue, Apr 10, 2012 at 6:14 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Andrew Lutomirski <luto@mit.edu> writes:
>
>> On Tue, Apr 10, 2012 at 6:01 PM, Eric W. Biederman
>> <ebiederm@xmission.com> wrote:
>
>> Sounds like you're reinventing (something very similar to)
>> no_new_privs.  Why not just require no_new_privs as a prerequisite for
>> creating a user namespace if you're unprivileged?
>
> As I said in the part of my email you snipped, because no_new_privs will
> break suid exec in the user namespace.
>
> I am most definitely not going to require something that will make
> implementing/using user namespaces almost pointless.

This part:

> Currently the suid exec will fail because the uid's don't map.
>
> I might switch that around to simply ignoring the change of uid
> on suid exec.  I have a patch in my devel tree that plays with
> that idea.  However as much as I hit that case once in testing
> (I think it was ping).  I don't think running suid executables
> is particularly interesting.

I'm totally lost now.  I'll wait until I play around with the patches some more.

--Andy

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [REVIEW][PATCH 0/43] Completing the user namespace
  2012-04-10 23:50       ` Eric W. Biederman
  2012-04-10 23:56         ` Andrew Lutomirski
@ 2012-04-11  4:16         ` Serge Hallyn
  1 sibling, 0 replies; 227+ messages in thread
From: Serge Hallyn @ 2012-04-11  4:16 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Lutomirski, Markus Gutschke, Will Drewry, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

Quoting Eric W. Biederman (ebiederm@xmission.com):
> Andrew Lutomirski <luto@mit.edu> writes:
> Still given that you aren't doing the very restrictive current_cred()
> must not change I don't know how it matters, and a bpf based seccomp can
> pretty easily filter out new user namespace creation.  Shrug.

I very much want and intend to use both user namespaces and seccomp2
together.  Speaking in terms of the old userns implementation, once
a container has been created, no child of my task will change uid/gid
or gain/move capabilities in the original user namespace.  But they're
free to do so at will in the child user namespace.  Since the capabilities
are targeted at the child namespaces, that's fine.  And as Eric noted
the user namespaces will allow us to increase the attack surface, but
at the same time I'm hoping to offset that somewhat using seccomp2.
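
To make "capabilities are targeted at the child namespaces" concrete,
here is a small userspace model of the walk that the cap_capable hunk
in patch 16 performs.  It is only a sketch: the struct layouts, the
keuid field, and the name ns_capable_model() are simplifications
invented for illustration, and uids are assumed to be already mapped
into the global (kernel-internal) space.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; not the kernel's struct user_namespace. */
struct user_ns {
	struct user_ns *parent;      /* NULL for the initial namespace */
	unsigned int owner;          /* global uid of the creator */
};

/* Userspace model only; not the kernel's struct cred. */
struct cred {
	struct user_ns *user_ns;     /* namespace the task lives in */
	unsigned int keuid;          /* effective uid in the global space */
	unsigned long cap_effective; /* bitmask, meaningful only in user_ns */
};

/* Does cred hold capability number cap with respect to target? */
bool ns_capable_model(const struct cred *cred,
		      const struct user_ns *target, int cap)
{
	for (;;) {
		/* The owner of a non-initial namespace has all caps in it. */
		if (target->parent != NULL && target->owner == cred->keuid)
			return true;

		/* In the task's own namespace, consult its effective set. */
		if (target == cred->user_ns)
			return (cred->cap_effective >> cap) & 1UL;

		/* No ancestors left: the task is not above the target. */
		if (target->parent == NULL)
			return false;

		target = target->parent;
	}
}
```

In this model a task living in the child namespace with a full
effective set passes the check against the child and fails it against
the initial namespace, which is the property relied on above.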

-serge

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [REVIEW][PATCH 0/43] Completing the user namespace
  2012-04-11  1:01           ` Eric W. Biederman
  2012-04-11  1:00             ` Andrew Lutomirski
@ 2012-04-11  4:33             ` Serge Hallyn
  1 sibling, 0 replies; 227+ messages in thread
From: Serge Hallyn @ 2012-04-11  4:33 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Lutomirski, Markus Gutschke, Will Drewry, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

Quoting Eric W. Biederman (ebiederm@xmission.com):
> Andrew Lutomirski <luto@mit.edu> writes:
> 
> > On Tue, Apr 10, 2012 at 4:50 PM, Eric W. Biederman
> > <ebiederm@xmission.com> wrote:
> >> Andrew Lutomirski <luto@mit.edu> writes:
> >>
> >>> On Tue, Apr 10, 2012 at 2:59 PM, Eric W. Biederman
> >>> <ebiederm@xmission.com> wrote:
> >>>> Andy Lutomirski <luto@MIT.EDU> writes:
> >>>>
> >>>> My understanding of no_new_privs is that current_cred() including
> >>>> the user, the user namespace and the security label will never change,
> >>>> with the goal of making the security analysis simple.
> >>>
> >>> They can change but only if you already have the privilege to change
> >>> them yourself and then you do so.  For example, PR_SET_NO_NEW_PRIVS,
> >>> setuid, then drop caps is allowed and useful -- it's a race-free way
> >>> to make sure that a given uid never executes without no_new_privs set.
> >>>  I've implemented this as a pam module.
> >>
> >> Careful.  There is the security_task_fix_setuid call that will raise
> >> your capabilities from cap->effective to cap->permitted if you call
> >> setuid(0).  Which in the general case means you can regain all of the
> >> root privileges if you only have CAP_SETUID.
> >>
> >
> > That's fine.  If you're running with CAP_SETUID and default
> > securebits, then you effectively have all capabilities already and
> > don't need to exploit setuid binaries to gain them.  no_new_privs
> > doesn't change that.  If you don't want to be able to gain all privs,
> > change securebits or drop CAP_SETUID.  seccomp reduces the kernel
> > attack surface; no_new_privs reduces the userspace attack surface.
> > But see below...
> >
> >
> >>
> >>>> I don't recall how seccomp filters are dealt with if you don't have
> >>>> no_new_privs enabled.  If seccomp filters installed by root
> >>>> are dropped when we change privilege levels it might be worth looking
> >>>> at how to keep a seccomp filter installed as long as you stay in
> >>>> a user namespace.
> >>>>
> >>>
> >>> They're not dropped.  I think in the current implementation they can't
> >>> be dropped at all.
> >>
> >> Which makes sense.   Is this why you need no_new_privs?  So you can't run
> >> seccomp on higher privileged executables and confusing them into keeping
> >> privileges when they should not?
> >
> > Exactly.  seccomp is flexible enough that it's probably possible to
> > confuse many setuid executables with it.
> >
> >>
> >>>> The emphasis is a bit different from no_new_privs as the user_namespace
> >>>> does not need to guarantee that the lsm will not change security labels,
> >>>> etc.
> >>>
> >>> Hmm.  Is this safe?  For example, if there's a program that LSM policy
> >>> grants extra privileges that malfunctions when run inside a user
> >>> namespace, can that be used to break out of LSM restrictions?
> >>
> >> I can't see how it would not be safe.
> >>
> >> Except for the user namespace pointer the state the LSM and the rest of
> >> the kernel sees is the same state the kernel sees.  Aka userspace sees
> >> uid 0, the LSM does not.  So I don't know why a LSM would get confused.
> >>
> >> Beyond that it is a bug for an LSM to grant permissions beyond the
> >> core DAC model.  So the worst I can see is an LSM not grokking user
> >> namespaces and getting confused and not restricting a process as
> >> much as the designer of the LSM would like.
> >
> > Right.  Suppose you have some program that has extra restrictions
> > applied by an LSM.  It executes a helper (e.g. Apache's suidexec
> > thing, but I bet there are more examples) which is supposed to be very
> > careful not to leak privileges.  The LSM is set to restrict that
> > helper less than the parent process.  But that program was written
> > before user namespaces existed, and it has a bug (or missing feature)
> > that allows its parent to exploit it when run inside an unmapped user
> > namespace.  The parent can now escape from the LSM restrictions.
> >
> > no_new_privs is designed to prevent exactly this issue.
> 
> Currently the suid exec will fail because the uid's don't map.
> 
> I might switch that around to simply ignoring the change of uid
> on suid exec.  I have a patch in my devel tree that plays with
> that idea.  However as much as I hit that case once in testing
> (I think it was ping).  I don't think running suid executables
> is particularly interesting.
> 
> Certainly the application program won't care or break, because we are
> still bounded by the usual DAC security.
> 
> I wonder a little if the lsm might change labels on exec of a
> non suid binary.  That case is more interesting in the unmapped
> unprivileged user namespace.

They will (change labels on exec of non suid binary).  But.  First, any
well behaved user of user namespaces will switch to a (selinux, smack,
apparmor, whatever) context which is aware it is namespaced so that only
desired transitions happen.  So we're left with the concern that uid 1001
creates an unprivileged user namespace and runs a program (as uid 0)
which transitions him to uber_client_t.  Since as Eric has pointed out
the MAC can't override the DAC rules, it still won't be able to write to
files not owned by uid 1001 in the initial user namespace.  We might
worry about it connecting to the privileged server and passing its
uber_client_t credentials to pass a request.  The server being in the
initial user ns will get uid 1001, not 0.

Perhaps the client checks its uid (0 in its user namespace) and passes
it to the server (as a simple message), which blindly accepts that.  In
that case the server could just as easily be exploited without user
namespaces.  

It's possible that there's another way this can be exploited, but I
haven't thought of it yet.

-serge

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [REVIEW][PATCH 0/43] Completing the user namespace
  2012-04-11  1:14               ` Eric W. Biederman
  2012-04-11  1:22                 ` Andrew Lutomirski
@ 2012-04-11  4:37                 ` Serge Hallyn
  1 sibling, 0 replies; 227+ messages in thread
From: Serge Hallyn @ 2012-04-11  4:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Lutomirski, Markus Gutschke, Will Drewry, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

Quoting Eric W. Biederman (ebiederm@xmission.com):
> Andrew Lutomirski <luto@mit.edu> writes:
> 
> > On Tue, Apr 10, 2012 at 6:01 PM, Eric W. Biederman
> > <ebiederm@xmission.com> wrote:
> 
> > Sounds like you're reinventing (something very similar to)
> > no_new_privs.  Why not just require no_new_privs as a prerequisite for
> > creating a user namespace if you're unprivileged?
> 
> As I said in the part of my email you snipped, because no_new_privs will
> break suid exec in the user namespace.

Andrew,

note that once you create a new user namespace, you cannot change your
credentials in the ancestor user namespaces.  So in effect you already
have no_new_privs for those namespaces.

Say I'm uid 1001 and I create a task in a new user namespace where
uid 50000 on the host is mapped to uid 0 in the userns.  Now I try to
execute a setuid file belonging to uid 500 on the host.  Note that 500
is not mapped into my user namespace.  That is what Eric meant by either
exec being refused or setuid being ignored.  Either way, the file would be
executed using uid 50000 on the host (and 0 in the user namespace).
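
The scenario can be traced with a toy model of an extent-based
mapping.  This is a userspace sketch, not the patchset's code: the
struct id_extent layout and the helpers host_to_ns() and
exec_host_uid() are assumptions made for illustration, and
exec_host_uid() models only the "setuid is ignored" variant.

```c
#include <stdint.h>

/* Mirrors the reserved (uid_t)-1 "no mapping" value. */
#define ID_UNMAPPED ((uint32_t)-1)

/*
 * One contiguous range: ids [first, first + count) inside the namespace
 * correspond to [lower_first, lower_first + count) on the host.
 */
struct id_extent {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

/* Translate a host uid into the namespace, or return ID_UNMAPPED. */
uint32_t host_to_ns(const struct id_extent *map, int n, uint32_t host_id)
{
	for (int i = 0; i < n; i++) {
		if (host_id >= map[i].lower_first &&
		    host_id - map[i].lower_first < map[i].count)
			return map[i].first + (host_id - map[i].lower_first);
	}
	return ID_UNMAPPED;
}

/*
 * Host uid a setuid exec ends up running as when an unmappable owner
 * causes the setuid bit to be ignored.
 */
uint32_t exec_host_uid(const struct id_extent *map, int n,
		       uint32_t caller_host_uid, uint32_t owner_host_uid)
{
	if (host_to_ns(map, n, owner_host_uid) == ID_UNMAPPED)
		return caller_host_uid;   /* setuid bit has no effect */
	return owner_host_uid;
}
```

With the single extent { 0, 50000, 1 }, host uid 50000 translates to 0
inside the namespace, host uid 500 has no translation, and
exec_host_uid() for a file owned by 500 stays at 50000.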

-serge

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 16/43] userns: Simplify the user_namespace by making userns->creator a kuid.
       [not found]     ` <1333862139-31737-16-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-04-18 18:48       ` Serge E. Hallyn
  0 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 18:48 UTC (permalink / raw)
  To: Eric W. Beiderman
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

Quoting Eric W. Beiderman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> 
> - Transform userns->creator from a user_struct reference to a simple
>   kuid_t, kgid_t pair.
> 
>   In cap_capable this allows the check to see if we are the creator of
>   a namespace to become the classic suser style euid permission check.
> 
>   This allows us to remove the need for a struct cred in the mapping
>   functions and still be able to display the user namespace creator's
>   uid and gid as 0.
> 
> - Remove the now unnecessary delayed_work in free_user_ns.
> 
>   All that is left for free_user_ns to do is to call kmem_cache_free
>   and put_user_ns.  Those functions can be called in any context
>   so call them directly from free_user_ns removing the need for delayed work.
> 
> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> ---
>  include/linux/user_namespace.h |    4 ++--
>  kernel/user.c                  |    7 ++++---
>  kernel/user_namespace.c        |   39 ++++++++++++++++++---------------------
>  security/commoncap.c           |    5 +++--
>  4 files changed, 27 insertions(+), 28 deletions(-)
> 
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index d767508..8a391bd 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -9,8 +9,8 @@
>  struct user_namespace {
>  	struct kref		kref;
>  	struct user_namespace	*parent;
> -	struct user_struct	*creator;
> -	struct work_struct	destroyer;
> +	kuid_t			owner;
> +	kgid_t			group;
>  };
>  
>  extern struct user_namespace init_user_ns;
> diff --git a/kernel/user.c b/kernel/user.c
> index 025077e..cff3856 100644
> --- a/kernel/user.c
> +++ b/kernel/user.c
> @@ -25,7 +25,8 @@ struct user_namespace init_user_ns = {
>  	.kref = {
>  		.refcount	= ATOMIC_INIT(3),
>  	},
> -	.creator = &root_user,
> +	.owner = GLOBAL_ROOT_UID,
> +	.group = GLOBAL_ROOT_GID,
>  };
>  EXPORT_SYMBOL_GPL(init_user_ns);
>  
> @@ -54,9 +55,9 @@ struct hlist_head uidhash_table[UIDHASH_SZ];
>   */
>  static DEFINE_SPINLOCK(uidhash_lock);
>  
> -/* root_user.__count is 2, 1 for init task cred, 1 for init_user_ns->user_ns */
> +/* root_user.__count is 1, for init task cred */
>  struct user_struct root_user = {
> -	.__count	= ATOMIC_INIT(2),
> +	.__count	= ATOMIC_INIT(1),
>  	.processes	= ATOMIC_INIT(1),
>  	.files		= ATOMIC_INIT(0),
>  	.sigpending	= ATOMIC_INIT(0),
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 898e973..f69741a 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -27,6 +27,16 @@ int create_user_ns(struct cred *new)
>  {
>  	struct user_namespace *ns, *parent_ns = new->user_ns;
>  	struct user_struct *root_user;
> +	kuid_t owner = make_kuid(new->user_ns, new->euid);
> +	kgid_t group = make_kgid(new->user_ns, new->egid);
> +
> +	/* The creator needs a mapping in the parent user namespace
> +	 * or else we won't be able to reasonably tell userspace who
> +	 * created a user_namespace.
> +	 */
> +	if (!kuid_has_mapping(parent_ns, owner) ||
> +	    !kgid_has_mapping(parent_ns, group))
> +		return -EPERM;
>  
>  	ns = kmem_cache_alloc(user_ns_cachep, GFP_KERNEL);
>  	if (!ns)
> @@ -43,7 +53,9 @@ int create_user_ns(struct cred *new)
>  
>  	/* set the new root user in the credentials under preparation */
>  	ns->parent = parent_ns;

I think in the past the creator cred pinned the ns->parent.  Do you now
need to explicitly pin ns->parent (and release it in free_user_ns())?
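
To spell out what explicit pinning would mean here, a userspace
reference-counting sketch follows.  It is a model under stated
assumptions, not the kernel code: get_ns(), put_ns(), create_ns() and
the live_namespaces counter are hypothetical names, plain ints stand in
for kref, and there is no locking.

```c
#include <stdlib.h>

/* Userspace model only; not the kernel's struct user_namespace. */
struct user_ns {
	int refcount;
	struct user_ns *parent;  /* pinned for the lifetime of this ns */
};

/* Number of namespaces currently allocated, kept for inspection. */
static int live_namespaces;

/* Take a reference; tolerates NULL for the initial namespace's parent. */
struct user_ns *get_ns(struct user_ns *ns)
{
	if (ns)
		ns->refcount++;
	return ns;
}

/* Drop a reference; freeing a namespace releases its pin on the parent. */
void put_ns(struct user_ns *ns)
{
	/* Iterative so that a long parent chain cannot exhaust the stack. */
	while (ns && --ns->refcount == 0) {
		struct user_ns *parent = ns->parent;

		free(ns);
		live_namespaces--;
		ns = parent;     /* drop the reference taken at creation */
	}
}

/* Create a namespace that holds its own reference on the parent. */
struct user_ns *create_ns(struct user_ns *parent)
{
	struct user_ns *ns = malloc(sizeof(*ns));

	if (!ns)
		return NULL;
	ns->refcount = 1;             /* the caller's reference */
	ns->parent = get_ns(parent);  /* explicit pin on the parent */
	live_namespaces++;
	return ns;
}
```

In this model the parent survives its creator dropping the last
external reference for as long as a child exists, and both are freed
when the child goes away.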

> -	ns->creator = new->user;
> +	ns->owner = owner;
> +	ns->group = group;
> +	free_uid(new->user);
>  	new->user = root_user;
>  	new->uid = new->euid = new->suid = new->fsuid = 0;
>  	new->gid = new->egid = new->sgid = new->fsgid = 0;
> @@ -69,29 +81,15 @@ int create_user_ns(struct cred *new)
>  	return 0;
>  }
>  
> -/*
> - * Deferred destructor for a user namespace.  This is required because
> - * free_user_ns() may be called with uidhash_lock held, but we need to call
> - * back to free_uid() which will want to take the lock again.
> - */
> -static void free_user_ns_work(struct work_struct *work)
> +void free_user_ns(struct kref *kref)
>  {
>  	struct user_namespace *parent, *ns =
> -		container_of(work, struct user_namespace, destroyer);
> +		container_of(kref, struct user_namespace, kref);
> +
>  	parent = ns->parent;
> -	free_uid(ns->creator);
>  	kmem_cache_free(user_ns_cachep, ns);
>  	put_user_ns(parent);
>  }
> -
> -void free_user_ns(struct kref *kref)
> -{
> -	struct user_namespace *ns =
> -		container_of(kref, struct user_namespace, kref);
> -
> -	INIT_WORK(&ns->destroyer, free_user_ns_work);
> -	schedule_work(&ns->destroyer);
> -}
>  EXPORT_SYMBOL(free_user_ns);
>  
>  uid_t user_ns_map_uid(struct user_namespace *to, const struct cred *cred, uid_t uid)
> @@ -101,12 +99,11 @@ uid_t user_ns_map_uid(struct user_namespace *to, const struct cred *cred, uid_t
>  	if (likely(to == cred->user_ns))
>  		return uid;
>  
> -
>  	/* Is cred->user the creator of the target user_ns
>  	 * or the creator of one of it's parents?
>  	 */
>  	for ( tmp = to; tmp != &init_user_ns; tmp = tmp->parent ) {
> -		if (cred->user == tmp->creator) {
> +		if (uid_eq(cred->user->uid, tmp->owner)) {
>  			return (uid_t)0;
>  		}
>  	}
> @@ -126,7 +123,7 @@ gid_t user_ns_map_gid(struct user_namespace *to, const struct cred *cred, gid_t
>  	 * or the creator of one of it's parents?
>  	 */
>  	for ( tmp = to; tmp != &init_user_ns; tmp = tmp->parent ) {
> -		if (cred->user == tmp->creator) {
> +		if (uid_eq(cred->user->uid, tmp->owner)) {
>  			return (gid_t)0;
>  		}
>  	}
> diff --git a/security/commoncap.c b/security/commoncap.c
> index 435d074..f2399d8 100644
> --- a/security/commoncap.c
> +++ b/security/commoncap.c
> @@ -76,8 +76,9 @@ int cap_capable(const struct cred *cred, struct user_namespace *targ_ns,
>  		int cap, int audit)
>  {
>  	for (;;) {
> -		/* The creator of the user namespace has all caps. */
> -		if (targ_ns != &init_user_ns && targ_ns->creator == cred->user)
> +		/* The owner of the user namespace has all caps. */
> +		if (targ_ns != &init_user_ns && uid_eq(targ_ns->owner,
> +						       make_kuid(cred->user_ns, cred->euid)))
>  			return 0;
>  
>  		/* Do we have the necessary capabilities? */
> -- 
> 1.7.2.5
> 

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 17/43] userns: Rework the user_namespace adding uid/gid mapping support
  2012-04-08  5:15     ` "Eric W. Beiderman
@ 2012-04-18 18:49         ` Serge E. Hallyn
  -1 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 18:49 UTC (permalink / raw)
  To: Eric W. Beiderman
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

Quoting Eric W. Beiderman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> 
> - Convert the old uid mapping functions into compatibility wrappers
> - Add a uid/gid mapping layer from user space uid and gids to kernel
>   internal uids and gids that is extent based for simplicity and speed.
>   * Working with number space after mapping uids/gids into their kernel
>     internal version adds only mapping complexity over what we have today,
>     leaving the kernel code easy to understand and test.
> - Add proc files /proc/self/uid_map /proc/self/gid_map
>   These files display the mapping and allow a mapping to be added
>   if a mapping does not exist.
> - Allow entering the user namespace without a uid or gid mapping.
>   Since we are starting with an existing user our uids and gids
>   still have global mappings so are still valid and useful they just don't
>   have local mappings.  The requirement for things to work are global uid
>   and gid so it is odd but perfectly fine not to have a local uid
>   and gid mapping.
>   Not requiring global uid and gid mappings greatly simplifies
>   the logic of setting up the uid and gid mappings by allowing
>   the mappings to be set after the namespace is created which makes the
>   slight weirdness worth it.
> - Make the mappings in the initial user namespace to the global
>   uid/gid space explicit.  Today it is an identity mapping
>   but in the future we may want to twist this for debugging, similar
>   to what we do with jiffies.
> - Document the memory ordering requirements of setting the uid and
>   gid mappings.  We only allow the mappings to be set once
>   and there are no pointers involved so the requirements are
>   trivial but a little atypical.
> 
> Performance:
> 
> In this scheme for the permission checks the performance is expected to
> stay the same as the actual machine instructions should remain the same.
> 
> The worst case I could think of is ls -l on a large directory where
> all of the stat results need to be translated from kuids and
> kgids to uids and gids.  So I benchmarked that case on my laptop
> with a dual core hyperthread Intel i5-2520M cpu with 3M of cpu cache.
> 
> My benchmark consisted of going to single user mode where nothing else
> was running. On an ext4 filesystem opening 1,000,000 files and looping
> through all of the files 1000 times and calling fstat on the
> individuals files.  This was to ensure I was benchmarking stat times
> where the inodes were in the kernels cache, but the inode values were
> not in the processors cache.  My results:
> 
> v3.4-rc1:         ~= 156ns (unmodified v3.4-rc1 with user namespace support disabled)
> v3.4-rc1-userns-: ~= 155ns (v3.4-rc1 with my user namespace patches and user namespace support disabled)
> v3.4-rc1-userns+: ~= 164ns (v3.4-rc1 with my user namespace patches and user namespace support enabled)
> 
> All of the configurations ran in roughly 120ns when I performed tests
> that ran in the cpu cache.
> 
> So in summary the performance impact is:
> 1ns improvement in the worst case with user namespace support compiled out.
> 8ns aka 5% slowdown in the worst case with user namespace support compiled in.
> 
> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

> ---
>  fs/proc/base.c                 |   77 ++++++
>  include/linux/uidgid.h         |   24 ++
>  include/linux/user_namespace.h |   30 ++-
>  kernel/user.c                  |   16 ++
>  kernel/user_namespace.c        |  545 +++++++++++++++++++++++++++++++++++++---
>  5 files changed, 644 insertions(+), 48 deletions(-)
> 
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 1c8b280..2ee514c 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -81,6 +81,7 @@
>  #include <linux/oom.h>
>  #include <linux/elf.h>
>  #include <linux/pid_namespace.h>
> +#include <linux/user_namespace.h>
>  #include <linux/fs_struct.h>
>  #include <linux/slab.h>
>  #include <linux/flex_array.h>
> @@ -2943,6 +2944,74 @@ static int proc_tgid_io_accounting(struct task_struct *task, char *buffer)
>  }
>  #endif /* CONFIG_TASK_IO_ACCOUNTING */
>  
> +#ifdef CONFIG_USER_NS
> +static int proc_id_map_open(struct inode *inode, struct file *file,
> +	struct seq_operations *seq_ops)
> +{
> +	struct user_namespace *ns = NULL;
> +	struct task_struct *task;
> +	struct seq_file *seq;
> +	int ret = -EINVAL;
> +
> +	task = get_proc_task(inode);
> +	if (task) {
> +		rcu_read_lock();
> +		ns = get_user_ns(task_cred_xxx(task, user_ns));
> +		rcu_read_unlock();
> +		put_task_struct(task);
> +	}
> +	if (!ns)
> +		goto err;
> +
> +	ret = seq_open(file, seq_ops);
> +	if (ret)
> +		goto err_put_ns;
> +
> +	seq = file->private_data;
> +	seq->private = ns;
> +
> +	return 0;
> +err_put_ns:
> +	put_user_ns(ns);
> +err:
> +	return ret;
> +}
> +
> +static int proc_id_map_release(struct inode *inode, struct file *file)
> +{
> +	struct seq_file *seq = file->private_data;
> +	struct user_namespace *ns = seq->private;
> +	put_user_ns(ns);
> +	return seq_release(inode, file);
> +}
> +
> +static int proc_uid_map_open(struct inode *inode, struct file *file)
> +{
> +	return proc_id_map_open(inode, file, &proc_uid_seq_operations);
> +}
> +
> +static int proc_gid_map_open(struct inode *inode, struct file *file)
> +{
> +	return proc_id_map_open(inode, file, &proc_gid_seq_operations);
> +}
> +
> +static const struct file_operations proc_uid_map_operations = {
> +	.open		= proc_uid_map_open,
> +	.write		= proc_uid_map_write,
> +	.read		= seq_read,
> +	.llseek		= seq_lseek,
> +	.release	= proc_id_map_release,
> +};
> +
> +static const struct file_operations proc_gid_map_operations = {
> +	.open		= proc_gid_map_open,
> +	.write		= proc_gid_map_write,
> +	.read		= seq_read,
> +	.llseek		= seq_lseek,
> +	.release	= proc_id_map_release,
> +};
> +#endif /* CONFIG_USER_NS */
> +
>  static int proc_pid_personality(struct seq_file *m, struct pid_namespace *ns,
>  				struct pid *pid, struct task_struct *task)
>  {
> @@ -3045,6 +3114,10 @@ static const struct pid_entry tgid_base_stuff[] = {
>  #ifdef CONFIG_HARDWALL
>  	INF("hardwall",   S_IRUGO, proc_pid_hardwall),
>  #endif
> +#ifdef CONFIG_USER_NS
> +	REG("uid_map",    S_IRUGO|S_IWUSR, proc_uid_map_operations),
> +	REG("gid_map",    S_IRUGO|S_IWUSR, proc_gid_map_operations),
> +#endif
>  };
>  
>  static int proc_tgid_base_readdir(struct file * filp,
> @@ -3400,6 +3473,10 @@ static const struct pid_entry tid_base_stuff[] = {
>  #ifdef CONFIG_HARDWALL
>  	INF("hardwall",   S_IRUGO, proc_pid_hardwall),
>  #endif
> +#ifdef CONFIG_USER_NS
> +	REG("uid_map",    S_IRUGO|S_IWUSR, proc_uid_map_operations),
> +	REG("gid_map",    S_IRUGO|S_IWUSR, proc_gid_map_operations),
> +#endif
>  };
>  
>  static int proc_tid_base_readdir(struct file * filp,
> diff --git a/include/linux/uidgid.h b/include/linux/uidgid.h
> index 5398568..8e522cbc 100644
> --- a/include/linux/uidgid.h
> +++ b/include/linux/uidgid.h
> @@ -127,6 +127,28 @@ static inline bool gid_valid(kgid_t gid)
>  	return !gid_eq(gid, INVALID_GID);
>  }
>  
> +#ifdef CONFIG_USER_NS
> +
> +extern kuid_t make_kuid(struct user_namespace *from, uid_t uid);
> +extern kgid_t make_kgid(struct user_namespace *from, gid_t gid);
> +
> +extern uid_t from_kuid(struct user_namespace *to, kuid_t uid);
> +extern gid_t from_kgid(struct user_namespace *to, kgid_t gid);
> +extern uid_t from_kuid_munged(struct user_namespace *to, kuid_t uid);
> +extern gid_t from_kgid_munged(struct user_namespace *to, kgid_t gid);
> +
> +static inline bool kuid_has_mapping(struct user_namespace *ns, kuid_t uid)
> +{
> +	return from_kuid(ns, uid) != (uid_t) -1;
> +}
> +
> +static inline bool kgid_has_mapping(struct user_namespace *ns, kgid_t gid)
> +{
> +	return from_kgid(ns, gid) != (gid_t) -1;
> +}
> +
> +#else
> +
>  static inline kuid_t make_kuid(struct user_namespace *from, uid_t uid)
>  {
>  	return KUIDT_INIT(uid);
> @@ -173,4 +195,6 @@ static inline bool kgid_has_mapping(struct user_namespace *ns, kgid_t gid)
>  	return true;
>  }
>  
> +#endif /* CONFIG_USER_NS */
> +
>  #endif /* _LINUX_UIDGID_H */
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index 8a391bd..4c9846d 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -6,7 +6,20 @@
>  #include <linux/sched.h>
>  #include <linux/err.h>
>  
> +#define UID_GID_MAP_MAX_EXTENTS 5
> +
> +struct uid_gid_map {	/* 64 bytes -- 1 cache line */
> +	u32 nr_extents;
> +	struct uid_gid_extent {
> +		u32 first;
> +		u32 lower_first;
> +		u32 count;
> +	} extent[UID_GID_MAP_MAX_EXTENTS];
> +};
> +
>  struct user_namespace {
> +	struct uid_gid_map	uid_map;
> +	struct uid_gid_map	gid_map;
>  	struct kref		kref;
>  	struct user_namespace	*parent;
>  	kuid_t			owner;
> @@ -33,9 +46,11 @@ static inline void put_user_ns(struct user_namespace *ns)
>  		kref_put(&ns->kref, free_user_ns);
>  }
>  
> -uid_t user_ns_map_uid(struct user_namespace *to, const struct cred *cred, uid_t uid);
> -gid_t user_ns_map_gid(struct user_namespace *to, const struct cred *cred, gid_t gid);
> -
> +struct seq_operations;
> +extern struct seq_operations proc_uid_seq_operations;
> +extern struct seq_operations proc_gid_seq_operations;
> +extern ssize_t proc_uid_map_write(struct file *, const char __user *, size_t, loff_t *);
> +extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, loff_t *);
>  #else
>  
>  static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
> @@ -52,17 +67,18 @@ static inline void put_user_ns(struct user_namespace *ns)
>  {
>  }
>  
> +#endif
> +
>  static inline uid_t user_ns_map_uid(struct user_namespace *to,
>  	const struct cred *cred, uid_t uid)
>  {
> -	return uid;
> +	return from_kuid_munged(to, make_kuid(cred->user_ns, uid));
>  }
> +
>  static inline gid_t user_ns_map_gid(struct user_namespace *to,
>  	const struct cred *cred, gid_t gid)
>  {
> -	return gid;
> +	return from_kgid_munged(to, make_kgid(cred->user_ns, gid));
>  }
>  
> -#endif
> -
>  #endif /* _LINUX_USER_H */
> diff --git a/kernel/user.c b/kernel/user.c
> index cff3856..f9e420e 100644
> --- a/kernel/user.c
> +++ b/kernel/user.c
> @@ -22,6 +22,22 @@
>   * and 1 for... ?
>   */
>  struct user_namespace init_user_ns = {
> +	.uid_map = {
> +		.nr_extents = 1,
> +		.extent[0] = {
> +			.first = 0,
> +			.lower_first = 0,
> +			.count = 4294967295,
> +		},
> +	},
> +	.gid_map = {
> +		.nr_extents = 1,
> +		.extent[0] = {
> +			.first = 0,
> +			.lower_first = 0,
> +			.count = 4294967295,
> +		},
> +	},
>  	.kref = {
>  		.refcount	= ATOMIC_INIT(3),
>  	},
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index f69741a..9991bac 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -12,9 +12,19 @@
>  #include <linux/highuid.h>
>  #include <linux/cred.h>
>  #include <linux/securebits.h>
> +#include <linux/keyctl.h>
> +#include <linux/key-type.h>
> +#include <keys/user-type.h>
> +#include <linux/seq_file.h>
> +#include <linux/fs.h>
> +#include <linux/uaccess.h>
> +#include <linux/ctype.h>
>  
>  static struct kmem_cache *user_ns_cachep __read_mostly;
>  
> +static bool new_idmap_permitted(struct user_namespace *ns, int cap_setid,
> +				struct uid_gid_map *map);
> +
>  /*
>   * Create a new user namespace, deriving the creator from the user in the
>   * passed credentials, and replacing that user with the new root user for the
> @@ -26,7 +36,6 @@ static struct kmem_cache *user_ns_cachep __read_mostly;
>  int create_user_ns(struct cred *new)
>  {
>  	struct user_namespace *ns, *parent_ns = new->user_ns;
> -	struct user_struct *root_user;
>  	kuid_t owner = make_kuid(new->user_ns, new->euid);
>  	kgid_t group = make_kgid(new->user_ns, new->egid);
>  
> @@ -38,29 +47,15 @@ int create_user_ns(struct cred *new)
>  	    !kgid_has_mapping(parent_ns, group))
>  		return -EPERM;
>  
> -	ns = kmem_cache_alloc(user_ns_cachep, GFP_KERNEL);
> +	ns = kmem_cache_zalloc(user_ns_cachep, GFP_KERNEL);
>  	if (!ns)
>  		return -ENOMEM;
>  
>  	kref_init(&ns->kref);
> -
> -	/* Alloc new root user.  */
> -	root_user = alloc_uid(make_kuid(ns, 0));
> -	if (!root_user) {
> -		kmem_cache_free(user_ns_cachep, ns);
> -		return -ENOMEM;
> -	}
> -
> -	/* set the new root user in the credentials under preparation */
>  	ns->parent = parent_ns;
>  	ns->owner = owner;
>  	ns->group = group;
> -	free_uid(new->user);
> -	new->user = root_user;
> -	new->uid = new->euid = new->suid = new->fsuid = 0;
> -	new->gid = new->egid = new->sgid = new->fsgid = 0;
> -	put_group_info(new->group_info);
> -	new->group_info = get_group_info(&init_groups);
> +
>  	/* Start with the same capabilities as init but useless for doing
>  	 * anything as the capabilities are bound to the new user namespace.
>  	 */
> @@ -92,44 +87,512 @@ void free_user_ns(struct kref *kref)
>  }
>  EXPORT_SYMBOL(free_user_ns);
>  
> -uid_t user_ns_map_uid(struct user_namespace *to, const struct cred *cred, uid_t uid)
> +static u32 map_id_range_down(struct uid_gid_map *map, u32 id, u32 count)
>  {
> -	struct user_namespace *tmp;
> +	unsigned idx, extents;
> +	u32 first, last, id2;
>  
> -	if (likely(to == cred->user_ns))
> -		return uid;
> +	id2 = id + count - 1;
>  
> -	/* Is cred->user the creator of the target user_ns
> -	 * or the creator of one of it's parents?
> -	 */
> -	for ( tmp = to; tmp != &init_user_ns; tmp = tmp->parent ) {
> -		if (uid_eq(cred->user->uid, tmp->owner)) {
> -			return (uid_t)0;
> -		}
> +	/* Find the matching extent */
> +	extents = map->nr_extents;
> +	smp_read_barrier_depends();
> +	for (idx = 0; idx < extents; idx++) {
> +		first = map->extent[idx].first;
> +		last = first + map->extent[idx].count - 1;
> +		if (id >= first && id <= last &&
> +		    (id2 >= first && id2 <= last))
> +			break;
> +	}
> +	/* Map the id or note failure */
> +	if (idx < extents)
> +		id = (id - first) + map->extent[idx].lower_first;
> +	else
> +		id = (u32) -1;
> +
> +	return id;
> +}
> +
> +static u32 map_id_down(struct uid_gid_map *map, u32 id)
> +{
> +	unsigned idx, extents;
> +	u32 first, last;
> +
> +	/* Find the matching extent */
> +	extents = map->nr_extents;
> +	smp_read_barrier_depends();
> +	for (idx = 0; idx < extents; idx++) {
> +		first = map->extent[idx].first;
> +		last = first + map->extent[idx].count - 1;
> +		if (id >= first && id <= last)
> +			break;
> +	}
> +	/* Map the id or note failure */
> +	if (idx < extents)
> +		id = (id - first) + map->extent[idx].lower_first;
> +	else
> +		id = (u32) -1;
> +
> +	return id;
> +}
> +
> +static u32 map_id_up(struct uid_gid_map *map, u32 id)
> +{
> +	unsigned idx, extents;
> +	u32 first, last;
> +
> +	/* Find the matching extent */
> +	extents = map->nr_extents;
> +	smp_read_barrier_depends();
> +	for (idx = 0; idx < extents; idx++) {
> +		first = map->extent[idx].lower_first;
> +		last = first + map->extent[idx].count - 1;
> +		if (id >= first && id <= last)
> +			break;
>  	}
> +	/* Map the id or note failure */
> +	if (idx < extents)
> +		id = (id - first) + map->extent[idx].first;
> +	else
> +		id = (u32) -1;
> +
> +	return id;
> +}
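
For anyone reading along, the three lookup helpers above boil down to a
linear scan over at most five extents.  Below is a small userspace sketch
of the down and up directions, assuming nothing beyond the extent layout
in the patch.  The sketch_* names are mine, and the range test is written
as "id - first < count" so the sketch cannot overflow; the patch computes
"last" explicitly instead.

```c
#include <stdint.h>

#define MAX_EXTENTS 5

struct extent { uint32_t first, lower_first, count; };

struct id_map {
	uint32_t nr_extents;
	struct extent extent[MAX_EXTENTS];
};

/* Namespace id -> kernel-global id, or (uint32_t)-1 if unmapped. */
static uint32_t sketch_map_down(const struct id_map *map, uint32_t id)
{
	for (uint32_t i = 0; i < map->nr_extents; i++) {
		const struct extent *e = &map->extent[i];

		if (id >= e->first && id - e->first < e->count)
			return (id - e->first) + e->lower_first;
	}
	return (uint32_t)-1;
}

/* Kernel-global id -> namespace id, or (uint32_t)-1 if unmapped. */
static uint32_t sketch_map_up(const struct id_map *map, uint32_t id)
{
	for (uint32_t i = 0; i < map->nr_extents; i++) {
		const struct extent *e = &map->extent[i];

		if (id >= e->lower_first && id - e->lower_first < e->count)
			return (id - e->lower_first) + e->first;
	}
	return (uint32_t)-1;
}
```

With a map of { 0 -> 100000, count 1000 } and { 1000 -> 200000, count 10 },
namespace id 1005 maps down to 200005 and 200005 maps back up to 1005,
while ids outside both extents come back as (uint32_t)-1 in either
direction.
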
> +
> +/**
> + *	make_kuid - Map a user-namespace uid pair into a kuid.
> + *	@ns:  User namespace that the uid is in
> + *	@uid: User identifier
> + *
> + *	Maps a user-namespace uid pair into a kernel internal kuid,
> + *	and returns that kuid.
> + *
> + *	When there is no mapping defined for the user-namespace uid
> + *	pair INVALID_UID is returned.  Callers are expected to test
> + *	for and handle INVALID_UID being returned.  INVALID_UID
> + *	may be tested for using uid_valid().
> + */
> +kuid_t make_kuid(struct user_namespace *ns, uid_t uid)
> +{
> +	/* Map the uid to a global kernel uid */
> +	return KUIDT_INIT(map_id_down(&ns->uid_map, uid));
> +}
> +EXPORT_SYMBOL(make_kuid);
> +
> +/**
> + *	from_kuid - Create a uid from a kuid user-namespace pair.
> + *	@targ: The user namespace we want a uid in.
> + *	@kuid: The kernel internal uid to start with.
> + *
> + *	Map @kuid into the user-namespace specified by @targ and
> + *	return the resulting uid.
> + *
> + *	There is always a mapping into the initial user_namespace.
> + *
> + *	If @kuid has no mapping in @targ (uid_t)-1 is returned.
> + */
> +uid_t from_kuid(struct user_namespace *targ, kuid_t kuid)
> +{
> +	/* Map the uid from a global kernel uid */
> +	return map_id_up(&targ->uid_map, __kuid_val(kuid));
> +}
> +EXPORT_SYMBOL(from_kuid);
> +
> +/**
> + *	from_kuid_munged - Create a uid from a kuid user-namespace pair.
> + *	@targ: The user namespace we want a uid in.
> + *	@kuid: The kernel internal uid to start with.
> + *
> + *	Map @kuid into the user-namespace specified by @targ and
> + *	return the resulting uid.
> + *
> + *	There is always a mapping into the initial user_namespace.
> + *
> + *	Unlike from_kuid, from_kuid_munged never fails and always
> + *	returns a valid uid.  This makes from_kuid_munged appropriate
> + *	for use in syscalls like stat and getuid where failing the
> + *	system call and failing to provide a valid uid are not
> + *	options.
> + *
> + *	If @kuid has no mapping in @targ overflowuid is returned.
> + */
> +uid_t from_kuid_munged(struct user_namespace *targ, kuid_t kuid)
> +{
> +	uid_t uid;
> +	uid = from_kuid(targ, kuid);
> +
> +	if (uid == (uid_t) -1)
> +		uid = overflowuid;
> +	return uid;
> +}
> +EXPORT_SYMBOL(from_kuid_munged);
> +
> +/**
> + *	make_kgid - Map a user-namespace gid pair into a kgid.
> + *	@ns:  User namespace that the gid is in
> + *	@gid: Group identifier
> + *
> + *	Maps a user-namespace gid pair into a kernel internal kgid,
> + *	and returns that kgid.
> + *
> + *	When there is no mapping defined for the user-namespace gid
> + *	pair INVALID_GID is returned.  Callers are expected to test
> + *	for and handle INVALID_GID being returned.  INVALID_GID may be
> + *	tested for using gid_valid().
> + */
> +kgid_t make_kgid(struct user_namespace *ns, gid_t gid)
> +{
> +	/* Map the gid to a global kernel gid */
> +	return KGIDT_INIT(map_id_down(&ns->gid_map, gid));
> +}
> +EXPORT_SYMBOL(make_kgid);
> +
> +/**
> + *	from_kgid - Create a gid from a kgid user-namespace pair.
> + *	@targ: The user namespace we want a gid in.
> + *	@kgid: The kernel internal gid to start with.
> + *
> + *	Map @kgid into the user-namespace specified by @targ and
> + *	return the resulting gid.
> + *
> + *	There is always a mapping into the initial user_namespace.
> + *
> + *	If @kgid has no mapping in @targ (gid_t)-1 is returned.
> + */
> +gid_t from_kgid(struct user_namespace *targ, kgid_t kgid)
> +{
> +	/* Map the gid from a global kernel gid */
> +	return map_id_up(&targ->gid_map, __kgid_val(kgid));
> +}
> +EXPORT_SYMBOL(from_kgid);
> +
> +/**
> + *	from_kgid_munged - Create a gid from a kgid user-namespace pair.
> + *	@targ: The user namespace we want a gid in.
> + *	@kgid: The kernel internal gid to start with.
> + *
> + *	Map @kgid into the user-namespace specified by @targ and
> + *	return the resulting gid.
> + *
> + *	There is always a mapping into the initial user_namespace.
> + *
> + *	Unlike from_kgid, from_kgid_munged never fails and always
> + *	returns a valid gid.  This makes from_kgid_munged appropriate
> + *	for use in syscalls like stat and getgid where failing the
> + *	system call and failing to provide a valid gid are not options.
> + *
> + *	If @kgid has no mapping in @targ overflowgid is returned.
> + */
> +gid_t from_kgid_munged(struct user_namespace *targ, kgid_t kgid)
> +{
> +	gid_t gid;
> +	gid = from_kgid(targ, kgid);
> +
> +	if (gid == (gid_t) -1)
> +		gid = overflowgid;
> +	return gid;
> +}
> +EXPORT_SYMBOL(from_kgid_munged);
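
The strict versus munged distinction documented above can be shown with a
single extent.  This is only a sketch: the helper names are invented, and
SKETCH_OVERFLOW_ID stands in for overflowuid/overflowgid, assumed here to
be the usual default of 65534.

```c
#include <stdint.h>

#define SKETCH_NO_ID		((uint32_t)-1)
#define SKETCH_OVERFLOW_ID	65534u	/* assumed default overflow id */

struct one_extent { uint32_t first, lower_first, count; };

/* Strict form: kernel id -> namespace id, SKETCH_NO_ID when unmapped. */
static uint32_t sketch_from_kid(const struct one_extent *e, uint32_t kid)
{
	if (kid >= e->lower_first && kid - e->lower_first < e->count)
		return (kid - e->lower_first) + e->first;
	return SKETCH_NO_ID;
}

/* Munged form: never fails, substitutes the overflow id instead. */
static uint32_t sketch_from_kid_munged(const struct one_extent *e,
				       uint32_t kid)
{
	uint32_t id = sketch_from_kid(e, kid);

	return id == SKETCH_NO_ID ? SKETCH_OVERFLOW_ID : id;
}
```

A caller that can return an error would use the strict form and check for
the reserved value; a caller like stat that must report some id would use
the munged form.
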
> +
> +static int uid_m_show(struct seq_file *seq, void *v)
> +{
> +	struct user_namespace *ns = seq->private;
> +	struct uid_gid_extent *extent = v;
> +	struct user_namespace *lower_ns;
> +	uid_t lower;
>  
> -	/* No useful relationship so no mapping */
> -	return overflowuid;
> +	lower_ns = current_user_ns();
> +	if ((lower_ns == ns) && lower_ns->parent)
> +		lower_ns = lower_ns->parent;
> +
> +	lower = from_kuid(lower_ns, KUIDT_INIT(extent->lower_first));
> +
> +	seq_printf(seq, "%10u %10u %10u\n",
> +		extent->first,
> +		lower,
> +		extent->count);
> +
> +	return 0;
>  }
>  
> -gid_t user_ns_map_gid(struct user_namespace *to, const struct cred *cred, gid_t gid)
> +static int gid_m_show(struct seq_file *seq, void *v)
>  {
> -	struct user_namespace *tmp;
> +	struct user_namespace *ns = seq->private;
> +	struct uid_gid_extent *extent = v;
> +	struct user_namespace *lower_ns;
> +	gid_t lower;
>  
> -	if (likely(to == cred->user_ns))
> -		return gid;
> +	lower_ns = current_user_ns();
> +	if ((lower_ns == ns) && lower_ns->parent)
> +		lower_ns = lower_ns->parent;
>  
> -	/* Is cred->user the creator of the target user_ns
> -	 * or the creator of one of it's parents?
> +	lower = from_kgid(lower_ns, KGIDT_INIT(extent->lower_first));
> +
> +	seq_printf(seq, "%10u %10u %10u\n",
> +		extent->first,
> +		lower,
> +		extent->count);
> +
> +	return 0;
> +}
> +
> +static void *m_start(struct seq_file *seq, loff_t *ppos, struct uid_gid_map *map)
> +{
> +	struct uid_gid_extent *extent = NULL;
> +	loff_t pos = *ppos;
> +
> +	if (pos < map->nr_extents)
> +		extent = &map->extent[pos];
> +
> +	return extent;
> +}
> +
> +static void *uid_m_start(struct seq_file *seq, loff_t *ppos)
> +{
> +	struct user_namespace *ns = seq->private;
> +
> +	return m_start(seq, ppos, &ns->uid_map);
> +}
> +
> +static void *gid_m_start(struct seq_file *seq, loff_t *ppos)
> +{
> +	struct user_namespace *ns = seq->private;
> +
> +	return m_start(seq, ppos, &ns->gid_map);
> +}
> +
> +static void *m_next(struct seq_file *seq, void *v, loff_t *pos)
> +{
> +	(*pos)++;
> +	return seq->op->start(seq, pos);
> +}
> +
> +static void m_stop(struct seq_file *seq, void *v)
> +{
> +	return;
> +}
> +
> +struct seq_operations proc_uid_seq_operations = {
> +	.start = uid_m_start,
> +	.stop = m_stop,
> +	.next = m_next,
> +	.show = uid_m_show,
> +};
> +
> +struct seq_operations proc_gid_seq_operations = {
> +	.start = gid_m_start,
> +	.stop = m_stop,
> +	.next = m_next,
> +	.show = gid_m_show,
> +};
> +
> +static DEFINE_MUTEX(id_map_mutex);
> +
> +static ssize_t map_write(struct file *file, const char __user *buf,
> +			 size_t count, loff_t *ppos,
> +			 int cap_setid,
> +			 struct uid_gid_map *map,
> +			 struct uid_gid_map *parent_map)
> +{
> +	struct seq_file *seq = file->private_data;
> +	struct user_namespace *ns = seq->private;
> +	struct uid_gid_map new_map;
> +	unsigned idx;
> +	struct uid_gid_extent *extent, *last = NULL;
> +	unsigned long page = 0;
> +	char *kbuf, *pos, *next_line;
> +	ssize_t ret = -EINVAL;
> +
> +	/*
> +	 * The id_map_mutex serializes all writes to any given map.
> +	 *
> +	 * Any map is only ever written once.
> +	 *
> +	 * An id map fits within 1 cache line on most architectures.
> +	 *
> +	 * On read nothing needs to be done unless you are on an
> +	 * architecture with a crazy cache coherency model like alpha.
> +	 *
> +	 * There is a one time data dependency between reading the
> +	 * count of the extents and the values of the extents.  The
> +	 * desired behavior is to see the values of the extents that
> +	 * were written before the count of the extents.
> +	 *
> +	 * To achieve this smp_wmb() is used to guarantee the write
> +	 * order and smp_read_barrier_depends() guarantees that we
> +	 * don't have crazy architectures returning stale data.
> +	 *
> +	 */
> +	mutex_lock(&id_map_mutex);
> +
> +	ret = -EPERM;
> +	/* Only allow one successful write to the map */
> +	if (map->nr_extents != 0)
> +		goto out;
> +
> +	/* Require the appropriate privilege CAP_SETUID or CAP_SETGID
> +	 * over the user namespace in order to set the id mapping.
>  	 */
> -	for ( tmp = to; tmp != &init_user_ns; tmp = tmp->parent ) {
> -		if (uid_eq(cred->user->uid, tmp->owner)) {
> -			return (gid_t)0;
> +	if (!ns_capable(ns, cap_setid))
> +		goto out;
> +
> +	/* Get a buffer */
> +	ret = -ENOMEM;
> +	page = __get_free_page(GFP_TEMPORARY);
> +	kbuf = (char *) page;
> +	if (!page)
> +		goto out;
> +
> +	/* Only allow < page size writes at the beginning of the file */
> +	ret = -EINVAL;
> +	if ((*ppos != 0) || (count >= PAGE_SIZE))
> +		goto out;
> +
> +	/* Slurp in the user data */
> +	ret = -EFAULT;
> +	if (copy_from_user(kbuf, buf, count))
> +		goto out;
> +	kbuf[count] = '\0';
> +
> +	/* Parse the user data */
> +	ret = -EINVAL;
> +	pos = kbuf;
> +	new_map.nr_extents = 0;
> +	for (;pos; pos = next_line) {
> +		extent = &new_map.extent[new_map.nr_extents];
> +
> +		/* Find the end of line and ensure I don't look past it */
> +		next_line = strchr(pos, '\n');
> +		if (next_line) {
> +			*next_line = '\0';
> +			next_line++;
> +			if (*next_line == '\0')
> +				next_line = NULL;
>  		}
> +
> +		pos = skip_spaces(pos);
> +		extent->first = simple_strtoul(pos, &pos, 10);
> +		if (!isspace(*pos))
> +			goto out;
> +
> +		pos = skip_spaces(pos);
> +		extent->lower_first = simple_strtoul(pos, &pos, 10);
> +		if (!isspace(*pos))
> +			goto out;
> +
> +		pos = skip_spaces(pos);
> +		extent->count = simple_strtoul(pos, &pos, 10);
> +		if (*pos && !isspace(*pos))
> +			goto out;
> +
> +		/* Verify there is no trailing junk on the line */
> +		pos = skip_spaces(pos);
> +		if (*pos != '\0')
> +			goto out;
> +
> +		/* Verify we have been given valid starting values */
> +		if ((extent->first == (u32) -1) ||
> +		    (extent->lower_first == (u32) -1 ))
> +			goto out;
> +
> +		/* Verify count is not zero and does not cause the extent to wrap */
> +		if ((extent->first + extent->count) <= extent->first)
> +			goto out;
> +		if ((extent->lower_first + extent->count) <= extent->lower_first)
> +			goto out;
> +
> +		/* For now only accept extents that are strictly in order */
> +		if (last &&
> +		    (((last->first + last->count) > extent->first) ||
> +		     ((last->lower_first + last->count) > extent->lower_first)))
> +			goto out;
> +
> +		new_map.nr_extents++;
> +		last = extent;
> +
> +		/* Fail if the file contains too many extents */
> +		if ((new_map.nr_extents == UID_GID_MAP_MAX_EXTENTS) &&
> +		    (next_line != NULL))
> +			goto out;
>  	}
> +	/* Be very certain the new map actually exists */
> +	if (new_map.nr_extents == 0)
> +		goto out;
> +
> +	ret = -EPERM;
> +	/* Validate the user is allowed to use the ids being mapped to. */
> +	if (!new_idmap_permitted(ns, cap_setid, &new_map))
> +		goto out;
> +
> +	/* Map the lower ids from the parent user namespace to the
> +	 * kernel global id space.
> +	 */
> +	for (idx = 0; idx < new_map.nr_extents; idx++) {
> +		u32 lower_first;
> +		extent = &new_map.extent[idx];
> +
> +		lower_first = map_id_range_down(parent_map,
> +						extent->lower_first,
> +						extent->count);
> +
> +		/* Fail if we can not map the specified extent to
> +		 * the kernel global id space.
> +		 */
> +		if (lower_first == (u32) -1)
> +			goto out;
> +
> +		extent->lower_first = lower_first;
> +	}
> +
> +	/* Install the map */
> +	memcpy(map->extent, new_map.extent,
> +		new_map.nr_extents*sizeof(new_map.extent[0]));
> +	smp_wmb();
> +	map->nr_extents = new_map.nr_extents;
> +
> +	*ppos = count;
> +	ret = count;
> +out:
> +	mutex_unlock(&id_map_mutex);
> +	if (page)
> +		free_page(page);
> +	return ret;
> +}
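
The per-line checks in map_write() can be tried out in userspace with
something like the sketch below.  It parses one "first lower_first count"
line and applies the same sanity rules: no trailing junk, (u32)-1 is
reserved, count is non-zero, and neither range wraps.  The function names
are hypothetical, and unlike simple_strtoul() this version also rejects
fields with no digits or with values that do not fit in 32 bits, so it is
a bit stricter than the patch.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct extent { uint32_t first, lower_first, count; };

/* Parse one decimal u32 field and advance *p past it; 0 on success. */
static int sketch_parse_u32(const char **p, uint32_t *out)
{
	char *end;
	unsigned long long v;

	while (isspace((unsigned char)**p))
		(*p)++;
	if (!isdigit((unsigned char)**p))
		return -1;
	errno = 0;
	v = strtoull(*p, &end, 10);
	if (errno || v > UINT32_MAX)
		return -1;
	*out = (uint32_t)v;
	*p = end;
	return 0;
}

/* Parse and validate one mapping line; 0 on success, -1 on bad input. */
static int sketch_parse_extent(const char *line, struct extent *e)
{
	const char *p = line;

	if (sketch_parse_u32(&p, &e->first) ||
	    sketch_parse_u32(&p, &e->lower_first) ||
	    sketch_parse_u32(&p, &e->count))
		return -1;

	/* No trailing junk on the line. */
	while (isspace((unsigned char)*p))
		p++;
	if (*p != '\0')
		return -1;

	/* (u32)-1 is reserved as the "no mapping" value. */
	if (e->first == UINT32_MAX || e->lower_first == UINT32_MAX)
		return -1;

	/* Count must be non-zero and must not wrap either range. */
	if ((uint32_t)(e->first + e->count) <= e->first)
		return -1;
	if ((uint32_t)(e->lower_first + e->count) <= e->lower_first)
		return -1;
	return 0;
}
```

Note that the wrap check doubles as the zero-count check: adding a count
of zero leaves the sum equal to the start, which the "<=" comparison
rejects.
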
> +
> +ssize_t proc_uid_map_write(struct file *file, const char __user *buf, size_t size, loff_t *ppos)
> +{
> +	struct seq_file *seq = file->private_data;
> +	struct user_namespace *ns = seq->private;
> +
> +	if (!ns->parent)
> +		return -EPERM;
> +
> +	return map_write(file, buf, size, ppos, CAP_SETUID,
> +			 &ns->uid_map, &ns->parent->uid_map);
> +}
> +
> +ssize_t proc_gid_map_write(struct file *file, const char __user *buf, size_t size, loff_t *ppos)
> +{
> +	struct seq_file *seq = file->private_data;
> +	struct user_namespace *ns = seq->private;
> +
> +	if (!ns->parent)
> +		return -EPERM;
> +
> +	return map_write(file, buf, size, ppos, CAP_SETGID,
> +			 &ns->gid_map, &ns->parent->gid_map);
> +}
> +
> +static bool new_idmap_permitted(struct user_namespace *ns, int cap_setid,
> +				struct uid_gid_map *new_map)
> +{
> +	/* Allow the specified ids if we have the appropriate capability
> +	 * (CAP_SETUID or CAP_SETGID) over the parent user namespace.
> +	 */
> +	if (ns_capable(ns->parent, cap_setid))
> +		return true;
>  
> -	/* No useful relationship so no mapping */
> -	return overflowgid;
> +	return false;
>  }
>  
>  static __init int user_namespaces_init(void)
> -- 
> 1.7.2.5
> 
> _______________________________________________
> Containers mailing list
> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers

* Re: [PATCH 17/43] userns: Rework the user_namespace adding uid/gid mapping support
@ 2012-04-18 18:49         ` Serge E. Hallyn
  0 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 18:49 UTC (permalink / raw)
  To: Eric W. Beiderman
  Cc: linux-kernel, Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

Quoting Eric W. Beiderman (ebiederm@xmission.com):
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> - Convert the old uid mapping functions into compatibility wrappers
> - Add a uid/gid mapping layer from user space uid and gids to kernel
>   internal uids and gids that is extent based for simplicity and speed.
>   * Working with number space after mapping uids/gids into their kernel
>     internal version adds only mapping complexity over what we have today,
>     leaving the kernel code easy to understand and test.
> - Add proc files /proc/self/uid_map /proc/self/gid_map
>   These files display the mapping and allow a mapping to be added
>   if a mapping does not exist.
> - Allow entering the user namespace without a uid or gid mapping.
>   Since we are starting with an existing user, our uids and gids
>   still have global mappings and so are still valid and useful; they just don't
>   have local mappings.  The requirement for things to work is a global uid
>   and gid, so it is odd but perfectly fine not to have a local uid
>   and gid mapping.
>   Not requiring local uid and gid mappings greatly simplifies
>   the logic of setting up the uid and gid mappings by allowing
>   the mappings to be set after the namespace is created which makes the
>   slight weirdness worth it.
> - Make the mappings in the initial user namespace to the global
>   uid/gid space explicit.  Today it is an identity mapping
>   but in the future we may want to twist this for debugging, similar
>   to what we do with jiffies.
> - Document the memory ordering requirements of setting the uid and
>   gid mappings.  We only allow the mappings to be set once
>   and there are no pointers involved so the requirements are
>   trivial but a little atypical.
> 
> Performance:
> 
> In this scheme the performance of the permission checks is expected to
> stay the same, as the actual machine instructions should remain the same.
> 
> The worst case I could think of is ls -l on a large directory where
> all of the stat results need to be translated from kuids and
> kgids to uids and gids.  So I benchmarked that case on my laptop
> with a dual-core hyperthreaded Intel i5-2520M cpu with 3M of cpu cache.
> 
> My benchmark consisted of going to single user mode where nothing else
> was running.  On an ext4 filesystem I opened 1,000,000 files and looped
> through all of the files 1000 times, calling fstat on each
> individual file.  This was to ensure I was benchmarking stat times
> where the inodes were in the kernel's cache, but the inode values were
> not in the processor's cache.  My results:
> 
> v3.4-rc1:         ~= 156ns (unmodified v3.4-rc1 with user namespace support disabled)
> v3.4-rc1-userns-: ~= 155ns (v3.4-rc1 with my user namespace patches and user namespace support disabled)
> v3.4-rc1-userns+: ~= 164ns (v3.4-rc1 with my user namespace patches and user namespace support enabled)
> 
> All of the configurations ran in roughly 120ns when I performed tests
> that ran in the cpu cache.
> 
> So in summary the performance impact is:
> 1ns improvement in the worst case with user namespace support compiled out.
> 8ns aka 5% slowdown in the worst case with user namespace support compiled in.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

> ---
>  fs/proc/base.c                 |   77 ++++++
>  include/linux/uidgid.h         |   24 ++
>  include/linux/user_namespace.h |   30 ++-
>  kernel/user.c                  |   16 ++
>  kernel/user_namespace.c        |  545 +++++++++++++++++++++++++++++++++++++---
>  5 files changed, 644 insertions(+), 48 deletions(-)
> 
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 1c8b280..2ee514c 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -81,6 +81,7 @@
>  #include <linux/oom.h>
>  #include <linux/elf.h>
>  #include <linux/pid_namespace.h>
> +#include <linux/user_namespace.h>
>  #include <linux/fs_struct.h>
>  #include <linux/slab.h>
>  #include <linux/flex_array.h>
> @@ -2943,6 +2944,74 @@ static int proc_tgid_io_accounting(struct task_struct *task, char *buffer)
>  }
>  #endif /* CONFIG_TASK_IO_ACCOUNTING */
>  
> +#ifdef CONFIG_USER_NS
> +static int proc_id_map_open(struct inode *inode, struct file *file,
> +	struct seq_operations *seq_ops)
> +{
> +	struct user_namespace *ns = NULL;
> +	struct task_struct *task;
> +	struct seq_file *seq;
> +	int ret = -EINVAL;
> +
> +	task = get_proc_task(inode);
> +	if (task) {
> +		rcu_read_lock();
> +		ns = get_user_ns(task_cred_xxx(task, user_ns));
> +		rcu_read_unlock();
> +		put_task_struct(task);
> +	}
> +	if (!ns)
> +		goto err;
> +
> +	ret = seq_open(file, seq_ops);
> +	if (ret)
> +		goto err_put_ns;
> +
> +	seq = file->private_data;
> +	seq->private = ns;
> +
> +	return 0;
> +err_put_ns:
> +	put_user_ns(ns);
> +err:
> +	return ret;
> +}
> +
> +static int proc_id_map_release(struct inode *inode, struct file *file)
> +{
> +	struct seq_file *seq = file->private_data;
> +	struct user_namespace *ns = seq->private;
> +	put_user_ns(ns);
> +	return seq_release(inode, file);
> +}
> +
> +static int proc_uid_map_open(struct inode *inode, struct file *file)
> +{
> +	return proc_id_map_open(inode, file, &proc_uid_seq_operations);
> +}
> +
> +static int proc_gid_map_open(struct inode *inode, struct file *file)
> +{
> +	return proc_id_map_open(inode, file, &proc_gid_seq_operations);
> +}
> +
> +static const struct file_operations proc_uid_map_operations = {
> +	.open		= proc_uid_map_open,
> +	.write		= proc_uid_map_write,
> +	.read		= seq_read,
> +	.llseek		= seq_lseek,
> +	.release	= proc_id_map_release,
> +};
> +
> +static const struct file_operations proc_gid_map_operations = {
> +	.open		= proc_gid_map_open,
> +	.write		= proc_gid_map_write,
> +	.read		= seq_read,
> +	.llseek		= seq_lseek,
> +	.release	= proc_id_map_release,
> +};
> +#endif /* CONFIG_USER_NS */
> +
>  static int proc_pid_personality(struct seq_file *m, struct pid_namespace *ns,
>  				struct pid *pid, struct task_struct *task)
>  {
> @@ -3045,6 +3114,10 @@ static const struct pid_entry tgid_base_stuff[] = {
>  #ifdef CONFIG_HARDWALL
>  	INF("hardwall",   S_IRUGO, proc_pid_hardwall),
>  #endif
> +#ifdef CONFIG_USER_NS
> +	REG("uid_map",    S_IRUGO|S_IWUSR, proc_uid_map_operations),
> +	REG("gid_map",    S_IRUGO|S_IWUSR, proc_gid_map_operations),
> +#endif
>  };
>  
>  static int proc_tgid_base_readdir(struct file * filp,
> @@ -3400,6 +3473,10 @@ static const struct pid_entry tid_base_stuff[] = {
>  #ifdef CONFIG_HARDWALL
>  	INF("hardwall",   S_IRUGO, proc_pid_hardwall),
>  #endif
> +#ifdef CONFIG_USER_NS
> +	REG("uid_map",    S_IRUGO|S_IWUSR, proc_uid_map_operations),
> +	REG("gid_map",    S_IRUGO|S_IWUSR, proc_gid_map_operations),
> +#endif
>  };
>  
>  static int proc_tid_base_readdir(struct file * filp,
> diff --git a/include/linux/uidgid.h b/include/linux/uidgid.h
> index 5398568..8e522cbc 100644
> --- a/include/linux/uidgid.h
> +++ b/include/linux/uidgid.h
> @@ -127,6 +127,28 @@ static inline bool gid_valid(kgid_t gid)
>  	return !gid_eq(gid, INVALID_GID);
>  }
>  
> +#ifdef CONFIG_USER_NS
> +
> +extern kuid_t make_kuid(struct user_namespace *from, uid_t uid);
> +extern kgid_t make_kgid(struct user_namespace *from, gid_t gid);
> +
> +extern uid_t from_kuid(struct user_namespace *to, kuid_t uid);
> +extern gid_t from_kgid(struct user_namespace *to, kgid_t gid);
> +extern uid_t from_kuid_munged(struct user_namespace *to, kuid_t uid);
> +extern gid_t from_kgid_munged(struct user_namespace *to, kgid_t gid);
> +
> +static inline bool kuid_has_mapping(struct user_namespace *ns, kuid_t uid)
> +{
> +	return from_kuid(ns, uid) != (uid_t) -1;
> +}
> +
> +static inline bool kgid_has_mapping(struct user_namespace *ns, kgid_t gid)
> +{
> +	return from_kgid(ns, gid) != (gid_t) -1;
> +}
> +
> +#else
> +
>  static inline kuid_t make_kuid(struct user_namespace *from, uid_t uid)
>  {
>  	return KUIDT_INIT(uid);
> @@ -173,4 +195,6 @@ static inline bool kgid_has_mapping(struct user_namespace *ns, kgid_t gid)
>  	return true;
>  }
>  
> +#endif /* CONFIG_USER_NS */
> +
>  #endif /* _LINUX_UIDGID_H */
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index 8a391bd..4c9846d 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -6,7 +6,20 @@
>  #include <linux/sched.h>
>  #include <linux/err.h>
>  
> +#define UID_GID_MAP_MAX_EXTENTS 5
> +
> +struct uid_gid_map {	/* 64 bytes -- 1 cache line */
> +	u32 nr_extents;
> +	struct uid_gid_extent {
> +		u32 first;
> +		u32 lower_first;
> +		u32 count;
> +	} extent[UID_GID_MAP_MAX_EXTENTS];
> +};
> +
>  struct user_namespace {
> +	struct uid_gid_map	uid_map;
> +	struct uid_gid_map	gid_map;
>  	struct kref		kref;
>  	struct user_namespace	*parent;
>  	kuid_t			owner;
> @@ -33,9 +46,11 @@ static inline void put_user_ns(struct user_namespace *ns)
>  		kref_put(&ns->kref, free_user_ns);
>  }
>  
> -uid_t user_ns_map_uid(struct user_namespace *to, const struct cred *cred, uid_t uid);
> -gid_t user_ns_map_gid(struct user_namespace *to, const struct cred *cred, gid_t gid);
> -
> +struct seq_operations;
> +extern struct seq_operations proc_uid_seq_operations;
> +extern struct seq_operations proc_gid_seq_operations;
> +extern ssize_t proc_uid_map_write(struct file *, const char __user *, size_t, loff_t *);
> +extern ssize_t proc_gid_map_write(struct file *, const char __user *, size_t, loff_t *);
>  #else
>  
>  static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
> @@ -52,17 +67,18 @@ static inline void put_user_ns(struct user_namespace *ns)
>  {
>  }
>  
> +#endif
> +
>  static inline uid_t user_ns_map_uid(struct user_namespace *to,
>  	const struct cred *cred, uid_t uid)
>  {
> -	return uid;
> +	return from_kuid_munged(to, make_kuid(cred->user_ns, uid));
>  }
> +
>  static inline gid_t user_ns_map_gid(struct user_namespace *to,
>  	const struct cred *cred, gid_t gid)
>  {
> -	return gid;
> +	return from_kgid_munged(to, make_kgid(cred->user_ns, gid));
>  }
>  
> -#endif
> -
>  #endif /* _LINUX_USER_H */
> diff --git a/kernel/user.c b/kernel/user.c
> index cff3856..f9e420e 100644
> --- a/kernel/user.c
> +++ b/kernel/user.c
> @@ -22,6 +22,22 @@
>   * and 1 for... ?
>   */
>  struct user_namespace init_user_ns = {
> +	.uid_map = {
> +		.nr_extents = 1,
> +		.extent[0] = {
> +			.first = 0,
> +			.lower_first = 0,
> +			.count = 4294967295,
> +		},
> +	},
> +	.gid_map = {
> +		.nr_extents = 1,
> +		.extent[0] = {
> +			.first = 0,
> +			.lower_first = 0,
> +			.count = 4294967295,
> +		},
> +	},
>  	.kref = {
>  		.refcount	= ATOMIC_INIT(3),
>  	},
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index f69741a..9991bac 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -12,9 +12,19 @@
>  #include <linux/highuid.h>
>  #include <linux/cred.h>
>  #include <linux/securebits.h>
> +#include <linux/keyctl.h>
> +#include <linux/key-type.h>
> +#include <keys/user-type.h>
> +#include <linux/seq_file.h>
> +#include <linux/fs.h>
> +#include <linux/uaccess.h>
> +#include <linux/ctype.h>
>  
>  static struct kmem_cache *user_ns_cachep __read_mostly;
>  
> +static bool new_idmap_permitted(struct user_namespace *ns, int cap_setid,
> +				struct uid_gid_map *map);
> +
>  /*
>   * Create a new user namespace, deriving the creator from the user in the
>   * passed credentials, and replacing that user with the new root user for the
> @@ -26,7 +36,6 @@ static struct kmem_cache *user_ns_cachep __read_mostly;
>  int create_user_ns(struct cred *new)
>  {
>  	struct user_namespace *ns, *parent_ns = new->user_ns;
> -	struct user_struct *root_user;
>  	kuid_t owner = make_kuid(new->user_ns, new->euid);
>  	kgid_t group = make_kgid(new->user_ns, new->egid);
>  
> @@ -38,29 +47,15 @@ int create_user_ns(struct cred *new)
>  	    !kgid_has_mapping(parent_ns, group))
>  		return -EPERM;
>  
> -	ns = kmem_cache_alloc(user_ns_cachep, GFP_KERNEL);
> +	ns = kmem_cache_zalloc(user_ns_cachep, GFP_KERNEL);
>  	if (!ns)
>  		return -ENOMEM;
>  
>  	kref_init(&ns->kref);
> -
> -	/* Alloc new root user.  */
> -	root_user = alloc_uid(make_kuid(ns, 0));
> -	if (!root_user) {
> -		kmem_cache_free(user_ns_cachep, ns);
> -		return -ENOMEM;
> -	}
> -
> -	/* set the new root user in the credentials under preparation */
>  	ns->parent = parent_ns;
>  	ns->owner = owner;
>  	ns->group = group;
> -	free_uid(new->user);
> -	new->user = root_user;
> -	new->uid = new->euid = new->suid = new->fsuid = 0;
> -	new->gid = new->egid = new->sgid = new->fsgid = 0;
> -	put_group_info(new->group_info);
> -	new->group_info = get_group_info(&init_groups);
> +
>  	/* Start with the same capabilities as init but useless for doing
>  	 * anything as the capabilities are bound to the new user namespace.
>  	 */
> @@ -92,44 +87,512 @@ void free_user_ns(struct kref *kref)
>  }
>  EXPORT_SYMBOL(free_user_ns);
>  
> -uid_t user_ns_map_uid(struct user_namespace *to, const struct cred *cred, uid_t uid)
> +static u32 map_id_range_down(struct uid_gid_map *map, u32 id, u32 count)
>  {
> -	struct user_namespace *tmp;
> +	unsigned idx, extents;
> +	u32 first, last, id2;
>  
> -	if (likely(to == cred->user_ns))
> -		return uid;
> +	id2 = id + count - 1;
>  
> -	/* Is cred->user the creator of the target user_ns
> -	 * or the creator of one of it's parents?
> -	 */
> -	for ( tmp = to; tmp != &init_user_ns; tmp = tmp->parent ) {
> -		if (uid_eq(cred->user->uid, tmp->owner)) {
> -			return (uid_t)0;
> -		}
> +	/* Find the matching extent */
> +	extents = map->nr_extents;
> +	smp_read_barrier_depends();
> +	for (idx = 0; idx < extents; idx++) {
> +		first = map->extent[idx].first;
> +		last = first + map->extent[idx].count - 1;
> +		if (id >= first && id <= last &&
> +		    (id2 >= first && id2 <= last))
> +			break;
> +	}
> +	/* Map the id or note failure */
> +	if (idx < extents)
> +		id = (id - first) + map->extent[idx].lower_first;
> +	else
> +		id = (u32) -1;
> +
> +	return id;
> +}
> +
> +static u32 map_id_down(struct uid_gid_map *map, u32 id)
> +{
> +	unsigned idx, extents;
> +	u32 first, last;
> +
> +	/* Find the matching extent */
> +	extents = map->nr_extents;
> +	smp_read_barrier_depends();
> +	for (idx = 0; idx < extents; idx++) {
> +		first = map->extent[idx].first;
> +		last = first + map->extent[idx].count - 1;
> +		if (id >= first && id <= last)
> +			break;
> +	}
> +	/* Map the id or note failure */
> +	if (idx < extents)
> +		id = (id - first) + map->extent[idx].lower_first;
> +	else
> +		id = (u32) -1;
> +
> +	return id;
> +}
> +
> +static u32 map_id_up(struct uid_gid_map *map, u32 id)
> +{
> +	unsigned idx, extents;
> +	u32 first, last;
> +
> +	/* Find the matching extent */
> +	extents = map->nr_extents;
> +	smp_read_barrier_depends();
> +	for (idx = 0; idx < extents; idx++) {
> +		first = map->extent[idx].lower_first;
> +		last = first + map->extent[idx].count - 1;
> +		if (id >= first && id <= last)
> +			break;
>  	}
> +	/* Map the id or note failure */
> +	if (idx < extents)
> +		id = (id - first) + map->extent[idx].first;
> +	else
> +		id = (u32) -1;
> +
> +	return id;
> +}
> +
> +/**
> + *	make_kuid - Map a user-namespace uid pair into a kuid.
> + *	@ns:  User namespace that the uid is in
> + *	@uid: User identifier
> + *
> + *	Maps a user-namespace uid pair into a kernel internal kuid,
> + *	and returns that kuid.
> + *
> + *	When there is no mapping defined for the user-namespace uid
> + *	pair INVALID_UID is returned.  Callers are expected to test
> + *	for and handle INVALID_UID being returned.  INVALID_UID
> + *	may be tested for using uid_valid().
> + */
> +kuid_t make_kuid(struct user_namespace *ns, uid_t uid)
> +{
> +	/* Map the uid to a global kernel uid */
> +	return KUIDT_INIT(map_id_down(&ns->uid_map, uid));
> +}
> +EXPORT_SYMBOL(make_kuid);
> +
> +/**
> + *	from_kuid - Create a uid from a kuid user-namespace pair.
> + *	@targ: The user namespace we want a uid in.
> + *	@kuid: The kernel internal uid to start with.
> + *
> + *	Map @kuid into the user-namespace specified by @targ and
> + *	return the resulting uid.
> + *
> + *	There is always a mapping into the initial user_namespace.
> + *
> + *	If @kuid has no mapping in @targ (uid_t)-1 is returned.
> + */
> +uid_t from_kuid(struct user_namespace *targ, kuid_t kuid)
> +{
> +	/* Map the uid from a global kernel uid */
> +	return map_id_up(&targ->uid_map, __kuid_val(kuid));
> +}
> +EXPORT_SYMBOL(from_kuid);
> +
> +/**
> + *	from_kuid_munged - Create a uid from a kuid user-namespace pair.
> + *	@targ: The user namespace we want a uid in.
> + *	@kuid: The kernel internal uid to start with.
> + *
> + *	Map @kuid into the user-namespace specified by @targ and
> + *	return the resulting uid.
> + *
> + *	There is always a mapping into the initial user_namespace.
> + *
> + *	Unlike from_kuid from_kuid_munged never fails and always
> + *	returns a valid uid.  This makes from_kuid_munged appropriate
> + *	for use in syscalls like stat and getuid where failing the
> + *	system call and failing to provide a valid uid are not
> + *	options.
> + *
> + *	If @kuid has no mapping in @targ overflowuid is returned.
> + */
> +uid_t from_kuid_munged(struct user_namespace *targ, kuid_t kuid)
> +{
> +	uid_t uid;
> +	uid = from_kuid(targ, kuid);
> +
> +	if (uid == (uid_t) -1)
> +		uid = overflowuid;
> +	return uid;
> +}
> +EXPORT_SYMBOL(from_kuid_munged);
> +
> +/**
> + *	make_kgid - Map a user-namespace gid pair into a kgid.
> + *	@ns:  User namespace that the gid is in
> + *	@gid: Group identifier
> + *
> + *	Maps a user-namespace gid pair into a kernel internal kgid,
> + *	and returns that kgid.
> + *
> + *	When there is no mapping defined for the user-namespace gid
> + *	pair INVALID_GID is returned.  Callers are expected to test
> + *	for and handle INVALID_GID being returned.  INVALID_GID may be
> + *	tested for using gid_valid().
> + */
> +kgid_t make_kgid(struct user_namespace *ns, gid_t gid)
> +{
> +	/* Map the gid to a global kernel gid */
> +	return KGIDT_INIT(map_id_down(&ns->gid_map, gid));
> +}
> +EXPORT_SYMBOL(make_kgid);
> +
> +/**
> + *	from_kgid - Create a gid from a kgid user-namespace pair.
> + *	@targ: The user namespace we want a gid in.
> + *	@kgid: The kernel internal gid to start with.
> + *
> + *	Map @kgid into the user-namespace specified by @targ and
> + *	return the resulting gid.
> + *
> + *	There is always a mapping into the initial user_namespace.
> + *
> + *	If @kgid has no mapping in @targ (gid_t)-1 is returned.
> + */
> +gid_t from_kgid(struct user_namespace *targ, kgid_t kgid)
> +{
> +	/* Map the gid from a global kernel gid */
> +	return map_id_up(&targ->gid_map, __kgid_val(kgid));
> +}
> +EXPORT_SYMBOL(from_kgid);
> +
> +/**
> + *	from_kgid_munged - Create a gid from a kgid user-namespace pair.
> + *	@targ: The user namespace we want a gid in.
> + *	@kgid: The kernel internal gid to start with.
> + *
> + *	Map @kgid into the user-namespace specified by @targ and
> + *	return the resulting gid.
> + *
> + *	There is always a mapping into the initial user_namespace.
> + *
> + *	Unlike from_kgid from_kgid_munged never fails and always
> + *	returns a valid gid.  This makes from_kgid_munged appropriate
> + *	for use in syscalls like stat and getgid where failing the
> + *	system call and failing to provide a valid gid are not options.
> + *
> + *	If @kgid has no mapping in @targ overflowgid is returned.
> + */
> +gid_t from_kgid_munged(struct user_namespace *targ, kgid_t kgid)
> +{
> +	gid_t gid;
> +	gid = from_kgid(targ, kgid);
> +
> +	if (gid == (gid_t) -1)
> +		gid = overflowgid;
> +	return gid;
> +}
> +EXPORT_SYMBOL(from_kgid_munged);
> +
> +static int uid_m_show(struct seq_file *seq, void *v)
> +{
> +	struct user_namespace *ns = seq->private;
> +	struct uid_gid_extent *extent = v;
> +	struct user_namespace *lower_ns;
> +	uid_t lower;
>  
> -	/* No useful relationship so no mapping */
> -	return overflowuid;
> +	lower_ns = current_user_ns();
> +	if ((lower_ns == ns) && lower_ns->parent)
> +		lower_ns = lower_ns->parent;
> +
> +	lower = from_kuid(lower_ns, KUIDT_INIT(extent->lower_first));
> +
> +	seq_printf(seq, "%10u %10u %10u\n",
> +		extent->first,
> +		lower,
> +		extent->count);
> +
> +	return 0;
>  }
>  
> -gid_t user_ns_map_gid(struct user_namespace *to, const struct cred *cred, gid_t gid)
> +static int gid_m_show(struct seq_file *seq, void *v)
>  {
> -	struct user_namespace *tmp;
> +	struct user_namespace *ns = seq->private;
> +	struct uid_gid_extent *extent = v;
> +	struct user_namespace *lower_ns;
> +	gid_t lower;
>  
> -	if (likely(to == cred->user_ns))
> -		return gid;
> +	lower_ns = current_user_ns();
> +	if ((lower_ns == ns) && lower_ns->parent)
> +		lower_ns = lower_ns->parent;
>  
> -	/* Is cred->user the creator of the target user_ns
> -	 * or the creator of one of it's parents?
> +	lower = from_kgid(lower_ns, KGIDT_INIT(extent->lower_first));
> +
> +	seq_printf(seq, "%10u %10u %10u\n",
> +		extent->first,
> +		lower,
> +		extent->count);
> +
> +	return 0;
> +}
> +
> +static void *m_start(struct seq_file *seq, loff_t *ppos, struct uid_gid_map *map)
> +{
> +	struct uid_gid_extent *extent = NULL;
> +	loff_t pos = *ppos;
> +
> +	if (pos < map->nr_extents)
> +		extent = &map->extent[pos];
> +
> +	return extent;
> +}
> +
> +static void *uid_m_start(struct seq_file *seq, loff_t *ppos)
> +{
> +	struct user_namespace *ns = seq->private;
> +
> +	return m_start(seq, ppos, &ns->uid_map);
> +}
> +
> +static void *gid_m_start(struct seq_file *seq, loff_t *ppos)
> +{
> +	struct user_namespace *ns = seq->private;
> +
> +	return m_start(seq, ppos, &ns->gid_map);
> +}
> +
> +static void *m_next(struct seq_file *seq, void *v, loff_t *pos)
> +{
> +	(*pos)++;
> +	return seq->op->start(seq, pos);
> +}
> +
> +static void m_stop(struct seq_file *seq, void *v)
> +{
> +	return;
> +}
> +
> +struct seq_operations proc_uid_seq_operations = {
> +	.start = uid_m_start,
> +	.stop = m_stop,
> +	.next = m_next,
> +	.show = uid_m_show,
> +};
> +
> +struct seq_operations proc_gid_seq_operations = {
> +	.start = gid_m_start,
> +	.stop = m_stop,
> +	.next = m_next,
> +	.show = gid_m_show,
> +};
> +
> +static DEFINE_MUTEX(id_map_mutex);
> +
> +static ssize_t map_write(struct file *file, const char __user *buf,
> +			 size_t count, loff_t *ppos,
> +			 int cap_setid,
> +			 struct uid_gid_map *map,
> +			 struct uid_gid_map *parent_map)
> +{
> +	struct seq_file *seq = file->private_data;
> +	struct user_namespace *ns = seq->private;
> +	struct uid_gid_map new_map;
> +	unsigned idx;
> +	struct uid_gid_extent *extent, *last = NULL;
> +	unsigned long page = 0;
> +	char *kbuf, *pos, *next_line;
> +	ssize_t ret = -EINVAL;
> +
> +	/*
> +	 * The id_map_mutex serializes all writes to any given map.
> +	 *
> +	 * Any map is only ever written once.
> +	 *
> +	 * An id map fits within 1 cache line on most architectures.
> +	 *
> +	 * On read nothing needs to be done unless you are on an
> +	 * architecture with a crazy cache coherency model like alpha.
> +	 *
> +	 * There is a one time data dependency between reading the
> +	 * count of the extents and the values of the extents.  The
> +	 * desired behavior is to see the values of the extents that
> +	 * were written before the count of the extents.
> +	 *
> +	 * To achieve this smp_wmb() is used to guarantee the write
> +	 * order and smp_read_barrier_depends() guarantees that we
> +	 * don't have crazy architectures returning stale data.
> +	 *
> +	 */
> +	mutex_lock(&id_map_mutex);
> +
> +	ret = -EPERM;
> +	/* Only allow one successful write to the map */
> +	if (map->nr_extents != 0)
> +		goto out;
> +
> +	/* Require the appropriate privilege CAP_SETUID or CAP_SETGID
> +	 * over the user namespace in order to set the id mapping.
>  	 */
> -	for ( tmp = to; tmp != &init_user_ns; tmp = tmp->parent ) {
> -		if (uid_eq(cred->user->uid, tmp->owner)) {
> -			return (gid_t)0;
> +	if (!ns_capable(ns, cap_setid))
> +		goto out;
> +
> +	/* Get a buffer */
> +	ret = -ENOMEM;
> +	page = __get_free_page(GFP_TEMPORARY);
> +	kbuf = (char *) page;
> +	if (!page)
> +		goto out;
> +
> +	/* Only allow < page size writes at the beginning of the file */
> +	ret = -EINVAL;
> +	if ((*ppos != 0) || (count >= PAGE_SIZE))
> +		goto out;
> +
> +	/* Slurp in the user data */
> +	ret = -EFAULT;
> +	if (copy_from_user(kbuf, buf, count))
> +		goto out;
> +	kbuf[count] = '\0';
> +
> +	/* Parse the user data */
> +	ret = -EINVAL;
> +	pos = kbuf;
> +	new_map.nr_extents = 0;
> +	for (;pos; pos = next_line) {
> +		extent = &new_map.extent[new_map.nr_extents];
> +
> +		/* Find the end of line and ensure I don't look past it */
> +		next_line = strchr(pos, '\n');
> +		if (next_line) {
> +			*next_line = '\0';
> +			next_line++;
> +			if (*next_line == '\0')
> +				next_line = NULL;
>  		}
> +
> +		pos = skip_spaces(pos);
> +		extent->first = simple_strtoul(pos, &pos, 10);
> +		if (!isspace(*pos))
> +			goto out;
> +
> +		pos = skip_spaces(pos);
> +		extent->lower_first = simple_strtoul(pos, &pos, 10);
> +		if (!isspace(*pos))
> +			goto out;
> +
> +		pos = skip_spaces(pos);
> +		extent->count = simple_strtoul(pos, &pos, 10);
> +		if (*pos && !isspace(*pos))
> +			goto out;
> +
> +		/* Verify there is not trailing junk on the line */
> +		pos = skip_spaces(pos);
> +		if (*pos != '\0')
> +			goto out;
> +
> +		/* Verify we have been given valid starting values */
> +		if ((extent->first == (u32) -1) ||
> +		    (extent->lower_first == (u32) -1 ))
> +			goto out;
> +
> +		/* Verify count is not zero and does not cause the extent to wrap */
> +		if ((extent->first + extent->count) <= extent->first)
> +			goto out;
> +		if ((extent->lower_first + extent->count) <= extent->lower_first)
> +			goto out;
> +
> +		/* For now only accept extents that are strictly in order */
> +		if (last &&
> +		    (((last->first + last->count) > extent->first) ||
> +		     ((last->lower_first + last->count) > extent->lower_first)))
> +			goto out;
> +
> +		new_map.nr_extents++;
> +		last = extent;
> +
> +		/* Fail if the file contains too many extents */
> +		if ((new_map.nr_extents == UID_GID_MAP_MAX_EXTENTS) &&
> +		    (next_line != NULL))
> +			goto out;
>  	}
> +	/* Be very certain the new map actually exists */
> +	if (new_map.nr_extents == 0)
> +		goto out;
> +
> +	ret = -EPERM;
> +	/* Validate the user is allowed to use user id's mapped to. */
> +	if (!new_idmap_permitted(ns, cap_setid, &new_map))
> +		goto out;
> +
> +	/* Map the lower ids from the parent user namespace to the
> +	 * kernel global id space.
> +	 */
> +	for (idx = 0; idx < new_map.nr_extents; idx++) {
> +		u32 lower_first;
> +		extent = &new_map.extent[idx];
> +
> +		lower_first = map_id_range_down(parent_map,
> +						extent->lower_first,
> +						extent->count);
> +
> +		/* Fail if we can not map the specified extent to
> +		 * the kernel global id space.
> +		 */
> +		if (lower_first == (u32) -1)
> +			goto out;
> +
> +		extent->lower_first = lower_first;
> +	}
> +
> +	/* Install the map */
> +	memcpy(map->extent, new_map.extent,
> +		new_map.nr_extents*sizeof(new_map.extent[0]));
> +	smp_wmb();
> +	map->nr_extents = new_map.nr_extents;
> +
> +	*ppos = count;
> +	ret = count;
> +out:
> +	mutex_unlock(&id_map_mutex);
> +	if (page)
> +		free_page(page);
> +	return ret;
> +}
> +
> +ssize_t proc_uid_map_write(struct file *file, const char __user *buf, size_t size, loff_t *ppos)
> +{
> +	struct seq_file *seq = file->private_data;
> +	struct user_namespace *ns = seq->private;
> +
> +	if (!ns->parent)
> +		return -EPERM;
> +
> +	return map_write(file, buf, size, ppos, CAP_SETUID,
> +			 &ns->uid_map, &ns->parent->uid_map);
> +}
> +
> +ssize_t proc_gid_map_write(struct file *file, const char __user *buf, size_t size, loff_t *ppos)
> +{
> +	struct seq_file *seq = file->private_data;
> +	struct user_namespace *ns = seq->private;
> +
> +	if (!ns->parent)
> +		return -EPERM;
> +
> +	return map_write(file, buf, size, ppos, CAP_SETGID,
> +			 &ns->gid_map, &ns->parent->gid_map);
> +}
> +
> +static bool new_idmap_permitted(struct user_namespace *ns, int cap_setid,
> +				struct uid_gid_map *new_map)
> +{
> +	/* Allow the specified ids if we have the appropriate capability
> +	 * (CAP_SETUID or CAP_SETGID) over the parent user namespace.
> +	 */
> +	if (ns_capable(ns->parent, cap_setid))
> +		return true;
>  
> -	/* No useful relationship so no mapping */
> -	return overflowgid;
> +	return false;
>  }
>  
>  static __init int user_namespaces_init(void)
> -- 
> 1.7.2.5
> 
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers
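
As an aside for readers following the extent lookup in map_id_down() and
map_id_up() above: the sketch below is a userspace approximation of that
logic, not code from the patch.  The names struct id_map, sketch_map_down(),
sketch_map_up() and NO_MAPPING are made up for illustration, and the
smp_read_barrier_depends() ordering the kernel version relies on is
deliberately left out, so this only shows the arithmetic.

```c
#include <stdint.h>

#define MAX_EXTENTS 5                 /* mirrors UID_GID_MAP_MAX_EXTENTS */
#define NO_MAPPING ((uint32_t)-1)     /* the patch reserves (u32)-1 for "unmapped" */

/* Hypothetical userspace stand-ins for uid_gid_extent / uid_gid_map. */
struct extent {
	uint32_t first;        /* first id as seen inside the namespace */
	uint32_t lower_first;  /* first id in the lower (kernel global) space */
	uint32_t count;        /* number of consecutive ids covered */
};

struct id_map {
	uint32_t nr_extents;
	struct extent extent[MAX_EXTENTS];
};

/* Namespace id -> lower id, following the shape of map_id_down(). */
uint32_t sketch_map_down(const struct id_map *map, uint32_t id)
{
	for (uint32_t i = 0; i < map->nr_extents; i++) {
		uint32_t first = map->extent[i].first;
		uint32_t last = first + map->extent[i].count - 1;

		if (id >= first && id <= last)
			return (id - first) + map->extent[i].lower_first;
	}
	return NO_MAPPING;
}

/* Lower id -> namespace id, following the shape of map_id_up(). */
uint32_t sketch_map_up(const struct id_map *map, uint32_t id)
{
	for (uint32_t i = 0; i < map->nr_extents; i++) {
		uint32_t first = map->extent[i].lower_first;
		uint32_t last = first + map->extent[i].count - 1;

		if (id >= first && id <= last)
			return (id - first) + map->extent[i].first;
	}
	return NO_MAPPING;
}
```

With a map holding the single extent { 0, 100000, 1000 }, namespace id 5
maps down to 100005 and lower id 100005 maps back up to 5, while any id
outside the extent yields NO_MAPPING in either direction.  That is the
condition make_kuid() reports as INVALID_UID and from_kuid() reports as
(uid_t)-1.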


* Re: [PATCH 18/43] userns: Convert group_info values from gid_t to kgid_t.
       [not found]     ` <1333862139-31737-18-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-04-18 18:49       ` Serge E. Hallyn
  0 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 18:49 UTC (permalink / raw)
  To: Eric W. Beiderman
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

Quoting Eric W. Beiderman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> 
> As a first step to converting struct cred to be all kuid_t and kgid_t
> values convert the group values stored in group_info to always be
> kgid_t values.   Unless user namespaces are used this change should
> have no effect.
> 
> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> ---
>  arch/s390/kernel/compat_linux.c   |   13 ++++++++-
>  fs/nfsd/auth.c                    |    5 ++-
>  fs/proc/array.c                   |    5 +++-
>  include/linux/cred.h              |    9 ++++---
>  kernel/groups.c                   |   48 +++++++++++++++++++-----------------
>  kernel/uid16.c                    |   14 +++++++++-
>  net/ipv4/ping.c                   |   11 ++++++--
>  net/sunrpc/auth_generic.c         |    4 +-
>  net/sunrpc/auth_gss/svcauth_gss.c |    7 ++++-
>  net/sunrpc/auth_unix.c            |   15 ++++++++---
>  net/sunrpc/svcauth_unix.c         |   18 ++++++++++---
>  security/keys/permission.c        |    3 +-
>  12 files changed, 103 insertions(+), 49 deletions(-)
> 
> diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
> index ab64bdb..5baac18 100644
> --- a/arch/s390/kernel/compat_linux.c
> +++ b/arch/s390/kernel/compat_linux.c
> @@ -173,11 +173,14 @@ asmlinkage long sys32_setfsgid16(u16 gid)
>  
>  static int groups16_to_user(u16 __user *grouplist, struct group_info *group_info)
>  {
> +	struct user_namespace *user_ns = current_user_ns();
>  	int i;
>  	u16 group;
> +	kgid_t kgid;
>  
>  	for (i = 0; i < group_info->ngroups; i++) {
> -		group = (u16)GROUP_AT(group_info, i);
> +		kgid = GROUP_AT(group_info, i);
> +		group = (u16)from_kgid_munged(user_ns, kgid);
>  		if (put_user(group, grouplist+i))
>  			return -EFAULT;
>  	}
> @@ -187,13 +190,19 @@ static int groups16_to_user(u16 __user *grouplist, struct group_info *group_info
>  
>  static int groups16_from_user(struct group_info *group_info, u16 __user *grouplist)
>  {
> +	struct user_namespace *user_ns = current_user_ns();
>  	int i;
>  	u16 group;

need

	kgid_t kgid;

here

>  
>  	for (i = 0; i < group_info->ngroups; i++) {
>  		if (get_user(group, grouplist+i))
>  			return  -EFAULT;
> -		GROUP_AT(group_info, i) = (gid_t)group;
> +
> +		kgid = make_kgid(user_ns, (gid_t)group);
> +		if (!gid_valid(kgid))
> +			return -EINVAL;
> +
> +		GROUP_AT(group_info, i) = kgid;
>  	}
>  
>  	return 0;
> diff --git a/fs/nfsd/auth.c b/fs/nfsd/auth.c
> index 79717a4..204438c 100644
> --- a/fs/nfsd/auth.c
> +++ b/fs/nfsd/auth.c
> @@ -1,6 +1,7 @@
>  /* Copyright (C) 1995, 1996 Olaf Kirch <okir-pn4DOG8n3UYbFoVRYvo4fw@public.gmane.org> */
>  
>  #include <linux/sched.h>
> +#include <linux/user_namespace.h>
>  #include "nfsd.h"
>  #include "auth.h"
>  
> @@ -56,8 +57,8 @@ int nfsd_setuser(struct svc_rqst *rqstp, struct svc_export *exp)
>  			goto oom;
>  
>  		for (i = 0; i < rqgi->ngroups; i++) {
> -			if (!GROUP_AT(rqgi, i))
> -				GROUP_AT(gi, i) = exp->ex_anon_gid;
> +			if (gid_eq(GLOBAL_ROOT_GID, GROUP_AT(rqgi, i)))
> +				GROUP_AT(gi, i) = make_kgid(&init_user_ns, exp->ex_anon_gid);
>  			else
>  				GROUP_AT(gi, i) = GROUP_AT(rqgi, i);
>  		}
> diff --git a/fs/proc/array.c b/fs/proc/array.c
> index f9bd395..36a0a91 100644
> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -81,6 +81,7 @@
>  #include <linux/pid_namespace.h>
>  #include <linux/ptrace.h>
>  #include <linux/tracehook.h>
> +#include <linux/user_namespace.h>
>  
>  #include <asm/pgtable.h>
>  #include <asm/processor.h>
> @@ -161,6 +162,7 @@ static inline const char *get_task_state(struct task_struct *tsk)
>  static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
>  				struct pid *pid, struct task_struct *p)
>  {
> +	struct user_namespace *user_ns = current_user_ns();
>  	struct group_info *group_info;
>  	int g;
>  	struct fdtable *fdt = NULL;
> @@ -205,7 +207,8 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
>  	task_unlock(p);
>  
>  	for (g = 0; g < min(group_info->ngroups, NGROUPS_SMALL); g++)
> -		seq_printf(m, "%d ", GROUP_AT(group_info, g));
> +		seq_printf(m, "%d ",
> +			   from_kgid_munged(user_ns, GROUP_AT(group_info, g)));
>  	put_cred(cred);
>  
>  	seq_putc(m, '\n');
> diff --git a/include/linux/cred.h b/include/linux/cred.h
> index 2c60ec8..0ab3cda 100644
> --- a/include/linux/cred.h
> +++ b/include/linux/cred.h
> @@ -17,6 +17,7 @@
>  #include <linux/key.h>
>  #include <linux/selinux.h>
>  #include <linux/atomic.h>
> +#include <linux/uidgid.h>
>  
>  struct user_struct;
>  struct cred;
> @@ -26,14 +27,14 @@ struct inode;
>   * COW Supplementary groups list
>   */
>  #define NGROUPS_SMALL		32
> -#define NGROUPS_PER_BLOCK	((unsigned int)(PAGE_SIZE / sizeof(gid_t)))
> +#define NGROUPS_PER_BLOCK	((unsigned int)(PAGE_SIZE / sizeof(kgid_t)))
>  
>  struct group_info {
>  	atomic_t	usage;
>  	int		ngroups;
>  	int		nblocks;
> -	gid_t		small_block[NGROUPS_SMALL];
> -	gid_t		*blocks[0];
> +	kgid_t		small_block[NGROUPS_SMALL];
> +	kgid_t		*blocks[0];
>  };
>  
>  /**
> @@ -66,7 +67,7 @@ extern struct group_info init_groups;
>  extern void groups_free(struct group_info *);
>  extern int set_current_groups(struct group_info *);
>  extern int set_groups(struct cred *, struct group_info *);
> -extern int groups_search(const struct group_info *, gid_t);
> +extern int groups_search(const struct group_info *, kgid_t);
>  
>  /* access the groups "array" with this macro */
>  #define GROUP_AT(gi, i) \
> diff --git a/kernel/groups.c b/kernel/groups.c
> index 99b53d1..84156f2 100644
> --- a/kernel/groups.c
> +++ b/kernel/groups.c
> @@ -31,7 +31,7 @@ struct group_info *groups_alloc(int gidsetsize)
>  		group_info->blocks[0] = group_info->small_block;
>  	else {
>  		for (i = 0; i < nblocks; i++) {
> -			gid_t *b;
> +			kgid_t *b;
>  			b = (void *)__get_free_page(GFP_USER);
>  			if (!b)
>  				goto out_undo_partial_alloc;
> @@ -66,18 +66,15 @@ EXPORT_SYMBOL(groups_free);
>  static int groups_to_user(gid_t __user *grouplist,
>  			  const struct group_info *group_info)
>  {
> +	struct user_namespace *user_ns = current_user_ns();
>  	int i;
>  	unsigned int count = group_info->ngroups;
>  
> -	for (i = 0; i < group_info->nblocks; i++) {
> -		unsigned int cp_count = min(NGROUPS_PER_BLOCK, count);
> -		unsigned int len = cp_count * sizeof(*grouplist);
> -
> -		if (copy_to_user(grouplist, group_info->blocks[i], len))
> +	for (i = 0; i < count; i++) {
> +		gid_t gid;
> +		gid = from_kgid_munged(user_ns, GROUP_AT(group_info, i));
> +		if (put_user(gid, grouplist+i))
>  			return -EFAULT;
> -
> -		grouplist += NGROUPS_PER_BLOCK;
> -		count -= cp_count;
>  	}
>  	return 0;
>  }
> @@ -86,18 +83,21 @@ static int groups_to_user(gid_t __user *grouplist,
>  static int groups_from_user(struct group_info *group_info,
>      gid_t __user *grouplist)
>  {
> +	struct user_namespace *user_ns = current_user_ns();
>  	int i;
>  	unsigned int count = group_info->ngroups;
>  
> -	for (i = 0; i < group_info->nblocks; i++) {
> -		unsigned int cp_count = min(NGROUPS_PER_BLOCK, count);
> -		unsigned int len = cp_count * sizeof(*grouplist);
> -
> -		if (copy_from_user(group_info->blocks[i], grouplist, len))
> +	for (i = 0; i < count; i++) {
> +		gid_t gid;
> +		kgid_t kgid;
> +		if (get_user(gid, grouplist+i))
>  			return -EFAULT;
>  
> -		grouplist += NGROUPS_PER_BLOCK;
> -		count -= cp_count;
> +		kgid = make_kgid(user_ns, gid);
> +		if (!gid_valid(kgid))
> +			return -EINVAL;
> +
> +		GROUP_AT(group_info, i) = kgid;
>  	}
>  	return 0;
>  }
> @@ -117,9 +117,9 @@ static void groups_sort(struct group_info *group_info)
>  		for (base = 0; base < max; base++) {
>  			int left = base;
>  			int right = left + stride;
> -			gid_t tmp = GROUP_AT(group_info, right);
> +			kgid_t tmp = GROUP_AT(group_info, right);
>  
> -			while (left >= 0 && GROUP_AT(group_info, left) > tmp) {
> +			while (left >= 0 && gid_gt(GROUP_AT(group_info, left), tmp)) {
>  				GROUP_AT(group_info, right) =
>  				    GROUP_AT(group_info, left);
>  				right = left;
> @@ -132,7 +132,7 @@ static void groups_sort(struct group_info *group_info)
>  }
>  
>  /* a simple bsearch */
> -int groups_search(const struct group_info *group_info, gid_t grp)
> +int groups_search(const struct group_info *group_info, kgid_t grp)
>  {
>  	unsigned int left, right;
>  
> @@ -143,9 +143,9 @@ int groups_search(const struct group_info *group_info, gid_t grp)
>  	right = group_info->ngroups;
>  	while (left < right) {
>  		unsigned int mid = (left+right)/2;
> -		if (grp > GROUP_AT(group_info, mid))
> +		if (gid_gt(grp, GROUP_AT(group_info, mid)))
>  			left = mid + 1;
> -		else if (grp < GROUP_AT(group_info, mid))
> +		else if (gid_lt(grp, GROUP_AT(group_info, mid)))
>  			right = mid;
>  		else
>  			return 1;
> @@ -262,7 +262,8 @@ int in_group_p(gid_t grp)
>  	int retval = 1;
>  
>  	if (grp != cred->fsgid)
> -		retval = groups_search(cred->group_info, grp);
> +		retval = groups_search(cred->group_info,
> +				       make_kgid(cred->user_ns, grp));
>  	return retval;
>  }
>  
> @@ -274,7 +275,8 @@ int in_egroup_p(gid_t grp)
>  	int retval = 1;
>  
>  	if (grp != cred->egid)
> -		retval = groups_search(cred->group_info, grp);
> +		retval = groups_search(cred->group_info,
> +				       make_kgid(cred->user_ns, grp));
>  	return retval;
>  }
>  
> diff --git a/kernel/uid16.c b/kernel/uid16.c
> index 51c6e89..e530bc3 100644
> --- a/kernel/uid16.c
> +++ b/kernel/uid16.c
> @@ -134,11 +134,14 @@ SYSCALL_DEFINE1(setfsgid16, old_gid_t, gid)
>  static int groups16_to_user(old_gid_t __user *grouplist,
>      struct group_info *group_info)
>  {
> +	struct user_namespace *user_ns = current_user_ns();
>  	int i;
>  	old_gid_t group;
> +	kgid_t kgid;
>  
>  	for (i = 0; i < group_info->ngroups; i++) {
> -		group = high2lowgid(GROUP_AT(group_info, i));
> +		kgid = GROUP_AT(group_info, i);
> +		group = high2lowgid(from_kgid_munged(user_ns, kgid));
>  		if (put_user(group, grouplist+i))
>  			return -EFAULT;
>  	}
> @@ -149,13 +152,20 @@ static int groups16_to_user(old_gid_t __user *grouplist,
>  static int groups16_from_user(struct group_info *group_info,
>      old_gid_t __user *grouplist)
>  {
> +	struct user_namespace *user_ns = current_user_ns();
>  	int i;
>  	old_gid_t group;
> +	kgid_t kgid;
>  
>  	for (i = 0; i < group_info->ngroups; i++) {
>  		if (get_user(group, grouplist+i))
>  			return  -EFAULT;
> -		GROUP_AT(group_info, i) = low2highgid(group);
> +
> +		kgid = make_kgid(user_ns, low2highgid(group));
> +		if (!gid_valid(kgid))
> +			return -EINVAL;
> +
> +		GROUP_AT(group_info, i) = kgid;
>  	}
>  
>  	return 0;
> diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
> index 50009c7..9d3044f 100644
> --- a/net/ipv4/ping.c
> +++ b/net/ipv4/ping.c
> @@ -205,17 +205,22 @@ static int ping_init_sock(struct sock *sk)
>  	gid_t range[2];
>  	struct group_info *group_info = get_current_groups();
>  	int i, j, count = group_info->ngroups;
> +	kgid_t low, high;
>  
>  	inet_get_ping_group_range_net(net, range, range+1);
> +	low = make_kgid(&init_user_ns, range[0]);
> +	high = make_kgid(&init_user_ns, range[1]);
> +	if (!gid_valid(low) || !gid_valid(high) || gid_lt(high, low))
> +		return -EACCES;
> +
>  	if (range[0] <= group && group <= range[1])
>  		return 0;
>  
>  	for (i = 0; i < group_info->nblocks; i++) {
>  		int cp_count = min_t(int, NGROUPS_PER_BLOCK, count);
> -
>  		for (j = 0; j < cp_count; j++) {
> -			group = group_info->blocks[i][j];
> -			if (range[0] <= group && group <= range[1])
> +			kgid_t gid = group_info->blocks[i][j];
> +			if (gid_lte(low, gid) && gid_lte(gid, high))
>  				return 0;
>  		}
>  
> diff --git a/net/sunrpc/auth_generic.c b/net/sunrpc/auth_generic.c
> index 75762f3..6ed6f20 100644
> --- a/net/sunrpc/auth_generic.c
> +++ b/net/sunrpc/auth_generic.c
> @@ -160,8 +160,8 @@ generic_match(struct auth_cred *acred, struct rpc_cred *cred, int flags)
>  	if (gcred->acred.group_info->ngroups != acred->group_info->ngroups)
>  		goto out_nomatch;
>  	for (i = 0; i < gcred->acred.group_info->ngroups; i++) {
> -		if (GROUP_AT(gcred->acred.group_info, i) !=
> -				GROUP_AT(acred->group_info, i))
> +		if (!gid_eq(GROUP_AT(gcred->acred.group_info, i),
> +				GROUP_AT(acred->group_info, i)))
>  			goto out_nomatch;
>  	}
>  out_match:
> diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c
> index 1600cfb..28b62db 100644
> --- a/net/sunrpc/auth_gss/svcauth_gss.c
> +++ b/net/sunrpc/auth_gss/svcauth_gss.c
> @@ -41,6 +41,7 @@
>  #include <linux/types.h>
>  #include <linux/module.h>
>  #include <linux/pagemap.h>
> +#include <linux/user_namespace.h>
>  
>  #include <linux/sunrpc/auth_gss.h>
>  #include <linux/sunrpc/gss_err.h>
> @@ -470,9 +471,13 @@ static int rsc_parse(struct cache_detail *cd,
>  		status = -EINVAL;
>  		for (i=0; i<N; i++) {
>  			gid_t gid;
> +			kgid_t kgid;
>  			if (get_int(&mesg, &gid))
>  				goto out;
> -			GROUP_AT(rsci.cred.cr_group_info, i) = gid;
> +			kgid = make_kgid(&init_user_ns, gid);
> +			if (!gid_valid(kgid))
> +				goto out;
> +			GROUP_AT(rsci.cred.cr_group_info, i) = kgid;
>  		}
>  
>  		/* mech name */
> diff --git a/net/sunrpc/auth_unix.c b/net/sunrpc/auth_unix.c
> index e50502d..52c5abd 100644
> --- a/net/sunrpc/auth_unix.c
> +++ b/net/sunrpc/auth_unix.c
> @@ -12,6 +12,7 @@
>  #include <linux/module.h>
>  #include <linux/sunrpc/clnt.h>
>  #include <linux/sunrpc/auth.h>
> +#include <linux/user_namespace.h>
>  
>  #define NFS_NGROUPS	16
>  
> @@ -78,8 +79,11 @@ unx_create_cred(struct rpc_auth *auth, struct auth_cred *acred, int flags)
>  		groups = NFS_NGROUPS;
>  
>  	cred->uc_gid = acred->gid;
> -	for (i = 0; i < groups; i++)
> -		cred->uc_gids[i] = GROUP_AT(acred->group_info, i);
> +	for (i = 0; i < groups; i++) {
> +		gid_t gid;
> +		gid = from_kgid(&init_user_ns, GROUP_AT(acred->group_info, i));
> +		cred->uc_gids[i] = gid;
> +	}
>  	if (i < NFS_NGROUPS)
>  		cred->uc_gids[i] = NOGROUP;
>  
> @@ -126,9 +130,12 @@ unx_match(struct auth_cred *acred, struct rpc_cred *rcred, int flags)
>  		groups = acred->group_info->ngroups;
>  	if (groups > NFS_NGROUPS)
>  		groups = NFS_NGROUPS;
> -	for (i = 0; i < groups ; i++)
> -		if (cred->uc_gids[i] != GROUP_AT(acred->group_info, i))
> +	for (i = 0; i < groups ; i++) {
> +		gid_t gid;
> +		gid = from_kgid(&init_user_ns, GROUP_AT(acred->group_info, i));
> +		if (cred->uc_gids[i] != gid)
>  			return 0;
> +	}
>  	if (groups < NFS_NGROUPS &&
>  	    cred->uc_gids[groups] != NOGROUP)
>  		return 0;
> diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
> index 521d8f7..71ec853 100644
> --- a/net/sunrpc/svcauth_unix.c
> +++ b/net/sunrpc/svcauth_unix.c
> @@ -14,6 +14,7 @@
>  #include <net/sock.h>
>  #include <net/ipv6.h>
>  #include <linux/kernel.h>
> +#include <linux/user_namespace.h>
>  #define RPCDBG_FACILITY	RPCDBG_AUTH
>  
>  #include <linux/sunrpc/clnt.h>
> @@ -530,11 +531,15 @@ static int unix_gid_parse(struct cache_detail *cd,
>  
>  	for (i = 0 ; i < gids ; i++) {
>  		int gid;
> +		kgid_t kgid;
>  		rv = get_int(&mesg, &gid);
>  		err = -EINVAL;
>  		if (rv)
>  			goto out;
> -		GROUP_AT(ug.gi, i) = gid;
> +		kgid = make_kgid(&init_user_ns, gid);
> +		if (!gid_valid(kgid))
> +			goto out;
> +		GROUP_AT(ug.gi, i) = kgid;
>  	}
>  
>  	ugp = unix_gid_lookup(cd, uid);
> @@ -563,6 +568,7 @@ static int unix_gid_show(struct seq_file *m,
>  			 struct cache_detail *cd,
>  			 struct cache_head *h)
>  {
> +	struct user_namespace *user_ns = current_user_ns();
>  	struct unix_gid *ug;
>  	int i;
>  	int glen;
> @@ -580,7 +586,7 @@ static int unix_gid_show(struct seq_file *m,
>  
>  	seq_printf(m, "%u %d:", ug->uid, glen);
>  	for (i = 0; i < glen; i++)
> -		seq_printf(m, " %d", GROUP_AT(ug->gi, i));
> +		seq_printf(m, " %d", from_kgid_munged(user_ns, GROUP_AT(ug->gi, i)));
>  	seq_printf(m, "\n");
>  	return 0;
>  }
> @@ -831,8 +837,12 @@ svcauth_unix_accept(struct svc_rqst *rqstp, __be32 *authp)
>  	cred->cr_group_info = groups_alloc(slen);
>  	if (cred->cr_group_info == NULL)
>  		return SVC_CLOSE;
> -	for (i = 0; i < slen; i++)
> -		GROUP_AT(cred->cr_group_info, i) = svc_getnl(argv);
> +	for (i = 0; i < slen; i++) {
> +		kgid_t kgid = make_kgid(&init_user_ns, svc_getnl(argv));
> +		if (!gid_valid(kgid))
> +			goto badcred;
> +		GROUP_AT(cred->cr_group_info, i) = kgid;
> +	}
>  	if (svc_getu32(argv) != htonl(RPC_AUTH_NULL) || svc_getu32(argv) != 0) {
>  		*authp = rpc_autherr_badverf;
>  		return SVC_DENIED;
> diff --git a/security/keys/permission.c b/security/keys/permission.c
> index e146cbd..5442900 100644
> --- a/security/keys/permission.c
> +++ b/security/keys/permission.c
> @@ -53,7 +53,8 @@ int key_task_permission(const key_ref_t key_ref, const struct cred *cred,
>  			goto use_these_perms;
>  		}
>  
> -		ret = groups_search(cred->group_info, key->gid);
> +		ret = groups_search(cred->group_info,
> +				    make_kgid(current_user_ns(), key->gid));
>  		if (ret) {
>  			kperm = key->perm >> 8;
>  			goto use_these_perms;
> -- 
> 1.7.2.5
> 
> _______________________________________________
> Containers mailing list
> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 18/43] userns: Convert group_info values from gid_t to kgid_t.
  2012-04-08  5:15     ` "Eric W. Beiderman
                       ` (2 preceding siblings ...)
  (?)
@ 2012-04-18 18:49     ` Serge E. Hallyn
       [not found]       ` <20120418184936.GC4984-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
  -1 siblings, 1 reply; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 18:49 UTC (permalink / raw)
  To: Eric W. Beiderman
  Cc: linux-kernel, Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

Quoting Eric W. Beiderman (ebiederm@xmission.com):
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> As a first step to converting struct cred to be all kuid_t and kgid_t
> values, convert the group values stored in group_info to always be
> kgid_t values.  Unless user namespaces are used, this change should
> have no effect.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  arch/s390/kernel/compat_linux.c   |   13 ++++++++-
>  fs/nfsd/auth.c                    |    5 ++-
>  fs/proc/array.c                   |    5 +++-
>  include/linux/cred.h              |    9 ++++---
>  kernel/groups.c                   |   48 +++++++++++++++++++-----------------
>  kernel/uid16.c                    |   14 +++++++++-
>  net/ipv4/ping.c                   |   11 ++++++--
>  net/sunrpc/auth_generic.c         |    4 +-
>  net/sunrpc/auth_gss/svcauth_gss.c |    7 ++++-
>  net/sunrpc/auth_unix.c            |   15 ++++++++---
>  net/sunrpc/svcauth_unix.c         |   18 ++++++++++---
>  security/keys/permission.c        |    3 +-
>  12 files changed, 103 insertions(+), 49 deletions(-)
> 
> diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
> index ab64bdb..5baac18 100644
> --- a/arch/s390/kernel/compat_linux.c
> +++ b/arch/s390/kernel/compat_linux.c
> @@ -173,11 +173,14 @@ asmlinkage long sys32_setfsgid16(u16 gid)
>  
>  static int groups16_to_user(u16 __user *grouplist, struct group_info *group_info)
>  {
> +	struct user_namespace *user_ns = current_user_ns();
>  	int i;
>  	u16 group;
> +	kgid_t kgid;
>  
>  	for (i = 0; i < group_info->ngroups; i++) {
> -		group = (u16)GROUP_AT(group_info, i);
> +		kgid = GROUP_AT(group_info, i);
> +		group = (u16)from_kgid_munged(user_ns, kgid);
>  		if (put_user(group, grouplist+i))
>  			return -EFAULT;
>  	}
> @@ -187,13 +190,19 @@ static int groups16_to_user(u16 __user *grouplist, struct group_info *group_info
>  
>  static int groups16_from_user(struct group_info *group_info, u16 __user *grouplist)
>  {
> +	struct user_namespace *user_ns = current_user_ns();
>  	int i;
>  	u16 group;

The declaration

	kgid_t kgid;

is needed here: kgid is assigned in the loop below but is never declared
in this function.
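
For reference, a minimal userspace sketch of the loop with the declaration
in place. Everything with a _model suffix, struct ns_model, and
groups16_from_array() are hypothetical stand-ins for illustration, not the
kernel definitions: the namespace is reduced to a single mapping extent and
the get_user() copy is replaced by a plain array read.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for kgid_t (assumption, not the kernel definition). */
typedef struct { uint32_t val; } kgid_model;

#define INVALID_GID_MODEL ((uint32_t)-1)

/* One mapping extent: ids [first, first + count) map to lower_first onward. */
struct ns_model {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

/* Map a namespace-local gid to a kernel-side gid; unmapped ids become invalid. */
static kgid_model make_kgid_model(const struct ns_model *ns, uint32_t gid)
{
	kgid_model k = { INVALID_GID_MODEL };

	if (gid >= ns->first && gid - ns->first < ns->count)
		k.val = ns->lower_first + (gid - ns->first);
	return k;
}

static int gid_valid_model(kgid_model k)
{
	return k.val != INVALID_GID_MODEL;
}

/* Model of groups16_from_user(): widen each 16-bit gid, map it, validate it. */
static int groups16_from_array(const struct ns_model *ns, kgid_model *out,
			       const uint16_t *grouplist, int ngroups)
{
	int i;
	uint16_t group;
	kgid_model kgid;	/* the declaration missing from the hunk above */

	assert(ns != NULL && out != NULL && grouplist != NULL);

	for (i = 0; i < ngroups; i++) {
		group = grouplist[i];

		kgid = make_kgid_model(ns, (uint32_t)group);
		if (!gid_valid_model(kgid))
			return -EINVAL;

		out[i] = kgid;
	}
	return 0;
}
```

In this model, a gid with no mapping in the caller's namespace makes the
whole call fail with -EINVAL, which matches the behaviour the patch adds.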

>  
>  	for (i = 0; i < group_info->ngroups; i++) {
>  		if (get_user(group, grouplist+i))
>  			return  -EFAULT;
> -		GROUP_AT(group_info, i) = (gid_t)group;
> +
> +		kgid = make_kgid(user_ns, (gid_t)group);
> +		if (!gid_valid(kgid))
> +			return -EINVAL;
> +
> +		GROUP_AT(group_info, i) = kgid;
>  	}
>  
>  	return 0;
> diff --git a/fs/nfsd/auth.c b/fs/nfsd/auth.c
> index 79717a4..204438c 100644
> --- a/fs/nfsd/auth.c
> +++ b/fs/nfsd/auth.c
> @@ -1,6 +1,7 @@
>  /* Copyright (C) 1995, 1996 Olaf Kirch <okir@monad.swb.de> */
>  
>  #include <linux/sched.h>
> +#include <linux/user_namespace.h>
>  #include "nfsd.h"
>  #include "auth.h"
>  
> @@ -56,8 +57,8 @@ int nfsd_setuser(struct svc_rqst *rqstp, struct svc_export *exp)
>  			goto oom;
>  
>  		for (i = 0; i < rqgi->ngroups; i++) {
> -			if (!GROUP_AT(rqgi, i))
> -				GROUP_AT(gi, i) = exp->ex_anon_gid;
> +			if (gid_eq(GLOBAL_ROOT_GID, GROUP_AT(rqgi, i)))
> +				GROUP_AT(gi, i) = make_kgid(&init_user_ns, exp->ex_anon_gid);
>  			else
>  				GROUP_AT(gi, i) = GROUP_AT(rqgi, i);
>  		}
> diff --git a/fs/proc/array.c b/fs/proc/array.c
> index f9bd395..36a0a91 100644
> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -81,6 +81,7 @@
>  #include <linux/pid_namespace.h>
>  #include <linux/ptrace.h>
>  #include <linux/tracehook.h>
> +#include <linux/user_namespace.h>
>  
>  #include <asm/pgtable.h>
>  #include <asm/processor.h>
> @@ -161,6 +162,7 @@ static inline const char *get_task_state(struct task_struct *tsk)
>  static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
>  				struct pid *pid, struct task_struct *p)
>  {
> +	struct user_namespace *user_ns = current_user_ns();
>  	struct group_info *group_info;
>  	int g;
>  	struct fdtable *fdt = NULL;
> @@ -205,7 +207,8 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
>  	task_unlock(p);
>  
>  	for (g = 0; g < min(group_info->ngroups, NGROUPS_SMALL); g++)
> -		seq_printf(m, "%d ", GROUP_AT(group_info, g));
> +		seq_printf(m, "%d ",
> +			   from_kgid_munged(user_ns, GROUP_AT(group_info, g)));
>  	put_cred(cred);
>  
>  	seq_putc(m, '\n');
> diff --git a/include/linux/cred.h b/include/linux/cred.h
> index 2c60ec8..0ab3cda 100644
> --- a/include/linux/cred.h
> +++ b/include/linux/cred.h
> @@ -17,6 +17,7 @@
>  #include <linux/key.h>
>  #include <linux/selinux.h>
>  #include <linux/atomic.h>
> +#include <linux/uidgid.h>
>  
>  struct user_struct;
>  struct cred;
> @@ -26,14 +27,14 @@ struct inode;
>   * COW Supplementary groups list
>   */
>  #define NGROUPS_SMALL		32
> -#define NGROUPS_PER_BLOCK	((unsigned int)(PAGE_SIZE / sizeof(gid_t)))
> +#define NGROUPS_PER_BLOCK	((unsigned int)(PAGE_SIZE / sizeof(kgid_t)))
>  
>  struct group_info {
>  	atomic_t	usage;
>  	int		ngroups;
>  	int		nblocks;
> -	gid_t		small_block[NGROUPS_SMALL];
> -	gid_t		*blocks[0];
> +	kgid_t		small_block[NGROUPS_SMALL];
> +	kgid_t		*blocks[0];
>  };
>  
>  /**
> @@ -66,7 +67,7 @@ extern struct group_info init_groups;
>  extern void groups_free(struct group_info *);
>  extern int set_current_groups(struct group_info *);
>  extern int set_groups(struct cred *, struct group_info *);
> -extern int groups_search(const struct group_info *, gid_t);
> +extern int groups_search(const struct group_info *, kgid_t);
>  
>  /* access the groups "array" with this macro */
>  #define GROUP_AT(gi, i) \
> diff --git a/kernel/groups.c b/kernel/groups.c
> index 99b53d1..84156f2 100644
> --- a/kernel/groups.c
> +++ b/kernel/groups.c
> @@ -31,7 +31,7 @@ struct group_info *groups_alloc(int gidsetsize)
>  		group_info->blocks[0] = group_info->small_block;
>  	else {
>  		for (i = 0; i < nblocks; i++) {
> -			gid_t *b;
> +			kgid_t *b;
>  			b = (void *)__get_free_page(GFP_USER);
>  			if (!b)
>  				goto out_undo_partial_alloc;
> @@ -66,18 +66,15 @@ EXPORT_SYMBOL(groups_free);
>  static int groups_to_user(gid_t __user *grouplist,
>  			  const struct group_info *group_info)
>  {
> +	struct user_namespace *user_ns = current_user_ns();
>  	int i;
>  	unsigned int count = group_info->ngroups;
>  
> -	for (i = 0; i < group_info->nblocks; i++) {
> -		unsigned int cp_count = min(NGROUPS_PER_BLOCK, count);
> -		unsigned int len = cp_count * sizeof(*grouplist);
> -
> -		if (copy_to_user(grouplist, group_info->blocks[i], len))
> +	for (i = 0; i < count; i++) {
> +		gid_t gid;
> +		gid = from_kgid_munged(user_ns, GROUP_AT(group_info, i));
> +		if (put_user(gid, grouplist+i))
>  			return -EFAULT;
> -
> -		grouplist += NGROUPS_PER_BLOCK;
> -		count -= cp_count;
>  	}
>  	return 0;
>  }
> @@ -86,18 +83,21 @@ static int groups_to_user(gid_t __user *grouplist,
>  static int groups_from_user(struct group_info *group_info,
>      gid_t __user *grouplist)
>  {
> +	struct user_namespace *user_ns = current_user_ns();
>  	int i;
>  	unsigned int count = group_info->ngroups;
>  
> -	for (i = 0; i < group_info->nblocks; i++) {
> -		unsigned int cp_count = min(NGROUPS_PER_BLOCK, count);
> -		unsigned int len = cp_count * sizeof(*grouplist);
> -
> -		if (copy_from_user(group_info->blocks[i], grouplist, len))
> +	for (i = 0; i < count; i++) {
> +		gid_t gid;
> +		kgid_t kgid;
> +		if (get_user(gid, grouplist+i))
>  			return -EFAULT;
>  
> -		grouplist += NGROUPS_PER_BLOCK;
> -		count -= cp_count;
> +		kgid = make_kgid(user_ns, gid);
> +		if (!gid_valid(kgid))
> +			return -EINVAL;
> +
> +		GROUP_AT(group_info, i) = kgid;
>  	}
>  	return 0;
>  }
> @@ -117,9 +117,9 @@ static void groups_sort(struct group_info *group_info)
>  		for (base = 0; base < max; base++) {
>  			int left = base;
>  			int right = left + stride;
> -			gid_t tmp = GROUP_AT(group_info, right);
> +			kgid_t tmp = GROUP_AT(group_info, right);
>  
> -			while (left >= 0 && GROUP_AT(group_info, left) > tmp) {
> +			while (left >= 0 && gid_gt(GROUP_AT(group_info, left), tmp)) {
>  				GROUP_AT(group_info, right) =
>  				    GROUP_AT(group_info, left);
>  				right = left;
> @@ -132,7 +132,7 @@ static void groups_sort(struct group_info *group_info)
>  }
>  
>  /* a simple bsearch */
> -int groups_search(const struct group_info *group_info, gid_t grp)
> +int groups_search(const struct group_info *group_info, kgid_t grp)
>  {
>  	unsigned int left, right;
>  
> @@ -143,9 +143,9 @@ int groups_search(const struct group_info *group_info, gid_t grp)
>  	right = group_info->ngroups;
>  	while (left < right) {
>  		unsigned int mid = (left+right)/2;
> -		if (grp > GROUP_AT(group_info, mid))
> +		if (gid_gt(grp, GROUP_AT(group_info, mid)))
>  			left = mid + 1;
> -		else if (grp < GROUP_AT(group_info, mid))
> +		else if (gid_lt(grp, GROUP_AT(group_info, mid)))
>  			right = mid;
>  		else
>  			return 1;
> @@ -262,7 +262,8 @@ int in_group_p(gid_t grp)
>  	int retval = 1;
>  
>  	if (grp != cred->fsgid)
> -		retval = groups_search(cred->group_info, grp);
> +		retval = groups_search(cred->group_info,
> +				       make_kgid(cred->user_ns, grp));
>  	return retval;
>  }
>  
> @@ -274,7 +275,8 @@ int in_egroup_p(gid_t grp)
>  	int retval = 1;
>  
>  	if (grp != cred->egid)
> -		retval = groups_search(cred->group_info, grp);
> +		retval = groups_search(cred->group_info,
> +				       make_kgid(cred->user_ns, grp));
>  	return retval;
>  }
>  
> diff --git a/kernel/uid16.c b/kernel/uid16.c
> index 51c6e89..e530bc3 100644
> --- a/kernel/uid16.c
> +++ b/kernel/uid16.c
> @@ -134,11 +134,14 @@ SYSCALL_DEFINE1(setfsgid16, old_gid_t, gid)
>  static int groups16_to_user(old_gid_t __user *grouplist,
>      struct group_info *group_info)
>  {
> +	struct user_namespace *user_ns = current_user_ns();
>  	int i;
>  	old_gid_t group;
> +	kgid_t kgid;
>  
>  	for (i = 0; i < group_info->ngroups; i++) {
> -		group = high2lowgid(GROUP_AT(group_info, i));
> +		kgid = GROUP_AT(group_info, i);
> +		group = high2lowgid(from_kgid_munged(user_ns, kgid));
>  		if (put_user(group, grouplist+i))
>  			return -EFAULT;
>  	}
> @@ -149,13 +152,20 @@ static int groups16_to_user(old_gid_t __user *grouplist,
>  static int groups16_from_user(struct group_info *group_info,
>      old_gid_t __user *grouplist)
>  {
> +	struct user_namespace *user_ns = current_user_ns();
>  	int i;
>  	old_gid_t group;
> +	kgid_t kgid;
>  
>  	for (i = 0; i < group_info->ngroups; i++) {
>  		if (get_user(group, grouplist+i))
>  			return  -EFAULT;
> -		GROUP_AT(group_info, i) = low2highgid(group);
> +
> +		kgid = make_kgid(user_ns, low2highgid(group));
> +		if (!gid_valid(kgid))
> +			return -EINVAL;
> +
> +		GROUP_AT(group_info, i) = kgid;
>  	}
>  
>  	return 0;
> diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
> index 50009c7..9d3044f 100644
> --- a/net/ipv4/ping.c
> +++ b/net/ipv4/ping.c
> @@ -205,17 +205,22 @@ static int ping_init_sock(struct sock *sk)
>  	gid_t range[2];
>  	struct group_info *group_info = get_current_groups();
>  	int i, j, count = group_info->ngroups;
> +	kgid_t low, high;
>  
>  	inet_get_ping_group_range_net(net, range, range+1);
> +	low = make_kgid(&init_user_ns, range[0]);
> +	high = make_kgid(&init_user_ns, range[1]);
> +	if (!gid_valid(low) || !gid_valid(high) || gid_lt(high, low))
> +		return -EACCES;
> +
>  	if (range[0] <= group && group <= range[1])
>  		return 0;
>  
>  	for (i = 0; i < group_info->nblocks; i++) {
>  		int cp_count = min_t(int, NGROUPS_PER_BLOCK, count);
> -
>  		for (j = 0; j < cp_count; j++) {
> -			group = group_info->blocks[i][j];
> -			if (range[0] <= group && group <= range[1])
> +			kgid_t gid = group_info->blocks[i][j];
> +			if (gid_lte(low, gid) && gid_lte(gid, high))
>  				return 0;
>  		}
>  
> diff --git a/net/sunrpc/auth_generic.c b/net/sunrpc/auth_generic.c
> index 75762f3..6ed6f20 100644
> --- a/net/sunrpc/auth_generic.c
> +++ b/net/sunrpc/auth_generic.c
> @@ -160,8 +160,8 @@ generic_match(struct auth_cred *acred, struct rpc_cred *cred, int flags)
>  	if (gcred->acred.group_info->ngroups != acred->group_info->ngroups)
>  		goto out_nomatch;
>  	for (i = 0; i < gcred->acred.group_info->ngroups; i++) {
> -		if (GROUP_AT(gcred->acred.group_info, i) !=
> -				GROUP_AT(acred->group_info, i))
> +		if (!gid_eq(GROUP_AT(gcred->acred.group_info, i),
> +				GROUP_AT(acred->group_info, i)))
>  			goto out_nomatch;
>  	}
>  out_match:
> diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c
> index 1600cfb..28b62db 100644
> --- a/net/sunrpc/auth_gss/svcauth_gss.c
> +++ b/net/sunrpc/auth_gss/svcauth_gss.c
> @@ -41,6 +41,7 @@
>  #include <linux/types.h>
>  #include <linux/module.h>
>  #include <linux/pagemap.h>
> +#include <linux/user_namespace.h>
>  
>  #include <linux/sunrpc/auth_gss.h>
>  #include <linux/sunrpc/gss_err.h>
> @@ -470,9 +471,13 @@ static int rsc_parse(struct cache_detail *cd,
>  		status = -EINVAL;
>  		for (i=0; i<N; i++) {
>  			gid_t gid;
> +			kgid_t kgid;
>  			if (get_int(&mesg, &gid))
>  				goto out;
> -			GROUP_AT(rsci.cred.cr_group_info, i) = gid;
> +			kgid = make_kgid(&init_user_ns, gid);
> +			if (!gid_valid(kgid))
> +				goto out;
> +			GROUP_AT(rsci.cred.cr_group_info, i) = kgid;
>  		}
>  
>  		/* mech name */
> diff --git a/net/sunrpc/auth_unix.c b/net/sunrpc/auth_unix.c
> index e50502d..52c5abd 100644
> --- a/net/sunrpc/auth_unix.c
> +++ b/net/sunrpc/auth_unix.c
> @@ -12,6 +12,7 @@
>  #include <linux/module.h>
>  #include <linux/sunrpc/clnt.h>
>  #include <linux/sunrpc/auth.h>
> +#include <linux/user_namespace.h>
>  
>  #define NFS_NGROUPS	16
>  
> @@ -78,8 +79,11 @@ unx_create_cred(struct rpc_auth *auth, struct auth_cred *acred, int flags)
>  		groups = NFS_NGROUPS;
>  
>  	cred->uc_gid = acred->gid;
> -	for (i = 0; i < groups; i++)
> -		cred->uc_gids[i] = GROUP_AT(acred->group_info, i);
> +	for (i = 0; i < groups; i++) {
> +		gid_t gid;
> +		gid = from_kgid(&init_user_ns, GROUP_AT(acred->group_info, i));
> +		cred->uc_gids[i] = gid;
> +	}
>  	if (i < NFS_NGROUPS)
>  		cred->uc_gids[i] = NOGROUP;
>  
> @@ -126,9 +130,12 @@ unx_match(struct auth_cred *acred, struct rpc_cred *rcred, int flags)
>  		groups = acred->group_info->ngroups;
>  	if (groups > NFS_NGROUPS)
>  		groups = NFS_NGROUPS;
> -	for (i = 0; i < groups ; i++)
> -		if (cred->uc_gids[i] != GROUP_AT(acred->group_info, i))
> +	for (i = 0; i < groups ; i++) {
> +		gid_t gid;
> +		gid = from_kgid(&init_user_ns, GROUP_AT(acred->group_info, i));
> +		if (cred->uc_gids[i] != gid)
>  			return 0;
> +	}
>  	if (groups < NFS_NGROUPS &&
>  	    cred->uc_gids[groups] != NOGROUP)
>  		return 0;
> diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
> index 521d8f7..71ec853 100644
> --- a/net/sunrpc/svcauth_unix.c
> +++ b/net/sunrpc/svcauth_unix.c
> @@ -14,6 +14,7 @@
>  #include <net/sock.h>
>  #include <net/ipv6.h>
>  #include <linux/kernel.h>
> +#include <linux/user_namespace.h>
>  #define RPCDBG_FACILITY	RPCDBG_AUTH
>  
>  #include <linux/sunrpc/clnt.h>
> @@ -530,11 +531,15 @@ static int unix_gid_parse(struct cache_detail *cd,
>  
>  	for (i = 0 ; i < gids ; i++) {
>  		int gid;
> +		kgid_t kgid;
>  		rv = get_int(&mesg, &gid);
>  		err = -EINVAL;
>  		if (rv)
>  			goto out;
> -		GROUP_AT(ug.gi, i) = gid;
> +		kgid = make_kgid(&init_user_ns, gid);
> +		if (!gid_valid(kgid))
> +			goto out;
> +		GROUP_AT(ug.gi, i) = kgid;
>  	}
>  
>  	ugp = unix_gid_lookup(cd, uid);
> @@ -563,6 +568,7 @@ static int unix_gid_show(struct seq_file *m,
>  			 struct cache_detail *cd,
>  			 struct cache_head *h)
>  {
> +	struct user_namespace *user_ns = current_user_ns();
>  	struct unix_gid *ug;
>  	int i;
>  	int glen;
> @@ -580,7 +586,7 @@ static int unix_gid_show(struct seq_file *m,
>  
>  	seq_printf(m, "%u %d:", ug->uid, glen);
>  	for (i = 0; i < glen; i++)
> -		seq_printf(m, " %d", GROUP_AT(ug->gi, i));
> +		seq_printf(m, " %d", from_kgid_munged(user_ns, GROUP_AT(ug->gi, i)));
>  	seq_printf(m, "\n");
>  	return 0;
>  }
> @@ -831,8 +837,12 @@ svcauth_unix_accept(struct svc_rqst *rqstp, __be32 *authp)
>  	cred->cr_group_info = groups_alloc(slen);
>  	if (cred->cr_group_info == NULL)
>  		return SVC_CLOSE;
> -	for (i = 0; i < slen; i++)
> -		GROUP_AT(cred->cr_group_info, i) = svc_getnl(argv);
> +	for (i = 0; i < slen; i++) {
> +		kgid_t kgid = make_kgid(&init_user_ns, svc_getnl(argv));
> +		if (!gid_valid(kgid))
> +			goto badcred;
> +		GROUP_AT(cred->cr_group_info, i) = kgid;
> +	}
>  	if (svc_getu32(argv) != htonl(RPC_AUTH_NULL) || svc_getu32(argv) != 0) {
>  		*authp = rpc_autherr_badverf;
>  		return SVC_DENIED;
> diff --git a/security/keys/permission.c b/security/keys/permission.c
> index e146cbd..5442900 100644
> --- a/security/keys/permission.c
> +++ b/security/keys/permission.c
> @@ -53,7 +53,8 @@ int key_task_permission(const key_ref_t key_ref, const struct cred *cred,
>  			goto use_these_perms;
>  		}
>  
> -		ret = groups_search(cred->group_info, key->gid);
> +		ret = groups_search(cred->group_info,
> +				    make_kgid(current_user_ns(), key->gid));
>  		if (ret) {
>  			kperm = key->perm >> 8;
>  			goto use_these_perms;
> -- 
> 1.7.2.5
> 
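
The groups_sort() and groups_search() hunks in the patch quoted above
replace raw < and > with gid_gt() and gid_lt(), since a kgid_t need not be
a plain integer. A minimal userspace sketch of that idea follows. The
kgid_model type and the *_model functions are assumptions made for
illustration, and the sort is a plain insertion sort rather than the
stride-based sort used in kernel/groups.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Wrapping the id in a struct makes direct '<' or '>' a compile error. */
typedef struct { uint32_t val; } kgid_model;

static int gid_gt_model(kgid_model l, kgid_model r) { return l.val > r.val; }
static int gid_lt_model(kgid_model l, kgid_model r) { return l.val < r.val; }

/* Insertion sort in ascending order, comparing only through the helper. */
static void groups_sort_model(kgid_model *g, size_t n)
{
	size_t i;

	assert(g != NULL || n == 0);

	for (i = 1; i < n; i++) {
		kgid_model tmp = g[i];
		size_t j = i;

		while (j > 0 && gid_gt_model(g[j - 1], tmp)) {
			g[j] = g[j - 1];
			j--;
		}
		g[j] = tmp;
	}
}

/* Binary search over a sorted list; returns 1 if grp is present, else 0. */
static int groups_search_model(const kgid_model *g, size_t n, kgid_model grp)
{
	size_t left = 0, right = n;

	while (left < right) {
		size_t mid = left + (right - left) / 2;

		if (gid_gt_model(grp, g[mid]))
			left = mid + 1;
		else if (gid_lt_model(grp, g[mid]))
			right = mid;
		else
			return 1;
	}
	return 0;
}
```

Because the search compares kernel-side values, a caller holding a
namespace-local gid has to convert it first, as in_group_p() does with
make_kgid() in the patch.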

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 19/43] userns: Store uid and gid values in struct cred with kuid_t and kgid_t types
       [not found]     ` <1333862139-31737-19-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-04-18 18:49       ` Serge E. Hallyn
  0 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 18:49 UTC (permalink / raw)
  To: Eric W. Beiderman
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

Quoting Eric W. Beiderman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> 
> cred.h and a few trivial users of struct cred are changed.  The rest of the users
> of struct cred are left for other patches as there are too many changes to make
> in one go and leave the change reviewable.  If both the user namespace and
> CONFIG_UIDGID_STRICT_TYPE_CHECKS are disabled, the code will continue to
> compile and behave correctly.
> 
> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
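
The commit message above relies on kuid_t and kgid_t being distinct types
when strict type checking is on, which is why the commit_creds() hunks
switch from != to uid_eq() and gid_eq(). A minimal userspace sketch of
that pattern follows. The *_model types, struct cred_model, and
ids_changed_model() are hypothetical names for illustration, not kernel
definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two distinct wrapper types: mixing a uid with a gid no longer compiles. */
typedef struct { uint32_t val; } kuid_model;
typedef struct { uint32_t val; } kgid_model;

static int uid_eq_model(kuid_model a, kuid_model b) { return a.val == b.val; }
static int gid_eq_model(kgid_model a, kgid_model b) { return a.val == b.val; }

/* Only the fields used by the comparison below (a reduced struct cred). */
struct cred_model {
	kuid_model euid;
	kuid_model fsuid;
	kgid_model egid;
	kgid_model fsgid;
};

/*
 * Same shape as the dumpability test in commit_creds(): returns 1 if any
 * effective or filesystem id differs between the two credentials, else 0.
 */
static int ids_changed_model(const struct cred_model *old_cred,
			     const struct cred_model *new_cred)
{
	assert(old_cred != NULL && new_cred != NULL);

	return !uid_eq_model(old_cred->euid, new_cred->euid) ||
	       !gid_eq_model(old_cred->egid, new_cred->egid) ||
	       !uid_eq_model(old_cred->fsuid, new_cred->fsuid) ||
	       !gid_eq_model(old_cred->fsgid, new_cred->fsgid);
}
```

With the wrapper structs, writing old_cred->euid != new_cred->euid is
rejected by the compiler, so any comparison that was not converted shows
up at build time.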

> ---
>  arch/x86/mm/fault.c            |    2 +-
>  fs/ioprio.c                    |    8 ++------
>  include/linux/cred.h           |   16 ++++++++--------
>  include/linux/user_namespace.h |    8 ++++----
>  kernel/cred.c                  |   36 ++++++++++++++++++++++--------------
>  kernel/signal.c                |   14 ++++++++------
>  kernel/sys.c                   |   26 +++++++++-----------------
>  kernel/user_namespace.c        |    4 ++--
>  mm/oom_kill.c                  |    4 ++--
>  security/commoncap.c           |    3 +--
>  10 files changed, 59 insertions(+), 62 deletions(-)
> 
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 3ecfd1a..76dcd9d 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -582,7 +582,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code,
>  		pte_t *pte = lookup_address(address, &level);
>  
>  		if (pte && pte_present(*pte) && !pte_exec(*pte))
> -			printk(nx_warning, current_uid());
> +			printk(nx_warning, from_kuid(&init_user_ns, current_uid()));
>  	}
>  
>  	printk(KERN_ALERT "BUG: unable to handle kernel ");
> diff --git a/fs/ioprio.c b/fs/ioprio.c
> index 8e35e96..2072e41 100644
> --- a/fs/ioprio.c
> +++ b/fs/ioprio.c
> @@ -123,9 +123,7 @@ SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
>  				break;
>  
>  			do_each_thread(g, p) {
> -				const struct cred *tcred = __task_cred(p);
> -				kuid_t tcred_uid = make_kuid(tcred->user_ns, tcred->uid);
> -				if (!uid_eq(tcred_uid, uid))
> +				if (!uid_eq(task_uid(p), uid))
>  					continue;
>  				ret = set_task_ioprio(p, ioprio);
>  				if (ret)
> @@ -220,9 +218,7 @@ SYSCALL_DEFINE2(ioprio_get, int, which, int, who)
>  				break;
>  
>  			do_each_thread(g, p) {
> -				const struct cred *tcred = __task_cred(p);
> -				kuid_t tcred_uid = make_kuid(tcred->user_ns, tcred->uid);
> -				if (!uid_eq(tcred_uid, user->uid))
> +				if (!uid_eq(task_uid(p), user->uid))
>  					continue;
>  				tmpio = get_task_ioprio(p);
>  				if (tmpio < 0)
> diff --git a/include/linux/cred.h b/include/linux/cred.h
> index 0ab3cda..fac0579 100644
> --- a/include/linux/cred.h
> +++ b/include/linux/cred.h
> @@ -123,14 +123,14 @@ struct cred {
>  #define CRED_MAGIC	0x43736564
>  #define CRED_MAGIC_DEAD	0x44656144
>  #endif
> -	uid_t		uid;		/* real UID of the task */
> -	gid_t		gid;		/* real GID of the task */
> -	uid_t		suid;		/* saved UID of the task */
> -	gid_t		sgid;		/* saved GID of the task */
> -	uid_t		euid;		/* effective UID of the task */
> -	gid_t		egid;		/* effective GID of the task */
> -	uid_t		fsuid;		/* UID for VFS ops */
> -	gid_t		fsgid;		/* GID for VFS ops */
> +	kuid_t		uid;		/* real UID of the task */
> +	kgid_t		gid;		/* real GID of the task */
> +	kuid_t		suid;		/* saved UID of the task */
> +	kgid_t		sgid;		/* saved GID of the task */
> +	kuid_t		euid;		/* effective UID of the task */
> +	kgid_t		egid;		/* effective GID of the task */
> +	kuid_t		fsuid;		/* UID for VFS ops */
> +	kgid_t		fsgid;		/* GID for VFS ops */
>  	unsigned	securebits;	/* SUID-less security management */
>  	kernel_cap_t	cap_inheritable; /* caps our children can inherit */
>  	kernel_cap_t	cap_permitted;	/* caps we're permitted */
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index 4c9846d..a2c6145 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -70,15 +70,15 @@ static inline void put_user_ns(struct user_namespace *ns)
>  #endif
>  
>  static inline uid_t user_ns_map_uid(struct user_namespace *to,
> -	const struct cred *cred, uid_t uid)
> +	const struct cred *cred, kuid_t uid)
>  {
> -	return from_kuid_munged(to, make_kuid(cred->user_ns, uid));
> +	return from_kuid_munged(to, uid);
>  }
>  
>  static inline gid_t user_ns_map_gid(struct user_namespace *to,
> -	const struct cred *cred, gid_t gid)
> +	const struct cred *cred, kgid_t gid)
>  {
> -	return from_kgid_munged(to, make_kgid(cred->user_ns, gid));
> +	return from_kgid_munged(to, gid);
>  }
>  
>  #endif /* _LINUX_USER_H */
> diff --git a/kernel/cred.c b/kernel/cred.c
> index 7a0d806..eddc5e2 100644
> --- a/kernel/cred.c
> +++ b/kernel/cred.c
> @@ -49,6 +49,14 @@ struct cred init_cred = {
>  	.subscribers		= ATOMIC_INIT(2),
>  	.magic			= CRED_MAGIC,
>  #endif
> +	.uid			= GLOBAL_ROOT_UID,
> +	.gid			= GLOBAL_ROOT_GID,
> +	.suid			= GLOBAL_ROOT_UID,
> +	.sgid			= GLOBAL_ROOT_GID,
> +	.euid			= GLOBAL_ROOT_UID,
> +	.egid			= GLOBAL_ROOT_GID,
> +	.fsuid			= GLOBAL_ROOT_UID,
> +	.fsgid			= GLOBAL_ROOT_GID,
>  	.securebits		= SECUREBITS_DEFAULT,
>  	.cap_inheritable	= CAP_EMPTY_SET,
>  	.cap_permitted		= CAP_FULL_SET,
> @@ -488,10 +496,10 @@ int commit_creds(struct cred *new)
>  	get_cred(new); /* we will require a ref for the subj creds too */
>  
>  	/* dumpability changes */
> -	if (old->euid != new->euid ||
> -	    old->egid != new->egid ||
> -	    old->fsuid != new->fsuid ||
> -	    old->fsgid != new->fsgid ||
> +	if (!uid_eq(old->euid, new->euid) ||
> +	    !gid_eq(old->egid, new->egid) ||
> +	    !uid_eq(old->fsuid, new->fsuid) ||
> +	    !gid_eq(old->fsgid, new->fsgid) ||
>  	    !cap_issubset(new->cap_permitted, old->cap_permitted)) {
>  		if (task->mm)
>  			set_dumpable(task->mm, suid_dumpable);
> @@ -500,9 +508,9 @@ int commit_creds(struct cred *new)
>  	}
>  
>  	/* alter the thread keyring */
> -	if (new->fsuid != old->fsuid)
> +	if (!uid_eq(new->fsuid, old->fsuid))
>  		key_fsuid_changed(task);
> -	if (new->fsgid != old->fsgid)
> +	if (!gid_eq(new->fsgid, old->fsgid))
>  		key_fsgid_changed(task);
>  
>  	/* do it
> @@ -519,16 +527,16 @@ int commit_creds(struct cred *new)
>  	alter_cred_subscribers(old, -2);
>  
>  	/* send notifications */
> -	if (new->uid   != old->uid  ||
> -	    new->euid  != old->euid ||
> -	    new->suid  != old->suid ||
> -	    new->fsuid != old->fsuid)
> +	if (!uid_eq(new->uid,   old->uid)  ||
> +	    !uid_eq(new->euid,  old->euid) ||
> +	    !uid_eq(new->suid,  old->suid) ||
> +	    !uid_eq(new->fsuid, old->fsuid))
>  		proc_id_connector(task, PROC_EVENT_UID);
>  
> -	if (new->gid   != old->gid  ||
> -	    new->egid  != old->egid ||
> -	    new->sgid  != old->sgid ||
> -	    new->fsgid != old->fsgid)
> +	if (!gid_eq(new->gid,   old->gid)  ||
> +	    !gid_eq(new->egid,  old->egid) ||
> +	    !gid_eq(new->sgid,  old->sgid) ||
> +	    !gid_eq(new->fsgid, old->fsgid))
>  		proc_id_connector(task, PROC_EVENT_GID);
>  
>  	/* release the old obj and subj refs both */
> diff --git a/kernel/signal.c b/kernel/signal.c
> index e2c5d84..2734dc9 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1038,8 +1038,10 @@ static inline void userns_fixup_signal_uid(struct siginfo *info, struct task_str
>  	if (SI_FROMKERNEL(info))
>  		return;
>  
> -	info->si_uid = user_ns_map_uid(task_cred_xxx(t, user_ns),
> -					current_cred(), info->si_uid);
> +	rcu_read_lock();
> +	info->si_uid = from_kuid_munged(task_cred_xxx(t, user_ns),
> +					make_kuid(current_user_ns(), info->si_uid));
> +	rcu_read_unlock();
>  }
>  #else
>  static inline void userns_fixup_signal_uid(struct siginfo *info, struct task_struct *t)
> @@ -1106,7 +1108,7 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
>  			q->info.si_code = SI_USER;
>  			q->info.si_pid = task_tgid_nr_ns(current,
>  							task_active_pid_ns(t));
> -			q->info.si_uid = current_uid();
> +			q->info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
>  			break;
>  		case (unsigned long) SEND_SIG_PRIV:
>  			q->info.si_signo = sig;
> @@ -1973,7 +1975,7 @@ static void ptrace_do_notify(int signr, int exit_code, int why)
>  	info.si_signo = signr;
>  	info.si_code = exit_code;
>  	info.si_pid = task_pid_vnr(current);
> -	info.si_uid = current_uid();
> +	info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
>  
>  	/* Let the debugger run.  */
>  	ptrace_stop(exit_code, why, 1, &info);
> @@ -2828,7 +2830,7 @@ SYSCALL_DEFINE2(kill, pid_t, pid, int, sig)
>  	info.si_errno = 0;
>  	info.si_code = SI_USER;
>  	info.si_pid = task_tgid_vnr(current);
> -	info.si_uid = current_uid();
> +	info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
>  
>  	return kill_something_info(sig, &info, pid);
>  }
> @@ -2871,7 +2873,7 @@ static int do_tkill(pid_t tgid, pid_t pid, int sig)
>  	info.si_errno = 0;
>  	info.si_code = SI_TKILL;
>  	info.si_pid = task_tgid_vnr(current);
> -	info.si_uid = current_uid();
> +	info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
>  
>  	return do_send_specific(tgid, pid, sig, &info);
>  }
> diff --git a/kernel/sys.c b/kernel/sys.c
> index f0c43b4..3996281 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -175,7 +175,6 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
>  	const struct cred *cred = current_cred();
>  	int error = -EINVAL;
>  	struct pid *pgrp;
> -	kuid_t cred_uid;
>  	kuid_t uid;
>  
>  	if (which > PRIO_USER || which < PRIO_PROCESS)
> @@ -209,22 +208,19 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
>  			} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
>  			break;
>  		case PRIO_USER:
> -			cred_uid = make_kuid(cred->user_ns, cred->uid);
>  			uid = make_kuid(cred->user_ns, who);
>  			user = cred->user;
>  			if (!who)
> -				uid = cred_uid;
> -			else if (!uid_eq(uid, cred_uid) &&
> +				uid = cred->uid;
> +			else if (!uid_eq(uid, cred->uid) &&
>  				 !(user = find_user(uid)))
>  				goto out_unlock;	/* No processes for this user */
>  
>  			do_each_thread(g, p) {
> -				const struct cred *tcred = __task_cred(p);
> -				kuid_t tcred_uid = make_kuid(tcred->user_ns, tcred->uid);
> -				if (uid_eq(tcred_uid, uid))
> +				if (uid_eq(task_uid(p), uid))
>  					error = set_one_prio(p, niceval, error);
>  			} while_each_thread(g, p);
> -			if (!uid_eq(uid, cred_uid))
> +			if (!uid_eq(uid, cred->uid))
>  				free_uid(user);		/* For find_user() */
>  			break;
>  	}
> @@ -248,7 +244,6 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
>  	const struct cred *cred = current_cred();
>  	long niceval, retval = -ESRCH;
>  	struct pid *pgrp;
> -	kuid_t cred_uid;
>  	kuid_t uid;
>  
>  	if (which > PRIO_USER || which < PRIO_PROCESS)
> @@ -280,25 +275,22 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
>  			} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
>  			break;
>  		case PRIO_USER:
> -			cred_uid = make_kuid(cred->user_ns, cred->uid);
>  			uid = make_kuid(cred->user_ns, who);
>  			user = cred->user;
>  			if (!who)
> -				uid = cred_uid;
> -			else if (!uid_eq(uid, cred_uid) &&
> +				uid = cred->uid;
> +			else if (!uid_eq(uid, cred->uid) &&
>  				 !(user = find_user(uid)))
>  				goto out_unlock;	/* No processes for this user */
>  
>  			do_each_thread(g, p) {
> -				const struct cred *tcred = __task_cred(p);
> -				kuid_t tcred_uid = make_kuid(tcred->user_ns, tcred->uid);
> -				if (uid_eq(tcred_uid, uid)) {
> +				if (uid_eq(task_uid(p), uid)) {
>  					niceval = 20 - task_nice(p);
>  					if (niceval > retval)
>  						retval = niceval;
>  				}
>  			} while_each_thread(g, p);
> -			if (!uid_eq(uid, cred_uid))
> +			if (!uid_eq(uid, cred->uid))
>  				free_uid(user);		/* for find_user() */
>  			break;
>  	}
> @@ -641,7 +633,7 @@ static int set_user(struct cred *new)
>  {
>  	struct user_struct *new_user;
>  
> -	new_user = alloc_uid(make_kuid(new->user_ns, new->uid));
> +	new_user = alloc_uid(new->uid);
>  	if (!new_user)
>  		return -EAGAIN;
>  
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 9991bac..0683dbf 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -36,8 +36,8 @@ static bool new_idmap_permitted(struct user_namespace *ns, int cap_setid,
>  int create_user_ns(struct cred *new)
>  {
>  	struct user_namespace *ns, *parent_ns = new->user_ns;
> -	kuid_t owner = make_kuid(new->user_ns, new->euid);
> -	kgid_t group = make_kgid(new->user_ns, new->egid);
> +	kuid_t owner = new->euid;
> +	kgid_t group = new->egid;
>  
>  	/* The creator needs a mapping in the parent user namespace
>  	 * or else we won't be able to reasonably tell userspace who
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 46bf2ed5..9f09a1f 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -410,8 +410,8 @@ static void dump_tasks(const struct mem_cgroup *memcg, const nodemask_t *nodemas
>  		}
>  
>  		pr_info("[%5d] %5d %5d %8lu %8lu %3u     %3d         %5d %s\n",
> -			task->pid, task_uid(task), task->tgid,
> -			task->mm->total_vm, get_mm_rss(task->mm),
> +			task->pid, from_kuid(&init_user_ns, task_uid(task)),
> +			task->tgid, task->mm->total_vm, get_mm_rss(task->mm),
>  			task_cpu(task), task->signal->oom_adj,
>  			task->signal->oom_score_adj, task->comm);
>  		task_unlock(task);
> diff --git a/security/commoncap.c b/security/commoncap.c
> index f2399d8..dbd465a 100644
> --- a/security/commoncap.c
> +++ b/security/commoncap.c
> @@ -77,8 +77,7 @@ int cap_capable(const struct cred *cred, struct user_namespace *targ_ns,
>  {
>  	for (;;) {
>  		/* The owner of the user namespace has all caps. */
> -		if (targ_ns != &init_user_ns && uid_eq(targ_ns->owner,
> -						       make_kuid(cred->user_ns, cred->euid)))
> +		if (targ_ns != &init_user_ns && uid_eq(targ_ns->owner, cred->euid))
>  			return 0;
>  
>  		/* Do we have the necessary capabilities? */
> -- 
> 1.7.2.5
> 
> _______________________________________________
> Containers mailing list
> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 19/43] userns: Store uid and gid values in struct cred with kuid_t and kgid_t types
  2012-04-08  5:15     ` "Eric W. Beiderman
  (?)
  (?)
@ 2012-04-18 18:49     ` Serge E. Hallyn
  -1 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 18:49 UTC (permalink / raw)
  To: Eric W. Beiderman
  Cc: linux-kernel, Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

Quoting Eric W. Beiderman (ebiederm@xmission.com):
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> cred.h and a few trivial users of struct cred are changed.  The rest of the users
> of struct cred are left for other patches, as there are too many changes to make
> in one go and still leave the change reviewable.  If the user namespace and
> CONFIG_UIDGID_STRICT_TYPE_CHECKS are both disabled, the code will continue to
> compile and behave correctly.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
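
For readers following the conversion, here is a minimal userspace sketch of why
the comparisons in commit_creds() change from != to uid_eq().  It assumes the
strict-type-check build, where a kuid is a one-member struct.  The names
kuid_model_t, kuid_model(), uid_eq_model(), struct cred_model and
euid_changed() are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdbool.h>

/* Userspace model only: a kernel-global uid wrapped in a struct so that
 * it cannot be mixed up with a plain integer uid by accident. */
typedef struct {
	unsigned int val;
} kuid_model_t;

/* Hypothetical constructor for the model type. */
static inline kuid_model_t kuid_model(unsigned int val)
{
	kuid_model_t kuid;

	kuid.val = val;
	return kuid;
}

/* Equality has to go through a helper: C has no == or != for structs. */
static inline bool uid_eq_model(kuid_model_t left, kuid_model_t right)
{
	return left.val == right.val;
}

/* Cut-down stand-in for struct cred with only two of the uid fields. */
struct cred_model {
	kuid_model_t uid;	/* real UID of the task */
	kuid_model_t euid;	/* effective UID of the task */
};

/* Same shape as one term of the dumpability check in commit_creds().
 * Writing "old_cred->euid != new_cred->euid" here would not compile. */
static inline bool euid_changed(const struct cred_model *old_cred,
				const struct cred_model *new_cred)
{
	return !uid_eq_model(old_cred->euid, new_cred->euid);
}
```

The point of the wrapper is that an unconverted comparison or assignment
becomes a build error instead of a silent namespace bug.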

> ---
>  arch/x86/mm/fault.c            |    2 +-
>  fs/ioprio.c                    |    8 ++------
>  include/linux/cred.h           |   16 ++++++++--------
>  include/linux/user_namespace.h |    8 ++++----
>  kernel/cred.c                  |   36 ++++++++++++++++++++++--------------
>  kernel/signal.c                |   14 ++++++++------
>  kernel/sys.c                   |   26 +++++++++-----------------
>  kernel/user_namespace.c        |    4 ++--
>  mm/oom_kill.c                  |    4 ++--
>  security/commoncap.c           |    3 +--
>  10 files changed, 59 insertions(+), 62 deletions(-)
> 
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 3ecfd1a..76dcd9d 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -582,7 +582,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code,
>  		pte_t *pte = lookup_address(address, &level);
>  
>  		if (pte && pte_present(*pte) && !pte_exec(*pte))
> -			printk(nx_warning, current_uid());
> +			printk(nx_warning, from_kuid(&init_user_ns, current_uid()));
>  	}
>  
>  	printk(KERN_ALERT "BUG: unable to handle kernel ");
> diff --git a/fs/ioprio.c b/fs/ioprio.c
> index 8e35e96..2072e41 100644
> --- a/fs/ioprio.c
> +++ b/fs/ioprio.c
> @@ -123,9 +123,7 @@ SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
>  				break;
>  
>  			do_each_thread(g, p) {
> -				const struct cred *tcred = __task_cred(p);
> -				kuid_t tcred_uid = make_kuid(tcred->user_ns, tcred->uid);
> -				if (!uid_eq(tcred_uid, uid))
> +				if (!uid_eq(task_uid(p), uid))
>  					continue;
>  				ret = set_task_ioprio(p, ioprio);
>  				if (ret)
> @@ -220,9 +218,7 @@ SYSCALL_DEFINE2(ioprio_get, int, which, int, who)
>  				break;
>  
>  			do_each_thread(g, p) {
> -				const struct cred *tcred = __task_cred(p);
> -				kuid_t tcred_uid = make_kuid(tcred->user_ns, tcred->uid);
> -				if (!uid_eq(tcred_uid, user->uid))
> +				if (!uid_eq(task_uid(p), user->uid))
>  					continue;
>  				tmpio = get_task_ioprio(p);
>  				if (tmpio < 0)
> diff --git a/include/linux/cred.h b/include/linux/cred.h
> index 0ab3cda..fac0579 100644
> --- a/include/linux/cred.h
> +++ b/include/linux/cred.h
> @@ -123,14 +123,14 @@ struct cred {
>  #define CRED_MAGIC	0x43736564
>  #define CRED_MAGIC_DEAD	0x44656144
>  #endif
> -	uid_t		uid;		/* real UID of the task */
> -	gid_t		gid;		/* real GID of the task */
> -	uid_t		suid;		/* saved UID of the task */
> -	gid_t		sgid;		/* saved GID of the task */
> -	uid_t		euid;		/* effective UID of the task */
> -	gid_t		egid;		/* effective GID of the task */
> -	uid_t		fsuid;		/* UID for VFS ops */
> -	gid_t		fsgid;		/* GID for VFS ops */
> +	kuid_t		uid;		/* real UID of the task */
> +	kgid_t		gid;		/* real GID of the task */
> +	kuid_t		suid;		/* saved UID of the task */
> +	kgid_t		sgid;		/* saved GID of the task */
> +	kuid_t		euid;		/* effective UID of the task */
> +	kgid_t		egid;		/* effective GID of the task */
> +	kuid_t		fsuid;		/* UID for VFS ops */
> +	kgid_t		fsgid;		/* GID for VFS ops */
>  	unsigned	securebits;	/* SUID-less security management */
>  	kernel_cap_t	cap_inheritable; /* caps our children can inherit */
>  	kernel_cap_t	cap_permitted;	/* caps we're permitted */
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index 4c9846d..a2c6145 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -70,15 +70,15 @@ static inline void put_user_ns(struct user_namespace *ns)
>  #endif
>  
>  static inline uid_t user_ns_map_uid(struct user_namespace *to,
> -	const struct cred *cred, uid_t uid)
> +	const struct cred *cred, kuid_t uid)
>  {
> -	return from_kuid_munged(to, make_kuid(cred->user_ns, uid));
> +	return from_kuid_munged(to, uid);
>  }
>  
>  static inline gid_t user_ns_map_gid(struct user_namespace *to,
> -	const struct cred *cred, gid_t gid)
> +	const struct cred *cred, kgid_t gid)
>  {
> -	return from_kgid_munged(to, make_kgid(cred->user_ns, gid));
> +	return from_kgid_munged(to, gid);
>  }
>  
>  #endif /* _LINUX_USER_H */
> diff --git a/kernel/cred.c b/kernel/cred.c
> index 7a0d806..eddc5e2 100644
> --- a/kernel/cred.c
> +++ b/kernel/cred.c
> @@ -49,6 +49,14 @@ struct cred init_cred = {
>  	.subscribers		= ATOMIC_INIT(2),
>  	.magic			= CRED_MAGIC,
>  #endif
> +	.uid			= GLOBAL_ROOT_UID,
> +	.gid			= GLOBAL_ROOT_GID,
> +	.suid			= GLOBAL_ROOT_UID,
> +	.sgid			= GLOBAL_ROOT_GID,
> +	.euid			= GLOBAL_ROOT_UID,
> +	.egid			= GLOBAL_ROOT_GID,
> +	.fsuid			= GLOBAL_ROOT_UID,
> +	.fsgid			= GLOBAL_ROOT_GID,
>  	.securebits		= SECUREBITS_DEFAULT,
>  	.cap_inheritable	= CAP_EMPTY_SET,
>  	.cap_permitted		= CAP_FULL_SET,
> @@ -488,10 +496,10 @@ int commit_creds(struct cred *new)
>  	get_cred(new); /* we will require a ref for the subj creds too */
>  
>  	/* dumpability changes */
> -	if (old->euid != new->euid ||
> -	    old->egid != new->egid ||
> -	    old->fsuid != new->fsuid ||
> -	    old->fsgid != new->fsgid ||
> +	if (!uid_eq(old->euid, new->euid) ||
> +	    !gid_eq(old->egid, new->egid) ||
> +	    !uid_eq(old->fsuid, new->fsuid) ||
> +	    !gid_eq(old->fsgid, new->fsgid) ||
>  	    !cap_issubset(new->cap_permitted, old->cap_permitted)) {
>  		if (task->mm)
>  			set_dumpable(task->mm, suid_dumpable);
> @@ -500,9 +508,9 @@ int commit_creds(struct cred *new)
>  	}
>  
>  	/* alter the thread keyring */
> -	if (new->fsuid != old->fsuid)
> +	if (!uid_eq(new->fsuid, old->fsuid))
>  		key_fsuid_changed(task);
> -	if (new->fsgid != old->fsgid)
> +	if (!gid_eq(new->fsgid, old->fsgid))
>  		key_fsgid_changed(task);
>  
>  	/* do it
> @@ -519,16 +527,16 @@ int commit_creds(struct cred *new)
>  	alter_cred_subscribers(old, -2);
>  
>  	/* send notifications */
> -	if (new->uid   != old->uid  ||
> -	    new->euid  != old->euid ||
> -	    new->suid  != old->suid ||
> -	    new->fsuid != old->fsuid)
> +	if (!uid_eq(new->uid,   old->uid)  ||
> +	    !uid_eq(new->euid,  old->euid) ||
> +	    !uid_eq(new->suid,  old->suid) ||
> +	    !uid_eq(new->fsuid, old->fsuid))
>  		proc_id_connector(task, PROC_EVENT_UID);
>  
> -	if (new->gid   != old->gid  ||
> -	    new->egid  != old->egid ||
> -	    new->sgid  != old->sgid ||
> -	    new->fsgid != old->fsgid)
> +	if (!gid_eq(new->gid,   old->gid)  ||
> +	    !gid_eq(new->egid,  old->egid) ||
> +	    !gid_eq(new->sgid,  old->sgid) ||
> +	    !gid_eq(new->fsgid, old->fsgid))
>  		proc_id_connector(task, PROC_EVENT_GID);
>  
>  	/* release the old obj and subj refs both */
> diff --git a/kernel/signal.c b/kernel/signal.c
> index e2c5d84..2734dc9 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1038,8 +1038,10 @@ static inline void userns_fixup_signal_uid(struct siginfo *info, struct task_str
>  	if (SI_FROMKERNEL(info))
>  		return;
>  
> -	info->si_uid = user_ns_map_uid(task_cred_xxx(t, user_ns),
> -					current_cred(), info->si_uid);
> +	rcu_read_lock();
> +	info->si_uid = from_kuid_munged(task_cred_xxx(t, user_ns),
> +					make_kuid(current_user_ns(), info->si_uid));
> +	rcu_read_unlock();
>  }
>  #else
>  static inline void userns_fixup_signal_uid(struct siginfo *info, struct task_struct *t)
> @@ -1106,7 +1108,7 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
>  			q->info.si_code = SI_USER;
>  			q->info.si_pid = task_tgid_nr_ns(current,
>  							task_active_pid_ns(t));
> -			q->info.si_uid = current_uid();
> +			q->info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
>  			break;
>  		case (unsigned long) SEND_SIG_PRIV:
>  			q->info.si_signo = sig;
> @@ -1973,7 +1975,7 @@ static void ptrace_do_notify(int signr, int exit_code, int why)
>  	info.si_signo = signr;
>  	info.si_code = exit_code;
>  	info.si_pid = task_pid_vnr(current);
> -	info.si_uid = current_uid();
> +	info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
>  
>  	/* Let the debugger run.  */
>  	ptrace_stop(exit_code, why, 1, &info);
> @@ -2828,7 +2830,7 @@ SYSCALL_DEFINE2(kill, pid_t, pid, int, sig)
>  	info.si_errno = 0;
>  	info.si_code = SI_USER;
>  	info.si_pid = task_tgid_vnr(current);
> -	info.si_uid = current_uid();
> +	info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
>  
>  	return kill_something_info(sig, &info, pid);
>  }
> @@ -2871,7 +2873,7 @@ static int do_tkill(pid_t tgid, pid_t pid, int sig)
>  	info.si_errno = 0;
>  	info.si_code = SI_TKILL;
>  	info.si_pid = task_tgid_vnr(current);
> -	info.si_uid = current_uid();
> +	info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
>  
>  	return do_send_specific(tgid, pid, sig, &info);
>  }
> diff --git a/kernel/sys.c b/kernel/sys.c
> index f0c43b4..3996281 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -175,7 +175,6 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
>  	const struct cred *cred = current_cred();
>  	int error = -EINVAL;
>  	struct pid *pgrp;
> -	kuid_t cred_uid;
>  	kuid_t uid;
>  
>  	if (which > PRIO_USER || which < PRIO_PROCESS)
> @@ -209,22 +208,19 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
>  			} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
>  			break;
>  		case PRIO_USER:
> -			cred_uid = make_kuid(cred->user_ns, cred->uid);
>  			uid = make_kuid(cred->user_ns, who);
>  			user = cred->user;
>  			if (!who)
> -				uid = cred_uid;
> -			else if (!uid_eq(uid, cred_uid) &&
> +				uid = cred->uid;
> +			else if (!uid_eq(uid, cred->uid) &&
>  				 !(user = find_user(uid)))
>  				goto out_unlock;	/* No processes for this user */
>  
>  			do_each_thread(g, p) {
> -				const struct cred *tcred = __task_cred(p);
> -				kuid_t tcred_uid = make_kuid(tcred->user_ns, tcred->uid);
> -				if (uid_eq(tcred_uid, uid))
> +				if (uid_eq(task_uid(p), uid))
>  					error = set_one_prio(p, niceval, error);
>  			} while_each_thread(g, p);
> -			if (!uid_eq(uid, cred_uid))
> +			if (!uid_eq(uid, cred->uid))
>  				free_uid(user);		/* For find_user() */
>  			break;
>  	}
> @@ -248,7 +244,6 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
>  	const struct cred *cred = current_cred();
>  	long niceval, retval = -ESRCH;
>  	struct pid *pgrp;
> -	kuid_t cred_uid;
>  	kuid_t uid;
>  
>  	if (which > PRIO_USER || which < PRIO_PROCESS)
> @@ -280,25 +275,22 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
>  			} while_each_pid_thread(pgrp, PIDTYPE_PGID, p);
>  			break;
>  		case PRIO_USER:
> -			cred_uid = make_kuid(cred->user_ns, cred->uid);
>  			uid = make_kuid(cred->user_ns, who);
>  			user = cred->user;
>  			if (!who)
> -				uid = cred_uid;
> -			else if (!uid_eq(uid, cred_uid) &&
> +				uid = cred->uid;
> +			else if (!uid_eq(uid, cred->uid) &&
>  				 !(user = find_user(uid)))
>  				goto out_unlock;	/* No processes for this user */
>  
>  			do_each_thread(g, p) {
> -				const struct cred *tcred = __task_cred(p);
> -				kuid_t tcred_uid = make_kuid(tcred->user_ns, tcred->uid);
> -				if (uid_eq(tcred_uid, uid)) {
> +				if (uid_eq(task_uid(p), uid)) {
>  					niceval = 20 - task_nice(p);
>  					if (niceval > retval)
>  						retval = niceval;
>  				}
>  			} while_each_thread(g, p);
> -			if (!uid_eq(uid, cred_uid))
> +			if (!uid_eq(uid, cred->uid))
>  				free_uid(user);		/* for find_user() */
>  			break;
>  	}
> @@ -641,7 +633,7 @@ static int set_user(struct cred *new)
>  {
>  	struct user_struct *new_user;
>  
> -	new_user = alloc_uid(make_kuid(new->user_ns, new->uid));
> +	new_user = alloc_uid(new->uid);
>  	if (!new_user)
>  		return -EAGAIN;
>  
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 9991bac..0683dbf 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -36,8 +36,8 @@ static bool new_idmap_permitted(struct user_namespace *ns, int cap_setid,
>  int create_user_ns(struct cred *new)
>  {
>  	struct user_namespace *ns, *parent_ns = new->user_ns;
> -	kuid_t owner = make_kuid(new->user_ns, new->euid);
> -	kgid_t group = make_kgid(new->user_ns, new->egid);
> +	kuid_t owner = new->euid;
> +	kgid_t group = new->egid;
>  
>  	/* The creator needs a mapping in the parent user namespace
>  	 * or else we won't be able to reasonably tell userspace who
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 46bf2ed5..9f09a1f 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -410,8 +410,8 @@ static void dump_tasks(const struct mem_cgroup *memcg, const nodemask_t *nodemas
>  		}
>  
>  		pr_info("[%5d] %5d %5d %8lu %8lu %3u     %3d         %5d %s\n",
> -			task->pid, task_uid(task), task->tgid,
> -			task->mm->total_vm, get_mm_rss(task->mm),
> +			task->pid, from_kuid(&init_user_ns, task_uid(task)),
> +			task->tgid, task->mm->total_vm, get_mm_rss(task->mm),
>  			task_cpu(task), task->signal->oom_adj,
>  			task->signal->oom_score_adj, task->comm);
>  		task_unlock(task);
> diff --git a/security/commoncap.c b/security/commoncap.c
> index f2399d8..dbd465a 100644
> --- a/security/commoncap.c
> +++ b/security/commoncap.c
> @@ -77,8 +77,7 @@ int cap_capable(const struct cred *cred, struct user_namespace *targ_ns,
>  {
>  	for (;;) {
>  		/* The owner of the user namespace has all caps. */
> -		if (targ_ns != &init_user_ns && uid_eq(targ_ns->owner,
> -						       make_kuid(cred->user_ns, cred->euid)))
> +		if (targ_ns != &init_user_ns && uid_eq(targ_ns->owner, cred->euid))
>  			return 0;
>  
>  		/* Do we have the necessary capabilities? */
> -- 
> 1.7.2.5
> 
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 20/43] userns: Replace user_ns_map_uid and user_ns_map_gid with from_kuid and from_kgid
       [not found]     ` <1333862139-31737-20-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-04-18 18:49       ` Serge E. Hallyn
  0 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 18:49 UTC (permalink / raw)
  To: Eric W. Beiderman
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

Quoting Eric W. Beiderman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> 
> These functions are no longer needed; replace them with their more useful equivalents.
> 
> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
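
The patch uses from_kuid_munged() for si_uid but plain from_kuid() in
cred_to_ucred().  A rough userspace sketch of the difference, as I understand
it, follows.  The single-range struct ns_model, the *_model function names,
and the overflow value 65534 are assumptions made for illustration only; the
kernel's mapping tables are more general.

```c
#include <stdint.h>

#define INVALID_ID_MODEL	((uint32_t)-1)	/* reserved error value, like (uid_t)-1 */
#define OVERFLOW_ID_MODEL	65534u		/* assumed default overflow uid */

/* Hypothetical one-extent mapping: ids [first, first + count) inside the
 * namespace correspond to kernel-global ids starting at lower_first. */
struct ns_model {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

/* Map a kernel-global id into the namespace; report failure as -1. */
static uint32_t from_kuid_model(const struct ns_model *ns, uint32_t kuid)
{
	if (kuid >= ns->lower_first && kuid - ns->lower_first < ns->count)
		return ns->first + (kuid - ns->lower_first);
	return INVALID_ID_MODEL;
}

/* Same mapping, but never fails: unmapped ids become the overflow id,
 * so the result is always something that can be shown to userspace. */
static uint32_t from_kuid_munged_model(const struct ns_model *ns, uint32_t kuid)
{
	uint32_t id = from_kuid_model(ns, kuid);

	return id == INVALID_ID_MODEL ? OVERFLOW_ID_MODEL : id;
}
```

With a namespace mapping 0..999 onto 100000..100999, kernel-global id 100005
maps to 5 under both helpers, while id 0 gives (uint32_t)-1 from the plain
version and 65534 from the munged one.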

> ---
>  include/linux/user_namespace.h |   12 ------------
>  ipc/mqueue.c                   |    3 +--
>  kernel/signal.c                |    2 +-
>  net/core/sock.c                |    4 ++--
>  4 files changed, 4 insertions(+), 17 deletions(-)
> 
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index a2c6145..4e72922 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -69,16 +69,4 @@ static inline void put_user_ns(struct user_namespace *ns)
>  
>  #endif
>  
> -static inline uid_t user_ns_map_uid(struct user_namespace *to,
> -	const struct cred *cred, kuid_t uid)
> -{
> -	return from_kuid_munged(to, uid);
> -}
> -
> -static inline gid_t user_ns_map_gid(struct user_namespace *to,
> -	const struct cred *cred, kgid_t gid)
> -{
> -	return from_kgid_munged(to, gid);
> -}
> -
>  #endif /* _LINUX_USER_H */
> diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> index b53cf34..b6a0d46 100644
> --- a/ipc/mqueue.c
> +++ b/ipc/mqueue.c
> @@ -538,8 +538,7 @@ static void __do_notify(struct mqueue_inode_info *info)
>  			rcu_read_lock();
>  			sig_i.si_pid = task_tgid_nr_ns(current,
>  						ns_of_pid(info->notify_owner));
> -			sig_i.si_uid = user_ns_map_uid(info->notify_user_ns,
> -						current_cred(), current_uid());
> +			sig_i.si_uid = from_kuid_munged(info->notify_user_ns, current_uid());
>  			rcu_read_unlock();
>  
>  			kill_pid_info(info->notify.sigev_signo,
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 2734dc9..d630327 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1026,7 +1026,7 @@ static inline int legacy_queue(struct sigpending *signals, int sig)
>  static inline uid_t map_cred_ns(const struct cred *cred,
>  				struct user_namespace *ns)
>  {
> -	return user_ns_map_uid(ns, cred, cred->uid);
> +	return from_kuid_munged(ns, cred->uid);
>  }
>  
>  #ifdef CONFIG_USER_NS
> diff --git a/net/core/sock.c b/net/core/sock.c
> index b2e14c0..e1ec8ba 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -821,8 +821,8 @@ void cred_to_ucred(struct pid *pid, const struct cred *cred,
>  	if (cred) {
>  		struct user_namespace *current_ns = current_user_ns();
>  
> -		ucred->uid = user_ns_map_uid(current_ns, cred, cred->euid);
> -		ucred->gid = user_ns_map_gid(current_ns, cred, cred->egid);
> +		ucred->uid = from_kuid(current_ns, cred->euid);
> +		ucred->gid = from_kgid(current_ns, cred->egid);
>  	}
>  }
>  EXPORT_SYMBOL_GPL(cred_to_ucred);
> -- 
> 1.7.2.5
> 
> _______________________________________________
> Containers mailing list
> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 20/43] userns: Replace user_ns_map_uid and user_ns_map_gid with from_kuid and from_kgid
  2012-04-08  5:15     ` "Eric W. Beiderman
  (?)
  (?)
@ 2012-04-18 18:49     ` Serge E. Hallyn
  -1 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 18:49 UTC (permalink / raw)
  To: Eric W. Beiderman
  Cc: linux-kernel, Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

Quoting Eric W. Beiderman (ebiederm@xmission.com):
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> These functions are no longer needed; replace them with their more useful equivalents.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

> ---
>  include/linux/user_namespace.h |   12 ------------
>  ipc/mqueue.c                   |    3 +--
>  kernel/signal.c                |    2 +-
>  net/core/sock.c                |    4 ++--
>  4 files changed, 4 insertions(+), 17 deletions(-)
> 
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index a2c6145..4e72922 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -69,16 +69,4 @@ static inline void put_user_ns(struct user_namespace *ns)
>  
>  #endif
>  
> -static inline uid_t user_ns_map_uid(struct user_namespace *to,
> -	const struct cred *cred, kuid_t uid)
> -{
> -	return from_kuid_munged(to, uid);
> -}
> -
> -static inline gid_t user_ns_map_gid(struct user_namespace *to,
> -	const struct cred *cred, kgid_t gid)
> -{
> -	return from_kgid_munged(to, gid);
> -}
> -
>  #endif /* _LINUX_USER_H */
> diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> index b53cf34..b6a0d46 100644
> --- a/ipc/mqueue.c
> +++ b/ipc/mqueue.c
> @@ -538,8 +538,7 @@ static void __do_notify(struct mqueue_inode_info *info)
>  			rcu_read_lock();
>  			sig_i.si_pid = task_tgid_nr_ns(current,
>  						ns_of_pid(info->notify_owner));
> -			sig_i.si_uid = user_ns_map_uid(info->notify_user_ns,
> -						current_cred(), current_uid());
> +			sig_i.si_uid = from_kuid_munged(info->notify_user_ns, current_uid());
>  			rcu_read_unlock();
>  
>  			kill_pid_info(info->notify.sigev_signo,
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 2734dc9..d630327 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1026,7 +1026,7 @@ static inline int legacy_queue(struct sigpending *signals, int sig)
>  static inline uid_t map_cred_ns(const struct cred *cred,
>  				struct user_namespace *ns)
>  {
> -	return user_ns_map_uid(ns, cred, cred->uid);
> +	return from_kuid_munged(ns, cred->uid);
>  }
>  
>  #ifdef CONFIG_USER_NS
> diff --git a/net/core/sock.c b/net/core/sock.c
> index b2e14c0..e1ec8ba 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -821,8 +821,8 @@ void cred_to_ucred(struct pid *pid, const struct cred *cred,
>  	if (cred) {
>  		struct user_namespace *current_ns = current_user_ns();
>  
> -		ucred->uid = user_ns_map_uid(current_ns, cred, cred->euid);
> -		ucred->gid = user_ns_map_gid(current_ns, cred, cred->egid);
> +		ucred->uid = from_kuid(current_ns, cred->euid);
> +		ucred->gid = from_kgid(current_ns, cred->egid);
>  	}
>  }
>  EXPORT_SYMBOL_GPL(cred_to_ucred);
> -- 
> 1.7.2.5
> 
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 21/43] userns: Convert sched_set_affinity and sched_set_scheduler's permission checks
       [not found]   ` <1333862139-31737-21-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-04-18 18:50     ` Serge E. Hallyn
  0 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 18:50 UTC (permalink / raw)
  To: Eric W. Beiderman
  Cc: Linux Containers, linux-kernel,
	Cyrill Gorcunov, linux-security-module,
	Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

Quoting Eric W. Beiderman (ebiederm@xmission.com):
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> - Compare kuids with uid_eq
> - kuids are unique across all user namespaces, so there is no longer
>   any need for a user_namespace comparison.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

> ---
>  kernel/sched/core.c |    7 ++-----
>  1 files changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 96bff85..b189fec 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4042,11 +4042,8 @@ static bool check_same_owner(struct task_struct *p)
>  
>  	rcu_read_lock();
>  	pcred = __task_cred(p);
> -	if (cred->user_ns == pcred->user_ns)
> -		match = (cred->euid == pcred->euid ||
> -			 cred->euid == pcred->uid);
> -	else
> -		match = false;
> +	match = (uid_eq(cred->euid, pcred->euid) ||
> +		 uid_eq(cred->euid, pcred->uid));
>  	rcu_read_unlock();
>  	return match;
>  }
> -- 
> 1.7.2.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
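
To illustrate why the user_ns comparison can go away, here is a minimal
userspace sketch of the idea. It assumes a kuid is a wrapped number that
is already global across namespaces; kuid_model, uid_eq_model and
cred_model are hypothetical stand-ins, not the kernel types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrapping the number in a struct stops it being compared with a bare integer by accident. */
typedef struct { uint32_t val; } kuid_model;

static inline bool uid_eq_model(kuid_model a, kuid_model b)
{
	return a.val == b.val;
}

struct cred_model {
	kuid_model uid;
	kuid_model euid;
};

/*
 * Shape of the converted check_same_owner(): two tasks in different
 * namespaces can never hold equal kuids unless they really are the same
 * user, so no separate namespace test is needed.
 */
static bool check_same_owner_model(const struct cred_model *cred,
				   const struct cred_model *pcred)
{
	return uid_eq_model(cred->euid, pcred->euid) ||
	       uid_eq_model(cred->euid, pcred->uid);
}
```

With this layout a task whose namespace-local uid 0 is backed by, say,
kernel id 100000 simply fails the comparison against a host task with
kernel id 0.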

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 21/43] userns: Convert sched_set_affinity and sched_set_scheduler's permission checks
  2012-04-08  5:15 ` [PATCH 21/43] userns: Convert sched_set_affinity and sched_set_scheduler's permission checks "Eric W. Beiderman
       [not found]   ` <1333862139-31737-21-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-04-18 18:50   ` Serge E. Hallyn
  1 sibling, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 18:50 UTC (permalink / raw)
  To: Eric W. Beiderman
  Cc: linux-kernel, linux-fsdevel, linux-security-module,
	Linux Containers, Andrew Morton, Linus Torvalds, Al Viro,
	Cyrill Gorcunov

Quoting Eric W. Beiderman (ebiederm@xmission.com):
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> - Compare kuids with uid_eq
> - kuids are unique across all user namespaces, so there is no longer
>   any need for a user_namespace comparison.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

> ---
>  kernel/sched/core.c |    7 ++-----
>  1 files changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 96bff85..b189fec 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4042,11 +4042,8 @@ static bool check_same_owner(struct task_struct *p)
>  
>  	rcu_read_lock();
>  	pcred = __task_cred(p);
> -	if (cred->user_ns == pcred->user_ns)
> -		match = (cred->euid == pcred->euid ||
> -			 cred->euid == pcred->uid);
> -	else
> -		match = false;
> +	match = (uid_eq(cred->euid, pcred->euid) ||
> +		 uid_eq(cred->euid, pcred->uid));
>  	rcu_read_unlock();
>  	return match;
>  }
> -- 
> 1.7.2.5

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 22/43] userns: Convert capabilities related permsion checks
       [not found]     ` <1333862139-31737-22-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-04-18 18:51       ` Serge E. Hallyn
  0 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 18:51 UTC (permalink / raw)
  To: Eric W. Beiderman
  Cc: Linux Containers, linux-kernel,
	linux-fsdevel,
	linux-security-module, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

Quoting Eric W. Beiderman (ebiederm@xmission.com):
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> - Use uid_eq when comparing kuids
>   Use gid_eq when comparing kgids
> - Use make_kuid(user_ns, 0) to talk about the user_namespace root uid
>   Use make_kgid(user_ns, 0) to talk about the user_namespace root gid
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

One nit, though:

> ---
>  fs/open.c            |    3 ++-
>  security/commoncap.c |   43 ++++++++++++++++++++++++++++---------------
>  2 files changed, 30 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/open.c b/fs/open.c
> index 5720854..92335f6 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -316,7 +316,8 @@ SYSCALL_DEFINE3(faccessat, int, dfd, const char __user *, filename, int, mode)
>  
>  	if (!issecure(SECURE_NO_SETUID_FIXUP)) {
>  		/* Clear the capabilities if we switch to a non-root user */
> -		if (override_cred->uid)
> +		kuid_t root_uid = make_kuid(override_cred->user_ns, 0);
> +		if (!uid_eq(override_cred->uid, root_uid))
>  			cap_clear(override_cred->cap_effective);
>  		else
>  			override_cred->cap_effective =
> diff --git a/security/commoncap.c b/security/commoncap.c
> index dbd465a..9bf8df8 100644
> --- a/security/commoncap.c
> +++ b/security/commoncap.c
> @@ -472,19 +472,24 @@ int cap_bprm_set_creds(struct linux_binprm *bprm)
>  	struct cred *new = bprm->cred;
>  	bool effective, has_cap = false;
>  	int ret;
> +	kuid_t root_uid;
> +	kgid_t root_gid;

root_gid is assigned here but never used.

>  
>  	effective = false;
>  	ret = get_file_caps(bprm, &effective, &has_cap);
>  	if (ret < 0)
>  		return ret;
>  
> +	root_uid = make_kuid(new->user_ns, 0);
> +	root_gid = make_kgid(new->user_ns, 0);
> +
>  	if (!issecure(SECURE_NOROOT)) {
>  		/*
>  		 * If the legacy file capability is set, then don't set privs
>  		 * for a setuid root binary run by a non-root user.  Do set it
>  		 * for a root user just to cause least surprise to an admin.
>  		 */
> -		if (has_cap && new->uid != 0 && new->euid == 0) {
> +		if (has_cap && !uid_eq(new->uid, root_uid) && uid_eq(new->euid, root_uid)) {
>  			warn_setuid_and_fcaps_mixed(bprm->filename);
>  			goto skip;
>  		}
> @@ -495,12 +500,12 @@ int cap_bprm_set_creds(struct linux_binprm *bprm)
>  		 *
>  		 * If only the real uid is 0, we do not set the effective bit.
>  		 */
> -		if (new->euid == 0 || new->uid == 0) {
> +		if (uid_eq(new->euid, root_uid) || uid_eq(new->uid, root_uid)) {
>  			/* pP' = (cap_bset & ~0) | (pI & ~0) */
>  			new->cap_permitted = cap_combine(old->cap_bset,
>  							 old->cap_inheritable);
>  		}
> -		if (new->euid == 0)
> +		if (uid_eq(new->euid, root_uid))
>  			effective = true;
>  	}
>  skip:
> @@ -508,8 +513,8 @@ skip:
>  	/* Don't let someone trace a set[ug]id/setpcap binary with the revised
>  	 * credentials unless they have the appropriate permit
>  	 */
> -	if ((new->euid != old->uid ||
> -	     new->egid != old->gid ||
> +	if ((!uid_eq(new->euid, old->uid) ||
> +	     !gid_eq(new->egid, old->gid) ||
>  	     !cap_issubset(new->cap_permitted, old->cap_permitted)) &&
>  	    bprm->unsafe & ~LSM_UNSAFE_PTRACE_CAP) {
>  		/* downgrade; they get no more than they had, and maybe less */
> @@ -544,7 +549,7 @@ skip:
>  	 */
>  	if (!cap_isclear(new->cap_effective)) {
>  		if (!cap_issubset(CAP_FULL_SET, new->cap_effective) ||
> -		    new->euid != 0 || new->uid != 0 ||
> +		    !uid_eq(new->euid, root_uid) || !uid_eq(new->uid, root_uid) ||
>  		    issecure(SECURE_NOROOT)) {
>  			ret = audit_log_bprm_fcaps(bprm, new, old);
>  			if (ret < 0)
> @@ -569,16 +574,17 @@ skip:
>  int cap_bprm_secureexec(struct linux_binprm *bprm)
>  {
>  	const struct cred *cred = current_cred();
> +	kuid_t root_uid = make_kuid(cred->user_ns, 0);
>  
> -	if (cred->uid != 0) {
> +	if (!uid_eq(cred->uid, root_uid)) {
>  		if (bprm->cap_effective)
>  			return 1;
>  		if (!cap_isclear(cred->cap_permitted))
>  			return 1;
>  	}
>  
> -	return (cred->euid != cred->uid ||
> -		cred->egid != cred->gid);
> +	return (!uid_eq(cred->euid, cred->uid) ||
> +		!gid_eq(cred->egid, cred->gid));
>  }
>  
>  /**
> @@ -668,15 +674,21 @@ int cap_inode_removexattr(struct dentry *dentry, const char *name)
>   */
>  static inline void cap_emulate_setxuid(struct cred *new, const struct cred *old)
>  {
> -	if ((old->uid == 0 || old->euid == 0 || old->suid == 0) &&
> -	    (new->uid != 0 && new->euid != 0 && new->suid != 0) &&
> +	kuid_t root_uid = make_kuid(old->user_ns, 0);
> +
> +	if ((uid_eq(old->uid, root_uid) ||
> +	     uid_eq(old->euid, root_uid) ||
> +	     uid_eq(old->suid, root_uid)) &&
> +	    (!uid_eq(new->uid, root_uid) &&
> +	     !uid_eq(new->euid, root_uid) &&
> +	     !uid_eq(new->suid, root_uid)) &&
>  	    !issecure(SECURE_KEEP_CAPS)) {
>  		cap_clear(new->cap_permitted);
>  		cap_clear(new->cap_effective);
>  	}
> -	if (old->euid == 0 && new->euid != 0)
> +	if (uid_eq(old->euid, root_uid) && !uid_eq(new->euid, root_uid))
>  		cap_clear(new->cap_effective);
> -	if (old->euid != 0 && new->euid == 0)
> +	if (!uid_eq(old->euid, root_uid) && uid_eq(new->euid, root_uid))
>  		new->cap_effective = new->cap_permitted;
>  }
>  
> @@ -709,11 +721,12 @@ int cap_task_fix_setuid(struct cred *new, const struct cred *old, int flags)
>  		 *          if not, we might be a bit too harsh here.
>  		 */
>  		if (!issecure(SECURE_NO_SETUID_FIXUP)) {
> -			if (old->fsuid == 0 && new->fsuid != 0)
> +			kuid_t root_uid = make_kuid(old->user_ns, 0);
> +			if (uid_eq(old->fsuid, root_uid) && !uid_eq(new->fsuid, root_uid))
>  				new->cap_effective =
>  					cap_drop_fs_set(new->cap_effective);
>  
> -			if (old->fsuid != 0 && new->fsuid == 0)
> +			if (!uid_eq(old->fsuid, root_uid) && uid_eq(new->fsuid, root_uid))
>  				new->cap_effective =
>  					cap_raise_fs_set(new->cap_effective,
>  							 new->cap_permitted);
> -- 
> 1.7.2.5
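
The pattern the patch repeats, replacing "uid == 0" with a comparison
against make_kuid(user_ns, 0), can be sketched in userspace as follows.
This is only a model under stated assumptions: each namespace has a single
mapping extent, and ns_model, make_kuid_model and is_ns_root_model are
hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint32_t val; } kuid_model; /* stand-in for kuid_t */

/* One extent: ns ids [first, first + count) map to kernel ids starting at lower_first. */
struct ns_model {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

/* Map a namespace-local uid to its kernel-internal id; UINT32_MAX marks "no mapping". */
static kuid_model make_kuid_model(const struct ns_model *ns, uint32_t uid)
{
	if (uid >= ns->first && uid - ns->first < ns->count)
		return (kuid_model){ ns->lower_first + (uid - ns->first) };
	return (kuid_model){ UINT32_MAX };
}

static bool uid_eq_model(kuid_model a, kuid_model b)
{
	return a.val == b.val;
}

/* "Is this uid root in ns?" replaces the old test against a literal 0. */
static bool is_ns_root_model(const struct ns_model *ns, kuid_model uid)
{
	kuid_model root_uid = make_kuid_model(ns, 0);

	return uid_eq_model(uid, root_uid);
}
```

The point of the sketch is that root is relative to a namespace: kernel id
100000 is root in the child namespace below but an ordinary user on the
host.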

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 22/43] userns: Convert capabilities related permsion checks
  2012-04-08  5:15     ` "Eric W. Beiderman
  (?)
  (?)
@ 2012-04-18 18:51     ` Serge E. Hallyn
       [not found]       ` <20120418185106.GG4984-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
  -1 siblings, 1 reply; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 18:51 UTC (permalink / raw)
  To: Eric W. Beiderman
  Cc: linux-kernel, Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

Quoting Eric W. Beiderman (ebiederm@xmission.com):
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> - Use uid_eq when comparing kuids
>   Use gid_eq when comparing kgids
> - Use make_kuid(user_ns, 0) to talk about the user_namespace root uid
>   Use make_kgid(user_ns, 0) to talk about the user_namespace root gid
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

One nit, though:

> ---
>  fs/open.c            |    3 ++-
>  security/commoncap.c |   43 ++++++++++++++++++++++++++++---------------
>  2 files changed, 30 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/open.c b/fs/open.c
> index 5720854..92335f6 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -316,7 +316,8 @@ SYSCALL_DEFINE3(faccessat, int, dfd, const char __user *, filename, int, mode)
>  
>  	if (!issecure(SECURE_NO_SETUID_FIXUP)) {
>  		/* Clear the capabilities if we switch to a non-root user */
> -		if (override_cred->uid)
> +		kuid_t root_uid = make_kuid(override_cred->user_ns, 0);
> +		if (!uid_eq(override_cred->uid, root_uid))
>  			cap_clear(override_cred->cap_effective);
>  		else
>  			override_cred->cap_effective =
> diff --git a/security/commoncap.c b/security/commoncap.c
> index dbd465a..9bf8df8 100644
> --- a/security/commoncap.c
> +++ b/security/commoncap.c
> @@ -472,19 +472,24 @@ int cap_bprm_set_creds(struct linux_binprm *bprm)
>  	struct cred *new = bprm->cred;
>  	bool effective, has_cap = false;
>  	int ret;
> +	kuid_t root_uid;
> +	kgid_t root_gid;

root_gid is assigned here but never used.

>  
>  	effective = false;
>  	ret = get_file_caps(bprm, &effective, &has_cap);
>  	if (ret < 0)
>  		return ret;
>  
> +	root_uid = make_kuid(new->user_ns, 0);
> +	root_gid = make_kgid(new->user_ns, 0);
> +
>  	if (!issecure(SECURE_NOROOT)) {
>  		/*
>  		 * If the legacy file capability is set, then don't set privs
>  		 * for a setuid root binary run by a non-root user.  Do set it
>  		 * for a root user just to cause least surprise to an admin.
>  		 */
> -		if (has_cap && new->uid != 0 && new->euid == 0) {
> +		if (has_cap && !uid_eq(new->uid, root_uid) && uid_eq(new->euid, root_uid)) {
>  			warn_setuid_and_fcaps_mixed(bprm->filename);
>  			goto skip;
>  		}
> @@ -495,12 +500,12 @@ int cap_bprm_set_creds(struct linux_binprm *bprm)
>  		 *
>  		 * If only the real uid is 0, we do not set the effective bit.
>  		 */
> -		if (new->euid == 0 || new->uid == 0) {
> +		if (uid_eq(new->euid, root_uid) || uid_eq(new->uid, root_uid)) {
>  			/* pP' = (cap_bset & ~0) | (pI & ~0) */
>  			new->cap_permitted = cap_combine(old->cap_bset,
>  							 old->cap_inheritable);
>  		}
> -		if (new->euid == 0)
> +		if (uid_eq(new->euid, root_uid))
>  			effective = true;
>  	}
>  skip:
> @@ -508,8 +513,8 @@ skip:
>  	/* Don't let someone trace a set[ug]id/setpcap binary with the revised
>  	 * credentials unless they have the appropriate permit
>  	 */
> -	if ((new->euid != old->uid ||
> -	     new->egid != old->gid ||
> +	if ((!uid_eq(new->euid, old->uid) ||
> +	     !gid_eq(new->egid, old->gid) ||
>  	     !cap_issubset(new->cap_permitted, old->cap_permitted)) &&
>  	    bprm->unsafe & ~LSM_UNSAFE_PTRACE_CAP) {
>  		/* downgrade; they get no more than they had, and maybe less */
> @@ -544,7 +549,7 @@ skip:
>  	 */
>  	if (!cap_isclear(new->cap_effective)) {
>  		if (!cap_issubset(CAP_FULL_SET, new->cap_effective) ||
> -		    new->euid != 0 || new->uid != 0 ||
> +		    !uid_eq(new->euid, root_uid) || !uid_eq(new->uid, root_uid) ||
>  		    issecure(SECURE_NOROOT)) {
>  			ret = audit_log_bprm_fcaps(bprm, new, old);
>  			if (ret < 0)
> @@ -569,16 +574,17 @@ skip:
>  int cap_bprm_secureexec(struct linux_binprm *bprm)
>  {
>  	const struct cred *cred = current_cred();
> +	kuid_t root_uid = make_kuid(cred->user_ns, 0);
>  
> -	if (cred->uid != 0) {
> +	if (!uid_eq(cred->uid, root_uid)) {
>  		if (bprm->cap_effective)
>  			return 1;
>  		if (!cap_isclear(cred->cap_permitted))
>  			return 1;
>  	}
>  
> -	return (cred->euid != cred->uid ||
> -		cred->egid != cred->gid);
> +	return (!uid_eq(cred->euid, cred->uid) ||
> +		!gid_eq(cred->egid, cred->gid));
>  }
>  
>  /**
> @@ -668,15 +674,21 @@ int cap_inode_removexattr(struct dentry *dentry, const char *name)
>   */
>  static inline void cap_emulate_setxuid(struct cred *new, const struct cred *old)
>  {
> -	if ((old->uid == 0 || old->euid == 0 || old->suid == 0) &&
> -	    (new->uid != 0 && new->euid != 0 && new->suid != 0) &&
> +	kuid_t root_uid = make_kuid(old->user_ns, 0);
> +
> +	if ((uid_eq(old->uid, root_uid) ||
> +	     uid_eq(old->euid, root_uid) ||
> +	     uid_eq(old->suid, root_uid)) &&
> +	    (!uid_eq(new->uid, root_uid) &&
> +	     !uid_eq(new->euid, root_uid) &&
> +	     !uid_eq(new->suid, root_uid)) &&
>  	    !issecure(SECURE_KEEP_CAPS)) {
>  		cap_clear(new->cap_permitted);
>  		cap_clear(new->cap_effective);
>  	}
> -	if (old->euid == 0 && new->euid != 0)
> +	if (uid_eq(old->euid, root_uid) && !uid_eq(new->euid, root_uid))
>  		cap_clear(new->cap_effective);
> -	if (old->euid != 0 && new->euid == 0)
> +	if (!uid_eq(old->euid, root_uid) && uid_eq(new->euid, root_uid))
>  		new->cap_effective = new->cap_permitted;
>  }
>  
> @@ -709,11 +721,12 @@ int cap_task_fix_setuid(struct cred *new, const struct cred *old, int flags)
>  		 *          if not, we might be a bit too harsh here.
>  		 */
>  		if (!issecure(SECURE_NO_SETUID_FIXUP)) {
> -			if (old->fsuid == 0 && new->fsuid != 0)
> +			kuid_t root_uid = make_kuid(old->user_ns, 0);
> +			if (uid_eq(old->fsuid, root_uid) && !uid_eq(new->fsuid, root_uid))
>  				new->cap_effective =
>  					cap_drop_fs_set(new->cap_effective);
>  
> -			if (old->fsuid != 0 && new->fsuid == 0)
> +			if (!uid_eq(old->fsuid, root_uid) && uid_eq(new->fsuid, root_uid))
>  				new->cap_effective =
>  					cap_raise_fs_set(new->cap_effective,
>  							 new->cap_permitted);
> -- 
> 1.7.2.5

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 24/43] userns: Convert ptrace, kill, set_priority permission checks to work with kuids and kgids
       [not found]     ` <1333862139-31737-24-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-04-18 18:56       ` Serge E. Hallyn
  0 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 18:56 UTC (permalink / raw)
  To: Eric W. Beiderman
  Cc: Linux Containers, linux-kernel,
	linux-fsdevel,
	linux-security-module, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

Quoting Eric W. Beiderman (ebiederm@xmission.com):
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> Update the permission checks to use the new uid_eq and gid_eq helpers
> and remove the now unnecessary user_ns equality comparison.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  kernel/ptrace.c |   13 ++++++-------
>  kernel/signal.c |   15 ++++++---------
>  kernel/sys.c    |   18 ++++++++----------
>  3 files changed, 20 insertions(+), 26 deletions(-)
> 
> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index 24e0a5a..a232bb5 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -198,13 +198,12 @@ int __ptrace_may_access(struct task_struct *task, unsigned int mode)
>  		return 0;
>  	rcu_read_lock();
>  	tcred = __task_cred(task);
> -	if (cred->user_ns == tcred->user_ns &&
> -	    (cred->uid == tcred->euid &&
> -	     cred->uid == tcred->suid &&
> -	     cred->uid == tcred->uid  &&
> -	     cred->gid == tcred->egid &&
> -	     cred->gid == tcred->sgid &&
> -	     cred->gid == tcred->gid))
> +	if (uid_eq(cred->uid, tcred->euid) &&
> +	    uid_eq(cred->uid, tcred->suid) &&
> +	    uid_eq(cred->uid, tcred->uid)  &&
> +	    gid_eq(cred->gid, tcred->egid) &&
> +	    gid_eq(cred->gid, tcred->sgid) &&
> +	    gid_eq(cred->gid, tcred->gid))
>  		goto ok;
>  	if (ptrace_has_cap(tcred->user_ns, mode))
>  		goto ok;
> diff --git a/kernel/signal.c b/kernel/signal.c
> index d630327..9797939 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -767,11 +767,10 @@ static int kill_ok_by_cred(struct task_struct *t)
>  	const struct cred *cred = current_cred();
>  	const struct cred *tcred = __task_cred(t);
>  
> -	if (cred->user_ns == tcred->user_ns &&
> -	    (cred->euid == tcred->suid ||
> -	     cred->euid == tcred->uid ||
> -	     cred->uid  == tcred->suid ||
> -	     cred->uid  == tcred->uid))
> +	if (uid_eq(cred->euid, tcred->suid) ||
> +	    uid_eq(cred->euid, tcred->uid)  ||
> +	    uid_eq(cred->uid,  tcred->suid) ||
> +	    uid_eq(cred->uid,  tcred->uid))
>  		return 1;
>  
>  	if (ns_capable(tcred->user_ns, CAP_KILL))
> @@ -1389,10 +1388,8 @@ static int kill_as_cred_perm(const struct cred *cred,
>  			     struct task_struct *target)
>  {
>  	const struct cred *pcred = __task_cred(target);
> -	if (cred->user_ns != pcred->user_ns)
> -		return 0;
> -	if (cred->euid != pcred->suid && cred->euid != pcred->uid &&
> -	    cred->uid  != pcred->suid && cred->uid  != pcred->uid)
> +	if (uid_eq(cred->euid, pcred->suid) && uid_eq(cred->euid, pcred->uid) &&

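These should be !uid_eq(), right?  The old code returned 0 only when none
of the four comparisons matched; as converted, it returns 0 only when all
of them match.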
> +	    uid_eq(cred->uid,  pcred->suid) && uid_eq(cred->uid,  pcred->uid))
>  		return 0;
>  	return 1;
>  }
> diff --git a/kernel/sys.c b/kernel/sys.c
> index aff09f2..f484077 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -131,9 +131,8 @@ static bool set_one_prio_perm(struct task_struct *p)
>  {
>  	const struct cred *cred = current_cred(), *pcred = __task_cred(p);
>  
> -	if (pcred->user_ns == cred->user_ns &&
> -	    (pcred->uid  == cred->euid ||
> -	     pcred->euid == cred->euid))
> +	if (uid_eq(pcred->uid,  cred->euid) ||
> +	    uid_eq(pcred->euid, cred->euid))
>  		return true;
>  	if (ns_capable(pcred->user_ns, CAP_SYS_NICE))
>  		return true;
> @@ -1582,13 +1581,12 @@ static int check_prlimit_permission(struct task_struct *task)
>  		return 0;
>  
>  	tcred = __task_cred(task);
> -	if (cred->user_ns == tcred->user_ns &&
> -	    (cred->uid == tcred->euid &&
> -	     cred->uid == tcred->suid &&
> -	     cred->uid == tcred->uid &&
> -	     cred->gid == tcred->egid &&
> -	     cred->gid == tcred->sgid &&
> -		    cred->gid == tcred->gid))
> +	if (uid_eq(cred->uid, tcred->euid) &&
> +	    uid_eq(cred->uid, tcred->suid) &&
> +	    uid_eq(cred->uid, tcred->uid)  &&
> +	    gid_eq(cred->gid, tcred->egid) &&
> +	    gid_eq(cred->gid, tcred->sgid) &&
> +	    gid_eq(cred->gid, tcred->gid))
>  		return 0;
>  	if (ns_capable(tcred->user_ns, CAP_SYS_RESOURCE))
>  		return 0;
> -- 
> 1.7.2.5
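
For the kill_as_cred_perm hunk questioned above, a minimal userspace sketch
of the conversion with the negations kept would look like this. The
kuid_model, uid_eq_model and cred_model names are hypothetical stand-ins,
and this is not taken from any later revision of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint32_t val; } kuid_model; /* stand-in for kuid_t */

static bool uid_eq_model(kuid_model a, kuid_model b)
{
	return a.val == b.val;
}

struct cred_model {
	kuid_model uid;
	kuid_model euid;
	kuid_model suid;
};

/*
 * Each "a != b" in the old code becomes "!uid_eq(a, b)", so the function
 * still denies (returns 0) only when none of the four pairs match.
 */
static int kill_as_cred_perm_model(const struct cred_model *cred,
				   const struct cred_model *pcred)
{
	if (!uid_eq_model(cred->euid, pcred->suid) &&
	    !uid_eq_model(cred->euid, pcred->uid) &&
	    !uid_eq_model(cred->uid, pcred->suid) &&
	    !uid_eq_model(cred->uid, pcred->uid))
		return 0;
	return 1;
}
```

Dropping the negations would invert the result for the common case: a
sender whose uids all match the target would be refused, and one with no
match at all would be allowed.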

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 24/43] userns: Convert ptrace, kill, set_priority permission checks to work with kuids and kgids
  2012-04-08  5:15     ` "Eric W. Beiderman
                       ` (2 preceding siblings ...)
  (?)
@ 2012-04-18 18:56     ` Serge E. Hallyn
       [not found]       ` <20120418185610.GA5186-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
  -1 siblings, 1 reply; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 18:56 UTC (permalink / raw)
  To: Eric W. Beiderman
  Cc: linux-kernel, Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

Quoting Eric W. Beiderman (ebiederm@xmission.com):
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> Update the permission checks to use the new uid_eq and gid_eq helpers
> and remove the now unnecessary user_ns equality comparison.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  kernel/ptrace.c |   13 ++++++-------
>  kernel/signal.c |   15 ++++++---------
>  kernel/sys.c    |   18 ++++++++----------
>  3 files changed, 20 insertions(+), 26 deletions(-)
> 
> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index 24e0a5a..a232bb5 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -198,13 +198,12 @@ int __ptrace_may_access(struct task_struct *task, unsigned int mode)
>  		return 0;
>  	rcu_read_lock();
>  	tcred = __task_cred(task);
> -	if (cred->user_ns == tcred->user_ns &&
> -	    (cred->uid == tcred->euid &&
> -	     cred->uid == tcred->suid &&
> -	     cred->uid == tcred->uid  &&
> -	     cred->gid == tcred->egid &&
> -	     cred->gid == tcred->sgid &&
> -	     cred->gid == tcred->gid))
> +	if (uid_eq(cred->uid, tcred->euid) &&
> +	    uid_eq(cred->uid, tcred->suid) &&
> +	    uid_eq(cred->uid, tcred->uid)  &&
> +	    gid_eq(cred->gid, tcred->egid) &&
> +	    gid_eq(cred->gid, tcred->sgid) &&
> +	    gid_eq(cred->gid, tcred->gid))
>  		goto ok;
>  	if (ptrace_has_cap(tcred->user_ns, mode))
>  		goto ok;
> diff --git a/kernel/signal.c b/kernel/signal.c
> index d630327..9797939 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -767,11 +767,10 @@ static int kill_ok_by_cred(struct task_struct *t)
>  	const struct cred *cred = current_cred();
>  	const struct cred *tcred = __task_cred(t);
>  
> -	if (cred->user_ns == tcred->user_ns &&
> -	    (cred->euid == tcred->suid ||
> -	     cred->euid == tcred->uid ||
> -	     cred->uid  == tcred->suid ||
> -	     cred->uid  == tcred->uid))
> +	if (uid_eq(cred->euid, tcred->suid) ||
> +	    uid_eq(cred->euid, tcred->uid)  ||
> +	    uid_eq(cred->uid,  tcred->suid) ||
> +	    uid_eq(cred->uid,  tcred->uid))
>  		return 1;
>  
>  	if (ns_capable(tcred->user_ns, CAP_KILL))
> @@ -1389,10 +1388,8 @@ static int kill_as_cred_perm(const struct cred *cred,
>  			     struct task_struct *target)
>  {
>  	const struct cred *pcred = __task_cred(target);
> -	if (cred->user_ns != pcred->user_ns)
> -		return 0;
> -	if (cred->euid != pcred->suid && cred->euid != pcred->uid &&
> -	    cred->uid  != pcred->suid && cred->uid  != pcred->uid)
> +	if (uid_eq(cred->euid, pcred->suid) && uid_eq(cred->euid, pcred->uid) &&

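These should be !uid_eq(), right?  The old code returned 0 only when none
of the four comparisons matched; as converted, it returns 0 only when all
of them match.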
> +	    uid_eq(cred->uid,  pcred->suid) && uid_eq(cred->uid,  pcred->uid))
>  		return 0;
>  	return 1;
>  }
> diff --git a/kernel/sys.c b/kernel/sys.c
> index aff09f2..f484077 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -131,9 +131,8 @@ static bool set_one_prio_perm(struct task_struct *p)
>  {
>  	const struct cred *cred = current_cred(), *pcred = __task_cred(p);
>  
> -	if (pcred->user_ns == cred->user_ns &&
> -	    (pcred->uid  == cred->euid ||
> -	     pcred->euid == cred->euid))
> +	if (uid_eq(pcred->uid,  cred->euid) ||
> +	    uid_eq(pcred->euid, cred->euid))
>  		return true;
>  	if (ns_capable(pcred->user_ns, CAP_SYS_NICE))
>  		return true;
> @@ -1582,13 +1581,12 @@ static int check_prlimit_permission(struct task_struct *task)
>  		return 0;
>  
>  	tcred = __task_cred(task);
> -	if (cred->user_ns == tcred->user_ns &&
> -	    (cred->uid == tcred->euid &&
> -	     cred->uid == tcred->suid &&
> -	     cred->uid == tcred->uid &&
> -	     cred->gid == tcred->egid &&
> -	     cred->gid == tcred->sgid &&
> -		    cred->gid == tcred->gid))
> +	if (uid_eq(cred->uid, tcred->euid) &&
> +	    uid_eq(cred->uid, tcred->suid) &&
> +	    uid_eq(cred->uid, tcred->uid)  &&
> +	    gid_eq(cred->gid, tcred->egid) &&
> +	    gid_eq(cred->gid, tcred->sgid) &&
> +	    gid_eq(cred->gid, tcred->gid))
>  		return 0;
>  	if (ns_capable(tcred->user_ns, CAP_SYS_RESOURCE))
>  		return 0;
> -- 
> 1.7.2.5

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 25/43] userns: Store uid and gid types in vfs structures with kuid_t and kgid_t types
       [not found]     ` <1333862139-31737-25-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-04-18 18:57       ` Serge E. Hallyn
  0 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 18:57 UTC (permalink / raw)
  To: Eric W. Beiderman
  Cc: Linux Containers, linux-kernel,
	linux-fsdevel,
	linux-security-module, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

Quoting Eric W. Beiderman (ebiederm@xmission.com):
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> The conversion of all of the users is not done yet; there are too many to change
> in one go and leave the code reviewable. For now I change just the header and
> a few trivial users and rely on CONFIG_UIDGID_STRICT_TYPE_CHECKS not being set
> to ensure that the code will still compile during the transition.
> 
> Helper functions i_uid_read, i_uid_write, i_gid_read, i_gid_write are added
> so that in most cases filesystems can avoid the complexities of multiple user
> namespaces and can concentrate on moving their raw numeric values into and
> out of the vfs data structures.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

> ---
>  fs/inode.c         |    6 +++---
>  include/linux/fs.h |   36 +++++++++++++++++++++++++++++++-----
>  2 files changed, 34 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index f0c4ace..deb72f6 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -135,8 +135,8 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
>  	inode->i_fop = &empty_fops;
>  	inode->__i_nlink = 1;
>  	inode->i_opflags = 0;
> -	inode->i_uid = 0;
> -	inode->i_gid = 0;
> +	i_uid_write(inode, 0);
> +	i_gid_write(inode, 0);
>  	atomic_set(&inode->i_writecount, 0);
>  	inode->i_size = 0;
>  	inode->i_blocks = 0;
> @@ -1732,7 +1732,7 @@ EXPORT_SYMBOL(inode_init_owner);
>   */
>  bool inode_owner_or_capable(const struct inode *inode)
>  {
> -	if (current_fsuid() == inode->i_uid)
> +	if (uid_eq(current_fsuid(), inode->i_uid))
>  		return true;
>  	if (inode_capable(inode, CAP_FOWNER))
>  		return true;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index a6c5efb..797eb26 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -402,6 +402,7 @@ struct inodes_stat_t {
>  #include <linux/atomic.h>
>  #include <linux/shrinker.h>
>  #include <linux/migrate_mode.h>
> +#include <linux/uidgid.h>
>  
>  #include <asm/byteorder.h>
>  
> @@ -469,8 +470,8 @@ typedef void (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
>  struct iattr {
>  	unsigned int	ia_valid;
>  	umode_t		ia_mode;
> -	uid_t		ia_uid;
> -	gid_t		ia_gid;
> +	kuid_t		ia_uid;
> +	kgid_t		ia_gid;
>  	loff_t		ia_size;
>  	struct timespec	ia_atime;
>  	struct timespec	ia_mtime;
> @@ -761,8 +762,8 @@ struct posix_acl;
>  struct inode {
>  	umode_t			i_mode;
>  	unsigned short		i_opflags;
> -	uid_t			i_uid;
> -	gid_t			i_gid;
> +	kuid_t			i_uid;
> +	kgid_t			i_gid;
>  	unsigned int		i_flags;
>  
>  #ifdef CONFIG_FS_POSIX_ACL
> @@ -927,6 +928,31 @@ static inline void i_size_write(struct inode *inode, loff_t i_size)
>  #endif
>  }
>  
> +/* Helper functions so that in most cases filesystems will
> + * not need to deal directly with kuid_t and kgid_t and can
> + * instead deal with the raw numeric values that are stored
> + * in the filesystem.
> + */
> +static inline uid_t i_uid_read(const struct inode *inode)
> +{
> +	return from_kuid(&init_user_ns, inode->i_uid);
> +}
> +
> +static inline gid_t i_gid_read(const struct inode *inode)
> +{
> +	return from_kgid(&init_user_ns, inode->i_gid);
> +}
> +
> +static inline void i_uid_write(struct inode *inode, uid_t uid)
> +{
> +	inode->i_uid = make_kuid(&init_user_ns, uid);
> +}
> +
> +static inline void i_gid_write(struct inode *inode, gid_t gid)
> +{
> +	inode->i_gid = make_kgid(&init_user_ns, gid);
> +}
> +
>  static inline unsigned iminor(const struct inode *inode)
>  {
>  	return MINOR(inode->i_rdev);
> @@ -943,7 +969,7 @@ struct fown_struct {
>  	rwlock_t lock;          /* protects pid, uid, euid fields */
>  	struct pid *pid;	/* pid or -pgrp where SIGIO should be sent */
>  	enum pid_type pid_type;	/* Kind of process group SIGIO should be sent to */
> -	uid_t uid, euid;	/* uid/euid of process setting the owner */
> +	kuid_t uid, euid;	/* uid/euid of process setting the owner */
>  	int signum;		/* posix.1b rt signal to be delivered on IO */
>  };
>  
> -- 
> 1.7.2.5
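
As a rough userspace sketch of what the i_uid_read/i_uid_write helpers buy
a filesystem, assume the initial namespace maps every id to itself and that
the strict kuid type is a one-member struct. All *_model names here are
hypothetical stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t val; } kuid_model; /* stand-in for the strict kuid_t */

struct inode_model {
	kuid_model i_uid;
};

/* Assumed identity mapping for the initial user namespace. */
static kuid_model make_kuid_init_model(uint32_t uid)
{
	return (kuid_model){ uid };
}

static uint32_t from_kuid_init_model(kuid_model k)
{
	return k.val;
}

/* Filesystem-facing helpers: raw on-disk number in, raw on-disk number out. */
static uint32_t i_uid_read_model(const struct inode_model *inode)
{
	return from_kuid_init_model(inode->i_uid);
}

static void i_uid_write_model(struct inode_model *inode, uint32_t uid)
{
	/* "inode->i_uid = uid;" would not compile: an integer is not a kuid_model. */
	inode->i_uid = make_kuid_init_model(uid);
}
```

A filesystem that only shuttles numbers between disk and the inode can use
the two helpers and never touch the wrapped type directly, which is the
convenience the commit message describes.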

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 25/43] userns: Store uid and gid types in vfs structures with kuid_t and kgid_t types
  2012-04-08  5:15     ` Eric W. Biederman
  (?)
  (?)
@ 2012-04-18 18:57     ` Serge E. Hallyn
  -1 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 18:57 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

Quoting Eric W. Biederman (ebiederm@xmission.com):
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> The conversion of all of the users is not done yet; there are too many to change
> in one go and still leave the code reviewable. For now I change just the header and
> a few trivial users, and rely on CONFIG_UIDGID_STRICT_TYPE_CHECKS not being set
> to ensure that the code will still compile during the transition.
> 
> Helper functions i_uid_read, i_uid_write, i_gid_read, i_gid_write are added
> so that in most cases filesystems can avoid the complexities of multiple user
> namespaces and can concentrate on moving their raw numeric values into and
> out of the vfs data structures.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
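
For readers following along, here is a minimal userspace sketch of the idea
behind the helpers described above. It assumes the strict-type-check layout
in which kuid_t is a one-member struct, and it models the initial user
namespace mapping as the identity. All model_ names are hypothetical
stand-ins for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; the kernel's real types live in
 * <linux/uidgid.h>. */
typedef unsigned int model_uid_t;

/* Wrapping the value in a struct means '==' and plain integer
 * assignment no longer compile, so unconverted code is caught. */
typedef struct { model_uid_t val; } model_kuid_t;

struct model_inode {
	model_kuid_t i_uid;
};

/* Assumption: the initial namespace maps every id to itself. */
static model_kuid_t model_make_kuid(model_uid_t uid)
{
	model_kuid_t kuid = { uid };
	return kuid;
}

static model_uid_t model_from_kuid(model_kuid_t kuid)
{
	return kuid.val;
}

static bool model_uid_eq(model_kuid_t a, model_kuid_t b)
{
	return a.val == b.val;
}

/* Filesystem-facing helpers: raw numeric values in, raw values out. */
static model_uid_t model_i_uid_read(const struct model_inode *inode)
{
	return model_from_kuid(inode->i_uid);
}

static void model_i_uid_write(struct model_inode *inode, model_uid_t uid)
{
	inode->i_uid = model_make_kuid(uid);
}
```

A filesystem in this model only ever calls model_i_uid_read() and
model_i_uid_write() with the numbers it stores on disk, while comparisons
inside the vfs go through model_uid_eq().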

> ---
>  fs/inode.c         |    6 +++---
>  include/linux/fs.h |   36 +++++++++++++++++++++++++++++++-----
>  2 files changed, 34 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index f0c4ace..deb72f6 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -135,8 +135,8 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
>  	inode->i_fop = &empty_fops;
>  	inode->__i_nlink = 1;
>  	inode->i_opflags = 0;
> -	inode->i_uid = 0;
> -	inode->i_gid = 0;
> +	i_uid_write(inode, 0);
> +	i_gid_write(inode, 0);
>  	atomic_set(&inode->i_writecount, 0);
>  	inode->i_size = 0;
>  	inode->i_blocks = 0;
> @@ -1732,7 +1732,7 @@ EXPORT_SYMBOL(inode_init_owner);
>   */
>  bool inode_owner_or_capable(const struct inode *inode)
>  {
> -	if (current_fsuid() == inode->i_uid)
> +	if (uid_eq(current_fsuid(), inode->i_uid))
>  		return true;
>  	if (inode_capable(inode, CAP_FOWNER))
>  		return true;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index a6c5efb..797eb26 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -402,6 +402,7 @@ struct inodes_stat_t {
>  #include <linux/atomic.h>
>  #include <linux/shrinker.h>
>  #include <linux/migrate_mode.h>
> +#include <linux/uidgid.h>
>  
>  #include <asm/byteorder.h>
>  
> @@ -469,8 +470,8 @@ typedef void (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
>  struct iattr {
>  	unsigned int	ia_valid;
>  	umode_t		ia_mode;
> -	uid_t		ia_uid;
> -	gid_t		ia_gid;
> +	kuid_t		ia_uid;
> +	kgid_t		ia_gid;
>  	loff_t		ia_size;
>  	struct timespec	ia_atime;
>  	struct timespec	ia_mtime;
> @@ -761,8 +762,8 @@ struct posix_acl;
>  struct inode {
>  	umode_t			i_mode;
>  	unsigned short		i_opflags;
> -	uid_t			i_uid;
> -	gid_t			i_gid;
> +	kuid_t			i_uid;
> +	kgid_t			i_gid;
>  	unsigned int		i_flags;
>  
>  #ifdef CONFIG_FS_POSIX_ACL
> @@ -927,6 +928,31 @@ static inline void i_size_write(struct inode *inode, loff_t i_size)
>  #endif
>  }
>  
> +/* Helper functions so that in most cases filesystems will
> + * not need to deal directly with kuid_t and kgid_t and can
> + * instead deal with the raw numeric values that are stored
> + * in the filesystem.
> + */
> +static inline uid_t i_uid_read(const struct inode *inode)
> +{
> +	return from_kuid(&init_user_ns, inode->i_uid);
> +}
> +
> +static inline gid_t i_gid_read(const struct inode *inode)
> +{
> +	return from_kgid(&init_user_ns, inode->i_gid);
> +}
> +
> +static inline void i_uid_write(struct inode *inode, uid_t uid)
> +{
> +	inode->i_uid = make_kuid(&init_user_ns, uid);
> +}
> +
> +static inline void i_gid_write(struct inode *inode, gid_t gid)
> +{
> +	inode->i_gid = make_kgid(&init_user_ns, gid);
> +}
> +
>  static inline unsigned iminor(const struct inode *inode)
>  {
>  	return MINOR(inode->i_rdev);
> @@ -943,7 +969,7 @@ struct fown_struct {
>  	rwlock_t lock;          /* protects pid, uid, euid fields */
>  	struct pid *pid;	/* pid or -pgrp where SIGIO should be sent */
>  	enum pid_type pid_type;	/* Kind of process group SIGIO should be sent to */
> -	uid_t uid, euid;	/* uid/euid of process setting the owner */
> +	kuid_t uid, euid;	/* uid/euid of process setting the owner */
>  	int signum;		/* posix.1b rt signal to be delivered on IO */
>  };
>  
> -- 
> 1.7.2.5
> 

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 26/43] userns: Convert in_group_p and in_egroup_p to use kgid_t
       [not found]     ` <1333862139-31737-26-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-04-18 18:58       ` Serge E. Hallyn
  0 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 18:58 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> 
> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

> ---
>  include/linux/cred.h |    4 ++--
>  kernel/groups.c      |   14 ++++++--------
>  2 files changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/cred.h b/include/linux/cred.h
> index fac0579..917dc5a 100644
> --- a/include/linux/cred.h
> +++ b/include/linux/cred.h
> @@ -73,8 +73,8 @@ extern int groups_search(const struct group_info *, kgid_t);
>  #define GROUP_AT(gi, i) \
>  	((gi)->blocks[(i) / NGROUPS_PER_BLOCK][(i) % NGROUPS_PER_BLOCK])
>  
> -extern int in_group_p(gid_t);
> -extern int in_egroup_p(gid_t);
> +extern int in_group_p(kgid_t);
> +extern int in_egroup_p(kgid_t);
>  
>  /*
>   * The common credentials for a thread group
> diff --git a/kernel/groups.c b/kernel/groups.c
> index 84156f2..6b2588d 100644
> --- a/kernel/groups.c
> +++ b/kernel/groups.c
> @@ -256,27 +256,25 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
>  /*
>   * Check whether we're fsgid/egid or in the supplemental group..
>   */
> -int in_group_p(gid_t grp)
> +int in_group_p(kgid_t grp)
>  {
>  	const struct cred *cred = current_cred();
>  	int retval = 1;
>  
> -	if (grp != cred->fsgid)
> -		retval = groups_search(cred->group_info,
> -				       make_kgid(cred->user_ns, grp));
> +	if (!gid_eq(grp, cred->fsgid))
> +		retval = groups_search(cred->group_info, grp);
>  	return retval;
>  }
>  
>  EXPORT_SYMBOL(in_group_p);
>  
> -int in_egroup_p(gid_t grp)
> +int in_egroup_p(kgid_t grp)
>  {
>  	const struct cred *cred = current_cred();
>  	int retval = 1;
>  
> -	if (grp != cred->egid)
> -		retval = groups_search(cred->group_info,
> -				       make_kgid(cred->user_ns, grp));
> +	if (!gid_eq(grp, cred->egid))
> +		retval = groups_search(cred->group_info, grp);
>  	return retval;
>  }
>  
> -- 
> 1.7.2.5
> 

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 26/43] userns: Convert in_group_p and in_egroup_p to use kgid_t
  2012-04-08  5:15     ` Eric W. Biederman
  (?)
  (?)
@ 2012-04-18 18:58     ` Serge E. Hallyn
  -1 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 18:58 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

Quoting Eric W. Biederman (ebiederm@xmission.com):
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

> ---
>  include/linux/cred.h |    4 ++--
>  kernel/groups.c      |   14 ++++++--------
>  2 files changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/cred.h b/include/linux/cred.h
> index fac0579..917dc5a 100644
> --- a/include/linux/cred.h
> +++ b/include/linux/cred.h
> @@ -73,8 +73,8 @@ extern int groups_search(const struct group_info *, kgid_t);
>  #define GROUP_AT(gi, i) \
>  	((gi)->blocks[(i) / NGROUPS_PER_BLOCK][(i) % NGROUPS_PER_BLOCK])
>  
> -extern int in_group_p(gid_t);
> -extern int in_egroup_p(gid_t);
> +extern int in_group_p(kgid_t);
> +extern int in_egroup_p(kgid_t);
>  
>  /*
>   * The common credentials for a thread group
> diff --git a/kernel/groups.c b/kernel/groups.c
> index 84156f2..6b2588d 100644
> --- a/kernel/groups.c
> +++ b/kernel/groups.c
> @@ -256,27 +256,25 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
>  /*
>   * Check whether we're fsgid/egid or in the supplemental group..
>   */
> -int in_group_p(gid_t grp)
> +int in_group_p(kgid_t grp)
>  {
>  	const struct cred *cred = current_cred();
>  	int retval = 1;
>  
> -	if (grp != cred->fsgid)
> -		retval = groups_search(cred->group_info,
> -				       make_kgid(cred->user_ns, grp));
> +	if (!gid_eq(grp, cred->fsgid))
> +		retval = groups_search(cred->group_info, grp);
>  	return retval;
>  }
>  
>  EXPORT_SYMBOL(in_group_p);
>  
> -int in_egroup_p(gid_t grp)
> +int in_egroup_p(kgid_t grp)
>  {
>  	const struct cred *cred = current_cred();
>  	int retval = 1;
>  
> -	if (grp != cred->egid)
> -		retval = groups_search(cred->group_info,
> -				       make_kgid(cred->user_ns, grp));
> +	if (!gid_eq(grp, cred->egid))
> +		retval = groups_search(cred->group_info, grp);
>  	return retval;
>  }
>  
> -- 
> 1.7.2.5
> 

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 27/43] userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
  2012-04-08  5:15 ` [PATCH 27/43] userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs Eric W. Biederman
@ 2012-04-18 19:02       ` Serge E. Hallyn
  0 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 19:02 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> 
> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> ---
>  fs/attr.c                |    8 ++++----
>  fs/exec.c                |   10 +++++-----
>  fs/fcntl.c               |    6 +++---
>  fs/ioprio.c              |    4 ++--
>  fs/locks.c               |    2 +-
>  fs/namei.c               |    8 ++++----
>  include/linux/quotaops.h |    4 ++--
>  7 files changed, 21 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/attr.c b/fs/attr.c
> index 73f69a6..2f094c6 100644
> --- a/fs/attr.c
> +++ b/fs/attr.c
> @@ -47,14 +47,14 @@ int inode_change_ok(const struct inode *inode, struct iattr *attr)
>  
>  	/* Make sure a caller can chown. */
>  	if ((ia_valid & ATTR_UID) &&
> -	    (current_fsuid() != inode->i_uid ||
> -	     attr->ia_uid != inode->i_uid) && !capable(CAP_CHOWN))
> +	    (!uid_eq(current_fsuid(), inode->i_uid) ||
> +	     !uid_eq(attr->ia_uid, inode->i_uid)) && !capable(CAP_CHOWN))
>  		return -EPERM;
>  
>  	/* Make sure caller can chgrp. */
>  	if ((ia_valid & ATTR_GID) &&
> -	    (current_fsuid() != inode->i_uid ||
> -	    (!in_group_p(attr->ia_gid) && attr->ia_gid != inode->i_gid)) &&
> +	    (!uid_eq(current_fsuid(), inode->i_uid) ||
> +	    (!in_group_p(attr->ia_gid) && gid_eq(attr->ia_gid, inode->i_gid))) &&

Shouldn't this be !gid_eq()?  The line being replaced tested
attr->ia_gid != inode->i_gid.
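
A small userspace sketch of the difference, with the four inputs reduced to
booleans. The function names are hypothetical and only model the shape of
the chgrp condition in inode_change_ok(); they are not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Pre-patch condition: deny when the caller is not the owner, or is
 * changing the group to one it is not a member of, unless capable. */
static bool chgrp_denied_before(bool is_owner, bool in_group,
				bool gid_same, bool capable)
{
	return (!is_owner || (!in_group && !gid_same)) && !capable;
}

/* Condition as posted in the hunk above: the negation on the gid
 * comparison has been dropped. */
static bool chgrp_denied_as_posted(bool is_owner, bool in_group,
				   bool gid_same, bool capable)
{
	return (!is_owner || (!in_group && gid_same)) && !capable;
}
```

For an unprivileged owner who is not a member of the target group, the two
versions disagree in both directions: a no-op chgrp (gid unchanged) goes
from allowed to -EPERM, and a chgrp to a foreign group goes from -EPERM to
allowed.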

>  	    !capable(CAP_CHOWN))
>  		return -EPERM;
>  
> diff --git a/fs/exec.c b/fs/exec.c
> index 9a1d9f0..00ae2ef 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1139,7 +1139,7 @@ void setup_new_exec(struct linux_binprm * bprm)
>  	/* This is the point of no return */
>  	current->sas_ss_sp = current->sas_ss_size = 0;
>  
> -	if (current_euid() == current_uid() && current_egid() == current_gid())
> +	if (uid_eq(current_euid(), current_uid()) && gid_eq(current_egid(), current_gid()))
>  		set_dumpable(current->mm, 1);
>  	else
>  		set_dumpable(current->mm, suid_dumpable);
> @@ -1153,8 +1153,8 @@ void setup_new_exec(struct linux_binprm * bprm)
>  	current->mm->task_size = TASK_SIZE;
>  
>  	/* install the new credentials */
> -	if (bprm->cred->uid != current_euid() ||
> -	    bprm->cred->gid != current_egid()) {
> +	if (!uid_eq(bprm->cred->uid, current_euid()) ||
> +	    !gid_eq(bprm->cred->gid, current_egid())) {
>  		current->pdeath_signal = 0;
>  	} else {
>  		would_dump(bprm, bprm->file);
> @@ -2120,7 +2120,7 @@ void do_coredump(long signr, int exit_code, struct pt_regs *regs)
>  	if (__get_dumpable(cprm.mm_flags) == 2) {
>  		/* Setuid core dump mode */
>  		flag = O_EXCL;		/* Stop rewrite attacks */
> -		cred->fsuid = 0;	/* Dump root private */
> +		cred->fsuid = GLOBAL_ROOT_UID;	/* Dump root private */
>  	}
>  
>  	retval = coredump_wait(exit_code, &core_state);
> @@ -2221,7 +2221,7 @@ void do_coredump(long signr, int exit_code, struct pt_regs *regs)
>  		 * Dont allow local users get cute and trick others to coredump
>  		 * into their pre-created files.
>  		 */
> -		if (inode->i_uid != current_fsuid())
> +		if (!uid_eq(inode->i_uid, current_fsuid()))
>  			goto close_fail;
>  		if (!cprm.file->f_op || !cprm.file->f_op->write)
>  			goto close_fail;
> diff --git a/fs/fcntl.c b/fs/fcntl.c
> index 75e7c1f..d078b75 100644
> --- a/fs/fcntl.c
> +++ b/fs/fcntl.c
> @@ -532,9 +532,9 @@ static inline int sigio_perm(struct task_struct *p,
>  
>  	rcu_read_lock();
>  	cred = __task_cred(p);
> -	ret = ((fown->euid == 0 ||
> -		fown->euid == cred->suid || fown->euid == cred->uid ||
> -		fown->uid  == cred->suid || fown->uid  == cred->uid) &&
> +	ret = ((uid_eq(fown->euid, GLOBAL_ROOT_UID) ||
> +		uid_eq(fown->euid, cred->suid) || uid_eq(fown->euid, cred->uid) ||
> +		uid_eq(fown->uid,  cred->suid) || uid_eq(fown->uid,  cred->uid)) &&
>  	       !security_file_send_sigiotask(p, fown, sig));
>  	rcu_read_unlock();
>  	return ret;
> diff --git a/fs/ioprio.c b/fs/ioprio.c
> index 2072e41..5e6dbe89 100644
> --- a/fs/ioprio.c
> +++ b/fs/ioprio.c
> @@ -37,8 +37,8 @@ int set_task_ioprio(struct task_struct *task, int ioprio)
>  
>  	rcu_read_lock();
>  	tcred = __task_cred(task);
> -	if (tcred->uid != cred->euid &&
> -	    tcred->uid != cred->uid && !capable(CAP_SYS_NICE)) {
> +	if (!uid_eq(tcred->uid, cred->euid) &&
> +	    !uid_eq(tcred->uid, cred->uid) && !capable(CAP_SYS_NICE)) {
>  		rcu_read_unlock();
>  		return -EPERM;
>  	}
> diff --git a/fs/locks.c b/fs/locks.c
> index 637694b..3e946cd 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -1445,7 +1445,7 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
>  	struct inode *inode = dentry->d_inode;
>  	int error;
>  
> -	if ((current_fsuid() != inode->i_uid) && !capable(CAP_LEASE))
> +	if ((!uid_eq(current_fsuid(), inode->i_uid)) && !capable(CAP_LEASE))
>  		return -EACCES;
>  	if (!S_ISREG(inode->i_mode))
>  		return -EINVAL;
> diff --git a/fs/namei.c b/fs/namei.c
> index 941c436..86512b4 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -228,7 +228,7 @@ static int acl_permission_check(struct inode *inode, int mask)
>  {
>  	unsigned int mode = inode->i_mode;
>  
> -	if (likely(current_fsuid() == inode->i_uid))
> +	if (likely(uid_eq(current_fsuid(), inode->i_uid)))
>  		mode >>= 6;
>  	else {
>  		if (IS_POSIXACL(inode) && (mode & S_IRWXG)) {
> @@ -1956,13 +1956,13 @@ static int user_path_parent(int dfd, const char __user *path,
>   */
>  static inline int check_sticky(struct inode *dir, struct inode *inode)
>  {
> -	uid_t fsuid = current_fsuid();
> +	kuid_t fsuid = current_fsuid();
>  
>  	if (!(dir->i_mode & S_ISVTX))
>  		return 0;
> -	if (inode->i_uid == fsuid)
> +	if (uid_eq(inode->i_uid, fsuid))
>  		return 0;
> -	if (dir->i_uid == fsuid)
> +	if (uid_eq(dir->i_uid, fsuid))
>  		return 0;
>  	return !inode_capable(inode, CAP_FOWNER);
>  }
> diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h
> index d93f95e..17b9773 100644
> --- a/include/linux/quotaops.h
> +++ b/include/linux/quotaops.h
> @@ -22,8 +22,8 @@ static inline struct quota_info *sb_dqopt(struct super_block *sb)
>  static inline bool is_quota_modification(struct inode *inode, struct iattr *ia)
>  {
>  	return (ia->ia_valid & ATTR_SIZE && ia->ia_size != inode->i_size) ||
> -		(ia->ia_valid & ATTR_UID && ia->ia_uid != inode->i_uid) ||
> -		(ia->ia_valid & ATTR_GID && ia->ia_gid != inode->i_gid);
> +		(ia->ia_valid & ATTR_UID && !uid_eq(ia->ia_uid, inode->i_uid)) ||
> +		(ia->ia_valid & ATTR_GID && !gid_eq(ia->ia_gid, inode->i_gid));
>  }
>  
>  #if defined(CONFIG_QUOTA)
> -- 
> 1.7.2.5
> 

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 27/43] userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
@ 2012-04-18 19:02       ` Serge E. Hallyn
  0 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 19:02 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

Quoting Eric W. Biederman (ebiederm@xmission.com):
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  fs/attr.c                |    8 ++++----
>  fs/exec.c                |   10 +++++-----
>  fs/fcntl.c               |    6 +++---
>  fs/ioprio.c              |    4 ++--
>  fs/locks.c               |    2 +-
>  fs/namei.c               |    8 ++++----
>  include/linux/quotaops.h |    4 ++--
>  7 files changed, 21 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/attr.c b/fs/attr.c
> index 73f69a6..2f094c6 100644
> --- a/fs/attr.c
> +++ b/fs/attr.c
> @@ -47,14 +47,14 @@ int inode_change_ok(const struct inode *inode, struct iattr *attr)
>  
>  	/* Make sure a caller can chown. */
>  	if ((ia_valid & ATTR_UID) &&
> -	    (current_fsuid() != inode->i_uid ||
> -	     attr->ia_uid != inode->i_uid) && !capable(CAP_CHOWN))
> +	    (!uid_eq(current_fsuid(), inode->i_uid) ||
> +	     !uid_eq(attr->ia_uid, inode->i_uid)) && !capable(CAP_CHOWN))
>  		return -EPERM;
>  
>  	/* Make sure caller can chgrp. */
>  	if ((ia_valid & ATTR_GID) &&
> -	    (current_fsuid() != inode->i_uid ||
> -	    (!in_group_p(attr->ia_gid) && attr->ia_gid != inode->i_gid)) &&
> +	    (!uid_eq(current_fsuid(), inode->i_uid) ||
> +	    (!in_group_p(attr->ia_gid) && gid_eq(attr->ia_gid, inode->i_gid))) &&

Shouldn't this be !gid_eq()?  The line being replaced tested
attr->ia_gid != inode->i_gid.

>  	    !capable(CAP_CHOWN))
>  		return -EPERM;
>  
> diff --git a/fs/exec.c b/fs/exec.c
> index 9a1d9f0..00ae2ef 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1139,7 +1139,7 @@ void setup_new_exec(struct linux_binprm * bprm)
>  	/* This is the point of no return */
>  	current->sas_ss_sp = current->sas_ss_size = 0;
>  
> -	if (current_euid() == current_uid() && current_egid() == current_gid())
> +	if (uid_eq(current_euid(), current_uid()) && gid_eq(current_egid(), current_gid()))
>  		set_dumpable(current->mm, 1);
>  	else
>  		set_dumpable(current->mm, suid_dumpable);
> @@ -1153,8 +1153,8 @@ void setup_new_exec(struct linux_binprm * bprm)
>  	current->mm->task_size = TASK_SIZE;
>  
>  	/* install the new credentials */
> -	if (bprm->cred->uid != current_euid() ||
> -	    bprm->cred->gid != current_egid()) {
> +	if (!uid_eq(bprm->cred->uid, current_euid()) ||
> +	    !gid_eq(bprm->cred->gid, current_egid())) {
>  		current->pdeath_signal = 0;
>  	} else {
>  		would_dump(bprm, bprm->file);
> @@ -2120,7 +2120,7 @@ void do_coredump(long signr, int exit_code, struct pt_regs *regs)
>  	if (__get_dumpable(cprm.mm_flags) == 2) {
>  		/* Setuid core dump mode */
>  		flag = O_EXCL;		/* Stop rewrite attacks */
> -		cred->fsuid = 0;	/* Dump root private */
> +		cred->fsuid = GLOBAL_ROOT_UID;	/* Dump root private */
>  	}
>  
>  	retval = coredump_wait(exit_code, &core_state);
> @@ -2221,7 +2221,7 @@ void do_coredump(long signr, int exit_code, struct pt_regs *regs)
>  		 * Dont allow local users get cute and trick others to coredump
>  		 * into their pre-created files.
>  		 */
> -		if (inode->i_uid != current_fsuid())
> +		if (!uid_eq(inode->i_uid, current_fsuid()))
>  			goto close_fail;
>  		if (!cprm.file->f_op || !cprm.file->f_op->write)
>  			goto close_fail;
> diff --git a/fs/fcntl.c b/fs/fcntl.c
> index 75e7c1f..d078b75 100644
> --- a/fs/fcntl.c
> +++ b/fs/fcntl.c
> @@ -532,9 +532,9 @@ static inline int sigio_perm(struct task_struct *p,
>  
>  	rcu_read_lock();
>  	cred = __task_cred(p);
> -	ret = ((fown->euid == 0 ||
> -		fown->euid == cred->suid || fown->euid == cred->uid ||
> -		fown->uid  == cred->suid || fown->uid  == cred->uid) &&
> +	ret = ((uid_eq(fown->euid, GLOBAL_ROOT_UID) ||
> +		uid_eq(fown->euid, cred->suid) || uid_eq(fown->euid, cred->uid) ||
> +		uid_eq(fown->uid,  cred->suid) || uid_eq(fown->uid,  cred->uid)) &&
>  	       !security_file_send_sigiotask(p, fown, sig));
>  	rcu_read_unlock();
>  	return ret;
> diff --git a/fs/ioprio.c b/fs/ioprio.c
> index 2072e41..5e6dbe89 100644
> --- a/fs/ioprio.c
> +++ b/fs/ioprio.c
> @@ -37,8 +37,8 @@ int set_task_ioprio(struct task_struct *task, int ioprio)
>  
>  	rcu_read_lock();
>  	tcred = __task_cred(task);
> -	if (tcred->uid != cred->euid &&
> -	    tcred->uid != cred->uid && !capable(CAP_SYS_NICE)) {
> +	if (!uid_eq(tcred->uid, cred->euid) &&
> +	    !uid_eq(tcred->uid, cred->uid) && !capable(CAP_SYS_NICE)) {
>  		rcu_read_unlock();
>  		return -EPERM;
>  	}
> diff --git a/fs/locks.c b/fs/locks.c
> index 637694b..3e946cd 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -1445,7 +1445,7 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
>  	struct inode *inode = dentry->d_inode;
>  	int error;
>  
> -	if ((current_fsuid() != inode->i_uid) && !capable(CAP_LEASE))
> +	if ((!uid_eq(current_fsuid(), inode->i_uid)) && !capable(CAP_LEASE))
>  		return -EACCES;
>  	if (!S_ISREG(inode->i_mode))
>  		return -EINVAL;
> diff --git a/fs/namei.c b/fs/namei.c
> index 941c436..86512b4 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -228,7 +228,7 @@ static int acl_permission_check(struct inode *inode, int mask)
>  {
>  	unsigned int mode = inode->i_mode;
>  
> -	if (likely(current_fsuid() == inode->i_uid))
> +	if (likely(uid_eq(current_fsuid(), inode->i_uid)))
>  		mode >>= 6;
>  	else {
>  		if (IS_POSIXACL(inode) && (mode & S_IRWXG)) {
> @@ -1956,13 +1956,13 @@ static int user_path_parent(int dfd, const char __user *path,
>   */
>  static inline int check_sticky(struct inode *dir, struct inode *inode)
>  {
> -	uid_t fsuid = current_fsuid();
> +	kuid_t fsuid = current_fsuid();
>  
>  	if (!(dir->i_mode & S_ISVTX))
>  		return 0;
> -	if (inode->i_uid == fsuid)
> +	if (uid_eq(inode->i_uid, fsuid))
>  		return 0;
> -	if (dir->i_uid == fsuid)
> +	if (uid_eq(dir->i_uid, fsuid))
>  		return 0;
>  	return !inode_capable(inode, CAP_FOWNER);
>  }
> diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h
> index d93f95e..17b9773 100644
> --- a/include/linux/quotaops.h
> +++ b/include/linux/quotaops.h
> @@ -22,8 +22,8 @@ static inline struct quota_info *sb_dqopt(struct super_block *sb)
>  static inline bool is_quota_modification(struct inode *inode, struct iattr *ia)
>  {
>  	return (ia->ia_valid & ATTR_SIZE && ia->ia_size != inode->i_size) ||
> -		(ia->ia_valid & ATTR_UID && ia->ia_uid != inode->i_uid) ||
> -		(ia->ia_valid & ATTR_GID && ia->ia_gid != inode->i_gid);
> +		(ia->ia_valid & ATTR_UID && !uid_eq(ia->ia_uid, inode->i_uid)) ||
> +		(ia->ia_valid & ATTR_GID && !gid_eq(ia->ia_gid, inode->i_gid));
>  }
>  
>  #if defined(CONFIG_QUOTA)
> -- 
> 1.7.2.5
> 

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 27/43] userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
  2012-04-08  5:15 ` [PATCH 27/43] userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs Eric W. Biederman
@ 2012-04-18 19:03       ` Serge E. Hallyn
  0 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 19:03 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> 
> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> ---
>  fs/attr.c                |    8 ++++----
>  fs/exec.c                |   10 +++++-----
>  fs/fcntl.c               |    6 +++---
>  fs/ioprio.c              |    4 ++--
>  fs/locks.c               |    2 +-
>  fs/namei.c               |    8 ++++----
>  include/linux/quotaops.h |    4 ++--
>  7 files changed, 21 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/attr.c b/fs/attr.c
> index 73f69a6..2f094c6 100644
> --- a/fs/attr.c
> +++ b/fs/attr.c
> @@ -47,14 +47,14 @@ int inode_change_ok(const struct inode *inode, struct iattr *attr)
>  
>  	/* Make sure a caller can chown. */
>  	if ((ia_valid & ATTR_UID) &&
> -	    (current_fsuid() != inode->i_uid ||
> -	     attr->ia_uid != inode->i_uid) && !capable(CAP_CHOWN))
> +	    (!uid_eq(current_fsuid(), inode->i_uid) ||
> +	     !uid_eq(attr->ia_uid, inode->i_uid)) && !capable(CAP_CHOWN))
>  		return -EPERM;
>  
>  	/* Make sure caller can chgrp. */
>  	if ((ia_valid & ATTR_GID) &&
> -	    (current_fsuid() != inode->i_uid ||
> -	    (!in_group_p(attr->ia_gid) && attr->ia_gid != inode->i_gid)) &&
> +	    (!uid_eq(current_fsuid(), inode->i_uid) ||
> +	    (!in_group_p(attr->ia_gid) && gid_eq(attr->ia_gid, inode->i_gid))) &&
>  	    !capable(CAP_CHOWN))
>  		return -EPERM;
>  
> diff --git a/fs/exec.c b/fs/exec.c
> index 9a1d9f0..00ae2ef 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1139,7 +1139,7 @@ void setup_new_exec(struct linux_binprm * bprm)
>  	/* This is the point of no return */
>  	current->sas_ss_sp = current->sas_ss_size = 0;
>  
> -	if (current_euid() == current_uid() && current_egid() == current_gid())
> +	if (uid_eq(current_euid(), current_uid()) && gid_eq(current_egid(), current_gid()))
>  		set_dumpable(current->mm, 1);
>  	else
>  		set_dumpable(current->mm, suid_dumpable);
> @@ -1153,8 +1153,8 @@ void setup_new_exec(struct linux_binprm * bprm)
>  	current->mm->task_size = TASK_SIZE;
>  
>  	/* install the new credentials */
> -	if (bprm->cred->uid != current_euid() ||
> -	    bprm->cred->gid != current_egid()) {
> +	if (!uid_eq(bprm->cred->uid, current_euid()) ||
> +	    !gid_eq(bprm->cred->gid, current_egid())) {
>  		current->pdeath_signal = 0;
>  	} else {
>  		would_dump(bprm, bprm->file);
> @@ -2120,7 +2120,7 @@ void do_coredump(long signr, int exit_code, struct pt_regs *regs)
>  	if (__get_dumpable(cprm.mm_flags) == 2) {
>  		/* Setuid core dump mode */
>  		flag = O_EXCL;		/* Stop rewrite attacks */
> -		cred->fsuid = 0;	/* Dump root private */
> +		cred->fsuid = GLOBAL_ROOT_UID;	/* Dump root private */

Sorry, one more: can this be the per-ns root uid?  It should be ok for the
coredumps to belong to privileged users in the namespace, right?
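
To make the question concrete, here is a userspace sketch of what "per-ns
root" would resolve to under a single-extent uid mapping. The model_ names
and the one-extent layout are assumptions for illustration only; in the
kernel the equivalent would presumably be make_kuid() called with the
dumping task's user namespace and uid 0.

```c
#include <assert.h>

typedef unsigned int model_uid_t;

/* (uid_t)-1 is reserved as the error value, as in the cover letter. */
#define MODEL_INVALID_UID ((model_uid_t)-1)

/* One mapping extent: ids [first, first + count) inside the namespace
 * correspond to [lower_first, lower_first + count) kernel-wide. */
struct model_uid_extent {
	model_uid_t first;
	model_uid_t lower_first;
	model_uid_t count;
};

/* Translate a namespace-local uid to its kernel-wide value, or
 * MODEL_INVALID_UID when the uid has no mapping. */
static model_uid_t model_map_down(const struct model_uid_extent *e,
				  model_uid_t uid)
{
	if (uid >= e->first && uid - e->first < e->count)
		return e->lower_first + (uid - e->first);
	return MODEL_INVALID_UID;
}
```

With an extent of { 0, 100000, 65536 }, uid 0 inside the namespace maps to
100000 kernel-wide, so a core file owned by the per-ns root would belong to
100000 rather than to the global root that GLOBAL_ROOT_UID selects.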

>  	}
>  
>  	retval = coredump_wait(exit_code, &core_state);
> @@ -2221,7 +2221,7 @@ void do_coredump(long signr, int exit_code, struct pt_regs *regs)
>  		 * Dont allow local users get cute and trick others to coredump
>  		 * into their pre-created files.
>  		 */
> -		if (inode->i_uid != current_fsuid())
> +		if (!uid_eq(inode->i_uid, current_fsuid()))
>  			goto close_fail;
>  		if (!cprm.file->f_op || !cprm.file->f_op->write)
>  			goto close_fail;
> diff --git a/fs/fcntl.c b/fs/fcntl.c
> index 75e7c1f..d078b75 100644
> --- a/fs/fcntl.c
> +++ b/fs/fcntl.c
> @@ -532,9 +532,9 @@ static inline int sigio_perm(struct task_struct *p,
>  
>  	rcu_read_lock();
>  	cred = __task_cred(p);
> -	ret = ((fown->euid == 0 ||
> -		fown->euid == cred->suid || fown->euid == cred->uid ||
> -		fown->uid  == cred->suid || fown->uid  == cred->uid) &&
> +	ret = ((uid_eq(fown->euid, GLOBAL_ROOT_UID) ||
> +		uid_eq(fown->euid, cred->suid) || uid_eq(fown->euid, cred->uid) ||
> +		uid_eq(fown->uid,  cred->suid) || uid_eq(fown->uid,  cred->uid)) &&
>  	       !security_file_send_sigiotask(p, fown, sig));
>  	rcu_read_unlock();
>  	return ret;
> diff --git a/fs/ioprio.c b/fs/ioprio.c
> index 2072e41..5e6dbe89 100644
> --- a/fs/ioprio.c
> +++ b/fs/ioprio.c
> @@ -37,8 +37,8 @@ int set_task_ioprio(struct task_struct *task, int ioprio)
>  
>  	rcu_read_lock();
>  	tcred = __task_cred(task);
> -	if (tcred->uid != cred->euid &&
> -	    tcred->uid != cred->uid && !capable(CAP_SYS_NICE)) {
> +	if (!uid_eq(tcred->uid, cred->euid) &&
> +	    !uid_eq(tcred->uid, cred->uid) && !capable(CAP_SYS_NICE)) {
>  		rcu_read_unlock();
>  		return -EPERM;
>  	}
> diff --git a/fs/locks.c b/fs/locks.c
> index 637694b..3e946cd 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -1445,7 +1445,7 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
>  	struct inode *inode = dentry->d_inode;
>  	int error;
>  
> -	if ((current_fsuid() != inode->i_uid) && !capable(CAP_LEASE))
> +	if ((!uid_eq(current_fsuid(), inode->i_uid)) && !capable(CAP_LEASE))
>  		return -EACCES;
>  	if (!S_ISREG(inode->i_mode))
>  		return -EINVAL;
> diff --git a/fs/namei.c b/fs/namei.c
> index 941c436..86512b4 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -228,7 +228,7 @@ static int acl_permission_check(struct inode *inode, int mask)
>  {
>  	unsigned int mode = inode->i_mode;
>  
> -	if (likely(current_fsuid() == inode->i_uid))
> +	if (likely(uid_eq(current_fsuid(), inode->i_uid)))
>  		mode >>= 6;
>  	else {
>  		if (IS_POSIXACL(inode) && (mode & S_IRWXG)) {
> @@ -1956,13 +1956,13 @@ static int user_path_parent(int dfd, const char __user *path,
>   */
>  static inline int check_sticky(struct inode *dir, struct inode *inode)
>  {
> -	uid_t fsuid = current_fsuid();
> +	kuid_t fsuid = current_fsuid();
>  
>  	if (!(dir->i_mode & S_ISVTX))
>  		return 0;
> -	if (inode->i_uid == fsuid)
> +	if (uid_eq(inode->i_uid, fsuid))
>  		return 0;
> -	if (dir->i_uid == fsuid)
> +	if (uid_eq(dir->i_uid, fsuid))
>  		return 0;
>  	return !inode_capable(inode, CAP_FOWNER);
>  }
> diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h
> index d93f95e..17b9773 100644
> --- a/include/linux/quotaops.h
> +++ b/include/linux/quotaops.h
> @@ -22,8 +22,8 @@ static inline struct quota_info *sb_dqopt(struct super_block *sb)
>  static inline bool is_quota_modification(struct inode *inode, struct iattr *ia)
>  {
>  	return (ia->ia_valid & ATTR_SIZE && ia->ia_size != inode->i_size) ||
> -		(ia->ia_valid & ATTR_UID && ia->ia_uid != inode->i_uid) ||
> -		(ia->ia_valid & ATTR_GID && ia->ia_gid != inode->i_gid);
> +		(ia->ia_valid & ATTR_UID && !uid_eq(ia->ia_uid, inode->i_uid)) ||
> +		(ia->ia_valid & ATTR_GID && !gid_eq(ia->ia_gid, inode->i_gid));
>  }
>  
>  #if defined(CONFIG_QUOTA)
> -- 
> 1.7.2.5
> 
> _______________________________________________
> Containers mailing list
> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 27/43] userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
@ 2012-04-18 19:03       ` Serge E. Hallyn
  0 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 19:03 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

Quoting Eric W. Biederman (ebiederm@xmission.com):
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  fs/attr.c                |    8 ++++----
>  fs/exec.c                |   10 +++++-----
>  fs/fcntl.c               |    6 +++---
>  fs/ioprio.c              |    4 ++--
>  fs/locks.c               |    2 +-
>  fs/namei.c               |    8 ++++----
>  include/linux/quotaops.h |    4 ++--
>  7 files changed, 21 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/attr.c b/fs/attr.c
> index 73f69a6..2f094c6 100644
> --- a/fs/attr.c
> +++ b/fs/attr.c
> @@ -47,14 +47,14 @@ int inode_change_ok(const struct inode *inode, struct iattr *attr)
>  
>  	/* Make sure a caller can chown. */
>  	if ((ia_valid & ATTR_UID) &&
> -	    (current_fsuid() != inode->i_uid ||
> -	     attr->ia_uid != inode->i_uid) && !capable(CAP_CHOWN))
> +	    (!uid_eq(current_fsuid(), inode->i_uid) ||
> +	     !uid_eq(attr->ia_uid, inode->i_uid)) && !capable(CAP_CHOWN))
>  		return -EPERM;
>  
>  	/* Make sure caller can chgrp. */
>  	if ((ia_valid & ATTR_GID) &&
> -	    (current_fsuid() != inode->i_uid ||
> -	    (!in_group_p(attr->ia_gid) && attr->ia_gid != inode->i_gid)) &&
> +	    (!uid_eq(current_fsuid(), inode->i_uid) ||
> +	    (!in_group_p(attr->ia_gid) && !gid_eq(attr->ia_gid, inode->i_gid))) &&
>  	    !capable(CAP_CHOWN))
>  		return -EPERM;
>  
> diff --git a/fs/exec.c b/fs/exec.c
> index 9a1d9f0..00ae2ef 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1139,7 +1139,7 @@ void setup_new_exec(struct linux_binprm * bprm)
>  	/* This is the point of no return */
>  	current->sas_ss_sp = current->sas_ss_size = 0;
>  
> -	if (current_euid() == current_uid() && current_egid() == current_gid())
> +	if (uid_eq(current_euid(), current_uid()) && gid_eq(current_egid(), current_gid()))
>  		set_dumpable(current->mm, 1);
>  	else
>  		set_dumpable(current->mm, suid_dumpable);
> @@ -1153,8 +1153,8 @@ void setup_new_exec(struct linux_binprm * bprm)
>  	current->mm->task_size = TASK_SIZE;
>  
>  	/* install the new credentials */
> -	if (bprm->cred->uid != current_euid() ||
> -	    bprm->cred->gid != current_egid()) {
> +	if (!uid_eq(bprm->cred->uid, current_euid()) ||
> +	    !gid_eq(bprm->cred->gid, current_egid())) {
>  		current->pdeath_signal = 0;
>  	} else {
>  		would_dump(bprm, bprm->file);
> @@ -2120,7 +2120,7 @@ void do_coredump(long signr, int exit_code, struct pt_regs *regs)
>  	if (__get_dumpable(cprm.mm_flags) == 2) {
>  		/* Setuid core dump mode */
>  		flag = O_EXCL;		/* Stop rewrite attacks */
> -		cred->fsuid = 0;	/* Dump root private */
> +		cred->fsuid = GLOBAL_ROOT_UID;	/* Dump root private */

Sorry, one more: could this be the per-namespace root uid instead?  It
should be fine for the coredump to belong to the privileged user in the
namespace, right?

>  	}
>  
>  	retval = coredump_wait(exit_code, &core_state);
> @@ -2221,7 +2221,7 @@ void do_coredump(long signr, int exit_code, struct pt_regs *regs)
>  		 * Dont allow local users get cute and trick others to coredump
>  		 * into their pre-created files.
>  		 */
> -		if (inode->i_uid != current_fsuid())
> +		if (!uid_eq(inode->i_uid, current_fsuid()))
>  			goto close_fail;
>  		if (!cprm.file->f_op || !cprm.file->f_op->write)
>  			goto close_fail;
> diff --git a/fs/fcntl.c b/fs/fcntl.c
> index 75e7c1f..d078b75 100644
> --- a/fs/fcntl.c
> +++ b/fs/fcntl.c
> @@ -532,9 +532,9 @@ static inline int sigio_perm(struct task_struct *p,
>  
>  	rcu_read_lock();
>  	cred = __task_cred(p);
> -	ret = ((fown->euid == 0 ||
> -		fown->euid == cred->suid || fown->euid == cred->uid ||
> -		fown->uid  == cred->suid || fown->uid  == cred->uid) &&
> +	ret = ((uid_eq(fown->euid, GLOBAL_ROOT_UID) ||
> +		uid_eq(fown->euid, cred->suid) || uid_eq(fown->euid, cred->uid) ||
> +		uid_eq(fown->uid,  cred->suid) || uid_eq(fown->uid,  cred->uid)) &&
>  	       !security_file_send_sigiotask(p, fown, sig));
>  	rcu_read_unlock();
>  	return ret;
> diff --git a/fs/ioprio.c b/fs/ioprio.c
> index 2072e41..5e6dbe89 100644
> --- a/fs/ioprio.c
> +++ b/fs/ioprio.c
> @@ -37,8 +37,8 @@ int set_task_ioprio(struct task_struct *task, int ioprio)
>  
>  	rcu_read_lock();
>  	tcred = __task_cred(task);
> -	if (tcred->uid != cred->euid &&
> -	    tcred->uid != cred->uid && !capable(CAP_SYS_NICE)) {
> +	if (!uid_eq(tcred->uid, cred->euid) &&
> +	    !uid_eq(tcred->uid, cred->uid) && !capable(CAP_SYS_NICE)) {
>  		rcu_read_unlock();
>  		return -EPERM;
>  	}
> diff --git a/fs/locks.c b/fs/locks.c
> index 637694b..3e946cd 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -1445,7 +1445,7 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
>  	struct inode *inode = dentry->d_inode;
>  	int error;
>  
> -	if ((current_fsuid() != inode->i_uid) && !capable(CAP_LEASE))
> +	if ((!uid_eq(current_fsuid(), inode->i_uid)) && !capable(CAP_LEASE))
>  		return -EACCES;
>  	if (!S_ISREG(inode->i_mode))
>  		return -EINVAL;
> diff --git a/fs/namei.c b/fs/namei.c
> index 941c436..86512b4 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -228,7 +228,7 @@ static int acl_permission_check(struct inode *inode, int mask)
>  {
>  	unsigned int mode = inode->i_mode;
>  
> -	if (likely(current_fsuid() == inode->i_uid))
> +	if (likely(uid_eq(current_fsuid(), inode->i_uid)))
>  		mode >>= 6;
>  	else {
>  		if (IS_POSIXACL(inode) && (mode & S_IRWXG)) {
> @@ -1956,13 +1956,13 @@ static int user_path_parent(int dfd, const char __user *path,
>   */
>  static inline int check_sticky(struct inode *dir, struct inode *inode)
>  {
> -	uid_t fsuid = current_fsuid();
> +	kuid_t fsuid = current_fsuid();
>  
>  	if (!(dir->i_mode & S_ISVTX))
>  		return 0;
> -	if (inode->i_uid == fsuid)
> +	if (uid_eq(inode->i_uid, fsuid))
>  		return 0;
> -	if (dir->i_uid == fsuid)
> +	if (uid_eq(dir->i_uid, fsuid))
>  		return 0;
>  	return !inode_capable(inode, CAP_FOWNER);
>  }
> diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h
> index d93f95e..17b9773 100644
> --- a/include/linux/quotaops.h
> +++ b/include/linux/quotaops.h
> @@ -22,8 +22,8 @@ static inline struct quota_info *sb_dqopt(struct super_block *sb)
>  static inline bool is_quota_modification(struct inode *inode, struct iattr *ia)
>  {
>  	return (ia->ia_valid & ATTR_SIZE && ia->ia_size != inode->i_size) ||
> -		(ia->ia_valid & ATTR_UID && ia->ia_uid != inode->i_uid) ||
> -		(ia->ia_valid & ATTR_GID && ia->ia_gid != inode->i_gid);
> +		(ia->ia_valid & ATTR_UID && !uid_eq(ia->ia_uid, inode->i_uid)) ||
> +		(ia->ia_valid & ATTR_GID && !gid_eq(ia->ia_gid, inode->i_gid));
>  }
>  
>  #if defined(CONFIG_QUOTA)
> -- 
> 1.7.2.5
> 

^ permalink raw reply	[flat|nested] 227+ messages in thread
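
For readers following the series: the comparisons in this patch go
through uid_eq() and gid_eq() because kuid_t and kgid_t are meant to be
distinct types, so that unconverted code fails to build.  Below is a
minimal userspace sketch of that idea, assuming a one-member struct
wrapper.  Every toy_* name is hypothetical and only models the shape of
the chgrp test in inode_change_ok(); in particular toy_in_group() is a
simplified stand-in for in_group_p(), and the cap_chown flag stands in
for capable(CAP_CHOWN).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kuid_t/kgid_t.  Wrapping the integer in a
 * struct makes a raw `a == b` a compile error, so every comparison has
 * to go through an explicit helper. */
typedef struct { unsigned int val; } toy_kuid_t;
typedef struct { unsigned int val; } toy_kgid_t;

static inline bool toy_uid_eq(toy_kuid_t a, toy_kuid_t b) { return a.val == b.val; }
static inline bool toy_gid_eq(toy_kgid_t a, toy_kgid_t b) { return a.val == b.val; }

struct toy_caller {
	toy_kuid_t fsuid;
	const toy_kgid_t *groups;	/* supplementary groups */
	size_t ngroups;
	bool cap_chown;			/* stands in for capable(CAP_CHOWN) */
};

/* Simplified model of in_group_p(): supplementary groups only. */
static bool toy_in_group(const struct toy_caller *c, toy_kgid_t gid)
{
	for (size_t i = 0; i < c->ngroups; i++)
		if (toy_gid_eq(c->groups[i], gid))
			return true;
	return false;
}

/* Same shape as the chgrp test: refuse unless the caller owns the inode
 * and either belongs to the new group or leaves the group unchanged,
 * or the caller holds the capability. */
bool toy_may_chgrp(const struct toy_caller *c, toy_kuid_t i_uid,
		   toy_kgid_t i_gid, toy_kgid_t new_gid)
{
	if ((!toy_uid_eq(c->fsuid, i_uid) ||
	     (!toy_in_group(c, new_gid) && !toy_gid_eq(new_gid, i_gid))) &&
	    !c->cap_chown)
		return false;
	return true;
}

void toy_chgrp_example(void)
{
	const toy_kgid_t groups[] = { { 1000 }, { 50 } };
	struct toy_caller alice = { { 1000 }, groups, 2, false };
	struct toy_caller bob = { { 1001 }, NULL, 0, false };
	struct toy_caller admin = { { 1001 }, NULL, 0, true };
	toy_kuid_t owner = { 1000 };
	toy_kgid_t gid50 = { 50 }, gid60 = { 60 }, gid70 = { 70 };

	assert(toy_may_chgrp(&alice, owner, gid70, gid50));	/* member of 50 */
	assert(toy_may_chgrp(&alice, owner, gid70, gid70));	/* group unchanged */
	assert(!toy_may_chgrp(&alice, owner, gid70, gid60));	/* not a member */
	assert(!toy_may_chgrp(&bob, owner, gid70, gid70));	/* not the owner */
	assert(toy_may_chgrp(&admin, owner, gid70, gid60));	/* has the capability */
}
```

Calling toy_chgrp_example() from a test program exercises the five
cases.  Writing `alice.fsuid == owner` in this sketch is rejected by the
compiler, which is the property the series appears to rely on.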

* Re: [PATCH 28/43] userns: Convert user specfied uids and gids in chown into kuids and kgid
  2012-04-08  5:15     ` "Eric W. Beiderman
@ 2012-04-18 19:03         ` Serge E. Hallyn
  -1 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 19:03 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> 
> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

> ---
>  fs/open.c |   13 +++++++++++--
>  1 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/open.c b/fs/open.c
> index 92335f6..e166801 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -506,15 +506,24 @@ static int chown_common(struct path *path, uid_t user, gid_t group)
>  	struct inode *inode = path->dentry->d_inode;
>  	int error;
>  	struct iattr newattrs;
> +	kuid_t uid;
> +	kgid_t gid;
> +
> +	uid = make_kuid(current_user_ns(), user);
> +	gid = make_kgid(current_user_ns(), group);
>  
>  	newattrs.ia_valid =  ATTR_CTIME;
>  	if (user != (uid_t) -1) {
> +		if (!uid_valid(uid))
> +			return -EINVAL;
>  		newattrs.ia_valid |= ATTR_UID;
> -		newattrs.ia_uid = user;
> +		newattrs.ia_uid = uid;
>  	}
>  	if (group != (gid_t) -1) {
> +		if (!gid_valid(gid))
> +			return -EINVAL;
>  		newattrs.ia_valid |= ATTR_GID;
> -		newattrs.ia_gid = group;
> +		newattrs.ia_gid = gid;
>  	}
>  	if (!S_ISDIR(inode->i_mode))
>  		newattrs.ia_valid |=
> -- 
> 1.7.2.5
> 

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 28/43] userns: Convert user specfied uids and gids in chown into kuids and kgid
@ 2012-04-18 19:03         ` Serge E. Hallyn
  0 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 19:03 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

Quoting Eric W. Biederman (ebiederm@xmission.com):
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

> ---
>  fs/open.c |   13 +++++++++++--
>  1 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/open.c b/fs/open.c
> index 92335f6..e166801 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -506,15 +506,24 @@ static int chown_common(struct path *path, uid_t user, gid_t group)
>  	struct inode *inode = path->dentry->d_inode;
>  	int error;
>  	struct iattr newattrs;
> +	kuid_t uid;
> +	kgid_t gid;
> +
> +	uid = make_kuid(current_user_ns(), user);
> +	gid = make_kgid(current_user_ns(), group);
>  
>  	newattrs.ia_valid =  ATTR_CTIME;
>  	if (user != (uid_t) -1) {
> +		if (!uid_valid(uid))
> +			return -EINVAL;
>  		newattrs.ia_valid |= ATTR_UID;
> -		newattrs.ia_uid = user;
> +		newattrs.ia_uid = uid;
>  	}
>  	if (group != (gid_t) -1) {
> +		if (!gid_valid(gid))
> +			return -EINVAL;
>  		newattrs.ia_valid |= ATTR_GID;
> -		newattrs.ia_gid = group;
> +		newattrs.ia_gid = gid;
>  	}
>  	if (!S_ISDIR(inode->i_mode))
>  		newattrs.ia_valid |=
> -- 
> 1.7.2.5
> 

^ permalink raw reply	[flat|nested] 227+ messages in thread
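
The chown_common() change above can be pictured with a small userspace
model.  This is only a sketch under stated assumptions: a namespace is
reduced to one contiguous id range, only the uid half is shown, and the
toy_* names (toy_make_kuid, toy_uid_valid, toy_chown_uid) are
hypothetical stand-ins rather than the kernel functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

typedef struct { unsigned int val; } toy_kuid_t;

/* One contiguous range of ids: [first, first + count) inside the
 * namespace corresponds to [lower_first, lower_first + count) in the
 * kernel-internal id space. */
struct toy_user_ns {
	unsigned int first;
	unsigned int lower_first;
	unsigned int count;
};

/* Hypothetical make_kuid(): an id outside the range becomes the
 * invalid value (unsigned int)-1. */
static toy_kuid_t toy_make_kuid(const struct toy_user_ns *ns, unsigned int uid)
{
	toy_kuid_t kuid = { (unsigned int)-1 };

	if (uid >= ns->first && uid - ns->first < ns->count)
		kuid.val = ns->lower_first + (uid - ns->first);
	return kuid;
}

static bool toy_uid_valid(toy_kuid_t kuid)
{
	return kuid.val != (unsigned int)-1;
}

struct toy_iattr {
	bool set_uid;		/* stands in for ATTR_UID in ia_valid */
	toy_kuid_t uid;
};

/* Uid half of the change: -1 still means "leave the owner alone",
 * any other value must map into the namespace or the call fails. */
int toy_chown_uid(const struct toy_user_ns *ns, unsigned int user,
		  struct toy_iattr *attr)
{
	toy_kuid_t uid = toy_make_kuid(ns, user);

	attr->set_uid = false;
	if (user != (unsigned int)-1) {
		if (!toy_uid_valid(uid))
			return -EINVAL;
		attr->set_uid = true;
		attr->uid = uid;
	}
	return 0;
}

void toy_chown_example(void)
{
	/* ids 0..999 in the namespace are kernel ids 100000..100999 */
	const struct toy_user_ns ns = { 0, 100000, 1000 };
	struct toy_iattr attr;

	assert(toy_chown_uid(&ns, 5, &attr) == 0);
	assert(attr.set_uid && attr.uid.val == 100005);

	assert(toy_chown_uid(&ns, 5000, &attr) == -EINVAL);	/* unmapped */

	assert(toy_chown_uid(&ns, (unsigned int)-1, &attr) == 0);
	assert(!attr.set_uid);					/* no change asked */
}
```

The conversion happens once at the system call boundary, so everything
after it in this model handles only the kernel-internal value.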

* Re: [PATCH 29/43] userns: Convert stat to return values mapped from kuids and kgids
       [not found]     ` <1333862139-31737-29-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-04-18 19:03       ` Serge E. Hallyn
  0 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 19:03 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Cyrill Gorcunov, linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	Al Viro, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> 
> - Store uids and gids with kuid_t and kgid_t in struct kstat
> - Convert uid and gids to userspace usable values with
>   from_kuid and from_kgid
> 
> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

> ---
>  arch/arm/kernel/sys_oabi-compat.c |    4 ++--
>  arch/parisc/hpux/fs.c             |    4 ++--
>  arch/s390/kernel/compat_linux.c   |    4 ++--
>  arch/sparc/kernel/sys_sparc32.c   |    4 ++--
>  arch/x86/ia32/sys_ia32.c          |    4 ++--
>  fs/compat.c                       |    4 ++--
>  fs/stat.c                         |    8 ++++----
>  include/linux/stat.h              |    5 +++--
>  8 files changed, 19 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c
> index af0aaeb..3e94811 100644
> --- a/arch/arm/kernel/sys_oabi-compat.c
> +++ b/arch/arm/kernel/sys_oabi-compat.c
> @@ -124,8 +124,8 @@ static long cp_oldabi_stat64(struct kstat *stat,
>  	tmp.__st_ino = stat->ino;
>  	tmp.st_mode = stat->mode;
>  	tmp.st_nlink = stat->nlink;
> -	tmp.st_uid = stat->uid;
> -	tmp.st_gid = stat->gid;
> +	tmp.st_uid = from_kuid_munged(current_user_ns(), stat->uid);
> +	tmp.st_gid = from_kgid_munged(current_user_ns(), stat->gid);
>  	tmp.st_rdev = huge_encode_dev(stat->rdev);
>  	tmp.st_size = stat->size;
>  	tmp.st_blocks = stat->blocks;
> diff --git a/arch/parisc/hpux/fs.c b/arch/parisc/hpux/fs.c
> index 0dc8543..c71eb6c 100644
> --- a/arch/parisc/hpux/fs.c
> +++ b/arch/parisc/hpux/fs.c
> @@ -159,8 +159,8 @@ static int cp_hpux_stat(struct kstat *stat, struct hpux_stat64 __user *statbuf)
>  	tmp.st_ino = stat->ino;
>  	tmp.st_mode = stat->mode;
>  	tmp.st_nlink = stat->nlink;
> -	tmp.st_uid = stat->uid;
> -	tmp.st_gid = stat->gid;
> +	tmp.st_uid = from_kuid_munged(current_user_ns(), stat->uid);
> +	tmp.st_gid = from_kgid_munged(current_user_ns(), stat->gid);
>  	tmp.st_rdev = new_encode_dev(stat->rdev);
>  	tmp.st_size = stat->size;
>  	tmp.st_atime = stat->atime.tv_sec;
> diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
> index 5baac18..80ab23a 100644
> --- a/arch/s390/kernel/compat_linux.c
> +++ b/arch/s390/kernel/compat_linux.c
> @@ -546,8 +546,8 @@ static int cp_stat64(struct stat64_emu31 __user *ubuf, struct kstat *stat)
>  	tmp.__st_ino = (u32)stat->ino;
>  	tmp.st_mode = stat->mode;
>  	tmp.st_nlink = (unsigned int)stat->nlink;
> -	tmp.st_uid = stat->uid;
> -	tmp.st_gid = stat->gid;
> +	tmp.st_uid = from_kuid_munged(current_user_ns(), stat->uid);
> +	tmp.st_gid = from_kgid_munged(current_user_ns(), stat->gid);
>  	tmp.st_rdev = huge_encode_dev(stat->rdev);
>  	tmp.st_size = stat->size;
>  	tmp.st_blksize = (u32)stat->blksize;
> diff --git a/arch/sparc/kernel/sys_sparc32.c b/arch/sparc/kernel/sys_sparc32.c
> index 29c478f..f739233 100644
> --- a/arch/sparc/kernel/sys_sparc32.c
> +++ b/arch/sparc/kernel/sys_sparc32.c
> @@ -139,8 +139,8 @@ static int cp_compat_stat64(struct kstat *stat,
>  	err |= put_user(stat->ino, &statbuf->st_ino);
>  	err |= put_user(stat->mode, &statbuf->st_mode);
>  	err |= put_user(stat->nlink, &statbuf->st_nlink);
> -	err |= put_user(stat->uid, &statbuf->st_uid);
> -	err |= put_user(stat->gid, &statbuf->st_gid);
> +	err |= put_user(from_kuid_munged(current_user_ns(), stat->uid), &statbuf->st_uid);
> +	err |= put_user(from_kgid_munged(current_user_ns(), stat->gid), &statbuf->st_gid);
>  	err |= put_user(huge_encode_dev(stat->rdev), &statbuf->st_rdev);
>  	err |= put_user(0, (unsigned long __user *) &statbuf->__pad3[0]);
>  	err |= put_user(stat->size, &statbuf->st_size);
> diff --git a/arch/x86/ia32/sys_ia32.c b/arch/x86/ia32/sys_ia32.c
> index aec2202..d5c820a 100644
> --- a/arch/x86/ia32/sys_ia32.c
> +++ b/arch/x86/ia32/sys_ia32.c
> @@ -71,8 +71,8 @@ static int cp_stat64(struct stat64 __user *ubuf, struct kstat *stat)
>  {
>  	typeof(ubuf->st_uid) uid = 0;
>  	typeof(ubuf->st_gid) gid = 0;
> -	SET_UID(uid, stat->uid);
> -	SET_GID(gid, stat->gid);
> +	SET_UID(uid, from_kuid_munged(current_user_ns(), stat->uid));
> +	SET_GID(gid, from_kgid_munged(current_user_ns(), stat->gid));
>  	if (!access_ok(VERIFY_WRITE, ubuf, sizeof(struct stat64)) ||
>  	    __put_user(huge_encode_dev(stat->dev), &ubuf->st_dev) ||
>  	    __put_user(stat->ino, &ubuf->__st_ino) ||
> diff --git a/fs/compat.c b/fs/compat.c
> index f2944ac..0781e61 100644
> --- a/fs/compat.c
> +++ b/fs/compat.c
> @@ -144,8 +144,8 @@ static int cp_compat_stat(struct kstat *stat, struct compat_stat __user *ubuf)
>  	tmp.st_nlink = stat->nlink;
>  	if (tmp.st_nlink != stat->nlink)
>  		return -EOVERFLOW;
> -	SET_UID(tmp.st_uid, stat->uid);
> -	SET_GID(tmp.st_gid, stat->gid);
> +	SET_UID(tmp.st_uid, from_kuid_munged(current_user_ns(), stat->uid));
> +	SET_GID(tmp.st_gid, from_kgid_munged(current_user_ns(), stat->gid));
>  	tmp.st_rdev = old_encode_dev(stat->rdev);
>  	if ((u64) stat->size > MAX_NON_LFS)
>  		return -EOVERFLOW;
> diff --git a/fs/stat.c b/fs/stat.c
> index c733dc5..fca17f9 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -137,8 +137,8 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta
>  	tmp.st_nlink = stat->nlink;
>  	if (tmp.st_nlink != stat->nlink)
>  		return -EOVERFLOW;
> -	SET_UID(tmp.st_uid, stat->uid);
> -	SET_GID(tmp.st_gid, stat->gid);
> +	SET_UID(tmp.st_uid, from_kuid_munged(current_user_ns(), stat->uid));
> +	SET_GID(tmp.st_gid, from_kgid_munged(current_user_ns(), stat->gid));
>  	tmp.st_rdev = old_encode_dev(stat->rdev);
>  #if BITS_PER_LONG == 32
>  	if (stat->size > MAX_NON_LFS)
> @@ -215,8 +215,8 @@ static int cp_new_stat(struct kstat *stat, struct stat __user *statbuf)
>  	tmp.st_nlink = stat->nlink;
>  	if (tmp.st_nlink != stat->nlink)
>  		return -EOVERFLOW;
> -	SET_UID(tmp.st_uid, stat->uid);
> -	SET_GID(tmp.st_gid, stat->gid);
> +	SET_UID(tmp.st_uid, from_kuid_munged(current_user_ns(), stat->uid));
> +	SET_GID(tmp.st_gid, from_kgid_munged(current_user_ns(), stat->gid));
>  #if BITS_PER_LONG == 32
>  	tmp.st_rdev = old_encode_dev(stat->rdev);
>  #else
> diff --git a/include/linux/stat.h b/include/linux/stat.h
> index 611c398..4613240 100644
> --- a/include/linux/stat.h
> +++ b/include/linux/stat.h
> @@ -58,14 +58,15 @@
>  
>  #include <linux/types.h>
>  #include <linux/time.h>
> +#include <linux/uidgid.h>
>  
>  struct kstat {
>  	u64		ino;
>  	dev_t		dev;
>  	umode_t		mode;
>  	unsigned int	nlink;
> -	uid_t		uid;
> -	gid_t		gid;
> +	kuid_t		uid;
> +	kgid_t		gid;
>  	dev_t		rdev;
>  	loff_t		size;
>  	struct timespec  atime;
> -- 
> 1.7.2.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 29/43] userns: Convert stat to return values mapped from kuids and kgids
  2012-04-08  5:15     ` "Eric W. Beiderman
                       ` (2 preceding siblings ...)
  (?)
@ 2012-04-18 19:03     ` Serge E. Hallyn
  -1 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 19:03 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, linux-fsdevel, linux-security-module,
	Linux Containers, Andrew Morton, Linus Torvalds, Al Viro,
	Cyrill Gorcunov

Quoting Eric W. Biederman (ebiederm@xmission.com):
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> - Store uids and gids with kuid_t and kgid_t in struct kstat
> - Convert uid and gids to userspace usable values with
>   from_kuid and from_kgid
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

> ---
>  arch/arm/kernel/sys_oabi-compat.c |    4 ++--
>  arch/parisc/hpux/fs.c             |    4 ++--
>  arch/s390/kernel/compat_linux.c   |    4 ++--
>  arch/sparc/kernel/sys_sparc32.c   |    4 ++--
>  arch/x86/ia32/sys_ia32.c          |    4 ++--
>  fs/compat.c                       |    4 ++--
>  fs/stat.c                         |    8 ++++----
>  include/linux/stat.h              |    5 +++--
>  8 files changed, 19 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c
> index af0aaeb..3e94811 100644
> --- a/arch/arm/kernel/sys_oabi-compat.c
> +++ b/arch/arm/kernel/sys_oabi-compat.c
> @@ -124,8 +124,8 @@ static long cp_oldabi_stat64(struct kstat *stat,
>  	tmp.__st_ino = stat->ino;
>  	tmp.st_mode = stat->mode;
>  	tmp.st_nlink = stat->nlink;
> -	tmp.st_uid = stat->uid;
> -	tmp.st_gid = stat->gid;
> +	tmp.st_uid = from_kuid_munged(current_user_ns(), stat->uid);
> +	tmp.st_gid = from_kgid_munged(current_user_ns(), stat->gid);
>  	tmp.st_rdev = huge_encode_dev(stat->rdev);
>  	tmp.st_size = stat->size;
>  	tmp.st_blocks = stat->blocks;
> diff --git a/arch/parisc/hpux/fs.c b/arch/parisc/hpux/fs.c
> index 0dc8543..c71eb6c 100644
> --- a/arch/parisc/hpux/fs.c
> +++ b/arch/parisc/hpux/fs.c
> @@ -159,8 +159,8 @@ static int cp_hpux_stat(struct kstat *stat, struct hpux_stat64 __user *statbuf)
>  	tmp.st_ino = stat->ino;
>  	tmp.st_mode = stat->mode;
>  	tmp.st_nlink = stat->nlink;
> -	tmp.st_uid = stat->uid;
> -	tmp.st_gid = stat->gid;
> +	tmp.st_uid = from_kuid_munged(current_user_ns(), stat->uid);
> +	tmp.st_gid = from_kgid_munged(current_user_ns(), stat->gid);
>  	tmp.st_rdev = new_encode_dev(stat->rdev);
>  	tmp.st_size = stat->size;
>  	tmp.st_atime = stat->atime.tv_sec;
> diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
> index 5baac18..80ab23a 100644
> --- a/arch/s390/kernel/compat_linux.c
> +++ b/arch/s390/kernel/compat_linux.c
> @@ -546,8 +546,8 @@ static int cp_stat64(struct stat64_emu31 __user *ubuf, struct kstat *stat)
>  	tmp.__st_ino = (u32)stat->ino;
>  	tmp.st_mode = stat->mode;
>  	tmp.st_nlink = (unsigned int)stat->nlink;
> -	tmp.st_uid = stat->uid;
> -	tmp.st_gid = stat->gid;
> +	tmp.st_uid = from_kuid_munged(current_user_ns(), stat->uid);
> +	tmp.st_gid = from_kgid_munged(current_user_ns(), stat->gid);
>  	tmp.st_rdev = huge_encode_dev(stat->rdev);
>  	tmp.st_size = stat->size;
>  	tmp.st_blksize = (u32)stat->blksize;
> diff --git a/arch/sparc/kernel/sys_sparc32.c b/arch/sparc/kernel/sys_sparc32.c
> index 29c478f..f739233 100644
> --- a/arch/sparc/kernel/sys_sparc32.c
> +++ b/arch/sparc/kernel/sys_sparc32.c
> @@ -139,8 +139,8 @@ static int cp_compat_stat64(struct kstat *stat,
>  	err |= put_user(stat->ino, &statbuf->st_ino);
>  	err |= put_user(stat->mode, &statbuf->st_mode);
>  	err |= put_user(stat->nlink, &statbuf->st_nlink);
> -	err |= put_user(stat->uid, &statbuf->st_uid);
> -	err |= put_user(stat->gid, &statbuf->st_gid);
> +	err |= put_user(from_kuid_munged(current_user_ns(), stat->uid), &statbuf->st_uid);
> +	err |= put_user(from_kgid_munged(current_user_ns(), stat->gid), &statbuf->st_gid);
>  	err |= put_user(huge_encode_dev(stat->rdev), &statbuf->st_rdev);
>  	err |= put_user(0, (unsigned long __user *) &statbuf->__pad3[0]);
>  	err |= put_user(stat->size, &statbuf->st_size);
> diff --git a/arch/x86/ia32/sys_ia32.c b/arch/x86/ia32/sys_ia32.c
> index aec2202..d5c820a 100644
> --- a/arch/x86/ia32/sys_ia32.c
> +++ b/arch/x86/ia32/sys_ia32.c
> @@ -71,8 +71,8 @@ static int cp_stat64(struct stat64 __user *ubuf, struct kstat *stat)
>  {
>  	typeof(ubuf->st_uid) uid = 0;
>  	typeof(ubuf->st_gid) gid = 0;
> -	SET_UID(uid, stat->uid);
> -	SET_GID(gid, stat->gid);
> +	SET_UID(uid, from_kuid_munged(current_user_ns(), stat->uid));
> +	SET_GID(gid, from_kgid_munged(current_user_ns(), stat->gid));
>  	if (!access_ok(VERIFY_WRITE, ubuf, sizeof(struct stat64)) ||
>  	    __put_user(huge_encode_dev(stat->dev), &ubuf->st_dev) ||
>  	    __put_user(stat->ino, &ubuf->__st_ino) ||
> diff --git a/fs/compat.c b/fs/compat.c
> index f2944ac..0781e61 100644
> --- a/fs/compat.c
> +++ b/fs/compat.c
> @@ -144,8 +144,8 @@ static int cp_compat_stat(struct kstat *stat, struct compat_stat __user *ubuf)
>  	tmp.st_nlink = stat->nlink;
>  	if (tmp.st_nlink != stat->nlink)
>  		return -EOVERFLOW;
> -	SET_UID(tmp.st_uid, stat->uid);
> -	SET_GID(tmp.st_gid, stat->gid);
> +	SET_UID(tmp.st_uid, from_kuid_munged(current_user_ns(), stat->uid));
> +	SET_GID(tmp.st_gid, from_kgid_munged(current_user_ns(), stat->gid));
>  	tmp.st_rdev = old_encode_dev(stat->rdev);
>  	if ((u64) stat->size > MAX_NON_LFS)
>  		return -EOVERFLOW;
> diff --git a/fs/stat.c b/fs/stat.c
> index c733dc5..fca17f9 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -137,8 +137,8 @@ static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * sta
>  	tmp.st_nlink = stat->nlink;
>  	if (tmp.st_nlink != stat->nlink)
>  		return -EOVERFLOW;
> -	SET_UID(tmp.st_uid, stat->uid);
> -	SET_GID(tmp.st_gid, stat->gid);
> +	SET_UID(tmp.st_uid, from_kuid_munged(current_user_ns(), stat->uid));
> +	SET_GID(tmp.st_gid, from_kgid_munged(current_user_ns(), stat->gid));
>  	tmp.st_rdev = old_encode_dev(stat->rdev);
>  #if BITS_PER_LONG == 32
>  	if (stat->size > MAX_NON_LFS)
> @@ -215,8 +215,8 @@ static int cp_new_stat(struct kstat *stat, struct stat __user *statbuf)
>  	tmp.st_nlink = stat->nlink;
>  	if (tmp.st_nlink != stat->nlink)
>  		return -EOVERFLOW;
> -	SET_UID(tmp.st_uid, stat->uid);
> -	SET_GID(tmp.st_gid, stat->gid);
> +	SET_UID(tmp.st_uid, from_kuid_munged(current_user_ns(), stat->uid));
> +	SET_GID(tmp.st_gid, from_kgid_munged(current_user_ns(), stat->gid));
>  #if BITS_PER_LONG == 32
>  	tmp.st_rdev = old_encode_dev(stat->rdev);
>  #else
> diff --git a/include/linux/stat.h b/include/linux/stat.h
> index 611c398..4613240 100644
> --- a/include/linux/stat.h
> +++ b/include/linux/stat.h
> @@ -58,14 +58,15 @@
>  
>  #include <linux/types.h>
>  #include <linux/time.h>
> +#include <linux/uidgid.h>
>  
>  struct kstat {
>  	u64		ino;
>  	dev_t		dev;
>  	umode_t		mode;
>  	unsigned int	nlink;
> -	uid_t		uid;
> -	gid_t		gid;
> +	kuid_t		uid;
> +	kgid_t		gid;
>  	dev_t		rdev;
>  	loff_t		size;
>  	struct timespec  atime;
> -- 
> 1.7.2.5
> 

^ permalink raw reply	[flat|nested] 227+ messages in thread
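
The stat conversion above goes the other way: a kernel-internal id is
translated back into the caller's namespace just before it is copied to
userspace.  A rough userspace sketch follows, assuming a single id range
per namespace and assuming that an unmapped owner is reported as a fixed
overflow id (65534 here).  The toy_* names are hypothetical stand-ins,
not the kernel implementation.

```c
#include <assert.h>

typedef struct { unsigned int val; } toy_kuid_t;

/* [first, first + count) in the namespace corresponds to
 * [lower_first, lower_first + count) in the kernel-internal id space. */
struct toy_user_ns {
	unsigned int first;
	unsigned int lower_first;
	unsigned int count;
};

#define TOY_OVERFLOW_UID 65534u	/* assumed fallback for unmapped ids */

/* Hypothetical from_kuid(): kernel id to namespace id, or
 * (unsigned int)-1 when the id has no mapping. */
static unsigned int toy_from_kuid(const struct toy_user_ns *ns, toy_kuid_t kuid)
{
	if (kuid.val >= ns->lower_first && kuid.val - ns->lower_first < ns->count)
		return ns->first + (kuid.val - ns->lower_first);
	return (unsigned int)-1;
}

/* "Munged" variant: never hands -1 to userspace, substitutes the
 * overflow id instead. */
unsigned int toy_from_kuid_munged(const struct toy_user_ns *ns, toy_kuid_t kuid)
{
	unsigned int uid = toy_from_kuid(ns, kuid);

	return uid == (unsigned int)-1 ? TOY_OVERFLOW_UID : uid;
}

struct toy_kstat { toy_kuid_t uid; };		/* kernel-side record */
struct toy_stat { unsigned int st_uid; };	/* what userspace sees */

void toy_cp_stat(const struct toy_user_ns *ns, const struct toy_kstat *stat,
		 struct toy_stat *out)
{
	out->st_uid = toy_from_kuid_munged(ns, stat->uid);
}

void toy_stat_example(void)
{
	const struct toy_user_ns container = { 0, 100000, 1000 };
	const struct toy_user_ns init_like = { 0, 0, 4294967295u };
	struct toy_kstat mapped = { { 100005 } }, host_root = { { 0 } };
	struct toy_stat out;

	toy_cp_stat(&container, &mapped, &out);
	assert(out.st_uid == 5);	/* visible as uid 5 in the container */

	toy_cp_stat(&container, &host_root, &out);
	assert(out.st_uid == 65534);	/* no mapping: overflow id */

	toy_cp_stat(&init_like, &host_root, &out);
	assert(out.st_uid == 0);	/* identity mapping */
}
```

In this model stat never fails because of an unmapped owner; the file
simply appears to belong to the overflow id.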

* Re: [PATCH 30/43] userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
       [not found]     ` <1333862139-31737-30-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-04-18 19:05       ` Serge E. Hallyn
  2012-04-18 19:09       ` Serge E. Hallyn
  1 sibling, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 19:05 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> 
> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

> ---
>  fs/exec.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 00ae2ef..e001bdf 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1291,8 +1291,11 @@ int prepare_binprm(struct linux_binprm *bprm)
>  	if (!(bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)) {
>  		/* Set-uid? */
>  		if (mode & S_ISUID) {
> +			if (!kuid_has_mapping(bprm->cred->user_ns, inode->i_uid))
> +				return -EPERM;
>  			bprm->per_clear |= PER_CLEAR_ON_SETID;
>  			bprm->cred->euid = inode->i_uid;
> +
>  		}
>  
>  		/* Set-gid? */
> @@ -1302,6 +1305,8 @@ int prepare_binprm(struct linux_binprm *bprm)
>  		 * executable.
>  		 */
>  		if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
> +			if (!kgid_has_mapping(bprm->cred->user_ns, inode->i_gid))
> +				return -EPERM;
>  			bprm->per_clear |= PER_CLEAR_ON_SETID;
>  			bprm->cred->egid = inode->i_gid;
>  		}
> -- 
> 1.7.2.5
> 

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 30/43] userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
  2012-04-08  5:15     ` "Eric W. Beiderman
  (?)
  (?)
@ 2012-04-18 19:05     ` Serge E. Hallyn
  -1 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 19:05 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

Quoting Eric W. Biederman (ebiederm@xmission.com):
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

> ---
>  fs/exec.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 00ae2ef..e001bdf 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1291,8 +1291,11 @@ int prepare_binprm(struct linux_binprm *bprm)
>  	if (!(bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)) {
>  		/* Set-uid? */
>  		if (mode & S_ISUID) {
> +			if (!kuid_has_mapping(bprm->cred->user_ns, inode->i_uid))
> +				return -EPERM;
>  			bprm->per_clear |= PER_CLEAR_ON_SETID;
>  			bprm->cred->euid = inode->i_uid;
> +
>  		}
>  
>  		/* Set-gid? */
> @@ -1302,6 +1305,8 @@ int prepare_binprm(struct linux_binprm *bprm)
>  		 * executable.
>  		 */
>  		if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
> +			if (!kgid_has_mapping(bprm->cred->user_ns, inode->i_gid))
> +				return -EPERM;
>  			bprm->per_clear |= PER_CLEAR_ON_SETID;
>  			bprm->cred->egid = inode->i_gid;
>  		}
> -- 
> 1.7.2.5
> 

^ permalink raw reply	[flat|nested] 227+ messages in thread
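
The exec change above refuses a set-uid binary whose owner has no
mapping in the namespace of the new credentials.  A minimal sketch of
that rule, assuming a namespace that covers one contiguous range of
kernel-internal ids; toy_kuid_has_mapping, toy_prepare_setuid and
TOY_S_ISUID are hypothetical names used only for illustration, and the
set-gid half is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

typedef struct { unsigned int val; } toy_kuid_t;

/* Kernel-internal ids [lower_first, lower_first + count) are the ones
 * visible inside this namespace. */
struct toy_user_ns {
	unsigned int lower_first;
	unsigned int count;
};

#define TOY_S_ISUID 04000u	/* set-uid bit, same value as S_ISUID */

static bool toy_kuid_has_mapping(const struct toy_user_ns *ns, toy_kuid_t kuid)
{
	return kuid.val >= ns->lower_first &&
	       kuid.val - ns->lower_first < ns->count;
}

struct toy_cred {
	const struct toy_user_ns *user_ns;
	toy_kuid_t euid;
};

/* Set-uid half of the check: an owner outside the namespace makes the
 * exec fail instead of granting an id the namespace cannot name. */
int toy_prepare_setuid(struct toy_cred *cred, unsigned int mode,
		       toy_kuid_t i_uid)
{
	if (mode & TOY_S_ISUID) {
		if (!toy_kuid_has_mapping(cred->user_ns, i_uid))
			return -EPERM;
		cred->euid = i_uid;
	}
	return 0;
}

void toy_exec_example(void)
{
	const struct toy_user_ns ns = { 100000, 1000 };
	struct toy_cred cred = { &ns, { 100005 } };
	toy_kuid_t inside = { 100000 }, host_root = { 0 };

	/* set-uid binary owned by an id inside the namespace */
	assert(toy_prepare_setuid(&cred, 04755, inside) == 0);
	assert(cred.euid.val == 100000);

	/* set-uid binary owned by host root: refused, euid untouched */
	assert(toy_prepare_setuid(&cred, 04755, host_root) == -EPERM);
	assert(cred.euid.val == 100000);

	/* without the set-uid bit the owner does not matter here */
	assert(toy_prepare_setuid(&cred, 0755, host_root) == 0);
}
```

Failing the exec outright, rather than silently ignoring the set-uid
bit, is what the patch does; the sketch mirrors that choice.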

* Re: [PATCH 31/43] userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
  2012-04-08  5:15     ` "Eric W. Beiderman
@ 2012-04-18 19:06         ` Serge E. Hallyn
  -1 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 19:06 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> 
> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

> ---
>  kernel/capability.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/capability.c b/kernel/capability.c
> index cc5f071..493d972 100644
> --- a/kernel/capability.c
> +++ b/kernel/capability.c
> @@ -429,12 +429,14 @@ bool nsown_capable(int cap)
>   * targeted at it's own user namespace and that the given inode is owned
>   * by the current user namespace or a child namespace.
>   *
> - * Currently inodes can only be owned by the initial user namespace.
> + * Currently we check to see if an inode is owned by the current
> + * user namespace by seeing if the inode's owner maps into the
> + * current user namespace.
>   *
>   */
>  bool inode_capable(const struct inode *inode, int cap)
>  {
>  	struct user_namespace *ns = current_user_ns();
>  
> -	return ns_capable(ns, cap) && (ns == &init_user_ns);
> +	return ns_capable(ns, cap) && kuid_has_mapping(ns, inode->i_uid);
>  }
> -- 
> 1.7.2.5

^ permalink raw reply	[flat|nested] 227+ messages in thread
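
Editorial sketch for patch 31: the new inode_capable() test replaces "is this the initial user namespace?" with "does the inode's owner have a mapping in the current namespace?". The userspace model below only illustrates that idea. The names model_kuid_t, struct uid_extent, struct user_ns_model, model_from_kuid, model_kuid_has_mapping, model_inode_capable and the demo_ns data are assumptions made for this illustration, not the kernel's definitions; has_cap merely stands in for the result of ns_capable(ns, cap).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of this is kernel code. */
#define MODEL_INVALID_UID ((uint32_t)-1)

/* Wrapping the value in a struct mirrors the idea of a distinct kuid type. */
typedef struct { uint32_t val; } model_kuid_t;

struct uid_extent {
	uint32_t first;       /* first uid as seen inside the namespace */
	uint32_t lower_first; /* first kernel-global uid the range maps to */
	uint32_t count;       /* number of ids in the range */
};

struct user_ns_model {
	const struct uid_extent *extents;
	unsigned int nr_extents;
	bool has_cap;         /* stands in for ns_capable(ns, cap) */
};

/* Map a kernel-global uid up into the namespace, or return the invalid uid. */
uint32_t model_from_kuid(const struct user_ns_model *ns, model_kuid_t kuid)
{
	for (unsigned int i = 0; i < ns->nr_extents; i++) {
		const struct uid_extent *e = &ns->extents[i];

		if (kuid.val >= e->lower_first &&
		    kuid.val - e->lower_first < e->count)
			return e->first + (kuid.val - e->lower_first);
	}
	return MODEL_INVALID_UID;
}

bool model_kuid_has_mapping(const struct user_ns_model *ns, model_kuid_t kuid)
{
	return model_from_kuid(ns, kuid) != MODEL_INVALID_UID;
}

/* Capable over the inode only if its owner is visible in the namespace. */
bool model_inode_capable(const struct user_ns_model *ns, model_kuid_t i_uid)
{
	return ns->has_cap && model_kuid_has_mapping(ns, i_uid);
}

/* Demo namespace: uids 0..65535 inside map to 100000..165535 outside. */
const struct uid_extent demo_extents[] = {
	{ .first = 0, .lower_first = 100000, .count = 65536 },
};

const struct user_ns_model demo_ns = {
	.extents = demo_extents,
	.nr_extents = 1,
	.has_cap = true,
};
```

In this model an inode owned by kernel-global uid 100000 passes the check for demo_ns, while an inode owned by kernel-global uid 0 does not, because uid 0 of the initial namespace has no mapping inside demo_ns.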

* Re: [PATCH 31/43] userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
@ 2012-04-18 19:06         ` Serge E. Hallyn
  0 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 19:06 UTC (permalink / raw)
  To: Eric W. Beiderman
  Cc: linux-kernel, Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

Quoting Eric W. Beiderman (ebiederm@xmission.com):
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

> ---
>  kernel/capability.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/capability.c b/kernel/capability.c
> index cc5f071..493d972 100644
> --- a/kernel/capability.c
> +++ b/kernel/capability.c
> @@ -429,12 +429,14 @@ bool nsown_capable(int cap)
>   * targeted at it's own user namespace and that the given inode is owned
>   * by the current user namespace or a child namespace.
>   *
> - * Currently inodes can only be owned by the initial user namespace.
> + * Currently we check to see if an inode is owned by the current
> + * user namespace by seeing if the inode's owner maps into the
> + * current user namespace.
>   *
>   */
>  bool inode_capable(const struct inode *inode, int cap)
>  {
>  	struct user_namespace *ns = current_user_ns();
>  
> -	return ns_capable(ns, cap) && (ns == &init_user_ns);
> +	return ns_capable(ns, cap) && kuid_has_mapping(ns, inode->i_uid);
>  }
> -- 
> 1.7.2.5

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 32/43] userns: signal remove unnecessary map_cred_ns
       [not found]     ` <1333862139-31737-32-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-04-18 19:07       ` Serge E. Hallyn
  0 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 19:07 UTC (permalink / raw)
  To: Eric W. Beiderman
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

Quoting Eric W. Beiderman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> 
> map_cred_ns is a light wrapper around from_kuid_munged with the order of the
> arguments reversed.  Replace map_cred_ns with from_kuid_munged and remove
> map_cred_ns.
> 
> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

> ---
>  kernel/signal.c |   20 +++++---------------
>  1 files changed, 5 insertions(+), 15 deletions(-)
> 
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 9797939..6aca310 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1019,15 +1019,6 @@ static inline int legacy_queue(struct sigpending *signals, int sig)
>  	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
>  }
>  
> -/*
> - * map the uid in struct cred into user namespace *ns
> - */
> -static inline uid_t map_cred_ns(const struct cred *cred,
> -				struct user_namespace *ns)
> -{
> -	return from_kuid_munged(ns, cred->uid);
> -}
> -
>  #ifdef CONFIG_USER_NS
>  static inline void userns_fixup_signal_uid(struct siginfo *info, struct task_struct *t)
>  {
> @@ -1677,8 +1668,8 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
>  	 */
>  	rcu_read_lock();
>  	info.si_pid = task_pid_nr_ns(tsk, tsk->parent->nsproxy->pid_ns);
> -	info.si_uid = map_cred_ns(__task_cred(tsk),
> -			task_cred_xxx(tsk->parent, user_ns));
> +	info.si_uid = from_kuid_munged(task_cred_xxx(tsk->parent, user_ns),
> +				       task_uid(tsk));
>  	rcu_read_unlock();
>  
>  	info.si_utime = cputime_to_clock_t(tsk->utime + tsk->signal->utime);
> @@ -1761,8 +1752,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
>  	 */
>  	rcu_read_lock();
>  	info.si_pid = task_pid_nr_ns(tsk, parent->nsproxy->pid_ns);
> -	info.si_uid = map_cred_ns(__task_cred(tsk),
> -			task_cred_xxx(parent, user_ns));
> +	info.si_uid = from_kuid_munged(task_cred_xxx(parent, user_ns), task_uid(tsk));
>  	rcu_read_unlock();
>  
>  	info.si_utime = cputime_to_clock_t(tsk->utime);
> @@ -2180,8 +2170,8 @@ static int ptrace_signal(int signr, siginfo_t *info,
>  		info->si_code = SI_USER;
>  		rcu_read_lock();
>  		info->si_pid = task_pid_vnr(current->parent);
> -		info->si_uid = map_cred_ns(__task_cred(current->parent),
> -				current_user_ns());
> +		info->si_uid = from_kuid_munged(current_user_ns(),
> +						task_uid(current->parent));
>  		rcu_read_unlock();
>  	}
>  
> -- 
> 1.7.2.5

^ permalink raw reply	[flat|nested] 227+ messages in thread
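
Editorial sketch for patch 32: the signal code reports si_uid through from_kuid_munged rather than a plain lookup, so a sender whose uid has no mapping in the receiver's namespace shows up as the overflow uid instead of as an error value. The model below is a userspace illustration under stated assumptions: struct demo_uid_ns (a single mapped range), demo_from_kuid, demo_from_kuid_munged and demo_parent_ns are hypothetical names, and 65534 is used because it is the kernel's default overflowuid.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model; none of this is kernel code. */
#define DEMO_INVALID_UID  ((uint32_t)-1)
#define DEMO_OVERFLOW_UID 65534u /* default value of the kernel's overflowuid */

/* A namespace with one mapped range, which is enough for the illustration. */
struct demo_uid_ns {
	uint32_t first;       /* first uid as seen inside the namespace */
	uint32_t lower_first; /* first kernel-global uid the range maps to */
	uint32_t count;       /* number of ids in the range */
};

/* Plain lookup: returns the invalid uid when there is no mapping. */
uint32_t demo_from_kuid(const struct demo_uid_ns *ns, uint32_t kuid)
{
	if (kuid >= ns->lower_first && kuid - ns->lower_first < ns->count)
		return ns->first + (kuid - ns->lower_first);
	return DEMO_INVALID_UID;
}

/* "Munged" lookup: never fails, substitutes the overflow uid instead. */
uint32_t demo_from_kuid_munged(const struct demo_uid_ns *ns, uint32_t kuid)
{
	uint32_t uid = demo_from_kuid(ns, kuid);

	return uid == DEMO_INVALID_UID ? DEMO_OVERFLOW_UID : uid;
}

/* Demo parent namespace: uids 0..999 inside map to 200000..200999 outside. */
const struct demo_uid_ns demo_parent_ns = {
	.first = 0, .lower_first = 200000, .count = 1000,
};
```

The munged variant suits values that are only reported to userspace, such as si_uid, where a best-effort answer is preferable to a failure. Permission checks would want the plain lookup so that an unmapped id can be rejected.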

* Re: [PATCH 32/43] userns: signal remove unnecessary map_cred_ns
  2012-04-08  5:15     ` "Eric W. Beiderman
                       ` (2 preceding siblings ...)
  (?)
@ 2012-04-18 19:07     ` Serge E. Hallyn
  -1 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 19:07 UTC (permalink / raw)
  To: Eric W. Beiderman
  Cc: linux-kernel, Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

Quoting Eric W. Beiderman (ebiederm@xmission.com):
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> map_cred_ns is a light wrapper around from_kuid_munged with the order of the
> arguments reversed.  Replace map_cred_ns with from_kuid_munged and remove
> map_cred_ns.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

> ---
>  kernel/signal.c |   20 +++++---------------
>  1 files changed, 5 insertions(+), 15 deletions(-)
> 
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 9797939..6aca310 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1019,15 +1019,6 @@ static inline int legacy_queue(struct sigpending *signals, int sig)
>  	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
>  }
>  
> -/*
> - * map the uid in struct cred into user namespace *ns
> - */
> -static inline uid_t map_cred_ns(const struct cred *cred,
> -				struct user_namespace *ns)
> -{
> -	return from_kuid_munged(ns, cred->uid);
> -}
> -
>  #ifdef CONFIG_USER_NS
>  static inline void userns_fixup_signal_uid(struct siginfo *info, struct task_struct *t)
>  {
> @@ -1677,8 +1668,8 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
>  	 */
>  	rcu_read_lock();
>  	info.si_pid = task_pid_nr_ns(tsk, tsk->parent->nsproxy->pid_ns);
> -	info.si_uid = map_cred_ns(__task_cred(tsk),
> -			task_cred_xxx(tsk->parent, user_ns));
> +	info.si_uid = from_kuid_munged(task_cred_xxx(tsk->parent, user_ns),
> +				       task_uid(tsk));
>  	rcu_read_unlock();
>  
>  	info.si_utime = cputime_to_clock_t(tsk->utime + tsk->signal->utime);
> @@ -1761,8 +1752,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
>  	 */
>  	rcu_read_lock();
>  	info.si_pid = task_pid_nr_ns(tsk, parent->nsproxy->pid_ns);
> -	info.si_uid = map_cred_ns(__task_cred(tsk),
> -			task_cred_xxx(parent, user_ns));
> +	info.si_uid = from_kuid_munged(task_cred_xxx(parent, user_ns), task_uid(tsk));
>  	rcu_read_unlock();
>  
>  	info.si_utime = cputime_to_clock_t(tsk->utime);
> @@ -2180,8 +2170,8 @@ static int ptrace_signal(int signr, siginfo_t *info,
>  		info->si_code = SI_USER;
>  		rcu_read_lock();
>  		info->si_pid = task_pid_vnr(current->parent);
> -		info->si_uid = map_cred_ns(__task_cred(current->parent),
> -				current_user_ns());
> +		info->si_uid = from_kuid_munged(current_user_ns(),
> +						task_uid(current->parent));
>  		rcu_read_unlock();
>  	}
>  
> -- 
> 1.7.2.5

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 30/43] userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
       [not found]     ` <1333862139-31737-30-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2012-04-18 19:05       ` Serge E. Hallyn
@ 2012-04-18 19:09       ` Serge E. Hallyn
  1 sibling, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 19:09 UTC (permalink / raw)
  To: Eric W. Beiderman
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

Quoting Eric W. Beiderman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> 

Oh, perhaps this is the right place in the thread to discuss the issue of
what to do with file capabilities?  I'm ok waiting until the next iteration
to even discuss it, so long as we start by refusing setting of fcaps by
any task not in init_user_ns.

> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> ---
>  fs/exec.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 00ae2ef..e001bdf 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1291,8 +1291,11 @@ int prepare_binprm(struct linux_binprm *bprm)
>  	if (!(bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)) {
>  		/* Set-uid? */
>  		if (mode & S_ISUID) {
> +			if (!kuid_has_mapping(bprm->cred->user_ns, inode->i_uid))
> +				return -EPERM;
>  			bprm->per_clear |= PER_CLEAR_ON_SETID;
>  			bprm->cred->euid = inode->i_uid;
> +
>  		}
>  
>  		/* Set-gid? */
> @@ -1302,6 +1305,8 @@ int prepare_binprm(struct linux_binprm *bprm)
>  		 * executable.
>  		 */
>  		if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
> +			if (!kgid_has_mapping(bprm->cred->user_ns, inode->i_gid))
> +				return -EPERM;
>  			bprm->per_clear |= PER_CLEAR_ON_SETID;
>  			bprm->cred->egid = inode->i_gid;
>  		}
> -- 
> 1.7.2.5

^ permalink raw reply	[flat|nested] 227+ messages in thread
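
Editorial sketch for patch 30: the exec path refuses a set-uid or set-gid binary whose owner or group has no mapping in the caller's user namespace, and it does so before touching the credentials. The userspace model below follows the shape of the quoted hunk under stated assumptions: struct demo_id_range, struct demo_exec_ns, struct demo_cred_model, demo_id_mapped, demo_apply_setid and the demo data are hypothetical, and the MNT_NOSUID test and the per_clear update from the real function are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical userspace model; none of this is kernel code. */
struct demo_id_range {
	uint32_t lower_first; /* first kernel-global id visible in the namespace */
	uint32_t count;       /* number of ids in the range */
};

struct demo_exec_ns {
	struct demo_id_range uids;
	struct demo_id_range gids;
};

struct demo_cred_model {
	uint32_t euid; /* kernel-global effective uid */
	uint32_t egid; /* kernel-global effective gid */
};

bool demo_id_mapped(const struct demo_id_range *r, uint32_t id)
{
	return id >= r->lower_first && id - r->lower_first < r->count;
}

/* Apply set-id bits to cred, or fail with -EPERM if the id is unmapped. */
int demo_apply_setid(const struct demo_exec_ns *ns, unsigned int mode,
		     uint32_t i_uid, uint32_t i_gid,
		     struct demo_cred_model *cred)
{
	if (mode & S_ISUID) {
		if (!demo_id_mapped(&ns->uids, i_uid))
			return -EPERM;
		cred->euid = i_uid;
	}

	/* Set-gid without group execute is not treated as set-gid. */
	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
		if (!demo_id_mapped(&ns->gids, i_gid))
			return -EPERM;
		cred->egid = i_gid;
	}
	return 0;
}

/* Demo namespace sees kernel-global ids 100000..165535 only. */
const struct demo_exec_ns demo_exec_ns = {
	.uids = { .lower_first = 100000, .count = 65536 },
	.gids = { .lower_first = 100000, .count = 65536 },
};

struct demo_cred_model demo_cred = { .euid = 100001, .egid = 100001 };
```

A set-uid file owned by kernel-global uid 0 therefore fails with -EPERM for a task in the demo namespace, whereas the same file without set-id bits passes through unchanged.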

* Re: [PATCH 30/43] userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
  2012-04-08  5:15     ` "Eric W. Beiderman
                       ` (3 preceding siblings ...)
  (?)
@ 2012-04-18 19:09     ` Serge E. Hallyn
       [not found]       ` <20120418190927.GK5186-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
  -1 siblings, 1 reply; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 19:09 UTC (permalink / raw)
  To: Eric W. Beiderman
  Cc: linux-kernel, Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

Quoting Eric W. Beiderman (ebiederm@xmission.com):
> From: Eric W. Biederman <ebiederm@xmission.com>
> 

Oh, perhaps this is the right place in the thread to discuss the issue of
what to do with file capabilities?  I'm ok waiting until the next iteration
to even discuss it, so long as we start by refusing setting of fcaps by
any task not in init_user_ns.

> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  fs/exec.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 00ae2ef..e001bdf 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1291,8 +1291,11 @@ int prepare_binprm(struct linux_binprm *bprm)
>  	if (!(bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)) {
>  		/* Set-uid? */
>  		if (mode & S_ISUID) {
> +			if (!kuid_has_mapping(bprm->cred->user_ns, inode->i_uid))
> +				return -EPERM;
>  			bprm->per_clear |= PER_CLEAR_ON_SETID;
>  			bprm->cred->euid = inode->i_uid;
> +
>  		}
>  
>  		/* Set-gid? */
> @@ -1302,6 +1305,8 @@ int prepare_binprm(struct linux_binprm *bprm)
>  		 * executable.
>  		 */
>  		if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
> +			if (!kgid_has_mapping(bprm->cred->user_ns, inode->i_gid))
> +				return -EPERM;
>  			bprm->per_clear |= PER_CLEAR_ON_SETID;
>  			bprm->cred->egid = inode->i_gid;
>  		}
> -- 
> 1.7.2.5

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 33/43] userns: Convert binary formats to use kuid/kgid where appropriate
       [not found]     ` <1333862139-31737-33-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-04-18 19:10       ` Serge E. Hallyn
  0 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 19:10 UTC (permalink / raw)
  To: Eric W. Beiderman
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

Quoting Eric W. Beiderman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> 
> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Ok I'll spare the list the traffic - for patches 33-43,

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

Thanks, Eric.

-serge

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 33/43] userns: Convert binary formats to use kuid/kgid where appropriate
  2012-04-08  5:15     ` "Eric W. Beiderman
                       ` (2 preceding siblings ...)
  (?)
@ 2012-04-18 19:10     ` Serge E. Hallyn
  2012-04-24  2:44       ` Eric W. Biederman
       [not found]       ` <20120418191033.GL5186-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
  -1 siblings, 2 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-18 19:10 UTC (permalink / raw)
  To: Eric W. Beiderman
  Cc: linux-kernel, Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

Quoting Eric W. Beiderman (ebiederm@xmission.com):
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Ok I'll spare the list the traffic - for patches 33-43,

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

Thanks, Eric.

-serge

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 16/43] userns: Simplify the user_namespace by making userns->creator a kuid.
       [not found]       ` <20120418184847.GA4984-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2012-04-20 22:58         ` Eric W. Biederman
  0 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-20 22:58 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

"Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:

> Quoting Eric W. Beiderman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>> 
>> - Transform userns->creator from a user_struct reference to a simple
>>   kuid_t, kgid_t pair.
>> 
>>   In cap_capable this allows the check to see if we are the creator of
>>   a namespace to become the classic suser style euid permission check.
>> 
>>   This allows us to remove the need for a struct cred in the mapping
>>   functions and still be able to display the user namespace creator's
>>   uid and gid as 0.
>> 
>> - Remove the now unnecessary delayed_work in free_user_ns.
>> 
>>   All that is left for free_user_ns to do is to call kmem_cache_free
>>   and put_user_ns.  Those functions can be called in any context
>>   so call them directly from free_user_ns removing the need for delayed work.
>> 
>> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>> ---
>>  include/linux/user_namespace.h |    4 ++--
>>  kernel/user.c                  |    7 ++++---
>>  kernel/user_namespace.c        |   39 ++++++++++++++++++---------------------
>>  security/commoncap.c           |    5 +++--
>>  4 files changed, 27 insertions(+), 28 deletions(-)
>> 
>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>> index d767508..8a391bd 100644
>> --- a/include/linux/user_namespace.h
>> +++ b/include/linux/user_namespace.h
>> @@ -9,8 +9,8 @@
>>  struct user_namespace {
>>  	struct kref		kref;
>>  	struct user_namespace	*parent;
>> -	struct user_struct	*creator;
>> -	struct work_struct	destroyer;
>> +	kuid_t			owner;
>> +	kgid_t			group;
>>  };
>>  
>>  extern struct user_namespace init_user_ns;
>> diff --git a/kernel/user.c b/kernel/user.c
>> index 025077e..cff3856 100644
>> --- a/kernel/user.c
>> +++ b/kernel/user.c
>> @@ -25,7 +25,8 @@ struct user_namespace init_user_ns = {
>>  	.kref = {
>>  		.refcount	= ATOMIC_INIT(3),
>>  	},
>> -	.creator = &root_user,
>> +	.owner = GLOBAL_ROOT_UID,
>> +	.group = GLOBAL_ROOT_GID,
>>  };
>>  EXPORT_SYMBOL_GPL(init_user_ns);
>>  
>> @@ -54,9 +55,9 @@ struct hlist_head uidhash_table[UIDHASH_SZ];
>>   */
>>  static DEFINE_SPINLOCK(uidhash_lock);
>>  
>> -/* root_user.__count is 2, 1 for init task cred, 1 for init_user_ns->user_ns */
>> +/* root_user.__count is 1, for init task cred */
>>  struct user_struct root_user = {
>> -	.__count	= ATOMIC_INIT(2),
>> +	.__count	= ATOMIC_INIT(1),
>>  	.processes	= ATOMIC_INIT(1),
>>  	.files		= ATOMIC_INIT(0),
>>  	.sigpending	= ATOMIC_INIT(0),
>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>> index 898e973..f69741a 100644
>> --- a/kernel/user_namespace.c
>> +++ b/kernel/user_namespace.c
>> @@ -27,6 +27,16 @@ int create_user_ns(struct cred *new)
>>  {
>>  	struct user_namespace *ns, *parent_ns = new->user_ns;
>>  	struct user_struct *root_user;
>> +	kuid_t owner = make_kuid(new->user_ns, new->euid);
>> +	kgid_t group = make_kgid(new->user_ns, new->egid);
>> +
>> +	/* The creator needs a mapping in the parent user namespace
>> +	 * or else we won't be able to reasonably tell userspace who
>> +	 * created a user_namespace.
>> +	 */
>> +	if (!kuid_has_mapping(parent_ns, owner) ||
>> +	    !kgid_has_mapping(parent_ns, group))
>> +		return -EPERM;
>>  
>>  	ns = kmem_cache_alloc(user_ns_cachep, GFP_KERNEL);
>>  	if (!ns)
>> @@ -43,7 +53,9 @@ int create_user_ns(struct cred *new)
>>  
>>  	/* set the new root user in the credentials under preparation */
>>  	ns->parent = parent_ns;
>
> I think in the past the creator cred pinned the ns->parent.  Do you now
> need to explicitly pin ns->parent (and release it in free_user_ns())?

Yes, we do have to explicitly reference-count the parent namespace.
But that already happened in patch 7:
"userns: Add an explicit reference to the parent user namespace"


Eric

^ permalink raw reply	[flat|nested] 227+ messages in thread
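
Editorial sketch for the patch 16 exchange: once the creator's credentials no longer pin the parent namespace, the child has to hold its own reference on the parent and drop it when the child is freed. The single-threaded userspace model below illustrates that lifetime rule only. struct demo_ns, demo_ns_create, demo_ns_put and the demo_live_namespaces counter are hypothetical, and the plain int counter stands in for the kernel's kref.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical single-threaded model; none of this is kernel code. */
struct demo_ns {
	int refcount;           /* plain counter standing in for a kref */
	struct demo_ns *parent; /* pinned for as long as this namespace lives */
};

/* Number of allocated namespaces, kept only so frees can be observed. */
int demo_live_namespaces;

/* Create a namespace that takes an explicit reference on its parent. */
struct demo_ns *demo_ns_create(struct demo_ns *parent)
{
	struct demo_ns *ns = malloc(sizeof(*ns));

	if (!ns)
		return NULL;
	ns->refcount = 1;
	ns->parent = parent;
	if (parent)
		parent->refcount++; /* the explicit pin on the parent */
	demo_live_namespaces++;
	return ns;
}

/* Drop a reference; freeing a namespace also drops its pin on the parent. */
void demo_ns_put(struct demo_ns *ns)
{
	while (ns && --ns->refcount == 0) {
		struct demo_ns *parent = ns->parent;

		free(ns);
		demo_live_namespaces--;
		ns = parent; /* release the reference the child was holding */
	}
}
```

If the original holder of the parent calls demo_ns_put first, the parent survives because the child still pins it. Only the final put on the child frees both, which matches the description in the commit message that freeing is just a free followed by a put on the parent.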

* Re: [PATCH 16/43] userns: Simplify the user_namespace by making userns->creator a kuid.
  2012-04-18 18:48     ` Serge E. Hallyn
       [not found]       ` <20120418184847.GA4984-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2012-04-20 22:58       ` Eric W. Biederman
       [not found]         ` <m1aa266meh.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
  1 sibling, 1 reply; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-20 22:58 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: linux-kernel, Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

"Serge E. Hallyn" <serge@hallyn.com> writes:

> Quoting Eric W. Beiderman (ebiederm@xmission.com):
>> From: Eric W. Biederman <ebiederm@xmission.com>
>> 
>> - Transform userns->creator from a user_struct reference to a simple
>>   kuid_t, kgid_t pair.
>> 
>>   In cap_capable this allows the check to see if we are the creator of
>>   a namespace to become the classic suser style euid permission check.
>> 
>>   This allows us to remove the need for a struct cred in the mapping
>>   functions and still be able to display the user namespace creator's
>>   uid and gid as 0.
>> 
>> - Remove the now unnecessary delayed_work in free_user_ns.
>> 
>>   All that is left for free_user_ns to do is to call kmem_cache_free
>>   and put_user_ns.  Those functions can be called in any context
>>   so call them directly from free_user_ns removing the need for delayed work.
>> 
>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>> ---
>>  include/linux/user_namespace.h |    4 ++--
>>  kernel/user.c                  |    7 ++++---
>>  kernel/user_namespace.c        |   39 ++++++++++++++++++---------------------
>>  security/commoncap.c           |    5 +++--
>>  4 files changed, 27 insertions(+), 28 deletions(-)
>> 
>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>> index d767508..8a391bd 100644
>> --- a/include/linux/user_namespace.h
>> +++ b/include/linux/user_namespace.h
>> @@ -9,8 +9,8 @@
>>  struct user_namespace {
>>  	struct kref		kref;
>>  	struct user_namespace	*parent;
>> -	struct user_struct	*creator;
>> -	struct work_struct	destroyer;
>> +	kuid_t			owner;
>> +	kgid_t			group;
>>  };
>>  
>>  extern struct user_namespace init_user_ns;
>> diff --git a/kernel/user.c b/kernel/user.c
>> index 025077e..cff3856 100644
>> --- a/kernel/user.c
>> +++ b/kernel/user.c
>> @@ -25,7 +25,8 @@ struct user_namespace init_user_ns = {
>>  	.kref = {
>>  		.refcount	= ATOMIC_INIT(3),
>>  	},
>> -	.creator = &root_user,
>> +	.owner = GLOBAL_ROOT_UID,
>> +	.group = GLOBAL_ROOT_GID,
>>  };
>>  EXPORT_SYMBOL_GPL(init_user_ns);
>>  
>> @@ -54,9 +55,9 @@ struct hlist_head uidhash_table[UIDHASH_SZ];
>>   */
>>  static DEFINE_SPINLOCK(uidhash_lock);
>>  
>> -/* root_user.__count is 2, 1 for init task cred, 1 for init_user_ns->user_ns */
>> +/* root_user.__count is 1, for init task cred */
>>  struct user_struct root_user = {
>> -	.__count	= ATOMIC_INIT(2),
>> +	.__count	= ATOMIC_INIT(1),
>>  	.processes	= ATOMIC_INIT(1),
>>  	.files		= ATOMIC_INIT(0),
>>  	.sigpending	= ATOMIC_INIT(0),
>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>> index 898e973..f69741a 100644
>> --- a/kernel/user_namespace.c
>> +++ b/kernel/user_namespace.c
>> @@ -27,6 +27,16 @@ int create_user_ns(struct cred *new)
>>  {
>>  	struct user_namespace *ns, *parent_ns = new->user_ns;
>>  	struct user_struct *root_user;
>> +	kuid_t owner = make_kuid(new->user_ns, new->euid);
>> +	kgid_t group = make_kgid(new->user_ns, new->egid);
>> +
>> +	/* The creator needs a mapping in the parent user namespace
>> +	 * or else we won't be able to reasonably tell userspace who
>> +	 * created a user_namespace.
>> +	 */
>> +	if (!kuid_has_mapping(parent_ns, owner) ||
>> +	    !kgid_has_mapping(parent_ns, group))
>> +		return -EPERM;
>>  
>>  	ns = kmem_cache_alloc(user_ns_cachep, GFP_KERNEL);
>>  	if (!ns)
>> @@ -43,7 +53,9 @@ int create_user_ns(struct cred *new)
>>  
>>  	/* set the new root user in the credentials under preparation */
>>  	ns->parent = parent_ns;
>
> I think in the past the creator cred pinned the ns->parent.  Do you now
> need to explicitly pin ns->parent (and release it in free_user_ns())?

Yes, we do have to explicitly reference-count the parent namespace.
But that already happened in patch 7:
"userns: Add an explicit reference to the parent user namespace"


Eric

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 18/43] userns: Convert group_info values from gid_t to kgid_t.
  2012-04-18 18:49     ` Serge E. Hallyn
@ 2012-04-20 23:05           ` Eric W. Biederman
  0 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-20 23:05 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

"Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:

> Quoting Eric W. Beiderman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>> 
>> As a first step toward converting struct cred to use kuid_t and kgid_t
>> values throughout, convert the group values stored in group_info to always
>> be kgid_t values.  Unless user namespaces are used, this change should
>> have no effect.
>> 
>> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>> ---
>>  arch/s390/kernel/compat_linux.c   |   13 ++++++++-
>>  fs/nfsd/auth.c                    |    5 ++-
>>  fs/proc/array.c                   |    5 +++-
>>  include/linux/cred.h              |    9 ++++---
>>  kernel/groups.c                   |   48 +++++++++++++++++++-----------------
>>  kernel/uid16.c                    |   14 +++++++++-
>>  net/ipv4/ping.c                   |   11 ++++++--
>>  net/sunrpc/auth_generic.c         |    4 +-
>>  net/sunrpc/auth_gss/svcauth_gss.c |    7 ++++-
>>  net/sunrpc/auth_unix.c            |   15 ++++++++---
>>  net/sunrpc/svcauth_unix.c         |   18 ++++++++++---
>>  security/keys/permission.c        |    3 +-
>>  12 files changed, 103 insertions(+), 49 deletions(-)
>> 
>> diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
>> index ab64bdb..5baac18 100644
>> --- a/arch/s390/kernel/compat_linux.c
>> +++ b/arch/s390/kernel/compat_linux.c
>> @@ -173,11 +173,14 @@ asmlinkage long sys32_setfsgid16(u16 gid)
>>  
>>  static int groups16_to_user(u16 __user *grouplist, struct group_info *group_info)
>>  {
>> +	struct user_namespace *user_ns = current_user_ns();
>>  	int i;
>>  	u16 group;
>> +	kgid_t kgid;
>>  
>>  	for (i = 0; i < group_info->ngroups; i++) {
>> -		group = (u16)GROUP_AT(group_info, i);
>> +		kgid = GROUP_AT(group_info, i);
>> +		group = (u16)from_kgid_munged(user_ns, kgid);
>>  		if (put_user(group, grouplist+i))
>>  			return -EFAULT;
>>  	}
>> @@ -187,13 +190,19 @@ static int groups16_to_user(u16 __user *grouplist, struct group_info *group_info
>>  
>>  static int groups16_from_user(struct group_info *group_info, u16 __user *grouplist)
>>  {
>> +	struct user_namespace *user_ns = current_user_ns();
>>  	int i;
>>  	u16 group;
>
> need
>
> 	kgid_t kgid;
>
> here

Doh!  Thank you.

Now I am wondering why s390 has its own copy of this function instead
of using the version in uid16.c.

>>  	for (i = 0; i < group_info->ngroups; i++) {
>>  		if (get_user(group, grouplist+i))
>>  			return  -EFAULT;
>> -		GROUP_AT(group_info, i) = (gid_t)group;
>> +
>> +		kgid = make_kgid(user_ns, (gid_t)group);
>> +		if (!gid_valid(kgid))
>> +			return -EINVAL;
>> +
>> +		GROUP_AT(group_info, i) = kgid;
>>  	}
>>  
>>  	return 0;

Eric

^ permalink raw reply	[flat|nested] 227+ messages in thread
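
Editorial sketch for patch 18: groups16_from_user now converts each 16-bit gid from userspace into a kernel-global value and rejects the whole list with -EINVAL if any entry has no mapping. The userspace model below illustrates that loop under stated assumptions: struct demo_gid_ns (a single mapped range), demo_make_kgid, demo_groups16_from_user and the demo arrays are hypothetical, and a plain array read replaces get_user.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of this is kernel code. */
#define DEMO_INVALID_GID ((uint32_t)-1)

struct demo_gid_ns {
	uint32_t first;       /* first gid as seen inside the namespace */
	uint32_t lower_first; /* first kernel-global gid the range maps to */
	uint32_t count;       /* number of ids in the range */
};

/* Map a namespace-local gid down to a kernel-global one. */
uint32_t demo_make_kgid(const struct demo_gid_ns *ns, uint32_t gid)
{
	if (gid >= ns->first && gid - ns->first < ns->count)
		return ns->lower_first + (gid - ns->first);
	return DEMO_INVALID_GID;
}

/* Convert a 16-bit group list, failing on the first unmapped entry. */
int demo_groups16_from_user(const struct demo_gid_ns *ns,
			    const uint16_t *grouplist, size_t ngroups,
			    uint32_t *kgids)
{
	for (size_t i = 0; i < ngroups; i++) {
		uint32_t kgid = demo_make_kgid(ns, grouplist[i]);

		if (kgid == DEMO_INVALID_GID)
			return -EINVAL;
		kgids[i] = kgid;
	}
	return 0;
}

/* Demo namespace: gids 0..999 inside map to 100000..100999 outside. */
const struct demo_gid_ns demo_gid_ns = {
	.first = 0, .lower_first = 100000, .count = 1000,
};

const uint16_t demo_good_groups[] = { 0, 10, 999 };
const uint16_t demo_bad_groups[] = { 0, 5000 };
uint32_t demo_kgids[3];
```

Note that in this model entries before the failing one have already been written when -EINVAL is returned, so a caller would have to discard the partially filled list.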

* Re: [PATCH 18/43] userns: Convert group_info values from gid_t to kgid_t.
@ 2012-04-20 23:05           ` Eric W. Biederman
  0 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-20 23:05 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: linux-kernel, Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

"Serge E. Hallyn" <serge@hallyn.com> writes:

> Quoting Eric W. Beiderman (ebiederm@xmission.com):
>> From: Eric W. Biederman <ebiederm@xmission.com>
>> 
>> As a first step toward converting struct cred to use kuid_t and kgid_t
>> values throughout, convert the group values stored in group_info to always
>> be kgid_t values.  Unless user namespaces are used, this change should
>> have no effect.
>> 
>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>> ---
>>  arch/s390/kernel/compat_linux.c   |   13 ++++++++-
>>  fs/nfsd/auth.c                    |    5 ++-
>>  fs/proc/array.c                   |    5 +++-
>>  include/linux/cred.h              |    9 ++++---
>>  kernel/groups.c                   |   48 +++++++++++++++++++-----------------
>>  kernel/uid16.c                    |   14 +++++++++-
>>  net/ipv4/ping.c                   |   11 ++++++--
>>  net/sunrpc/auth_generic.c         |    4 +-
>>  net/sunrpc/auth_gss/svcauth_gss.c |    7 ++++-
>>  net/sunrpc/auth_unix.c            |   15 ++++++++---
>>  net/sunrpc/svcauth_unix.c         |   18 ++++++++++---
>>  security/keys/permission.c        |    3 +-
>>  12 files changed, 103 insertions(+), 49 deletions(-)
>> 
>> diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c
>> index ab64bdb..5baac18 100644
>> --- a/arch/s390/kernel/compat_linux.c
>> +++ b/arch/s390/kernel/compat_linux.c
>> @@ -173,11 +173,14 @@ asmlinkage long sys32_setfsgid16(u16 gid)
>>  
>>  static int groups16_to_user(u16 __user *grouplist, struct group_info *group_info)
>>  {
>> +	struct user_namespace *user_ns = current_user_ns();
>>  	int i;
>>  	u16 group;
>> +	kgid_t kgid;
>>  
>>  	for (i = 0; i < group_info->ngroups; i++) {
>> -		group = (u16)GROUP_AT(group_info, i);
>> +		kgid = GROUP_AT(group_info, i);
>> +		group = (u16)from_kgid_munged(user_ns, kgid);
>>  		if (put_user(group, grouplist+i))
>>  			return -EFAULT;
>>  	}
>> @@ -187,13 +190,19 @@ static int groups16_to_user(u16 __user *grouplist, struct group_info *group_info
>>  
>>  static int groups16_from_user(struct group_info *group_info, u16 __user *grouplist)
>>  {
>> +	struct user_namespace *user_ns = current_user_ns();
>>  	int i;
>>  	u16 group;
>
> need
>
> 	kgid_t kgid;
>
> here

Doh!  Thank you.

Now I am wondering why s390 has its own copy of this function instead
of using the version in uid16.c.

>>  	for (i = 0; i < group_info->ngroups; i++) {
>>  		if (get_user(group, grouplist+i))
>>  			return  -EFAULT;
>> -		GROUP_AT(group_info, i) = (gid_t)group;
>> +
>> +		kgid = make_kgid(user_ns, (gid_t)group);
>> +		if (!gid_valid(kgid))
>> +			return -EINVAL;
>> +
>> +		GROUP_AT(group_info, i) = kgid;
>>  	}
>>  
>>  	return 0;

Eric

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 22/43] userns: Convert capabilities related permsion checks
  2012-04-18 18:51     ` Serge E. Hallyn
@ 2012-04-20 23:18           ` Eric W. Biederman
  0 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-20 23:18 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

"Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:

> Quoting Eric W. Beiderman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>> 
>> - Use uid_eq when comparing kuids
>>   Use gid_eq when comparing kgids
>> - Use __make_kuid(user_ns, 0) to talk about the user_namespace root uid
>>   Use __make_kgid(user_ns, 0) to talk about the user_namespace root gid
>> 
>> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>
> Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
>
> though, nit,
>
>> ---
>>  fs/open.c            |    3 ++-
>>  security/commoncap.c |   43 ++++++++++++++++++++++++++++---------------
>>  2 files changed, 30 insertions(+), 16 deletions(-)
>> 

>> diff --git a/security/commoncap.c b/security/commoncap.c
>> index dbd465a..9bf8df8 100644
>> --- a/security/commoncap.c
>> +++ b/security/commoncap.c
>> @@ -472,19 +472,24 @@ int cap_bprm_set_creds(struct linux_binprm *bprm)
>>  	struct cred *new = bprm->cred;
>>  	bool effective, has_cap = false;
>>  	int ret;
>> +	kuid_t root_uid;
>> +	kgid_t root_gid;
>
> the root_gid is assigned but never used.

Thanks, snipped.  It doesn't look like there will ever be a use for it.

>>  
>>  	effective = false;
>>  	ret = get_file_caps(bprm, &effective, &has_cap);
>>  	if (ret < 0)
>>  		return ret;
>>  
>> +	root_uid = make_kuid(new->user_ns, 0);
>> +	root_gid = make_kgid(new->user_ns, 0);
>> +

Eric

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 22/43] userns: Convert capabilities related permsion checks
@ 2012-04-20 23:18           ` Eric W. Biederman
  0 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-20 23:18 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: linux-kernel, Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

"Serge E. Hallyn" <serge@hallyn.com> writes:

> Quoting Eric W. Beiderman (ebiederm@xmission.com):
>> From: Eric W. Biederman <ebiederm@xmission.com>
>> 
>> - Use uid_eq when comparing kuids
>>   Use gid_eq when comparing kgids
>> - Use __make_kuid(user_ns, 0) to talk about the user_namespace root uid
>>   Use __make_kgid(user_ns, 0) to talk about the user_namespace root gid
>> 
>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>
> Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
>
> though, nit,
>
>> ---
>>  fs/open.c            |    3 ++-
>>  security/commoncap.c |   43 ++++++++++++++++++++++++++++---------------
>>  2 files changed, 30 insertions(+), 16 deletions(-)
>> 

>> diff --git a/security/commoncap.c b/security/commoncap.c
>> index dbd465a..9bf8df8 100644
>> --- a/security/commoncap.c
>> +++ b/security/commoncap.c
>> @@ -472,19 +472,24 @@ int cap_bprm_set_creds(struct linux_binprm *bprm)
>>  	struct cred *new = bprm->cred;
>>  	bool effective, has_cap = false;
>>  	int ret;
>> +	kuid_t root_uid;
>> +	kgid_t root_gid;
>
> the root_gid is assigned but never used.

Thanks, snipped.  It doesn't look like there will ever be a use for it.

>>  
>>  	effective = false;
>>  	ret = get_file_caps(bprm, &effective, &has_cap);
>>  	if (ret < 0)
>>  		return ret;
>>  
>> +	root_uid = make_kuid(new->user_ns, 0);
>> +	root_gid = make_kgid(new->user_ns, 0);
>> +

Eric

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 24/43] userns: Convert ptrace, kill, set_priority permission checks to work with kuids and kgids
  2012-04-18 18:56     ` Serge E. Hallyn
@ 2012-04-20 23:51           ` Eric W. Biederman
  0 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-20 23:51 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

"Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:

> Quoting Eric W. Beiderman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>> 
>> Update the permission checks to use the new uid_eq and gid_eq helpers
>> and remove the now unnecessary user_ns equality comparison.
>> 
>> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>> ---

>> @@ -1389,10 +1388,8 @@ static int kill_as_cred_perm(const struct cred *cred,
>>  			     struct task_struct *target)
>>  {
>>  	const struct cred *pcred = __task_cred(target);
>> -	if (cred->user_ns != pcred->user_ns)
>> -		return 0;
>> -	if (cred->euid != pcred->suid && cred->euid != pcred->uid &&
>> -	    cred->uid  != pcred->suid && cred->uid  != pcred->uid)
>> +	if (uid_eq(cred->euid, pcred->suid) && uid_eq(cred->euid, pcred->uid) &&
>
> These should be !uid_eq() right?
>> +	    uid_eq(cred->uid,  pcred->suid) && uid_eq(cred->uid, pcred->uid))

Yes.

Thank you for catching this.  This kind of mistake is unfortunately much
too easy to make.

Eric

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 24/43] userns: Convert ptrace, kill, set_priority permission checks to work with kuids and kgids
@ 2012-04-20 23:51           ` Eric W. Biederman
  0 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-20 23:51 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: linux-kernel, Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

"Serge E. Hallyn" <serge@hallyn.com> writes:

> Quoting Eric W. Beiderman (ebiederm@xmission.com):
>> From: Eric W. Biederman <ebiederm@xmission.com>
>> 
>> Update the permission checks to use the new uid_eq and gid_eq helpers
>> and remove the now unnecessary user_ns equality comparison.
>> 
>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>> ---

>> @@ -1389,10 +1388,8 @@ static int kill_as_cred_perm(const struct cred *cred,
>>  			     struct task_struct *target)
>>  {
>>  	const struct cred *pcred = __task_cred(target);
>> -	if (cred->user_ns != pcred->user_ns)
>> -		return 0;
>> -	if (cred->euid != pcred->suid && cred->euid != pcred->uid &&
>> -	    cred->uid  != pcred->suid && cred->uid  != pcred->uid)
>> +	if (uid_eq(cred->euid, pcred->suid) && uid_eq(cred->euid, pcred->uid) &&
>
> These should be !uid_eq() right?
>> +	    uid_eq(cred->uid,  pcred->suid) && uid_eq(cred->uid, pcred->uid))

Yes.

Thank you for catching this.  This kind of mistake is unfortunately much
too easy to make.

Eric

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 27/43] userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
       [not found]       ` <20120418190213.GD5186-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2012-04-21  0:05         ` Eric W. Biederman
  0 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-21  0:05 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

"Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:

> Quoting Eric W. Beiderman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>> 
>> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>> ---
>>  fs/attr.c                |    8 ++++----
>>  fs/exec.c                |   10 +++++-----
>>  fs/fcntl.c               |    6 +++---
>>  fs/ioprio.c              |    4 ++--
>>  fs/locks.c               |    2 +-
>>  fs/namei.c               |    8 ++++----
>>  include/linux/quotaops.h |    4 ++--
>>  7 files changed, 21 insertions(+), 21 deletions(-)
>> 
>> diff --git a/fs/attr.c b/fs/attr.c
>> index 73f69a6..2f094c6 100644
>> --- a/fs/attr.c
>> +++ b/fs/attr.c
>> @@ -47,14 +47,14 @@ int inode_change_ok(const struct inode *inode, struct iattr *attr)
>>  
>>  	/* Make sure a caller can chown. */
>>  	if ((ia_valid & ATTR_UID) &&
>> -	    (current_fsuid() != inode->i_uid ||
>> -	     attr->ia_uid != inode->i_uid) && !capable(CAP_CHOWN))
>> +	    (!uid_eq(current_fsuid(), inode->i_uid) ||
>> +	     !uid_eq(attr->ia_uid, inode->i_uid)) && !capable(CAP_CHOWN))
>>  		return -EPERM;
>>  
>>  	/* Make sure caller can chgrp. */
>>  	if ((ia_valid & ATTR_GID) &&
>> -	    (current_fsuid() != inode->i_uid ||
>> -	    (!in_group_p(attr->ia_gid) && attr->ia_gid != inode->i_gid)) &&
>> +	    (!uid_eq(current_fsuid(), inode->i_uid) ||
>> +	    (!in_group_p(attr->ia_gid) && gid_eq(attr->ia_gid, inode->i_gid))) &&
>
> This should be !gid_eq() ?

Yes.  Thank you, it is now fixed in my tree.

Eric

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 27/43] userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
  2012-04-18 19:02       ` Serge E. Hallyn
  (?)
@ 2012-04-21  0:05       ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-21  0:05 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: linux-kernel, Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

"Serge E. Hallyn" <serge@hallyn.com> writes:

> Quoting Eric W. Beiderman (ebiederm@xmission.com):
>> From: Eric W. Biederman <ebiederm@xmission.com>
>> 
>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>> ---
>>  fs/attr.c                |    8 ++++----
>>  fs/exec.c                |   10 +++++-----
>>  fs/fcntl.c               |    6 +++---
>>  fs/ioprio.c              |    4 ++--
>>  fs/locks.c               |    2 +-
>>  fs/namei.c               |    8 ++++----
>>  include/linux/quotaops.h |    4 ++--
>>  7 files changed, 21 insertions(+), 21 deletions(-)
>> 
>> diff --git a/fs/attr.c b/fs/attr.c
>> index 73f69a6..2f094c6 100644
>> --- a/fs/attr.c
>> +++ b/fs/attr.c
>> @@ -47,14 +47,14 @@ int inode_change_ok(const struct inode *inode, struct iattr *attr)
>>  
>>  	/* Make sure a caller can chown. */
>>  	if ((ia_valid & ATTR_UID) &&
>> -	    (current_fsuid() != inode->i_uid ||
>> -	     attr->ia_uid != inode->i_uid) && !capable(CAP_CHOWN))
>> +	    (!uid_eq(current_fsuid(), inode->i_uid) ||
>> +	     !uid_eq(attr->ia_uid, inode->i_uid)) && !capable(CAP_CHOWN))
>>  		return -EPERM;
>>  
>>  	/* Make sure caller can chgrp. */
>>  	if ((ia_valid & ATTR_GID) &&
>> -	    (current_fsuid() != inode->i_uid ||
>> -	    (!in_group_p(attr->ia_gid) && attr->ia_gid != inode->i_gid)) &&
>> +	    (!uid_eq(current_fsuid(), inode->i_uid) ||
>> +	    (!in_group_p(attr->ia_gid) && gid_eq(attr->ia_gid, inode->i_gid))) &&
>
> This should be !gid_eq() ?

Yes.  Thank you, it is now fixed in my tree.

Eric

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 27/43] userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
  2012-04-18 19:03       ` Serge E. Hallyn
@ 2012-04-21  0:58           ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-21  0:58 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

"Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:

> Quoting Eric W. Beiderman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>> 
>> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>> ---
>>  fs/attr.c                |    8 ++++----
>>  fs/exec.c                |   10 +++++-----
>>  fs/fcntl.c               |    6 +++---
>>  fs/ioprio.c              |    4 ++--
>>  fs/locks.c               |    2 +-
>>  fs/namei.c               |    8 ++++----
>>  include/linux/quotaops.h |    4 ++--
>>  7 files changed, 21 insertions(+), 21 deletions(-)
>> 

>> @@ -2120,7 +2120,7 @@ void do_coredump(long signr, int exit_code, struct pt_regs *regs)
>>  	if (__get_dumpable(cprm.mm_flags) == 2) {
>>  		/* Setuid core dump mode */
>>  		flag = O_EXCL;		/* Stop rewrite attacks */
>> -		cred->fsuid = 0;	/* Dump root private */
>> +		cred->fsuid = GLOBAL_ROOT_UID;	/* Dump root private */
>
> Sorry, one more - can this be the per-ns root uid?  The coredumps should
> be ok to belong to privileged users in the namespace right?

I'm not certain it was clear when you were looking at this that
this is about dumping core from suid applications, not normal
applications. 

Looking at the code in commoncap and commit_creds it looks like it is a
bug that we don't call set_dumpable(new, suid_dumpable) in commoncap
when we use file capabilities.  I might be wrong but I think we escape
the test in commit_creds in that case.


Having thought about it we can make this per namespace but not in
this patch.

Things that I see as missing.
- We likely need to make the suid_dumpable sysctl per namespace.
  There is a prctl so it is already per process.
- We would need to capture the user_namespace at mm creation time,
  during exec, so we know which root user we could use.

  By its nature we know an mm can't escape a user namespace so the
  user namespace an mm is created in will have a root user we can
  dump core as.

I was wondering if we could relax this to a uid captured at mm creation
time (and certainly we can capture the root user), but there are enough
weird cases I don't think it is possible to safely allow anything more
relaxed than the root of the user_namespace that created the mm.

I don't believe we can use the user_namespace of current because the
application may have been suid and then cloned a new user namespace
keeping the mm or perhaps just the uid/euid split.

So in short it is doable but a little tricky so it doesn't belong in
a patch with a bunch of boring and safe conversions.
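
For illustration only, the per-namespace idea could be modelled in
userspace along these lines.  Everything here is hypothetical: struct
mm_model, its user_ns field, the single-extent mapping, coredump_fsuid,
and the fallback to the global root uid when the namespace has no uid 0
mapped are assumptions for the sketch, not existing kernel code.  The
point it shows is that the dump uid comes from the namespace recorded
with the mm at exec time, not from the namespace of current.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t val; } kuid_t;

#define GLOBAL_ROOT_UID ((kuid_t){ 0 })
#define INVALID_UID     ((kuid_t){ UINT32_MAX })

/* Assumed one-extent uid mapping for the sketch. */
struct user_namespace {
	uint32_t first;		/* first uid as seen inside the namespace */
	uint32_t lower_first;	/* uid it corresponds to outside */
	uint32_t count;		/* number of uids in the extent */
};

/* Hypothetical: the namespace captured when the mm was created at exec. */
struct mm_model {
	const struct user_namespace *user_ns;
};

static bool uid_valid(kuid_t uid)
{
	return uid.val != UINT32_MAX;
}

static kuid_t make_kuid(const struct user_namespace *ns, uint32_t uid)
{
	if (uid < ns->first || uid - ns->first >= ns->count)
		return INVALID_UID;
	return (kuid_t){ ns->lower_first + (uid - ns->first) };
}

/* Pick the fsuid for a setuid-mode core dump: the root user of the mm's
 * namespace, or (assumption) the global root if uid 0 is not mapped. */
static kuid_t coredump_fsuid(const struct mm_model *mm)
{
	kuid_t root;

	assert(mm != NULL && mm->user_ns != NULL);
	root = make_kuid(mm->user_ns, 0);
	return uid_valid(root) ? root : GLOBAL_ROOT_UID;
}
```

An mm created in a namespace that maps uid 0 onto 100000 would dump as
kuid 100000, and one created in the initial namespace would still dump
as kuid 0.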

Eric

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 27/43] userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
@ 2012-04-21  0:58           ` Eric W. Biederman
  0 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-21  0:58 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: linux-kernel, Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

"Serge E. Hallyn" <serge@hallyn.com> writes:

> Quoting Eric W. Beiderman (ebiederm@xmission.com):
>> From: Eric W. Biederman <ebiederm@xmission.com>
>> 
>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>> ---
>>  fs/attr.c                |    8 ++++----
>>  fs/exec.c                |   10 +++++-----
>>  fs/fcntl.c               |    6 +++---
>>  fs/ioprio.c              |    4 ++--
>>  fs/locks.c               |    2 +-
>>  fs/namei.c               |    8 ++++----
>>  include/linux/quotaops.h |    4 ++--
>>  7 files changed, 21 insertions(+), 21 deletions(-)
>> 

>> @@ -2120,7 +2120,7 @@ void do_coredump(long signr, int exit_code, struct pt_regs *regs)
>>  	if (__get_dumpable(cprm.mm_flags) == 2) {
>>  		/* Setuid core dump mode */
>>  		flag = O_EXCL;		/* Stop rewrite attacks */
>> -		cred->fsuid = 0;	/* Dump root private */
>> +		cred->fsuid = GLOBAL_ROOT_UID;	/* Dump root private */
>
> Sorry, one more - can this be the per-ns root uid?  The coredumps should
> be ok to belong to privileged users in the namespace right?

I'm not certain it was clear when you were looking at this that
this is about dumping core from suid applications, not normal
applications. 

Looking at the code in commoncap and commit_creds it looks like it is a
bug that we don't call set_dumpable(new, suid_dumpable) in commoncap
when we use file capabilities.  I might be wrong but I think we escape
the test in commit_creds in that case.


Having thought about it we can make this per namespace but not in
this patch.

Things that I see as missing.
- We likely need to make the suid_dumpable sysctl per namespace.
  There is a prctl so it is already per process.
- We would need to capture the user_namespace at mm creation time,
  during exec, so we know which root user we could use.

  By its nature we know an mm can't escape a user namespace so the
  user namespace an mm is created in will have a root user we can
  dump core as.

I was wondering if we could relax this to a uid captured at mm creation
time (and certainly we can capture the root user), but there are enough
weird cases I don't think it is possible to safely allow anything more
relaxed than the root of the user_namespace that created the mm.

I don't believe we can use the user_namespace of current because the
application may have been suid and then cloned a new user namespace
keeping the mm or perhaps just the uid/euid split.

So in short it is doable but a little tricky so it doesn't belong in
a patch with a bunch of boring and safe conversions.

Eric

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 30/43] userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
  2012-04-18 19:09     ` Serge E. Hallyn
@ 2012-04-24  2:28           ` Eric W. Biederman
  0 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-24  2:28 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

"Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:

> Quoting Eric W. Beiderman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>> 
>
> Oh, perhaps this is the right place in the thread to discuss the issue of
> what to do with file capabilities?  I'm ok waiting until the next iteration
> to even discuss it, so long as we start by refusing setting of fcaps by
> any task not in init_user_ns.

For now we do refuse all callers not in the init_user_ns because that path
is protected by a capable and not an ns_capable call.

And as a general policy I have pushed all of the changes from capable to
ns_capable out until after we get these other user namespace bits in, so
we can get the patches reviewed and hopefully don't enable something
that is not safe.

Let's just note here that if we ever get a filesystem mounted in
something other than the init_user_ns, or otherwise allow file
capabilities that do not belong to the init_user_ns, we need an
additional exec check to avoid a security issue for processes in the
init_user_ns using those credentials.

The other direction, the init_user_ns setting file caps on a file and us
using them in a child namespace, seems safe, and practical because of the
way we handle capabilities.  Aka if you have a capability in an outer
user namespace you also have it in a child user namespace.  Which means
a file cap exec today will give you just the capabilities in the child
user namespace.
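
For illustration, the difference between capable and ns_capable, and
the rule that a capability held in an outer namespace also counts in a
child, can be modelled in userspace roughly like this.  It is a
simplified sketch: the struct layouts and the names ns_capable_model
and capable_model are assumptions, and the kernel's own check has
additional rules (for example for the creator of a namespace) that are
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CAP_SETFCAP_MODEL 31u	/* mirrors the value of CAP_SETFCAP */

struct user_namespace {
	const struct user_namespace *parent;	/* NULL for the initial namespace */
};

struct cred {
	const struct user_namespace *user_ns;	/* namespace the caps are held in */
	uint64_t cap_effective;			/* one bit per capability */
};

/* True if cred holds cap relative to targ: walk from targ towards the
 * initial namespace and test the bit when the cred's own namespace is
 * reached.  Namespaces that are not below the cred's give no capability. */
static bool ns_capable_model(const struct cred *cred,
			     const struct user_namespace *targ,
			     unsigned int cap)
{
	assert(cred != NULL && cap < 64);
	for (; targ != NULL; targ = targ->parent) {
		if (targ == cred->user_ns)
			return (cred->cap_effective >> cap) & 1u;
	}
	return false;
}

/* capable() modelled as ns_capable() against the initial namespace. */
static bool capable_model(const struct cred *cred, unsigned int cap)
{
	const struct user_namespace *ns = cred->user_ns;

	while (ns->parent != NULL)
		ns = ns->parent;
	return ns_capable_model(cred, ns, cap);
}
```

A task in the initial namespace holding the capability passes both
checks, including against a child namespace.  A task with every
capability inside a child namespace passes ns_capable_model for that
child but fails capable_model, which is why a path guarded by capable
refuses it.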

Something else to think about when we reach filesystems mounted in
different user namespaces (aka unprivileged mounts) are security
labels on files in different user namespaces.  Not any kind of immediate
concern but something we may have to handle eventually.

Eric

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 30/43] userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
@ 2012-04-24  2:28           ` Eric W. Biederman
  0 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-24  2:28 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: linux-kernel, Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

"Serge E. Hallyn" <serge@hallyn.com> writes:

> Quoting Eric W. Beiderman (ebiederm@xmission.com):
>> From: Eric W. Biederman <ebiederm@xmission.com>
>> 
>
> Oh, perhaps this is the right place in the thread to discuss the issue of
> what to do with file capabilities?  I'm ok waiting until the next iteration
> to even discuss it, so long as we start by refusing setting of fcaps by
> any task not in init_user_ns.

For now we do refuse all callers not in the init_user_ns because that path
is protected by a capable and not an ns_capable call.

And as a general policy I have pushed all of the changes from capable to
ns_capable out until after we get these other user namespace bits in, so
we can get the patches reviewed and hopefully don't enable something
that is not safe.

Let's just note here that if we ever get a filesystem mounted in
something other than the init_user_ns, or otherwise allow file
capabilities that do not belong to the init_user_ns, we need an
additional exec check to avoid a security issue for processes in the
init_user_ns using those credentials.

The other direction, the init_user_ns setting file caps on a file and us
using them in a child namespace, seems safe, and practical because of the
way we handle capabilities.  Aka if you have a capability in an outer
user namespace you also have it in a child user namespace.  Which means
a file cap exec today will give you just the capabilities in the child
user namespace.

Something else to think about when we reach filesystems mounted in
different user namespaces (aka unprivileged mounts) are security
labels on files in different user namespaces.  Not any kind of immediate
concern but something we may have to handle eventually.

Eric

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 33/43] userns: Convert binary formats to use kuid/kgid where appropriate
       [not found]       ` <20120418191033.GL5186-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2012-04-24  2:44         ` Eric W. Biederman
  0 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-24  2:44 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

"Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:

> Quoting Eric W. Beiderman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>> 
>> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>
> Ok I'll spare the list the traffic - for patches 33-43,
>
> Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

I noticed one other significant gotcha with my changes.

Code that has not been converted and needs to be converted fails to
build with type errors.  -- Good.

Code that has not been converted and needs to be converted is still
selectable in make config.  -- Bad

make allmodconfig fails with type errors unless everything is
converted.  -- Bad.

So I am adding/debugging an additional patch that will catch these
failures in Kconfig so people don't get nasty surprises.
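
For illustration, the build failures come from wrapping the id in a
struct, which can be sketched in userspace as below.  The names
inode_model and owner_matches are made up for the example; kuid_t,
KUIDT_INIT and uid_eq follow the names used in this series.  Because
== is not defined for struct types in C, unconverted comparisons stop
compiling and have to go through uid_eq.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A uid wrapped in a struct: it can no longer be compared or assigned
 * as a plain integer by accident. */
typedef struct { unsigned int val; } kuid_t;

#define KUIDT_INIT(value) ((kuid_t){ value })

static bool uid_eq(kuid_t left, kuid_t right)
{
	return left.val == right.val;
}

/* Hypothetical stand-in for an object that carries an owner. */
struct inode_model {
	kuid_t i_uid;
};

static bool owner_matches(const struct inode_model *inode, kuid_t fsuid)
{
	assert(inode != NULL);
	/*
	 * Unconverted forms that the compiler now rejects:
	 *	return inode->i_uid == fsuid;	(== on struct operands)
	 *	return inode->i_uid == 0;	(struct compared with int)
	 */
	return uid_eq(inode->i_uid, fsuid);
}
```

The commented-out lines are the kind of code that fails to build until
it is converted, which is also why an unconverted driver that is still
selectable in Kconfig breaks the build.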

Eric

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 33/43] userns: Convert binary formats to use kuid/kgid where appropriate
  2012-04-18 19:10     ` Serge E. Hallyn
@ 2012-04-24  2:44       ` Eric W. Biederman
       [not found]       ` <20120418191033.GL5186-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
  1 sibling, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-24  2:44 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: linux-kernel, Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

"Serge E. Hallyn" <serge@hallyn.com> writes:

> Quoting Eric W. Beiderman (ebiederm@xmission.com):
>> From: Eric W. Biederman <ebiederm@xmission.com>
>> 
>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>
> Ok I'll spare the list the traffic - for patches 33-43,
>
> Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

I noticed one other significant gotcha with my changes.

Code that has not been converted and needs to be converted fails to
build with type errors.  -- Good.

Code that has not been converted and needs to be converted is still
selectable in make config.  -- Bad

make allmodconfig fails with type errors unless everything is
converted.  -- Bad.

So I am adding/debugging an additional patch that will catch these
failures in Kconfig so people don't get nasty surprises.

Eric

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 30/43] userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
  2012-04-24  2:28           ` Eric W. Biederman
@ 2012-04-24 15:10               ` Serge Hallyn
  -1 siblings, 0 replies; 227+ messages in thread
From: Serge Hallyn @ 2012-04-24 15:10 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Cyrill Gorcunov, linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	Al Viro, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:
> 
> > Quoting Eric W. Beiderman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> >> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> >> 
> >
> > Oh, perhaps this is the right place in the thread to discuss the issue of
> > what to do with file capabilities?  I'm ok waiting until the next iteration
> > to even discuss it, so long as we start by refusing setting of fcaps by
> > any task not in init_user_ns.
> 
> For now we do refuse all callers not in the init_user_ns because that path
> is protected by a capable and not an ns_capable call.
> 
> And as a general policy I have pushed all of the changes from capable to
> ns_capable out until after we get these other user namespace bits in, so
> we can get the patches reviewed and hopefully don't enable something
> that is not safe.
> 
> Let's just note here that if we ever get a filesystem mounted in
> something other than the init_user_ns, or otherwise allow file
> capabilities that do not belong to the init_user_ns, we need an
> additional exec check to avoid a security issue for processes in the
> init_user_ns using those credentials.
> 
> The other direction, the init_user_ns setting file caps on a file and us
> using them in a child namespace, seems safe, and practical because of the
> way we handle capabilities.  Aka if you have a capability in an outer
> user namespace you also have it in a child user namespace.  Which means
> a file cap exec today will give you just the capabilities in the child
> user namespace.
> 
> Something else to think about when we reach filesystems mounted in
> different user namespaces (aka unprivileged mounts) are security
> labels on files in different user namespaces.  Not any kind of immediate
> concern but something we may have to handle eventually.

An interesting concern to discuss at the security mini-summit (or just
at the UDS session).

-serge

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 30/43] userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
@ 2012-04-24 15:10               ` Serge Hallyn
  0 siblings, 0 replies; 227+ messages in thread
From: Serge Hallyn @ 2012-04-24 15:10 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, Linux Containers, linux-kernel, linux-fsdevel,
	linux-security-module, Al Viro, Cyrill Gorcunov, Andrew Morton,
	Linus Torvalds

Quoting Eric W. Biederman (ebiederm@xmission.com):
> "Serge E. Hallyn" <serge@hallyn.com> writes:
> 
> > Quoting Eric W. Beiderman (ebiederm@xmission.com):
> >> From: Eric W. Biederman <ebiederm@xmission.com>
> >> 
> >
> > Oh, perhaps this is the right place in the thread to discuss the issue of
> > what to do with file capabilities?  I'm ok waiting until the next iteration
> > to even discuss it, so long as we start by refusing setting of fcaps by
> > any task not in init_user_ns.
> 
> For now we do refuse all callers not in the init_user_ns because that path
> is protected by a capable and not an ns_capable call.
> 
> And as a general policy I have pushed all of the changes from capable to
> ns_capable out until after we get these other user namespace bits in, so
> we can get the patches reviewed and hopefully don't enable something
> that is not safe.
> 
> Let's just note here that if we ever get a filesystem mounted in
> something other than the init_user_ns, or otherwise allow file
> capabilities that do not belong to the init_user_ns, we need an
> additional exec check to avoid a security issue for processes in the
> init_user_ns using those credentials.
> 
> The other direction, the init_user_ns setting file caps on a file and us
> using them in a child namespace, seems safe, and practical because of the
> way we handle capabilities.  Aka if you have a capability in an outer
> user namespace you also have it in a child user namespace.  Which means
> a file cap exec today will give you just the capabilities in the child
> user namespace.
> 
> Something else to think about when we reach filesystems mounted in
> different user namespaces (aka unprivileged mounts) are security
> labels on files in different user namespaces.  Not any kind of immediate
> concern but something we may have to handle eventually.

An interesting concern to discuss at the security mini-summit (or just
at the UDS session).

-serge

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 16/43] userns: Simplify the user_namespace by making userns->creator a kuid.
  2012-04-20 22:58       ` Eric W. Biederman
@ 2012-04-24 17:33             ` Serge E. Hallyn
  0 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-24 17:33 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:
> 
> > Quoting Eric W. Beiderman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> >> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> >> 
> >> - Transform userns->creator from a user_struct reference to a simple
> >>   kuid_t, kgid_t pair.
> >> 
> >>   In cap_capable this allows the check to see if we are the creator of
> >>   a namespace to become the classic suser style euid permission check.
> >> 
> >>   This allows us to remove the need for a struct cred in the mapping
> >>   functions and still be able to display the user namespace creator's
> >>   uid and gid as 0.
> >> 
> >> - Remove the now unnecessary delayed_work in free_user_ns.
> >> 
> >>   All that is left for free_user_ns to do is to call kmem_cache_free
> >>   and put_user_ns.  Those functions can be called in any context
> >>   so call them directly from free_user_ns removing the need for delayed work.
> >> 
> >> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> >> ---
> >>  include/linux/user_namespace.h |    4 ++--
> >>  kernel/user.c                  |    7 ++++---
> >>  kernel/user_namespace.c        |   39 ++++++++++++++++++---------------------
> >>  security/commoncap.c           |    5 +++--
> >>  4 files changed, 27 insertions(+), 28 deletions(-)
> >> 
> >> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> >> index d767508..8a391bd 100644
> >> --- a/include/linux/user_namespace.h
> >> +++ b/include/linux/user_namespace.h
> >> @@ -9,8 +9,8 @@
> >>  struct user_namespace {
> >>  	struct kref		kref;
> >>  	struct user_namespace	*parent;
> >> -	struct user_struct	*creator;
> >> -	struct work_struct	destroyer;
> >> +	kuid_t			owner;
> >> +	kgid_t			group;
> >>  };
> >>  
> >>  extern struct user_namespace init_user_ns;
> >> diff --git a/kernel/user.c b/kernel/user.c
> >> index 025077e..cff3856 100644
> >> --- a/kernel/user.c
> >> +++ b/kernel/user.c
> >> @@ -25,7 +25,8 @@ struct user_namespace init_user_ns = {
> >>  	.kref = {
> >>  		.refcount	= ATOMIC_INIT(3),
> >>  	},
> >> -	.creator = &root_user,
> >> +	.owner = GLOBAL_ROOT_UID,
> >> +	.group = GLOBAL_ROOT_GID,
> >>  };
> >>  EXPORT_SYMBOL_GPL(init_user_ns);
> >>  
> >> @@ -54,9 +55,9 @@ struct hlist_head uidhash_table[UIDHASH_SZ];
> >>   */
> >>  static DEFINE_SPINLOCK(uidhash_lock);
> >>  
> >> -/* root_user.__count is 2, 1 for init task cred, 1 for init_user_ns->user_ns */
> >> +/* root_user.__count is 1, for init task cred */
> >>  struct user_struct root_user = {
> >> -	.__count	= ATOMIC_INIT(2),
> >> +	.__count	= ATOMIC_INIT(1),
> >>  	.processes	= ATOMIC_INIT(1),
> >>  	.files		= ATOMIC_INIT(0),
> >>  	.sigpending	= ATOMIC_INIT(0),
> >> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> >> index 898e973..f69741a 100644
> >> --- a/kernel/user_namespace.c
> >> +++ b/kernel/user_namespace.c
> >> @@ -27,6 +27,16 @@ int create_user_ns(struct cred *new)
> >>  {
> >>  	struct user_namespace *ns, *parent_ns = new->user_ns;
> >>  	struct user_struct *root_user;
> >> +	kuid_t owner = make_kuid(new->user_ns, new->euid);
> >> +	kgid_t group = make_kgid(new->user_ns, new->egid);
> >> +
> >> +	/* The creator needs a mapping in the parent user namespace
> >> +	 * or else we won't be able to reasonably tell userspace who
> >> +	 * created a user_namespace.
> >> +	 */
> >> +	if (!kuid_has_mapping(parent_ns, owner) ||
> >> +	    !kgid_has_mapping(parent_ns, group))
> >> +		return -EPERM;
> >>  
> >>  	ns = kmem_cache_alloc(user_ns_cachep, GFP_KERNEL);
> >>  	if (!ns)
> >> @@ -43,7 +53,9 @@ int create_user_ns(struct cred *new)
> >>  
> >>  	/* set the new root user in the credentials under preparation */
> >>  	ns->parent = parent_ns;
> >
> > I think in the past the creator cred pinned the ns->parent.  Do you now
> > need to explicitly pin ns->parent (and release it in free_user_ns())?
> 
> Yes we do have to explicitly reference count the parent namespace.
> But that happened in the patch 7:
> "userns: Add an explicit reference to the parent user namespace"

Perhaps that suffices, but I'm not convinced.  The struct cred is
pinning its own ns, but if t1 does clone(CLONE_NEWUSER) to produce
t2, which does the same to produce t3, and then t2 exits, I'm not
seeing what will pin t2's userns.

-serge

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 27/43] userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
       [not found]           ` <m1sjfx2950.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
@ 2012-04-24 17:41             ` Serge E. Hallyn
  2012-04-26  0:11               ` Serge E. Hallyn
  1 sibling, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-24 17:41 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:
> 
> > Quoting Eric W. Beiderman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> >> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> >> 
> >> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> >> ---
> >>  fs/attr.c                |    8 ++++----
> >>  fs/exec.c                |   10 +++++-----
> >>  fs/fcntl.c               |    6 +++---
> >>  fs/ioprio.c              |    4 ++--
> >>  fs/locks.c               |    2 +-
> >>  fs/namei.c               |    8 ++++----
> >>  include/linux/quotaops.h |    4 ++--
> >>  7 files changed, 21 insertions(+), 21 deletions(-)
> >> 
> 
> >> @@ -2120,7 +2120,7 @@ void do_coredump(long signr, int exit_code, struct pt_regs *regs)
> >>  	if (__get_dumpable(cprm.mm_flags) == 2) {
> >>  		/* Setuid core dump mode */
> >>  		flag = O_EXCL;		/* Stop rewrite attacks */
> >> -		cred->fsuid = 0;	/* Dump root private */
> >> +		cred->fsuid = GLOBAL_ROOT_UID;	/* Dump root private */
> >
> > Sorry, one more - can this be the per-ns root uid?  The coredumps should
> > be ok to belong to privileged users in the namespace right?
> 
> I'm not certain it was clear when you were looking at this that
> this is about dumping core from suid applications, not normal
> applications. 

Right, that makes sense, but I was thinking we could just find the
root uid for the namespace in which current_euid() is valid.

Waiting is fine :)

>  Looking at the code in commoncap and commit_creds it looks like it is a
> bug that we don't call set_dumpable(new, suid_dumpable) in common cap

That sounds like a bug, yes.

> when we use file capabilities.  I might be wrong but I think we escape
> the test in commit_creds in that case.
> 
> 
> Having thought about it we can make this per namespace but not in
> this patch.
> 
> Things that I see as missing.
> - We likely need to make the suid_dumpable sysctl per namespace.
>   There is a prctl so it is already per process.
> - We would need to capture the user_namespace at mm creation time,
>   during exec, so we know which root user we could use.
>
>   By its nature we know an mm can't escape a user namespace so the
>   user namespace an mm is created in will have a root user we can
>   dump core as.
> 
> I was wondering if we could relax this to a uid captured at mm creation
> time (and certainly we can capture the root user), but there are enough
> weird cases I don't think it is possible to safely allow anything more
> relaxed than the root of the user_namespace that created the mm.
> 
> I don't believe we can use the user_namespace of current because the
> application may have been suid and then cloned a new user namespace
> keeping the mm or perhaps just the uid/euid split.
> 
> So in short it is doable but a little tricky so it doesn't belong in
> a patch with a bunch of boring and safe conversions.
> 
> Eric

Thanks for elaborating on that.  Makes sense to wait.

-serge
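
For readers following the quoted fs/exec.c hunk, here is a minimal
userspace sketch of why the conversion spells out GLOBAL_ROOT_UID instead
of a bare 0.  It assumes the kernel's approach of wrapping the uid in a
one-member struct; every toy_ name below is hypothetical and exists only
for this illustration, it is not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of a typed uid; the toy_ names are hypothetical. */
typedef struct {
	unsigned int val;
} toy_kuid_t;

#define TOY_KUIDT_INIT(value)	((toy_kuid_t){ (value) })
#define TOY_GLOBAL_ROOT_UID	TOY_KUIDT_INIT(0)

/* Structs cannot be compared with ==, so comparison needs a helper. */
static inline bool toy_uid_eq(toy_kuid_t left, toy_kuid_t right)
{
	return left.val == right.val;
}

struct toy_cred {
	toy_kuid_t fsuid;
};

/* Mirrors the quoted hunk: dump core with the global root fsuid. */
static void toy_set_dump_root(struct toy_cred *cred)
{
	/* "cred->fsuid = 0;" would not compile: an int cannot be
	 * assigned to a struct, so unconverted code is caught at
	 * build time. */
	cred->fsuid = TOY_GLOBAL_ROOT_UID;
}
```

Calling toy_set_dump_root() on a cred whose fsuid was initialised with
TOY_KUIDT_INIT(1000) leaves it equal to TOY_GLOBAL_ROOT_UID under
toy_uid_eq().  A per-namespace root, as discussed in this message, would
need a different value here and is deliberately not modelled.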

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 16/43] userns: Simplify the user_namespace by making userns->creator a kuid.
  2012-04-24 17:33             ` Serge E. Hallyn
@ 2012-04-24 19:41                 ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-24 19:41 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

"Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:

> Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:
>> 
>> > Quoting Eric W. Beiderman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>> >> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>> >> 
>> >> - Transform userns->creator from a user_struct reference to a simple
>> >>   kuid_t, kgid_t pair.
>> >> 
>> >>   In cap_capable this allows the check to see if we are the creator of
>> >>   a namespace to become the classic suser style euid permission check.
>> >> 
>> >>   This allows us to remove the need for a struct cred in the mapping
>> >>   functions and still be able to display the user namespace creator's
>> >>   uid and gid as 0.
>> >> 
>> >> - Remove the now unnecessary delayed_work in free_user_ns.
>> >> 
>> >>   All that is left for free_user_ns to do is to call kmem_cache_free
>> >>   and put_user_ns.  Those functions can be called in any context
>> >>   so call them directly from free_user_ns removing the need for delayed work.
>> >> 
>> >> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>> >> ---
>> >>  include/linux/user_namespace.h |    4 ++--
>> >>  kernel/user.c                  |    7 ++++---
>> >>  kernel/user_namespace.c        |   39 ++++++++++++++++++---------------------
>> >>  security/commoncap.c           |    5 +++--
>> >>  4 files changed, 27 insertions(+), 28 deletions(-)
>> >> 
>> >> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>> >> index d767508..8a391bd 100644
>> >> --- a/include/linux/user_namespace.h
>> >> +++ b/include/linux/user_namespace.h
>> >> @@ -9,8 +9,8 @@
>> >>  struct user_namespace {
>> >>  	struct kref		kref;
>> >>  	struct user_namespace	*parent;
>> >> -	struct user_struct	*creator;
>> >> -	struct work_struct	destroyer;
>> >> +	kuid_t			owner;
>> >> +	kgid_t			group;
>> >>  };
>> >>  
>> >>  extern struct user_namespace init_user_ns;
>> >> diff --git a/kernel/user.c b/kernel/user.c
>> >> index 025077e..cff3856 100644
>> >> --- a/kernel/user.c
>> >> +++ b/kernel/user.c
>> >> @@ -25,7 +25,8 @@ struct user_namespace init_user_ns = {
>> >>  	.kref = {
>> >>  		.refcount	= ATOMIC_INIT(3),
>> >>  	},
>> >> -	.creator = &root_user,
>> >> +	.owner = GLOBAL_ROOT_UID,
>> >> +	.group = GLOBAL_ROOT_GID,
>> >>  };
>> >>  EXPORT_SYMBOL_GPL(init_user_ns);
>> >>  
>> >> @@ -54,9 +55,9 @@ struct hlist_head uidhash_table[UIDHASH_SZ];
>> >>   */
>> >>  static DEFINE_SPINLOCK(uidhash_lock);
>> >>  
>> >> -/* root_user.__count is 2, 1 for init task cred, 1 for init_user_ns->user_ns */
>> >> +/* root_user.__count is 1, for init task cred */
>> >>  struct user_struct root_user = {
>> >> -	.__count	= ATOMIC_INIT(2),
>> >> +	.__count	= ATOMIC_INIT(1),
>> >>  	.processes	= ATOMIC_INIT(1),
>> >>  	.files		= ATOMIC_INIT(0),
>> >>  	.sigpending	= ATOMIC_INIT(0),
>> >> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>> >> index 898e973..f69741a 100644
>> >> --- a/kernel/user_namespace.c
>> >> +++ b/kernel/user_namespace.c
>> >> @@ -27,6 +27,16 @@ int create_user_ns(struct cred *new)
>> >>  {
>> >>  	struct user_namespace *ns, *parent_ns = new->user_ns;
>> >>  	struct user_struct *root_user;
>> >> +	kuid_t owner = make_kuid(new->user_ns, new->euid);
>> >> +	kgid_t group = make_kgid(new->user_ns, new->egid);
>> >> +
>> >> +	/* The creator needs a mapping in the parent user namespace
>> >> +	 * or else we won't be able to reasonably tell userspace who
>> >> +	 * created a user_namespace.
>> >> +	 */
>> >> +	if (!kuid_has_mapping(parent_ns, owner) ||
>> >> +	    !kgid_has_mapping(parent_ns, group))
>> >> +		return -EPERM;
>> >>  
>> >>  	ns = kmem_cache_alloc(user_ns_cachep, GFP_KERNEL);
>> >>  	if (!ns)
>> >> @@ -43,7 +53,9 @@ int create_user_ns(struct cred *new)
>> >>  
>> >>  	/* set the new root user in the credentials under preparation */
>> >>  	ns->parent = parent_ns;
>> >
>> > I think in the past the creator cred pinned the ns->parent.  Do you now
>> > need to explicitly pin ns->parent (and release it in free_user_ns())?
>> 
>> Yes we do have to explicitly reference count the parent namespace.
>> But that happened in the patch 7:
>> "userns: Add an explicit reference to the parent user namespace"

Make that patch 8 not patch 7: 
"userns: Add an explicit reference to the parent user namespace"
Perhaps the patch number reference pointed you to look at the wrong code.

> Perhaps that suffices, but I'm not convinced.  The struct cred is
> pinning its own ns, but if t1 does clone(CLONE_NEWUSER) to produce
> t2, which does the same to produce t3, and then t2 exits, I'm not
> seeing what will pin t2's userns.

t3's userns holds a reference to the departed t2's userns.
t2's userns holds a reference to t1's userns.

free_user_ns does put that userns reference.

It is all there and explicit.  User namespaces refer directly to each
other.  That was all needed to get struct user out of the user namespace
game.

Eric
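
The pinning chain described in this message can be sketched in userspace
C.  This is a toy model under stated assumptions, not the kernel
implementation: a plain int stands in for the kref, and struct toy_ns,
toy_get_ns(), toy_put_ns(), toy_create_ns() and live_namespaces are
hypothetical names made up for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model: a namespace with a plain counter in place of a kref. */
struct toy_ns {
	int refcount;
	struct toy_ns *parent;
};

static int live_namespaces;	/* toy_ns objects currently allocated */

static struct toy_ns *toy_get_ns(struct toy_ns *ns)
{
	if (ns)
		ns->refcount++;
	return ns;
}

static void toy_put_ns(struct toy_ns *ns)
{
	/* Freeing a namespace drops its reference on the parent,
	 * which may free the parent too, so walk up the chain. */
	while (ns && --ns->refcount == 0) {
		struct toy_ns *parent = ns->parent;

		free(ns);
		live_namespaces--;
		ns = parent;
	}
}

/* The returned reference plays the role of the one held by the cred. */
static struct toy_ns *toy_create_ns(struct toy_ns *parent)
{
	struct toy_ns *ns = malloc(sizeof(*ns));

	if (!ns)
		return NULL;
	ns->refcount = 1;
	ns->parent = toy_get_ns(parent);	/* child pins its parent */
	live_namespaces++;
	return ns;
}
```

Creating ns1, then ns2 with parent ns1, then ns3 with parent ns2, and
calling toy_put_ns(ns2) to model t2 exiting leaves all three objects
alive, because ns3 still holds a reference to ns2.  Only when ns3 is put
does ns2 go away, and ns1 survives until its own last reference is
dropped.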

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 16/43] userns: Simplify the user_namespace by making userns->creator a kuid.
  2012-04-24 19:41                 ` Eric W. Biederman
@ 2012-04-24 20:23                     ` Serge E. Hallyn
  -1 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-24 20:23 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:
> 
> > Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> >> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:
> >> 
> >> > Quoting Eric W. Beiderman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> >> >> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> >> >> 
> >> >> - Transform userns->creator from a user_struct reference to a simple
> >> >>   kuid_t, kgid_t pair.
> >> >> 
> >> >>   In cap_capable this allows the check to see if we are the creator of
> >> >>   a namespace to become the classic suser style euid permission check.
> >> >> 
> >> >>   This allows us to remove the need for a struct cred in the mapping
> >> >>   functions and still be able to display the user namespace creator's
> >> >>   uid and gid as 0.
> >> >> 
> >> >> - Remove the now unnecessary delayed_work in free_user_ns.
> >> >> 
> >> >>   All that is left for free_user_ns to do is to call kmem_cache_free
> >> >>   and put_user_ns.  Those functions can be called in any context
> >> >>   so call them directly from free_user_ns removing the need for delayed work.
> >> >> 
> >> >> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> >> >> ---
> >> >>  include/linux/user_namespace.h |    4 ++--
> >> >>  kernel/user.c                  |    7 ++++---
> >> >>  kernel/user_namespace.c        |   39 ++++++++++++++++++---------------------
> >> >>  security/commoncap.c           |    5 +++--
> >> >>  4 files changed, 27 insertions(+), 28 deletions(-)
> >> >> 
> >> >> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> >> >> index d767508..8a391bd 100644
> >> >> --- a/include/linux/user_namespace.h
> >> >> +++ b/include/linux/user_namespace.h
> >> >> @@ -9,8 +9,8 @@
> >> >>  struct user_namespace {
> >> >>  	struct kref		kref;
> >> >>  	struct user_namespace	*parent;
> >> >> -	struct user_struct	*creator;
> >> >> -	struct work_struct	destroyer;
> >> >> +	kuid_t			owner;
> >> >> +	kgid_t			group;
> >> >>  };
> >> >>  
> >> >>  extern struct user_namespace init_user_ns;
> >> >> diff --git a/kernel/user.c b/kernel/user.c
> >> >> index 025077e..cff3856 100644
> >> >> --- a/kernel/user.c
> >> >> +++ b/kernel/user.c
> >> >> @@ -25,7 +25,8 @@ struct user_namespace init_user_ns = {
> >> >>  	.kref = {
> >> >>  		.refcount	= ATOMIC_INIT(3),
> >> >>  	},
> >> >> -	.creator = &root_user,
> >> >> +	.owner = GLOBAL_ROOT_UID,
> >> >> +	.group = GLOBAL_ROOT_GID,
> >> >>  };
> >> >>  EXPORT_SYMBOL_GPL(init_user_ns);
> >> >>  
> >> >> @@ -54,9 +55,9 @@ struct hlist_head uidhash_table[UIDHASH_SZ];
> >> >>   */
> >> >>  static DEFINE_SPINLOCK(uidhash_lock);
> >> >>  
> >> >> -/* root_user.__count is 2, 1 for init task cred, 1 for init_user_ns->user_ns */
> >> >> +/* root_user.__count is 1, for init task cred */
> >> >>  struct user_struct root_user = {
> >> >> -	.__count	= ATOMIC_INIT(2),
> >> >> +	.__count	= ATOMIC_INIT(1),
> >> >>  	.processes	= ATOMIC_INIT(1),
> >> >>  	.files		= ATOMIC_INIT(0),
> >> >>  	.sigpending	= ATOMIC_INIT(0),
> >> >> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> >> >> index 898e973..f69741a 100644
> >> >> --- a/kernel/user_namespace.c
> >> >> +++ b/kernel/user_namespace.c
> >> >> @@ -27,6 +27,16 @@ int create_user_ns(struct cred *new)
> >> >>  {
> >> >>  	struct user_namespace *ns, *parent_ns = new->user_ns;
> >> >>  	struct user_struct *root_user;
> >> >> +	kuid_t owner = make_kuid(new->user_ns, new->euid);
> >> >> +	kgid_t group = make_kgid(new->user_ns, new->egid);
> >> >> +
> >> >> +	/* The creator needs a mapping in the parent user namespace
> >> >> +	 * or else we won't be able to reasonably tell userspace who
> >> >> +	 * created a user_namespace.
> >> >> +	 */
> >> >> +	if (!kuid_has_mapping(parent_ns, owner) ||
> >> >> +	    !kgid_has_mapping(parent_ns, group))
> >> >> +		return -EPERM;
> >> >>  
> >> >>  	ns = kmem_cache_alloc(user_ns_cachep, GFP_KERNEL);
> >> >>  	if (!ns)
> >> >> @@ -43,7 +53,9 @@ int create_user_ns(struct cred *new)
> >> >>  
> >> >>  	/* set the new root user in the credentials under preparation */
> >> >>  	ns->parent = parent_ns;
> >> >
> >> > I think in the past the creator cred pinned the ns->parent.  Do you now
> >> > need to explicitly pin ns->parent (and release it in free_user_ns())?
> >> 
> >> Yes we do have to explicitly reference count the parent namespace.
> >> But that happened in the patch 7:
> >> "userns: Add an explicit reference to the parent user namespace"
> 
> Make that patch 8 not patch 7: 
> "userns: Add an explicit reference to the parent user namespace"
> Perhaps the patch number reference pointed you to look at the wrong code.

D'oh, yup.  That explains it better.

And so parent_userns keeps the refcount from the cred 'new' after
new->ns = ns;  That works, thanks.

> > Perhaps that suffices, but I'm not convinced.  The struct cred is
> > pinning its own ns, but if t1 does clone(CLONE_NEWUSER) to produce
> > t2, which does the same to produce t3, and then t2 exits, I'm not
> > seeing what will pin t2's userns.
> 
> t3's userns hold's a reference to the departed t2's userns.
> t2's userns hold's a reference to t1's userns.
> 
> free_user_ns does put that userns reference.
> 
> It is all there and explict.  Usernamespaces refer directly to each

Actually can we make it just one tinge more explicit, and put a comment
above the 'new->user_ns = ns'?  There's currently the comment

   /* Leave the reference to our user_ns with the new cred */

But that's about the initial refcount on the new ns.  Perhaps change that to:

   /*
    * Leave the reference to our new user_ns with the new cred,
    * and leave the reference on the old ns to pin new->parent_ns
    */

> other.  That was all needed to get struct user out of the usernamespace
> game.
> 
> Eric

Thanks, Eric.  So then

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

which, given the other nits are addressed, should cover the whole
set with my acks.

thanks,
-serge

^ permalink raw reply	[flat|nested] 227+ messages in thread
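
The t1/t2/t3 lifetime argument in the message above can be sketched in
user space.  This is a model under stated assumptions, not kernel code:
struct model_ns, model_ns_get(), model_ns_put() and model_ns_create()
are hypothetical names, and a plain int stands in for the kref.  It only
illustrates the idea that each namespace holds one reference on its
parent and drops it when it is freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a user namespace; not the kernel structure. */
struct model_ns {
	int refcount;			/* stands in for the kref */
	struct model_ns *parent;	/* we hold one reference on this */
};

static int live_namespaces;		/* bookkeeping for the illustration */

static struct model_ns *model_ns_get(struct model_ns *ns)
{
	if (ns)
		ns->refcount++;
	return ns;
}

/* Drop one reference.  Freeing a namespace also drops the reference it
 * held on its parent, so the release can walk up the chain. */
static void model_ns_put(struct model_ns *ns)
{
	while (ns) {
		struct model_ns *parent = ns->parent;

		assert(ns->refcount > 0);
		if (--ns->refcount > 0)
			break;
		free(ns);
		live_namespaces--;
		ns = parent;
	}
}

static struct model_ns *model_ns_create(struct model_ns *parent)
{
	struct model_ns *ns = malloc(sizeof(*ns));

	if (!ns)
		return NULL;
	ns->refcount = 1;			/* the creator's reference */
	ns->parent = model_ns_get(parent);	/* explicit pin on the parent */
	live_namespaces++;
	return ns;
}
```

In this model, if t2's creator drops its reference while t3's namespace
still exists, t2's namespace stays allocated because t3's namespace
still holds a reference on it.  Dropping the last reference on t3's
namespace then releases t2's as well.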

* Re: [PATCH 27/43] userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
  2012-04-21  0:58           ` Eric W. Biederman
@ 2012-04-26  0:11               ` Serge E. Hallyn
  -1 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-26  0:11 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:
> 
> > Quoting Eric W. Beiderman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> >> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> >> 
> >> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> >> ---
> >>  fs/attr.c                |    8 ++++----
> >>  fs/exec.c                |   10 +++++-----
> >>  fs/fcntl.c               |    6 +++---
> >>  fs/ioprio.c              |    4 ++--
> >>  fs/locks.c               |    2 +-
> >>  fs/namei.c               |    8 ++++----
> >>  include/linux/quotaops.h |    4 ++--
> >>  7 files changed, 21 insertions(+), 21 deletions(-)
> >> 
> 
> >> @@ -2120,7 +2120,7 @@ void do_coredump(long signr, int exit_code, struct pt_regs *regs)
> >>  	if (__get_dumpable(cprm.mm_flags) == 2) {
> >>  		/* Setuid core dump mode */
> >>  		flag = O_EXCL;		/* Stop rewrite attacks */
> >> -		cred->fsuid = 0;	/* Dump root private */
> >> +		cred->fsuid = GLOBAL_ROOT_UID;	/* Dump root private */
> >
> > Sorry, one more - can this be the per-ns root uid?  The coredumps should
> > be ok to belong to privileged users in the namespace right?
> 
> I'm not certain it was clear when you were looking at this that
> this is about dumping core from suid applications, not normal
> applications. 
> 
>  Looking at the code in commoncap and commit_creds it looks like it is a
> bug that we don't call set_dumpable(new, suid_dumpable) in common cap
> when we use file capabilities.  I might be wrong but I think we escape

We do, check kernel/cred.c:commit_creds().  So long as the new permitted
set is not a subset of the old one.

Tested it to make absolutely sure.  When I add file capabilities to a
program that otherwise dumps core (int *x = 0; *x = 0;), core dumps are
no longer generated.

-serge

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 27/43] userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
  2012-04-26  0:11               ` Serge E. Hallyn
@ 2012-04-26  5:33                   ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-26  5:33 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

"Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:

> Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:
>> 
>> > Quoting Eric W. Beiderman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>> >> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>> >> 
>> >> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>> >> ---
>> >>  fs/attr.c                |    8 ++++----
>> >>  fs/exec.c                |   10 +++++-----
>> >>  fs/fcntl.c               |    6 +++---
>> >>  fs/ioprio.c              |    4 ++--
>> >>  fs/locks.c               |    2 +-
>> >>  fs/namei.c               |    8 ++++----
>> >>  include/linux/quotaops.h |    4 ++--
>> >>  7 files changed, 21 insertions(+), 21 deletions(-)
>> >> 
>> 
>> >> @@ -2120,7 +2120,7 @@ void do_coredump(long signr, int exit_code, struct pt_regs *regs)
>> >>  	if (__get_dumpable(cprm.mm_flags) == 2) {
>> >>  		/* Setuid core dump mode */
>> >>  		flag = O_EXCL;		/* Stop rewrite attacks */
>> >> -		cred->fsuid = 0;	/* Dump root private */
>> >> +		cred->fsuid = GLOBAL_ROOT_UID;	/* Dump root private */
>> >
>> > Sorry, one more - can this be the per-ns root uid?  The coredumps should
>> > be ok to belong to privileged users in the namespace right?
>> 
>> I'm not certain it was clear when you were looking at this that
>> this is about dumping core from suid applications, not normal
>> applications. 
>> 
>>  Looking at the code in commoncap and commit_creds it looks like it is a
>> bug that we don't call set_dumpable(new, suid_dumpable) in common cap
>> when we use file capabilities.  I might be wrong but I think we escape
>
> We do, check kernel/cred.c:commit_creds().  So long as the new permitted
> set is not a subset of the old one.
>
> Tested it to make absolutely sure.  When I add file capabilities to a
> program that otherwise dumps core (int *x = 0; *x = 0;), core dumps are
> no longer generated.

Thanks for testing.  Just reading through I was not certain if we had
the change in creds that commit_creds needed to trigger the set_dumpable
logic.

Eric

^ permalink raw reply	[flat|nested] 227+ messages in thread
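
The condition Serge describes ("so long as the new permitted set is not
a subset of the old one") can be written down as a small user-space
sketch.  The names model_cap_set, model_cap_issubset() and
model_resets_dumpable() are hypothetical, and a 64-bit mask stands in
for the kernel's capability set.  The real commit_creds() checks more
than this; only the permitted-set test discussed above is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per capability. */
typedef uint64_t model_cap_set;

/* True when every capability in 'a' is also present in 'set'. */
static bool model_cap_issubset(model_cap_set a, model_cap_set set)
{
	return (a & ~set) == 0;
}

/* True when the new permitted set contains something the old one did
 * not, which is the case discussed in the thread where the dumpable
 * flag is reset. */
static bool model_resets_dumpable(model_cap_set old_permitted,
				  model_cap_set new_permitted)
{
	return !model_cap_issubset(new_permitted, old_permitted);
}

/* Usage example: gaining a capability trips the check, keeping or
 * shrinking the set does not. */
static void model_self_check(void)
{
	const model_cap_set none = 0;
	const model_cap_set some = 0x6;

	assert(model_resets_dumpable(none, some));	/* gained caps */
	assert(!model_resets_dumpable(some, some));	/* unchanged */
	assert(!model_resets_dumpable(some, 0x2));	/* shrunk */
}
```

This matches the test reported above: a program that gains file
capabilities at exec ends up with a permitted set that is not a subset
of the old one.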

* Re: [PATCH 16/43] userns: Simplify the user_namespace by making userns->creator a kuid.
       [not found]                     ` <20120424202301.GA11326-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2012-04-26  9:09                       ` Eric W. Biederman
  0 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-26  9:09 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

"Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:

> Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:
>> 
>> > Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>> >> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:
>> >> 
>> >> > Quoting Eric W. Beiderman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>> >> >> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>> >> >> 
>> >> >> - Transform userns->creator from a user_struct reference to a simple
>> >> >>   kuid_t, kgid_t pair.
>> >> >> 
>> >> >>   In cap_capable this allows the check to see if we are the creator of
>> >> >>   a namespace to become the classic suser style euid permission check.
>> >> >> 
>> >> >>   This allows us to remove the need for a struct cred in the mapping
>> >> >>   functions and still be able to display the user namespace creator's
>> >> >>   uid and gid as 0.
>> >> >> 
>> >> >> - Remove the now unnecessary delayed_work in free_user_ns.
>> >> >> 
>> >> >>   All that is left for free_user_ns to do is to call kmem_cache_free
>> >> >>   and put_user_ns.  Those functions can be called in any context
>> >> >>   so call them directly from free_user_ns removing the need for delayed work.
>> >> >> 
>> >> >> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>> >> >> ---
>> >> >>  include/linux/user_namespace.h |    4 ++--
>> >> >>  kernel/user.c                  |    7 ++++---
>> >> >>  kernel/user_namespace.c        |   39 ++++++++++++++++++---------------------
>> >> >>  security/commoncap.c           |    5 +++--
>> >> >>  4 files changed, 27 insertions(+), 28 deletions(-)
>> >> >> 
>> >> >> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>> >> >> index d767508..8a391bd 100644
>> >> >> --- a/include/linux/user_namespace.h
>> >> >> +++ b/include/linux/user_namespace.h
>> >> >> @@ -9,8 +9,8 @@
>> >> >>  struct user_namespace {
>> >> >>  	struct kref		kref;
>> >> >>  	struct user_namespace	*parent;
>> >> >> -	struct user_struct	*creator;
>> >> >> -	struct work_struct	destroyer;
>> >> >> +	kuid_t			owner;
>> >> >> +	kgid_t			group;
>> >> >>  };
>> >> >>  
>> >> >>  extern struct user_namespace init_user_ns;
>> >> >> diff --git a/kernel/user.c b/kernel/user.c
>> >> >> index 025077e..cff3856 100644
>> >> >> --- a/kernel/user.c
>> >> >> +++ b/kernel/user.c
>> >> >> @@ -25,7 +25,8 @@ struct user_namespace init_user_ns = {
>> >> >>  	.kref = {
>> >> >>  		.refcount	= ATOMIC_INIT(3),
>> >> >>  	},
>> >> >> -	.creator = &root_user,
>> >> >> +	.owner = GLOBAL_ROOT_UID,
>> >> >> +	.group = GLOBAL_ROOT_GID,
>> >> >>  };
>> >> >>  EXPORT_SYMBOL_GPL(init_user_ns);
>> >> >>  
>> >> >> @@ -54,9 +55,9 @@ struct hlist_head uidhash_table[UIDHASH_SZ];
>> >> >>   */
>> >> >>  static DEFINE_SPINLOCK(uidhash_lock);
>> >> >>  
>> >> >> -/* root_user.__count is 2, 1 for init task cred, 1 for init_user_ns->user_ns */
>> >> >> +/* root_user.__count is 1, for init task cred */
>> >> >>  struct user_struct root_user = {
>> >> >> -	.__count	= ATOMIC_INIT(2),
>> >> >> +	.__count	= ATOMIC_INIT(1),
>> >> >>  	.processes	= ATOMIC_INIT(1),
>> >> >>  	.files		= ATOMIC_INIT(0),
>> >> >>  	.sigpending	= ATOMIC_INIT(0),
>> >> >> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>> >> >> index 898e973..f69741a 100644
>> >> >> --- a/kernel/user_namespace.c
>> >> >> +++ b/kernel/user_namespace.c
>> >> >> @@ -27,6 +27,16 @@ int create_user_ns(struct cred *new)
>> >> >>  {
>> >> >>  	struct user_namespace *ns, *parent_ns = new->user_ns;
>> >> >>  	struct user_struct *root_user;
>> >> >> +	kuid_t owner = make_kuid(new->user_ns, new->euid);
>> >> >> +	kgid_t group = make_kgid(new->user_ns, new->egid);
>> >> >> +
>> >> >> +	/* The creator needs a mapping in the parent user namespace
>> >> >> +	 * or else we won't be able to reasonably tell userspace who
>> >> >> +	 * created a user_namespace.
>> >> >> +	 */
>> >> >> +	if (!kuid_has_mapping(parent_ns, owner) ||
>> >> >> +	    !kgid_has_mapping(parent_ns, group))
>> >> >> +		return -EPERM;
>> >> >>  
>> >> >>  	ns = kmem_cache_alloc(user_ns_cachep, GFP_KERNEL);
>> >> >>  	if (!ns)
>> >> >> @@ -43,7 +53,9 @@ int create_user_ns(struct cred *new)
>> >> >>  
>> >> >>  	/* set the new root user in the credentials under preparation */
>> >> >>  	ns->parent = parent_ns;
>> >> >
>> >> > I think in the past the creator cred pinned the ns->parent.  Do you now
>> >> > need to explicitly pin ns->parent (and release it in free_user_ns())?
>> >> 
>> >> Yes we do have to explicitly reference count the parent namespace.
>> >> But that happened in the patch 7:
>> >> "userns: Add an explicit reference to the parent user namespace"
>> 
>> Make that patch 8 not patch 7: 
>> "userns: Add an explicit reference to the parent user namespace"
>> Perhaps the patch number reference pointed you to look at the wrong code.
>
> D'oh, yup.  That explains it better.
>
> And so parent_ns keeps the refcount from the cred 'new' after
> new->user_ns = ns;  That works, thanks.
>
>> > Perhaps that suffices, but I'm not convinced.  The struct cred is
>> > pinning its own ns, but if t1 does clone(CLONE_NEWUSER) to produce
>> > t2, which does the same to produce t3, and then t2 exits, I'm not
>> > seeing what will pin t2's userns.
>> 
>> t3's userns holds a reference to the departed t2's userns.
>> t2's userns holds a reference to t1's userns.
>> 
>> free_user_ns does put that userns reference.
>> 
>> It is all there and explicit.  User namespaces refer directly to each
>
> Actually can we make it just one tinge more explicit, and put a comment
> above the 'new->user_ns = ns'?  There's currently the comment
>
>    /* Leave the reference to our user_ns with the new cred */
>
> But that's about the initial refcount on the new ns.  Perhaps change that to:
>
>    /*
>     * Leave the reference to our new user_ns with the new cred,
>     * and leave the reference on the old ns to pin new->parent_ns
>     */

I have added the following comment.  Hopefully that makes it clearer.

	/* Leave the new->user_ns reference with the new user namespace. */
>
>> other.  That was all needed to get struct user out of the usernamespace
>> game.
>> 
>> Eric
>
> Thanks, Eric.  So then
>
> Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
>
> which, given the other nits are addressed, should cover the whole
> set with my acks.

Hmm.  I don't have a record of you looking at my patch 23.
"userns: Convert setting and getting uid and gid system calls to use kuid and kgid"

Eric

^ permalink raw reply	[flat|nested] 227+ messages in thread
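
Patch 23, named above, converts userspace uids into kuids before they
are stored on struct cred and converts them back when they are returned.
A minimal user-space sketch of that round trip follows.  It assumes a
single contiguous mapping range.  All model_* names and the struct
layout are hypothetical and only mirror the roles of make_kuid(),
from_kuid(), uid_valid() and uid_eq(); this is not the kernel
implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Distinct wrapper type, so raw uids and kernel uids cannot be mixed
 * up by accident. */
typedef struct { uint32_t val; } model_kuid;

#define MODEL_INVALID_UID ((uint32_t)-1)	/* reserved error value */

/* Hypothetical single-range map: ids [first, first + count) inside the
 * namespace correspond to [lower_first, lower_first + count) outside. */
struct model_uid_map {
	uint32_t first, lower_first, count;
};

static model_kuid model_make_kuid(const struct model_uid_map *map, uint32_t uid)
{
	model_kuid k = { MODEL_INVALID_UID };

	if (uid >= map->first && uid - map->first < map->count)
		k.val = map->lower_first + (uid - map->first);
	return k;
}

static uint32_t model_from_kuid(const struct model_uid_map *map, model_kuid k)
{
	if (k.val != MODEL_INVALID_UID && k.val >= map->lower_first &&
	    k.val - map->lower_first < map->count)
		return map->first + (k.val - map->lower_first);
	return MODEL_INVALID_UID;	/* no mapping in this namespace */
}

static bool model_uid_valid(model_kuid k)
{
	return k.val != MODEL_INVALID_UID;
}

static bool model_uid_eq(model_kuid a, model_kuid b)
{
	return a.val == b.val;
}

/* setuid-style entry point: fail when the uid cannot be mapped and
 * leave the stored credential untouched. */
static int model_setuid(const struct model_uid_map *map,
			model_kuid *cred_uid, uint32_t uid)
{
	model_kuid k = model_make_kuid(map, uid);

	if (!model_uid_valid(k))
		return -1;	/* stands in for a negative errno */
	*cred_uid = k;
	return 0;
}

/* Usage example: uid 5 inside the namespace is stored as 100005. */
static void model_demo(void)
{
	const struct model_uid_map map = { 0, 100000, 1000 };
	model_kuid cred = { MODEL_INVALID_UID };

	assert(model_setuid(&map, &cred, 5) == 0);
	assert(model_from_kuid(&map, cred) == 5);
}
```

Comparisons go through model_uid_eq() rather than ==, which is the
pattern the uid_eq/gid_eq patches in this series apply to the vfs.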

* Re: [PATCH 16/43] userns: Simplify the user_namespace by making userns->creator a kuid.
  2012-04-24 20:23                     ` Serge E. Hallyn
  (?)
@ 2012-04-26  9:09                     ` Eric W. Biederman
       [not found]                       ` <m1ehradfl3.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
  -1 siblings, 1 reply; 227+ messages in thread
From: Eric W. Biederman @ 2012-04-26  9:09 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: linux-kernel, Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

"Serge E. Hallyn" <serge@hallyn.com> writes:

> Quoting Eric W. Biederman (ebiederm@xmission.com):
>> "Serge E. Hallyn" <serge@hallyn.com> writes:
>> 
>> > Quoting Eric W. Biederman (ebiederm@xmission.com):
>> >> "Serge E. Hallyn" <serge@hallyn.com> writes:
>> >> 
>> >> > Quoting Eric W. Beiderman (ebiederm@xmission.com):
>> >> >> From: Eric W. Biederman <ebiederm@xmission.com>
>> >> >> 
>> >> >> - Transform userns->creator from a user_struct reference to a simple
>> >> >>   kuid_t, kgid_t pair.
>> >> >> 
>> >> >>   In cap_capable this allows the check to see if we are the creator of
>> >> >>   a namespace to become the classic suser style euid permission check.
>> >> >> 
>> >> >>   This allows us to remove the need for a struct cred in the mapping
>> >> >>   functions and still be able to dispaly the user namespace creators
>> >> >>   uid and gid as 0.
>> >> >> 
>> >> >> - Remove the now unnecessary delayed_work in free_user_ns.
>> >> >> 
>> >> >>   All that is left for free_user_ns to do is to call kmem_cache_free
>> >> >>   and put_user_ns.  Those functions can be called in any context
>> >> >>   so call them directly from free_user_ns removing the need for delayed work.
>> >> >> 
>> >> >> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>> >> >> ---
>> >> >>  include/linux/user_namespace.h |    4 ++--
>> >> >>  kernel/user.c                  |    7 ++++---
>> >> >>  kernel/user_namespace.c        |   39 ++++++++++++++++++---------------------
>> >> >>  security/commoncap.c           |    5 +++--
>> >> >>  4 files changed, 27 insertions(+), 28 deletions(-)
>> >> >> 
>> >> >> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>> >> >> index d767508..8a391bd 100644
>> >> >> --- a/include/linux/user_namespace.h
>> >> >> +++ b/include/linux/user_namespace.h
>> >> >> @@ -9,8 +9,8 @@
>> >> >>  struct user_namespace {
>> >> >>  	struct kref		kref;
>> >> >>  	struct user_namespace	*parent;
>> >> >> -	struct user_struct	*creator;
>> >> >> -	struct work_struct	destroyer;
>> >> >> +	kuid_t			owner;
>> >> >> +	kgid_t			group;
>> >> >>  };
>> >> >>  
>> >> >>  extern struct user_namespace init_user_ns;
>> >> >> diff --git a/kernel/user.c b/kernel/user.c
>> >> >> index 025077e..cff3856 100644
>> >> >> --- a/kernel/user.c
>> >> >> +++ b/kernel/user.c
>> >> >> @@ -25,7 +25,8 @@ struct user_namespace init_user_ns = {
>> >> >>  	.kref = {
>> >> >>  		.refcount	= ATOMIC_INIT(3),
>> >> >>  	},
>> >> >> -	.creator = &root_user,
>> >> >> +	.owner = GLOBAL_ROOT_UID,
>> >> >> +	.group = GLOBAL_ROOT_GID,
>> >> >>  };
>> >> >>  EXPORT_SYMBOL_GPL(init_user_ns);
>> >> >>  
>> >> >> @@ -54,9 +55,9 @@ struct hlist_head uidhash_table[UIDHASH_SZ];
>> >> >>   */
>> >> >>  static DEFINE_SPINLOCK(uidhash_lock);
>> >> >>  
>> >> >> -/* root_user.__count is 2, 1 for init task cred, 1 for init_user_ns->user_ns */
>> >> >> +/* root_user.__count is 1, for init task cred */
>> >> >>  struct user_struct root_user = {
>> >> >> -	.__count	= ATOMIC_INIT(2),
>> >> >> +	.__count	= ATOMIC_INIT(1),
>> >> >>  	.processes	= ATOMIC_INIT(1),
>> >> >>  	.files		= ATOMIC_INIT(0),
>> >> >>  	.sigpending	= ATOMIC_INIT(0),
>> >> >> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>> >> >> index 898e973..f69741a 100644
>> >> >> --- a/kernel/user_namespace.c
>> >> >> +++ b/kernel/user_namespace.c
>> >> >> @@ -27,6 +27,16 @@ int create_user_ns(struct cred *new)
>> >> >>  {
>> >> >>  	struct user_namespace *ns, *parent_ns = new->user_ns;
>> >> >>  	struct user_struct *root_user;
>> >> >> +	kuid_t owner = make_kuid(new->user_ns, new->euid);
>> >> >> +	kgid_t group = make_kgid(new->user_ns, new->egid);
>> >> >> +
>> >> >> +	/* The creator needs a mapping in the parent user namespace
>> >> >> +	 * or else we won't be able to reasonably tell userspace who
>> >> >> +	 * created a user_namespace.
>> >> >> +	 */
>> >> >> +	if (!kuid_has_mapping(parent_ns, owner) ||
>> >> >> +	    !kgid_has_mapping(parent_ns, group))
>> >> >> +		return -EPERM;
>> >> >>  
>> >> >>  	ns = kmem_cache_alloc(user_ns_cachep, GFP_KERNEL);
>> >> >>  	if (!ns)
>> >> >> @@ -43,7 +53,9 @@ int create_user_ns(struct cred *new)
>> >> >>  
>> >> >>  	/* set the new root user in the credentials under preparation */
>> >> >>  	ns->parent = parent_ns;
>> >> >
>> >> > I think in the past the creator cred pinned the ns->parent.  Do you now
>> >> > need to explicitly pin ns->parent (and release it in free_user_ns())?
>> >> 
>> >> Yes we do have to explicitly reference count the parent namespace.
>> >> But that happened in the patch 7:
>> >> "userns: Add an explicit reference to the parent user namespace"
>> 
>> Make that patch 8 not patch 7: 
>> "userns: Add an explicit reference to the parent user namespace"
>> Perhaps the patch number reference pointed you to look at the wrong code.
>
> D'oh, yup.  That explains it better.
>
> And so parent_userns keeps the refcount from the cred 'new' after
> new->ns = ns;  That works, thanks.
>
>> > Perhaps that suffices, but I'm not convinced.  The struct cred is
>> > pinning it's own ns, but if t1 does clone(CLONE_NEWUSER) to produce
>> > t2, which does the same to procduce t3, and then t2 exits, I'm not
>> > seeing what will pin t2's userns.
>> 
>> t3's userns hold's a reference to the departed t2's userns.
>> t2's userns hold's a reference to t1's userns.
>> 
>> free_user_ns does put that userns reference.
>> 
>> It is all there and explict.  Usernamespaces refer directly to each
>
> Actually can we make it just one tinge more explicit, and put a comment
> above the 'new->user_ns = ns'?  There's currently the comment
>
>    /* Leave the reference to our user_ns with the new cred */
>
> But that's about the initial refcount on the new ns.  Perhaps change that to:
>
>    /*
>     * Leave the reference to our new user_ns with the new cred,
>     * and leave the reference on the old ns to pin new->parent_ns
>     */

I have added the following comment.  Hopefully that makes it clearer. */

	/* Leave the new->user_ns reference with the new user namespace. */
>
>> other.  That was all needed to get struct user out of the usernamespace
>> game.
>> 
>> Eric
>
> Thanks, Eric.  So then
>
> Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
>
> which, given the other nits are addressed, should cover the whole
> set with my acks.

Hmm.  I don't have a record of you looking at my patch 23.
"userns: Convert setting and getting uid and gid system calls to use kuid and kgid"

Eric
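
The reference chain discussed above (t3's userns pins t2's userns, which
in turn pins t1's) can be modelled in userspace roughly as follows.  This
is only a sketch: struct toy_userns, toy_get(), toy_put(), toy_create()
and toy_live are hypothetical names invented for illustration, not the
kernel's struct user_namespace or its helpers, and there is no locking
or atomic counting here.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy userspace model; NOT the kernel's struct user_namespace. */
struct toy_userns {
	int refcount;
	struct toy_userns *parent;	/* pinned for our whole lifetime */
};

static int toy_live;	/* number of namespaces not yet freed */

/* Take a reference; NULL stands in for "no parent" (the initial ns). */
static struct toy_userns *toy_get(struct toy_userns *ns)
{
	if (ns)
		ns->refcount++;
	return ns;
}

/*
 * Drop a reference.  When the last one goes away the namespace is
 * freed and its reference on the parent is dropped too, so the
 * release walks up the chain without recursion.
 */
static void toy_put(struct toy_userns *ns)
{
	while (ns) {
		struct toy_userns *parent;

		assert(ns->refcount > 0);
		if (--ns->refcount)
			return;
		parent = ns->parent;
		free(ns);
		toy_live--;
		ns = parent;
	}
}

/* Create a child namespace that explicitly pins its parent. */
static struct toy_userns *toy_create(struct toy_userns *parent)
{
	struct toy_userns *ns = calloc(1, sizeof(*ns));

	if (!ns)
		return NULL;
	ns->refcount = 1;		/* reference handed to the creator's cred */
	ns->parent = toy_get(parent);	/* explicit reference on the parent */
	toy_live++;
	return ns;
}
```

With namespaces for t1, t2 and t3 each made by toy_create(), dropping
t2's own reference leaves all three allocated, because t3's namespace
still pins t2's.  The chain only unwinds once t3's namespace is released.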

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 23/43] userns: Convert setting and getting uid and gid system calls to use kuid and kgid
  2012-04-08  5:15     ` "Eric W. Beiderman
@ 2012-04-26 16:20         ` Serge E. Hallyn
  -1 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-26 16:20 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> 
> Convert setregid, setgid, setreuid, setuid,
> setresuid, getresuid, setresgid, getresgid, setfsuid, setfsgid,
> getuid, geteuid, getgid, getegid,
> waitpid, waitid, wait4.
> 
> Convert userspace uids and gids into kuids and kgids before
> being placed on struct cred.  Convert struct cred kuids and
> kgids into userspace uids and gids when returning them.
> 
> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
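
For readers following along, the conversion described in the commit
message can be pictured with a small userspace model.  This is a hedged
sketch, not the kernel code: struct toy_extent, toy_make_kuid(),
toy_from_kuid_munged(), toy_check_id_arg() and the TOY_* constants are
hypothetical stand-ins, the map is assumed to have a single extent, and
65534 is assumed as the overflow id reported for unmappable values.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TOY_INVALID_ID	((uint32_t)-1)	/* reserved "no such id" value */
#define TOY_OVERFLOW_ID	65534u		/* assumed overflow id */

/* One mapping extent: ns ids [first, first+count) -> kernel ids. */
struct toy_extent {
	uint32_t first;		/* first id as seen inside the namespace */
	uint32_t lower_first;	/* corresponding kernel-internal id */
	uint32_t count;		/* length of the range */
};

/* Userspace id -> kernel id; TOY_INVALID_ID if there is no mapping. */
static uint32_t toy_make_kuid(const struct toy_extent *m, uint32_t uid)
{
	assert(m);
	if (uid >= m->first && uid - m->first < m->count)
		return m->lower_first + (uid - m->first);
	return TOY_INVALID_ID;
}

/* Kernel id -> userspace id; unmapped ids are reported as overflow. */
static uint32_t toy_from_kuid_munged(const struct toy_extent *m, uint32_t kuid)
{
	assert(m);
	if (kuid >= m->lower_first && kuid - m->lower_first < m->count)
		return m->first + (kuid - m->lower_first);
	return TOY_OVERFLOW_ID;
}

/*
 * Argument check in the style of the setre*id/setres*id hunks:
 * (uint32_t)-1 means "leave this id unchanged" and is accepted,
 * any other id that does not map is rejected with -EINVAL.
 */
static int toy_check_id_arg(const struct toy_extent *m, uint32_t uid)
{
	if (uid == TOY_INVALID_ID)
		return 0;
	if (toy_make_kuid(m, uid) == TOY_INVALID_ID)
		return -EINVAL;
	return 0;
}
```

The split mirrors the patch: ids are validated and converted once on
entry, comparisons then happen on kernel-internal values, and the
"munged" direction is used when reporting ids back so that the caller
always receives some id rather than an error.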

> ---
>  kernel/exit.c  |    6 +-
>  kernel/sys.c   |  216 ++++++++++++++++++++++++++++++++++++++-----------------
>  kernel/timer.c |    8 +-
>  kernel/uid16.c |   34 ++++++---
>  4 files changed, 178 insertions(+), 86 deletions(-)
> 
> diff --git a/kernel/exit.c b/kernel/exit.c
> index d8bd3b42..789e3c5 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -1214,7 +1214,7 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
>  	unsigned long state;
>  	int retval, status, traced;
>  	pid_t pid = task_pid_vnr(p);
> -	uid_t uid = __task_cred(p)->uid;
> +	uid_t uid = from_kuid_munged(current_user_ns(), __task_cred(p)->uid);
>  	struct siginfo __user *infop;
>  
>  	if (!likely(wo->wo_flags & WEXITED))
> @@ -1427,7 +1427,7 @@ static int wait_task_stopped(struct wait_opts *wo,
>  	if (!unlikely(wo->wo_flags & WNOWAIT))
>  		*p_code = 0;
>  
> -	uid = task_uid(p);
> +	uid = from_kuid_munged(current_user_ns(), __task_cred(p)->uid);
>  unlock_sig:
>  	spin_unlock_irq(&p->sighand->siglock);
>  	if (!exit_code)
> @@ -1500,7 +1500,7 @@ static int wait_task_continued(struct wait_opts *wo, struct task_struct *p)
>  	}
>  	if (!unlikely(wo->wo_flags & WNOWAIT))
>  		p->signal->flags &= ~SIGNAL_STOP_CONTINUED;
> -	uid = task_uid(p);
> +	uid = from_kuid_munged(current_user_ns(), __task_cred(p)->uid);
>  	spin_unlock_irq(&p->sighand->siglock);
>  
>  	pid = task_pid_vnr(p);
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 3996281..aff09f2 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -555,9 +555,19 @@ void ctrl_alt_del(void)
>   */
>  SYSCALL_DEFINE2(setregid, gid_t, rgid, gid_t, egid)
>  {
> +	struct user_namespace *ns = current_user_ns();
>  	const struct cred *old;
>  	struct cred *new;
>  	int retval;
> +	kgid_t krgid, kegid;
> +
> +	krgid = make_kgid(ns, rgid);
> +	kegid = make_kgid(ns, egid);
> +
> +	if ((rgid != (gid_t) -1) && !gid_valid(krgid))
> +		return -EINVAL;
> +	if ((egid != (gid_t) -1) && !gid_valid(kegid))
> +		return -EINVAL;
>  
>  	new = prepare_creds();
>  	if (!new)
> @@ -566,25 +576,25 @@ SYSCALL_DEFINE2(setregid, gid_t, rgid, gid_t, egid)
>  
>  	retval = -EPERM;
>  	if (rgid != (gid_t) -1) {
> -		if (old->gid == rgid ||
> -		    old->egid == rgid ||
> +		if (gid_eq(old->gid, krgid) ||
> +		    gid_eq(old->egid, krgid) ||
>  		    nsown_capable(CAP_SETGID))
> -			new->gid = rgid;
> +			new->gid = krgid;
>  		else
>  			goto error;
>  	}
>  	if (egid != (gid_t) -1) {
> -		if (old->gid == egid ||
> -		    old->egid == egid ||
> -		    old->sgid == egid ||
> +		if (gid_eq(old->gid, kegid) ||
> +		    gid_eq(old->egid, kegid) ||
> +		    gid_eq(old->sgid, kegid) ||
>  		    nsown_capable(CAP_SETGID))
> -			new->egid = egid;
> +			new->egid = kegid;
>  		else
>  			goto error;
>  	}
>  
>  	if (rgid != (gid_t) -1 ||
> -	    (egid != (gid_t) -1 && egid != old->gid))
> +	    (egid != (gid_t) -1 && !gid_eq(kegid, old->gid)))
>  		new->sgid = new->egid;
>  	new->fsgid = new->egid;
>  
> @@ -602,9 +612,15 @@ error:
>   */
>  SYSCALL_DEFINE1(setgid, gid_t, gid)
>  {
> +	struct user_namespace *ns = current_user_ns();
>  	const struct cred *old;
>  	struct cred *new;
>  	int retval;
> +	kgid_t kgid;
> +
> +	kgid = make_kgid(ns, gid);
> +	if (!gid_valid(kgid))
> +		return -EINVAL;
>  
>  	new = prepare_creds();
>  	if (!new)
> @@ -613,9 +629,9 @@ SYSCALL_DEFINE1(setgid, gid_t, gid)
>  
>  	retval = -EPERM;
>  	if (nsown_capable(CAP_SETGID))
> -		new->gid = new->egid = new->sgid = new->fsgid = gid;
> -	else if (gid == old->gid || gid == old->sgid)
> -		new->egid = new->fsgid = gid;
> +		new->gid = new->egid = new->sgid = new->fsgid = kgid;
> +	else if (gid_eq(kgid, old->gid) || gid_eq(kgid, old->sgid))
> +		new->egid = new->fsgid = kgid;
>  	else
>  		goto error;
>  
> @@ -672,9 +688,19 @@ static int set_user(struct cred *new)
>   */
>  SYSCALL_DEFINE2(setreuid, uid_t, ruid, uid_t, euid)
>  {
> +	struct user_namespace *ns = current_user_ns();
>  	const struct cred *old;
>  	struct cred *new;
>  	int retval;
> +	kuid_t kruid, keuid;
> +
> +	kruid = make_kuid(ns, ruid);
> +	keuid = make_kuid(ns, euid);
> +
> +	if ((ruid != (uid_t) -1) && !uid_valid(kruid))
> +		return -EINVAL;
> +	if ((euid != (uid_t) -1) && !uid_valid(keuid))
> +		return -EINVAL;
>  
>  	new = prepare_creds();
>  	if (!new)
> @@ -683,29 +709,29 @@ SYSCALL_DEFINE2(setreuid, uid_t, ruid, uid_t, euid)
>  
>  	retval = -EPERM;
>  	if (ruid != (uid_t) -1) {
> -		new->uid = ruid;
> -		if (old->uid != ruid &&
> -		    old->euid != ruid &&
> +		new->uid = kruid;
> +		if (!uid_eq(old->uid, kruid) &&
> +		    !uid_eq(old->euid, kruid) &&
>  		    !nsown_capable(CAP_SETUID))
>  			goto error;
>  	}
>  
>  	if (euid != (uid_t) -1) {
> -		new->euid = euid;
> -		if (old->uid != euid &&
> -		    old->euid != euid &&
> -		    old->suid != euid &&
> +		new->euid = keuid;
> +		if (!uid_eq(old->uid, keuid) &&
> +		    !uid_eq(old->euid, keuid) &&
> +		    !uid_eq(old->suid, keuid) &&
>  		    !nsown_capable(CAP_SETUID))
>  			goto error;
>  	}
>  
> -	if (new->uid != old->uid) {
> +	if (!uid_eq(new->uid, old->uid)) {
>  		retval = set_user(new);
>  		if (retval < 0)
>  			goto error;
>  	}
>  	if (ruid != (uid_t) -1 ||
> -	    (euid != (uid_t) -1 && euid != old->uid))
> +	    (euid != (uid_t) -1 && !uid_eq(keuid, old->uid)))
>  		new->suid = new->euid;
>  	new->fsuid = new->euid;
>  
> @@ -733,9 +759,15 @@ error:
>   */
>  SYSCALL_DEFINE1(setuid, uid_t, uid)
>  {
> +	struct user_namespace *ns = current_user_ns();
>  	const struct cred *old;
>  	struct cred *new;
>  	int retval;
> +	kuid_t kuid;
> +
> +	kuid = make_kuid(ns, uid);
> +	if (!uid_valid(kuid))
> +		return -EINVAL;
>  
>  	new = prepare_creds();
>  	if (!new)
> @@ -744,17 +776,17 @@ SYSCALL_DEFINE1(setuid, uid_t, uid)
>  
>  	retval = -EPERM;
>  	if (nsown_capable(CAP_SETUID)) {
> -		new->suid = new->uid = uid;
> -		if (uid != old->uid) {
> +		new->suid = new->uid = kuid;
> +		if (!uid_eq(kuid, old->uid)) {
>  			retval = set_user(new);
>  			if (retval < 0)
>  				goto error;
>  		}
> -	} else if (uid != old->uid && uid != new->suid) {
> +	} else if (!uid_eq(kuid, old->uid) && !uid_eq(kuid, new->suid)) {
>  		goto error;
>  	}
>  
> -	new->fsuid = new->euid = uid;
> +	new->fsuid = new->euid = kuid;
>  
>  	retval = security_task_fix_setuid(new, old, LSM_SETID_ID);
>  	if (retval < 0)
> @@ -774,9 +806,24 @@ error:
>   */
>  SYSCALL_DEFINE3(setresuid, uid_t, ruid, uid_t, euid, uid_t, suid)
>  {
> +	struct user_namespace *ns = current_user_ns();
>  	const struct cred *old;
>  	struct cred *new;
>  	int retval;
> +	kuid_t kruid, keuid, ksuid;
> +
> +	kruid = make_kuid(ns, ruid);
> +	keuid = make_kuid(ns, euid);
> +	ksuid = make_kuid(ns, suid);
> +
> +	if ((ruid != (uid_t) -1) && !uid_valid(kruid))
> +		return -EINVAL;
> +
> +	if ((euid != (uid_t) -1) && !uid_valid(keuid))
> +		return -EINVAL;
> +
> +	if ((suid != (uid_t) -1) && !uid_valid(ksuid))
> +		return -EINVAL;
>  
>  	new = prepare_creds();
>  	if (!new)
> @@ -786,29 +833,29 @@ SYSCALL_DEFINE3(setresuid, uid_t, ruid, uid_t, euid, uid_t, suid)
>  
>  	retval = -EPERM;
>  	if (!nsown_capable(CAP_SETUID)) {
> -		if (ruid != (uid_t) -1 && ruid != old->uid &&
> -		    ruid != old->euid  && ruid != old->suid)
> +		if (ruid != (uid_t) -1        && !uid_eq(kruid, old->uid) &&
> +		    !uid_eq(kruid, old->euid) && !uid_eq(kruid, old->suid))
>  			goto error;
> -		if (euid != (uid_t) -1 && euid != old->uid &&
> -		    euid != old->euid  && euid != old->suid)
> +		if (euid != (uid_t) -1        && !uid_eq(keuid, old->uid) &&
> +		    !uid_eq(keuid, old->euid) && !uid_eq(keuid, old->suid))
>  			goto error;
> -		if (suid != (uid_t) -1 && suid != old->uid &&
> -		    suid != old->euid  && suid != old->suid)
> +		if (suid != (uid_t) -1        && !uid_eq(ksuid, old->uid) &&
> +		    !uid_eq(ksuid, old->euid) && !uid_eq(ksuid, old->suid))
>  			goto error;
>  	}
>  
>  	if (ruid != (uid_t) -1) {
> -		new->uid = ruid;
> -		if (ruid != old->uid) {
> +		new->uid = kruid;
> +		if (!uid_eq(kruid, old->uid)) {
>  			retval = set_user(new);
>  			if (retval < 0)
>  				goto error;
>  		}
>  	}
>  	if (euid != (uid_t) -1)
> -		new->euid = euid;
> +		new->euid = keuid;
>  	if (suid != (uid_t) -1)
> -		new->suid = suid;
> +		new->suid = ksuid;
>  	new->fsuid = new->euid;
>  
>  	retval = security_task_fix_setuid(new, old, LSM_SETID_RES);
> @@ -822,14 +869,19 @@ error:
>  	return retval;
>  }
>  
> -SYSCALL_DEFINE3(getresuid, uid_t __user *, ruid, uid_t __user *, euid, uid_t __user *, suid)
> +SYSCALL_DEFINE3(getresuid, uid_t __user *, ruidp, uid_t __user *, euidp, uid_t __user *, suidp)
>  {
>  	const struct cred *cred = current_cred();
>  	int retval;
> +	uid_t ruid, euid, suid;
> +
> +	ruid = from_kuid_munged(cred->user_ns, cred->uid);
> +	euid = from_kuid_munged(cred->user_ns, cred->euid);
> +	suid = from_kuid_munged(cred->user_ns, cred->suid);
>  
> -	if (!(retval   = put_user(cred->uid,  ruid)) &&
> -	    !(retval   = put_user(cred->euid, euid)))
> -		retval = put_user(cred->suid, suid);
> +	if (!(retval   = put_user(ruid, ruidp)) &&
> +	    !(retval   = put_user(euid, euidp)))
> +		retval = put_user(suid, suidp);
>  
>  	return retval;
>  }
> @@ -839,9 +891,22 @@ SYSCALL_DEFINE3(getresuid, uid_t __user *, ruid, uid_t __user *, euid, uid_t __u
>   */
>  SYSCALL_DEFINE3(setresgid, gid_t, rgid, gid_t, egid, gid_t, sgid)
>  {
> +	struct user_namespace *ns = current_user_ns();
>  	const struct cred *old;
>  	struct cred *new;
>  	int retval;
> +	kgid_t krgid, kegid, ksgid;
> +
> +	krgid = make_kgid(ns, rgid);
> +	kegid = make_kgid(ns, egid);
> +	ksgid = make_kgid(ns, sgid);
> +
> +	if ((rgid != (gid_t) -1) && !gid_valid(krgid))
> +		return -EINVAL;
> +	if ((egid != (gid_t) -1) && !gid_valid(kegid))
> +		return -EINVAL;
> +	if ((sgid != (gid_t) -1) && !gid_valid(ksgid))
> +		return -EINVAL;
>  
>  	new = prepare_creds();
>  	if (!new)
> @@ -850,23 +915,23 @@ SYSCALL_DEFINE3(setresgid, gid_t, rgid, gid_t, egid, gid_t, sgid)
>  
>  	retval = -EPERM;
>  	if (!nsown_capable(CAP_SETGID)) {
> -		if (rgid != (gid_t) -1 && rgid != old->gid &&
> -		    rgid != old->egid  && rgid != old->sgid)
> +		if (rgid != (gid_t) -1        && !gid_eq(krgid, old->gid) &&
> +		    !gid_eq(krgid, old->egid) && !gid_eq(krgid, old->sgid))
>  			goto error;
> -		if (egid != (gid_t) -1 && egid != old->gid &&
> -		    egid != old->egid  && egid != old->sgid)
> +		if (egid != (gid_t) -1        && !gid_eq(kegid, old->gid) &&
> +		    !gid_eq(kegid, old->egid) && !gid_eq(kegid, old->sgid))
>  			goto error;
> -		if (sgid != (gid_t) -1 && sgid != old->gid &&
> -		    sgid != old->egid  && sgid != old->sgid)
> +		if (sgid != (gid_t) -1        && !gid_eq(ksgid, old->gid) &&
> +		    !gid_eq(ksgid, old->egid) && !gid_eq(ksgid, old->sgid))
>  			goto error;
>  	}
>  
>  	if (rgid != (gid_t) -1)
> -		new->gid = rgid;
> +		new->gid = krgid;
>  	if (egid != (gid_t) -1)
> -		new->egid = egid;
> +		new->egid = kegid;
>  	if (sgid != (gid_t) -1)
> -		new->sgid = sgid;
> +		new->sgid = ksgid;
>  	new->fsgid = new->egid;
>  
>  	return commit_creds(new);
> @@ -876,14 +941,19 @@ error:
>  	return retval;
>  }
>  
> -SYSCALL_DEFINE3(getresgid, gid_t __user *, rgid, gid_t __user *, egid, gid_t __user *, sgid)
> +SYSCALL_DEFINE3(getresgid, gid_t __user *, rgidp, gid_t __user *, egidp, gid_t __user *, sgidp)
>  {
>  	const struct cred *cred = current_cred();
>  	int retval;
> +	gid_t rgid, egid, sgid;
> +
> +	rgid = from_kgid_munged(cred->user_ns, cred->gid);
> +	egid = from_kgid_munged(cred->user_ns, cred->egid);
> +	sgid = from_kgid_munged(cred->user_ns, cred->sgid);
>  
> -	if (!(retval   = put_user(cred->gid,  rgid)) &&
> -	    !(retval   = put_user(cred->egid, egid)))
> -		retval = put_user(cred->sgid, sgid);
> +	if (!(retval   = put_user(rgid, rgidp)) &&
> +	    !(retval   = put_user(egid, egidp)))
> +		retval = put_user(sgid, sgidp);
>  
>  	return retval;
>  }
> @@ -900,18 +970,24 @@ SYSCALL_DEFINE1(setfsuid, uid_t, uid)
>  	const struct cred *old;
>  	struct cred *new;
>  	uid_t old_fsuid;
> +	kuid_t kuid;
> +
> +	old = current_cred();
> +	old_fsuid = from_kuid_munged(old->user_ns, old->fsuid);
> +
> +	kuid = make_kuid(old->user_ns, uid);
> +	if (!uid_valid(kuid))
> +		return old_fsuid;
>  
>  	new = prepare_creds();
>  	if (!new)
> -		return current_fsuid();
> -	old = current_cred();
> -	old_fsuid = old->fsuid;
> +		return old_fsuid;
>  
> -	if (uid == old->uid  || uid == old->euid  ||
> -	    uid == old->suid || uid == old->fsuid ||
> +	if (uid_eq(kuid, old->uid)  || uid_eq(kuid, old->euid)  ||
> +	    uid_eq(kuid, old->suid) || uid_eq(kuid, old->fsuid) ||
>  	    nsown_capable(CAP_SETUID)) {
> -		if (uid != old_fsuid) {
> -			new->fsuid = uid;
> +		if (!uid_eq(kuid, old->fsuid)) {
> +			new->fsuid = kuid;
>  			if (security_task_fix_setuid(new, old, LSM_SETID_FS) == 0)
>  				goto change_okay;
>  		}
> @@ -933,18 +1009,24 @@ SYSCALL_DEFINE1(setfsgid, gid_t, gid)
>  	const struct cred *old;
>  	struct cred *new;
>  	gid_t old_fsgid;
> +	kgid_t kgid;
> +
> +	old = current_cred();
> +	old_fsgid = from_kgid_munged(old->user_ns, old->fsgid);
> +
> +	kgid = make_kgid(old->user_ns, gid);
> +	if (!gid_valid(kgid))
> +		return old_fsgid;
>  
>  	new = prepare_creds();
>  	if (!new)
> -		return current_fsgid();
> -	old = current_cred();
> -	old_fsgid = old->fsgid;
> +		return old_fsgid;
>  
> -	if (gid == old->gid  || gid == old->egid  ||
> -	    gid == old->sgid || gid == old->fsgid ||
> +	if (gid_eq(kgid, old->gid)  || gid_eq(kgid, old->egid)  ||
> +	    gid_eq(kgid, old->sgid) || gid_eq(kgid, old->fsgid) ||
>  	    nsown_capable(CAP_SETGID)) {
> -		if (gid != old_fsgid) {
> -			new->fsgid = gid;
> +		if (!gid_eq(kgid, old->fsgid)) {
> +			new->fsgid = kgid;
>  			goto change_okay;
>  		}
>  	}
> @@ -1503,10 +1585,10 @@ static int check_prlimit_permission(struct task_struct *task)
>  	if (cred->user_ns == tcred->user_ns &&
>  	    (cred->uid == tcred->euid &&
>  	     cred->uid == tcred->suid &&
> -	     cred->uid == tcred->uid  &&
> +	     cred->uid == tcred->uid &&
>  	     cred->gid == tcred->egid &&
>  	     cred->gid == tcred->sgid &&
> -	     cred->gid == tcred->gid))
> +		    cred->gid == tcred->gid))
>  		return 0;
>  	if (ns_capable(tcred->user_ns, CAP_SYS_RESOURCE))
>  		return 0;
> diff --git a/kernel/timer.c b/kernel/timer.c
> index a297ffc..67316cb 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1427,25 +1427,25 @@ SYSCALL_DEFINE0(getppid)
>  SYSCALL_DEFINE0(getuid)
>  {
>  	/* Only we change this so SMP safe */
> -	return current_uid();
> +	return from_kuid_munged(current_user_ns(), current_uid());
>  }
>  
>  SYSCALL_DEFINE0(geteuid)
>  {
>  	/* Only we change this so SMP safe */
> -	return current_euid();
> +	return from_kuid_munged(current_user_ns(), current_euid());
>  }
>  
>  SYSCALL_DEFINE0(getgid)
>  {
>  	/* Only we change this so SMP safe */
> -	return current_gid();
> +	return from_kgid_munged(current_user_ns(), current_gid());
>  }
>  
>  SYSCALL_DEFINE0(getegid)
>  {
>  	/* Only we change this so SMP safe */
> -	return  current_egid();
> +	return from_kgid_munged(current_user_ns(), current_egid());
>  }
>  
>  #endif
> diff --git a/kernel/uid16.c b/kernel/uid16.c
> index e530bc3..d7948eb 100644
> --- a/kernel/uid16.c
> +++ b/kernel/uid16.c
> @@ -81,14 +81,19 @@ SYSCALL_DEFINE3(setresuid16, old_uid_t, ruid, old_uid_t, euid, old_uid_t, suid)
>  	return ret;
>  }
>  
> -SYSCALL_DEFINE3(getresuid16, old_uid_t __user *, ruid, old_uid_t __user *, euid, old_uid_t __user *, suid)
> +SYSCALL_DEFINE3(getresuid16, old_uid_t __user *, ruidp, old_uid_t __user *, euidp, old_uid_t __user *, suidp)
>  {
>  	const struct cred *cred = current_cred();
>  	int retval;
> +	old_uid_t ruid, euid, suid;
>  
> -	if (!(retval   = put_user(high2lowuid(cred->uid),  ruid)) &&
> -	    !(retval   = put_user(high2lowuid(cred->euid), euid)))
> -		retval = put_user(high2lowuid(cred->suid), suid);
> +	ruid = high2lowuid(from_kuid_munged(cred->user_ns, cred->uid));
> +	euid = high2lowuid(from_kuid_munged(cred->user_ns, cred->euid));
> +	suid = high2lowuid(from_kuid_munged(cred->user_ns, cred->suid));
> +
> +	if (!(retval   = put_user(ruid, ruidp)) &&
> +	    !(retval   = put_user(euid, euidp)))
> +		retval = put_user(suid, suidp);
>  
>  	return retval;
>  }
> @@ -103,14 +108,19 @@ SYSCALL_DEFINE3(setresgid16, old_gid_t, rgid, old_gid_t, egid, old_gid_t, sgid)
>  }
>  
>  
> -SYSCALL_DEFINE3(getresgid16, old_gid_t __user *, rgid, old_gid_t __user *, egid, old_gid_t __user *, sgid)
> +SYSCALL_DEFINE3(getresgid16, old_gid_t __user *, rgidp, old_gid_t __user *, egidp, old_gid_t __user *, sgidp)
>  {
>  	const struct cred *cred = current_cred();
>  	int retval;
> +	old_gid_t rgid, egid, sgid;
> +
> +	rgid = high2lowgid(from_kgid_munged(cred->user_ns, cred->gid));
> +	egid = high2lowgid(from_kgid_munged(cred->user_ns, cred->egid));
> +	sgid = high2lowgid(from_kgid_munged(cred->user_ns, cred->sgid));
>  
> -	if (!(retval   = put_user(high2lowgid(cred->gid),  rgid)) &&
> -	    !(retval   = put_user(high2lowgid(cred->egid), egid)))
> -		retval = put_user(high2lowgid(cred->sgid), sgid);
> +	if (!(retval   = put_user(rgid, rgidp)) &&
> +	    !(retval   = put_user(egid, egidp)))
> +		retval = put_user(sgid, sgidp);
>  
>  	return retval;
>  }
> @@ -221,20 +231,20 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
>  
>  SYSCALL_DEFINE0(getuid16)
>  {
> -	return high2lowuid(current_uid());
> +	return high2lowuid(from_kuid_munged(current_user_ns(), current_uid()));
>  }
>  
>  SYSCALL_DEFINE0(geteuid16)
>  {
> -	return high2lowuid(current_euid());
> +	return high2lowuid(from_kuid_munged(current_user_ns(), current_euid()));
>  }
>  
>  SYSCALL_DEFINE0(getgid16)
>  {
> -	return high2lowgid(current_gid());
> +	return high2lowgid(from_kgid_munged(current_user_ns(), current_gid()));
>  }
>  
>  SYSCALL_DEFINE0(getegid16)
>  {
> -	return high2lowgid(current_egid());
> +	return high2lowgid(from_kgid_munged(current_user_ns(), current_egid()));
>  }
> -- 
> 1.7.2.5
> 

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: [PATCH 23/43] userns: Convert setting and getting uid and gid system calls to use kuid and kgid
@ 2012-04-26 16:20         ` Serge E. Hallyn
  0 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-26 16:20 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linux Containers, Cyrill Gorcunov,
	linux-security-module, Al Viro, linux-fsdevel, Andrew Morton,
	Linus Torvalds

Quoting Eric W. Biederman (ebiederm@xmission.com):
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> Convert setregid, setgid, setreuid, setuid,
> setresuid, getresuid, setresgid, getresgid, setfsuid, setfsgid,
> getuid, geteuid, getgid, getegid,
> waitpid, waitid, wait4.
> 
> Convert userspace uids and gids into kuids and kgids before
> being placed on struct cred.  Convert struct cred kuids and
> kgids into userspace uids and gids when returning them.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

> ---
>  kernel/exit.c  |    6 +-
>  kernel/sys.c   |  216 ++++++++++++++++++++++++++++++++++++++-----------------
>  kernel/timer.c |    8 +-
>  kernel/uid16.c |   34 ++++++---
>  4 files changed, 178 insertions(+), 86 deletions(-)
> 
> diff --git a/kernel/exit.c b/kernel/exit.c
> index d8bd3b42..789e3c5 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -1214,7 +1214,7 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
>  	unsigned long state;
>  	int retval, status, traced;
>  	pid_t pid = task_pid_vnr(p);
> -	uid_t uid = __task_cred(p)->uid;
> +	uid_t uid = from_kuid_munged(current_user_ns(), __task_cred(p)->uid);
>  	struct siginfo __user *infop;
>  
>  	if (!likely(wo->wo_flags & WEXITED))
> @@ -1427,7 +1427,7 @@ static int wait_task_stopped(struct wait_opts *wo,
>  	if (!unlikely(wo->wo_flags & WNOWAIT))
>  		*p_code = 0;
>  
> -	uid = task_uid(p);
> +	uid = from_kuid_munged(current_user_ns(), __task_cred(p)->uid);
>  unlock_sig:
>  	spin_unlock_irq(&p->sighand->siglock);
>  	if (!exit_code)
> @@ -1500,7 +1500,7 @@ static int wait_task_continued(struct wait_opts *wo, struct task_struct *p)
>  	}
>  	if (!unlikely(wo->wo_flags & WNOWAIT))
>  		p->signal->flags &= ~SIGNAL_STOP_CONTINUED;
> -	uid = task_uid(p);
> +	uid = from_kuid_munged(current_user_ns(), __task_cred(p)->uid);
>  	spin_unlock_irq(&p->sighand->siglock);
>  
>  	pid = task_pid_vnr(p);
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 3996281..aff09f2 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -555,9 +555,19 @@ void ctrl_alt_del(void)
>   */
>  SYSCALL_DEFINE2(setregid, gid_t, rgid, gid_t, egid)
>  {
> +	struct user_namespace *ns = current_user_ns();
>  	const struct cred *old;
>  	struct cred *new;
>  	int retval;
> +	kgid_t krgid, kegid;
> +
> +	krgid = make_kgid(ns, rgid);
> +	kegid = make_kgid(ns, egid);
> +
> +	if ((rgid != (gid_t) -1) && !gid_valid(krgid))
> +		return -EINVAL;
> +	if ((egid != (gid_t) -1) && !gid_valid(kegid))
> +		return -EINVAL;
>  
>  	new = prepare_creds();
>  	if (!new)
> @@ -566,25 +576,25 @@ SYSCALL_DEFINE2(setregid, gid_t, rgid, gid_t, egid)
>  
>  	retval = -EPERM;
>  	if (rgid != (gid_t) -1) {
> -		if (old->gid == rgid ||
> -		    old->egid == rgid ||
> +		if (gid_eq(old->gid, krgid) ||
> +		    gid_eq(old->egid, krgid) ||
>  		    nsown_capable(CAP_SETGID))
> -			new->gid = rgid;
> +			new->gid = krgid;
>  		else
>  			goto error;
>  	}
>  	if (egid != (gid_t) -1) {
> -		if (old->gid == egid ||
> -		    old->egid == egid ||
> -		    old->sgid == egid ||
> +		if (gid_eq(old->gid, kegid) ||
> +		    gid_eq(old->egid, kegid) ||
> +		    gid_eq(old->sgid, kegid) ||
>  		    nsown_capable(CAP_SETGID))
> -			new->egid = egid;
> +			new->egid = kegid;
>  		else
>  			goto error;
>  	}
>  
>  	if (rgid != (gid_t) -1 ||
> -	    (egid != (gid_t) -1 && egid != old->gid))
> +	    (egid != (gid_t) -1 && !gid_eq(kegid, old->gid)))
>  		new->sgid = new->egid;
>  	new->fsgid = new->egid;
>  
> @@ -602,9 +612,15 @@ error:
>   */
>  SYSCALL_DEFINE1(setgid, gid_t, gid)
>  {
> +	struct user_namespace *ns = current_user_ns();
>  	const struct cred *old;
>  	struct cred *new;
>  	int retval;
> +	kgid_t kgid;
> +
> +	kgid = make_kgid(ns, gid);
> +	if (!gid_valid(kgid))
> +		return -EINVAL;
>  
>  	new = prepare_creds();
>  	if (!new)
> @@ -613,9 +629,9 @@ SYSCALL_DEFINE1(setgid, gid_t, gid)
>  
>  	retval = -EPERM;
>  	if (nsown_capable(CAP_SETGID))
> -		new->gid = new->egid = new->sgid = new->fsgid = gid;
> -	else if (gid == old->gid || gid == old->sgid)
> -		new->egid = new->fsgid = gid;
> +		new->gid = new->egid = new->sgid = new->fsgid = kgid;
> +	else if (gid_eq(kgid, old->gid) || gid_eq(kgid, old->sgid))
> +		new->egid = new->fsgid = kgid;
>  	else
>  		goto error;
>  
> @@ -672,9 +688,19 @@ static int set_user(struct cred *new)
>   */
>  SYSCALL_DEFINE2(setreuid, uid_t, ruid, uid_t, euid)
>  {
> +	struct user_namespace *ns = current_user_ns();
>  	const struct cred *old;
>  	struct cred *new;
>  	int retval;
> +	kuid_t kruid, keuid;
> +
> +	kruid = make_kuid(ns, ruid);
> +	keuid = make_kuid(ns, euid);
> +
> +	if ((ruid != (uid_t) -1) && !uid_valid(kruid))
> +		return -EINVAL;
> +	if ((euid != (uid_t) -1) && !uid_valid(keuid))
> +		return -EINVAL;
>  
>  	new = prepare_creds();
>  	if (!new)
> @@ -683,29 +709,29 @@ SYSCALL_DEFINE2(setreuid, uid_t, ruid, uid_t, euid)
>  
>  	retval = -EPERM;
>  	if (ruid != (uid_t) -1) {
> -		new->uid = ruid;
> -		if (old->uid != ruid &&
> -		    old->euid != ruid &&
> +		new->uid = kruid;
> +		if (!uid_eq(old->uid, kruid) &&
> +		    !uid_eq(old->euid, kruid) &&
>  		    !nsown_capable(CAP_SETUID))
>  			goto error;
>  	}
>  
>  	if (euid != (uid_t) -1) {
> -		new->euid = euid;
> -		if (old->uid != euid &&
> -		    old->euid != euid &&
> -		    old->suid != euid &&
> +		new->euid = keuid;
> +		if (!uid_eq(old->uid, keuid) &&
> +		    !uid_eq(old->euid, keuid) &&
> +		    !uid_eq(old->suid, keuid) &&
>  		    !nsown_capable(CAP_SETUID))
>  			goto error;
>  	}
>  
> -	if (new->uid != old->uid) {
> +	if (!uid_eq(new->uid, old->uid)) {
>  		retval = set_user(new);
>  		if (retval < 0)
>  			goto error;
>  	}
>  	if (ruid != (uid_t) -1 ||
> -	    (euid != (uid_t) -1 && euid != old->uid))
> +	    (euid != (uid_t) -1 && !uid_eq(keuid, old->uid)))
>  		new->suid = new->euid;
>  	new->fsuid = new->euid;
>  
> @@ -733,9 +759,15 @@ error:
>   */
>  SYSCALL_DEFINE1(setuid, uid_t, uid)
>  {
> +	struct user_namespace *ns = current_user_ns();
>  	const struct cred *old;
>  	struct cred *new;
>  	int retval;
> +	kuid_t kuid;
> +
> +	kuid = make_kuid(ns, uid);
> +	if (!uid_valid(kuid))
> +		return -EINVAL;
>  
>  	new = prepare_creds();
>  	if (!new)
> @@ -744,17 +776,17 @@ SYSCALL_DEFINE1(setuid, uid_t, uid)
>  
>  	retval = -EPERM;
>  	if (nsown_capable(CAP_SETUID)) {
> -		new->suid = new->uid = uid;
> -		if (uid != old->uid) {
> +		new->suid = new->uid = kuid;
> +		if (!uid_eq(kuid, old->uid)) {
>  			retval = set_user(new);
>  			if (retval < 0)
>  				goto error;
>  		}
> -	} else if (uid != old->uid && uid != new->suid) {
> +	} else if (!uid_eq(kuid, old->uid) && !uid_eq(kuid, new->suid)) {
>  		goto error;
>  	}
>  
> -	new->fsuid = new->euid = uid;
> +	new->fsuid = new->euid = kuid;
>  
>  	retval = security_task_fix_setuid(new, old, LSM_SETID_ID);
>  	if (retval < 0)
> @@ -774,9 +806,24 @@ error:
>   */
>  SYSCALL_DEFINE3(setresuid, uid_t, ruid, uid_t, euid, uid_t, suid)
>  {
> +	struct user_namespace *ns = current_user_ns();
>  	const struct cred *old;
>  	struct cred *new;
>  	int retval;
> +	kuid_t kruid, keuid, ksuid;
> +
> +	kruid = make_kuid(ns, ruid);
> +	keuid = make_kuid(ns, euid);
> +	ksuid = make_kuid(ns, suid);
> +
> +	if ((ruid != (uid_t) -1) && !uid_valid(kruid))
> +		return -EINVAL;
> +
> +	if ((euid != (uid_t) -1) && !uid_valid(keuid))
> +		return -EINVAL;
> +
> +	if ((suid != (uid_t) -1) && !uid_valid(ksuid))
> +		return -EINVAL;
>  
>  	new = prepare_creds();
>  	if (!new)
> @@ -786,29 +833,29 @@ SYSCALL_DEFINE3(setresuid, uid_t, ruid, uid_t, euid, uid_t, suid)
>  
>  	retval = -EPERM;
>  	if (!nsown_capable(CAP_SETUID)) {
> -		if (ruid != (uid_t) -1 && ruid != old->uid &&
> -		    ruid != old->euid  && ruid != old->suid)
> +		if (ruid != (uid_t) -1        && !uid_eq(kruid, old->uid) &&
> +		    !uid_eq(kruid, old->euid) && !uid_eq(kruid, old->suid))
>  			goto error;
> -		if (euid != (uid_t) -1 && euid != old->uid &&
> -		    euid != old->euid  && euid != old->suid)
> +		if (euid != (uid_t) -1        && !uid_eq(keuid, old->uid) &&
> +		    !uid_eq(keuid, old->euid) && !uid_eq(keuid, old->suid))
>  			goto error;
> -		if (suid != (uid_t) -1 && suid != old->uid &&
> -		    suid != old->euid  && suid != old->suid)
> +		if (suid != (uid_t) -1        && !uid_eq(ksuid, old->uid) &&
> +		    !uid_eq(ksuid, old->euid) && !uid_eq(ksuid, old->suid))
>  			goto error;
>  	}
>  
>  	if (ruid != (uid_t) -1) {
> -		new->uid = ruid;
> -		if (ruid != old->uid) {
> +		new->uid = kruid;
> +		if (!uid_eq(kruid, old->uid)) {
>  			retval = set_user(new);
>  			if (retval < 0)
>  				goto error;
>  		}
>  	}
>  	if (euid != (uid_t) -1)
> -		new->euid = euid;
> +		new->euid = keuid;
>  	if (suid != (uid_t) -1)
> -		new->suid = suid;
> +		new->suid = ksuid;
>  	new->fsuid = new->euid;
>  
>  	retval = security_task_fix_setuid(new, old, LSM_SETID_RES);
> @@ -822,14 +869,19 @@ error:
>  	return retval;
>  }
>  
> -SYSCALL_DEFINE3(getresuid, uid_t __user *, ruid, uid_t __user *, euid, uid_t __user *, suid)
> +SYSCALL_DEFINE3(getresuid, uid_t __user *, ruidp, uid_t __user *, euidp, uid_t __user *, suidp)
>  {
>  	const struct cred *cred = current_cred();
>  	int retval;
> +	uid_t ruid, euid, suid;
> +
> +	ruid = from_kuid_munged(cred->user_ns, cred->uid);
> +	euid = from_kuid_munged(cred->user_ns, cred->euid);
> +	suid = from_kuid_munged(cred->user_ns, cred->suid);
>  
> -	if (!(retval   = put_user(cred->uid,  ruid)) &&
> -	    !(retval   = put_user(cred->euid, euid)))
> -		retval = put_user(cred->suid, suid);
> +	if (!(retval   = put_user(ruid, ruidp)) &&
> +	    !(retval   = put_user(euid, euidp)))
> +		retval = put_user(suid, suidp);
>  
>  	return retval;
>  }
> @@ -839,9 +891,22 @@ SYSCALL_DEFINE3(getresuid, uid_t __user *, ruid, uid_t __user *, euid, uid_t __u
>   */
>  SYSCALL_DEFINE3(setresgid, gid_t, rgid, gid_t, egid, gid_t, sgid)
>  {
> +	struct user_namespace *ns = current_user_ns();
>  	const struct cred *old;
>  	struct cred *new;
>  	int retval;
> +	kgid_t krgid, kegid, ksgid;
> +
> +	krgid = make_kgid(ns, rgid);
> +	kegid = make_kgid(ns, egid);
> +	ksgid = make_kgid(ns, sgid);
> +
> +	if ((rgid != (gid_t) -1) && !gid_valid(krgid))
> +		return -EINVAL;
> +	if ((egid != (gid_t) -1) && !gid_valid(kegid))
> +		return -EINVAL;
> +	if ((sgid != (gid_t) -1) && !gid_valid(ksgid))
> +		return -EINVAL;
>  
>  	new = prepare_creds();
>  	if (!new)
> @@ -850,23 +915,23 @@ SYSCALL_DEFINE3(setresgid, gid_t, rgid, gid_t, egid, gid_t, sgid)
>  
>  	retval = -EPERM;
>  	if (!nsown_capable(CAP_SETGID)) {
> -		if (rgid != (gid_t) -1 && rgid != old->gid &&
> -		    rgid != old->egid  && rgid != old->sgid)
> +		if (rgid != (gid_t) -1        && !gid_eq(krgid, old->gid) &&
> +		    !gid_eq(krgid, old->egid) && !gid_eq(krgid, old->sgid))
>  			goto error;
> -		if (egid != (gid_t) -1 && egid != old->gid &&
> -		    egid != old->egid  && egid != old->sgid)
> +		if (egid != (gid_t) -1        && !gid_eq(kegid, old->gid) &&
> +		    !gid_eq(kegid, old->egid) && !gid_eq(kegid, old->sgid))
>  			goto error;
> -		if (sgid != (gid_t) -1 && sgid != old->gid &&
> -		    sgid != old->egid  && sgid != old->sgid)
> +		if (sgid != (gid_t) -1        && !gid_eq(ksgid, old->gid) &&
> +		    !gid_eq(ksgid, old->egid) && !gid_eq(ksgid, old->sgid))
>  			goto error;
>  	}
>  
>  	if (rgid != (gid_t) -1)
> -		new->gid = rgid;
> +		new->gid = krgid;
>  	if (egid != (gid_t) -1)
> -		new->egid = egid;
> +		new->egid = kegid;
>  	if (sgid != (gid_t) -1)
> -		new->sgid = sgid;
> +		new->sgid = ksgid;
>  	new->fsgid = new->egid;
>  
>  	return commit_creds(new);
> @@ -876,14 +941,19 @@ error:
>  	return retval;
>  }
>  
> -SYSCALL_DEFINE3(getresgid, gid_t __user *, rgid, gid_t __user *, egid, gid_t __user *, sgid)
> +SYSCALL_DEFINE3(getresgid, gid_t __user *, rgidp, gid_t __user *, egidp, gid_t __user *, sgidp)
>  {
>  	const struct cred *cred = current_cred();
>  	int retval;
> +	gid_t rgid, egid, sgid;
> +
> +	rgid = from_kgid_munged(cred->user_ns, cred->gid);
> +	egid = from_kgid_munged(cred->user_ns, cred->egid);
> +	sgid = from_kgid_munged(cred->user_ns, cred->sgid);
>  
> -	if (!(retval   = put_user(cred->gid,  rgid)) &&
> -	    !(retval   = put_user(cred->egid, egid)))
> -		retval = put_user(cred->sgid, sgid);
> +	if (!(retval   = put_user(rgid, rgidp)) &&
> +	    !(retval   = put_user(egid, egidp)))
> +		retval = put_user(sgid, sgidp);
>  
>  	return retval;
>  }
> @@ -900,18 +970,24 @@ SYSCALL_DEFINE1(setfsuid, uid_t, uid)
>  	const struct cred *old;
>  	struct cred *new;
>  	uid_t old_fsuid;
> +	kuid_t kuid;
> +
> +	old = current_cred();
> +	old_fsuid = from_kuid_munged(old->user_ns, old->fsuid);
> +
> +	kuid = make_kuid(old->user_ns, uid);
> +	if (!uid_valid(kuid))
> +		return old_fsuid;
>  
>  	new = prepare_creds();
>  	if (!new)
> -		return current_fsuid();
> -	old = current_cred();
> -	old_fsuid = old->fsuid;
> +		return old_fsuid;
>  
> -	if (uid == old->uid  || uid == old->euid  ||
> -	    uid == old->suid || uid == old->fsuid ||
> +	if (uid_eq(kuid, old->uid)  || uid_eq(kuid, old->euid)  ||
> +	    uid_eq(kuid, old->suid) || uid_eq(kuid, old->fsuid) ||
>  	    nsown_capable(CAP_SETUID)) {
> -		if (uid != old_fsuid) {
> -			new->fsuid = uid;
> +		if (!uid_eq(kuid, old->fsuid)) {
> +			new->fsuid = kuid;
>  			if (security_task_fix_setuid(new, old, LSM_SETID_FS) == 0)
>  				goto change_okay;
>  		}
> @@ -933,18 +1009,24 @@ SYSCALL_DEFINE1(setfsgid, gid_t, gid)
>  	const struct cred *old;
>  	struct cred *new;
>  	gid_t old_fsgid;
> +	kgid_t kgid;
> +
> +	old = current_cred();
> +	old_fsgid = from_kgid_munged(old->user_ns, old->fsgid);
> +
> +	kgid = make_kgid(old->user_ns, gid);
> +	if (!gid_valid(kgid))
> +		return old_fsgid;
>  
>  	new = prepare_creds();
>  	if (!new)
> -		return current_fsgid();
> -	old = current_cred();
> -	old_fsgid = old->fsgid;
> +		return old_fsgid;
>  
> -	if (gid == old->gid  || gid == old->egid  ||
> -	    gid == old->sgid || gid == old->fsgid ||
> +	if (gid_eq(kgid, old->gid)  || gid_eq(kgid, old->egid)  ||
> +	    gid_eq(kgid, old->sgid) || gid_eq(kgid, old->fsgid) ||
>  	    nsown_capable(CAP_SETGID)) {
> -		if (gid != old_fsgid) {
> -			new->fsgid = gid;
> +		if (!gid_eq(kgid, old->fsgid)) {
> +			new->fsgid = kgid;
>  			goto change_okay;
>  		}
>  	}
> @@ -1503,10 +1585,10 @@ static int check_prlimit_permission(struct task_struct *task)
>  	if (cred->user_ns == tcred->user_ns &&
>  	    (cred->uid == tcred->euid &&
>  	     cred->uid == tcred->suid &&
> -	     cred->uid == tcred->uid  &&
> +	     cred->uid == tcred->uid &&
>  	     cred->gid == tcred->egid &&
>  	     cred->gid == tcred->sgid &&
> -	     cred->gid == tcred->gid))
> +		    cred->gid == tcred->gid))
>  		return 0;
>  	if (ns_capable(tcred->user_ns, CAP_SYS_RESOURCE))
>  		return 0;
> diff --git a/kernel/timer.c b/kernel/timer.c
> index a297ffc..67316cb 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1427,25 +1427,25 @@ SYSCALL_DEFINE0(getppid)
>  SYSCALL_DEFINE0(getuid)
>  {
>  	/* Only we change this so SMP safe */
> -	return current_uid();
> +	return from_kuid_munged(current_user_ns(), current_uid());
>  }
>  
>  SYSCALL_DEFINE0(geteuid)
>  {
>  	/* Only we change this so SMP safe */
> -	return current_euid();
> +	return from_kuid_munged(current_user_ns(), current_euid());
>  }
>  
>  SYSCALL_DEFINE0(getgid)
>  {
>  	/* Only we change this so SMP safe */
> -	return current_gid();
> +	return from_kgid_munged(current_user_ns(), current_gid());
>  }
>  
>  SYSCALL_DEFINE0(getegid)
>  {
>  	/* Only we change this so SMP safe */
> -	return  current_egid();
> +	return from_kgid_munged(current_user_ns(), current_egid());
>  }
>  
>  #endif
> diff --git a/kernel/uid16.c b/kernel/uid16.c
> index e530bc3..d7948eb 100644
> --- a/kernel/uid16.c
> +++ b/kernel/uid16.c
> @@ -81,14 +81,19 @@ SYSCALL_DEFINE3(setresuid16, old_uid_t, ruid, old_uid_t, euid, old_uid_t, suid)
>  	return ret;
>  }
>  
> -SYSCALL_DEFINE3(getresuid16, old_uid_t __user *, ruid, old_uid_t __user *, euid, old_uid_t __user *, suid)
> +SYSCALL_DEFINE3(getresuid16, old_uid_t __user *, ruidp, old_uid_t __user *, euidp, old_uid_t __user *, suidp)
>  {
>  	const struct cred *cred = current_cred();
>  	int retval;
> +	old_uid_t ruid, euid, suid;
>  
> -	if (!(retval   = put_user(high2lowuid(cred->uid),  ruid)) &&
> -	    !(retval   = put_user(high2lowuid(cred->euid), euid)))
> -		retval = put_user(high2lowuid(cred->suid), suid);
> +	ruid = high2lowuid(from_kuid_munged(cred->user_ns, cred->uid));
> +	euid = high2lowuid(from_kuid_munged(cred->user_ns, cred->euid));
> +	suid = high2lowuid(from_kuid_munged(cred->user_ns, cred->suid));
> +
> +	if (!(retval   = put_user(ruid, ruidp)) &&
> +	    !(retval   = put_user(euid, euidp)))
> +		retval = put_user(suid, suidp);
>  
>  	return retval;
>  }
> @@ -103,14 +108,19 @@ SYSCALL_DEFINE3(setresgid16, old_gid_t, rgid, old_gid_t, egid, old_gid_t, sgid)
>  }
>  
>  
> -SYSCALL_DEFINE3(getresgid16, old_gid_t __user *, rgid, old_gid_t __user *, egid, old_gid_t __user *, sgid)
> +SYSCALL_DEFINE3(getresgid16, old_gid_t __user *, rgidp, old_gid_t __user *, egidp, old_gid_t __user *, sgidp)
>  {
>  	const struct cred *cred = current_cred();
>  	int retval;
> +	old_gid_t rgid, egid, sgid;
> +
> +	rgid = high2lowgid(from_kgid_munged(cred->user_ns, cred->gid));
> +	egid = high2lowgid(from_kgid_munged(cred->user_ns, cred->egid));
> +	sgid = high2lowgid(from_kgid_munged(cred->user_ns, cred->sgid));
>  
> -	if (!(retval   = put_user(high2lowgid(cred->gid),  rgid)) &&
> -	    !(retval   = put_user(high2lowgid(cred->egid), egid)))
> -		retval = put_user(high2lowgid(cred->sgid), sgid);
> +	if (!(retval   = put_user(rgid, rgidp)) &&
> +	    !(retval   = put_user(egid, egidp)))
> +		retval = put_user(sgid, sgidp);
>  
>  	return retval;
>  }
> @@ -221,20 +231,20 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
>  
>  SYSCALL_DEFINE0(getuid16)
>  {
> -	return high2lowuid(current_uid());
> +	return high2lowuid(from_kuid_munged(current_user_ns(), current_uid()));
>  }
>  
>  SYSCALL_DEFINE0(geteuid16)
>  {
> -	return high2lowuid(current_euid());
> +	return high2lowuid(from_kuid_munged(current_user_ns(), current_euid()));
>  }
>  
>  SYSCALL_DEFINE0(getgid16)
>  {
> -	return high2lowgid(current_gid());
> +	return high2lowgid(from_kgid_munged(current_user_ns(), current_gid()));
>  }
>  
>  SYSCALL_DEFINE0(getegid16)
>  {
> -	return high2lowgid(current_egid());
> +	return high2lowgid(from_kgid_munged(current_user_ns(), current_egid()));
>  }
> -- 
> 1.7.2.5
> 

^ permalink raw reply	[flat|nested] 227+ messages in thread
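
[Editor's sketch] The quoted patch converts each userspace uid/gid to a kernel-internal kuid_t/kgid_t on entry (make_kuid/make_kgid, checked with uid_valid/gid_valid) and converts back on exit (from_kuid_munged/from_kgid_munged). The following is a minimal userspace model of that round trip, not the kernel implementation. All model_* names, the single-level extent table, and the example mapping (ids 0..999 to global 100000..100999) are assumptions made for illustration. The only values taken from the kernel are (uid_t)-1 as the reserved invalid id and 65534 as the default overflow id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_INVALID_ID  ((uint32_t)-1)  /* reserved: never a usable id */
#define MODEL_OVERFLOW_ID 65534u          /* shown when an id has no mapping */

/* Wrapping the id in a struct makes accidental mixing with raw ids a type error. */
typedef struct { uint32_t val; } model_kuid_t;

/* One mapping range: ids [first, first+count) map to [lower_first, ...). */
struct model_extent { uint32_t first, lower_first, count; };
struct model_user_ns { const struct model_extent *extents; size_t nr_extents; };

/* Namespace-local id -> global id; invalid if the id has no mapping. */
static model_kuid_t model_make_kuid(const struct model_user_ns *ns, uint32_t uid)
{
	assert(ns != NULL);
	for (size_t i = 0; i < ns->nr_extents; i++) {
		const struct model_extent *e = &ns->extents[i];
		if (uid >= e->first && uid - e->first < e->count)
			return (model_kuid_t){ e->lower_first + (uid - e->first) };
	}
	return (model_kuid_t){ MODEL_INVALID_ID };
}

static bool model_uid_valid(model_kuid_t k) { return k.val != MODEL_INVALID_ID; }
static bool model_uid_eq(model_kuid_t a, model_kuid_t b) { return a.val == b.val; }

/* Global id -> namespace-local id; unmapped ids are "munged" to the overflow id. */
static uint32_t model_from_kuid_munged(const struct model_user_ns *ns, model_kuid_t k)
{
	assert(ns != NULL);
	for (size_t i = 0; i < ns->nr_extents; i++) {
		const struct model_extent *e = &ns->extents[i];
		if (k.val >= e->lower_first && k.val - e->lower_first < e->count)
			return e->first + (k.val - e->lower_first);
	}
	return MODEL_OVERFLOW_ID;
}

/* Hypothetical namespaces used only for this example. */
static const struct model_extent model_init_extents[] = { { 0, 0, 1000 } };
static const struct model_user_ns model_init_ns = { model_init_extents, 1 };
static const struct model_extent model_container_extents[] = { { 0, 100000, 1000 } };
static const struct model_user_ns model_container_ns = { model_container_extents, 1 };
```

In this model, a setter such as setresgid would reject an id for which model_uid_valid() is false, while a getter such as getuid would always return something, falling back to the overflow id. This matches the split between the -EINVAL checks and the *_munged calls in the patch.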

* Re: [PATCH 16/43] userns: Simplify the user_namespace by making userns->creator a kuid.
  2012-04-26  9:09                     ` Eric W. Biederman
@ 2012-04-26 16:21                           ` Serge E. Hallyn
  0 siblings, 0 replies; 227+ messages in thread
From: Serge E. Hallyn @ 2012-04-26 16:21 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	Cyrill Gorcunov, Andrew Morton, Linus Torvalds

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:
> 
> > Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> >> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:
> >> 
> >> > Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> >> >> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:
> >> >> 
> >> >> > Quoting Eric W. Beiderman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> >> >> >> From: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> >> >> >> 
> >> >> >> - Transform userns->creator from a user_struct reference to a simple
> >> >> >>   kuid_t, kgid_t pair.
> >> >> >> 
> >> >> >>   In cap_capable this allows the check to see if we are the creator of
> >> >> >>   a namespace to become the classic suser style euid permission check.
> >> >> >> 
> >> >> >>   This allows us to remove the need for a struct cred in the mapping
> >> >> >>   functions and still be able to display the user namespace creator's
> >> >> >>   uid and gid as 0.
> >> >> >> 
> >> >> >> - Remove the now unnecessary delayed_work in free_user_ns.
> >> >> >> 
> >> >> >>   All that is left for free_user_ns to do is to call kmem_cache_free
> >> >> >>   and put_user_ns.  Those functions can be called in any context
> >> >> >>   so call them directly from free_user_ns removing the need for delayed work.
> >> >> >> 
> >> >> >> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> >> >> >> ---
> >> >> >>  include/linux/user_namespace.h |    4 ++--
> >> >> >>  kernel/user.c                  |    7 ++++---
> >> >> >>  kernel/user_namespace.c        |   39 ++++++++++++++++++---------------------
> >> >> >>  security/commoncap.c           |    5 +++--
> >> >> >>  4 files changed, 27 insertions(+), 28 deletions(-)
> >> >> >> 
> >> >> >> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> >> >> >> index d767508..8a391bd 100644
> >> >> >> --- a/include/linux/user_namespace.h
> >> >> >> +++ b/include/linux/user_namespace.h
> >> >> >> @@ -9,8 +9,8 @@
> >> >> >>  struct user_namespace {
> >> >> >>  	struct kref		kref;
> >> >> >>  	struct user_namespace	*parent;
> >> >> >> -	struct user_struct	*creator;
> >> >> >> -	struct work_struct	destroyer;
> >> >> >> +	kuid_t			owner;
> >> >> >> +	kgid_t			group;
> >> >> >>  };
> >> >> >>  
> >> >> >>  extern struct user_namespace init_user_ns;
> >> >> >> diff --git a/kernel/user.c b/kernel/user.c
> >> >> >> index 025077e..cff3856 100644
> >> >> >> --- a/kernel/user.c
> >> >> >> +++ b/kernel/user.c
> >> >> >> @@ -25,7 +25,8 @@ struct user_namespace init_user_ns = {
> >> >> >>  	.kref = {
> >> >> >>  		.refcount	= ATOMIC_INIT(3),
> >> >> >>  	},
> >> >> >> -	.creator = &root_user,
> >> >> >> +	.owner = GLOBAL_ROOT_UID,
> >> >> >> +	.group = GLOBAL_ROOT_GID,
> >> >> >>  };
> >> >> >>  EXPORT_SYMBOL_GPL(init_user_ns);
> >> >> >>  
> >> >> >> @@ -54,9 +55,9 @@ struct hlist_head uidhash_table[UIDHASH_SZ];
> >> >> >>   */
> >> >> >>  static DEFINE_SPINLOCK(uidhash_lock);
> >> >> >>  
> >> >> >> -/* root_user.__count is 2, 1 for init task cred, 1 for init_user_ns->user_ns */
> >> >> >> +/* root_user.__count is 1, for init task cred */
> >> >> >>  struct user_struct root_user = {
> >> >> >> -	.__count	= ATOMIC_INIT(2),
> >> >> >> +	.__count	= ATOMIC_INIT(1),
> >> >> >>  	.processes	= ATOMIC_INIT(1),
> >> >> >>  	.files		= ATOMIC_INIT(0),
> >> >> >>  	.sigpending	= ATOMIC_INIT(0),
> >> >> >> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> >> >> >> index 898e973..f69741a 100644
> >> >> >> --- a/kernel/user_namespace.c
> >> >> >> +++ b/kernel/user_namespace.c
> >> >> >> @@ -27,6 +27,16 @@ int create_user_ns(struct cred *new)
> >> >> >>  {
> >> >> >>  	struct user_namespace *ns, *parent_ns = new->user_ns;
> >> >> >>  	struct user_struct *root_user;
> >> >> >> +	kuid_t owner = make_kuid(new->user_ns, new->euid);
> >> >> >> +	kgid_t group = make_kgid(new->user_ns, new->egid);
> >> >> >> +
> >> >> >> +	/* The creator needs a mapping in the parent user namespace
> >> >> >> +	 * or else we won't be able to reasonably tell userspace who
> >> >> >> +	 * created a user_namespace.
> >> >> >> +	 */
> >> >> >> +	if (!kuid_has_mapping(parent_ns, owner) ||
> >> >> >> +	    !kgid_has_mapping(parent_ns, group))
> >> >> >> +		return -EPERM;
> >> >> >>  
> >> >> >>  	ns = kmem_cache_alloc(user_ns_cachep, GFP_KERNEL);
> >> >> >>  	if (!ns)
> >> >> >> @@ -43,7 +53,9 @@ int create_user_ns(struct cred *new)
> >> >> >>  
> >> >> >>  	/* set the new root user in the credentials under preparation */
> >> >> >>  	ns->parent = parent_ns;
> >> >> >
> >> >> > I think in the past the creator cred pinned the ns->parent.  Do you now
> >> >> > need to explicitly pin ns->parent (and release it in free_user_ns())?
> >> >> 
> >> >> Yes we do have to explicitly reference count the parent namespace.
> >> >> But that happened in the patch 7:
> >> >> "userns: Add an explicit reference to the parent user namespace"
> >> 
> >> Make that patch 8 not patch 7: 
> >> "userns: Add an explicit reference to the parent user namespace"
> >> Perhaps the patch number reference pointed you to look at the wrong code.
> >
> > D'oh, yup.  That explains it better.
> >
> > And so parent_userns keeps the refcount from the cred 'new' after
> > new->ns = ns;  That works, thanks.
> >
> >> > Perhaps that suffices, but I'm not convinced.  The struct cred is
> >> > pinning its own ns, but if t1 does clone(CLONE_NEWUSER) to produce
> >> > t2, which does the same to produce t3, and then t2 exits, I'm not
> >> > seeing what will pin t2's userns.
> >> 
> >> t3's userns holds a reference to the departed t2's userns.
> >> t2's userns holds a reference to t1's userns.
> >> 
> >> free_user_ns does put that userns reference.
> >> 
> >> It is all there and explicit.  User namespaces refer directly to each
> >
> > Actually can we make it just one tinge more explicit, and put a comment
> > above the 'new->user_ns = ns'?  There's currently the comment
> >
> >    /* Leave the reference to our user_ns with the new cred */
> >
> > But that's about the initial refcount on the new ns.  Perhaps change that to:
> >
> >    /*
> >     * Leave the reference to our new user_ns with the new cred,
> >     * and leave the reference on the old ns to pin new->parent_ns
> >     */
> 
> I have added the following comment.  Hopefully that makes it clearer.
> 
> 	/* Leave the new->user_ns reference with the new user namespace. */

Great, thanks.

> >> other.  That was all needed to get struct user out of the usernamespace
> >> game.
> >> 
> >> Eric
> >
> > Thanks, Eric.  So then
> >
> > Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> >
> > which, given the other nits are addressed, should cover the whole
> > set with my acks.
> 
> Hmm.  I don't have a record of you looking at my patch 23.
> "userns: Convert setting and getting uid and gid system calls to use kuid and kgid"

I'd looked at it, but apparently never replied.  Took another look to be
sure, and replied now.

thanks,
-serge

^ permalink raw reply	[flat|nested] 227+ messages in thread
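
[Editor's sketch] The pinning argument in the exchange above (t3's namespace holds a reference to t2's, which holds a reference to t1's, and freeing a namespace drops its parent reference) can be modelled in a few lines of userspace C. This is an illustration under assumptions, not the kernel code. All model_* names are hypothetical, a plain int stands in for the kref, and the model takes a fresh reference on the parent at creation time, whereas the thread describes the kernel reusing the reference the creating cred already held.

```c
#include <assert.h>
#include <stdlib.h>

struct model_ns {
	int refcount;             /* stand-in for the kernel's kref */
	struct model_ns *parent;  /* counted reference, NULL for the root */
};

/* Number of namespaces currently allocated, for observing lifetimes. */
static int model_live_namespaces;

static struct model_ns *model_get_ns(struct model_ns *ns)
{
	if (ns)
		ns->refcount++;
	return ns;
}

/*
 * Drop one reference.  When the last reference goes away, free the
 * namespace and then drop the reference it held on its parent, walking
 * up the chain iteratively instead of recursing.
 */
static void model_put_ns(struct model_ns *ns)
{
	while (ns && --ns->refcount == 0) {
		struct model_ns *parent = ns->parent;

		free(ns);
		model_live_namespaces--;
		ns = parent;
	}
}

/* Create a child namespace; the returned reference belongs to the caller. */
static struct model_ns *model_create_ns(struct model_ns *parent)
{
	struct model_ns *ns = malloc(sizeof(*ns));

	if (!ns)
		return NULL;
	ns->refcount = 1;
	ns->parent = model_get_ns(parent);  /* the child pins its parent */
	model_live_namespaces++;
	return ns;
}
```

With this model, dropping t2's own reference while t3's namespace still exists leaves t2's namespace allocated, because t3's namespace still counts as a holder. Dropping t3's reference afterwards frees both in one walk up the chain.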

* Please include user-namespace.git in linux-next
  2012-04-08  5:10 ` Eric W. Biederman
@ 2012-05-11 23:20     ` Eric W. Biederman
  -1 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-05-11 23:20 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Cyrill Gorcunov, linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	Al Viro, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds


In linux-next please include git://pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-next

This tree includes the fixed versions of the 43 patches I sent out for
review a few weeks ago, plus updated Kconfig options so that the parts
that won't build can't be selected, making allyesconfig and
allnoconfig safe.

This work was discussed on LWN at:
http://lwn.net/Articles/491310

There are a bunch more trivial patches to go that still need to be
reviewed but the core has been reviewed and tested left/right up/down
and sideways and it looks like all of the bugs have fallen out.

Eric

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: Please include user-namespace.git in linux-next
  2012-05-11 23:20     ` Eric W. Biederman
  (?)
@ 2012-05-13 23:35         ` Stephen Rothwell
  -1 siblings, 0 replies; 227+ messages in thread
From: Stephen Rothwell @ 2012-05-13 23:35 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Cyrill Gorcunov, linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	Al Viro, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds


Hi Eric,

On Fri, 11 May 2012 16:20:54 -0700 ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman) wrote:
>
> In linux-next please include git://pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-next

I assume you left out "git.kernel.org/"  :-)

Included from today.

Thanks for adding your subsystem tree as a participant of linux-next.  As
you may know, this is not a judgment of your code.  The purpose of
linux-next is for integration testing and to lower the impact of
conflicts between subsystems in the next merge window. 

You will need to ensure that the patches/commits in your tree/series have
been:
     * submitted under GPL v2 (or later) and include the Contributor's
	Signed-off-by,
     * posted to the relevant mailing list,
     * reviewed by you (or another maintainer of your subsystem tree),
     * successfully unit tested, and 
     * destined for the current or next Linux merge window.

Basically, this should be just what you would send to Linus (or ask him
to fetch).  It is allowed to be rebased if you deem it necessary.

-- 
Cheers,
Stephen Rothwell 
sfr-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org

Legal Stuff:
By participating in linux-next, your subsystem tree contributions are
public and will be included in the linux-next trees.  You may be sent
e-mail messages indicating errors or other issues when the
patches/commits from your subsystem tree are merged and tested in
linux-next.  These messages may also be cross-posted to the linux-next
mailing list, the linux-kernel mailing list, etc.  The linux-next tree
project and IBM (my employer) make no warranties regarding the linux-next
project, the testing procedures, the results, the e-mails, etc.  If you
don't agree to these ground rules, let me know and I'll remove your tree
from participation in linux-next.

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: Please include user-namespace.git in linux-next
@ 2012-05-13 23:35         ` Stephen Rothwell
  0 siblings, 0 replies; 227+ messages in thread
From: Stephen Rothwell @ 2012-05-13 23:35 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-fsdevel, linux-security-module, Linux Containers,
	Serge E. Hallyn, Andrew Morton, Linus Torvalds, Al Viro,
	Cyrill Gorcunov, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1815 bytes --]

HI Eric,

On Fri, 11 May 2012 16:20:54 -0700 ebiederm@xmission.com (Eric W. Biederman) wrote:
>
> In linux-next please include git://pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-next

I assume you left out "git.kernel.org/"  :-)

Included from today.

Thanks for adding your subsystem tree as a participant of linux-next.  As
you may know, this is not a judgment of your code.  The purpose of
linux-next is for integration testing and to lower the impact of
conflicts between subsystems in the next merge window. 

You will need to ensure that the patches/commits in your tree/series have
been:
     * submitted under GPL v2 (or later) and include the Contributor's
	Signed-off-by,
     * posted to the relevant mailing list,
     * reviewed by you (or another maintainer of your subsystem tree),
     * successfully unit tested, and 
     * destined for the current or next Linux merge window.

Basically, this should be just what you would send to Linus (or ask him
to fetch).  It is allowed to be rebased if you deem it necessary.

-- 
Cheers,
Stephen Rothwell 
sfr@canb.auug.org.au

Legal Stuff:
By participating in linux-next, your subsystem tree contributions are
public and will be included in the linux-next trees.  You may be sent
e-mail messages indicating errors or other issues when the
patches/commits from your subsystem tree are merged and tested in
linux-next.  These messages may also be cross-posted to the linux-next
mailing list, the linux-kernel mailing list, etc.  The linux-next tree
project and IBM (my employer) make no warranties regarding the linux-next
project, the testing procedures, the results, the e-mails, etc.  If you
don't agree to these ground rules, let me know and I'll remove your tree
from participation in linux-next.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 227+ messages in thread

* Re: Please include user-namespace.git in linux-next
@ 2012-05-13 23:35         ` Stephen Rothwell
  0 siblings, 0 replies; 227+ messages in thread
From: Stephen Rothwell @ 2012-05-13 23:35 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Containers, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Cyrill Gorcunov, linux-security-module-u79uwXL29TY76Z2rM5mHXA,
	Al Viro, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	Linus Torvalds


[-- Attachment #1.1: Type: text/plain, Size: 1865 bytes --]

HI Eric,

On Fri, 11 May 2012 16:20:54 -0700 ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman) wrote:
>
> In linux-next please include git://pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-next

I assume you left out "git.kernel.org/"  :-)

Included from today.

Thanks for adding your subsystem tree as a participant of linux-next.  As
you may know, this is not a judgment of your code.  The purpose of
linux-next is for integration testing and to lower the impact of
conflicts between subsystems in the next merge window. 

You will need to ensure that the patches/commits in your tree/series have
been:
     * submitted under GPL v2 (or later) and include the Contributor's
	Signed-off-by,
     * posted to the relevant mailing list,
     * reviewed by you (or another maintainer of your subsystem tree),
     * successfully unit tested, and 
     * destined for the current or next Linux merge window.

Basically, this should be just what you would send to Linus (or ask him
to fetch).  It is allowed to be rebased if you deem it necessary.

-- 
Cheers,
Stephen Rothwell 
sfr-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org

Legal Stuff:
By participating in linux-next, your subsystem tree contributions are
public and will be included in the linux-next trees.  You may be sent
e-mail messages indicating errors or other issues when the
patches/commits from your subsystem tree are merged and tested in
linux-next.  These messages may also be cross-posted to the linux-next
mailing list, the linux-kernel mailing list, etc.  The linux-next tree
project and IBM (my employer) make no warranties regarding the linux-next
project, the testing procedures, the results, the e-mails, etc.  If you
don't agree to these ground rules, let me know and I'll remove your tree
from participation in linux-next.



* Re: Please include user-namespace.git in linux-next
  2012-05-11 23:20     ` Eric W. Biederman
  (?)
  (?)
@ 2012-05-21  2:25     ` Tetsuo Handa
  2012-05-22 17:26       ` Eric W. Biederman
  -1 siblings, 1 reply; 227+ messages in thread
From: Tetsuo Handa @ 2012-05-21  2:25 UTC (permalink / raw)
  To: ebiederm; +Cc: sfr, linux-kernel

I think something is wrong with commit e1c972b6 "userns: Add negative depends
on entries to avoid building code that is userns unsafe".

With gcc 4.4.6 on CentOS 6.2, "make allnoconfig" where UIDGID_CONVERTED should
become y is showing

Symbol: USER_NS [=n]
Type  : boolean
Prompt: User namespace (EXPERIMENTAL)
  Defined at init/Kconfig:880
  Depends on: NAMESPACES [=y] && EXPERIMENTAL [=n] && UIDGID_CONVERTED [=n]
  Location:
    -> General setup
      -> Namespaces support (NAMESPACES [=y])
  Selects: UIDGID_STRICT_TYPE_CHECKS [=n]

.
I think this commit meant "!FOO" rather than "FOO = n",
otherwise there is no way for linux-next-20120518 to enable USER_NS.
----------------------------------------
PATCH: user_ns: Fix wrong dependency in UIDGID_CONVERTED.

"depends on FOO = n" should be "depends on !FOO".

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
----------
diff --git a/init/Kconfig b/init/Kconfig
index 20f6702..7316ed6 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -917,110 +917,110 @@ config UIDGID_CONVERTED
 
 	# List of kernel pieces that need user namespace work
 	# Features
-	depends on SYSVIPC = n
-	depends on IMA = n
-	depends on EVM = n
-	depends on KEYS = n
-	depends on AUDIT = n
-	depends on AUDITSYSCALL = n
-	depends on TASKSTATS = n
-	depends on TRACING = n
-	depends on FS_POSIX_ACL = n
-	depends on QUOTA = n
-	depends on QUOTACTL = n
-	depends on DEBUG_CREDENTIALS = n
-	depends on BSD_PROCESS_ACCT = n
-	depends on DRM = n
-	depends on PROC_EVENTS = n
+	depends on !SYSVIPC
+	depends on !IMA
+	depends on !EVM
+	depends on !KEYS
+	depends on !AUDIT
+	depends on !AUDITSYSCALL
+	depends on !TASKSTATS
+	depends on !TRACING
+	depends on !FS_POSIX_ACL
+	depends on !QUOTA
+	depends on !QUOTACTL
+	depends on !DEBUG_CREDENTIALS
+	depends on !BSD_PROCESS_ACCT
+	depends on !DRM
+	depends on !PROC_EVENTS
 
 	# Networking
-	depends on NET = n
-	depends on NET_9P = n
-	depends on IPX = n
-	depends on PHONET = n
-	depends on NET_CLS_FLOW = n
-	depends on NETFILTER_XT_MATCH_OWNER = n
-	depends on NETFILTER_XT_MATCH_RECENT = n
-	depends on NETFILTER_XT_TARGET_LOG = n
-	depends on NETFILTER_NETLINK_LOG = n
-	depends on INET = n
-	depends on IPV6 = n
-	depends on IP_SCTP = n
-	depends on AF_RXRPC = n
-	depends on LLC2 = n
-	depends on NET_KEY = n
-	depends on INET_DIAG = n
-	depends on DNS_RESOLVER = n
-	depends on AX25 = n
-	depends on ATALK = n
+	depends on !NET
+	depends on !NET_9P
+	depends on !IPX
+	depends on !PHONET
+	depends on !NET_CLS_FLOW
+	depends on !NETFILTER_XT_MATCH_OWNER
+	depends on !NETFILTER_XT_MATCH_RECENT
+	depends on !NETFILTER_XT_TARGET_LOG
+	depends on !NETFILTER_NETLINK_LOG
+	depends on !INET
+	depends on !IPV6
+	depends on !IP_SCTP
+	depends on !AF_RXRPC
+	depends on !LLC2
+	depends on !NET_KEY
+	depends on !INET_DIAG
+	depends on !DNS_RESOLVER
+	depends on !AX25
+	depends on !ATALK
 
 	# Filesystems
-	depends on USB_DEVICEFS = n
-	depends on USB_GADGETFS = n
-	depends on USB_FUNCTIONFS = n
-	depends on DEVTMPFS = n
-	depends on XENFS = n
-
-	depends on 9P_FS = n
-	depends on ADFS_FS = n
-	depends on AFFS_FS = n
-	depends on AFS_FS = n
-	depends on AUTOFS4_FS = n
-	depends on BEFS_FS = n
-	depends on BFS_FS = n
-	depends on BTRFS_FS = n
-	depends on CEPH_FS = n
-	depends on CIFS = n
-	depends on CODA_FS = n
-	depends on CONFIGFS_FS = n
-	depends on CRAMFS = n
-	depends on DEBUG_FS = n
-	depends on ECRYPT_FS = n
-	depends on EFS_FS = n
-	depends on EXOFS_FS = n
-	depends on FAT_FS = n
-	depends on FUSE_FS = n
-	depends on GFS2_FS = n
-	depends on HFS_FS = n
-	depends on HFSPLUS_FS = n
-	depends on HPFS_FS = n
-	depends on HUGETLBFS = n
-	depends on ISO9660_FS = n
-	depends on JFFS2_FS = n
-	depends on JFS_FS = n
-	depends on LOGFS = n
-	depends on MINIX_FS = n
-	depends on NCP_FS = n
-	depends on NFSD = n
-	depends on NFS_FS = n
-	depends on NILFS2_FS = n
-	depends on NTFS_FS = n
-	depends on OCFS2_FS = n
-	depends on OMFS_FS = n
-	depends on QNX4FS_FS = n
-	depends on QNX6FS_FS = n
-	depends on REISERFS_FS = n
-	depends on SQUASHFS = n
-	depends on SYSV_FS = n
-	depends on UBIFS_FS = n
-	depends on UDF_FS = n
-	depends on UFS_FS = n
-	depends on VXFS_FS = n
-	depends on XFS_FS = n
-
-	depends on !UML || HOSTFS = n
+	depends on !USB_DEVICEFS
+	depends on !USB_GADGETFS
+	depends on !USB_FUNCTIONFS
+	depends on !DEVTMPFS
+	depends on !XENFS
+
+	depends on !9P_FS
+	depends on !ADFS_FS
+	depends on !AFFS_FS
+	depends on !AFS_FS
+	depends on !AUTOFS4_FS
+	depends on !BEFS_FS
+	depends on !BFS_FS
+	depends on !BTRFS_FS
+	depends on !CEPH_FS
+	depends on !CIFS
+	depends on !CODA_FS
+	depends on !CONFIGFS_FS
+	depends on !CRAMFS
+	depends on !DEBUG_FS
+	depends on !ECRYPT_FS
+	depends on !EFS_FS
+	depends on !EXOFS_FS
+	depends on !FAT_FS
+	depends on !FUSE_FS
+	depends on !GFS2_FS
+	depends on !HFS_FS
+	depends on !HFSPLUS_FS
+	depends on !HPFS_FS
+	depends on !HUGETLBFS
+	depends on !ISO9660_FS
+	depends on !JFFS2_FS
+	depends on !JFS_FS
+	depends on !LOGFS
+	depends on !MINIX_FS
+	depends on !NCP_FS
+	depends on !NFSD
+	depends on !NFS_FS
+	depends on !NILFS2_FS
+	depends on !NTFS_FS
+	depends on !OCFS2_FS
+	depends on !OMFS_FS
+	depends on !QNX4FS_FS
+	depends on !QNX6FS_FS
+	depends on !REISERFS_FS
+	depends on !SQUASHFS
+	depends on !SYSV_FS
+	depends on !UBIFS_FS
+	depends on !UDF_FS
+	depends on !UFS_FS
+	depends on !VXFS_FS
+	depends on !XFS_FS
+
+	depends on !UML || !HOSTFS
 
 	# The rare drivers that won't build
-	depends on AIRO = n
-	depends on AIRO_CS = n
-	depends on TUN = n
-	depends on INFINIBAND_QIB = n
-	depends on BLK_DEV_LOOP = n
-	depends on ANDROID_BINDER_IPC = n
+	depends on !AIRO
+	depends on !AIRO_CS
+	depends on !TUN
+	depends on !INFINIBAND_QIB
+	depends on !BLK_DEV_LOOP
+	depends on !ANDROID_BINDER_IPC
 
 	# Security modules
-	depends on SECURITY_TOMOYO = n
-	depends on SECURITY_APPARMOR = n
+	depends on !SECURITY_TOMOYO
+	depends on !SECURITY_APPARMOR
 
 config UIDGID_STRICT_TYPE_CHECKS
 	bool "Require conversions between uid/gids and their internal representation"


* Re: Please include user-namespace.git in linux-next
  2012-05-21  2:25     ` Tetsuo Handa
@ 2012-05-22 17:26       ` Eric W. Biederman
  0 siblings, 0 replies; 227+ messages in thread
From: Eric W. Biederman @ 2012-05-22 17:26 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: sfr, linux-kernel

Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes:

> I think something is wrong with commit e1c972b6 "userns: Add negative depends
> on entries to avoid building code that is userns unsafe".

My apologies for the delayed reply.

> With gcc 4.4.6 on CentOS 6.2, "make allnoconfig" where UIDGID_CONVERTED should
> become y is showing
>
> Symbol: USER_NS [=n]
> Type  : boolean
> Prompt: User namespace (EXPERIMENTAL)
>   Defined at init/Kconfig:880
>   Depends on: NAMESPACES [=y] && EXPERIMENTAL [=n] && UIDGID_CONVERTED [=n]
>   Location:
>     -> General setup
>       -> Namespaces support (NAMESPACES [=y])
>   Selects: UIDGID_STRICT_TYPE_CHECKS [=n]
>
> .
> I think this commit meant "!FOO" rather than "FOO = n",
> otherwise there is no way for linux-next-20120518 to enable USER_NS.
> ----------------------------------------
> PATCH: user_ns: Fix wrong dependency in UIDGID_CONVERTED.

It turns out to be more subtle than that. The issue is that I have
"depends on USB_DEVICEFS = n" and then USB_DEVICEFS was removed.

"depends on FOO = n" is the only way I found that will succeed in
verifying that FOO is neither enabled nor modular.  "depends on !FOO"
appears to succeed when FOO = m, which is not at all what I want.
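
To make the difference concrete, here is a minimal Python sketch of the
two expression forms.  It is an illustration only, not kconfig code: the
helper names are made up, the tristate arithmetic (n=0, m=1, y=2, with
"!expr" evaluating to 2 - expr) follows the Kconfig language
documentation, and the handling of a symbol that is no longer defined is
an assumption taken from the behaviour described in this thread.

```python
# Tristate values as used in Kconfig expressions: n < m < y.
N, M, Y = 0, 1, 2


def k_not(value):
    """Model of '!FOO': Kconfig evaluates it as 2 - FOO."""
    return 2 - value


def k_eq_n(value):
    """Model of 'FOO = n': y only when FOO's value is exactly n.

    None stands for a symbol that is not defined anywhere any more.
    Treating that comparison as false is an assumption based on the
    USB_DEVICEFS behaviour reported in this thread.
    """
    return Y if value == N else N


def tristate(value):
    """Value of a symbol inside a tristate expression (undefined -> n)."""
    return N if value is None else value


# Print the result of both forms for every state FOO can be in.
for name, value in (("n", N), ("m", M), ("y", Y), ("undefined", None)):
    print(name, k_not(tristate(value)), k_eq_n(value))
```

In this model "!FOO" still yields m when FOO = m, so it does not rule out
a modular FOO, while "FOO = n" does.  The same model shows the cost: once
FOO is undefined, "FOO = n" evaluates to n and the dependency can never
be satisfied.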

After both trees are merged I will have to remove that
"depends on USB_DEVICEFS = n" dependency.

I can't see any possible way to handle this before then.

Eric


Thread overview: 227+ messages
2012-04-08  5:10 [REVIEW][PATCH 0/43] Completing the user namespace Eric W. Biederman
2012-04-08  5:10 ` Eric W. Biederman
2012-04-08  5:10 ` Eric W. Biederman
     [not found] ` <m11unyn70b.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2012-04-08  5:14   ` [PATCH 01/43] vfs: Don't allow a user namespace root to make device nodes "Eric W. Beiderman
2012-04-08  5:14     ` "Eric W. Beiderman
2012-04-08  5:14     ` "Eric W. Beiderman
2012-04-08  5:14   ` [PATCH 02/43] userns: Kill bogus declaration of function release_uids "Eric W. Beiderman
2012-04-08  5:14     ` "Eric W. Beiderman
2012-04-08  5:14     ` "Eric W. Beiderman
2012-04-08  5:14   ` [PATCH 03/43] userns: Replace netlink uses of cap_raised with capable "Eric W. Beiderman
2012-04-08  5:14     ` "Eric W. Beiderman
2012-04-08  5:14     ` "Eric W. Beiderman
2012-04-08  5:15   ` [PATCH 04/43] userns: Remove unnecessary cast to struct user_struct when copying cred->user "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15   ` [PATCH 05/43] cred: Add forward declaration of init_user_ns in all cases "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15   ` [PATCH 06/43] userns: Use cred->user_ns instead of cred->user->user_ns "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15   ` [PATCH 07/43] cred: Refcount the user_ns pointed to by the cred "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15   ` [PATCH 08/43] userns: Add an explicit reference to the parent user namespace "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15   ` [PATCH 09/43] mqueue: Explicitly capture the user namespace to send the notification to "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15   ` [PATCH 10/43] userns: Deprecate and rename the user_namespace reference in the user_struct "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15   ` [PATCH 11/43] userns: Start out with a full set of capabilities "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15   ` [PATCH 12/43] userns: Replace the hard to write inode_userns with inode_capable "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15   ` [PATCH 13/43] userns: Add kuid_t and kgid_t and associated infrastructure in uidgid.h "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15   ` [PATCH 14/43] userns: Add a Kconfig option to enforce strict kuid and kgid type checks "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15   ` [PATCH 15/43] userns: Disassociate user_struct from the user_namespace "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15   ` [PATCH 16/43] userns: Simplify the user_namespace by making userns->creator a kuid "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-18 18:48     ` Serge E. Hallyn
     [not found]       ` <20120418184847.GA4984-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2012-04-20 22:58         ` Eric W. Biederman
2012-04-20 22:58       ` Eric W. Biederman
     [not found]         ` <m1aa266meh.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2012-04-24 17:33           ` Serge E. Hallyn
2012-04-24 17:33             ` Serge E. Hallyn
     [not found]             ` <20120424173347.GA14017-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2012-04-24 19:41               ` Eric W. Biederman
2012-04-24 19:41                 ` Eric W. Biederman
     [not found]                 ` <m14ns8lxyc.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2012-04-24 20:23                   ` Serge E. Hallyn
2012-04-24 20:23                     ` Serge E. Hallyn
2012-04-26  9:09                     ` Eric W. Biederman
     [not found]                       ` <m1ehradfl3.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2012-04-26 16:21                         ` Serge E. Hallyn
2012-04-26 16:21                           ` Serge E. Hallyn
     [not found]                     ` <20120424202301.GA11326-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2012-04-26  9:09                       ` Eric W. Biederman
     [not found]     ` <1333862139-31737-16-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-04-18 18:48       ` Serge E. Hallyn
2012-04-08  5:15   ` [PATCH 17/43] userns: Rework the user_namespace adding uid/gid mapping support "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
     [not found]     ` <1333862139-31737-17-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-04-18 18:49       ` Serge E. Hallyn
2012-04-18 18:49         ` Serge E. Hallyn
2012-04-08  5:15   ` [PATCH 18/43] userns: Convert group_info values from gid_t to kgid_t "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
     [not found]     ` <1333862139-31737-18-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-04-18 18:49       ` Serge E. Hallyn
2012-04-18 18:49     ` Serge E. Hallyn
     [not found]       ` <20120418184936.GC4984-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2012-04-20 23:05         ` Eric W. Biederman
2012-04-20 23:05           ` Eric W. Biederman
2012-04-08  5:15   ` [PATCH 19/43] userns: Store uid and gid values in struct cred with kuid_t and kgid_t types "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-18 18:49     ` Serge E. Hallyn
     [not found]     ` <1333862139-31737-19-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-04-18 18:49       ` Serge E. Hallyn
2012-04-08  5:15   ` [PATCH 20/43] userns: Replace user_ns_map_uid and user_ns_map_gid with from_kuid and from_kgid "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-18 18:49     ` Serge E. Hallyn
     [not found]     ` <1333862139-31737-20-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-04-18 18:49       ` Serge E. Hallyn
2012-04-08  5:15   ` [PATCH 21/43] userns: Convert sched_set_affinity and sched_set_scheduler's permission checks "Eric W. Beiderman
2012-04-08  5:15   ` [PATCH 22/43] userns: Convert capabilities related permsion checks "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-18 18:51     ` Serge E. Hallyn
     [not found]       ` <20120418185106.GG4984-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2012-04-20 23:18         ` Eric W. Biederman
2012-04-20 23:18           ` Eric W. Biederman
     [not found]     ` <1333862139-31737-22-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-04-18 18:51       ` Serge E. Hallyn
2012-04-08  5:15   ` [PATCH 23/43] userns: Convert setting and getting uid and gid system calls to use kuid and kgid "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
     [not found]     ` <1333862139-31737-23-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-04-26 16:20       ` Serge E. Hallyn
2012-04-26 16:20         ` Serge E. Hallyn
2012-04-08  5:15   ` [PATCH 24/43] userns: Convert ptrace, kill, set_priority permission checks to work with kuids and kgids "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
     [not found]     ` <1333862139-31737-24-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-04-18 18:56       ` Serge E. Hallyn
2012-04-18 18:56     ` Serge E. Hallyn
     [not found]       ` <20120418185610.GA5186-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2012-04-20 23:51         ` Eric W. Biederman
2012-04-20 23:51           ` Eric W. Biederman
2012-04-08  5:15   ` [PATCH 25/43] userns: Store uid and gid types in vfs structures with kuid_t and kgid_t types "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-18 18:57     ` Serge E. Hallyn
     [not found]     ` <1333862139-31737-25-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-04-18 18:57       ` Serge E. Hallyn
2012-04-08  5:15   ` [PATCH 26/43] userns: Convert in_group_p and in_egroup_p to use kgid_t "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-18 18:58     ` Serge E. Hallyn
     [not found]     ` <1333862139-31737-26-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-04-18 18:58       ` Serge E. Hallyn
2012-04-08  5:15   ` [PATCH 27/43] userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs "Eric W. Beiderman
2012-04-08  5:15   ` [PATCH 28/43] userns: Convert user specfied uids and gids in chown into kuids and kgid "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
     [not found]     ` <1333862139-31737-28-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-04-18 19:03       ` Serge E. Hallyn
2012-04-18 19:03         ` Serge E. Hallyn
2012-04-08  5:15   ` [PATCH 29/43] userns: Convert stat to return values mapped from kuids and kgids "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
     [not found]     ` <1333862139-31737-29-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-04-18 19:03       ` Serge E. Hallyn
2012-04-18 19:03     ` Serge E. Hallyn
2012-04-08  5:15   ` [PATCH 30/43] userns: Fail exec for suid and sgid binaries with ids outside our user namespace "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-18 19:05     ` Serge E. Hallyn
     [not found]     ` <1333862139-31737-30-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-04-18 19:05       ` Serge E. Hallyn
2012-04-18 19:09       ` Serge E. Hallyn
2012-04-18 19:09     ` Serge E. Hallyn
     [not found]       ` <20120418190927.GK5186-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2012-04-24  2:28         ` Eric W. Biederman
2012-04-24  2:28           ` Eric W. Biederman
     [not found]           ` <m1ehrdrhgr.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2012-04-24 15:10             ` Serge Hallyn
2012-04-24 15:10               ` Serge Hallyn
2012-04-08  5:15   ` [PATCH 31/43] userns: Teach inode_capable to understand inodes whose uids map to other namespaces "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
     [not found]     ` <1333862139-31737-31-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-04-18 19:06       ` Serge E. Hallyn
2012-04-18 19:06         ` Serge E. Hallyn
2012-04-08  5:15   ` [PATCH 32/43] userns: signal remove unnecessary map_cred_ns "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
     [not found]     ` <1333862139-31737-32-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-04-18 19:07       ` Serge E. Hallyn
2012-04-18 19:07     ` Serge E. Hallyn
2012-04-08  5:15   ` [PATCH 33/43] userns: Convert binary formats to use kuid/kgid where appropriate "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
     [not found]     ` <1333862139-31737-33-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-04-18 19:10       ` Serge E. Hallyn
2012-04-18 19:10     ` Serge E. Hallyn
2012-04-24  2:44       ` Eric W. Biederman
     [not found]       ` <20120418191033.GL5186-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2012-04-24  2:44         ` Eric W. Biederman
2012-04-08  5:15   ` [PATCH 34/43] userns: Convert devpts " "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15   ` [PATCH 35/43] userns: Convert ext2 " "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15     ` "Eric W. Beiderman
2012-04-08  5:15   ` [PATCH 36/43] userns: Convert ext3 " "Eric W. Beiderman
2012-04-08  5:15   ` [PATCH 37/43] userns: Convert ext4 to user " "Eric W. Beiderman
2012-04-08  5:15   ` [PATCH 38/43] userns: Convert proc to use " "Eric W. Beiderman
2012-04-08  5:15   ` [PATCH 39/43] userns: Convert sysctl permission checks to use kuid and kgids "Eric W. Beiderman
2012-04-08  5:15   ` [PATCH 40/43] userns: Convert sysfs to use kgid/kuid where appropriate "Eric W. Beiderman
2012-04-08  5:15   ` [PATCH 41/43] userns: Convert tmpfs to use kuid and kgid " "Eric W. Beiderman
2012-04-08  5:15   ` [PATCH 42/43] userns: Convert cgroup permission checks to use uid_eq "Eric W. Beiderman
2012-04-08  5:15   ` [PATCH 43/43] userns: Convert the move_pages, and migrate_pages " "Eric W. Beiderman
2012-04-08 14:54   ` [REVIEW][PATCH 0/43] Completing the user namespace Serge Hallyn
2012-04-08 17:40   ` richard -rw- weinberger
2012-05-11 23:20   ` Please include user-namespace.git in linux-next Eric W. Biederman
2012-05-11 23:20     ` Eric W. Biederman
     [not found]     ` <m1likyz4mh.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2012-05-13 23:35       ` Stephen Rothwell
2012-05-13 23:35         ` Stephen Rothwell
2012-05-13 23:35         ` Stephen Rothwell
2012-05-21  2:25     ` Tetsuo Handa
2012-05-22 17:26       ` Eric W. Biederman
2012-04-08  5:15 ` [PATCH 21/43] userns: Convert sched_set_affinity and sched_set_scheduler's permission checks "Eric W. Beiderman
     [not found]   ` <1333862139-31737-21-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-04-18 18:50     ` Serge E. Hallyn
2012-04-18 18:50   ` Serge E. Hallyn
2012-04-08  5:15 ` [PATCH 27/43] userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs "Eric W. Beiderman
     [not found]   ` <1333862139-31737-27-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-04-18 19:02     ` Serge E. Hallyn
2012-04-18 19:02       ` Serge E. Hallyn
2012-04-21  0:05       ` Eric W. Biederman
     [not found]       ` <20120418190213.GD5186-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2012-04-21  0:05         ` Eric W. Biederman
2012-04-18 19:03     ` Serge E. Hallyn
2012-04-18 19:03       ` Serge E. Hallyn
     [not found]       ` <20120418190337.GE5186-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2012-04-21  0:58         ` Eric W. Biederman
2012-04-21  0:58           ` Eric W. Biederman
     [not found]           ` <m1sjfx2950.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2012-04-24 17:41             ` Serge E. Hallyn
2012-04-26  0:11             ` Serge E. Hallyn
2012-04-26  0:11               ` Serge E. Hallyn
     [not found]               ` <20120426001101.GA10308-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2012-04-26  5:33                 ` Eric W. Biederman
2012-04-26  5:33                   ` Eric W. Biederman
2012-04-24 17:41           ` Serge E. Hallyn
2012-04-08  5:15 ` [PATCH 36/43] userns: Convert ext3 to use kuid/kgid where appropriate "Eric W. Beiderman
2012-04-08  5:15 ` [PATCH 37/43] userns: Convert ext4 to user " "Eric W. Beiderman
2012-04-08  5:15 ` [PATCH 38/43] userns: Convert proc to use " "Eric W. Beiderman
2012-04-08  5:15 ` [PATCH 39/43] userns: Convert sysctl permission checks to use kuid and kgids "Eric W. Beiderman
2012-04-08  5:15 ` [PATCH 40/43] userns: Convert sysfs to use kgid/kuid where appropriate "Eric W. Beiderman
2012-04-08  5:15 ` [PATCH 41/43] userns: Convert tmpfs to use kuid and kgid " "Eric W. Beiderman
2012-04-08  5:15 ` [PATCH 42/43] userns: Convert cgroup permission checks to use uid_eq "Eric W. Beiderman
2012-04-08  5:15 ` [PATCH 43/43] userns: Convert the move_pages, and migrate_pages " "Eric W. Beiderman
2012-04-08 14:54 ` [REVIEW][PATCH 0/43] Completing the user namespace Serge Hallyn
2012-04-08 17:40 ` richard -rw- weinberger
2012-04-08 17:40   ` richard -rw- weinberger
     [not found]   ` <CAFLxGvwyx6S6+eZtR=UNSQe_O+W7oZW=GosseL54HGpjtYGXjg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-04-08 21:30     ` Eric W. Biederman
2012-04-08 21:30       ` Eric W. Biederman
     [not found]       ` <m1iph9ewsy.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2012-04-08 22:04         ` richard -rw- weinberger
2012-04-08 22:04           ` richard -rw- weinberger
2012-04-08 22:52           ` Eric W. Biederman
     [not found]           ` <CAFLxGvwHtA028V2XudM-5HXmXCPw5ENL5E_nHKZh_gbrsRV69g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-04-08 22:52             ` Eric W. Biederman
2012-04-10 19:01 ` Andy Lutomirski
2012-04-10 21:59   ` Eric W. Biederman
2012-04-10 22:15     ` Andrew Lutomirski
2012-04-10 23:01       ` Markus Gutschke
2012-04-11  0:04         ` Eric W. Biederman
2012-04-10 23:50       ` Eric W. Biederman
2012-04-10 23:56         ` Andrew Lutomirski
2012-04-11  1:01           ` Eric W. Biederman
2012-04-11  1:00             ` Andrew Lutomirski
2012-04-11  1:14               ` Eric W. Biederman
2012-04-11  1:22                 ` Andrew Lutomirski
2012-04-11  4:37                 ` Serge Hallyn
2012-04-11  4:33             ` Serge Hallyn
2012-04-11  4:16         ` Serge Hallyn
