All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alexey Gladkov <gladkov.alexey@gmail.com>
To: LKML <linux-kernel@vger.kernel.org>,
	io-uring@vger.kernel.org,
	Kernel Hardening <kernel-hardening@lists.openwall.com>,
	Linux Containers <containers@lists.linux-foundation.org>,
	linux-mm@kvack.org
Cc: Jens Axboe <axboe@kernel.dk>, Kees Cook <keescook@chromium.org>,
	Jann Horn <jannh@google.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Oleg Nesterov <oleg@redhat.com>,
	"Eric W . Biederman" <ebiederm@xmission.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Alexey Gladkov <legion@kernel.org>
Subject: [RFC PATCH v3 0/8] Count rlimits in each user namespace
Date: Fri, 15 Jan 2021 15:57:21 +0100	[thread overview]
Message-ID: <cover.1610722473.git.gladkov.alexey@gmail.com> (raw)

Preface
-------
These patches are for binding the rlimit counters to a user in user namespace.
This patch set can be applied on top of:

git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git v5.11-rc2

Problem
-------
The RLIMIT_NPROC, RLIMIT_MEMLOCK, RLIMIT_SIGPENDING, RLIMIT_MSGQUEUE rlimits
implementation places the counters in user_struct [1]. These limits are global
between processes and persists for the lifetime of the process, even if
processes are in different user namespaces.

To illustrate the impact of rlimits, let's say there is a program that does not
fork. Some service-A wants to run this program as user X in multiple containers.
Since the program never fork the service wants to set RLIMIT_NPROC=1.

service-A
 \- program (uid=1000, container1, rlimit_nproc=1)
 \- program (uid=1000, container2, rlimit_nproc=1)

The service-A sets RLIMIT_NPROC=1 and runs the program in container1. When the
service-A tries to run a program with RLIMIT_NPROC=1 in container2 it fails
since user X already has one running process.

The problem is not that the limit from container1 affects container2. The
problem is that limit is verified against the global counter that reflects
the number of processes in all containers.

This problem can be worked around by using different users for each container
but in this case we face a different problem of uid mapping when transferring
files from one container to another.

Eric W. Biederman mentioned this issue [2][3].

Introduced changes
------------------
To address the problem, we bind rlimit counters to user namespace. Each counter
reflects the number of processes in a given uid in a given user namespace. The
result is a tree of rlimit counters with the biggest value at the root (aka
init_user_ns). The limit is considered exceeded if it's exceeded up in the tree.

[1] https://lore.kernel.org/containers/87imd2incs.fsf@x220.int.ebiederm.org/
[2] https://lists.linuxfoundation.org/pipermail/containers/2020-August/042096.html
[3] https://lists.linuxfoundation.org/pipermail/containers/2020-October/042524.html

Changelog
---------
v3:
* Added get_ucounts() function to increase the reference count. The existing
  get_counts() function renamed to __get_ucounts().
* The type of ucounts.count changed from atomic_t to refcount_t.
* Dropped 'const' from set_cred_ucounts() arguments.
* Fixed a bug with freeing the cred structure after calling cred_alloc_blank().
* Commit messages have been updated.
* Added selftest.

v2:
* RLIMIT_MEMLOCK, RLIMIT_SIGPENDING and RLIMIT_MSGQUEUE are migrated to ucounts.
* Added ucounts for pair uid and user namespace into cred.
* Added the ability to increase ucount by more than 1.

v1:
* After discussion with Eric W. Biederman, I increased the size of ucounts to
  atomic_long_t.
* Added ucount_max to avoid the fork bomb.

--

Alexey Gladkov (8):
  Use refcount_t for ucounts reference counting
  Add a reference to ucounts for each cred
  Move RLIMIT_NPROC counter to ucounts
  Move RLIMIT_MSGQUEUE counter to ucounts
  Move RLIMIT_SIGPENDING counter to ucounts
  Move RLIMIT_MEMLOCK counter to ucounts
  Move RLIMIT_NPROC check to the place where we increment the counter
  kselftests: Add test to check for rlimit changes in different user
    namespaces

 fs/exec.c                                     |   2 +-
 fs/hugetlbfs/inode.c                          |  17 +-
 fs/io-wq.c                                    |  22 ++-
 fs/io-wq.h                                    |   2 +-
 fs/io_uring.c                                 |   2 +-
 fs/proc/array.c                               |   2 +-
 include/linux/cred.h                          |   3 +
 include/linux/hugetlb.h                       |   3 +-
 include/linux/mm.h                            |   4 +-
 include/linux/sched/user.h                    |   6 -
 include/linux/shmem_fs.h                      |   2 +-
 include/linux/signal_types.h                  |   4 +-
 include/linux/user_namespace.h                |  31 +++-
 ipc/mqueue.c                                  |  29 ++--
 ipc/shm.c                                     |  31 ++--
 kernel/cred.c                                 |  46 ++++-
 kernel/exit.c                                 |   2 +-
 kernel/fork.c                                 |  12 +-
 kernel/signal.c                               |  53 +++---
 kernel/sys.c                                  |  13 --
 kernel/ucount.c                               | 111 +++++++++---
 kernel/user.c                                 |   2 -
 kernel/user_namespace.c                       |   7 +-
 mm/memfd.c                                    |   4 +-
 mm/mlock.c                                    |  35 ++--
 mm/mmap.c                                     |   3 +-
 mm/shmem.c                                    |   8 +-
 tools/testing/selftests/Makefile              |   1 +
 tools/testing/selftests/rlimits/.gitignore    |   2 +
 tools/testing/selftests/rlimits/Makefile      |   6 +
 tools/testing/selftests/rlimits/config        |   1 +
 .../selftests/rlimits/rlimits-per-userns.c    | 161 ++++++++++++++++++
 32 files changed, 445 insertions(+), 182 deletions(-)
 create mode 100644 tools/testing/selftests/rlimits/.gitignore
 create mode 100644 tools/testing/selftests/rlimits/Makefile
 create mode 100644 tools/testing/selftests/rlimits/config
 create mode 100644 tools/testing/selftests/rlimits/rlimits-per-userns.c

-- 
2.29.2

_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/containers

WARNING: multiple messages have this Message-ID (diff)
From: Alexey Gladkov <gladkov.alexey@gmail.com>
To: LKML <linux-kernel@vger.kernel.org>,
	io-uring@vger.kernel.org,
	Kernel Hardening <kernel-hardening@lists.openwall.com>,
	Linux Containers <containers@lists.linux-foundation.org>,
	linux-mm@kvack.org
Cc: Alexey Gladkov <legion@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	"Eric W . Biederman" <ebiederm@xmission.com>,
	Jann Horn <jannh@google.com>, Jens Axboe <axboe@kernel.dk>,
	Kees Cook <keescook@chromium.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Oleg Nesterov <oleg@redhat.com>
Subject: [RFC PATCH v3 0/8] Count rlimits in each user namespace
Date: Fri, 15 Jan 2021 15:57:21 +0100	[thread overview]
Message-ID: <cover.1610722473.git.gladkov.alexey@gmail.com> (raw)

Preface
-------
These patches are for binding the rlimit counters to a user in user namespace.
This patch set can be applied on top of:

git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git v5.11-rc2

Problem
-------
The RLIMIT_NPROC, RLIMIT_MEMLOCK, RLIMIT_SIGPENDING, RLIMIT_MSGQUEUE rlimits
implementation places the counters in user_struct [1]. These limits are global
between processes and persists for the lifetime of the process, even if
processes are in different user namespaces.

To illustrate the impact of rlimits, let's say there is a program that does not
fork. Some service-A wants to run this program as user X in multiple containers.
Since the program never fork the service wants to set RLIMIT_NPROC=1.

service-A
 \- program (uid=1000, container1, rlimit_nproc=1)
 \- program (uid=1000, container2, rlimit_nproc=1)

The service-A sets RLIMIT_NPROC=1 and runs the program in container1. When the
service-A tries to run a program with RLIMIT_NPROC=1 in container2 it fails
since user X already has one running process.

The problem is not that the limit from container1 affects container2. The
problem is that limit is verified against the global counter that reflects
the number of processes in all containers.

This problem can be worked around by using different users for each container
but in this case we face a different problem of uid mapping when transferring
files from one container to another.

Eric W. Biederman mentioned this issue [2][3].

Introduced changes
------------------
To address the problem, we bind rlimit counters to user namespace. Each counter
reflects the number of processes in a given uid in a given user namespace. The
result is a tree of rlimit counters with the biggest value at the root (aka
init_user_ns). The limit is considered exceeded if it's exceeded up in the tree.

[1] https://lore.kernel.org/containers/87imd2incs.fsf@x220.int.ebiederm.org/
[2] https://lists.linuxfoundation.org/pipermail/containers/2020-August/042096.html
[3] https://lists.linuxfoundation.org/pipermail/containers/2020-October/042524.html

Changelog
---------
v3:
* Added get_ucounts() function to increase the reference count. The existing
  get_counts() function renamed to __get_ucounts().
* The type of ucounts.count changed from atomic_t to refcount_t.
* Dropped 'const' from set_cred_ucounts() arguments.
* Fixed a bug with freeing the cred structure after calling cred_alloc_blank().
* Commit messages have been updated.
* Added selftest.

v2:
* RLIMIT_MEMLOCK, RLIMIT_SIGPENDING and RLIMIT_MSGQUEUE are migrated to ucounts.
* Added ucounts for pair uid and user namespace into cred.
* Added the ability to increase ucount by more than 1.

v1:
* After discussion with Eric W. Biederman, I increased the size of ucounts to
  atomic_long_t.
* Added ucount_max to avoid the fork bomb.

--

Alexey Gladkov (8):
  Use refcount_t for ucounts reference counting
  Add a reference to ucounts for each cred
  Move RLIMIT_NPROC counter to ucounts
  Move RLIMIT_MSGQUEUE counter to ucounts
  Move RLIMIT_SIGPENDING counter to ucounts
  Move RLIMIT_MEMLOCK counter to ucounts
  Move RLIMIT_NPROC check to the place where we increment the counter
  kselftests: Add test to check for rlimit changes in different user
    namespaces

 fs/exec.c                                     |   2 +-
 fs/hugetlbfs/inode.c                          |  17 +-
 fs/io-wq.c                                    |  22 ++-
 fs/io-wq.h                                    |   2 +-
 fs/io_uring.c                                 |   2 +-
 fs/proc/array.c                               |   2 +-
 include/linux/cred.h                          |   3 +
 include/linux/hugetlb.h                       |   3 +-
 include/linux/mm.h                            |   4 +-
 include/linux/sched/user.h                    |   6 -
 include/linux/shmem_fs.h                      |   2 +-
 include/linux/signal_types.h                  |   4 +-
 include/linux/user_namespace.h                |  31 +++-
 ipc/mqueue.c                                  |  29 ++--
 ipc/shm.c                                     |  31 ++--
 kernel/cred.c                                 |  46 ++++-
 kernel/exit.c                                 |   2 +-
 kernel/fork.c                                 |  12 +-
 kernel/signal.c                               |  53 +++---
 kernel/sys.c                                  |  13 --
 kernel/ucount.c                               | 111 +++++++++---
 kernel/user.c                                 |   2 -
 kernel/user_namespace.c                       |   7 +-
 mm/memfd.c                                    |   4 +-
 mm/mlock.c                                    |  35 ++--
 mm/mmap.c                                     |   3 +-
 mm/shmem.c                                    |   8 +-
 tools/testing/selftests/Makefile              |   1 +
 tools/testing/selftests/rlimits/.gitignore    |   2 +
 tools/testing/selftests/rlimits/Makefile      |   6 +
 tools/testing/selftests/rlimits/config        |   1 +
 .../selftests/rlimits/rlimits-per-userns.c    | 161 ++++++++++++++++++
 32 files changed, 445 insertions(+), 182 deletions(-)
 create mode 100644 tools/testing/selftests/rlimits/.gitignore
 create mode 100644 tools/testing/selftests/rlimits/Makefile
 create mode 100644 tools/testing/selftests/rlimits/config
 create mode 100644 tools/testing/selftests/rlimits/rlimits-per-userns.c

-- 
2.29.2


             reply	other threads:[~2021-01-15 14:59 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-15 14:57 Alexey Gladkov [this message]
2021-01-15 14:57 ` [RFC PATCH v3 0/8] Count rlimits in each user namespace Alexey Gladkov
2021-01-15 14:57 ` [RFC PATCH v3 1/8] Use refcount_t for ucounts reference counting Alexey Gladkov
2021-01-15 14:57   ` Alexey Gladkov
2021-01-18  6:06   ` c25050162e: WARNING:at_lib/refcount.c:#refcount_warn_saturate kernel test robot
2021-01-18  6:06     ` kernel test robot
2021-01-18  6:06     ` kernel test robot
2021-01-18 19:14   ` [RFC PATCH v3 1/8] Use refcount_t for ucounts reference counting Linus Torvalds
2021-01-18 19:14     ` Linus Torvalds
2021-01-18 19:14     ` Linus Torvalds
2021-01-18 19:45     ` Alexey Gladkov
2021-01-18 19:45       ` Alexey Gladkov
2021-01-18 20:34       ` Linus Torvalds
2021-01-18 20:34         ` Linus Torvalds
2021-01-18 20:34         ` Linus Torvalds
2021-01-18 20:56         ` Alexey Gladkov
2021-01-18 20:56           ` Alexey Gladkov
2021-01-19  4:35           ` Kaiwan N Billimoria
2021-01-19  4:35             ` Kaiwan N Billimoria
2021-01-19  4:35             ` Kaiwan N Billimoria
2021-01-20  1:57           ` Eric W. Biederman
2021-01-20  1:57             ` Eric W. Biederman
2021-01-20  1:57             ` Eric W. Biederman
2021-01-20  1:58             ` Eric W. Biederman
2021-01-20  1:58               ` Eric W. Biederman
2021-01-20  1:58               ` Eric W. Biederman
2021-01-21 12:04             ` Alexey Gladkov
2021-01-21 12:04               ` Alexey Gladkov
2021-01-21 15:50               ` Eric W. Biederman
2021-01-21 15:50                 ` Eric W. Biederman
2021-01-21 15:50                 ` Eric W. Biederman
2021-01-21 16:07                 ` Alexey Gladkov
2021-01-21 16:07                   ` Alexey Gladkov
2021-01-15 14:57 ` [RFC PATCH v3 2/8] Add a reference to ucounts for each cred Alexey Gladkov
2021-01-15 14:57   ` Alexey Gladkov
2021-01-15 22:15   ` kernel test robot
2021-01-16  2:29   ` kernel test robot
2021-01-18  6:47   ` 14c3c8a27f: kernel_BUG_at_kernel/cred.c kernel test robot
2021-01-18  6:47     ` kernel test robot
2021-01-18  6:47     ` kernel test robot
2021-01-18  8:31   ` [PATCH v4 2/8] Add a reference to ucounts for each cred Alexey Gladkov
2021-01-18  8:31     ` Alexey Gladkov
2021-01-15 14:57 ` [RFC PATCH v3 3/8] Move RLIMIT_NPROC counter to ucounts Alexey Gladkov
2021-01-15 14:57   ` Alexey Gladkov
2021-01-15 14:57 ` [RFC PATCH v3 4/8] Move RLIMIT_MSGQUEUE " Alexey Gladkov
2021-01-15 14:57   ` Alexey Gladkov
2021-01-15 14:57 ` [RFC PATCH v3 5/8] Move RLIMIT_SIGPENDING " Alexey Gladkov
2021-01-15 14:57   ` Alexey Gladkov
2021-01-15 14:57 ` [RFC PATCH v3 6/8] Move RLIMIT_MEMLOCK " Alexey Gladkov
2021-01-15 14:57   ` Alexey Gladkov
2021-01-15 14:57 ` [RFC PATCH v3 7/8] Move RLIMIT_NPROC check to the place where we increment the counter Alexey Gladkov
2021-01-15 14:57   ` Alexey Gladkov
2021-01-15 14:57 ` [RFC PATCH v3 8/8] kselftests: Add test to check for rlimit changes in different user namespaces Alexey Gladkov
2021-01-15 14:57   ` Alexey Gladkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1610722473.git.gladkov.alexey@gmail.com \
    --to=gladkov.alexey@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=containers@lists.linux-foundation.org \
    --cc=ebiederm@xmission.com \
    --cc=io-uring@vger.kernel.org \
    --cc=jannh@google.com \
    --cc=keescook@chromium.org \
    --cc=kernel-hardening@lists.openwall.com \
    --cc=legion@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=oleg@redhat.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.