linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Alexey Gladkov <gladkov.alexey@gmail.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	 Kernel Hardening <kernel-hardening@lists.openwall.com>,
	 Linux Containers <containers@lists.linux-foundation.org>,
	 linux-mm@kvack.org,  Alexey Gladkov <legion@kernel.org>,
	 Andrew Morton <akpm@linux-foundation.org>,
	 Christian Brauner <christian.brauner@ubuntu.com>,
	 Jann Horn <jannh@google.com>,  Jens Axboe <axboe@kernel.dk>,
	 Kees Cook <keescook@chromium.org>,
	 Linus Torvalds <torvalds@linux-foundation.org>,
	 Oleg Nesterov <oleg@redhat.com>
Subject: Re: [PATCH v9 0/8] Count rlimits in each user namespace
Date: Mon, 05 Apr 2021 12:03:05 -0500	[thread overview]
Message-ID: <m17dlgll4m.fsf@fess.ebiederm.org> (raw)
In-Reply-To: <cover.1616533074.git.gladkov.alexey@gmail.com> (Alexey Gladkov's message of "Tue, 23 Mar 2021 21:59:09 +0100")

Alexey Gladkov <gladkov.alexey@gmail.com> writes:

> Preface
> -------
> These patches are for binding the rlimit counters to a user in user namespace.
> This patch set can be applied on top of:
>
> git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git v5.12-rc4
>
> Problem
> -------
> The RLIMIT_NPROC, RLIMIT_MEMLOCK, RLIMIT_SIGPENDING, RLIMIT_MSGQUEUE rlimits
> implementation places the counters in user_struct [1]. These limits are global
> between processes and persists for the lifetime of the process, even if
> processes are in different user namespaces.
>
> To illustrate the impact of rlimits, let's say there is a program that does not
> fork. Some service-A wants to run this program as user X in multiple containers.
> Since the program never fork the service wants to set RLIMIT_NPROC=1.
>
> service-A
>  \- program (uid=1000, container1, rlimit_nproc=1)
>  \- program (uid=1000, container2, rlimit_nproc=1)
>
> The service-A sets RLIMIT_NPROC=1 and runs the program in container1. When the
> service-A tries to run a program with RLIMIT_NPROC=1 in container2 it fails
> since user X already has one running process.
>
> The problem is not that the limit from container1 affects container2. The
> problem is that limit is verified against the global counter that reflects
> the number of processes in all containers.
>
> This problem can be worked around by using different users for each container
> but in this case we face a different problem of uid mapping when transferring
> files from one container to another.
>
> Eric W. Biederman mentioned this issue [2][3].
>
> Introduced changes
> ------------------
> To address the problem, we bind rlimit counters to user namespace. Each counter
> reflects the number of processes in a given uid in a given user namespace. The
> result is a tree of rlimit counters with the biggest value at the root (aka
> init_user_ns). The limit is considered exceeded if it's exceeded up in the tree.
>
> [1]: https://lore.kernel.org/containers/87imd2incs.fsf@x220.int.ebiederm.org/
> [2]: https://lists.linuxfoundation.org/pipermail/containers/2020-August/042096.html
> [3]: https://lists.linuxfoundation.org/pipermail/containers/2020-October/042524.html
>
> Changelog
> ---------
> v9:
> * Used a negative value to check that the ucounts->count is close to overflow.
> * Rebased onto v5.12-rc4.
>
> v8:
> * Used atomic_t for ucounts reference counting. Also added counter overflow
>   check (thanks to Linus Torvalds for the idea).
> * Fixed other issues found by lkp-tests project in the patch that Reimplements
>   RLIMIT_MEMLOCK on top of ucounts.
>
> v7:
> * Fixed issues found by lkp-tests project in the patch that Reimplements
>   RLIMIT_MEMLOCK on top of ucounts.
>
> v6:
> * Fixed issues found by lkp-tests project.
> * Rebased onto v5.11.
>
> v5:
> * Split the first commit into two commits: change ucounts.count type to atomic_long_t
>   and add ucounts to cred. These commits were merged by mistake during the rebase.
> * The __get_ucounts() renamed to alloc_ucounts().
> * The cred.ucounts update has been moved from commit_creds() as it did not allow
>   to handle errors.
> * Added error handling of set_cred_ucounts().
>
> v4:
> * Reverted the type change of ucounts.count to refcount_t.
> * Fixed typo in the kernel/cred.c
>
> v3:
> * Added get_ucounts() function to increase the reference count. The existing
>   get_counts() function renamed to __get_ucounts().
> * The type of ucounts.count changed from atomic_t to refcount_t.
> * Dropped 'const' from set_cred_ucounts() arguments.
> * Fixed a bug with freeing the cred structure after calling cred_alloc_blank().
> * Commit messages have been updated.
> * Added selftest.
>
> v2:
> * RLIMIT_MEMLOCK, RLIMIT_SIGPENDING and RLIMIT_MSGQUEUE are migrated to ucounts.
> * Added ucounts for pair uid and user namespace into cred.
> * Added the ability to increase ucount by more than 1.
>
> v1:
> * After discussion with Eric W. Biederman, I increased the size of ucounts to
>   atomic_long_t.
> * Added ucount_max to avoid the fork bomb.
>
> --
>
> Alexey Gladkov (8):
>   Increase size of ucounts to atomic_long_t
>   Add a reference to ucounts for each cred
>   Use atomic_t for ucounts reference counting
>   Reimplement RLIMIT_NPROC on top of ucounts
>   Reimplement RLIMIT_MSGQUEUE on top of ucounts
>   Reimplement RLIMIT_SIGPENDING on top of ucounts
>   Reimplement RLIMIT_MEMLOCK on top of ucounts
>   kselftests: Add test to check for rlimit changes in different user namespaces
>

Overall this looks good, and it is a very good sign that the automatic
testing bots have not found anything.  I found a few little nits.
But thing are looking very good.

Eric



      parent reply	other threads:[~2021-04-05 17:03 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-23 20:59 [PATCH v9 0/8] Count rlimits in each user namespace Alexey Gladkov
2021-03-23 20:59 ` [PATCH v9 1/8] Increase size of ucounts to atomic_long_t Alexey Gladkov
2021-03-23 20:59 ` [PATCH v9 2/8] Add a reference to ucounts for each cred Alexey Gladkov
2021-03-23 20:59 ` [PATCH v9 3/8] Use atomic_t for ucounts reference counting Alexey Gladkov
2021-04-05 17:01   ` Eric W. Biederman
2021-03-23 20:59 ` [PATCH v9 4/8] Reimplement RLIMIT_NPROC on top of ucounts Alexey Gladkov
2021-04-05 16:56   ` Eric W. Biederman
2021-04-06 15:44     ` Alexey Gladkov
2021-04-07 16:56       ` Eric W. Biederman
2021-03-23 20:59 ` [PATCH v9 5/8] Reimplement RLIMIT_MSGQUEUE " Alexey Gladkov
2021-03-23 20:59 ` [PATCH v9 6/8] Reimplement RLIMIT_SIGPENDING " Alexey Gladkov
2021-04-05 16:58   ` Eric W. Biederman
2021-03-23 20:59 ` [PATCH v9 7/8] Reimplement RLIMIT_MEMLOCK " Alexey Gladkov
2021-03-23 20:59 ` [PATCH v9 8/8] kselftests: Add test to check for rlimit changes in different user namespaces Alexey Gladkov
2021-04-05 17:03 ` Eric W. Biederman [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m17dlgll4m.fsf@fess.ebiederm.org \
    --to=ebiederm@xmission.com \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=christian.brauner@ubuntu.com \
    --cc=containers@lists.linux-foundation.org \
    --cc=gladkov.alexey@gmail.com \
    --cc=jannh@google.com \
    --cc=keescook@chromium.org \
    --cc=kernel-hardening@lists.openwall.com \
    --cc=legion@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=oleg@redhat.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).