From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
    aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-10.7 required=3.0 tests=BAYES_00,
    DKIM_ADSP_CUSTOM_MED,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
    HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,
    MENTIONS_GIT_HOSTING,SPF_HELO_NONE,SPF_PASS autolearn=ham
    autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
    by smtp.lore.kernel.org (Postfix) with ESMTP id 69C4FC433E0
    for ; Mon, 15 Feb 2021 12:43:34 +0000 (UTC)
Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136])
    (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
    (No client certificate requested)
    by mail.kernel.org (Postfix) with ESMTPS id 153C764DC3
    for ; Mon, 15 Feb 2021 12:43:34 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 153C764DC3
Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=containers-bounces@lists.linux-foundation.org
Received: from localhost (localhost [127.0.0.1])
    by smtp3.osuosl.org (Postfix) with ESMTP id B7E806F54C
    for ; Mon, 15 Feb 2021 12:43:33 +0000 (UTC)
X-Virus-Scanned: amavisd-new at osuosl.org
Received: from smtp3.osuosl.org ([127.0.0.1])
    by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024)
    with ESMTP id 9YDDh4WeWH_4
    for ; Mon, 15 Feb 2021 12:43:30 +0000 (UTC)
Received: by smtp3.osuosl.org (Postfix, from userid 1001)
    id BAFD06F5AA; Mon, 15 Feb 2021 12:43:30 +0000 (UTC)
Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56])
    by smtp3.osuosl.org (Postfix) with ESMTP id 317A46F575;
    Mon, 15 Feb 2021 12:42:51 +0000 (UTC)
Received: from lf-lists.osuosl.org (localhost [127.0.0.1])
    by lists.linuxfoundation.org (Postfix) with ESMTP id 18B8EC0893;
    Mon, 15 Feb 2021 12:42:51 +0000 (UTC)
Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136])
    by lists.linuxfoundation.org (Postfix) with ESMTP id 7AEE3C0893
    for ; Mon, 15 Feb 2021 12:42:49 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
    by smtp3.osuosl.org (Postfix) with ESMTP id 3C1E26F54C
    for ; Mon, 15 Feb 2021 12:42:49 +0000 (UTC)
X-Virus-Scanned: amavisd-new at osuosl.org
Received: from smtp3.osuosl.org ([127.0.0.1])
    by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024)
    with ESMTP id 3QcB6sq2FBpT
    for ; Mon, 15 Feb 2021 12:42:47 +0000 (UTC)
Received: by smtp3.osuosl.org (Postfix, from userid 1001)
    id 047216F491; Mon, 15 Feb 2021 12:42:46 +0000 (UTC)
Received: from localhost by smtp3.osuosl.org with SpamAssassin (version 3.4.2);
    Mon, 15 Feb 2021 12:42:46 +0000
From: Alexey Gladkov
To: LKML , io-uring@vger.kernel.org, Kernel Hardening , Linux Containers , linux-mm@kvack.org
Subject: [PATCH v6 0/7] Count rlimits in each user namespace
Date: Mon, 15 Feb 2021 13:41:07 +0100
Message-Id:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----------=_602A6C46.5964EAFA"
Cc: Jens Axboe , Kees Cook , Jann Horn , Linus Torvalds , Oleg Nesterov , "Eric W . Biederman" , Andrew Morton , Alexey Gladkov
X-BeenThere: containers@lists.linux-foundation.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Linux Containers
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: containers-bounces@lists.linux-foundation.org
Sender: "Containers"

This is a multi-part message in MIME format.

------------=_602A6C46.5964EAFA
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

Spam detection software, running on the system "smtp3.osuosl.org",
has identified this incoming email as possible spam. The original
message has been attached to this so you can view it or label
similar future email. If you have any questions, see
the administrator of that system for details.

Content preview: Preface ------- These patches are for binding the rlimit
  counters to a user in user namespace. This patch set can be applied on top
  of: git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git v5.11

Content analysis details: (5.2 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.8 BAYES_50               BODY: Bayes spam probability is 40 to 60%
                            [score: 0.4674]
 1.0 FORGED_GMAIL_RCVD      'From' gmail.com does not match 'Received' headers
 0.7 SPF_SOFTFAIL           SPF: sender does not match SPF record (softfail)
 0.0 FREEMAIL_FROM          Sender email is commonly abused enduser mail provider
                            (gladkov.alexey[at]gmail.com)
 0.0 DKIM_ADSP_CUSTOM_MED   No valid author signature, adsp_override is
                            CUSTOM_MED
 0.0 SPF_HELO_NONE          SPF: HELO does not publish an SPF Record
 0.9 NML_ADSP_CUSTOM_MED    ADSP custom_med hit, and not from a mailing list
 1.9 SPOOFED_FREEMAIL       No description available.
 0.0 SPOOF_GMAIL_MID        From Gmail but it doesn't seem to be...

------------=_602A6C46.5964EAFA
Content-Type: message/rfc822; x-spam-type=original
Content-Description: original message before SpamAssassin
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

Return-Path:
X-Greylist: from auto-whitelisted by SQLgrey-1.8.0
Received: from raptor.unsafe.ru (raptor.unsafe.ru [5.9.43.93])
    by smtp3.osuosl.org (Postfix) with ESMTP id 0061B6F491
    for ; Mon, 15 Feb 2021 12:42:43 +0000 (UTC)
Received: from comp-core-i7-2640m-0182e6.redhat.com (ip-94-113-225-162.net.upcbroadband.cz [94.113.225.162])
    (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
    (No client certificate requested)
    by raptor.unsafe.ru (Postfix) with ESMTPSA id 255A020460;
    Mon, 15 Feb 2021 12:42:23 +0000 (UTC)
From: Alexey Gladkov
To: LKML , io-uring@vger.kernel.org, Kernel Hardening , Linux Containers , linux-mm@kvack.org
Cc: Alexey Gladkov , Andrew Morton , Christian Brauner , "Eric W .
 Biederman" , Jann Horn , Jens Axboe , Kees Cook , Linus Torvalds , Oleg Nesterov
Subject: [PATCH v6 0/7] Count rlimits in each user namespace
Date: Mon, 15 Feb 2021 13:41:07 +0100
Message-Id:
X-Mailer: git-send-email 2.29.2
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.6.1 (raptor.unsafe.ru [5.9.43.93]); Mon, 15 Feb 2021 12:42:40 +0000 (UTC)

Preface
-------
These patches bind the rlimit counters to a user in a user namespace. This
patch set can be applied on top of:

git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git v5.11

Problem
-------
The RLIMIT_NPROC, RLIMIT_MEMLOCK, RLIMIT_SIGPENDING and RLIMIT_MSGQUEUE
rlimit implementations place their counters in user_struct [1]. These limits
are shared globally between processes and persist for the lifetime of the
process, even if the processes are in different user namespaces.

To illustrate the impact of rlimits, let's say there is a program that does
not fork. Some service-A wants to run this program as user X in multiple
containers. Since the program never forks, the service wants to set
RLIMIT_NPROC=1.

service-A
 \- program (uid=1000, container1, rlimit_nproc=1)
 \- program (uid=1000, container2, rlimit_nproc=1)

service-A sets RLIMIT_NPROC=1 and runs the program in container1. When
service-A tries to run the program with RLIMIT_NPROC=1 in container2, it
fails because user X already has one running process.

The problem is not that the limit from container1 affects container2. The
problem is that the limit is verified against the global counter that
reflects the number of processes in all containers.

This problem can be worked around by using different users for each
container, but in this case we face a different problem of uid mapping when
transferring files from one container to another.

Eric W. Biederman mentioned this issue [2][3].

Introduced changes
------------------
To address the problem, we bind rlimit counters to the user namespace. Each
counter reflects the number of processes for a given uid in a given user
namespace. The result is a tree of rlimit counters with the biggest value at
the root (aka init_user_ns). The limit is considered exceeded if it is
exceeded at any level up the tree.

[1] https://lore.kernel.org/containers/87imd2incs.fsf@x220.int.ebiederm.org/
[2] https://lists.linuxfoundation.org/pipermail/containers/2020-August/042096.html
[3] https://lists.linuxfoundation.org/pipermail/containers/2020-October/042524.html

Changelog
---------
v6:
* Fixed issues found by the lkp-tests project.
* Rebased onto v5.11.

v5:
* Split the first commit into two commits: change ucounts.count type to
  atomic_long_t and add ucounts to cred. These commits were merged by
  mistake during the rebase.
* Renamed __get_ucounts() to alloc_ucounts().
* The cred.ucounts update has been moved from commit_creds() as it did not
  allow errors to be handled.
* Added error handling of set_cred_ucounts().

v4:
* Reverted the type change of ucounts.count to refcount_t.
* Fixed a typo in kernel/cred.c.

v3:
* Added get_ucounts() function to increase the reference count. The existing
  get_ucounts() function was renamed to __get_ucounts().
* The type of ucounts.count changed from atomic_t to refcount_t.
* Dropped 'const' from set_cred_ucounts() arguments.
* Fixed a bug with freeing the cred structure after calling
  cred_alloc_blank().
* Commit messages have been updated.
* Added selftest.

v2:
* RLIMIT_MEMLOCK, RLIMIT_SIGPENDING and RLIMIT_MSGQUEUE are migrated to
  ucounts.
* Added ucounts for the (uid, user namespace) pair to cred.
* Added the ability to increase ucount by more than 1.

v1:
* After discussion with Eric W. Biederman, I increased the size of ucounts
  to atomic_long_t.
* Added ucount_max to avoid a fork bomb.

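The counter tree described under "Introduced changes" can be modelled in a
few lines of userspace C. This is only a sketch under stated assumptions, not
code from the series: struct counter_node, counter_charge() and
counter_uncharge() are hypothetical names, each node carries its own max for
simplicity, and locking and overflow handling are left out. It shows the
rule that a charge succeeds only if no level up the tree is exceeded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one per-(uid, user namespace) counter. */
struct counter_node {
	struct counter_node *parent;	/* NULL at the root (models init_user_ns) */
	long count;			/* usage currently charged at this level */
	long max;			/* limit enforced at this level (assumption) */
};

/*
 * Charge n units at node and at every ancestor. If any level would exceed
 * its max, undo the levels already charged and report failure.
 */
bool counter_charge(struct counter_node *node, long n)
{
	struct counter_node *iter, *undo;

	for (iter = node; iter; iter = iter->parent) {
		if (iter->count + n > iter->max)
			goto rollback;
		iter->count += n;
	}
	return true;

rollback:
	/* Only the nodes below the failing level were modified. */
	for (undo = node; undo != iter; undo = undo->parent)
		undo->count -= n;
	return false;
}

/* Release n units at node and at every ancestor. */
void counter_uncharge(struct counter_node *node, long n)
{
	struct counter_node *iter;

	for (iter = node; iter; iter = iter->parent) {
		assert(iter->count >= n);
		iter->count -= n;
	}
}
```

With one root node and two children that each have max = 1, charging one
unit to each child succeeds, which corresponds to the container1/container2
example above. A second charge on either child fails without disturbing the
root's count.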
--

Alexey Gladkov (7):
  Increase size of ucounts to atomic_long_t
  Add a reference to ucounts for each cred
  Reimplement RLIMIT_NPROC on top of ucounts
  Reimplement RLIMIT_MSGQUEUE on top of ucounts
  Reimplement RLIMIT_SIGPENDING on top of ucounts
  Reimplement RLIMIT_MEMLOCK on top of ucounts
  kselftests: Add test to check for rlimit changes in different user
    namespaces

 fs/exec.c                                  |   6 +-
 fs/hugetlbfs/inode.c                       |  16 +-
 fs/io-wq.c                                 |  22 ++-
 fs/io-wq.h                                 |   2 +-
 fs/io_uring.c                              |   2 +-
 fs/proc/array.c                            |   2 +-
 include/linux/cred.h                       |   4 +
 include/linux/hugetlb.h                    |   4 +-
 include/linux/mm.h                         |   4 +-
 include/linux/sched/user.h                 |   7 -
 include/linux/shmem_fs.h                   |   2 +-
 include/linux/signal_types.h               |   4 +-
 include/linux/user_namespace.h             |  24 ++-
 ipc/mqueue.c                               |  29 ++--
 ipc/shm.c                                  |  30 ++--
 kernel/cred.c                              |  50 +++++-
 kernel/exit.c                              |   2 +-
 kernel/fork.c                              |  18 +-
 kernel/signal.c                            |  53 +++---
 kernel/sys.c                               |  14 +-
 kernel/ucount.c                            | 120 +++++++++++--
 kernel/user.c                              |   3 -
 kernel/user_namespace.c                    |   9 +-
 mm/memfd.c                                 |   5 +-
 mm/mlock.c                                 |  35 ++--
 mm/mmap.c                                  |   4 +-
 mm/shmem.c                                 |   8 +-
 tools/testing/selftests/Makefile           |   1 +
 tools/testing/selftests/rlimits/.gitignore |   2 +
 tools/testing/selftests/rlimits/Makefile   |   6 +
 tools/testing/selftests/rlimits/config     |   1 +
 .../selftests/rlimits/rlimits-per-userns.c | 161 ++++++++++++++++++
 32 files changed, 495 insertions(+), 155 deletions(-)
 create mode 100644 tools/testing/selftests/rlimits/.gitignore
 create mode 100644 tools/testing/selftests/rlimits/Makefile
 create mode 100644 tools/testing/selftests/rlimits/config
 create mode 100644 tools/testing/selftests/rlimits/rlimits-per-userns.c

--
2.29.2

------------=_602A6C46.5964EAFA
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/containers
------------=_602A6C46.5964EAFA--