From: Christian Brauner
To: linux-api@vger.kernel.org, linux-kernel@vger.kernel.org, Tejun Heo
Cc: Oleg Nesterov, Peter Zijlstra, Christian Brauner
Subject: [PATCH v6 0/6] clone3 & cgroups: allow spawning processes into cgroups
Date: Wed, 5 Feb 2020 14:26:17 +0100
Message-Id: <20200205132623.670015-1-christian.brauner@ubuntu.com>

Hey Tejun,

This is v6 of the promised series to enable spawning processes into a
target cgroup different from the parent's cgroup.

This series can be pulled from the signed tag clone_into_cgroup_v5.7:

  git@gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux tags/clone_into_cgroup_v5.7

and is available at

  kernel.org: https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/log/?h=clone_into_cgroup
  github.com: https://github.com/brauner/linux/tree/clone_into_cgroup
  gitlab.com: https://gitlab.com/brauner/linux/commits/clone_into_cgroup

/* v1 */
Link: https://lore.kernel.org/r/20191218173516.7875-1-christian.brauner@ubuntu.com

/* v2 */
Link: https://lore.kernel.org/r/20191223061504.28716-1-christian.brauner@ubuntu.com
Rework locking and remove unneeded helper functions. Please see
individual patch changelogs for details.
With this I've been able to run the cgroup selftests and stress tests in
loops for a long time without any regressions or deadlocks; lockdep and
kasan did not complain either.

/* v3 */
Link: https://lore.kernel.org/r/20200117002143.15559-1-christian.brauner@ubuntu.com
Split preliminary work into separate patches.
See changelog of individual commits.

/* v4 */
Link: https://lore.kernel.org/r/20200117181219.14542-1-christian.brauner@ubuntu.com
Verify that we have write access to the target cgroup. This is usually
done by the vfs but since we aren't going through the vfs with
CLONE_INTO_CGROUP we need to do it ourselves.

/* v5 */
Link: https://lore.kernel.org/r/20200121154844.411-1-christian.brauner@ubuntu.com
Don't pass down the parent task_struct as argument, just use current
directly.
Put kargs->cset on error.

/* v6 */
Fix refcounting when setting new root_cset for CLONE_INTO_CGROUP.

With this cgroup migration will be a lot easier, and accounting will be
more exact. It also allows for nice features such as creating a frozen
process by spawning it into a frozen cgroup. The code simplifies
container creation and exec logic quite a bit as well.
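
For illustration only, here is a minimal userspace sketch of how the new flag might be used. It is not taken from the series. The struct name clone_args_cgroup and the helpers fill_clone_args() and clone_into_cgroup() are hypothetical; the CLONE_INTO_CGROUP value and the trailing cgroup field follow include/uapi/linux/sched.h as found in mainline, which this cover letter does not spell out; the fallback syscall number is an assumption that holds on x86-64 and arm64.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: value as in mainline include/uapi/linux/sched.h. */
#ifndef CLONE_INTO_CGROUP
#define CLONE_INTO_CGROUP 0x200000000ULL
#endif

/* Assumption: clone3 is syscall 435 on x86-64 and arm64. */
#ifndef SYS_clone3
#define SYS_clone3 435
#endif

/*
 * Hypothetical local mirror of struct clone_args including the cgroup
 * member, so the sketch also builds against older uapi headers.
 */
struct clone_args_cgroup {
	uint64_t flags;
	uint64_t pidfd;
	uint64_t child_tid;
	uint64_t parent_tid;
	uint64_t exit_signal;
	uint64_t stack;
	uint64_t stack_size;
	uint64_t tls;
	uint64_t set_tid;
	uint64_t set_tid_size;
	uint64_t cgroup;	/* fd referring to the target cgroup directory */
};

_Static_assert(sizeof(struct clone_args_cgroup) == 88,
	       "layout must be eleven 64-bit members");

/* Prepare arguments for a fork()-like clone3() into the given cgroup. */
void fill_clone_args(struct clone_args_cgroup *args, int cgroup_fd)
{
	assert(cgroup_fd >= 0);

	memset(args, 0, sizeof(*args));
	args->flags = CLONE_INTO_CGROUP;
	args->exit_signal = SIGCHLD;
	args->cgroup = (uint64_t)cgroup_fd;
}

/*
 * Returns 0 in the child, the child's pid in the parent, and -1 with
 * errno set on failure (for example when the kernel lacks the feature
 * or the caller has no write access to the target cgroup).
 */
pid_t clone_into_cgroup(int cgroup_fd)
{
	struct clone_args_cgroup args;

	fill_clone_args(&args, cgroup_fd);
	return (pid_t)syscall(SYS_clone3, &args, sizeof(args));
}
```

A caller would typically obtain cgroup_fd by opening the target cgroup directory (for example with O_RDONLY | O_DIRECTORY) and then treat the return value of clone_into_cgroup() like that of fork(). If the target cgroup is frozen, the child should come up frozen, which is the use case mentioned above.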

I've tried to contain all core changes for this feature in
kernel/cgroup/* to avoid exposing cgroup internals. This has mostly
worked.

When a new process is supposed to be spawned in a cgroup different from
the parent's then we briefly acquire the cgroup mutex right before
fork()'s point of no return and drop it once the child process has been
attached to the tasklist and to its css_set. This is done to ensure that
the cgroup isn't removed behind our back. The cgroup mutex is _only_
held in this case; the usual case, where the child is created in the
same cgroup as the parent, does not acquire it since the cgroup can't be
removed.

The series already comes with proper testing. Once we've decided that
this approach is good I'll expand the test-suite even more.

Thanks!
Christian

Christian Brauner (6):
  cgroup: unify attach permission checking
  cgroup: add cgroup_get_from_file() helper
  cgroup: refactor fork helpers
  cgroup: add cgroup_may_write() helper
  clone3: allow spawning processes into cgroups
  selftests/cgroup: add tests for cloning into cgroups

 include/linux/cgroup-defs.h                   |   5 +-
 include/linux/cgroup.h                        |  20 +-
 include/linux/sched/task.h                    |   4 +
 include/uapi/linux/sched.h                    |   5 +
 kernel/cgroup/cgroup.c                        | 291 ++++++++++++++----
 kernel/cgroup/pids.c                          |  15 +-
 kernel/fork.c                                 |  19 +-
 tools/testing/selftests/cgroup/Makefile       |   6 +-
 tools/testing/selftests/cgroup/cgroup_util.c  | 126 ++++++++
 tools/testing/selftests/cgroup/cgroup_util.h  |   4 +
 tools/testing/selftests/cgroup/test_core.c    |  64 ++++
 .../selftests/clone3/clone3_selftests.h       |  19 +-
 12 files changed, 496 insertions(+), 82 deletions(-)


base-commit: d5226fa6dbae0569ee43ecfc08bdcd6770fc4755
-- 
2.25.0