From: Dan Schatzberg
Cc: Dan Schatzberg, Jens Axboe, Tejun Heo, Li Zefan, Johannes Weiner,
    Michal Hocko, Vladimir Davydov, Andrew Morton, Hugh Dickins,
    Roman Gushchin, Shakeel Butt, Chris Down, Yang Shi, Thomas Gleixner,
    linux-block@vger.kernel.org (open list:BLOCK LAYER),
    linux-kernel@vger.kernel.org (open list),
    cgroups@vger.kernel.org (open list:CONTROL GROUP (CGROUP)),
    linux-mm@kvack.org (open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG))
Subject: [PATCH v3 0/3] Charge loop device i/o to issuing cgroup
Date: Mon, 24 Feb 2020 17:17:44 -0500

Changes since V3:

* Fix race on loop device destruction and deferred worker cleanup
* Ensure charge on shmem_swapin_page works just like getpage
* Minor style changes

Changes since V2:

* Deferred destruction of workqueue items so in the common case there
  is no allocation needed

Changes since V1:

* Split out and reordered patches so cgroup charging changes are
  separate from kworker -> workqueue change
* Add mem_css to struct loop_cmd to simplify logic

The loop device runs all i/o to the backing file on a separate kworker
thread, which results in all i/o being charged to the root cgroup. This
allows a loop device to be used to trivially bypass resource limits
and other policy. This patch series fixes this gap in accounting.

A simple script to demonstrate this behavior on a cgroup v2 machine:

'''
#!/bin/bash
set -e

CGROUP=/sys/fs/cgroup/test.slice
LOOP_DEV=/dev/loop0

if [[ ! -d $CGROUP ]]
then
    sudo mkdir $CGROUP
fi

grep oom_kill $CGROUP/memory.events

# Set a memory limit, write more than that limit to tmpfs -> OOM kill
sudo unshare -m bash -c "
echo \$\$ > $CGROUP/cgroup.procs;
echo 0 > $CGROUP/memory.swap.max;
echo 64M > $CGROUP/memory.max;
mount -t tmpfs -o size=512m tmpfs /tmp;
dd if=/dev/zero of=/tmp/file bs=1M count=256" || true

grep oom_kill $CGROUP/memory.events

# Set a memory limit, write more than that limit through loopback
# device -> no OOM kill
sudo unshare -m bash -c "
echo \$\$ > $CGROUP/cgroup.procs;
echo 0 > $CGROUP/memory.swap.max;
echo 64M > $CGROUP/memory.max;
mount -t tmpfs -o size=512m tmpfs /tmp;
truncate -s 512m /tmp/backing_file
losetup $LOOP_DEV /tmp/backing_file
dd if=/dev/zero of=$LOOP_DEV bs=1M count=256;
losetup -D $LOOP_DEV" || true

grep oom_kill $CGROUP/memory.events
'''

Naively charging cgroups could result in priority inversions through
the single kworker thread when multiple cgroups are reading from or
writing to the same loop device: i/o from one cgroup can end up waiting
behind i/o from another cgroup that is being throttled. This patch
series makes minor modifications to the loop driver so that each cgroup
can make forward progress independently, avoiding this inversion.

With this patch series applied, the above script triggers OOM kills
when writing through the loop device, as expected.

Dan Schatzberg (3):
  loop: Use worker per cgroup instead of kworker
  mm: Charge active memcg when no mm is set
  loop: Charge i/o to mem and blk cg

 drivers/block/loop.c       | 246 +++++++++++++++++++++++++++++++------
 drivers/block/loop.h       |  14 ++-
 include/linux/memcontrol.h |   6 +
 kernel/cgroup/cgroup.c     |   1 +
 mm/memcontrol.c            |  11 +-
 mm/shmem.c                 |   4 +-
 6 files changed, 235 insertions(+), 47 deletions(-)

-- 
2.17.1
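
The reproduction script in this cover letter prints the oom_kill line from memory.events three times and leaves the comparison to the reader. As a minimal sketch, assuming memory.events keeps its "key value" per-line layout, the check could be automated with two small helpers. The function names oom_kill_count and oom_kill_increased are hypothetical and are not part of the patch series.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the patch series.

# Print the oom_kill counter from a cgroup v2 memory.events file.
# $1: path to the memory.events file
oom_kill_count() {
    # memory.events holds one "key value" pair per line; match the key exactly.
    awk '$1 == "oom_kill" { print $2 }' "$1"
}

# Succeed if the counter grew between two samples.
# $1: count sampled before the step, $2: count sampled after it
oom_kill_increased() {
    [ "$2" -gt "$1" ]
}
```

One way to use this would be to sample `before=$(oom_kill_count $CGROUP/memory.events)` ahead of each `sudo unshare` step, sample `after` the same way once it returns, and then call `oom_kill_increased "$before" "$after"`. Without the series the loop device step should leave the counter unchanged, and with it applied the counter should grow.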