From: Dan Schatzberg
To: Jens Axboe
Cc: linux-block@vger.kernel.org (open list:BLOCK LAYER),
    linux-kernel@vger.kernel.org (open list),
    cgroups@vger.kernel.org (open list:CONTROL GROUP (CGROUP)),
    linux-mm@kvack.org (open list:MEMORY MANAGEMENT)
Subject: [PATCH V13 0/3] Charge loop device i/o to issuing cgroup
Date: Thu, 3 Jun 2021 07:57:04 -0700
Message-Id: <20210603145707.4031641-1-schatzberg.dan@gmail.com>

No significant changes, rebased on Linus's tree.

Jens, this series was intended to go into the mm tree since it had some
conflicts with mm changes. It never got picked up for 5.12 and the
corresponding mm changes are now in Linus's tree. This is mostly a loop
change so it feels more appropriate to go through the block tree. Do you
think that makes sense?

Changes since V12:

* Small change to get_mem_cgroup_from_mm to avoid needing
  get_active_memcg

Changes since V11:

* Removed WQ_MEM_RECLAIM flag from loop workqueue. Technically, this can
  be driven by writeback, but this was causing a warning in xfs and
  likely other filesystems aren't equipped to be driven by reclaim at
  the VFS layer.
* Included a small fix from Colin Ian King.
* Reworked get_mem_cgroup_from_mm to institute the necessary charge
  priority.

Changes since V10:

* Added page-cache charging to mm: Charge active memcg when no mm is set

Changes since V9:

* Rebased against Linus's branch, which now includes Roman Gushchin's
  patch this series is based off of

Changes since V8:

* Rebased on top of Roman Gushchin's patch
  (https://lkml.org/lkml/2020/8/21/1464) which provides the nesting
  support for setting active memcg. Dropped the patch from this series
  that did the same thing.

Changes since V7:

* Rebased against Linus's branch

Changes since V6:

* Added separate spinlock for worker synchronization
* Minor style changes

Changes since V5:

* Fixed a missing css_put when failing to allocate a worker
* Minor style changes

Changes since V4:

Only patches 1 and 2 have changed.

* Fixed irq lock ordering bug
* Simplified loop detach
* Added support for nesting memalloc_use_memcg

Changes since V3:

* Fix race on loop device destruction and deferred worker cleanup
* Ensure charge on shmem_swapin_page works just like getpage
* Minor style changes

Changes since V2:

* Deferred destruction of workqueue items so in the common case there
  is no allocation needed

Changes since V1:

* Split out and reordered patches so cgroup charging changes are
  separate from kworker -> workqueue change
* Add mem_css to struct loop_cmd to simplify logic

The loop device runs all i/o to the backing file on a separate kworker
thread, which results in all i/o being charged to the root cgroup. This
allows a loop device to be used to trivially bypass resource limits
and other policy. This patch series fixes this gap in accounting.

A simple script to demonstrate this behavior on a cgroup v2 machine:

'''
#!/bin/bash

set -e

CGROUP=/sys/fs/cgroup/test.slice
LOOP_DEV=/dev/loop0

if [[ ! -d $CGROUP ]]
then
    sudo mkdir $CGROUP
fi

grep oom_kill $CGROUP/memory.events

# Set a memory limit, write more than that limit to tmpfs -> OOM kill
sudo unshare -m bash -c "
echo \$\$ > $CGROUP/cgroup.procs;
echo 0 > $CGROUP/memory.swap.max;
echo 64M > $CGROUP/memory.max;
mount -t tmpfs -o size=512m tmpfs /tmp;
dd if=/dev/zero of=/tmp/file bs=1M count=256" || true

grep oom_kill $CGROUP/memory.events

# Set a memory limit, write more than that limit through loopback
# device -> no OOM kill
sudo unshare -m bash -c "
echo \$\$ > $CGROUP/cgroup.procs;
echo 0 > $CGROUP/memory.swap.max;
echo 64M > $CGROUP/memory.max;
mount -t tmpfs -o size=512m tmpfs /tmp;
truncate -s 512m /tmp/backing_file
losetup $LOOP_DEV /tmp/backing_file
dd if=/dev/zero of=$LOOP_DEV bs=1M count=256;
losetup -D $LOOP_DEV" || true

grep oom_kill $CGROUP/memory.events
'''

Naively charging cgroups could result in priority inversions through
the single kworker thread in the case where multiple cgroups are
reading/writing to the same loop device. This patch series makes some
minor modifications to the loop driver so that each cgroup can make
forward progress independently to avoid this inversion.

With this patch series applied, the above script triggers OOM kills
when writing through the loop device as expected.

Dan Schatzberg (3):
  loop: Use worker per cgroup instead of kworker
  mm: Charge active memcg when no mm is set
  loop: Charge i/o to mem and blk cg

 drivers/block/loop.c       | 241 ++++++++++++++++++++++++++++++-------
 drivers/block/loop.h       |  15 ++-
 include/linux/memcontrol.h |   6 +
 kernel/cgroup/cgroup.c     |   1 +
 mm/filemap.c               |   2 +-
 mm/memcontrol.c            |  49 +++++---
 mm/shmem.c                 |   4 +-
 7 files changed, 250 insertions(+), 68 deletions(-)

-- 
2.30.2
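
The demonstration script prints the oom_kill line of memory.events
before and after each write and leaves the comparison to the reader.
The comparison could be automated roughly as below. This is only a
sketch: oom_kill_count and oom_kill_increased are hypothetical helper
names that are not part of the series, and it assumes memory.events
uses the usual "key value" line format.

```shell
#!/bin/bash

# Hypothetical helper (not part of the patch series): print the value
# of the "oom_kill" key from a memory.events-style file given as $1.
oom_kill_count() {
    awk '$1 == "oom_kill" { print $2 }' "$1"
}

# Hypothetical helper: succeed if the second sample ($2) is strictly
# greater than the first ($1), i.e. at least one OOM kill occurred
# between the two samples.
oom_kill_increased() {
    [ "$2" -gt "$1" ]
}
```

One possible use is to take before=$(oom_kill_count
$CGROUP/memory.events) ahead of a write, take a second sample after
it, and pass both to oom_kill_increased. On an unpatched kernel the
check would be expected to fail for the loop device write, and to
succeed with this series applied.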