linux-mm.kvack.org archive mirror
  • [parent not found: <20210610173944.1203706-4-schatzberg.dan@gmail.com>]
  • * [PATCH V13 0/3] Charge loop device i/o to issuing cgroup
    @ 2021-06-03 14:57 Dan Schatzberg
      2021-06-03 14:57 ` [PATCH 3/3] loop: Charge i/o to mem and blk cg Dan Schatzberg
      0 siblings, 1 reply; 14+ messages in thread
    From: Dan Schatzberg @ 2021-06-03 14:57 UTC (permalink / raw)
      To: Jens Axboe
      Cc: open list:BLOCK LAYER, open list,
    	open list:CONTROL GROUP (CGROUP),
    	open list:MEMORY MANAGEMENT
    
    No significant changes, rebased on Linus's tree.
    
    Jens, this series was intended to go into the mm tree since it had
    some conflicts with mm changes. It never got picked up for 5.12 and
    the corresponding mm changes are now in Linus's tree. This is mostly a
    loop change so it feels more appropriate to go through the block tree.
    Do you think that makes sense?
    
    Changes since V12:
    
    * Small change to get_mem_cgroup_from_mm to avoid needing
      get_active_memcg
    
    Changes since V11:
    
    * Removed WQ_MEM_RECLAIM flag from loop workqueue. Technically, this
      can be driven by writeback, but this was causing a warning in xfs
      and likely other filesystems aren't equipped to be driven by reclaim
      at the VFS layer.
    * Included a small fix from Colin Ian King.
    * reworked get_mem_cgroup_from_mm to institute the necessary charge
      priority.
    
    Changes since V10:
    
    * Added page-cache charging to mm: Charge active memcg when no mm is set
    
    Changes since V9:
    
    * Rebased against linus's branch which now includes Roman Gushchin's
      patch this series is based off of
    
    Changes since V8:
    
    * Rebased on top of Roman Gushchin's patch
      (https://lkml.org/lkml/2020/8/21/1464) which provides the nesting
      support for setting active memcg. Dropped the patch from this series
      that did the same thing.
    
    Changes since V7:
    
    * Rebased against linus's branch
    
    Changes since V6:
    
    * Added separate spinlock for worker synchronization
    * Minor style changes
    
    Changes since V5:
    
    * Fixed a missing css_put when failing to allocate a worker
    * Minor style changes
    
    Changes since V4:
    
    Only patches 1 and 2 have changed.
    
    * Fixed irq lock ordering bug
    * Simplified loop detach
    * Added support for nesting memalloc_use_memcg
    
    Changes since V3:
    
    * Fix race on loop device destruction and deferred worker cleanup
    * Ensure charge on shmem_swapin_page works just like getpage
    * Minor style changes
    
    Changes since V2:
    
    * Deferred destruction of workqueue items so in the common case there
      is no allocation needed
    
    Changes since V1:
    
    * Split out and reordered patches so cgroup charging changes are
      separate from kworker -> workqueue change
    
    * Add mem_css to struct loop_cmd to simplify logic
    
    The loop device runs all i/o to the backing file on a separate kworker
    thread which results in all i/o being charged to the root cgroup. This
    allows a loop device to be used to trivially bypass resource limits
    and other policy. This patch series fixes this gap in accounting.
    
    A simple script to demonstrate this behavior on a cgroup v2 machine:
    
    '''
    #!/bin/bash
    set -e
    
    CGROUP=/sys/fs/cgroup/test.slice
    LOOP_DEV=/dev/loop0
    
    if [[ ! -d $CGROUP ]]
    then
        sudo mkdir $CGROUP
    fi
    
    grep oom_kill $CGROUP/memory.events
    
    # Set a memory limit, write more than that limit to tmpfs -> OOM kill
    sudo unshare -m bash -c "
    echo \$\$ > $CGROUP/cgroup.procs;
    echo 0 > $CGROUP/memory.swap.max;
    echo 64M > $CGROUP/memory.max;
    mount -t tmpfs -o size=512m tmpfs /tmp;
    dd if=/dev/zero of=/tmp/file bs=1M count=256" || true
    
    grep oom_kill $CGROUP/memory.events
    
    # Set a memory limit, write more than that limit through loopback
    # device -> no OOM kill
    sudo unshare -m bash -c "
    echo \$\$ > $CGROUP/cgroup.procs;
    echo 0 > $CGROUP/memory.swap.max;
    echo 64M > $CGROUP/memory.max;
    mount -t tmpfs -o size=512m tmpfs /tmp;
    truncate -s 512m /tmp/backing_file
    losetup $LOOP_DEV /tmp/backing_file
    dd if=/dev/zero of=$LOOP_DEV bs=1M count=256;
    losetup -d $LOOP_DEV" || true
    
    grep oom_kill $CGROUP/memory.events
    '''
    
    Naively charging cgroups could result in priority inversions through
    the single kworker thread in the case where multiple cgroups are
    reading/writing to the same loop device. This patch series makes
    minor modifications to the loop driver so that each cgroup can make
    forward progress independently, which avoids this inversion.
    
    With this patch series applied, the above script triggers OOM kills
    when writing through the loop device as expected.
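
    As a rough aid for reading the script's output, the three
    memory.events snapshots can be compared numerically rather than by
    eye. The following is only a sketch: the helper names
    oom_kill_count and oom_kill_increased are hypothetical and not part
    of this series, and it assumes the usual "key value" line format of
    the cgroup v2 memory.events file.

```shell
#!/bin/bash
# Hypothetical helpers (not part of the patch series).

# Print the oom_kill counter from a memory.events-style file.
# Assumes one "key value" pair per line; prints 0 if the key is absent.
oom_kill_count() {
    awk '$1 == "oom_kill" { n = $2 } END { print n + 0 }' "$1"
}

# Succeed only if the second snapshot is strictly larger than the first.
oom_kill_increased() {
    [ "$2" -gt "$1" ]
}
```

    For example, taking before=$(oom_kill_count $CGROUP/memory.events)
    ahead of the loop-device step and after=$(...) once it finishes,
    oom_kill_increased "$before" "$after" would be expected to fail on
    an unpatched kernel and to succeed with this series applied.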
    
    Dan Schatzberg (3):
      loop: Use worker per cgroup instead of kworker
      mm: Charge active memcg when no mm is set
      loop: Charge i/o to mem and blk cg
    
     drivers/block/loop.c       | 241 ++++++++++++++++++++++++++++++-------
     drivers/block/loop.h       |  15 ++-
     include/linux/memcontrol.h |   6 +
     kernel/cgroup/cgroup.c     |   1 +
     mm/filemap.c               |   2 +-
     mm/memcontrol.c            |  49 +++++---
     mm/shmem.c                 |   4 +-
     7 files changed, 250 insertions(+), 68 deletions(-)
    
    -- 
    2.30.2
    
    
    
    * [PATCH V12 0/3] Charge loop device i/o to issuing cgroup
    @ 2021-04-02 19:16 Dan Schatzberg
      2021-04-02 19:16 ` [PATCH 3/3] loop: Charge i/o to mem and blk cg Dan Schatzberg
      0 siblings, 1 reply; 14+ messages in thread
    From: Dan Schatzberg @ 2021-04-02 19:16 UTC (permalink / raw)
      Cc: Jens Axboe, Tejun Heo, Zefan Li, Johannes Weiner, Andrew Morton,
    	Michal Hocko, Vladimir Davydov, Hugh Dickins, Shakeel Butt,
    	Roman Gushchin, Muchun Song, Yang Shi, Alex Shi, Alexander Duyck,
    	Wei Yang, open list:BLOCK LAYER, open list,
    	open list:CONTROL GROUP (CGROUP),
    	open list:MEMORY MANAGEMENT
    
    No major changes, rebased on top of latest mm tree
    
    Changes since V12:
    
    * Small change to get_mem_cgroup_from_mm to avoid needing
      get_active_memcg
    
    Changes since V11:
    
    * Removed WQ_MEM_RECLAIM flag from loop workqueue. Technically, this
      can be driven by writeback, but this was causing a warning in xfs
      and likely other filesystems aren't equipped to be driven by reclaim
      at the VFS layer.
    * Included a small fix from Colin Ian King.
    * reworked get_mem_cgroup_from_mm to institute the necessary charge
      priority.
    
    Changes since V10:
    
    * Added page-cache charging to mm: Charge active memcg when no mm is set
    
    Changes since V9:
    
    * Rebased against linus's branch which now includes Roman Gushchin's
      patch this series is based off of
    
    Changes since V8:
    
    * Rebased on top of Roman Gushchin's patch
      (https://lkml.org/lkml/2020/8/21/1464) which provides the nesting
      support for setting active memcg. Dropped the patch from this series
      that did the same thing.
    
    Changes since V7:
    
    * Rebased against linus's branch
    
    Changes since V6:
    
    * Added separate spinlock for worker synchronization
    * Minor style changes
    
    Changes since V5:
    
    * Fixed a missing css_put when failing to allocate a worker
    * Minor style changes
    
    Changes since V4:
    
    Only patches 1 and 2 have changed.
    
    * Fixed irq lock ordering bug
    * Simplified loop detach
    * Added support for nesting memalloc_use_memcg
    
    Changes since V3:
    
    * Fix race on loop device destruction and deferred worker cleanup
    * Ensure charge on shmem_swapin_page works just like getpage
    * Minor style changes
    
    Changes since V2:
    
    * Deferred destruction of workqueue items so in the common case there
      is no allocation needed
    
    Changes since V1:
    
    * Split out and reordered patches so cgroup charging changes are
      separate from kworker -> workqueue change
    
    * Add mem_css to struct loop_cmd to simplify logic
    
    The loop device runs all i/o to the backing file on a separate kworker
    thread which results in all i/o being charged to the root cgroup. This
    allows a loop device to be used to trivially bypass resource limits
    and other policy. This patch series fixes this gap in accounting.
    
    A simple script to demonstrate this behavior on a cgroup v2 machine:
    
    '''
    #!/bin/bash
    set -e
    
    CGROUP=/sys/fs/cgroup/test.slice
    LOOP_DEV=/dev/loop0
    
    if [[ ! -d $CGROUP ]]
    then
        sudo mkdir $CGROUP
    fi
    
    grep oom_kill $CGROUP/memory.events
    
    # Set a memory limit, write more than that limit to tmpfs -> OOM kill
    sudo unshare -m bash -c "
    echo \$\$ > $CGROUP/cgroup.procs;
    echo 0 > $CGROUP/memory.swap.max;
    echo 64M > $CGROUP/memory.max;
    mount -t tmpfs -o size=512m tmpfs /tmp;
    dd if=/dev/zero of=/tmp/file bs=1M count=256" || true
    
    grep oom_kill $CGROUP/memory.events
    
    # Set a memory limit, write more than that limit through loopback
    # device -> no OOM kill
    sudo unshare -m bash -c "
    echo \$\$ > $CGROUP/cgroup.procs;
    echo 0 > $CGROUP/memory.swap.max;
    echo 64M > $CGROUP/memory.max;
    mount -t tmpfs -o size=512m tmpfs /tmp;
    truncate -s 512m /tmp/backing_file
    losetup $LOOP_DEV /tmp/backing_file
    dd if=/dev/zero of=$LOOP_DEV bs=1M count=256;
    losetup -d $LOOP_DEV" || true
    
    grep oom_kill $CGROUP/memory.events
    '''
    
    Naively charging cgroups could result in priority inversions through
    the single kworker thread in the case where multiple cgroups are
    reading/writing to the same loop device. This patch series makes
    minor modifications to the loop driver so that each cgroup can make
    forward progress independently, which avoids this inversion.
    
    With this patch series applied, the above script triggers OOM kills
    when writing through the loop device as expected.
    
    Dan Schatzberg (3):
      loop: Use worker per cgroup instead of kworker
      mm: Charge active memcg when no mm is set
      loop: Charge i/o to mem and blk cg
    
     drivers/block/loop.c       | 244 ++++++++++++++++++++++++++++++-------
     drivers/block/loop.h       |  15 ++-
     include/linux/memcontrol.h |   6 +
     kernel/cgroup/cgroup.c     |   1 +
     mm/filemap.c               |   2 +-
     mm/memcontrol.c            |  49 +++++---
     mm/shmem.c                 |   4 +-
     7 files changed, 253 insertions(+), 68 deletions(-)
    
    -- 
    2.30.2
    
    
    
    * [PATCH V11 0/3] Charge loop device i/o to issuing cgroup
    @ 2021-03-29 14:48 Dan Schatzberg
      2021-03-29 14:48 ` [PATCH 3/3] loop: Charge i/o to mem and blk cg Dan Schatzberg
      0 siblings, 1 reply; 14+ messages in thread
    From: Dan Schatzberg @ 2021-03-29 14:48 UTC (permalink / raw)
      Cc: Jens Axboe, Tejun Heo, Zefan Li, Johannes Weiner, Andrew Morton,
    	Michal Hocko, Vladimir Davydov, Hugh Dickins, Shakeel Butt,
    	Roman Gushchin, Yang Shi, Muchun Song, Alex Shi, Alexander Duyck,
    	Yafang Shao, Wei Yang, open list:BLOCK LAYER, open list,
    	open list:CONTROL GROUP (CGROUP),
    	open list:MEMORY MANAGEMENT
    
    No major changes, rebased on top of latest mm tree
    
    Changes since V11:
    
    * Removed WQ_MEM_RECLAIM flag from loop workqueue. Technically, this
      can be driven by writeback, but this was causing a warning in xfs
      and likely other filesystems aren't equipped to be driven by reclaim
      at the VFS layer.
    * Included a small fix from Colin Ian King.
    * reworked get_mem_cgroup_from_mm to institute the necessary charge
      priority.
    
    Changes since V10:
    
    * Added page-cache charging to mm: Charge active memcg when no mm is set
    
    Changes since V9:
    
    * Rebased against linus's branch which now includes Roman Gushchin's
      patch this series is based off of
    
    Changes since V8:
    
    * Rebased on top of Roman Gushchin's patch
      (https://lkml.org/lkml/2020/8/21/1464) which provides the nesting
      support for setting active memcg. Dropped the patch from this series
      that did the same thing.
    
    Changes since V7:
    
    * Rebased against linus's branch
    
    Changes since V6:
    
    * Added separate spinlock for worker synchronization
    * Minor style changes
    
    Changes since V5:
    
    * Fixed a missing css_put when failing to allocate a worker
    * Minor style changes
    
    Changes since V4:
    
    Only patches 1 and 2 have changed.
    
    * Fixed irq lock ordering bug
    * Simplified loop detach
    * Added support for nesting memalloc_use_memcg
    
    Changes since V3:
    
    * Fix race on loop device destruction and deferred worker cleanup
    * Ensure charge on shmem_swapin_page works just like getpage
    * Minor style changes
    
    Changes since V2:
    
    * Deferred destruction of workqueue items so in the common case there
      is no allocation needed
    
    Changes since V1:
    
    * Split out and reordered patches so cgroup charging changes are
      separate from kworker -> workqueue change
    
    * Add mem_css to struct loop_cmd to simplify logic
    
    The loop device runs all i/o to the backing file on a separate kworker
    thread which results in all i/o being charged to the root cgroup. This
    allows a loop device to be used to trivially bypass resource limits
    and other policy. This patch series fixes this gap in accounting.
    
    A simple script to demonstrate this behavior on a cgroup v2 machine:
    
    '''
    #!/bin/bash
    set -e
    
    CGROUP=/sys/fs/cgroup/test.slice
    LOOP_DEV=/dev/loop0
    
    if [[ ! -d $CGROUP ]]
    then
        sudo mkdir $CGROUP
    fi
    
    grep oom_kill $CGROUP/memory.events
    
    # Set a memory limit, write more than that limit to tmpfs -> OOM kill
    sudo unshare -m bash -c "
    echo \$\$ > $CGROUP/cgroup.procs;
    echo 0 > $CGROUP/memory.swap.max;
    echo 64M > $CGROUP/memory.max;
    mount -t tmpfs -o size=512m tmpfs /tmp;
    dd if=/dev/zero of=/tmp/file bs=1M count=256" || true
    
    grep oom_kill $CGROUP/memory.events
    
    # Set a memory limit, write more than that limit through loopback
    # device -> no OOM kill
    sudo unshare -m bash -c "
    echo \$\$ > $CGROUP/cgroup.procs;
    echo 0 > $CGROUP/memory.swap.max;
    echo 64M > $CGROUP/memory.max;
    mount -t tmpfs -o size=512m tmpfs /tmp;
    truncate -s 512m /tmp/backing_file
    losetup $LOOP_DEV /tmp/backing_file
    dd if=/dev/zero of=$LOOP_DEV bs=1M count=256;
    losetup -d $LOOP_DEV" || true
    
    grep oom_kill $CGROUP/memory.events
    '''
    
    Naively charging cgroups could result in priority inversions through
    the single kworker thread in the case where multiple cgroups are
    reading/writing to the same loop device. This patch series makes
    minor modifications to the loop driver so that each cgroup can make
    forward progress independently, which avoids this inversion.
    
    With this patch series applied, the above script triggers OOM kills
    when writing through the loop device as expected.
    
    Dan Schatzberg (3):
      loop: Use worker per cgroup instead of kworker
      mm: Charge active memcg when no mm is set
      loop: Charge i/o to mem and blk cg
    
     drivers/block/loop.c       | 248 ++++++++++++++++++++++++++++++-------
     drivers/block/loop.h       |  15 ++-
     include/linux/memcontrol.h |   6 +
     kernel/cgroup/cgroup.c     |   1 +
     mm/filemap.c               |   2 +-
     mm/memcontrol.c            |  73 ++++++-----
     mm/shmem.c                 |   4 +-
     7 files changed, 267 insertions(+), 82 deletions(-)
    
    -- 
    2.30.2
    
    
    
    * [PATCH v10 0/3] Charge loop device i/o to issuing cgroup
    @ 2021-03-16 15:36 Dan Schatzberg
      2021-03-16 15:36 ` [PATCH 3/3] loop: Charge i/o to mem and blk cg Dan Schatzberg
      0 siblings, 1 reply; 14+ messages in thread
    From: Dan Schatzberg @ 2021-03-16 15:36 UTC (permalink / raw)
      Cc: Jens Axboe, Tejun Heo, Zefan Li, Johannes Weiner, Andrew Morton,
    	Michal Hocko, Vladimir Davydov, Hugh Dickins, Shakeel Butt,
    	Roman Gushchin, Muchun Song, Alex Shi, Alexander Duyck,
    	Chris Down, Yafang Shao, Wei Yang, open list:BLOCK LAYER,
    	open list, open list:CONTROL GROUP (CGROUP),
    	open list:MEMORY MANAGEMENT
    
    No major changes, just rebasing and resubmitting
    
    Changes since V10:
    
    * Added page-cache charging to mm: Charge active memcg when no mm is set
    
    Changes since V9:
    
    * Rebased against linus's branch which now includes Roman Gushchin's
      patch this series is based off of
    
    Changes since V8:
    
    * Rebased on top of Roman Gushchin's patch
      (https://lkml.org/lkml/2020/8/21/1464) which provides the nesting
      support for setting active memcg. Dropped the patch from this series
      that did the same thing.
    
    Changes since V7:
    
    * Rebased against linus's branch
    
    Changes since V6:
    
    * Added separate spinlock for worker synchronization
    * Minor style changes
    
    Changes since V5:
    
    * Fixed a missing css_put when failing to allocate a worker
    * Minor style changes
    
    Changes since V4:
    
    Only patches 1 and 2 have changed.
    
    * Fixed irq lock ordering bug
    * Simplified loop detach
    * Added support for nesting memalloc_use_memcg
    
    Changes since V3:
    
    * Fix race on loop device destruction and deferred worker cleanup
    * Ensure charge on shmem_swapin_page works just like getpage
    * Minor style changes
    
    Changes since V2:
    
    * Deferred destruction of workqueue items so in the common case there
      is no allocation needed
    
    Changes since V1:
    
    * Split out and reordered patches so cgroup charging changes are
      separate from kworker -> workqueue change
    
    * Add mem_css to struct loop_cmd to simplify logic
    
    The loop device runs all i/o to the backing file on a separate kworker
    thread which results in all i/o being charged to the root cgroup. This
    allows a loop device to be used to trivially bypass resource limits
    and other policy. This patch series fixes this gap in accounting.
    
    A simple script to demonstrate this behavior on a cgroup v2 machine:
    
    '''
    #!/bin/bash
    set -e
    
    CGROUP=/sys/fs/cgroup/test.slice
    LOOP_DEV=/dev/loop0
    
    if [[ ! -d $CGROUP ]]
    then
        sudo mkdir $CGROUP
    fi
    
    grep oom_kill $CGROUP/memory.events
    
    # Set a memory limit, write more than that limit to tmpfs -> OOM kill
    sudo unshare -m bash -c "
    echo \$\$ > $CGROUP/cgroup.procs;
    echo 0 > $CGROUP/memory.swap.max;
    echo 64M > $CGROUP/memory.max;
    mount -t tmpfs -o size=512m tmpfs /tmp;
    dd if=/dev/zero of=/tmp/file bs=1M count=256" || true
    
    grep oom_kill $CGROUP/memory.events
    
    # Set a memory limit, write more than that limit through loopback
    # device -> no OOM kill
    sudo unshare -m bash -c "
    echo \$\$ > $CGROUP/cgroup.procs;
    echo 0 > $CGROUP/memory.swap.max;
    echo 64M > $CGROUP/memory.max;
    mount -t tmpfs -o size=512m tmpfs /tmp;
    truncate -s 512m /tmp/backing_file
    losetup $LOOP_DEV /tmp/backing_file
    dd if=/dev/zero of=$LOOP_DEV bs=1M count=256;
    losetup -d $LOOP_DEV" || true
    
    grep oom_kill $CGROUP/memory.events
    '''
    
    Naively charging cgroups could result in priority inversions through
    the single kworker thread in the case where multiple cgroups are
    reading/writing to the same loop device. This patch series makes
    minor modifications to the loop driver so that each cgroup can make
    forward progress independently, which avoids this inversion.
    
    With this patch series applied, the above script triggers OOM kills
    when writing through the loop device as expected.
    
    Dan Schatzberg (3):
      loop: Use worker per cgroup instead of kworker
      mm: Charge active memcg when no mm is set
      loop: Charge i/o to mem and blk cg
    
     drivers/block/loop.c       | 248 ++++++++++++++++++++++++++++++-------
     drivers/block/loop.h       |  15 ++-
     include/linux/memcontrol.h |  11 ++
     kernel/cgroup/cgroup.c     |   1 +
     mm/filemap.c               |   2 +-
     mm/memcontrol.c            |  15 ++-
     mm/shmem.c                 |   4 +-
     7 files changed, 242 insertions(+), 54 deletions(-)
    
    -- 
    2.30.2
    
    
    
    * [PATCH v8 0/3] Charge loop device i/o to issuing cgroup
    @ 2020-08-31 15:36 Dan Schatzberg
      2020-08-31 15:37 ` [PATCH 3/3] loop: Charge i/o to mem and blk cg Dan Schatzberg
      0 siblings, 1 reply; 14+ messages in thread
    From: Dan Schatzberg @ 2020-08-31 15:36 UTC (permalink / raw)
      Cc: Dan Schatzberg, Jens Axboe, Tejun Heo, Li Zefan, Johannes Weiner,
    	Michal Hocko, Vladimir Davydov, Andrew Morton, Hugh Dickins,
    	Shakeel Butt, Roman Gushchin, Joonsoo Kim, Chris Down, Yang Shi,
    	Jakub Kicinski, open list:BLOCK LAYER, open list,
    	open list:CONTROL GROUP (CGROUP),
    	open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)
    
    Much of the discussion about this has died down. There's been a
    concern raised that we could generalize infrastructure across loop,
    md, etc. This may be possible in the future, but it isn't clear to me
    what this would look like. I'm inclined to fix the existing issue with
    loop devices now (this is a problem we hit at FB) and address
    consolidation with other cases if and when those need to be addressed.
    
    Note that this series needs to be based off of Roman Gushchin's patch
    (https://lkml.org/lkml/2020/8/21/1464) to compile.
    
    Changes since V8:
    
    * Rebased on top of Roman Gushchin's patch
      (https://lkml.org/lkml/2020/8/21/1464) which provides the nesting
      support for setting active memcg. Dropped the patch from this series
      that did the same thing.
    
    Changes since V7:
    
    * Rebased against linus's branch
    
    Changes since V6:
    
    * Added separate spinlock for worker synchronization
    * Minor style changes
    
    Changes since V5:
    
    * Fixed a missing css_put when failing to allocate a worker
    * Minor style changes
    
    Changes since V4:
    
    Only patches 1 and 2 have changed.
    
    * Fixed irq lock ordering bug
    * Simplified loop detach
    * Added support for nesting memalloc_use_memcg
    
    Changes since V3:
    
    * Fix race on loop device destruction and deferred worker cleanup
    * Ensure charge on shmem_swapin_page works just like getpage
    * Minor style changes
    
    Changes since V2:
    
    * Deferred destruction of workqueue items so in the common case there
      is no allocation needed
    
    Changes since V1:
    
    * Split out and reordered patches so cgroup charging changes are
      separate from kworker -> workqueue change
    
    * Add mem_css to struct loop_cmd to simplify logic
    
    The loop device runs all i/o to the backing file on a separate kworker
    thread which results in all i/o being charged to the root cgroup. This
    allows a loop device to be used to trivially bypass resource limits
    and other policy. This patch series fixes this gap in accounting.
    
    A simple script to demonstrate this behavior on a cgroup v2 machine:
    
    '''
    #!/bin/bash
    set -e
    
    CGROUP=/sys/fs/cgroup/test.slice
    LOOP_DEV=/dev/loop0
    
    if [[ ! -d $CGROUP ]]
    then
        sudo mkdir $CGROUP
    fi
    
    grep oom_kill $CGROUP/memory.events
    
    # Set a memory limit, write more than that limit to tmpfs -> OOM kill
    sudo unshare -m bash -c "
    echo \$\$ > $CGROUP/cgroup.procs;
    echo 0 > $CGROUP/memory.swap.max;
    echo 64M > $CGROUP/memory.max;
    mount -t tmpfs -o size=512m tmpfs /tmp;
    dd if=/dev/zero of=/tmp/file bs=1M count=256" || true
    
    grep oom_kill $CGROUP/memory.events
    
    # Set a memory limit, write more than that limit through loopback
    # device -> no OOM kill
    sudo unshare -m bash -c "
    echo \$\$ > $CGROUP/cgroup.procs;
    echo 0 > $CGROUP/memory.swap.max;
    echo 64M > $CGROUP/memory.max;
    mount -t tmpfs -o size=512m tmpfs /tmp;
    truncate -s 512m /tmp/backing_file
    losetup $LOOP_DEV /tmp/backing_file
    dd if=/dev/zero of=$LOOP_DEV bs=1M count=256;
    losetup -d $LOOP_DEV" || true
    
    grep oom_kill $CGROUP/memory.events
    '''
    
    Naively charging cgroups could result in priority inversions through
    the single kworker thread in the case where multiple cgroups are
    reading/writing to the same loop device. This patch series makes
    minor modifications to the loop driver so that each cgroup can make
    forward progress independently, which avoids this inversion.
    
    With this patch series applied, the above script triggers OOM kills
    when writing through the loop device as expected.
    
    Dan Schatzberg (3):
      loop: Use worker per cgroup instead of kworker
      mm: Charge active memcg when no mm is set
      loop: Charge i/o to mem and blk cg
    
     drivers/block/loop.c       | 248 ++++++++++++++++++++++++++++++-------
     drivers/block/loop.h       |  15 ++-
     include/linux/memcontrol.h |   6 +
     kernel/cgroup/cgroup.c     |   1 +
     mm/memcontrol.c            |  11 +-
     mm/shmem.c                 |   4 +-
     6 files changed, 232 insertions(+), 53 deletions(-)
    
    -- 
    2.24.1
    
    
    

    end of thread, other threads:[~2021-06-30 14:50 UTC | newest]
    
    Thread overview: 14+ messages
         [not found] <20210610173944.1203706-1-schatzberg.dan@gmail.com>
         [not found] ` <20210610173944.1203706-3-schatzberg.dan@gmail.com>
    2021-06-25 14:47   ` [PATCH 2/3] mm: Charge active memcg when no mm is set Michal Koutný
         [not found] ` <20210610173944.1203706-4-schatzberg.dan@gmail.com>
    2021-06-25 15:01   ` [PATCH 3/3] loop: Charge i/o to mem and blk cg Michal Koutný
    2021-06-28 14:17     ` Dan Schatzberg
    2021-06-29 10:26       ` Michal Koutný
    2021-06-29 14:03         ` Dan Schatzberg
    2021-06-30  9:42           ` Michal Koutný
    2021-06-30 14:49             ` Dan Schatzberg
    2021-06-03 14:57 [PATCH V13 0/3] Charge loop device i/o to issuing cgroup Dan Schatzberg
    2021-06-03 14:57 ` [PATCH 3/3] loop: Charge i/o to mem and blk cg Dan Schatzberg
      -- strict thread matches above, loose matches on Subject: below --
    2021-04-02 19:16 [PATCH V12 0/3] Charge loop device i/o to issuing cgroup Dan Schatzberg
    2021-04-02 19:16 ` [PATCH 3/3] loop: Charge i/o to mem and blk cg Dan Schatzberg
    2021-04-06  3:23   ` Ming Lei
    2021-03-29 14:48 [PATCH V11 0/3] Charge loop device i/o to issuing cgroup Dan Schatzberg
    2021-03-29 14:48 ` [PATCH 3/3] loop: Charge i/o to mem and blk cg Dan Schatzberg
    2021-03-16 15:36 [PATCH v10 0/3] Charge loop device i/o to issuing cgroup Dan Schatzberg
    2021-03-16 15:36 ` [PATCH 3/3] loop: Charge i/o to mem and blk cg Dan Schatzberg
    2021-03-16 16:25   ` Shakeel Butt
    2020-08-31 15:36 [PATCH v8 0/3] Charge loop device i/o to issuing cgroup Dan Schatzberg
    2020-08-31 15:37 ` [PATCH 3/3] loop: Charge i/o to mem and blk cg Dan Schatzberg
    
