From: Chris Down <chris@chrisdown.name>
To: Yu Zhao <yuzhao@google.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	cgroups@vger.kernel.org, kernel-team@fb.com,
	Johannes Weiner <hannes@cmpxchg.org>
Subject: MGLRU premature memcg OOM on slow writes
Date: Fri, 9 Feb 2024 02:31:35 +0000
Message-ID: <ZcWOh9u3uqZjNFMa@chrisdown.name>

Hi Yu,

When running with MGLRU, I'm encountering premature memcg OOMs when
transferring files to a slow disk.

On non-MGLRU setups, the writeback flushers are woken and get to work. With 
MGLRU, one instead sees OOM killer output like the following when doing an 
rsync with a memory.max of 32M:

---

     % systemd-run --user -t -p MemoryMax=32M -- rsync -rv ... /mnt/usb
     Running as unit: run-u640.service
     Press ^] three times within 1s to disconnect TTY.
     sending incremental file list
     ...
     rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(713) [generator=3.2.7]

---

     [41368.535735] Memory cgroup out of memory: Killed process 128824 (rsync) total-vm:14008kB, anon-rss:256kB, file-rss:5504kB, shmem-rss:0kB, UID:1000 pgtables:64kB oom_score_adj:200
     [41369.847965] rsync invoked oom-killer: gfp_mask=0x408d40(GFP_NOFS|__GFP_NOFAIL|__GFP_ZERO|__GFP_ACCOUNT), order=0, oom_score_adj=200
     [41369.847972] CPU: 1 PID: 128826 Comm: rsync Tainted: G S         OE      6.7.4-arch1-1 #1 20d30c48b78a04be2046f4b305b40455f0b5b38b
     [41369.847975] Hardware name: LENOVO 20WNS23A0G/20WNS23A0G, BIOS N35ET53W (1.53 ) 03/22/2023
     [41369.847977] Call Trace:
     [41369.847978]  <TASK>
     [41369.847980]  dump_stack_lvl+0x47/0x60
     [41369.847985]  dump_header+0x45/0x1b0
     [41369.847988]  oom_kill_process+0xfa/0x200
     [41369.847990]  out_of_memory+0x244/0x590
     [41369.847992]  mem_cgroup_out_of_memory+0x134/0x150
     [41369.847995]  try_charge_memcg+0x76d/0x870
     [41369.847998]  ? try_charge_memcg+0xcd/0x870
     [41369.848000]  obj_cgroup_charge+0xb8/0x1b0
     [41369.848002]  kmem_cache_alloc+0xaa/0x310
     [41369.848005]  ? alloc_buffer_head+0x1e/0x80
     [41369.848007]  alloc_buffer_head+0x1e/0x80
     [41369.848009]  folio_alloc_buffers+0xab/0x180
     [41369.848012]  ? __pfx_fat_get_block+0x10/0x10 [fat 0a109de409393851f8a884f020fb5682aab8dcd1]
     [41369.848021]  create_empty_buffers+0x1d/0xb0
     [41369.848023]  __block_write_begin_int+0x524/0x600
     [41369.848026]  ? __pfx_fat_get_block+0x10/0x10 [fat 0a109de409393851f8a884f020fb5682aab8dcd1]
     [41369.848031]  ? __filemap_get_folio+0x168/0x2e0
     [41369.848033]  ? __pfx_fat_get_block+0x10/0x10 [fat 0a109de409393851f8a884f020fb5682aab8dcd1]
     [41369.848038]  block_write_begin+0x52/0x120
     [41369.848040]  fat_write_begin+0x34/0x80 [fat 0a109de409393851f8a884f020fb5682aab8dcd1]
     [41369.848046]  ? __pfx_fat_get_block+0x10/0x10 [fat 0a109de409393851f8a884f020fb5682aab8dcd1]
     [41369.848051]  generic_perform_write+0xd6/0x240
     [41369.848054]  generic_file_write_iter+0x65/0xd0
     [41369.848056]  vfs_write+0x23a/0x400
     [41369.848060]  ksys_write+0x6f/0xf0
     [41369.848063]  do_syscall_64+0x61/0xe0
     [41369.848065]  ? do_user_addr_fault+0x304/0x670
     [41369.848069]  ? exc_page_fault+0x7f/0x180
     [41369.848071]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
     [41369.848074] RIP: 0033:0x7965df71a184
     [41369.848116] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d c5 3e 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 48 89 54 24 18 48
     [41369.848117] RSP: 002b:00007fffee661738 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
     [41369.848119] RAX: ffffffffffffffda RBX: 0000570f66343bb0 RCX: 00007965df71a184
     [41369.848121] RDX: 0000000000040000 RSI: 0000570f66343bb0 RDI: 0000000000000003
     [41369.848122] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000570f66343b20
     [41369.848122] R10: 0000000000000008 R11: 0000000000000202 R12: 0000000000000649
     [41369.848123] R13: 0000570f651f8b40 R14: 0000000000008000 R15: 0000570f6633bba0
     [41369.848125]  </TASK>
     [41369.848126] memory: usage 32768kB, limit 32768kB, failcnt 21239
     [41369.848126] swap: usage 2112kB, limit 9007199254740988kB, failcnt 0
     [41369.848127] Memory cgroup stats for /user.slice/user-1000.slice/user@1000.service/app.slice/run-u640.service:
     [41369.848174] anon 0
     [41369.848175] file 26927104
     [41369.848176] kernel 6615040
     [41369.848176] kernel_stack 32768
     [41369.848177] pagetables 122880
     [41369.848177] sec_pagetables 0
     [41369.848177] percpu 480
     [41369.848178] sock 0
     [41369.848178] vmalloc 0
     [41369.848178] shmem 0
     [41369.848179] zswap 312451
     [41369.848179] zswapped 1458176
     [41369.848179] file_mapped 0
     [41369.848180] file_dirty 26923008
     [41369.848180] file_writeback 0
     [41369.848180] swapcached 12288
     [41369.848181] anon_thp 0
     [41369.848181] file_thp 0
     [41369.848181] shmem_thp 0
     [41369.848182] inactive_anon 0
     [41369.848182] active_anon 12288
     [41369.848182] inactive_file 15908864
     [41369.848183] active_file 11014144
     [41369.848183] unevictable 0
     [41369.848183] slab_reclaimable 5963640
     [41369.848184] slab_unreclaimable 89048
     [41369.848184] slab 6052688
     [41369.848185] workingset_refault_anon 4031
     [41369.848185] workingset_refault_file 9236
     [41369.848185] workingset_activate_anon 691
     [41369.848186] workingset_activate_file 2553
     [41369.848186] workingset_restore_anon 691
     [41369.848186] workingset_restore_file 0
     [41369.848187] workingset_nodereclaim 0
     [41369.848187] pgscan 40473
     [41369.848187] pgsteal 20881
     [41369.848188] pgscan_kswapd 0
     [41369.848188] pgscan_direct 40473
     [41369.848188] pgscan_khugepaged 0
     [41369.848189] pgsteal_kswapd 0
     [41369.848189] pgsteal_direct 20881
     [41369.848190] pgsteal_khugepaged 0
     [41369.848190] pgfault 6019
     [41369.848190] pgmajfault 4033
     [41369.848191] pgrefill 30578988
     [41369.848191] pgactivate 2925
     [41369.848191] pgdeactivate 0
     [41369.848192] pglazyfree 0
     [41369.848192] pglazyfreed 0
     [41369.848192] zswpin 1520
     [41369.848193] zswpout 1141
     [41369.848193] thp_fault_alloc 0
     [41369.848193] thp_collapse_alloc 0
     [41369.848194] thp_swpout 0
     [41369.848194] thp_swpout_fallback 0
     [41369.848194] Tasks state (memory values in pages):
     [41369.848195] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
     [41369.848195] [ 128825]  1000 128825     3449      864    65536      192           200 rsync
     [41369.848198] [ 128826]  1000 128826     3523      288    57344      288           200 rsync
     [41369.848199] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/user.slice/user-1000.slice/user@1000.service/app.slice/run-u640.service,task_memcg=/user.slice/user-1000.slice/user@1000.service/app.slice/run-u640.service,task=rsync,pid=128825,uid=1000
     [41369.848207] Memory cgroup out of memory: Killed process 128825 (rsync) total-vm:13796kB, anon-rss:0kB, file-rss:3456kB, shmem-rss:0kB, UID:1000 pgtables:64kB oom_score_adj:200

---

Importantly, note that there appears to be no attempt to write back before 
declaring OOM: file_writeback is 0 while file_dirty is 26923008 (about 26MB, 
i.e. almost the entire 32M limit). The issue is consistently reproducible (and 
thanks to Johannes for looking at this with me).
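
For what it's worth, the counters are easy to watch while reproducing. Here's 
a minimal sketch that samples file_dirty and file_writeback from memory.stat 
once a second (the cgroup path is the transient unit from the log above, so 
it's an assumption that will differ per run and setup). file_dirty climbs 
towards the limit while file_writeback never leaves 0:

     /*
      * Minimal sketch, not part of the report proper: poll
      * file_dirty/file_writeback from memory.stat. The cgroup path
      * below is taken from the log above and is an assumption --
      * adjust it for your own unit. Build: cc -o dirtywatch dirtywatch.c
      */
     #include <stdio.h>
     #include <string.h>
     #include <unistd.h>

     int main(void)
     {
             const char *path =
                     "/sys/fs/cgroup/user.slice/user-1000.slice/"
                     "user@1000.service/app.slice/run-u640.service/memory.stat";
             char line[256];

             for (;;) {
                     FILE *f = fopen(path, "r");
                     if (!f) {
                             perror("fopen");
                             return 1;
                     }
                     while (fgets(line, sizeof(line), f)) {
                             /* Print only the two counters of interest. */
                             if (!strncmp(line, "file_dirty ", 11) ||
                                 !strncmp(line, "file_writeback ", 15))
                                     fputs(line, stdout);
                     }
                     fclose(f);
                     puts("--");
                     sleep(1);
             }
     }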

On non-MGLRU, the flushers are active and make enough forward progress to 
prevent OOM.
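
My understanding, paraphrasing from my reading of a 6.7-era mm/vmscan.c (so a 
sketch of the classic path, not verbatim kernel code), is that this comes from 
the dirty-folio handling in shrink_inactive_list(): when reclaim keeps 
isolating dirty folios that nobody has queued for IO, it nudges the flushers 
instead of charging on towards OOM:

     /*
      * Paraphrased from mm/vmscan.c:shrink_inactive_list() -- not
      * verbatim. Dirty folios reaching the tail of the LRU without
      * being queued for IO implies the flushers are asleep on the
      * job, so wake them; where writeback throttling isn't sane
      * (cgroup v1), also throttle reclaim directly.
      */
     if (stat.nr_unqueued_dirty == nr_taken) {
             wakeup_flusher_threads(WB_REASON_VMSCAN);
             if (!writeback_throttling_sane(sc))
                     reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
     }

I couldn't find an equivalent wakeup on the lru_gen side, which is what 
prompts the question below.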

This is writing to a slow disk with roughly 10MiB/s of available write 
bandwidth, so the CPU and the read side are far faster than the disk can 
absorb writes. For scale, at that rate the ~26MB of dirty pages in the stats 
above would drain in under three seconds.

Is this a known problem in MGLRU? If not, could you point me to where MGLRU 
tries to handle flusher wakeup on slow I/O? I didn't immediately find it.

Thanks,

Chris
