From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755289AbeBOIs6 convert rfc822-to-8bit (ORCPT );
	Thu, 15 Feb 2018 03:48:58 -0500
Received: from smtprelay.restena.lu ([158.64.1.62]:48340 "EHLO
	smtprelay.restena.lu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1755206AbeBOIs4 (ORCPT );
	Thu, 15 Feb 2018 03:48:56 -0500
Date: Thu, 15 Feb 2018 09:40:24 +0100
From: Bruno =?UTF-8?B?UHLDqW1vbnQ=?=
To: cgroups@vger.kernel.org
Cc: Tejun Heo , Johannes Weiner , Michal Hocko , Vladimir Davydov ,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: CGroup-v2: Memory: current impossibly high for empty cgroup
Message-ID: <20180215094024.5fc6161a@pluto.restena.lu>
X-Mailer: Claws Mail 3.15.0-dirty (GTK+ 2.24.31; x86_64-pc-linux-gnu)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 8BIT
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

With a 4.15.2 kernel I'm hitting a state where a given leaf v2 cgroup
considers itself permanently over its limit and OOM-kills any process I
try to move into it (the cgroup is currently empty!).

I can't hand out a simple reproducer right now, but it looks like the
memory counter went "negative" during accounting.

CGroupv2 is mounted on /sys/fs/cgroup/hosting/ and the leaf cgroup is
/sys/fs/cgroup/hosting/websrv/:

cgroup.controllers:cpu io memory pids
cgroup.events:populated 0
cgroup.max.depth:max
cgroup.max.descendants:max
cgroup.stat:nr_descendants 0
cgroup.stat:nr_dying_descendants 0
cgroup.type:domain
cpu.max:max 100000
cpu.stat:usage_usec 48830679
cpu.stat:user_usec 25448590
cpu.stat:system_usec 23382088
cpu.stat:nr_periods 0
cpu.stat:nr_throttled 0
cpu.stat:throttled_usec 0
cpu.weight:100
cpu.weight.nice:0
io.bfq.weight:100
io.stat:8:0 rbytes=143360 wbytes=32768 rios=24 wios=7
io.stat:7:0 rbytes=51687424 wbytes=0 rios=12619 wios=0
io.weight:default 100
memory.current:18446744073694965760
memory.events:low 0
memory.events:high 0
memory.events:max 114
memory.events:oom 15
memory.events:oom_kill 8
memory.high:8589934592
memory.low:4294967296
memory.max:17179869184
memory.stat:anon 0
memory.stat:file 0
memory.stat:kernel_stack 0
memory.stat:slab 593920
memory.stat:sock 0
memory.stat:shmem 0
memory.stat:file_mapped 0
memory.stat:file_dirty 0
memory.stat:file_writeback 0
memory.stat:inactive_anon 0
memory.stat:active_anon 0
memory.stat:inactive_file 0
memory.stat:active_file 0
memory.stat:unevictable 0
memory.stat:slab_reclaimable 204800
memory.stat:slab_unreclaimable 389120
memory.stat:pgfault 80983
memory.stat:pgmajfault 259
memory.stat:pgrefill 115
memory.stat:pgscan 384
memory.stat:pgsteal 263
memory.stat:pgactivate 100
memory.stat:pgdeactivate 115
memory.stat:pglazyfree 0
memory.stat:pglazyfreed 0
memory.stat:workingset_refault 0
memory.stat:workingset_activate 0
memory.stat:workingset_nodereclaim 3
pids.current:0
pids.events:max 0
pids.max:max

cgroup.procs is empty.
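That memory.current value looks like a page counter that wrapped below
zero and is being printed as unsigned. A quick decode of the number (my
own back-of-the-envelope check, assuming 4 KiB pages; not kernel code):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
	/* memory.current as read from the cgroup above */
	uint64_t current = UINT64_C(18446744073694965760);
	/* reinterpret as signed; two's complement on any sane target */
	int64_t bytes = (int64_t)current;

	printf("bytes: %" PRId64 "\n", bytes);         /* -14585856 */
	printf("pages: %" PRId64 "\n", bytes / 4096);  /* -3561 */
	return 0;
}

So the counter seems to sit at -3561 pages even though nothing is
charged to the cgroup anymore.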
Possibly interesting content in kernel log:

[580391.746367] WARNING: CPU: 1 PID: 24354 at /data/kernel/linux-4.15/mm/page_counter.c:27 page_counter_cancel+0x10/0x20
[580391.746498] Modules linked in: floppy
[580391.746551] CPU: 1 PID: 24354 Comm: image-starter Not tainted 4.15.2-x86_64-vmware #2
[580391.746612] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.0.B64.1508272355 08/27/2015
[580391.746716] RIP: 0010:page_counter_cancel+0x10/0x20
[580391.746767] RSP: 0018:ffffad4f8475fb08 EFLAGS: 00010293
[580391.746819] RAX: 0000000000000005 RBX: ffffad4f8475fb38 RCX: 000000000000000e
[580391.746877] RDX: ffffa058b6a51148 RSI: 000000000000000e RDI: ffffa058b6a51148
[580391.746935] RBP: 000000000000000e R08: 0000000000022b30 R09: 0000000000000c4c
[580391.746993] R10: 0000000000000004 R11: ffffffffffffffff R12: ffffad4f8475fb38
[580391.747051] R13: ffffffffbbc68df8 R14: ffffad4f8475fd30 R15: 0000000000000001
[580391.747117] FS:  00007fd5e3b84740(0000) GS:ffffa058bdd00000(0000) knlGS:0000000000000000
[580391.747178] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[580391.747233] CR2: 00000000016fa3e8 CR3: 0000000005c0a001 CR4: 00000000000606a0
[580391.747344] Call Trace:
[580391.747400]  page_counter_uncharge+0x16/0x20
[580391.747450]  uncharge_batch+0x2e/0x150
[580391.747503]  mem_cgroup_uncharge_list+0x54/0x60
[580391.747555]  release_pages+0x2f6/0x330
[580391.747609]  __pagevec_release+0x25/0x30
[580391.747659]  truncate_inode_pages_range+0x251/0x6e0
[580391.747712]  ? write_cache_pages+0x311/0x350
[580391.747763]  __blkdev_put+0x6f/0x1e0
[580391.747815]  deactivate_locked_super+0x2a/0x60
[580391.747870]  cleanup_mnt+0x40/0x60
[580391.747921]  task_work_run+0x7b/0xa0
[580391.747975]  do_exit+0x38d/0x960
[580391.749543]  do_group_exit+0x99/0xa0
[580391.750525]  SyS_exit_group+0xb/0x10
[580391.751457]  do_syscall_64+0x74/0x120
[580391.752427]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[580391.753370] RIP: 0033:0x7fd5e3463529
[580391.754290] RSP: 002b:00007ffdaad73478 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
[580391.755205] RAX: ffffffffffffffda RBX: 000000002d220011 RCX: 00007fd5e3463529
[580391.756119] RDX: 0000000000000000 RSI: 00007ffdaad732dc RDI: 0000000000000000
[580391.758421] RBP: 0000000000000000 R08: 000000000000003c R09: 00000000000000e7
[580391.759590] R10: ffffffffffffff80 R11: 0000000000000206 R12: 00007ffdaad72480
[580391.760499] R13: 0000000000001000 R14: 00000000016f8080 R15: 0000000000000000
[580391.761366] Code: eb 07 e8 f4 26 fb ff eb d0 48 c7 c7 00 40 c3 bb e8 46 50 44 00 89 d8 5b 5d c3 90 48 89 f0 48 f7 d8 f0 48 0f c1 07 48 39 f0 79 02 <0f> ff c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 eb 19 48 89 f0
[580391.763136] ---[ end trace 60c773f283acdb6f ]---
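For reference, mm/page_counter.c:27 is the underflow check in
page_counter_cancel(). Quoting from my 4.15 tree, so line numbers and
details may differ slightly in other trees:

void page_counter_cancel(struct page_counter *counter, unsigned long nr_pages)
{
	long new;

	new = atomic_long_sub_return(nr_pages, &counter->count);
	/* More uncharges than charges? */
	WARN_ON_ONCE(new < 0);
}

In other words, the uncharge path (here uncharge_batch() while truncating
block-device page cache at unmount) removed more pages than were ever
charged to this counter.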
[580393.195186] image-starter invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
[580393.199401] image-starter cpuset=/ mems_allowed=0
[580393.201491] CPU: 1 PID: 28447 Comm: image-starter Tainted: G        W        4.15.2-x86_64-vmware #2
[580393.202899] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.0.B64.1508272355 08/27/2015
[580393.202901] Call Trace:
[580393.202918]  dump_stack+0x5c/0x85
[580393.202927]  dump_header+0x5a/0x233
[580393.202933]  oom_kill_process+0x91/0x3e0
[580393.202936]  out_of_memory+0x221/0x240
[580393.202940]  mem_cgroup_out_of_memory+0x36/0x50
[580393.202942]  mem_cgroup_oom_synchronize+0x1ff/0x300
[580393.202945]  ? __mem_cgroup_insert_exceeded+0x90/0x90
[580393.202947]  pagefault_out_of_memory+0x1f/0x4d
[580393.202951]  __do_page_fault+0x323/0x390
[580393.202957]  ? page_fault+0x36/0x60
[580393.202959]  page_fault+0x4c/0x60
[580393.202964] RIP: 0033:0x7f9edd31c919
[580393.202966] RSP: 002b:00007ffd818f4290 EFLAGS: 00010246
[580393.202968] Task in /websrv killed as a result of limit of /websrv
[580393.202972] memory: usage 18446744073709537436kB, limit 16777216kB, failcnt 44
[580393.202973] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[580393.202975] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
[580393.202976] Memory cgroup stats for /websrv: cache:16KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:16KB inactive_anon:0KB active_anon:0KB inactive_file:8KB active_file:8KB unevictable:0KB
[580393.202983] [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[580393.203001] [28447]     0 28447     2143       31    57344        0             0 image-starter
[580393.203004] [28448]     0 28448     2143       34    57344        0             0 image-starter
[580393.203006] [28449]     0 28449     2143       34    57344        0             0 image-starter
[580393.203008] [28450]     0 28450     2143       34    57344        0             0 image-starter
[580393.203010] [28451]     0 28451     2143       34    57344        0             0 image-starter
[580393.203012] Memory cgroup out of memory: Kill process 28449 (image-starter) score 0 or sacrifice child
[580393.203026] Killed process 28449 (image-starter) total-vm:8572kB, anon-rss:136kB, file-rss:0kB, shmem-rss:0kB
... (other processes being OOM-killed)
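Note that the OOM header shows the same underflow: decoding the usage
value 18446744073709537436kB the same way gives 2^64 - 14180 printed as
unsigned, i.e. -14180kB or -3545 pages. That roughly matches the negative
memory.current above; the counter seems to have drifted by a few pages
between the two readings.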
image-starter is a small container-management tool that uses namespaces
and assigns the namespaced processes to their corresponding cgroups (the
pertinent cgroup directories are bind-mounted into the mount namespace).

The probable memory-counter underflow seems to have happened either when
the container was stopped (most likely) or when it was started anew.
Possibly of interest: some mmap() calls inside the cgroup failed during
the previous run of the container due to RLIMIT_DATA.

As this is a test system, I can happily perform debugging and
experimentation on it (I'm currently working on moving from v1 cgroups
to v2 cgroups).

Cheers,
Bruno