* Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1.
@ 2019-08-01 18:04 Masoud Sharbiani
  2019-08-01 18:19 ` Greg KH
  2019-08-02  7:40 ` Michal Hocko
  0 siblings, 2 replies; 23+ messages in thread
From: Masoud Sharbiani @ 2019-08-01 18:04 UTC (permalink / raw)
To: gregkh, mhocko, hannes, vdavydov.dev; +Cc: linux-mm, cgroups, linux-kernel

Hey folks,
I’ve come across an issue that affects most of the 4.19, 4.20, and 5.2 linux-stable kernels and appears to be fixed only in 5.3-rc1. It was introduced by

  29ef680 memcg, oom: move out_of_memory back to the charge path

The gist of it: if a process inside a memory control group repeatedly maps all of the pages of a file with repeated calls to

  mmap(NULL, pages * PAGE_SIZE, PROT_WRITE|PROT_READ, MAP_FILE|MAP_PRIVATE, fd, 0)

the memory cgroup eventually runs out of memory, as it should. Prior to commit 29ef680, the kernel would then kill the running process with OOM. After that commit (and until 5.3-rc1; I haven’t pinpointed the exact fixing commit between 5.2.0 and 5.3-rc1), the offending process instead spins at 100% CPU usage: it neither dies (the prior behavior) nor fails the mmap call (which is what happens if one runs the test program with a low ulimit -v value).

Any ideas on how to chase this down further? (Test program and script are pasted below.)

Thanks,
Masoud

//——— leaker.c ——
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <stdio.h>
#include <errno.h>
#include <signal.h>

#ifndef PAGE_SIZE
#define PAGE_SIZE 4096
#endif

void sighandler(int x)
{
	printf("SIGNAL %d received. Quitting\n", x);
	exit(2);
}

int main(int ac, char *av[])
{
	int i;
	int fd;
	int pages = 4096;
	char buf[PAGE_SIZE];
	char *d;
	int sum = 0, loop_cnt = 0;
	int max_loops = 100000;
	// For getopt(3) stuff:
	int opt;

	while ((opt = getopt(ac, av, "p:c:")) != -1) {
		switch (opt) {
		case 'p':
			pages = atoi(optarg);
			break;
		case 'c':
			max_loops = atoi(optarg);
			break;
		default:
			fprintf(stderr, "Wrong usage:\n");
			fprintf(stderr, "%s -p <pages> -c <loop_count>\n", av[0]);
			exit(-1);
		}
	}
	signal(SIGTERM, sighandler);
	printf("Mapping %d pages anonymously %d times.\n", pages, max_loops);
	printf("File size will be %ld\n", pages * (long)PAGE_SIZE);
	printf("max memory usage size will be %ld\n",
	       (long)max_loops * pages * PAGE_SIZE);
	memset(buf, 0, PAGE_SIZE);
	fd = open("big-data-file.bin", O_CREAT|O_WRONLY|O_TRUNC, S_IRUSR | S_IWUSR);
	if (fd == -1) {
		printf("open failed: %d - %s\n", errno, strerror(errno));
		return -1;
	}
	for (i = 0; i < pages; i++)
		write(fd, buf, PAGE_SIZE);
	close(fd);
	fd = open("big-data-file.bin", O_RDWR);
	printf("fd is %d\n", fd);
	while (loop_cnt < max_loops) {
		d = mmap(NULL, pages * PAGE_SIZE, PROT_WRITE|PROT_READ,
			 MAP_FILE|MAP_PRIVATE, fd, 0);
		if (d == MAP_FAILED) {
			printf("mmap failed: %d - %s\n", errno, strerror(errno));
			return -1;
		}
		printf("Buffer is @ %p\n", d);
		for (i = 0; i < pages * PAGE_SIZE; i++) {
			sum += d[i];
			if ((i & (PAGE_SIZE-1)) == 0)
				d[i] = 42;
		}
		printf("Todal sum was %d. Loop count is %d\n", sum, loop_cnt++);
	}
	close(fd);
	return 0;
}

///—— test script launching it…
#!/bin/sh
if [ `id -u` -ne 0 ]; then
	echo NEED TO RUN THIS AS ROOT.
	exit 1
fi
PID=$$
echo PID detected as: $PID
mkdir /sys/fs/cgroup/memory/leaker
echo 536870912 > /sys/fs/cgroup/memory/leaker/memory.limit_in_bytes
echo leaker mem cgroup created, with `cat /sys/fs/cgroup/memory/leaker/memory.limit_in_bytes` bytes.
echo $PID > /sys/fs/cgroup/memory/leaker/cgroup.procs
echo Moved into the leaker cgroup.
ps -o cgroup $PID
sleep 15
echo Starting...
./leaker -p 10240 -c 100000

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1.
  2019-08-01 18:04 Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1 Masoud Sharbiani
@ 2019-08-01 18:19 ` Greg KH
  2019-08-02  1:08   ` Masoud Sharbiani
  2019-08-02  7:40 ` Michal Hocko
  1 sibling, 1 reply; 23+ messages in thread
From: Greg KH @ 2019-08-01 18:19 UTC (permalink / raw)
To: Masoud Sharbiani
Cc: mhocko, hannes, vdavydov.dev, linux-mm, cgroups, linux-kernel

On Thu, Aug 01, 2019 at 11:04:14AM -0700, Masoud Sharbiani wrote:
> Hey folks,
> I’ve come across an issue that affects most of 4.19, 4.20 and 5.2 linux-stable kernels that has only been fixed in 5.3-rc1.
> It was introduced by
>
> 29ef680 memcg, oom: move out_of_memory back to the charge path
>
> The gist of it is that if you have a memory control group for a process that repeatedly maps all of the pages of a file with repeated calls to:
>
> mmap(NULL, pages * PAGE_SIZE, PROT_WRITE|PROT_READ, MAP_FILE|MAP_PRIVATE, fd, 0)
>
> The memory cg eventually runs out of memory, as it should. However,
> prior to the 29ef680 commit, it would kill the running process with
> OOM; After that commit ( and until 5.3-rc1; Haven’t pinpointed the
> exact commit in between 5.2.0 and 5.3-rc1) the offending process goes
> into %100 CPU usage, and doesn’t die (prior behavior) or fail the mmap
> call (which is what happens if one runs the test program with a low
> ulimit -v value).
>
> Any ideas on how to chase this down further?

Finding the exact patch that fixes this would be great, as then I can
add it to the 4.19 and 5.2 stable kernels (4.20 is long end-of-life, no
idea why you are messing with that one...)

thanks,

greg k-h
* Re: Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1.
  2019-08-01 18:19 ` Greg KH
@ 2019-08-02  1:08   ` Masoud Sharbiani
  2019-08-02  8:18     ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Masoud Sharbiani @ 2019-08-02 1:08 UTC (permalink / raw)
To: Greg KH; +Cc: mhocko, hannes, vdavydov.dev, linux-mm, cgroups, linux-kernel

> On Aug 1, 2019, at 11:19 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Thu, Aug 01, 2019 at 11:04:14AM -0700, Masoud Sharbiani wrote:
>> [...]
>> Any ideas on how to chase this down further?
>
> Finding the exact patch that fixes this would be great, as then I can
> add it to the 4.19 and 5.2 stable kernels (4.20 is long end-of-life, no
> idea why you are messing with that one...)
>
> thanks,
>
> greg k-h

Allow me to issue a correction: running this test on linux master <629f8205a6cc63d2e8e30956bad958a3507d018f> correctly terminates the leaker app with OOM.
However, running it a second time (after removing the memory cgroup, and allowing the test script to run it again) causes this:

kernel:watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [leaker1:7193]
[  202.511024] CPU: 7 PID: 7193 Comm: leaker1 Not tainted 5.3.0-rc2+ #8
[  202.517378] Hardware name: <redacted>
[  202.525554] RIP: 0010:lruvec_lru_size+0x49/0xf0
[  202.530085] Code: 41 89 ed b8 ff ff ff ff 45 31 f6 49 c1 e5 03 eb 19 48 63 d0 4c 89 e9 48 8b 14 d5 20 b7 11 b5 48 03 8b 88 00 00 00 4c 03 34 11 <48> c7 c6 80 c5 40 b5 89 c7 e8 29 a7 6f 00 3b 05 57 9d 24 01 72 d1
[  202.548831] RSP: 0018:ffffa7c5480df620 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[  202.556398] RAX: 0000000000000000 RBX: ffff8f5b7a1af800 RCX: 00003859bfa03bc0
[  202.563528] RDX: ffff8f5b7f800000 RSI: 0000000000000018 RDI: ffffffffb540c580
[  202.570662] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000004
[  202.577795] R10: ffff8f5b62548000 R11: 0000000000000000 R12: 0000000000000004
[  202.584928] R13: 0000000000000008 R14: 0000000000000000 R15: 0000000000000000
[  202.592063] FS:  00007ff73d835740(0000) GS:ffff8f6b7f840000(0000) knlGS:0000000000000000
[  202.600149] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  202.605895] CR2: 00007f1b1c00e428 CR3: 0000001021d56006 CR4: 00000000001606e0
[  202.613026] Call Trace:
[  202.615475]  shrink_node_memcg+0xdb/0x7a0
[  202.619488]  ? shrink_slab+0x266/0x2a0
[  202.623242]  ? mem_cgroup_iter+0x10a/0x2c0
[  202.627337]  shrink_node+0xdd/0x4c0
[  202.630831]  do_try_to_free_pages+0xea/0x3c0
[  202.635104]  try_to_free_mem_cgroup_pages+0xf5/0x1e0
[  202.640068]  try_charge+0x279/0x7a0
[  202.643565]  mem_cgroup_try_charge+0x51/0x1a0
[  202.647925]  __add_to_page_cache_locked+0x19f/0x330
[  202.652800]  ? __mod_lruvec_state+0x40/0xe0
[  202.656987]  ? scan_shadow_nodes+0x30/0x30
[  202.661086]  add_to_page_cache_lru+0x49/0xd0
[  202.665361]  iomap_readpages_actor+0xea/0x230
[  202.669718]  ? iomap_migrate_page+0xe0/0xe0
[  202.673906]  iomap_apply+0xb8/0x150
[  202.677398]  iomap_readpages+0xa7/0x1a0
[  202.681237]  ? iomap_migrate_page+0xe0/0xe0
[  202.685424]  read_pages+0x68/0x190
[  202.688829]  __do_page_cache_readahead+0x19c/0x1b0
[  202.693622]  ondemand_readahead+0x168/0x2a0
[  202.697808]  filemap_fault+0x32d/0x830
[  202.701562]  ? __mod_lruvec_state+0x40/0xe0
[  202.705747]  ? page_remove_rmap+0xcf/0x150
[  202.709846]  ? alloc_set_pte+0x240/0x2c0
[  202.713775]  __xfs_filemap_fault+0x71/0x1c0
[  202.717963]  __do_fault+0x38/0xb0
[  202.721280]  __handle_mm_fault+0x73f/0x1080
[  202.725467]  ? __switch_to_asm+0x34/0x70
[  202.729390]  ? __switch_to_asm+0x40/0x70
[  202.733318]  handle_mm_fault+0xce/0x1f0
[  202.737158]  __do_page_fault+0x231/0x480
[  202.741083]  page_fault+0x2f/0x40
[  202.744404] RIP: 0033:0x400c20
[  202.747461] Code: 45 c8 48 89 c6 bf 32 0e 40 00 b8 00 00 00 00 e8 76 fb ff ff c7 45 ec 00 00 00 00 eb 36 8b 45 ec 48 63 d0 48 8b 45 c8 48 01 d0 <0f> b6 00 0f be c0 01 45 e4 8b 45 ec 25 ff 0f 00 00 85 c0 75 10 8b
[  202.766208] RSP: 002b:00007ffde95ae460 EFLAGS: 00010206
[  202.771432] RAX: 00007ff71e855000 RBX: 0000000000000000 RCX: 000000000000001a
[  202.778558] RDX: 0000000001dfd000 RSI: 000000007fffffe5 RDI: 0000000000000000
[  202.785692] RBP: 00007ffde95af4b0 R08: 0000000000000000 R09: 00007ff73d2a520d
[  202.792823] R10: 0000000000000002 R11: 0000000000000246 R12: 0000000000400850
[  202.799949] R13: 00007ffde95af590 R14: 0000000000000000 R15: 0000000000000000

Further tests show that this also happens if one waits long enough on 5.3-rc1 as well. So I don’t think we have a fix in tree yet.

Cheers,
Masoud
* Re: Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1.
  2019-08-02  1:08 ` Masoud Sharbiani
@ 2019-08-02  8:18   ` Michal Hocko
  0 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2019-08-02 8:18 UTC (permalink / raw)
To: Hillf Danton
Cc: Masoud Sharbiani, hannes, vdavydov.dev, linux-mm, cgroups, linux-kernel, Greg KH

[Hillf, your email client or workflow mangles emails. In this case you seem to be reusing the message id from the email you are replying to, which confuses my email client into assuming your email is a duplicate.]

On Fri 02-08-19 16:08:01, Hillf Danton wrote:
[...]
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2547,8 +2547,12 @@ retry:
> 	nr_reclaimed = try_to_free_mem_cgroup_pages(mem_over_limit, nr_pages,
> 						    gfp_mask, may_swap);
>
> -	if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
> -		goto retry;
> +	if (mem_cgroup_margin(mem_over_limit) >= nr_pages) {
> +		if (nr_retries--)
> +			goto retry;
> +		/* give up charging memhog */
> +		return -ENOMEM;
> +	}

Huh, what? You are effectively saying that we should fail the charge when the requested nr_pages would fit in. This doesn't make much sense to me. What are you trying to achieve here?
--
Michal Hocko
SUSE Labs
* Re: Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1.
  2019-08-01 18:04 Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1 Masoud Sharbiani
  2019-08-01 18:19 ` Greg KH
@ 2019-08-02  7:40 ` Michal Hocko
  2019-08-02 14:18   ` Masoud Sharbiani
  1 sibling, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2019-08-02 7:40 UTC (permalink / raw)
To: Masoud Sharbiani
Cc: gregkh, hannes, vdavydov.dev, linux-mm, cgroups, linux-kernel

On Thu 01-08-19 11:04:14, Masoud Sharbiani wrote:
> Hey folks,
> I’ve come across an issue that affects most of 4.19, 4.20 and 5.2 linux-stable kernels that has only been fixed in 5.3-rc1.
> It was introduced by
>
> 29ef680 memcg, oom: move out_of_memory back to the charge path

This commit shouldn't really change the OOM behavior for your particular
test case. It would have changed MAP_POPULATE behavior but your usage is
triggering the standard page fault path. The only difference with
29ef680 is that the OOM killer is invoked during the charge path rather
than on the way out of the page fault.

Anyway, I tried to run your test case in a loop and leaker always ends
up being killed as expected with 5.2. See the below oom report. There
must be something else going on. How much swap do you have on your
system?

[337533.314245] leaker invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[337533.314250] CPU: 3 PID: 23793 Comm: leaker Not tainted 5.2.0-rc7 #54
[337533.314251] Hardware name: Dell Inc. Latitude E7470/0T6HHJ, BIOS 1.5.3 04/18/2016
[337533.314252] Call Trace:
[337533.314258]  dump_stack+0x67/0x8e
[337533.314262]  dump_header+0x51/0x2e9
[337533.314265]  ? preempt_count_sub+0xc6/0xd2
[337533.314267]  ? _raw_spin_unlock_irqrestore+0x2c/0x3e
[337533.314269]  oom_kill_process+0x90/0x11d
[337533.314271]  out_of_memory+0x25c/0x26f
[337533.314273]  mem_cgroup_out_of_memory+0x8a/0xa6
[337533.314276]  try_charge+0x1d0/0x782
[337533.314278]  ? preempt_count_sub+0xc6/0xd2
[337533.314280]  mem_cgroup_try_charge+0x1a1/0x207
[337533.314282]  __add_to_page_cache_locked+0xf9/0x2dd
[337533.314285]  ? memcg_drain_all_list_lrus+0x125/0x125
[337533.314286]  add_to_page_cache_lru+0x3c/0x96
[337533.314288]  pagecache_get_page.part.7+0x1d6/0x240
[337533.314290]  filemap_fault+0x267/0x54a
[337533.314292]  ext4_filemap_fault+0x2d/0x41
[337533.314294]  ? ext4_page_mkwrite+0x3cd/0x3cd
[337533.314296]  __do_fault+0x47/0xa7
[337533.314297]  __handle_mm_fault+0xaaa/0xf9d
[337533.314300]  handle_mm_fault+0x174/0x1c3
[337533.314303]  __do_page_fault+0x309/0x412
[337533.314305]  do_page_fault+0x10b/0x131
[337533.314307]  ? page_fault+0x8/0x30
[337533.314309]  page_fault+0x1e/0x30
[337533.314311] RIP: 0033:0x55a806ef8503
[337533.314313] Code: 48 89 c6 48 8d 3d 28 0c 00 00 b8 00 00 00 00 e8 73 fb ff ff c7 45 ec 00 00 00 00 eb 36 8b 45 ec 48 63 d0 48 8b 45 c8 48 01 d0 <0f> b6 00 0f be c0 01 45 e4 8b 45 ec 25 ff 0f 00 00 85 c0 75 10 8b
[337533.314314] RSP: 002b:00007ffcf6734730 EFLAGS: 00010206
[337533.314316] RAX: 00007f2228f74000 RBX: 0000000000000000 RCX: 0000000000000000
[337533.314317] RDX: 0000000000487000 RSI: 000055a806efc260 RDI: 0000000000000000
[337533.314318] RBP: 00007ffcf6735780 R08: 0000000000000000 R09: 00007ffcf67345fc
[337533.314319] R10: 0000000000000000 R11: 0000000000000246 R12: 000055a806ef8120
[337533.314320] R13: 00007ffcf6735860 R14: 0000000000000000 R15: 0000000000000000
[337533.314322] memory: usage 524288kB, limit 524288kB, failcnt 1240247
[337533.314323] memory+swap: usage 2592556kB, limit 9007199254740988kB, failcnt 0
[337533.314324] kmem: usage 7260kB, limit 9007199254740988kB, failcnt 0
[337533.314325] Memory cgroup stats for /leaker: cache:80KB rss:516948KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:2068268KB inactive_anon:258520KB active_anon:258412KB inactive_file:32KB active_file:12KB unevictable:0KB
[337533.314332] Tasks state (memory values in pages):
[337533.314333] [  pid  ]   uid  tgid  total_vm      rss pgtables_bytes swapents oom_score_adj name
[337533.314404] [  23777]     0 23777       596      400    36864        4             0 sh
[337533.314407] [  23793]     0 23793    655928   126942  5226496   519670             0 leaker
[337533.314408] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),oom_memcg=/leaker,task_memcg=/leaker,task=leaker,pid=23793,uid=0
[337533.314412] Memory cgroup out of memory: Killed process 23793 (leaker) total-vm:2623712kB, anon-rss:506500kB, file-rss:1268kB, shmem-rss:0kB
[337533.418036] oom_reaper: reaped process 23793 (leaker), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
--
Michal Hocko
SUSE Labs
* Re: Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1.
  2019-08-02  7:40 ` Michal Hocko
@ 2019-08-02 14:18   ` Masoud Sharbiani
  2019-08-02 14:41     ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Masoud Sharbiani @ 2019-08-02 14:18 UTC (permalink / raw)
To: Michal Hocko
Cc: gregkh, hannes, vdavydov.dev, linux-mm, cgroups, linux-kernel

> On Aug 2, 2019, at 12:40 AM, Michal Hocko <mhocko@kernel.org> wrote:
>
> On Thu 01-08-19 11:04:14, Masoud Sharbiani wrote:
>> [...]
>
> This commit shouldn't really change the OOM behavior for your particular
> test case. It would have changed MAP_POPULATE behavior but your usage is
> triggering the standard page fault path. The only difference with
> 29ef680 is that the OOM killer is invoked during the charge path rather
> than on the way out of the page fault.
>
> Anyway, I tried to run your test case in a loop and leaker always ends
> up being killed as expected with 5.2. See the below oom report. There
> must be something else going on. How much swap do you have on your
> system?

I do not have swap defined.
-m

> [...]
* Re: Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1.
  2019-08-02 14:18 ` Masoud Sharbiani
@ 2019-08-02 14:41   ` Michal Hocko
  2019-08-02 18:00     ` Masoud Sharbiani
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2019-08-02 14:41 UTC (permalink / raw)
To: Masoud Sharbiani
Cc: gregkh, hannes, vdavydov.dev, linux-mm, cgroups, linux-kernel

On Fri 02-08-19 07:18:17, Masoud Sharbiani wrote:
>> On Aug 2, 2019, at 12:40 AM, Michal Hocko <mhocko@kernel.org> wrote:
>> [...]
>> Anyway, I tried to run your test case in a loop and leaker always ends
>> up being killed as expected with 5.2. See the below oom report. There
>> must be something else going on. How much swap do you have on your
>> system?
>
> I do not have swap defined.

OK, I have retested with swap disabled and again everything seems to be
working as expected. The oom happens earlier because I do not have to
wait for the swap to get full.

Which fs do you use to write the file that you mmap? Or could you try to
simplify your test even further? E.g. does everything work as expected
when doing anonymous mmap rather than file backed one?
--
Michal Hocko
SUSE Labs
* Re: Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1.
  2019-08-02 14:41 ` Michal Hocko
@ 2019-08-02 18:00   ` Masoud Sharbiani
  2019-08-02 19:14     ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Masoud Sharbiani @ 2019-08-02 18:00 UTC (permalink / raw)
To: Michal Hocko
Cc: gregkh, hannes, vdavydov.dev, linux-mm, cgroups, linux-kernel

> On Aug 2, 2019, at 7:41 AM, Michal Hocko <mhocko@kernel.org> wrote:
>
> On Fri 02-08-19 07:18:17, Masoud Sharbiani wrote:
>> [...]
>> I do not have swap defined.
>
> OK, I have retested with swap disabled and again everything seems to be
> working as expected. The oom happens earlier because I do not have to
> wait for the swap to get full.

In my tests (with the script provided), it only loops 11 iterations before hanging and printing the soft lockup message.

> Which fs do you use to write the file that you mmap?

/dev/sda3 on / type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota)

Part of the soft lockup path actually specifies that it is going through __xfs_filemap_fault():

[  561.452933] watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [leaker:3261]
[  561.459904] Modules linked in: dm_mirror dm_region_hash dm_log dm_mod iTCO_wdt gpio_ich iTCO_vendor_support dcdbas ipmi_ssif intel_powerclamp coretemp kvm_intel ses ipmi_si kvm enclosure scsi_transport_sas ipmi_devintf irqbypass pcspkr lpc_ich sg joydev ipmi_msghandler wmi acpi_power_meter acpi_cpufreq xfs libcrc32c ata_generic sd_mod pata_acpi ata_piix libata megaraid_sas crc32c_intel serio_raw bnx2 bonding
[  561.495979] CPU: 4 PID: 3261 Comm: leaker Tainted: G I L 5.3.0-rc2+ #10
[  561.503704] Hardware name: Dell Inc. PowerEdge R710/0YDJK3, BIOS 6.4.0 07/23/2013
[  561.511168] RIP: 0010:lruvec_lru_size+0x49/0xf0
[  561.515687] Code: 41 89 ed b8 ff ff ff ff 45 31 f6 49 c1 e5 03 eb 19 48 63 d0 4c 89 e9 48 03 8b 88 00 00 00 48 8b 14 d5 60 a9 92 94 4c 03 34 11 <48> c7 c6 80 7c bf 94 89 c7 e8 89 d3 59 00 3b 05 27 eb ff 00 72 d1
[  561.534418] RSP: 0018:ffffb5f886a3f640 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[  561.541968] RAX: 0000000000000002 RBX: ffff96fca3bba400 RCX: 00003ef5d82059f0
[  561.549085] RDX: ffff9702a7a40000 RSI: 0000000000000010 RDI: ffffffff94bf7c80
[  561.556202] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffff94ae1c00
[  561.563318] R10: ffff96fcc7802520 R11: 0000000000000000 R12: 0000000000000004
[  561.570435] R13: 0000000000000008 R14: 0000000000000000 R15: 0000000000000000
[  561.577553] FS:  00007f5522602740(0000) GS:ffff9702a7a80000(0000) knlGS:0000000000000000
[  561.585623] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  561.591352] CR2: 00007fba755f95b0 CR3: 0000000c646dc000 CR4: 00000000000006e0
[  561.598468] Call Trace:
[  561.600907]  shrink_node_memcg+0xc8/0x790
[  561.604905]  ? shrink_slab+0x245/0x280
[  561.608644]  ? mem_cgroup_iter+0x10a/0x2c0
[  561.612728]  shrink_node+0xcd/0x490
[  561.616208]  do_try_to_free_pages+0xda/0x3a0
[  561.620466]  ? mem_cgroup_select_victim_node+0x43/0x2f0
[  561.625678]  try_to_free_mem_cgroup_pages+0xe7/0x1c0
[  561.630629]  try_charge+0x246/0x7a0
[  561.634107]  mem_cgroup_try_charge+0x6b/0x1e0
[  561.638453]  ? mem_cgroup_commit_charge+0x5a/0x110
[  561.643231]  __add_to_page_cache_locked+0x195/0x330
[  561.648100]  ? scan_shadow_nodes+0x30/0x30
[  561.652184]  add_to_page_cache_lru+0x39/0xa0
[  561.656442]  iomap_readpages_actor+0xf2/0x230
[  561.660787]  iomap_apply+0xa3/0x130
[  561.664266]  iomap_readpages+0x97/0x180
[  561.668091]  ? iomap_migrate_page+0xe0/0xe0
[  561.672266]  read_pages+0x57/0x180
[  561.675657]  __do_page_cache_readahead+0x1ac/0x1c0
[  561.680436]  ondemand_readahead+0x168/0x2a0
[  561.684606]  filemap_fault+0x30d/0x830
[  561.688343]  ? flush_tlb_func_common.isra.8+0x147/0x230
[  561.693554]  ? __mod_lruvec_state+0x40/0xe0
[  561.697726]  ? alloc_set_pte+0x4e6/0x5b0
[  561.701669]  __xfs_filemap_fault+0x61/0x190 [xfs]
[  561.706361]  __do_fault+0x38/0xb0
[  561.709666]  __handle_mm_fault+0xbee/0xe90
[  561.713750]  handle_mm_fault+0xe2/0x200
[  561.717574]  __do_page_fault+0x224/0x490
[  561.721485]  do_page_fault+0x31/0x120
[  561.725137]  page_fault+0x3e/0x50
[  561.728439] RIP: 0033:0x400c5a
[  561.731483] Code: 45 c0 48 89 c6 bf 77 0e 40 00 b8 00 00 00 00 e8 3c fb ff ff c7 45 dc 00 00 00 00 eb 36 8b 45 dc 48 63 d0 48 8b 45 c0 48 01 d0 <0f> b6 00 0f be c0 01 45 e8 8b 45 dc 25 ff 0f 00 00 85 c0 75 10 8b
[  561.750214] RSP: 002b:00007fffba1d9450 EFLAGS: 00010206
[  561.755426] RAX: 00007f550346b000 RBX: 0000000000000000 RCX: 000000000000001a
[  561.762542] RDX: 0000000001c4c000 RSI: 000000007fffffe5 RDI: 0000000000000000
[  561.769659] RBP: 00007fffba1da4a0 R08: 0000000000000000 R09: 00007f552206c20d
[  561.776775] R10: 0000000000000002 R11: 0000000000000246 R12: 0000000000400850
[  561.783892] R13: 00007fffba1da580 R14: 0000000000000000 R15: 0000000000000000

If I switch the backing file to an ext4 filesystem (separate hard drive), it OOMs.

If I switch the file used to /dev/zero, it OOMs:
…
Todal sum was 0. Loop count is 11
Buffer is @ 0x7f2b66c00000
./test-script-devzero.sh: line 16:  3561 Killed  ./leaker -p 10240 -c 100000

> Or could you try to
> simplify your test even further? E.g. does everything work as expected
> when doing anonymous mmap rather than file backed one?

It also OOMs with MAP_ANON.

Hope that helps.
Masoud
* Re: Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1.
  2019-08-02 18:00 ` Masoud Sharbiani
@ 2019-08-02 19:14   ` Michal Hocko
  [not found]       ` <A06C5313-B021-4ADA-9897-CE260A9011CC@apple.com>
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2019-08-02 19:14 UTC (permalink / raw)
To: Masoud Sharbiani
Cc: gregkh, hannes, vdavydov.dev, linux-mm, cgroups, linux-kernel

On Fri 02-08-19 11:00:55, Masoud Sharbiani wrote:
> [...]
> In my tests (with the script provided), it only loops 11 iterations before hanging, and uttering the soft lockup message.
>
>> Which fs do you use to write the file that you mmap?
>
> /dev/sda3 on / type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota)
>
> Part of the soft lockup path actually specifies that it is going through __xfs_filemap_fault():

Right, I have just missed that.

[...]

> If I switch the backing file to a ext4 filesystem (separate hard drive), it OOMs.
>
> If I switch the file used to /dev/zero, it OOMs:
> [...]
>
>> Or could you try to
>> simplify your test even further? E.g. does everything work as expected
>> when doing anonymous mmap rather than file backed one?
>
> It also OOMs with MAP_ANON.
>
> Hope that helps.

It helps to focus more on the xfs reclaim path. Just to be sure, is there any difference if you use cgroup v2? I do not expect there to be, but just to make sure there are no v1 artifacts.
--
Michal Hocko
SUSE Labs
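[Editorial note: the thread does not show a cgroup v2 version of the reproducer script. A rough sketch of what one might look like follows, mirroring the original v1 script; the group name, the unified-hierarchy mount point at /sys/fs/cgroup, and the use of memory.max as the v2 counterpart of memory.limit_in_bytes are assumptions, and it needs root plus a cgroup2 mount, so treat it as untested.]

```sh
#!/bin/sh
# cgroup v2 variant of the test script (sketch).
if [ "$(id -u)" -ne 0 ]; then
	echo NEED TO RUN THIS AS ROOT.
	exit 1
fi
mkdir /sys/fs/cgroup/leaker2
# memory.max is the v2 counterpart of v1 memory.limit_in_bytes.
echo 536870912 > /sys/fs/cgroup/leaker2/memory.max
echo $$ > /sys/fs/cgroup/leaker2/cgroup.procs
./leaker -p 10240 -c 100000
```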
* Re: Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1. [not found] ` <A06C5313-B021-4ADA-9897-CE260A9011CC@apple.com> @ 2019-08-03 2:36 ` Tetsuo Handa 2019-08-03 15:51 ` Tetsuo Handa 2019-08-05 8:18 ` Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1 Michal Hocko 1 sibling, 1 reply; 23+ messages in thread From: Tetsuo Handa @ 2019-08-03 2:36 UTC (permalink / raw) To: Masoud Sharbiani, Michal Hocko Cc: Greg KH, hannes, vdavydov.dev, linux-mm, cgroups, linux-kernel Well, while mem_cgroup_oom() is actually called, due to hitting /* * The OOM killer does not compensate for IO-less reclaim. * pagefault_out_of_memory lost its gfp context so we have to * make sure exclude 0 mask - all other users should have at least * ___GFP_DIRECT_RECLAIM to get here. */ if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS)) return true; path inside out_of_memory(), OOM_SUCCESS is returned and retrying without making forward progress... ---------------------------------------- --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2447,6 +2447,8 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, */ oom_status = mem_cgroup_oom(mem_over_limit, gfp_mask, get_order(nr_pages * PAGE_SIZE)); + printk("mem_cgroup_oom(%pGg)=%u\n", &gfp_mask, oom_status); + dump_stack(); switch (oom_status) { case OOM_SUCCESS: nr_retries = MEM_CGROUP_RECLAIM_RETRIES; ---------------------------------------- ---------------------------------------- [ 55.208578][ T2798] mem_cgroup_oom(GFP_NOFS)=0 [ 55.210424][ T2798] CPU: 3 PID: 2798 Comm: leaker Not tainted 5.3.0-rc2+ #637 [ 55.212985][ T2798] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018 [ 55.217260][ T2798] Call Trace: [ 55.218597][ T2798] dump_stack+0x67/0x95 [ 55.220200][ T2798] try_charge+0x4ca/0x6d0 [ 55.221843][ T2798] ? 
get_mem_cgroup_from_mm+0x1ff/0x2c0 [ 55.223855][ T2798] mem_cgroup_try_charge+0x88/0x2d0 [ 55.225723][ T2798] __add_to_page_cache_locked+0x27e/0x4c0 [ 55.227784][ T2798] ? scan_shadow_nodes+0x30/0x30 [ 55.229577][ T2798] add_to_page_cache_lru+0x72/0x180 [ 55.231467][ T2798] iomap_readpages_actor+0xeb/0x1e0 [ 55.233376][ T2798] ? iomap_migrate_page+0x120/0x120 [ 55.235382][ T2798] iomap_apply+0xaf/0x150 [ 55.237049][ T2798] iomap_readpages+0x9f/0x160 [ 55.239061][ T2798] ? iomap_migrate_page+0x120/0x120 [ 55.241013][ T2798] xfs_vm_readpages+0x54/0x130 [xfs] [ 55.242960][ T2798] read_pages+0x63/0x160 [ 55.244613][ T2798] __do_page_cache_readahead+0x1cd/0x200 [ 55.246699][ T2798] ondemand_readahead+0x201/0x4d0 [ 55.248562][ T2798] page_cache_async_readahead+0x16e/0x2e0 [ 55.250740][ T2798] ? page_cache_async_readahead+0xa5/0x2e0 [ 55.252881][ T2798] filemap_fault+0x3f3/0xc20 [ 55.254813][ T2798] ? xfs_ilock+0x1de/0x2c0 [xfs] [ 55.256858][ T2798] ? __xfs_filemap_fault+0x7f/0x270 [xfs] [ 55.259118][ T2798] ? down_read_nested+0x98/0x170 [ 55.261123][ T2798] ? xfs_ilock+0x1de/0x2c0 [xfs] [ 55.263146][ T2798] __xfs_filemap_fault+0x92/0x270 [xfs] [ 55.265210][ T2798] xfs_filemap_fault+0x27/0x30 [xfs] [ 55.267164][ T2798] __do_fault+0x33/0xd0 [ 55.268784][ T2798] do_fault+0x3be/0x5c0 [ 55.270390][ T2798] __handle_mm_fault+0x462/0xc00 [ 55.272251][ T2798] handle_mm_fault+0x17c/0x380 [ 55.274055][ T2798] ? 
handle_mm_fault+0x46/0x380 [ 55.275877][ T2798] __do_page_fault+0x24a/0x4c0 [ 55.277676][ T2798] do_page_fault+0x27/0x1b0 [ 55.279399][ T2798] page_fault+0x34/0x40 [ 55.281053][ T2798] RIP: 0033:0x4009f0 [ 55.282564][ T2798] Code: 03 00 00 00 e8 71 fd ff ff 48 83 f8 ff 49 89 c6 74 74 48 89 c6 bf c0 0c 40 00 31 c0 e8 69 fd ff ff 45 85 ff 7e 21 31 c9 66 90 <41> 0f be 14 0e 01 d3 f7 c1 ff 0f 00 00 75 05 41 c6 04 0e 2a 48 83 [ 55.289631][ T2798] RSP: 002b:00007fff1804ec00 EFLAGS: 00010206 [ 55.291835][ T2798] RAX: 000000000000001b RBX: 0000000000000000 RCX: 0000000001a1a000 [ 55.294745][ T2798] RDX: 0000000000000000 RSI: 000000007fffffe5 RDI: 0000000000000000 [ 55.297500][ T2798] RBP: 000000000000000c R08: 0000000000000000 R09: 00007f4e7392320d [ 55.300225][ T2798] R10: 0000000000000002 R11: 0000000000000246 R12: 00000000000186a0 [ 55.303047][ T2798] R13: 0000000000000003 R14: 00007f4e530d6000 R15: 0000000002800000 ---------------------------------------- ^ permalink raw reply [flat|nested] 23+ messages in thread
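[Editorial note: the early return Tetsuo's debug output points at can be modeled in a few lines. This is a toy model, with invented bit values standing in for the real gfp flags: a non-zero mask that lacks __GFP_FS makes out_of_memory() return true without killing anything, which try_charge() then mistakes for forward progress.]

```c
/* Toy model of the quoted out_of_memory() early return.  The bit
 * values are invented for illustration; they are not the kernel's
 * real gfp flag values. */
#define TOY_GFP_FS             0x1u  /* stands in for __GFP_FS */
#define TOY_GFP_DIRECT_RECLAIM 0x2u  /* stands in for ___GFP_DIRECT_RECLAIM */
#define TOY_GFP_KERNEL (TOY_GFP_FS | TOY_GFP_DIRECT_RECLAIM)
#define TOY_GFP_NOFS   (TOY_GFP_DIRECT_RECLAIM)

/* Returns 1 when out_of_memory() would return early without killing
 * anything: a non-zero mask that lacks the FS bit. */
int oom_bails_out(unsigned int gfp_mask)
{
    return gfp_mask && !(gfp_mask & TOY_GFP_FS);
}
```

A GFP_NOFS charge thus "succeeds" at invoking the OOM killer without freeing a single page, so the retry loop never terminates.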
* Re: Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1. 2019-08-03 2:36 ` Tetsuo Handa @ 2019-08-03 15:51 ` Tetsuo Handa 2019-08-03 17:41 ` Masoud Sharbiani 2019-08-05 8:42 ` Michal Hocko 0 siblings, 2 replies; 23+ messages in thread From: Tetsuo Handa @ 2019-08-03 15:51 UTC (permalink / raw) To: Masoud Sharbiani, Michal Hocko Cc: Greg KH, hannes, vdavydov.dev, linux-mm, cgroups, linux-kernel Masoud, will you try this patch? By the way, is it expected that /sys/fs/cgroup/memory/leaker/memory.usage_in_bytes remains non-zero even though /sys/fs/cgroup/memory/leaker/tasks became empty due to the memcg OOM killer? Deleting big-data-file.bin after the memcg OOM kill reduces it some, but it still remains non-zero. ---------------------------------------- From 2f92c70f390f42185c6e2abb8dda98b1b7d02fa9 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Date: Sun, 4 Aug 2019 00:41:30 +0900 Subject: [PATCH] memcg, oom: don't require __GFP_FS when invoking memcg OOM killer Masoud Sharbiani noticed that commit 29ef680ae7c21110 ("memcg, oom: move out_of_memory back to the charge path") broke memcg OOM called from the __xfs_filemap_fault() path. It turned out that try_charge() is retrying forever without making forward progress because mem_cgroup_oom(GFP_NOFS) cannot invoke the OOM killer due to commit 3da88fb3bacfaa33 ("mm, oom: move GFP_NOFS check to out_of_memory"). Regarding memcg OOM, we need to bypass the GFP_NOFS check in order to guarantee forward progress. 
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: Masoud Sharbiani <msharbiani@apple.com> Bisected-by: Masoud Sharbiani <msharbiani@apple.com> Fixes: 29ef680ae7c21110 ("memcg, oom: move out_of_memory back to the charge path") --- mm/oom_kill.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/mm/oom_kill.c b/mm/oom_kill.c index eda2e2a..26804ab 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -1068,9 +1068,10 @@ bool out_of_memory(struct oom_control *oc) * The OOM killer does not compensate for IO-less reclaim. * pagefault_out_of_memory lost its gfp context so we have to * make sure exclude 0 mask - all other users should have at least - * ___GFP_DIRECT_RECLAIM to get here. + * ___GFP_DIRECT_RECLAIM to get here. But mem_cgroup_oom() has to + * invoke the OOM killer even if it is a GFP_NOFS allocation. */ - if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS)) + if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS) && !is_memcg_oom(oc)) return true; /* -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 23+ messages in thread
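[Editorial note: the effect of the one-line patch above can be sketched with the same kind of toy model. The bit values are invented, and a plain int flag stands in for is_memcg_oom(oc): with the patch, a memcg OOM no longer takes the early return, so a GFP_NOFS charge can still invoke the killer.]

```c
/* Toy model of the patched check: memcg OOMs are exempted from the
 * __GFP_FS requirement.  Bit values and names are illustrative, not
 * the kernel's. */
#define DEMO_GFP_FS   0x1u  /* stands in for __GFP_FS */
#define DEMO_GFP_NOFS 0x2u  /* a reclaimable mask without the FS bit */

/* is_memcg stands in for is_memcg_oom(oc).  Returns 1 when the OOM
 * killer would be skipped. */
int oom_bails_out_patched(unsigned int gfp_mask, int is_memcg)
{
    return gfp_mask && !(gfp_mask & DEMO_GFP_FS) && !is_memcg;
}
```

Global GFP_NOFS OOM invocations keep the old bail-out behavior; only the memcg path changes.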
* Re: Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1. 2019-08-03 15:51 ` Tetsuo Handa @ 2019-08-03 17:41 ` Masoud Sharbiani 2019-08-03 18:24 ` Masoud Sharbiani 2019-08-05 8:42 ` Michal Hocko 1 sibling, 1 reply; 23+ messages in thread From: Masoud Sharbiani @ 2019-08-03 17:41 UTC (permalink / raw) To: Tetsuo Handa Cc: Michal Hocko, Greg KH, hannes, vdavydov.dev, linux-mm, cgroups, linux-kernel > On Aug 3, 2019, at 8:51 AM, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote: > > Masoud, will you try this patch? Gladly. It looks like it is working (and OOMing properly). > > By the way, is /sys/fs/cgroup/memory/leaker/memory.usage_in_bytes remains non-zero > despite /sys/fs/cgroup/memory/leaker/tasks became empty due to memcg OOM killer expected? > Deleting big-data-file.bin after memcg OOM killer reduces some, but still remains > non-zero. Yes. I had not noticed that: [ 1114.190477] oom_reaper: reaped process 1942 (leaker), now anon-rss:0kB, file- rss:0kB, shmem-rss:0kB ./test-script.sh: line 16: 1942 Killed ./leaker -p 10240 -c 100000 [root@localhost laleaker]# cat /sys/fs/cgroup/memory/leaker/memory.usage_in_bytes 3194880 [root@localhost laleaker]# cat /sys/fs/cgroup/memory/leaker/memory.limit_in_bytes 536870912 [root@localhost laleaker]# rm -f big-data-file.bin [root@localhost laleaker]# cat /sys/fs/cgroup/memory/leaker/memory.usage_in_bytes 2838528 Thanks! Masoud PS: Tried hand-back-porting it to 4.19-y and it didn’t work. 
I think there are other patches between 4.19.0 and 5.3 that could be necessary… > > ---------------------------------------- > From 2f92c70f390f42185c6e2abb8dda98b1b7d02fa9 Mon Sep 17 00:00:00 2001 > From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Date: Sun, 4 Aug 2019 00:41:30 +0900 > Subject: [PATCH] memcg, oom: don't require __GFP_FS when invoking memcg OOM killer > > Masoud Sharbiani noticed that commit 29ef680ae7c21110 ("memcg, oom: move > out_of_memory back to the charge path") broke memcg OOM called from > __xfs_filemap_fault() path. It turned out that try_chage() is retrying > forever without making forward progress because mem_cgroup_oom(GFP_NOFS) > cannot invoke the OOM killer due to commit 3da88fb3bacfaa33 ("mm, oom: > move GFP_NOFS check to out_of_memory"). Regarding memcg OOM, we need to > bypass GFP_NOFS check in order to guarantee forward progress. > > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Reported-by: Masoud Sharbiani <msharbiani@apple.com> > Bisected-by: Masoud Sharbiani <msharbiani@apple.com> > Fixes: 29ef680ae7c21110 ("memcg, oom: move out_of_memory back to the charge path") > --- > mm/oom_kill.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index eda2e2a..26804ab 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -1068,9 +1068,10 @@ bool out_of_memory(struct oom_control *oc) > * The OOM killer does not compensate for IO-less reclaim. > * pagefault_out_of_memory lost its gfp context so we have to > * make sure exclude 0 mask - all other users should have at least > - * ___GFP_DIRECT_RECLAIM to get here. > + * ___GFP_DIRECT_RECLAIM to get here. But mem_cgroup_oom() has to > + * invoke the OOM killer even if it is a GFP_NOFS allocation. 
> */ > - if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS)) > + if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS) && !is_memcg_oom(oc)) > return true; > > /* > -- > 1.8.3.1 > > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1. 2019-08-03 17:41 ` Masoud Sharbiani @ 2019-08-03 18:24 ` Masoud Sharbiani 0 siblings, 0 replies; 23+ messages in thread From: Masoud Sharbiani @ 2019-08-03 18:24 UTC (permalink / raw) To: Tetsuo Handa Cc: Michal Hocko, Greg KH, hannes, vdavydov.dev, linux-mm, cgroups, linux-kernel > On Aug 3, 2019, at 10:41 AM, Masoud Sharbiani <msharbiani@apple.com> wrote: > > > >> On Aug 3, 2019, at 8:51 AM, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote: >> >> Masoud, will you try this patch? > > Gladly. > It looks like it is working (and OOMing properly). > > >> >> By the way, is /sys/fs/cgroup/memory/leaker/memory.usage_in_bytes remains non-zero >> despite /sys/fs/cgroup/memory/leaker/tasks became empty due to memcg OOM killer expected? >> Deleting big-data-file.bin after memcg OOM killer reduces some, but still remains >> non-zero. > > Yes. I had not noticed that: > > [ 1114.190477] oom_reaper: reaped process 1942 (leaker), now anon-rss:0kB, file- > rss:0kB, shmem-rss:0kB > ./test-script.sh: line 16: 1942 Killed ./leaker -p 10240 -c 100000 > > [root@localhost laleaker]# cat /sys/fs/cgroup/memory/leaker/memory.usage_in_bytes > 3194880 > [root@localhost laleaker]# cat /sys/fs/cgroup/memory/leaker/memory.limit_in_bytes > 536870912 > [root@localhost laleaker]# rm -f big-data-file.bin > [root@localhost laleaker]# cat /sys/fs/cgroup/memory/leaker/memory.usage_in_bytes > 2838528 > > Thanks! > Masoud > > PS: Tried hand-back-porting it to 4.19-y and it didn’t work. I think there are other patches between 4.19.0 and 5.3 that could be necessary… > Please ignore this last part. It works on 4.19-y branch as well. 
Masoud > >> >> ---------------------------------------- >> From 2f92c70f390f42185c6e2abb8dda98b1b7d02fa9 Mon Sep 17 00:00:00 2001 >> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> >> Date: Sun, 4 Aug 2019 00:41:30 +0900 >> Subject: [PATCH] memcg, oom: don't require __GFP_FS when invoking memcg OOM killer >> >> Masoud Sharbiani noticed that commit 29ef680ae7c21110 ("memcg, oom: move >> out_of_memory back to the charge path") broke memcg OOM called from >> __xfs_filemap_fault() path. It turned out that try_chage() is retrying >> forever without making forward progress because mem_cgroup_oom(GFP_NOFS) >> cannot invoke the OOM killer due to commit 3da88fb3bacfaa33 ("mm, oom: >> move GFP_NOFS check to out_of_memory"). Regarding memcg OOM, we need to >> bypass GFP_NOFS check in order to guarantee forward progress. >> >> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> >> Reported-by: Masoud Sharbiani <msharbiani@apple.com> >> Bisected-by: Masoud Sharbiani <msharbiani@apple.com> >> Fixes: 29ef680ae7c21110 ("memcg, oom: move out_of_memory back to the charge path") >> --- >> mm/oom_kill.c | 5 +++-- >> 1 file changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/mm/oom_kill.c b/mm/oom_kill.c >> index eda2e2a..26804ab 100644 >> --- a/mm/oom_kill.c >> +++ b/mm/oom_kill.c >> @@ -1068,9 +1068,10 @@ bool out_of_memory(struct oom_control *oc) >> * The OOM killer does not compensate for IO-less reclaim. >> * pagefault_out_of_memory lost its gfp context so we have to >> * make sure exclude 0 mask - all other users should have at least >> - * ___GFP_DIRECT_RECLAIM to get here. >> + * ___GFP_DIRECT_RECLAIM to get here. But mem_cgroup_oom() has to >> + * invoke the OOM killer even if it is a GFP_NOFS allocation. >> */ >> - if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS)) >> + if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS) && !is_memcg_oom(oc)) >> return true; >> >> /* >> -- >> 1.8.3.1 >> >> > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1. 2019-08-03 15:51 ` Tetsuo Handa 2019-08-03 17:41 ` Masoud Sharbiani @ 2019-08-05 8:42 ` Michal Hocko 2019-08-05 11:36 ` Tetsuo Handa 1 sibling, 1 reply; 23+ messages in thread From: Michal Hocko @ 2019-08-05 8:42 UTC (permalink / raw) To: Tetsuo Handa Cc: Masoud Sharbiani, Greg KH, hannes, vdavydov.dev, linux-mm, cgroups, linux-kernel On Sun 04-08-19 00:51:18, Tetsuo Handa wrote: > Masoud, will you try this patch? > > By the way, is /sys/fs/cgroup/memory/leaker/memory.usage_in_bytes remains non-zero > despite /sys/fs/cgroup/memory/leaker/tasks became empty due to memcg OOM killer expected? > Deleting big-data-file.bin after memcg OOM killer reduces some, but still remains > non-zero. > > ---------------------------------------- > >From 2f92c70f390f42185c6e2abb8dda98b1b7d02fa9 Mon Sep 17 00:00:00 2001 > From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Date: Sun, 4 Aug 2019 00:41:30 +0900 > Subject: [PATCH] memcg, oom: don't require __GFP_FS when invoking memcg OOM killer > > Masoud Sharbiani noticed that commit 29ef680ae7c21110 ("memcg, oom: move > out_of_memory back to the charge path") broke memcg OOM called from > __xfs_filemap_fault() path. This is very well spotted! I really didn't think of GFP_NOFS although xfs in the mix could give me some clue. > It turned out that try_chage() is retrying > forever without making forward progress because mem_cgroup_oom(GFP_NOFS) > cannot invoke the OOM killer due to commit 3da88fb3bacfaa33 ("mm, oom: > move GFP_NOFS check to out_of_memory"). Regarding memcg OOM, we need to > bypass GFP_NOFS check in order to guarantee forward progress. This deserves more information about the fix. Why is it OK to trigger OOM for GFP_NOFS allocations? Doesn't this lead to pre-mature OOM killer invocation? You can argue that memcg charges have ignored GFP_NOFS without seeing a lot of problems. But please document that in the changelog. 
It is 3da88fb3bacfaa33 that has introduced this heuristic and I have to confess I haven't realized the side effect on the memcg side because OOM was triggered only from the GFP_KERNEL context. So I would point to 3da88fb3bacfaa33 as introducing the regression albeit silent at the time. > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Reported-by: Masoud Sharbiani <msharbiani@apple.com> > Bisected-by: Masoud Sharbiani <msharbiani@apple.com> > Fixes: 29ef680ae7c21110 ("memcg, oom: move out_of_memory back to the charge path") I would say Fixes: 3da88fb3bacfaa33 # necessary after 29ef680ae7c21110 Other than that I am not really sure about a better fix. Let's see whether we see some pre-mature memcg OOM reports and think where to get from there. With updated changelog Acked-by: Michal Hocko <mhocko@suse.com> Thanks! > --- > mm/oom_kill.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index eda2e2a..26804ab 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -1068,9 +1068,10 @@ bool out_of_memory(struct oom_control *oc) > * The OOM killer does not compensate for IO-less reclaim. > * pagefault_out_of_memory lost its gfp context so we have to > * make sure exclude 0 mask - all other users should have at least > - * ___GFP_DIRECT_RECLAIM to get here. > + * ___GFP_DIRECT_RECLAIM to get here. But mem_cgroup_oom() has to > + * invoke the OOM killer even if it is a GFP_NOFS allocation. > */ > - if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS)) > + if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS) && !is_memcg_oom(oc)) > return true; > > /* > -- > 1.8.3.1 > -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 23+ messages in thread
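[Editorial note: why the unpatched check produces a livelock rather than a clean failure can be sketched as a bounded model of the try_charge() retry loop. All names and numbers below are invented; in the real kernel, nr_retries is reset whenever mem_cgroup_oom() reports OOM_SUCCESS, so a "successful" OOM that killed nothing retries indefinitely.]

```c
/* Toy model of the try_charge() retry loop.  The real loop resets its
 * retry budget whenever the OOM path claims success; if the killer
 * silently bailed out (the GFP_NOFS case), usage never drops and the
 * task spins.  CAP bounds the model so it always terminates. */
#define CAP 1000

/* An OOM handler "kills" by freeing the cgroup's usage, or bails
 * without freeing anything (modeling the skipped __GFP_FS case). */
typedef void (*oom_handler_t)(long *usage);

static void oom_kills(long *usage) { *usage = 0; }
static void oom_bails(long *usage) { (void)usage; }

/* Returns the number of iterations until the charge fits, or CAP if
 * the loop would spin forever. */
int try_charge_model(long limit, long need, oom_handler_t oom)
{
    long usage = limit;            /* cgroup already at its limit */
    int iters = 0;
    while (iters < CAP) {
        iters++;
        if (usage + need <= limit)
            return iters;          /* charge succeeds */
        oom(&usage);               /* "OOM_SUCCESS" either way: retry */
    }
    return CAP;
}
```

With a handler that actually kills, the loop converges immediately; with a handler that bails, it hits the bound, which is the 100% CPU spin Masoud reported.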
* Re: Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1. 2019-08-05 8:42 ` Michal Hocko @ 2019-08-05 11:36 ` Tetsuo Handa 2019-08-05 11:44 ` Michal Hocko 0 siblings, 1 reply; 23+ messages in thread From: Tetsuo Handa @ 2019-08-05 11:36 UTC (permalink / raw) To: Michal Hocko, Andrew Morton Cc: Masoud Sharbiani, Greg KH, hannes, vdavydov.dev, linux-mm, cgroups, linux-kernel I updated the changelog. From 80b6f63b9d30df414e468e193a7f1b40c373ed68 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Date: Mon, 5 Aug 2019 20:28:35 +0900 Subject: [PATCH v2] memcg, oom: don't require __GFP_FS when invoking memcg OOM killer Masoud Sharbiani noticed that commit 29ef680ae7c21110 ("memcg, oom: move out_of_memory back to the charge path") broke memcg OOM called from __xfs_filemap_fault() path. It turned out that try_charge() is retrying forever without making forward progress because mem_cgroup_oom(GFP_NOFS) cannot invoke the OOM killer due to commit 3da88fb3bacfaa33 ("mm, oom: move GFP_NOFS check to out_of_memory"). Allowing forced charge due to being unable to invoke memcg OOM killer will lead to global OOM situation, and just returning -ENOMEM will not solve memcg OOM situation. Therefore, invoking memcg OOM killer (despite GFP_NOFS) will be the only choice we can choose for now. Until 29ef680ae7c21110~1, we were able to invoke memcg OOM killer when GFP_KERNEL reclaim failed [1]. But since 29ef680ae7c21110, we need to invoke memcg OOM killer when GFP_NOFS reclaim failed [2]. Although in the past we did invoke memcg OOM killer for GFP_NOFS [3], we might get pre-mature memcg OOM reports due to this patch. 
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-and-tested-by: Masoud Sharbiani <msharbiani@apple.com> Bisected-by: Masoud Sharbiani <msharbiani@apple.com> Acked-by: Michal Hocko <mhocko@suse.com> Fixes: 3da88fb3bacfaa33 # necessary after 29ef680ae7c21110 Cc: <stable@vger.kernel.org> # 4.19+ [1] leaker invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0 CPU: 0 PID: 2746 Comm: leaker Not tainted 4.18.0+ #19 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018 Call Trace: dump_stack+0x63/0x88 dump_header+0x67/0x27a ? mem_cgroup_scan_tasks+0x91/0xf0 oom_kill_process+0x210/0x410 out_of_memory+0x10a/0x2c0 mem_cgroup_out_of_memory+0x46/0x80 mem_cgroup_oom_synchronize+0x2e4/0x310 ? high_work_func+0x20/0x20 pagefault_out_of_memory+0x31/0x76 mm_fault_error+0x55/0x115 ? handle_mm_fault+0xfd/0x220 __do_page_fault+0x433/0x4e0 do_page_fault+0x22/0x30 ? page_fault+0x8/0x30 page_fault+0x1e/0x30 RIP: 0033:0x4009f0 Code: 03 00 00 00 e8 71 fd ff ff 48 83 f8 ff 49 89 c6 74 74 48 89 c6 bf c0 0c 40 00 31 c0 e8 69 fd ff ff 45 85 ff 7e 21 31 c9 66 90 <41> 0f be 14 0e 01 d3 f7 c1 ff 0f 00 00 75 05 41 c6 04 0e 2a 48 83 RSP: 002b:00007ffe29ae96f0 EFLAGS: 00010206 RAX: 000000000000001b RBX: 0000000000000000 RCX: 0000000001ce1000 RDX: 0000000000000000 RSI: 000000007fffffe5 RDI: 0000000000000000 RBP: 000000000000000c R08: 0000000000000000 R09: 00007f94be09220d R10: 0000000000000002 R11: 0000000000000246 R12: 00000000000186a0 R13: 0000000000000003 R14: 00007f949d845000 R15: 0000000002800000 Task in /leaker killed as a result of limit of /leaker memory: usage 524288kB, limit 524288kB, failcnt 158965 memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0 kmem: usage 2016kB, limit 9007199254740988kB, failcnt 0 Memory cgroup stats for /leaker: cache:844KB rss:521136KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:132KB writeback:0KB inactive_anon:0KB 
active_anon:521224KB inactive_file:1012KB active_file:8KB unevictable:0KB Memory cgroup out of memory: Kill process 2746 (leaker) score 998 or sacrifice child Killed process 2746 (leaker) total-vm:536704kB, anon-rss:521176kB, file-rss:1208kB, shmem-rss:0kB oom_reaper: reaped process 2746 (leaker), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [2] leaker invoked oom-killer: gfp_mask=0x600040(GFP_NOFS), nodemask=(null), order=0, oom_score_adj=0 CPU: 1 PID: 2746 Comm: leaker Not tainted 4.18.0+ #20 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018 Call Trace: dump_stack+0x63/0x88 dump_header+0x67/0x27a ? mem_cgroup_scan_tasks+0x91/0xf0 oom_kill_process+0x210/0x410 out_of_memory+0x109/0x2d0 mem_cgroup_out_of_memory+0x46/0x80 try_charge+0x58d/0x650 ? __radix_tree_replace+0x81/0x100 mem_cgroup_try_charge+0x7a/0x100 __add_to_page_cache_locked+0x92/0x180 add_to_page_cache_lru+0x4d/0xf0 iomap_readpages_actor+0xde/0x1b0 ? iomap_zero_range_actor+0x1d0/0x1d0 iomap_apply+0xaf/0x130 iomap_readpages+0x9f/0x150 ? iomap_zero_range_actor+0x1d0/0x1d0 xfs_vm_readpages+0x18/0x20 [xfs] read_pages+0x60/0x140 __do_page_cache_readahead+0x193/0x1b0 ondemand_readahead+0x16d/0x2c0 page_cache_async_readahead+0x9a/0xd0 filemap_fault+0x403/0x620 ? alloc_set_pte+0x12c/0x540 ? _cond_resched+0x14/0x30 __xfs_filemap_fault+0x66/0x180 [xfs] xfs_filemap_fault+0x27/0x30 [xfs] __do_fault+0x19/0x40 __handle_mm_fault+0x8e8/0xb60 handle_mm_fault+0xfd/0x220 __do_page_fault+0x238/0x4e0 do_page_fault+0x22/0x30 ? 
page_fault+0x8/0x30 page_fault+0x1e/0x30 RIP: 0033:0x4009f0 Code: 03 00 00 00 e8 71 fd ff ff 48 83 f8 ff 49 89 c6 74 74 48 89 c6 bf c0 0c 40 00 31 c0 e8 69 fd ff ff 45 85 ff 7e 21 31 c9 66 90 <41> 0f be 14 0e 01 d3 f7 c1 ff 0f 00 00 75 05 41 c6 04 0e 2a 48 83 RSP: 002b:00007ffda45c9290 EFLAGS: 00010206 RAX: 000000000000001b RBX: 0000000000000000 RCX: 0000000001a1e000 RDX: 0000000000000000 RSI: 000000007fffffe5 RDI: 0000000000000000 RBP: 000000000000000c R08: 0000000000000000 R09: 00007f6d061ff20d R10: 0000000000000002 R11: 0000000000000246 R12: 00000000000186a0 R13: 0000000000000003 R14: 00007f6ce59b2000 R15: 0000000002800000 Task in /leaker killed as a result of limit of /leaker memory: usage 524288kB, limit 524288kB, failcnt 7221 memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0 kmem: usage 1944kB, limit 9007199254740988kB, failcnt 0 Memory cgroup stats for /leaker: cache:3632KB rss:518232KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:518408KB inactive_file:3908KB active_file:12KB unevictable:0KB Memory cgroup out of memory: Kill process 2746 (leaker) score 992 or sacrifice child Killed process 2746 (leaker) total-vm:536704kB, anon-rss:518264kB, file-rss:1188kB, shmem-rss:0kB oom_reaper: reaped process 2746 (leaker), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [3] leaker invoked oom-killer: gfp_mask=0x50, order=0, oom_score_adj=0 leaker cpuset=/ mems_allowed=0 CPU: 1 PID: 3206 Comm: leaker Not tainted 3.10.0-957.27.2.el7.x86_64 #1 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018 Call Trace: [<ffffffffaf364147>] dump_stack+0x19/0x1b [<ffffffffaf35eb6a>] dump_header+0x90/0x229 [<ffffffffaedbb456>] ? find_lock_task_mm+0x56/0xc0 [<ffffffffaee32a38>] ? try_get_mem_cgroup_from_mm+0x28/0x60 [<ffffffffaedbb904>] oom_kill_process+0x254/0x3d0 [<ffffffffaee36c36>] mem_cgroup_oom_synchronize+0x546/0x570 [<ffffffffaee360b0>] ? 
mem_cgroup_charge_common+0xc0/0xc0 [<ffffffffaedbc194>] pagefault_out_of_memory+0x14/0x90 [<ffffffffaf35d072>] mm_fault_error+0x6a/0x157 [<ffffffffaf3717c8>] __do_page_fault+0x3c8/0x4f0 [<ffffffffaf371925>] do_page_fault+0x35/0x90 [<ffffffffaf36d768>] page_fault+0x28/0x30 Task in /leaker killed as a result of limit of /leaker memory: usage 524288kB, limit 524288kB, failcnt 20628 memory+swap: usage 524288kB, limit 9007199254740988kB, failcnt 0 kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 Memory cgroup stats for /leaker: cache:840KB rss:523448KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:523448KB inactive_file:464KB active_file:376KB unevictable:0KB Memory cgroup out of memory: Kill process 3206 (leaker) score 970 or sacrifice child Killed process 3206 (leaker) total-vm:536692kB, anon-rss:523304kB, file-rss:412kB, shmem-rss:0kB --- mm/oom_kill.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/mm/oom_kill.c b/mm/oom_kill.c index eda2e2a..26804ab 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -1068,9 +1068,10 @@ bool out_of_memory(struct oom_control *oc) * The OOM killer does not compensate for IO-less reclaim. * pagefault_out_of_memory lost its gfp context so we have to * make sure exclude 0 mask - all other users should have at least - * ___GFP_DIRECT_RECLAIM to get here. + * ___GFP_DIRECT_RECLAIM to get here. But mem_cgroup_oom() has to + * invoke the OOM killer even if it is a GFP_NOFS allocation. */ - if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS)) + if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS) && !is_memcg_oom(oc)) return true; /* -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1. 2019-08-05 11:36 ` Tetsuo Handa @ 2019-08-05 11:44 ` Michal Hocko 2019-08-05 14:00 ` Tetsuo Handa 0 siblings, 1 reply; 23+ messages in thread From: Michal Hocko @ 2019-08-05 11:44 UTC (permalink / raw) To: Tetsuo Handa Cc: Andrew Morton, Masoud Sharbiani, Greg KH, hannes, vdavydov.dev, linux-mm, cgroups, linux-kernel On Mon 05-08-19 20:36:05, Tetsuo Handa wrote: > I updated the changelog. This looks much better, thanks! One nit > >From 80b6f63b9d30df414e468e193a7f1b40c373ed68 Mon Sep 17 00:00:00 2001 > From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Date: Mon, 5 Aug 2019 20:28:35 +0900 > Subject: [PATCH v2] memcg, oom: don't require __GFP_FS when invoking memcg OOM killer > > Masoud Sharbiani noticed that commit 29ef680ae7c21110 ("memcg, oom: move > out_of_memory back to the charge path") broke memcg OOM called from > __xfs_filemap_fault() path. It turned out that try_charge() is retrying > forever without making forward progress because mem_cgroup_oom(GFP_NOFS) > cannot invoke the OOM killer due to commit 3da88fb3bacfaa33 ("mm, oom: > move GFP_NOFS check to out_of_memory"). > > Allowing forced charge due to being unable to invoke memcg OOM killer > will lead to global OOM situation, and just returning -ENOMEM will not > solve memcg OOM situation. Returning -ENOMEM would effectively lead to triggering the oom killer from the page fault bail out path. So it would effectively get us back to before 29ef680ae7c21110. But it is true that this is riskier from the observability POV because a) the OOM path wouldn't point to the culprit and b) it would leak ENOMEM from the g-u-p path. > Therefore, invoking memcg OOM killer (despite > GFP_NOFS) will be the only choice we can choose for now. > > Until 29ef680ae7c21110~1, we were able to invoke memcg OOM killer when > GFP_KERNEL reclaim failed [1]. But since 29ef680ae7c21110, we need to > invoke memcg OOM killer when GFP_NOFS reclaim failed [2]. 
Although in > the past we did invoke memcg OOM killer for GFP_NOFS [3], we might get > pre-mature memcg OOM reports due to this patch. > > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Reported-and-tested-by: Masoud Sharbiani <msharbiani@apple.com> > Bisected-by: Masoud Sharbiani <msharbiani@apple.com> > Acked-by: Michal Hocko <mhocko@suse.com> > Fixes: 3da88fb3bacfaa33 # necessary after 29ef680ae7c21110 > Cc: <stable@vger.kernel.org> # 4.19+ > > > [1] > > leaker invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0 > CPU: 0 PID: 2746 Comm: leaker Not tainted 4.18.0+ #19 > Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018 > Call Trace: > dump_stack+0x63/0x88 > dump_header+0x67/0x27a > ? mem_cgroup_scan_tasks+0x91/0xf0 > oom_kill_process+0x210/0x410 > out_of_memory+0x10a/0x2c0 > mem_cgroup_out_of_memory+0x46/0x80 > mem_cgroup_oom_synchronize+0x2e4/0x310 > ? high_work_func+0x20/0x20 > pagefault_out_of_memory+0x31/0x76 > mm_fault_error+0x55/0x115 > ? handle_mm_fault+0xfd/0x220 > __do_page_fault+0x433/0x4e0 > do_page_fault+0x22/0x30 > ? 
page_fault+0x8/0x30 > page_fault+0x1e/0x30 > RIP: 0033:0x4009f0 > Code: 03 00 00 00 e8 71 fd ff ff 48 83 f8 ff 49 89 c6 74 74 48 89 c6 bf c0 0c 40 00 31 c0 e8 69 fd ff ff 45 85 ff 7e 21 31 c9 66 90 <41> 0f be 14 0e 01 d3 f7 c1 ff 0f 00 00 75 05 41 c6 04 0e 2a 48 83 > RSP: 002b:00007ffe29ae96f0 EFLAGS: 00010206 > RAX: 000000000000001b RBX: 0000000000000000 RCX: 0000000001ce1000 > RDX: 0000000000000000 RSI: 000000007fffffe5 RDI: 0000000000000000 > RBP: 000000000000000c R08: 0000000000000000 R09: 00007f94be09220d > R10: 0000000000000002 R11: 0000000000000246 R12: 00000000000186a0 > R13: 0000000000000003 R14: 00007f949d845000 R15: 0000000002800000 > Task in /leaker killed as a result of limit of /leaker > memory: usage 524288kB, limit 524288kB, failcnt 158965 > memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0 > kmem: usage 2016kB, limit 9007199254740988kB, failcnt 0 > Memory cgroup stats for /leaker: cache:844KB rss:521136KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:132KB writeback:0KB inactive_anon:0KB active_anon:521224KB inactive_file:1012KB active_file:8KB unevictable:0KB > Memory cgroup out of memory: Kill process 2746 (leaker) score 998 or sacrifice child > Killed process 2746 (leaker) total-vm:536704kB, anon-rss:521176kB, file-rss:1208kB, shmem-rss:0kB > oom_reaper: reaped process 2746 (leaker), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB > > > [2] > > leaker invoked oom-killer: gfp_mask=0x600040(GFP_NOFS), nodemask=(null), order=0, oom_score_adj=0 > CPU: 1 PID: 2746 Comm: leaker Not tainted 4.18.0+ #20 > Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018 > Call Trace: > dump_stack+0x63/0x88 > dump_header+0x67/0x27a > ? mem_cgroup_scan_tasks+0x91/0xf0 > oom_kill_process+0x210/0x410 > out_of_memory+0x109/0x2d0 > mem_cgroup_out_of_memory+0x46/0x80 > try_charge+0x58d/0x650 > ? 
__radix_tree_replace+0x81/0x100 > mem_cgroup_try_charge+0x7a/0x100 > __add_to_page_cache_locked+0x92/0x180 > add_to_page_cache_lru+0x4d/0xf0 > iomap_readpages_actor+0xde/0x1b0 > ? iomap_zero_range_actor+0x1d0/0x1d0 > iomap_apply+0xaf/0x130 > iomap_readpages+0x9f/0x150 > ? iomap_zero_range_actor+0x1d0/0x1d0 > xfs_vm_readpages+0x18/0x20 [xfs] > read_pages+0x60/0x140 > __do_page_cache_readahead+0x193/0x1b0 > ondemand_readahead+0x16d/0x2c0 > page_cache_async_readahead+0x9a/0xd0 > filemap_fault+0x403/0x620 > ? alloc_set_pte+0x12c/0x540 > ? _cond_resched+0x14/0x30 > __xfs_filemap_fault+0x66/0x180 [xfs] > xfs_filemap_fault+0x27/0x30 [xfs] > __do_fault+0x19/0x40 > __handle_mm_fault+0x8e8/0xb60 > handle_mm_fault+0xfd/0x220 > __do_page_fault+0x238/0x4e0 > do_page_fault+0x22/0x30 > ? page_fault+0x8/0x30 > page_fault+0x1e/0x30 > RIP: 0033:0x4009f0 > Code: 03 00 00 00 e8 71 fd ff ff 48 83 f8 ff 49 89 c6 74 74 48 89 c6 bf c0 0c 40 00 31 c0 e8 69 fd ff ff 45 85 ff 7e 21 31 c9 66 90 <41> 0f be 14 0e 01 d3 f7 c1 ff 0f 00 00 75 05 41 c6 04 0e 2a 48 83 > RSP: 002b:00007ffda45c9290 EFLAGS: 00010206 > RAX: 000000000000001b RBX: 0000000000000000 RCX: 0000000001a1e000 > RDX: 0000000000000000 RSI: 000000007fffffe5 RDI: 0000000000000000 > RBP: 000000000000000c R08: 0000000000000000 R09: 00007f6d061ff20d > R10: 0000000000000002 R11: 0000000000000246 R12: 00000000000186a0 > R13: 0000000000000003 R14: 00007f6ce59b2000 R15: 0000000002800000 > Task in /leaker killed as a result of limit of /leaker > memory: usage 524288kB, limit 524288kB, failcnt 7221 > memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0 > kmem: usage 1944kB, limit 9007199254740988kB, failcnt 0 > Memory cgroup stats for /leaker: cache:3632KB rss:518232KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:518408KB inactive_file:3908KB active_file:12KB unevictable:0KB > Memory cgroup out of memory: Kill process 2746 (leaker) score 992 or sacrifice child > Killed process 2746 
(leaker) total-vm:536704kB, anon-rss:518264kB, file-rss:1188kB, shmem-rss:0kB > oom_reaper: reaped process 2746 (leaker), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB > > > [3] > > leaker invoked oom-killer: gfp_mask=0x50, order=0, oom_score_adj=0 > leaker cpuset=/ mems_allowed=0 > CPU: 1 PID: 3206 Comm: leaker Not tainted 3.10.0-957.27.2.el7.x86_64 #1 > Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018 > Call Trace: > [<ffffffffaf364147>] dump_stack+0x19/0x1b > [<ffffffffaf35eb6a>] dump_header+0x90/0x229 > [<ffffffffaedbb456>] ? find_lock_task_mm+0x56/0xc0 > [<ffffffffaee32a38>] ? try_get_mem_cgroup_from_mm+0x28/0x60 > [<ffffffffaedbb904>] oom_kill_process+0x254/0x3d0 > [<ffffffffaee36c36>] mem_cgroup_oom_synchronize+0x546/0x570 > [<ffffffffaee360b0>] ? mem_cgroup_charge_common+0xc0/0xc0 > [<ffffffffaedbc194>] pagefault_out_of_memory+0x14/0x90 > [<ffffffffaf35d072>] mm_fault_error+0x6a/0x157 > [<ffffffffaf3717c8>] __do_page_fault+0x3c8/0x4f0 > [<ffffffffaf371925>] do_page_fault+0x35/0x90 > [<ffffffffaf36d768>] page_fault+0x28/0x30 > Task in /leaker killed as a result of limit of /leaker > memory: usage 524288kB, limit 524288kB, failcnt 20628 > memory+swap: usage 524288kB, limit 9007199254740988kB, failcnt 0 > kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 > Memory cgroup stats for /leaker: cache:840KB rss:523448KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:523448KB inactive_file:464KB active_file:376KB unevictable:0KB > Memory cgroup out of memory: Kill process 3206 (leaker) score 970 or sacrifice child > Killed process 3206 (leaker) total-vm:536692kB, anon-rss:523304kB, file-rss:412kB, shmem-rss:0kB > > --- > mm/oom_kill.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index eda2e2a..26804ab 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -1068,9 +1068,10 @@ bool out_of_memory(struct oom_control *oc) > * 
The OOM killer does not compensate for IO-less reclaim.
>  * pagefault_out_of_memory lost its gfp context so we have to
>  * make sure exclude 0 mask - all other users should have at least
> - * ___GFP_DIRECT_RECLAIM to get here.
> + * ___GFP_DIRECT_RECLAIM to get here. But mem_cgroup_oom() has to
> + * invoke the OOM killer even if it is a GFP_NOFS allocation.
>  */
> -	if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS))
> +	if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS) && !is_memcg_oom(oc))
>  		return true;
>
>  /*
> --
> 1.8.3.1

--
Michal Hocko
SUSE Labs

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1.
  2019-08-05 11:44 ` Michal Hocko
@ 2019-08-05 14:00 ` Tetsuo Handa
  2019-08-05 14:26 ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Tetsuo Handa @ 2019-08-05 14:00 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, Masoud Sharbiani, Greg KH, hannes, vdavydov.dev, linux-mm, cgroups, linux-kernel

On 2019/08/05 20:44, Michal Hocko wrote:
>> Allowing forced charge due to being unable to invoke memcg OOM killer
>> will lead to global OOM situation, and just returning -ENOMEM will not
>> solve memcg OOM situation.
>
> Returning -ENOMEM would effectivelly lead to triggering the oom killer
> from the page fault bail out path. So effectively get us back to before
> 29ef680ae7c21110. But it is true that this is riskier from the
> observability POV when a) the OOM path wouldn't point to the culprit and
> b) it would leak ENOMEM from g-u-p path.

Excuse me? But according to my experiment, the code below showed a flood of
"Returning -ENOMEM" messages instead of invoking the OOM killer.
I didn't find that it gets us back to before 29ef680ae7c21110...

--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1884,6 +1884,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int
 	mem_cgroup_unmark_under_oom(memcg);
 	if (mem_cgroup_out_of_memory(memcg, mask, order))
 		ret = OOM_SUCCESS;
+	else if (!(mask & __GFP_FS))
+		ret = OOM_SKIPPED;
 	else
 		ret = OOM_FAILED;

@@ -2457,8 +2459,10 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		goto nomem;
 	}
 nomem:
-	if (!(gfp_mask & __GFP_NOFAIL))
+	if (!(gfp_mask & __GFP_NOFAIL)) {
+		printk("Returning -ENOMEM\n");
 		return -ENOMEM;
+	}
 force:
 	/*
 	 * The allocation either can't fail or will lead to more memory

--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -1071,7 +1071,7 @@ bool out_of_memory(struct oom_control *oc)
 	 * ___GFP_DIRECT_RECLAIM to get here.
 	 */
 	if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS))
-		return true;
+		return !is_memcg_oom(oc);

 	/*
 	 * Check if there were limitations on the allocation (only relevant for

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1. 2019-08-05 14:00 ` Tetsuo Handa @ 2019-08-05 14:26 ` Michal Hocko 2019-08-06 10:26 ` Tetsuo Handa 0 siblings, 1 reply; 23+ messages in thread From: Michal Hocko @ 2019-08-05 14:26 UTC (permalink / raw) To: Tetsuo Handa Cc: Andrew Morton, Masoud Sharbiani, Greg KH, hannes, vdavydov.dev, linux-mm, cgroups, linux-kernel On Mon 05-08-19 23:00:12, Tetsuo Handa wrote: > On 2019/08/05 20:44, Michal Hocko wrote: > >> Allowing forced charge due to being unable to invoke memcg OOM killer > >> will lead to global OOM situation, and just returning -ENOMEM will not > >> solve memcg OOM situation. > > > > Returning -ENOMEM would effectivelly lead to triggering the oom killer > > from the page fault bail out path. So effectively get us back to before > > 29ef680ae7c21110. But it is true that this is riskier from the > > observability POV when a) the OOM path wouldn't point to the culprit and > > b) it would leak ENOMEM from g-u-p path. > > > > Excuse me? But according to my experiment, below code showed flood of > "Returning -ENOMEM" message instead of invoking the OOM killer. > I didn't find it gets us back to before 29ef680ae7c21110... You would need to declare OOM_ASYNC to return ENOMEM properly from the charge (which is effectivelly a revert of 29ef680ae7c21110 for NOFS allocations). Something like the following diff --git a/mm/memcontrol.c b/mm/memcontrol.c index ba9138a4a1de..cc34ff0932ce 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1797,7 +1797,7 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int * Please note that mem_cgroup_out_of_memory might fail to find a * victim and then we have to bail out from the charge path. */ - if (memcg->oom_kill_disable) { + if (memcg->oom_kill_disable || !(mask & __GFP_FS)) { if (!current->in_user_fault) return OOM_SKIPPED; css_get(&memcg->css); I am quite surprised that your patch didn't trigger the global OOM though. 
It might mean that ENOMEM doesn't propagate all the way down to the #PF handler for this path for some reason. Anyway what I meant to say is that returning ENOMEM has the observable issues as well. -- Michal Hocko SUSE Labs ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1. 2019-08-05 14:26 ` Michal Hocko @ 2019-08-06 10:26 ` Tetsuo Handa 2019-08-06 10:50 ` Michal Hocko 0 siblings, 1 reply; 23+ messages in thread From: Tetsuo Handa @ 2019-08-06 10:26 UTC (permalink / raw) To: Michal Hocko Cc: Andrew Morton, Masoud Sharbiani, Greg KH, hannes, vdavydov.dev, linux-mm, cgroups, linux-kernel On 2019/08/05 23:26, Michal Hocko wrote: > On Mon 05-08-19 23:00:12, Tetsuo Handa wrote: >> On 2019/08/05 20:44, Michal Hocko wrote: >>>> Allowing forced charge due to being unable to invoke memcg OOM killer >>>> will lead to global OOM situation, and just returning -ENOMEM will not >>>> solve memcg OOM situation. >>> >>> Returning -ENOMEM would effectivelly lead to triggering the oom killer >>> from the page fault bail out path. So effectively get us back to before >>> 29ef680ae7c21110. But it is true that this is riskier from the >>> observability POV when a) the OOM path wouldn't point to the culprit and >>> b) it would leak ENOMEM from g-u-p path. >>> >> >> Excuse me? But according to my experiment, below code showed flood of >> "Returning -ENOMEM" message instead of invoking the OOM killer. >> I didn't find it gets us back to before 29ef680ae7c21110... > > You would need to declare OOM_ASYNC to return ENOMEM properly from the > charge (which is effectivelly a revert of 29ef680ae7c21110 for NOFS > allocations). Something like the following > OK. We need to set current->memcg_* before declaring something other than OOM_SUCCESS and OOM_FAILED... Well, it seems that returning -ENOMEM after setting current->memcg_* works as expected. What's wrong with your approach? 
--- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1843,6 +1843,15 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int if (order > PAGE_ALLOC_COSTLY_ORDER) return OOM_SKIPPED; + if (!(mask & __GFP_FS)) { + BUG_ON(current->memcg_in_oom); + css_get(&memcg->css); + current->memcg_in_oom = memcg; + current->memcg_oom_gfp_mask = mask; + current->memcg_oom_order = order; + return OOM_ASYNC; + } + memcg_memory_event(memcg, MEMCG_OOM); /* [ 49.921978][ T6736] leaker invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0 [ 49.925152][ T6736] CPU: 1 PID: 6736 Comm: leaker Kdump: loaded Not tainted 5.3.0-rc3+ #936 [ 49.927917][ T6736] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018 [ 49.931337][ T6736] Call Trace: [ 49.932673][ T6736] dump_stack+0x67/0x95 [ 49.934438][ T6736] dump_header+0x4d/0x3e0 [ 49.936142][ T6736] oom_kill_process+0x193/0x220 [ 49.940276][ T6736] out_of_memory+0x105/0x360 [ 49.941863][ T6736] mem_cgroup_out_of_memory+0xb6/0xd0 [ 49.943819][ T6736] try_charge+0xa78/0xa90 [ 49.945584][ T6736] mem_cgroup_try_charge+0x88/0x2f0 [ 49.947411][ T6736] __add_to_page_cache_locked+0x27e/0x4c0 [ 49.949441][ T6736] ? scan_shadow_nodes+0x30/0x30 [ 49.951155][ T6736] add_to_page_cache_lru+0x72/0x180 [ 49.952940][ T6736] pagecache_get_page+0xb6/0x2b0 [ 49.954718][ T6736] filemap_fault+0x613/0xc20 [ 49.956407][ T6736] ? filemap_fault+0x446/0xc20 [ 49.958221][ T6736] ? __xfs_filemap_fault+0x7f/0x290 [xfs] [ 49.960206][ T6736] ? down_read_nested+0x93/0x170 [ 49.962141][ T6736] ? xfs_ilock+0x1ea/0x2f0 [xfs] [ 49.963925][ T6736] __xfs_filemap_fault+0x92/0x290 [xfs] [ 49.966089][ T6736] xfs_filemap_fault+0x27/0x30 [xfs] [ 49.967864][ T6736] __do_fault+0x33/0xd0 [ 49.969467][ T6736] __handle_mm_fault+0x891/0xbe0 [ 49.971222][ T6736] handle_mm_fault+0x179/0x380 [ 49.972902][ T6736] ? 
handle_mm_fault+0x46/0x380 [ 49.974544][ T6736] __do_page_fault+0x255/0x4d0 [ 49.976283][ T6736] do_page_fault+0x27/0x1e0 [ 49.978012][ T6736] page_fault+0x34/0x40 [ 49.979540][ T6736] RIP: 0033:0x4009f0 [ 49.981007][ T6736] Code: 03 00 00 00 e8 71 fd ff ff 48 83 f8 ff 49 89 c6 74 74 48 89 c6 bf c0 0c 40 00 31 c0 e8 69 fd ff ff 45 85 ff 7e 21 31 c9 66 90 <41> 0f be 14 0e 01 d3 f7 c1 ff 0f 00 00 75 05 41 c6 04 0e 2a 48 83 [ 49.987171][ T6736] RSP: 002b:00007ffdbe464810 EFLAGS: 00010206 [ 49.989302][ T6736] RAX: 000000000000001b RBX: 0000000000000000 RCX: 0000000001d69000 [ 49.992130][ T6736] RDX: 0000000000000000 RSI: 000000007fffffe5 RDI: 0000000000000000 [ 49.994857][ T6736] RBP: 000000000000000c R08: 0000000000000000 R09: 00007fa1a2ee420d [ 49.997579][ T6736] R10: 0000000000000002 R11: 0000000000000246 R12: 00000000000186a0 [ 50.000251][ T6736] R13: 0000000000000003 R14: 00007fa182697000 R15: 0000000002800000 [ 50.003734][ T6736] memory: usage 524288kB, limit 524288kB, failcnt 660235 [ 50.006452][ T6736] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0 [ 50.009165][ T6736] kmem: usage 2196kB, limit 9007199254740988kB, failcnt 0 [ 50.011886][ T6736] Memory cgroup stats for /leaker: [ 50.011950][ T6736] anon 534147072 [ 50.011950][ T6736] file 212992 [ 50.011950][ T6736] kernel_stack 36864 [ 50.011950][ T6736] slab 933888 [ 50.011950][ T6736] sock 0 [ 50.011950][ T6736] shmem 0 [ 50.011950][ T6736] file_mapped 0 [ 50.011950][ T6736] file_dirty 0 [ 50.011950][ T6736] file_writeback 0 [ 50.011950][ T6736] anon_thp 0 [ 50.011950][ T6736] inactive_anon 0 [ 50.011950][ T6736] active_anon 534048768 [ 50.011950][ T6736] inactive_file 0 [ 50.011950][ T6736] active_file 151552 [ 50.011950][ T6736] unevictable 0 [ 50.011950][ T6736] slab_reclaimable 327680 [ 50.011950][ T6736] slab_unreclaimable 606208 [ 50.011950][ T6736] pgfault 140250 [ 50.011950][ T6736] pgmajfault 693 [ 50.011950][ T6736] workingset_refault 169950 [ 50.011950][ T6736] workingset_activate 1353 
[ 50.011950][ T6736] workingset_nodereclaim 0 [ 50.011950][ T6736] pgrefill 5848 [ 50.011950][ T6736] pgscan 859688 [ 50.011950][ T6736] pgsteal 180103 [ 50.052086][ T6736] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),oom_memcg=/leaker,task_memcg=/leaker,task=leaker,pid=6736,uid=0 [ 50.056749][ T6736] Memory cgroup out of memory: Killed process 6736 (leaker) total-vm:536700kB, anon-rss:521704kB, file-rss:1180kB, shmem-rss:0kB [ 50.167554][ T55] oom_reaper: reaped process 6736 (leaker), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1. 2019-08-06 10:26 ` Tetsuo Handa @ 2019-08-06 10:50 ` Michal Hocko 2019-08-06 12:48 ` [PATCH v3] memcg, oom: don't require __GFP_FS when invoking memcg OOM killer Tetsuo Handa 0 siblings, 1 reply; 23+ messages in thread From: Michal Hocko @ 2019-08-06 10:50 UTC (permalink / raw) To: Tetsuo Handa Cc: Andrew Morton, Masoud Sharbiani, Greg KH, hannes, vdavydov.dev, linux-mm, cgroups, linux-kernel On Tue 06-08-19 19:26:12, Tetsuo Handa wrote: > On 2019/08/05 23:26, Michal Hocko wrote: > > On Mon 05-08-19 23:00:12, Tetsuo Handa wrote: > >> On 2019/08/05 20:44, Michal Hocko wrote: > >>>> Allowing forced charge due to being unable to invoke memcg OOM killer > >>>> will lead to global OOM situation, and just returning -ENOMEM will not > >>>> solve memcg OOM situation. > >>> > >>> Returning -ENOMEM would effectivelly lead to triggering the oom killer > >>> from the page fault bail out path. So effectively get us back to before > >>> 29ef680ae7c21110. But it is true that this is riskier from the > >>> observability POV when a) the OOM path wouldn't point to the culprit and > >>> b) it would leak ENOMEM from g-u-p path. > >>> > >> > >> Excuse me? But according to my experiment, below code showed flood of > >> "Returning -ENOMEM" message instead of invoking the OOM killer. > >> I didn't find it gets us back to before 29ef680ae7c21110... > > > > You would need to declare OOM_ASYNC to return ENOMEM properly from the > > charge (which is effectivelly a revert of 29ef680ae7c21110 for NOFS > > allocations). Something like the following > > > > OK. We need to set current->memcg_* before declaring something other than > OOM_SUCCESS and OOM_FAILED... Well, it seems that returning -ENOMEM after > setting current->memcg_* works as expected. What's wrong with your approach? As I've said, and hoped you could pick up parts for your changelog for the ENOMEM part, a) oom path is lost b) some paths will leak ENOMEM e.g. 
g-u-p. So your patch to trigger the oom even for NOFS is a better alternative. I just found your ENOMEM note misleading and something that could be improved. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v3] memcg, oom: don't require __GFP_FS when invoking memcg OOM killer
  2019-08-06 10:50 ` Michal Hocko
@ 2019-08-06 12:48 ` Tetsuo Handa
  0 siblings, 0 replies; 23+ messages in thread
From: Tetsuo Handa @ 2019-08-06 12:48 UTC (permalink / raw)
To: Michal Hocko, Andrew Morton
Cc: Masoud Sharbiani, Greg KH, hannes, vdavydov.dev, linux-mm, cgroups, linux-kernel

Masoud Sharbiani noticed that commit 29ef680ae7c21110 ("memcg, oom: move out_of_memory back to the charge path") broke memcg OOM invoked from the __xfs_filemap_fault() path. It turned out that try_charge() retries forever without making forward progress because mem_cgroup_oom(GFP_NOFS) cannot invoke the OOM killer due to commit 3da88fb3bacfaa33 ("mm, oom: move GFP_NOFS check to out_of_memory").

Allowing a forced charge because the memcg OOM killer cannot be invoked would lead to a global OOM situation. Also, just returning -ENOMEM would be risky because the OOM path is lost and some paths (e.g. get_user_pages()) would leak -ENOMEM. Therefore, invoking the memcg OOM killer (despite GFP_NOFS) is the only choice we have for now.

Until 29ef680ae7c21110~1, we were able to invoke the memcg OOM killer when GFP_KERNEL reclaim failed [1]. But since 29ef680ae7c21110, we need to invoke the memcg OOM killer when GFP_NOFS reclaim fails [2]. Although in the past we did invoke the memcg OOM killer for GFP_NOFS [3], we might get premature memcg OOM reports due to this patch.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-and-tested-by: Masoud Sharbiani <msharbiani@apple.com>
Bisected-by: Masoud Sharbiani <msharbiani@apple.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Fixes: 3da88fb3bacfaa33 # necessary after 29ef680ae7c21110
Cc: <stable@vger.kernel.org> # 4.19+

[1]

leaker invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
CPU: 0 PID: 2746 Comm: leaker Not tainted 4.18.0+ #19
Hardware name: VMware, Inc.
VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018 Call Trace: dump_stack+0x63/0x88 dump_header+0x67/0x27a ? mem_cgroup_scan_tasks+0x91/0xf0 oom_kill_process+0x210/0x410 out_of_memory+0x10a/0x2c0 mem_cgroup_out_of_memory+0x46/0x80 mem_cgroup_oom_synchronize+0x2e4/0x310 ? high_work_func+0x20/0x20 pagefault_out_of_memory+0x31/0x76 mm_fault_error+0x55/0x115 ? handle_mm_fault+0xfd/0x220 __do_page_fault+0x433/0x4e0 do_page_fault+0x22/0x30 ? page_fault+0x8/0x30 page_fault+0x1e/0x30 RIP: 0033:0x4009f0 Code: 03 00 00 00 e8 71 fd ff ff 48 83 f8 ff 49 89 c6 74 74 48 89 c6 bf c0 0c 40 00 31 c0 e8 69 fd ff ff 45 85 ff 7e 21 31 c9 66 90 <41> 0f be 14 0e 01 d3 f7 c1 ff 0f 00 00 75 05 41 c6 04 0e 2a 48 83 RSP: 002b:00007ffe29ae96f0 EFLAGS: 00010206 RAX: 000000000000001b RBX: 0000000000000000 RCX: 0000000001ce1000 RDX: 0000000000000000 RSI: 000000007fffffe5 RDI: 0000000000000000 RBP: 000000000000000c R08: 0000000000000000 R09: 00007f94be09220d R10: 0000000000000002 R11: 0000000000000246 R12: 00000000000186a0 R13: 0000000000000003 R14: 00007f949d845000 R15: 0000000002800000 Task in /leaker killed as a result of limit of /leaker memory: usage 524288kB, limit 524288kB, failcnt 158965 memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0 kmem: usage 2016kB, limit 9007199254740988kB, failcnt 0 Memory cgroup stats for /leaker: cache:844KB rss:521136KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:132KB writeback:0KB inactive_anon:0KB active_anon:521224KB inactive_file:1012KB active_file:8KB unevictable:0KB Memory cgroup out of memory: Kill process 2746 (leaker) score 998 or sacrifice child Killed process 2746 (leaker) total-vm:536704kB, anon-rss:521176kB, file-rss:1208kB, shmem-rss:0kB oom_reaper: reaped process 2746 (leaker), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [2] leaker invoked oom-killer: gfp_mask=0x600040(GFP_NOFS), nodemask=(null), order=0, oom_score_adj=0 CPU: 1 PID: 2746 Comm: leaker Not tainted 4.18.0+ #20 Hardware name: VMware, Inc. 
VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018 Call Trace: dump_stack+0x63/0x88 dump_header+0x67/0x27a ? mem_cgroup_scan_tasks+0x91/0xf0 oom_kill_process+0x210/0x410 out_of_memory+0x109/0x2d0 mem_cgroup_out_of_memory+0x46/0x80 try_charge+0x58d/0x650 ? __radix_tree_replace+0x81/0x100 mem_cgroup_try_charge+0x7a/0x100 __add_to_page_cache_locked+0x92/0x180 add_to_page_cache_lru+0x4d/0xf0 iomap_readpages_actor+0xde/0x1b0 ? iomap_zero_range_actor+0x1d0/0x1d0 iomap_apply+0xaf/0x130 iomap_readpages+0x9f/0x150 ? iomap_zero_range_actor+0x1d0/0x1d0 xfs_vm_readpages+0x18/0x20 [xfs] read_pages+0x60/0x140 __do_page_cache_readahead+0x193/0x1b0 ondemand_readahead+0x16d/0x2c0 page_cache_async_readahead+0x9a/0xd0 filemap_fault+0x403/0x620 ? alloc_set_pte+0x12c/0x540 ? _cond_resched+0x14/0x30 __xfs_filemap_fault+0x66/0x180 [xfs] xfs_filemap_fault+0x27/0x30 [xfs] __do_fault+0x19/0x40 __handle_mm_fault+0x8e8/0xb60 handle_mm_fault+0xfd/0x220 __do_page_fault+0x238/0x4e0 do_page_fault+0x22/0x30 ? 
page_fault+0x8/0x30 page_fault+0x1e/0x30 RIP: 0033:0x4009f0 Code: 03 00 00 00 e8 71 fd ff ff 48 83 f8 ff 49 89 c6 74 74 48 89 c6 bf c0 0c 40 00 31 c0 e8 69 fd ff ff 45 85 ff 7e 21 31 c9 66 90 <41> 0f be 14 0e 01 d3 f7 c1 ff 0f 00 00 75 05 41 c6 04 0e 2a 48 83 RSP: 002b:00007ffda45c9290 EFLAGS: 00010206 RAX: 000000000000001b RBX: 0000000000000000 RCX: 0000000001a1e000 RDX: 0000000000000000 RSI: 000000007fffffe5 RDI: 0000000000000000 RBP: 000000000000000c R08: 0000000000000000 R09: 00007f6d061ff20d R10: 0000000000000002 R11: 0000000000000246 R12: 00000000000186a0 R13: 0000000000000003 R14: 00007f6ce59b2000 R15: 0000000002800000 Task in /leaker killed as a result of limit of /leaker memory: usage 524288kB, limit 524288kB, failcnt 7221 memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0 kmem: usage 1944kB, limit 9007199254740988kB, failcnt 0 Memory cgroup stats for /leaker: cache:3632KB rss:518232KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:518408KB inactive_file:3908KB active_file:12KB unevictable:0KB Memory cgroup out of memory: Kill process 2746 (leaker) score 992 or sacrifice child Killed process 2746 (leaker) total-vm:536704kB, anon-rss:518264kB, file-rss:1188kB, shmem-rss:0kB oom_reaper: reaped process 2746 (leaker), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [3] leaker invoked oom-killer: gfp_mask=0x50, order=0, oom_score_adj=0 leaker cpuset=/ mems_allowed=0 CPU: 1 PID: 3206 Comm: leaker Not tainted 3.10.0-957.27.2.el7.x86_64 #1 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018 Call Trace: [<ffffffffaf364147>] dump_stack+0x19/0x1b [<ffffffffaf35eb6a>] dump_header+0x90/0x229 [<ffffffffaedbb456>] ? find_lock_task_mm+0x56/0xc0 [<ffffffffaee32a38>] ? try_get_mem_cgroup_from_mm+0x28/0x60 [<ffffffffaedbb904>] oom_kill_process+0x254/0x3d0 [<ffffffffaee36c36>] mem_cgroup_oom_synchronize+0x546/0x570 [<ffffffffaee360b0>] ? 
mem_cgroup_charge_common+0xc0/0xc0
[<ffffffffaedbc194>] pagefault_out_of_memory+0x14/0x90
[<ffffffffaf35d072>] mm_fault_error+0x6a/0x157
[<ffffffffaf3717c8>] __do_page_fault+0x3c8/0x4f0
[<ffffffffaf371925>] do_page_fault+0x35/0x90
[<ffffffffaf36d768>] page_fault+0x28/0x30
Task in /leaker killed as a result of limit of /leaker
memory: usage 524288kB, limit 524288kB, failcnt 20628
memory+swap: usage 524288kB, limit 9007199254740988kB, failcnt 0
kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Memory cgroup stats for /leaker: cache:840KB rss:523448KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:523448KB inactive_file:464KB active_file:376KB unevictable:0KB
Memory cgroup out of memory: Kill process 3206 (leaker) score 970 or sacrifice child
Killed process 3206 (leaker) total-vm:536692kB, anon-rss:523304kB, file-rss:412kB, shmem-rss:0kB

---
 mm/oom_kill.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index eda2e2a..26804ab 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -1068,9 +1068,10 @@ bool out_of_memory(struct oom_control *oc)
 	 * The OOM killer does not compensate for IO-less reclaim.
 	 * pagefault_out_of_memory lost its gfp context so we have to
 	 * make sure exclude 0 mask - all other users should have at least
-	 * ___GFP_DIRECT_RECLAIM to get here.
+	 * ___GFP_DIRECT_RECLAIM to get here. But mem_cgroup_oom() has to
+	 * invoke the OOM killer even if it is a GFP_NOFS allocation.
 	 */
-	if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS))
+	if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS) && !is_memcg_oom(oc))
 		return true;

 	/*
--
1.8.3.1

^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1. [not found] ` <A06C5313-B021-4ADA-9897-CE260A9011CC@apple.com> 2019-08-03 2:36 ` Tetsuo Handa @ 2019-08-05 8:18 ` Michal Hocko 1 sibling, 0 replies; 23+ messages in thread From: Michal Hocko @ 2019-08-05 8:18 UTC (permalink / raw) To: Masoud Sharbiani Cc: Greg KH, hannes, vdavydov.dev, linux-mm, cgroups, linux-kernel On Fri 02-08-19 16:28:25, Masoud Sharbiani wrote: > > > > On Aug 2, 2019, at 12:14 PM, Michal Hocko <mhocko@kernel.org> wrote: > > > > On Fri 02-08-19 11:00:55, Masoud Sharbiani wrote: > >> > >> > >>> On Aug 2, 2019, at 7:41 AM, Michal Hocko <mhocko@kernel.org> wrote: > >>> > >>> On Fri 02-08-19 07:18:17, Masoud Sharbiani wrote: > >>>> > >>>> > >>>>> On Aug 2, 2019, at 12:40 AM, Michal Hocko <mhocko@kernel.org> wrote: > >>>>> > >>>>> On Thu 01-08-19 11:04:14, Masoud Sharbiani wrote: > >>>>>> Hey folks, > >>>>>> I’ve come across an issue that affects most of 4.19, 4.20 and 5.2 linux-stable kernels that has only been fixed in 5.3-rc1. > >>>>>> It was introduced by > >>>>>> > >>>>>> 29ef680 memcg, oom: move out_of_memory back to the charge path > >>>>> > >>>>> This commit shouldn't really change the OOM behavior for your particular > >>>>> test case. It would have changed MAP_POPULATE behavior but your usage is > >>>>> triggering the standard page fault path. The only difference with > >>>>> 29ef680 is that the OOM killer is invoked during the charge path rather > >>>>> than on the way out of the page fault. > >>>>> > >>>>> Anyway, I tried to run your test case in a loop and leaker always ends > >>>>> up being killed as expected with 5.2. See the below oom report. There > >>>>> must be something else going on. How much swap do you have on your > >>>>> system? > >>>> > >>>> I do not have swap defined. > >>> > >>> OK, I have retested with swap disabled and again everything seems to be > >>> working as expected. 
The oom happens earlier because I do not have to > >>> wait for the swap to get full. > >>> > >> > >> In my tests (with the script provided), it only loops 11 iterations before hanging, and uttering the soft lockup message. > >> > >> > >>> Which fs do you use to write the file that you mmap? > >> > >> /dev/sda3 on / type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota) > >> > >> Part of the soft lockup path actually specifies that it is going through __xfs_filemap_fault(): > > > > Right, I have just missed that. > > > > [...] > > > >> If I switch the backing file to a ext4 filesystem (separate hard drive), it OOMs. > >> > >> > >> If I switch the file used to /dev/zero, it OOMs: > >> … > >> Todal sum was 0. Loop count is 11 > >> Buffer is @ 0x7f2b66c00000 > >> ./test-script-devzero.sh: line 16: 3561 Killed ./leaker -p 10240 -c 100000 > >> > >> > >>> Or could you try to > >>> simplify your test even further? E.g. does everything work as expected > >>> when doing anonymous mmap rather than file backed one? > >> > >> It also OOMs with MAP_ANON. > >> > >> Hope that helps. > > > > It helps to focus more on the xfs reclaim path. Just to be sure, is > > there any difference if you use cgroup v2? I do not expect to be but > > just to be sure there are no v1 artifacts. > > I was unable to use cgroups2. I’ve created the new control group, but the attempt to move a running process into it fails with ‘Device or resource busy’. Have you enabled the memory controller for the hierarchy? Please read Documentation/admin-guide/cgroup-v2.rst for more information. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 23+ messages in thread
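For reference, a minimal cgroup v2 setup sketch paralleling the v1 test script from the original report (paths and the 512 MiB limit mirror that script; this is an assumption-laden sketch, not a verified recipe — see Documentation/admin-guide/cgroup-v2.rst):

```shell
# Sketch: reproducing the "leaker" cgroup on the cgroup v2 unified hierarchy.
# Assumes root privileges and that /sys/fs/cgroup is (or can be) a v2 mount.
mount -t cgroup2 none /sys/fs/cgroup 2>/dev/null || true

# The memory controller must be enabled for children via subtree_control
# before a child cgroup exposes memory.max; forgetting this is a common
# cause of v2 setup failures.
echo "+memory" > /sys/fs/cgroup/cgroup.subtree_control

mkdir -p /sys/fs/cgroup/leaker
echo 536870912 > /sys/fs/cgroup/leaker/memory.max   # 512 MiB, as in the v1 script

# v2 only allows processes in leaf cgroups (the "no internal process"
# rule); attaching a task can fail with EBUSY when that rule is violated,
# which may be what the 'Device or resource busy' error above reflects.
echo $$ > /sys/fs/cgroup/leaker/cgroup.procs
```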
[parent not found: <20190802121059.13192-1-hdanton@sina.com>]
* Re: Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1. [not found] <20190802121059.13192-1-hdanton@sina.com> @ 2019-08-02 13:40 ` Michal Hocko 0 siblings, 0 replies; 23+ messages in thread From: Michal Hocko @ 2019-08-02 13:40 UTC (permalink / raw) To: Hillf Danton Cc: Masoud Sharbiani, hannes, vdavydov.dev, linux-mm, cgroups, linux-kernel, Greg KH On Fri 02-08-19 20:10:58, Hillf Danton wrote: > > On Fri, 2 Aug 2019 16:18:40 +0800 Michal Hocko wrote: [...] > > Huh, what? You are effectively saying that we should fail the charge > > when the requested nr_pages would fit in. This doesn't make much sense > > to me. What are you trying to achive here? > > The report looks like the result of a tight loop. > I want to break it and make the end result of do_page_fault unsuccessful > if nr_retries rounds of page reclaiming fail to get work done. What made > me a bit over stretched is how to determine if the chargee is a memhog > in memcg's vocabulary. > What I prefer here is that do_page_fault succeeds, even if the chargee > exhausts its memory quota/budget granted, as long as more than nr_pages > can be reclaimed _within_ nr_retries rounds. IOW the deadline for memhog > is nr_retries, and no more. No, this really doesn't really make sense because it leads to pre-mature charge failures. The charge path is funadamentally not different from the page allocator path. We do try to reclaim and retry the allocation. We keep retrying for ever for non-costly order requests in both cases (modulo some corner cases like oom victims etc.). -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads: [~2019-08-06 12:48 UTC | newest]

Thread overview: 23+ messages
2019-08-01 18:04 Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1 Masoud Sharbiani
2019-08-01 18:19 ` Greg KH
2019-08-02  1:08 ` Masoud Sharbiani
2019-08-02  8:18 ` Michal Hocko
2019-08-02  7:40 ` Michal Hocko
2019-08-02 14:18 ` Masoud Sharbiani
2019-08-02 14:41 ` Michal Hocko
2019-08-02 18:00 ` Masoud Sharbiani
2019-08-02 19:14 ` Michal Hocko
[not found] ` <A06C5313-B021-4ADA-9897-CE260A9011CC@apple.com>
2019-08-03  2:36 ` Tetsuo Handa
2019-08-03 15:51 ` Tetsuo Handa
2019-08-03 17:41 ` Masoud Sharbiani
2019-08-03 18:24 ` Masoud Sharbiani
2019-08-05  8:42 ` Michal Hocko
2019-08-05 11:36 ` Tetsuo Handa
2019-08-05 11:44 ` Michal Hocko
2019-08-05 14:00 ` Tetsuo Handa
2019-08-05 14:26 ` Michal Hocko
2019-08-06 10:26 ` Tetsuo Handa
2019-08-06 10:50 ` Michal Hocko
2019-08-06 12:48 ` [PATCH v3] memcg, oom: don't require __GFP_FS when invoking memcg OOM killer Tetsuo Handa
2019-08-05  8:18 ` Possible mem cgroup bug in kernels between 4.18.0 and 5.3-rc1 Michal Hocko
[not found] <20190802121059.13192-1-hdanton@sina.com>
2019-08-02 13:40 ` Michal Hocko