* Re: OOM killer changes
From: Michal Hocko @ 2016-08-01 6:16 UTC
To: Ralf-Peter Rohbeck; +Cc: linux-mm

[CC linux-mm]

On Sun 31-07-16 21:29:02, Ralf-Peter Rohbeck wrote:
> Hello,
>
> I just noted that 4.7rc7 killed processes for no good reason apparently, on
> a system with plenty of memory free and plenty of swap space.

Have you seen anything similar with 4.6? Can you reproduce this behavior?

> At the time I initialized some USB3 drives by overwriting them with zeroes
> so IO was constantly busy (sync never finished.) Not sure if that was the
> reason. Still looking.

Could you share your OOM report please?
--
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
[parent not found: <b1a39756-a0b5-1900-6575-d6e1f502cb26@Quantum.com>]
[parent not found: <20160801182358.GB31957@dhcp22.suse.cz>]
[parent not found: <30dbabc4-585c-55a5-9f3a-4e243c28356a@Quantum.com>]
* Re: OOM killer changes
From: Michal Hocko @ 2016-08-01 19:26 UTC
To: Ralf-Peter Rohbeck; +Cc: linux-mm, Vlastimil Babka

[re-adding linux-mm mailing list - please always use reply-to-all
also CCing Vlastimil who can help with the compaction debugging]

On Mon 01-08-16 11:48:53, Ralf-Peter Rohbeck wrote:
> See the messages log attached. It has several OOM killer entries.
> Let me know if there's anything else I can do. I'll try the disk erasing on
> 4.6 and on 4.7.

Jul 31 17:17:05 fs kernel: [11918.534744] x2golistsession invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
[...]
Jul 31 17:17:05 fs kernel: [11918.557356] Mem-Info:
Jul 31 17:17:05 fs kernel: [11918.558268] active_anon:7856 inactive_anon:21924 isolated_anon:0
Jul 31 17:17:05 fs kernel: [11918.558268] active_file:70925 inactive_file:1796707 isolated_file:0
Jul 31 17:17:05 fs kernel: [11918.558268] unevictable:0 dirty:277675 writeback:57117 unstable:0
Jul 31 17:17:05 fs kernel: [11918.558268] slab_reclaimable:75821 slab_unreclaimable:9490
Jul 31 17:17:05 fs kernel: [11918.558268] mapped:12014 shmem:2414 pagetables:1497 bounce:0
Jul 31 17:17:05 fs kernel: [11918.558268] free:37021 free_pcp:89 free_cma:0
[...]
Jul 31 17:17:05 fs kernel: [11918.578836] Node 0 DMA32: 2137*4kB (UME) 5043*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 48892kB
Jul 31 17:17:05 fs kernel: [11918.580370] Node 0 Normal: 2663*4kB (UME) 7452*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 70268kB

The above process is trying to allocate a kernel stack, which is an
order-2 (16kB) block of physically contiguous memory, and that is
clearly not available, as you can see.
Memory compaction (assuming you have
CONFIG_COMPACTION enabled), which is a part of the OOM reclaim process,
should help to form such blocks, but those retries are bounded, and if
there is not much hope left we eventually hit the OOM killer. If you
look at the above counters, there is a lot of memory dirty or under
writeback (1.3G), which suggests that the IO is quite slow relative to
the writers. Anyway, there is a lot of anonymous memory, which should be
a good candidate for compaction.

But the IO doesn't seem to be the main factor, I guess. Later OOM
invocations have a slightly different pattern (let's take the last one):

Aug 1 06:30:45 fs kernel: [59536.957034] x2golistsession invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
[...]
Aug 1 06:30:45 fs kernel: [59536.976467] Mem-Info:
Aug 1 06:30:45 fs kernel: [59536.977442] active_anon:16045 inactive_anon:20473 isolated_anon:0
Aug 1 06:30:45 fs kernel: [59536.977442] active_file:169767 inactive_file:1727008 isolated_file:0
Aug 1 06:30:45 fs kernel: [59536.977442] unevictable:0 dirty:32734 writeback:0 unstable:0
Aug 1 06:30:45 fs kernel: [59536.977442] slab_reclaimable:41953 slab_unreclaimable:7507
Aug 1 06:30:45 fs kernel: [59536.977442] mapped:10619 shmem:2443 pagetables:1971 bounce:0
Aug 1 06:30:45 fs kernel: [59536.977442] free:36686 free_pcp:119 free_cma:0
[...]
Aug 1 06:30:45 fs kernel: [59536.996407] Node 0 DMA32: 5909*4kB (UME) 3800*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54036kB
Aug 1 06:30:45 fs kernel: [59536.997846] Node 0 Normal: 4041*4kB (UME) 6799*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 70556kB

The amount of dirty pages is much smaller, as well as the anonymous
memory. The biggest portion seems to be in the page cache. The memory
is still hugely fragmented though.
In fact, if we check all the OOM invocations, the only consistent thing
is that the memory is fragmented and the compaction cannot make
sufficient progress. The situation apparently gets better at times,
because there are gaps between those OOMs, so we can assume either that
something has unpinned a larger amount of memory and allowed the
compaction to make further progress, or that the load has strong peaks.
We would need more information from the compaction to know better.
Vlastimil will surely tell you which tracepoints to enable.

Jul 31 17:17:05 fs kernel: [11918.578836] Node 0 DMA32: 2137*4kB (UME) 5043*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 48892kB
Jul 31 17:17:05 fs kernel: [11918.580370] Node 0 Normal: 2663*4kB (UME) 7452*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 70268kB
Jul 31 20:17:51 fs kernel: [22764.494449] Node 0 DMA32: 2568*4kB (UME) 5472*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54048kB
Jul 31 20:17:51 fs kernel: [22764.495510] Node 0 Normal: 6109*4kB (UME) 6651*8kB (UM) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 77660kB
Jul 31 20:57:18 fs kernel: [25131.260737] Node 0 DMA32: 2139*4kB (UME) 5114*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 49468kB
Jul 31 20:57:18 fs kernel: [25131.262060] Node 0 Normal: 3611*4kB (UME) 7312*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 72940kB
Jul 31 23:36:25 fs kernel: [34677.849133] Node 0 DMA32: 10276*4kB (UME) 3565*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 69624kB
Jul 31 23:36:25 fs kernel: [34677.850547] Node 0 Normal: 19080*4kB (UE) 1361*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 87208kB
Jul 31 23:36:35 fs kernel: [34688.300852] Node 0 DMA32: 2291*4kB (UME) 5208*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 50828kB
Jul 31 23:36:35 fs kernel: [34688.301959] Node 0 Normal: 5519*4kB (UME) 7338*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 80780kB
Jul 31 23:36:40 fs kernel: [34692.902932] Node 0 DMA32: 3163*4kB (UE) 4566*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 49180kB
Jul 31 23:36:40 fs kernel: [34692.904897] Node 0 Normal: 5833*4kB (UE) 6387*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 74428kB
Jul 31 23:36:47 fs kernel: [34699.517079] Node 0 DMA32: 3068*4kB (UME) 4889*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 51384kB
Jul 31 23:36:47 fs kernel: [34699.518537] Node 0 Normal: 5935*4kB (UME) 7324*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 82332kB
Jul 31 23:36:50 fs kernel: [34702.755342] Node 0 DMA32: 4975*4kB (UME) 4500*8kB (UM) 3*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 55948kB
Jul 31 23:36:50 fs kernel: [34702.757018] Node 0 Normal: 7171*4kB (UE) 6047*8kB (U) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 77076kB
Jul 31 23:39:39 fs kernel: [34871.854243] Node 0 DMA32: 14269*4kB (UME) 1547*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 69452kB
Jul 31 23:39:39 fs kernel: [34871.855525] Node 0 Normal: 19081*4kB (UME) 28*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 76548kB
Jul 31 23:39:44 fs kernel: [34876.491809] Node 0 DMA32: 11368*4kB (UME) 4265*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 79592kB
Jul 31 23:39:44 fs kernel: [34876.493233] Node 0 Normal: 20088*4kB (UME) 236*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 82240kB
Jul 31 23:39:53 fs kernel: [34885.459361] Node 0 DMA32: 13302*4kB (UME) 2180*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 70648kB
Jul 31 23:39:53 fs kernel: [34885.461011] Node 0 Normal: 18393*4kB (UE) 512*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 77668kB
Jul 31 23:39:55 fs kernel: [34887.848712] Node 0 DMA32: 14180*4kB (UE) 1690*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 70240kB
Jul 31 23:39:55 fs kernel: [34887.850194] Node 0 Normal: 19598*4kB (UM) 21*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 78560kB
Aug 1 06:30:42 fs kernel: [59534.373842] Node 0 DMA32: 4458*4kB (UME) 4252*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 51848kB
Aug 1 06:30:42 fs kernel: [59534.375266] Node 0 Normal: 2265*4kB (U) 7168*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 66404kB
Aug 1 06:30:45 fs kernel: [59536.996407] Node 0 DMA32: 5909*4kB (UME) 3800*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54036kB
Aug 1 06:30:45 fs kernel: [59536.997846] Node 0 Normal: 4041*4kB (UME) 6799*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 70556kB
--
Michal Hocko
SUSE Labs
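The per-zone free lists quoted in the message above can be checked mechanically. What follows is a minimal sketch, assuming a 4 kB base page (as on x86-64); the helper names are illustrative and not part of any kernel tool. Each `count*sizekB` term is the number of free blocks of that size, so an order-2 request needs at least one free block of 16 kB or larger.

```python
import re

# One "count*sizekB" term from a buddy free-list line in an OOM report.
TERM = re.compile(r"(\d+)\*(\d+)kB")

def parse_free_list(line, page_kb=4):
    """Return {order: free_block_count} for one 'Node N <zone>: ...' line."""
    counts = {}
    for count, size_kb in TERM.findall(line):
        # A block of size_kb spans 2**order pages of page_kb each.
        order = (int(size_kb) // page_kb).bit_length() - 1
        counts[order] = int(count)
    return counts

def blocks_at_or_above(counts, order):
    """Number of free blocks large enough for an allocation of this order."""
    return sum(n for o, n in counts.items() if o >= order)

def free_kb(counts, page_kb=4):
    """Total free memory described by the line, in kB."""
    return sum(n * (page_kb << o) for o, n in counts.items())

# The two zone lines from the first OOM report (Jul 31 17:17:05).
DMA32 = ("Node 0 DMA32: 2137*4kB (UME) 5043*8kB (U) 0*16kB 0*32kB 0*64kB "
         "0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 48892kB")
NORMAL = ("Node 0 Normal: 2663*4kB (UME) 7452*8kB (U) 0*16kB 0*32kB 0*64kB "
          "0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 70268kB")

for name, line in (("DMA32", DMA32), ("Normal", NORMAL)):
    counts = parse_free_list(line)
    print(name, "free kB:", free_kb(counts),
          "blocks of order >= 2:", blocks_at_or_above(counts, 2))
```

For the first report, both zones have no free block of order 2 or higher even though 119160 kB is free in total, which is the fragmentation pattern described above. A nonzero count (as in a few of the later reports) is probably not sufficient by itself, since the allocator also applies watermark checks.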
* Re: OOM killer changes
From: Ralf-Peter Rohbeck @ 2016-08-01 19:35 UTC
To: Michal Hocko; +Cc: linux-mm, Vlastimil Babka

On 01.08.2016 12:26, Michal Hocko wrote:
> [re-adding linux-mm mailing list - please always use reply-to-all
> also CCing Vlastimil who can help with the compaction debugging]
>
> On Mon 01-08-16 11:48:53, Ralf-Peter Rohbeck wrote:
>> See the messages log attached. It has several OOM killer entries.
>> Let me know if there's anything else I can do. I'll try the disk erasing on
>> 4.6 and on 4.7.
> Jul 31 17:17:05 fs kernel: [11918.534744] x2golistsession invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
> [...]
> Jul 31 17:17:05 fs kernel: [11918.557356] Mem-Info:
> Jul 31 17:17:05 fs kernel: [11918.558268] active_anon:7856 inactive_anon:21924 isolated_anon:0
> Jul 31 17:17:05 fs kernel: [11918.558268] active_file:70925 inactive_file:1796707 isolated_file:0
> Jul 31 17:17:05 fs kernel: [11918.558268] unevictable:0 dirty:277675 writeback:57117 unstable:0
> Jul 31 17:17:05 fs kernel: [11918.558268] slab_reclaimable:75821 slab_unreclaimable:9490
> Jul 31 17:17:05 fs kernel: [11918.558268] mapped:12014 shmem:2414 pagetables:1497 bounce:0
> Jul 31 17:17:05 fs kernel: [11918.558268] free:37021 free_pcp:89 free_cma:0
> [...]
> Jul 31 17:17:05 fs kernel: [11918.578836] Node 0 DMA32: 2137*4kB (UME) 5043*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 48892kB
> Jul 31 17:17:05 fs kernel: [11918.580370] Node 0 Normal: 2663*4kB (UME) 7452*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 70268kB
>
> The above process is trying to allocate the kernel stack which is
> order-2 (16kB) of physically contiguous memory which is clearly
> not available as you can see. Memory compaction (assuming you have
> CONFIG_COMPACTION enabled) which is a part of the oom reclaim process

I'm using the Debian kernel from experimental. CONFIG_COMPACTION is enabled:

root@fs:~# fgrep CONFIG_COMPACTION /boot/config-4.7.0-rc7-amd64
CONFIG_COMPACTION=y

> should help to form such blocks but those retries are bound and if
> there is not much hope left we eventually hit the OOM killer. If you
> look at the above counters there is a lot of memory dirty and under the
> writeback (1.3G), this suggests that the IO is quite slow wrt. writers.
> Anyway there is a lot of anonymous memory which should be a good
> candidate for compaction.
>
> But the IO doesn't seem to be the main factor I guess. Later OOM
> invocations have a slightly different pattern (let's take the last one):
>
> Aug 1 06:30:45 fs kernel: [59536.957034] x2golistsession invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
> [...]
> Aug 1 06:30:45 fs kernel: [59536.976467] Mem-Info:
> Aug 1 06:30:45 fs kernel: [59536.977442] active_anon:16045 inactive_anon:20473 isolated_anon:0
> Aug 1 06:30:45 fs kernel: [59536.977442] active_file:169767 inactive_file:1727008 isolated_file:0
> Aug 1 06:30:45 fs kernel: [59536.977442] unevictable:0 dirty:32734 writeback:0 unstable:0
> Aug 1 06:30:45 fs kernel: [59536.977442] slab_reclaimable:41953 slab_unreclaimable:7507
> Aug 1 06:30:45 fs kernel: [59536.977442] mapped:10619 shmem:2443 pagetables:1971 bounce:0
> Aug 1 06:30:45 fs kernel: [59536.977442] free:36686 free_pcp:119 free_cma:0
> [...]
> Aug 1 06:30:45 fs kernel: [59536.996407] Node 0 DMA32: 5909*4kB (UME) 3800*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54036kB
> Aug 1 06:30:45 fs kernel: [59536.997846] Node 0 Normal: 4041*4kB (UME) 6799*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 70556kB
>
> the amount of dirty pages is much smaller as well as the anonymous
> memory. The biggest portion seems to be in the page cache. The memory

The page cache will always be full if I'm writing at full steam to multiple
drives, no?

> is till hugely fragmented though. In fact if we check all the OOM
> invocations the only consistent thing is that the memory is fragmented
> and the compaction cannot make sufficient progress consistently. We can
> assume that the situation actually gets better because there are some
> holes between those OOMs so we can assume that something has unpinned a
> larger amount memory and allowed the compaction to make further progress
> or that the load has strong peaks. We would need more information from
> the compaction to know better. Vlastimil will surely tell you which
> tracepoints to enable.
----------------------------------------------------------------------
The information contained in this transmission may be confidential. Any disclosure, copying, or further distribution of confidential information is not permitted unless such privilege is explicitly granted in writing by Quantum. Quantum reserves the right to have electronic communications, including email and attachments, sent across its networks filtered through anti virus and spam software programs and retain such messages in order to comply with applicable data security and retention requirements. Quantum is not responsible for the proper and complete transmission of the substance of this communication or for any delay in its receipt.
* Re: OOM killer changes
From: Michal Hocko @ 2016-08-01 19:43 UTC
To: Ralf-Peter Rohbeck; +Cc: linux-mm, Vlastimil Babka

On Mon 01-08-16 12:35:51, Ralf-Peter Rohbeck wrote:
> On 01.08.2016 12:26, Michal Hocko wrote:
[...]
> > the amount of dirty pages is much smaller as well as the anonymous
> > memory. The biggest portion seems to be in the page cache. The memory
>
> The page cache will always be full if I'm writing at full steam to multiple
> drives, no?

Yes, memory full of page cache is not unusual. A large portion of
that memory being dirty or under writeback can be a problem, though.
That is why we have dirty memory throttling, which slows down
(throttles) writers to keep the amount reasonable. What is your dirty
throttling setup?
$ grep . /proc/sys/vm/dirty*

And what is your storage setup?
--
Michal Hocko
SUSE Labs
* Re: OOM killer changes
From: Ralf-Peter Rohbeck @ 2016-08-01 19:52 UTC
To: Michal Hocko; +Cc: linux-mm, Vlastimil Babka

On 01.08.2016 12:43, Michal Hocko wrote:
> On Mon 01-08-16 12:35:51, Ralf-Peter Rohbeck wrote:
>> On 01.08.2016 12:26, Michal Hocko wrote:
> [...]
>>> the amount of dirty pages is much smaller as well as the anonymous
>>> memory. The biggest portion seems to be in the page cache. The memory
>> The page cache will always be full if I'm writing at full steam to multiple
>> drives, no?
> Yes, the memory full of page cache is not unusual. The large portion of
> that memory being dirty/writeback can be a problem. That is why we have
> a dirty memory throttling which slows down (throttles) writers to keep
> the amount reasonable. What is your dirty throttling setup?
> $ grep . /proc/sys/vm/dirty*
>
> and what is your storage setup?

root@fs:~# grep . /proc/sys/vm/dirty*
/proc/sys/vm/dirty_background_bytes:0
/proc/sys/vm/dirty_background_ratio:10
/proc/sys/vm/dirty_bytes:0
/proc/sys/vm/dirty_expire_centisecs:3000
/proc/sys/vm/dirty_ratio:20
/proc/sys/vm/dirtytime_expire_seconds:43200
/proc/sys/vm/dirty_writeback_centisecs:500


Storage setup:

root@fs:~# lsscsi
[0:2:0:0]    disk    LSI      MR9271-8iCC       3.29  /dev/sda
[0:2:1:0]    disk    LSI      MR9271-8iCC       3.29  /dev/sdb
[9:0:0:0]    disk    TOSHIBA  External USB 3.0  5438  /dev/sdf
[10:0:0:0]   disk    Seagate  Backup+ Desk      050B  /dev/sdc
[11:0:0:0]   disk    Seagate  Expansion Desk    9400  /dev/sdd
[12:0:0:0]   disk    Seagate  Backup+ Desk      050B  /dev/sde
[13:0:0:0]   disk    Seagate  Expansion Desk    9400  /dev/sdg
[14:0:0:0]   disk    TOSHIBA  External USB 3.0  5438  /dev/sdl
[15:0:0:0]   disk    Seagate  Expansion Desk    9400  /dev/sdh
[16:0:0:0]   disk    Seagate  Expansion Desk    9400  /dev/sdi
[17:0:0:0]   disk    TOSHIBA  External USB 3.0  5438  /dev/sdm
[18:0:0:0]   disk    Seagate  Expansion Desk    9400  /dev/sdj
[19:0:0:0]   disk    Seagate  Expansion Desk    9400  /dev/sdk

sda is a 6x 1TB RAID5 and sdb is a single 480GB SSD, both on a MegaRAID
controller.

The rest are 4TB USB drives that I'm experimenting with.
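The `grep .` output above can be turned into a small lookup table to see which limit is actually in force. This is a sketch with hypothetical helper names; it relies on the documented kernel behavior that `dirty_bytes` and `dirty_ratio` are alternatives (writing one makes the other read back as 0), and likewise for the background pair.

```python
# Output of `grep . /proc/sys/vm/dirty*` as posted in the thread.
DUMP = """\
/proc/sys/vm/dirty_background_bytes:0
/proc/sys/vm/dirty_background_ratio:10
/proc/sys/vm/dirty_bytes:0
/proc/sys/vm/dirty_expire_centisecs:3000
/proc/sys/vm/dirty_ratio:20
/proc/sys/vm/dirtytime_expire_seconds:43200
/proc/sys/vm/dirty_writeback_centisecs:500
"""

def parse_sysctl_dump(text):
    """Map knob name -> integer value from 'path:value' lines."""
    settings = {}
    for line in text.strip().splitlines():
        path, _, value = line.strip().rpartition(":")
        settings[path.rsplit("/", 1)[-1]] = int(value)
    return settings

def dirty_limit_mode(settings):
    """'bytes' if an absolute cap is set, otherwise the ratio is in force."""
    return "bytes" if settings.get("dirty_bytes", 0) != 0 else "ratio"

settings = parse_sysctl_dump(DUMP)
print("limit mode:", dirty_limit_mode(settings))
print("dirty_ratio (%):", settings["dirty_ratio"])
print("expire after (s):", settings["dirty_expire_centisecs"] / 100)
print("flusher wakeup (s):", settings["dirty_writeback_centisecs"] / 100)
```

With the posted values, both byte knobs are 0, so the percentage-based limits (10% background, 20% hard) apply.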
* Re: OOM killer changes
From: Michal Hocko @ 2016-08-01 20:09 UTC
To: Ralf-Peter Rohbeck; +Cc: linux-mm, Vlastimil Babka

On Mon 01-08-16 12:52:40, Ralf-Peter Rohbeck wrote:
> On 01.08.2016 12:43, Michal Hocko wrote:
> > On Mon 01-08-16 12:35:51, Ralf-Peter Rohbeck wrote:
> > > On 01.08.2016 12:26, Michal Hocko wrote:
> > [...]
> > > > the amount of dirty pages is much smaller as well as the anonymous
> > > > memory. The biggest portion seems to be in the page cache. The memory
> > > The page cache will always be full if I'm writing at full steam to multiple
> > > drives, no?
> > Yes, the memory full of page cache is not unusual. The large portion of
> > that memory being dirty/writeback can be a problem. That is why we have
> > a dirty memory throttling which slows down (throttles) writers to keep
> > the amount reasonable. What is your dirty throttling setup?
> > $ grep . /proc/sys/vm/dirty*
> >
> > and what is your storage setup?
>
> root@fs:~# grep . /proc/sys/vm/dirty*
> /proc/sys/vm/dirty_background_bytes:0
> /proc/sys/vm/dirty_background_ratio:10
> /proc/sys/vm/dirty_bytes:0
> /proc/sys/vm/dirty_expire_centisecs:3000
> /proc/sys/vm/dirty_ratio:20

With your 8G of RAM this can be quite a lot of dirty data at once. Is
your storage able to write that back in a reasonable time? I mean, this
shouldn't trigger the OOM killer, but it can lead to some unexpected
stalls, especially when there are a lot of writers, AFAIU. The
dirty_bytes knob should help to define a better cap.

> /proc/sys/vm/dirtytime_expire_seconds:43200
> /proc/sys/vm/dirty_writeback_centisecs:500
>
>
> Storage setup:
>
> root@fs:~# lsscsi
> [0:2:0:0]    disk    LSI      MR9271-8iCC       3.29  /dev/sda
> [0:2:1:0]    disk    LSI      MR9271-8iCC       3.29  /dev/sdb
> [9:0:0:0]    disk    TOSHIBA  External USB 3.0  5438  /dev/sdf
> [10:0:0:0]   disk    Seagate  Backup+ Desk      050B  /dev/sdc
> [11:0:0:0]   disk    Seagate  Expansion Desk    9400  /dev/sdd
> [12:0:0:0]   disk    Seagate  Backup+ Desk      050B  /dev/sde
> [13:0:0:0]   disk    Seagate  Expansion Desk    9400  /dev/sdg
> [14:0:0:0]   disk    TOSHIBA  External USB 3.0  5438  /dev/sdl
> [15:0:0:0]   disk    Seagate  Expansion Desk    9400  /dev/sdh
> [16:0:0:0]   disk    Seagate  Expansion Desk    9400  /dev/sdi
> [17:0:0:0]   disk    TOSHIBA  External USB 3.0  5438  /dev/sdm
> [18:0:0:0]   disk    Seagate  Expansion Desk    9400  /dev/sdj
> [19:0:0:0]   disk    Seagate  Expansion Desk    9400  /dev/sdk
>
> sda is a 6x 1TB RAID5 and sdb is a single 480GB SSD, both on a MegaRAID
> controller.
>
> The rest are 4TB USB drives that I'm experimenting with.

Which devices were you writing to when you hit the OOM killer?
--
Michal Hocko
SUSE Labs
* Re: OOM killer changes
From: Ralf-Peter Rohbeck @ 2016-08-01 20:16 UTC
To: Michal Hocko; +Cc: linux-mm, Vlastimil Babka

On 08/01/16 13:09, Michal Hocko wrote:
> On Mon 01-08-16 12:52:40, Ralf-Peter Rohbeck wrote:
>> On 01.08.2016 12:43, Michal Hocko wrote:
>>> On Mon 01-08-16 12:35:51, Ralf-Peter Rohbeck wrote:
>>>> On 01.08.2016 12:26, Michal Hocko wrote:
>>> [...]
>>>>> the amount of dirty pages is much smaller as well as the anonymous
>>>>> memory. The biggest portion seems to be in the page cache. The memory
>>>> The page cache will always be full if I'm writing at full steam to multiple
>>>> drives, no?
>>> Yes, the memory full of page cache is not unusual. The large portion of
>>> that memory being dirty/writeback can be a problem. That is why we have
>>> a dirty memory throttling which slows down (throttles) writers to keep
>>> the amount reasonable. What is your dirty throttling setup?
>>> $ grep . /proc/sys/vm/dirty*
>>>
>>> and what is your storage setup?
>> root@fs:~# grep . /proc/sys/vm/dirty*
>> /proc/sys/vm/dirty_background_bytes:0
>> /proc/sys/vm/dirty_background_ratio:10
>> /proc/sys/vm/dirty_bytes:0
>> /proc/sys/vm/dirty_expire_centisecs:3000
>> /proc/sys/vm/dirty_ratio:20
> With your 8G of RAM this can be quite a lot of dirty data at once. Is
> your storage able to write that back in a reasonable time? I mean this
> shouldn't cause the OOM killer but it can lead to some unexpected stalls
> especially when there are a lot of writers AFAIU. dirty_bytes knob
> should help to define a better cap.

The main filesystems are on the MegaRAID and can do 500-600 MB/s.
Writing to the USB drives only pushes about 90 MB/s per drive.

>
>> /proc/sys/vm/dirtytime_expire_seconds:43200
>> /proc/sys/vm/dirty_writeback_centisecs:500
>>
>>
>> Storage setup:
>>
>> root@fs:~# lsscsi
>> [0:2:0:0]   disk  LSI      MR9271-8iCC      3.29  /dev/sda
>> [0:2:1:0]   disk  LSI      MR9271-8iCC      3.29  /dev/sdb
>> [9:0:0:0]   disk  TOSHIBA  External USB 3.0 5438  /dev/sdf
>> [10:0:0:0]  disk  Seagate  Backup+ Desk     050B  /dev/sdc
>> [11:0:0:0]  disk  Seagate  Expansion Desk   9400  /dev/sdd
>> [12:0:0:0]  disk  Seagate  Backup+ Desk     050B  /dev/sde
>> [13:0:0:0]  disk  Seagate  Expansion Desk   9400  /dev/sdg
>> [14:0:0:0]  disk  TOSHIBA  External USB 3.0 5438  /dev/sdl
>> [15:0:0:0]  disk  Seagate  Expansion Desk   9400  /dev/sdh
>> [16:0:0:0]  disk  Seagate  Expansion Desk   9400  /dev/sdi
>> [17:0:0:0]  disk  TOSHIBA  External USB 3.0 5438  /dev/sdm
>> [18:0:0:0]  disk  Seagate  Expansion Desk   9400  /dev/sdj
>> [19:0:0:0]  disk  Seagate  Expansion Desk   9400  /dev/sdk
>>
>> sda is a 6x 1TB RAID5 and sdb is a single 480GB SSD, both on a MegaRAID
>> controller.
>>
>> The rest are 4TB USB drives that I'm experimenting with.
> Which devices did you write when hitting the OOM killer?

sdc, sdd and sde each at max speed, with a little bit of garden variety IO
on sda and sdb.
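For a rough sense of scale behind "quite a lot of dirty data at once", the ratios reported above can be turned into byte counts. This is only a back-of-the-envelope sketch: it assumes the ratios apply to the full 8 GiB (the kernel applies them to dirtyable memory, so the real thresholds are somewhat lower) and that the three USB drives sustain the quoted 90 MB/s each. The helper names are hypothetical.

```python
# Back-of-the-envelope estimate of the dirty thresholds discussed above.
# Assumption: ratios are applied to total RAM; the kernel really uses
# "dirtyable" memory, so actual limits are somewhat lower.
GIB = 1024 ** 3
MB = 1000 ** 2

def dirty_limits(ram_bytes, background_ratio, dirty_ratio):
    """Return (background, hard) dirty thresholds in bytes."""
    return (ram_bytes * background_ratio // 100,
            ram_bytes * dirty_ratio // 100)

def drain_seconds(dirty_bytes, drives, mb_per_s_each):
    """Time to write back dirty_bytes if every drive runs at full speed."""
    return dirty_bytes / (drives * mb_per_s_each * MB)

ram = 8 * GIB
background, hard = dirty_limits(ram, background_ratio=10, dirty_ratio=20)

print(f"background threshold: {background / GIB:.2f} GiB")
print(f"hard threshold:       {hard / GIB:.2f} GiB")
print(f"drain time, 3 USB drives at 90 MB/s: {drain_seconds(hard, 3, 90):.1f} s")
```

Under these assumptions a writer can build up well over a gigabyte of dirty page cache before being throttled, which is the backlog Michal suggests capping with dirty_bytes.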
* Re: OOM killer changes
From: Michal Hocko @ 2016-08-01 20:26 UTC
To: Ralf-Peter Rohbeck; +Cc: linux-mm, Vlastimil Babka

On Mon 01-08-16 13:16:49, Ralf-Peter Rohbeck wrote:
> On 08/01/16 13:09, Michal Hocko wrote:
> > On Mon 01-08-16 12:52:40, Ralf-Peter Rohbeck wrote:
[...]
> > > sda is a 6x 1TB RAID5 and sdb is a single 480GB SSD, both on a MegaRAID
> > > controller.
> > >
> > > The rest are 4TB USB drives that I'm experimenting with.
> > Which devices did you write when hitting the OOM killer?
> sdc, sdd and sde each at max speed, with a little bit of garden variety IO
> on sda and sdb.

So do I get it right that the majority of the IO is to those slower USB
disks? If yes then does lowering the dirty_bytes to something smaller
help?
-- 
Michal Hocko
SUSE Labs
* Re: OOM killer changes
From: Ralf-Peter Rohbeck @ 2016-08-01 21:14 UTC
To: Michal Hocko; +Cc: linux-mm, Vlastimil Babka

On 01.08.2016 13:26, Michal Hocko wrote:
>
>> sdc, sdd and sde each at max speed, with a little bit of garden variety IO
>> on sda and sdb.
> So do I get it right that the majority of the IO is to those slower USB
> disks? If yes then does lowering the dirty_bytes to something smaller
> help?

Yes, the vast majority.

I set dirty_bytes to 128MiB and started a fairly IO and memory intensive
process and the OOM killer kicked in within a few seconds.

Same with 16MiB dirty_bytes and 1MiB.

Some additional IO load from my fast subsystem is enough:

At 1MiB dirty_bytes,

find /btrfs0/ -type f -exec md5sum {} \;

was enough (where /btrfs0 is on a LVM2 LV and the PV is on sda.) It read
a few dozen files (random stuff with very mixed file sizes, none very
big) until the OOM killer kicked in.

I'll try 4.6.

Ralf-Peter
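The dirty_bytes knob takes a plain byte count, so the three caps tried above translate as follows. This is a minimal sketch; the helper name is hypothetical and it only builds the assignment string rather than touching /proc. As far as the kernel's sysctl documentation goes, writing dirty_bytes makes dirty_ratio read back as 0, since only one of the two is in effect at a time.

```python
# Convert the MiB caps mentioned in the message into vm.dirty_bytes values.
# dirty_bytes_setting() is a hypothetical helper; it only formats the
# assignment, e.g. for use with `sysctl -w`, and changes nothing itself.
MIB = 1024 * 1024

def dirty_bytes_setting(mib):
    """Return the sysctl assignment for a dirty_bytes cap given in MiB."""
    return f"vm.dirty_bytes={mib * MIB}"

for size in (128, 16, 1):
    print(dirty_bytes_setting(size))
```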
* Re: OOM killer changes
From: Ralf-Peter Rohbeck @ 2016-08-01 21:27 UTC
To: Michal Hocko; +Cc: linux-mm, Vlastimil Babka

On 01.08.2016 14:14, Ralf-Peter Rohbeck wrote:
[...]
> At 1MiB dirty_bytes,
>
> find /btrfs0/ -type f -exec md5sum {} \;
>
> was enough (where /btrfs0 is on a LVM2 LV and the PV is on sda.) It
> read a few dozen files (random stuff with very mixed file sizes, none
> very big) until the OOM killer kicked in.
>
> I'll try 4.6.

With Debian 4.6.0.1 (4.6.4-1) it works: Writing to 3 USB drives and
running each of the 3 tests that triggered the OOM killer in parallel,
with default dirty settings.

Ralf-Peter
* Re: OOM killer changes
From: Michal Hocko @ 2016-08-02 7:10 UTC
To: Ralf-Peter Rohbeck; +Cc: linux-mm, Vlastimil Babka

On Mon 01-08-16 14:27:51, Ralf-Peter Rohbeck wrote:
[...]
> With Debian 4.6.0.1 (4.6.4-1) it works: Writing to 3 USB drives and running
> each of the 3 tests that triggered the OOM killer in parallel, with default
> dirty settings.

Thanks for retesting! Now that it seems you are able to reproduce this,
could you do some experiments, please? First of all it would be great to
find out why we do not retry the compaction and whether it could make
some progress. The patch below will tell us the first part. Tracepoints
can tell us the other part. Vlastimil, could you recommend some which
would give us some hints without generating way too much output?
---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8b3e1341b754..a10b29a918d4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3274,6 +3274,7 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 			*migrate_mode = MIGRATE_SYNC_LIGHT;
 			return true;
 		}
+		pr_info("XXX: compaction_failed\n");
 		return false;
 	}
 
@@ -3283,8 +3284,12 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 	 * But do not retry if the given zonelist is not suitable for
 	 * compaction.
 	 */
-	if (compaction_withdrawn(compact_result))
-		return compaction_zonelist_suitable(ac, order, alloc_flags);
+	if (compaction_withdrawn(compact_result)) {
+		int ret = compaction_zonelist_suitable(ac, order, alloc_flags);
+
+		if (!ret)
+			pr_info("XXX: no zone suitable for compaction\n");
+		return ret;
+	}
 
 	/*
 	 * !costly requests are much more important than __GFP_REPEAT
@@ -3299,6 +3304,7 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 	if (compaction_retries <= max_retries)
 		return true;
 
+	pr_info("XXX: compaction retries fail after %d\n", compaction_retries);
 	return false;
 }
 #else
-- 
Michal Hocko
SUSE Labs
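The three pr_info() calls in the patch mark the three places where should_compact_retry() gives up, so each "XXX:" line in the log points at one exit. The following is a rough Python model of that flow, reconstructed only from the hunks shown. The enum, the string migrate modes, the check for the "async" mode and the default max_retries=16 are assumptions made for illustration; the real function has further logic outside the quoted context.

```python
# Simplified model of the give-up paths instrumented by the patch.
# Assumptions (not from the thread): Result enum, string migrate modes,
# the "async" test, and max_retries=16.
from enum import Enum

class Result(Enum):
    FAILED = "failed"        # compaction finished without a usable page
    WITHDRAWN = "withdrawn"  # compaction was deferred or bailed out early
    OTHER = "other"          # any other outcome

def should_compact_retry(result, migrate_mode, zonelist_suitable,
                         retries, max_retries=16):
    """Return (retry, migrate_mode, debug_message_or_None)."""
    if result is Result.FAILED:
        if migrate_mode == "async":
            # Retry once more with a stronger migration mode.
            return True, "sync_light", None
        return False, migrate_mode, "XXX: compaction_failed"
    if result is Result.WITHDRAWN:
        if zonelist_suitable:
            return True, migrate_mode, None
        return False, migrate_mode, "XXX: no zone suitable for compaction"
    if retries <= max_retries:
        return True, migrate_mode, None
    return False, migrate_mode, f"XXX: compaction retries fail after {retries}"

print(should_compact_retry(Result.FAILED, "sync_light", True, 0))
print(should_compact_retry(Result.WITHDRAWN, "async", False, 3))
print(should_compact_retry(Result.OTHER, "async", True, 17))
```

Seeing which of the three messages precedes an OOM report would then show whether compaction declared failure, was never considered suitable, or simply ran out of retries.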
* Re: OOM killer changes
From: Ralf-Peter Rohbeck @ 2016-08-02 19:25 UTC
To: Michal Hocko; +Cc: linux-mm, Vlastimil Babka

I can do that but it'll be later this week.

Ralf-Peter

On 08/02/2016 12:10 AM, Michal Hocko wrote:
[...]
> Thanks for retesting! Now that it seems you are able to reproduce this,
> could you do some experiments, please? First of all it would be great to
> find out why we do not retry the compaction and whether it could make
> some progress. The patch below will tell us the first part. Tracepoints
> can tell us the other part. Vlastimil, could you recommend some which
> would give us some hints without generating way too much output?
[...]
* Re: OOM killer changes
From: Ralf-Peter Rohbeck @ 2016-08-15 4:48 UTC
To: Michal Hocko; +Cc: linux-mm, Vlastimil Babka

On 02.08.2016 12:25, Ralf-Peter Rohbeck wrote:
> I can do that but it'll be later this week.
[...]

Took me a little longer than expected due to work. The failure wouldn't
happen for a while and so I started a couple of scripts and let them
run. When I checked today the server didn't respond on the network and
sure enough it had killed everything. This is with 4.7.0 with the config
based on Debian 4.7-rc7.

trace_pipe got a little big (5GB) so I uploaded the logs to
https://filebin.net/box0wycfouvhl6sr/OOM_4.7.0.tar.bz2. before_btrfs is
before the btrfs filesystems were mounted.
I did run a btrfs balance because it creates IO load and I needed to
balance anyway. Maybe that's what caused it?
I'll make the changes requested by Michal and try again.

Thanks,
Ralf-Peter
* Re: OOM killer changes
From: Vlastimil Babka @ 2016-08-15 9:16 UTC
To: Ralf-Peter Rohbeck, Michal Hocko; +Cc: linux-mm

On 08/15/2016 06:48 AM, Ralf-Peter Rohbeck wrote:
> On 02.08.2016 12:25, Ralf-Peter Rohbeck wrote:
>>
> Took me a little longer than expected due to work. The failure wouldn't
> happen for a while and so I started a couple of scripts and let them
> run. When I checked today the server didn't respond on the network and
> sure enough it had killed everything. This is with 4.7.0 with the config
> based on Debian 4.7-rc7.
>
> trace_pipe got a little big (5GB) so I uploaded the logs to
> https://filebin.net/box0wycfouvhl6sr/OOM_4.7.0.tar.bz2. before_btrfs is
> before the btrfs filesystems were mounted.
> I did run a btrfs balance because it creates IO load and I needed to
> balance anyway. Maybe that's what caused it?

pgmigrate_success 46738962
pgmigrate_fail 135649772
compact_migrate_scanned 309726659
compact_free_scanned 9715615169
compact_isolated 229689596
compact_stall 4777
compact_fail 3068
compact_success 1709
compact_daemon_wake 207834

The migration failures are quite enormous. Very quick analysis of the
trace seems to confirm that these are mostly "real", as opposed to result
of failure to isolate free pages for migration targets, although the free
scanner spent a lot of time:

> grep "nr_failed=32" -B1 trace_pipe.log | grep isolate_freepages.*nr_taken=0 | wc -l
3246

So is it one of the cases where fs is unable to migrate dirty/writeback pages?

Vlastimil

> I'll make the changes requested by Michal and try again.
>
> Thanks,
> Ralf-Peter
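To put "quite enormous" into numbers, the counters quoted above can be reduced to two failure rates. The sketch below uses only the values from the message; the variable and function names are hypothetical. In this sample compact_fail plus compact_success equals compact_stall, so compact_stall is used as the denominator for the compaction rate.

```python
# Failure rates derived from the /proc/vmstat counters quoted above.
counters = {
    "pgmigrate_success": 46738962,
    "pgmigrate_fail": 135649772,
    "compact_stall": 4777,
    "compact_fail": 3068,
    "compact_success": 1709,
}

def fraction(part, whole):
    """Plain ratio, kept as a helper so the two rates read the same way."""
    return part / whole

migrations = counters["pgmigrate_success"] + counters["pgmigrate_fail"]
migrate_fail_rate = fraction(counters["pgmigrate_fail"], migrations)
compact_fail_rate = fraction(counters["compact_fail"], counters["compact_stall"])

print(f"page migrations failed:     {migrate_fail_rate:.1%}")
print(f"direct compactions failed:  {compact_fail_rate:.1%}")
```

Roughly three out of four page migrations failed, and close to two thirds of the direct compaction attempts ended without success.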
* Re: OOM killer changes
From: Michal Hocko @ 2016-08-15 15:01 UTC
To: Vlastimil Babka; +Cc: Ralf-Peter Rohbeck, linux-mm

On Mon 15-08-16 11:16:36, Vlastimil Babka wrote:
[...]
> pgmigrate_success 46738962
> pgmigrate_fail 135649772
> compact_migrate_scanned 309726659
> compact_free_scanned 9715615169
> compact_isolated 229689596
> compact_stall 4777
> compact_fail 3068
> compact_success 1709
> compact_daemon_wake 207834
>
> The migration failures are quite enormous. Very quick analysis of the
> trace seems to confirm that these are mostly "real", as opposed to result
> of failure to isolate free pages for migration targets, although the free
> scanner spent a lot of time:
>
> > grep "nr_failed=32" -B1 trace_pipe.log | grep isolate_freepages.*nr_taken=0 | wc -l
> 3246
>
> So is it one of the cases where fs is unable to migrate dirty/writeback pages?

It smells that way. Now we should find out why and what we can do about
that. I suspect that try_to_release_page is not able to release the page
for migration. Btrfs doesn't seem to have migratepage for page cache
pages so it should go via fallback_migrate_page.

The following diff should tell us whether this is really the case. Just
open trace_pipe and see whether this path really triggered.
---
diff --git a/mm/migrate.c b/mm/migrate.c
index 72c09dea6526..120e2e5fcbea 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -729,8 +729,10 @@ static int fallback_migrate_page(struct address_space *mapping,
 	 * We must have no buffers or drop them.
 	 */
 	if (page_has_private(page) &&
-	    !try_to_release_page(page, GFP_KERNEL))
+	    !try_to_release_page(page, GFP_KERNEL)) {
+		trace_printk("try_to_release_page failed for a_ops:%pS\n", mapping->a_ops);
 		return -EAGAIN;
+	}
 
 	return migrate_page(mapping, newpage, page, mode);
 }
-- 
Michal Hocko
SUSE Labs
* Re: OOM killer changes
From: Ralf-Peter Rohbeck @ 2016-08-15 18:42 UTC
To: Michal Hocko, Vlastimil Babka; +Cc: linux-mm

This time the OOM killer hit much quicker. No btrfs balance, just compiling
the kernel with the new change did it.
Much smaller logs so I'm attaching them.

Ralf-Peter

On 15.08.2016 08:01, Michal Hocko wrote:
[...]
> It smells that way. Now we should find out why and what can we do about
> that. I suspect that try_to_release_page is not able to release the page
> for migration. Btrfs doesn't seem to have migratepage for page cache
> pages so it should go via fallback_migrate_page.
>
> The following diff should tell us whether this is really the case. Just
> open trace_pipe and see whether this path really triggered.
[...]

[-- Attachment #2: OOM_4.7.0_p1.tar.bz2 --]
[-- Type: application/x-bzip, Size: 2325210 bytes --]
* Re: OOM killer changes
From: Michal Hocko @ 2016-08-16 7:32 UTC
To: Ralf-Peter Rohbeck; +Cc: Vlastimil Babka, linux-mm

On Mon 15-08-16 11:42:11, Ralf-Peter Rohbeck wrote:
> This time the OOM killer hit much quicker. No btrfs balance, just compiling
> the kernel with the new change did it.
> Much smaller logs so I'm attaching them.

Just to clarify. You have added the trace_printk for try_to_release_page,
right? (after fixing it of course). If yes there is no single mention of
that path failing which would support Joonsoo's theory... Could you try
with his patch?
-- 
Michal Hocko
SUSE Labs
* Re: OOM killer changes
From: Michal Hocko @ 2016-08-16 7:43 UTC
To: Ralf-Peter Rohbeck; +Cc: Vlastimil Babka, linux-mm

On Tue 16-08-16 09:32:46, Michal Hocko wrote:
> On Mon 15-08-16 11:42:11, Ralf-Peter Rohbeck wrote:
> > This time the OOM killer hit much quicker. No btrfs balance, just compiling
> > the kernel with the new change did it.
> > Much smaller logs so I'm attaching them.
>
> Just to clarify. You have added the trace_printk for
> try_to_release_page, right? (after fixing it of course). If yes there is
> no single mention of that path failing which would support Joonsoo's
> theory... Could you try with his patch?

And then it would be great if you could test with the current linux-next
tree. Vlastimil has done some changes which might help. But even if they
don't then it would be better to add more changes on top of them.
-- 
Michal Hocko
SUSE Labs
* Re: OOM killer changes
From: Ralf-Peter Rohbeck @ 2016-08-17 9:14 UTC
To: Michal Hocko; +Cc: Vlastimil Babka, linux-mm

On 16.08.2016 00:43, Michal Hocko wrote:
> On Tue 16-08-16 09:32:46, Michal Hocko wrote:
>> On Mon 15-08-16 11:42:11, Ralf-Peter Rohbeck wrote:
>>> This time the OOM killer hit much quicker. No btrfs balance, just compiling
>>> the kernel with the new change did it.
>>> Much smaller logs so I'm attaching them.
>> Just to clarify. You have added the trace_printk for
>> try_to_release_page, right? (after fixing it of course). If yes there is
>> no single mention of that path failing which would support Joonsoo's
>> theory... Could you try with his patch?
> And then it would be great if you could test with the current linux-next
> tree. Vlastimil has done some changes which might help. But even if they
> don't then it would be better to add more changes on top of them.

Results with 4.8.0-rc2 are attached. OOM happened rather quickly.

Ralf-Peter

[-- Attachment #2: OOM_4.8.0-rc2.tar.bz2 --]
[-- Type: application/x-bzip, Size: 83610 bytes --]
* Re: OOM killer changes 2016-08-17 9:14 ` Ralf-Peter Rohbeck @ 2016-08-17 9:23 ` Vlastimil Babka 2016-08-17 9:28 ` Ralf-Peter Rohbeck 0 siblings, 1 reply; 50+ messages in thread From: Vlastimil Babka @ 2016-08-17 9:23 UTC (permalink / raw) To: Ralf-Peter Rohbeck, Michal Hocko; +Cc: linux-mm On 08/17/2016 11:14 AM, Ralf-Peter Rohbeck wrote: > On 16.08.2016 00:43, Michal Hocko wrote: >> On Tue 16-08-16 09:32:46, Michal Hocko wrote: >>> On Mon 15-08-16 11:42:11, Ralf-Peter Rohbeck wrote: >>>> This time the OOM killer hit much quicker. No btrfs balance, just compiling >>>> the kernel with the new change did it. >>>> Much smaller logs so I'm attaching them. >>> Just to clarify. You have added the trace_printk for >>> try_to_release_page, right? (after fixing it of course). If yes there is >>> no single mention of that path failing which would support Joonsoo's >>> theory... Could you try with his patch? >> And then it would be great if you could test with the current linux-next >> tree. Vlastimil has done some changes which might help. But even if they >> don't then it would be better to add more changes on top of them. > > Results with 4.8.0-rc2 are attached. OOM happened rather quickly. 4.8.0-rc2 is not "linux-next". What Michal meant is the linux-next git (there's no tarball on kernel.org for it): git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git > Ralf-Peter > > ---------------------------------------------------------------------- > The information contained in this transmission may be confidential. Any disclosure, copying, or further distribution of confidential information is not permitted unless such privilege is explicitly granted in writing by Quantum. Quantum reserves the right to have electronic communications, including email and attachments, sent across its networks filtered through anti virus and spam software programs and retain such messages in order to comply with applicable data security and retention requirements. 
* Re: OOM killer changes 2016-08-17 9:23 ` Vlastimil Babka @ 2016-08-17 9:28 ` Ralf-Peter Rohbeck 2016-08-17 9:33 ` Michal Hocko 0 siblings, 1 reply; 50+ messages in thread From: Ralf-Peter Rohbeck @ 2016-08-17 9:28 UTC (permalink / raw) To: Vlastimil Babka, Michal Hocko; +Cc: linux-mm On 17.08.2016 02:23, Vlastimil Babka wrote: > On 08/17/2016 11:14 AM, Ralf-Peter Rohbeck wrote: >> On 16.08.2016 00:43, Michal Hocko wrote: >>> On Tue 16-08-16 09:32:46, Michal Hocko wrote: >>>> On Mon 15-08-16 11:42:11, Ralf-Peter Rohbeck wrote: >>>>> This time the OOM killer hit much quicker. No btrfs balance, just >>>>> compiling >>>>> the kernel with the new change did it. >>>>> Much smaller logs so I'm attaching them. >>>> Just to clarify. You have added the trace_printk for >>>> try_to_release_page, right? (after fixing it of course). If yes >>>> there is >>>> no single mention of that path failing which would support Joonsoo's >>>> theory... Could you try with his patch? >>> And then it would be great if you could test with the current >>> linux-next >>> tree. Vlastimil has done some changes which might help. But even if >>> they >>> don't then it would be better to add more changes on top of them. >> >> Results with 4.8.0-rc2 are attached. OOM happened rather quickly. > > 4.8.0-rc2 is not "linux-next". What Michal meant is the linux-next git > (there's no tarball on kernel.org for it): > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git Hmm. I added linux-next git, fetched it etc but apparently I didn't check out the right branch. Do you want next-20160817? ---------------------------------------------------------------------- The information contained in this transmission may be confidential. Any disclosure, copying, or further distribution of confidential information is not permitted unless such privilege is explicitly granted in writing by Quantum. 
* Re: OOM killer changes 2016-08-17 9:28 ` Ralf-Peter Rohbeck @ 2016-08-17 9:33 ` Michal Hocko 2016-08-17 23:37 ` Ralf-Peter Rohbeck 0 siblings, 1 reply; 50+ messages in thread From: Michal Hocko @ 2016-08-17 9:33 UTC (permalink / raw) To: Ralf-Peter Rohbeck; +Cc: Vlastimil Babka, linux-mm On Wed 17-08-16 02:28:35, Ralf-Peter Rohbeck wrote: > On 17.08.2016 02:23, Vlastimil Babka wrote: [...] > > 4.8.0-rc2 is not "linux-next". What Michal meant is the linux-next git > > (there's no tarball on kernel.org for it): > > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git > > Hmm. I added linux-next git, fetched it etc but apparently I didn't check > out the right branch. Do you want next-20160817? Yes this one should be OK. It contains Vlastimil's patches. Thanks! -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: OOM killer changes 2016-08-17 9:33 ` Michal Hocko @ 2016-08-17 23:37 ` Ralf-Peter Rohbeck 2016-08-18 6:57 ` Vlastimil Babka 0 siblings, 1 reply; 50+ messages in thread From: Ralf-Peter Rohbeck @ 2016-08-17 23:37 UTC (permalink / raw) To: Michal Hocko; +Cc: Vlastimil Babka, linux-mm On 17.08.2016 02:33, Michal Hocko wrote: > On Wed 17-08-16 02:28:35, Ralf-Peter Rohbeck wrote: >> On 17.08.2016 02:23, Vlastimil Babka wrote: > [...] >>> 4.8.0-rc2 is not "linux-next". What Michal meant is the linux-next git >>> (there's no tarball on kernel.org for it): >>> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git >> Hmm. I added linux-next git, fetched it etc but apparently I didn't check >> out the right branch. Do you want next-20160817? > Yes this one should be OK. It contains Vlastimil's patches. > > Thanks! This has been working so far. I built a kernel successfully, with dd writing to two drives. There were a number of messages in the trace pipe but compaction/migration always succeeded it seems. I'll run the big torture test overnight. Ralf-Peter ---------------------------------------------------------------------- The information contained in this transmission may be confidential. Any disclosure, copying, or further distribution of confidential information is not permitted unless such privilege is explicitly granted in writing by Quantum. Quantum reserves the right to have electronic communications, including email and attachments, sent across its networks filtered through anti virus and spam software programs and retain such messages in order to comply with applicable data security and retention requirements. Quantum is not responsible for the proper and complete transmission of the substance of this communication or for any delay in its receipt. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
* Re: OOM killer changes 2016-08-17 23:37 ` Ralf-Peter Rohbeck @ 2016-08-18 6:57 ` Vlastimil Babka 2016-08-18 20:01 ` Ralf-Peter Rohbeck 0 siblings, 1 reply; 50+ messages in thread From: Vlastimil Babka @ 2016-08-18 6:57 UTC (permalink / raw) To: Ralf-Peter Rohbeck, Michal Hocko; +Cc: linux-mm On 08/18/2016 01:37 AM, Ralf-Peter Rohbeck wrote: > On 17.08.2016 02:33, Michal Hocko wrote: >> On Wed 17-08-16 02:28:35, Ralf-Peter Rohbeck wrote: >>> On 17.08.2016 02:23, Vlastimil Babka wrote: >> [...] >>>> 4.8.0-rc2 is not "linux-next". What Michal meant is the linux-next git >>>> (there's no tarball on kernel.org for it): >>>> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git >>> Hmm. I added linux-next git, fetched it etc but apparently I didn't check >>> out the right branch. Do you want next-20160817? >> Yes this one should be OK. It contains Vlastimil's patches. >> >> Thanks! > > This has been working so far. I built a kernel successfully, with dd > writing to two drives. There were a number of messages in the trace pipe > but compaction/migration always succeeded it seems. > I'll run the big torture test overnight. Good news, thanks. Did you also apply Joonsoo's suggested removal of suitable_migration_target() check, or is this just the linux-next version with added trace_printk()/pr_info()? Vlastimil > Ralf-Peter > > ---------------------------------------------------------------------- > The information contained in this transmission may be confidential. Any disclosure, copying, or further distribution of confidential information is not permitted unless such privilege is explicitly granted in writing by Quantum. Quantum reserves the right to have electronic communications, including email and attachments, sent across its networks filtered through anti virus and spam software programs and retain such messages in order to comply with applicable data security and retention requirements. 
* Re: OOM killer changes 2016-08-18 6:57 ` Vlastimil Babka @ 2016-08-18 20:01 ` Ralf-Peter Rohbeck 2016-08-18 20:12 ` Vlastimil Babka 0 siblings, 1 reply; 50+ messages in thread From: Ralf-Peter Rohbeck @ 2016-08-18 20:01 UTC (permalink / raw) To: Vlastimil Babka, Michal Hocko; +Cc: linux-mm

On 17.08.2016 23:57, Vlastimil Babka wrote:
>>>> Hmm. I added linux-next git, fetched it etc but apparently I didn't check
>>>> out the right branch. Do you want next-20160817?
>>> Yes this one should be OK. It contains Vlastimil's patches.
>>>
>>> Thanks!
>> This has been working so far. I built a kernel successfully, with dd
>> writing to two drives. There were a number of messages in the trace pipe
>> but compaction/migration always succeeded it seems.
>> I'll run the big torture test overnight.
> Good news, thanks. Did you also apply Joonsoo's suggested removal of
> suitable_migration_target() check, or is this just the linux-next
> version with added trace_printk()/pr_info()?
>
> Vlastimil

Yes, that change was in my test with linux-next-20160817. Here's the diff:

diff --git a/mm/compaction.c b/mm/compaction.c
index f94ae67..60a9ca2 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1083,8 +1083,10 @@ static void isolate_freepages(struct compact_control *cc)
 			continue;

 		/* Check the block is suitable for migration */
+/*
 		if (!suitable_migration_target(page))
 			continue;
+*/

 		/* If isolation recently failed, do not retry */
 		if (!isolation_suitable(cc, page))
diff --git a/mm/migrate.c b/mm/migrate.c
index f7ee04a..b1176a4 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -827,8 +827,10 @@ static int fallback_migrate_page(struct address_space *mapping,
 	 * We must have no buffers or drop them.
 	 */
 	if (page_has_private(page) &&
-	    !try_to_release_page(page, GFP_KERNEL))
+	    !try_to_release_page(page, GFP_KERNEL)) {
+		trace_printk("try_to_release_page failed for a_ops:%pS\n", page->mapping->a_ops);
 		return -EAGAIN;
+	}

 	return migrate_page(mapping, newpage, page, mode);
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5637733..b443652 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3202,8 +3202,12 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 	 * But do not retry if the given zonelist is not suitable for
 	 * compaction.
 	 */
-	if (compaction_withdrawn(compact_result))
-		return compaction_zonelist_suitable(ac, order, alloc_flags);
+	if (compaction_withdrawn(compact_result)) {
+		int ret = compaction_zonelist_suitable(ac, order, alloc_flags);
+		if (!ret)
+			pr_info("XXX: no zone suitable for compaction\n");
+		return ret;
+	}

 	/*
 	 * !costly requests are much more important than __GFP_REPEAT
@@ -3227,6 +3231,7 @@ check_priority:
 		(*compact_priority)--;
 		return true;
 	}
+	pr_info("XXX: compaction retries fail after %d\n", compaction_retries);
 	return false;
 }
 #else

It ran the whole night with continuous torture tests and writing to two drives. No OOM.
Logs are at https://filebin.net/l2kp3iit8dj0fq6q/OOM_4.8.0-next-20160817.tar.bz2.

Thanks for fixing this!

Ralf-Peter

^ permalink raw reply related [flat|nested] 50+ messages in thread
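The retry path that the pr_info() instrumentation in the diff above hooks into can be sketched in userspace C. This is a simplified model under stated assumptions, not the kernel's should_compact_retry(): the names PRIO_*, MAX_RETRIES, struct retry_state and both functions are hypothetical, and the retry limit is an arbitrary illustrative value. The only detail taken from the thread is that a numerically lower priority means more thorough compaction, as implied by "(*compact_priority)--".

```c
#include <stdbool.h>

/* Lower value = more thorough compaction (hypothetical stand-in names). */
enum {
	PRIO_SYNC_FULL  = 0,	/* most thorough level, last resort */
	PRIO_SYNC_LIGHT = 1,
	PRIO_ASYNC      = 2,	/* cheapest level, tried first */
	MIN_PRIO        = PRIO_SYNC_FULL
};

#define MAX_RETRIES 4	/* illustrative value, not the kernel's */

struct retry_state {
	int prio;	/* current compaction priority */
	int retries;	/* retries already spent at this priority */
};

/* Decide whether the allocator should try compaction once more. */
bool should_retry(struct retry_state *s)
{
	/* Still within the retry budget for the current priority. */
	if (s->retries < MAX_RETRIES) {
		s->retries++;
		return true;
	}
	/* Budget exhausted: escalate to a more thorough priority. */
	if (s->prio > MIN_PRIO) {
		s->prio--;
		s->retries = 0;
		return true;
	}
	/* Already at the most thorough priority: give up (OOM follows). */
	return false;
}

/* Count how many retries are granted before the model gives up. */
int attempts_before_giving_up(void)
{
	struct retry_state s = { PRIO_ASYNC, 0 };
	int attempts = 0;

	while (should_retry(&s))
		attempts++;
	return attempts;
}
```

With these illustrative numbers the model grants 14 retries (4 per priority level plus 2 escalations) before returning false, which corresponds to the point where the "compaction retries fail" message would be printed.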
* Re: OOM killer changes 2016-08-18 20:01 ` Ralf-Peter Rohbeck @ 2016-08-18 20:12 ` Vlastimil Babka 2016-08-19 2:42 ` Ralf-Peter Rohbeck 0 siblings, 1 reply; 50+ messages in thread From: Vlastimil Babka @ 2016-08-18 20:12 UTC (permalink / raw) To: Ralf-Peter Rohbeck, Michal Hocko; +Cc: linux-mm On 18.8.2016 22:01, Ralf-Peter Rohbeck wrote: > On 17.08.2016 23:57, Vlastimil Babka wrote: >>>>> Hmm. I added linux-next git, fetched it etc but apparently I didn't check >>>>> out the right branch. Do you want next-20160817? >>>> Yes this one should be OK. It contains Vlastimil's patches. >>>> >>>> Thanks! >>> This has been working so far. I built a kernel successfully, with dd >>> writing to two drives. There were a number of messages in the trace pipe >>> but compaction/migration always succeeded it seems. >>> I'll run the big torture test overnight. >> Good news, thanks. Did you also apply Joonsoo's suggested removal of >> suitable_migration_target() check, or is this just the linux-next >> version with added trace_printk()/pr_info()? >> >> Vlastimil > Yes, that change was in my test with linux-next-20160817. Here's the diff: > > diff --git a/mm/compaction.c b/mm/compaction.c > index f94ae67..60a9ca2 100644 > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -1083,8 +1083,10 @@ static void isolate_freepages(struct > compact_control *cc) > continue; > > /* Check the block is suitable for migration */ > +/* > if (!suitable_migration_target(page)) > continue; > +*/ OK, could you please also try if uncommenting the above still works without OOM? Or just plain linux-next-20160817, I guess we don't need the printk's to test this difference. Thanks a lot! Vlastimil -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: OOM killer changes 2016-08-18 20:12 ` Vlastimil Babka @ 2016-08-19 2:42 ` Ralf-Peter Rohbeck 2016-08-19 6:27 ` Vlastimil Babka 0 siblings, 1 reply; 50+ messages in thread From: Ralf-Peter Rohbeck @ 2016-08-19 2:42 UTC (permalink / raw) To: Vlastimil Babka, Michal Hocko; +Cc: linux-mm [-- Attachment #1: Type: text/plain, Size: 2365 bytes --] On 18.08.2016 13:12, Vlastimil Babka wrote: > On 18.8.2016 22:01, Ralf-Peter Rohbeck wrote: >> On 17.08.2016 23:57, Vlastimil Babka wrote: >>>>>> Hmm. I added linux-next git, fetched it etc but apparently I didn't check >>>>>> out the right branch. Do you want next-20160817? >>>>> Yes this one should be OK. It contains Vlastimil's patches. >>>>> >>>>> Thanks! >>>> This has been working so far. I built a kernel successfully, with dd >>>> writing to two drives. There were a number of messages in the trace pipe >>>> but compaction/migration always succeeded it seems. >>>> I'll run the big torture test overnight. >>> Good news, thanks. Did you also apply Joonsoo's suggested removal of >>> suitable_migration_target() check, or is this just the linux-next >>> version with added trace_printk()/pr_info()? >>> >>> Vlastimil >> Yes, that change was in my test with linux-next-20160817. Here's the diff: >> >> diff --git a/mm/compaction.c b/mm/compaction.c >> index f94ae67..60a9ca2 100644 >> --- a/mm/compaction.c >> +++ b/mm/compaction.c >> @@ -1083,8 +1083,10 @@ static void isolate_freepages(struct >> compact_control *cc) >> continue; >> >> /* Check the block is suitable for migration */ >> +/* >> if (!suitable_migration_target(page)) >> continue; >> +*/ > OK, could you please also try if uncommenting the above still works without OOM? > Or just plain linux-next-20160817, I guess we don't need the printk's to test > this difference. > > Thanks a lot! > Vlastimil > With the two lines back in I had OOMs again. See the attached logs. 
Ralf-Peter [-- Attachment #2: OOM_4.8.0-next-20160817_p2.tar.bz2 --] [-- Type: application/x-bzip, Size: 59669 bytes --] ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: OOM killer changes 2016-08-19 2:42 ` Ralf-Peter Rohbeck @ 2016-08-19 6:27 ` Vlastimil Babka 2016-08-19 7:33 ` Michal Hocko 2016-08-23 5:02 ` Joonsoo Kim 0 siblings, 2 replies; 50+ messages in thread From: Vlastimil Babka @ 2016-08-19 6:27 UTC (permalink / raw) To: Ralf-Peter Rohbeck, Michal Hocko; +Cc: linux-mm On 08/19/2016 04:42 AM, Ralf-Peter Rohbeck wrote: > On 18.08.2016 13:12, Vlastimil Babka wrote: >> On 18.8.2016 22:01, Ralf-Peter Rohbeck wrote: >>> On 17.08.2016 23:57, Vlastimil Babka wrote: >>>> Vlastimil >>> Yes, that change was in my test with linux-next-20160817. Here's the diff: >>> >>> diff --git a/mm/compaction.c b/mm/compaction.c >>> index f94ae67..60a9ca2 100644 >>> --- a/mm/compaction.c >>> +++ b/mm/compaction.c >>> @@ -1083,8 +1083,10 @@ static void isolate_freepages(struct >>> compact_control *cc) >>> continue; >>> >>> /* Check the block is suitable for migration */ >>> +/* >>> if (!suitable_migration_target(page)) >>> continue; >>> +*/ >> OK, could you please also try if uncommenting the above still works without OOM? >> Or just plain linux-next-20160817, I guess we don't need the printk's to test >> this difference. >> >> Thanks a lot! >> Vlastimil >> > With the two lines back in I had OOMs again. See the attached logs. Thanks for the confirmation. We however shouldn't disable the heuristic completely, so here's a compromise patch hooking into the new compaction priorities. Can you please test on top of linux-next? -----8<----- ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: OOM killer changes 2016-08-19 6:27 ` Vlastimil Babka @ 2016-08-19 7:33 ` Michal Hocko 2016-08-19 7:47 ` Vlastimil Babka 2016-08-23 5:02 ` Joonsoo Kim 1 sibling, 1 reply; 50+ messages in thread From: Michal Hocko @ 2016-08-19 7:33 UTC (permalink / raw) To: Vlastimil Babka; +Cc: Ralf-Peter Rohbeck, linux-mm

On Fri 19-08-16 08:27:34, Vlastimil Babka wrote:
> On 08/19/2016 04:42 AM, Ralf-Peter Rohbeck wrote:
> > On 18.08.2016 13:12, Vlastimil Babka wrote:
> >> On 18.8.2016 22:01, Ralf-Peter Rohbeck wrote:
> >>> On 17.08.2016 23:57, Vlastimil Babka wrote:
> >>>> Vlastimil
> >>> Yes, that change was in my test with linux-next-20160817. Here's the diff:
> >>>
> >>> diff --git a/mm/compaction.c b/mm/compaction.c
> >>> index f94ae67..60a9ca2 100644
> >>> --- a/mm/compaction.c
> >>> +++ b/mm/compaction.c
> >>> @@ -1083,8 +1083,10 @@ static void isolate_freepages(struct
> >>> compact_control *cc)
> >>> continue;
> >>>
> >>> /* Check the block is suitable for migration */
> >>> +/*
> >>> if (!suitable_migration_target(page))
> >>> continue;
> >>> +*/
> >> OK, could you please also try if uncommenting the above still works without OOM?
> >> Or just plain linux-next-20160817, I guess we don't need the printk's to test
> >> this difference.
> >>
> >> Thanks a lot!
> >> Vlastimil
> >>
> > With the two lines back in I had OOMs again. See the attached logs.
>
> Thanks for the confirmation.
>
> We however shouldn't disable the heuristic completely, so here's a compromise
> patch hooking into the new compaction priorities. Can you please test on top of
> linux-next?
>
> -----8<-----
> >From 0927cc2a4c6a3247111168eace9012c23d06f9db Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Thu, 18 Aug 2016 16:01:14 +0200
> Subject: [PATCH] mm, compaction: make full priority ignore pageblock
> suitability
>
> Ralf-Peter Rohbeck has reported premature OOMs for order-2 allocations (stack)
> due to OOM rework in 4.7. In his scenario (parallel kernel build and dd writing
> to two drives) many pageblocks get marked as Unmovable and compaction free
> scanner struggles to isolate free pages. Joonsoo Kim pointed out that the free
> scanner skips pageblocks that are not movable to prevent filling them and
> forcing non-movable allocations to fallback to other pageblocks. Such heuristic
> makes sense to help prevent long-term fragmentation, but premature OOMs are
> relatively more urgent problem. As a compromise, this patch disables the
> heuristic only for the ultimate compaction priority.
>
> Reported-by: Ralf-Peter Rohbeck <Ralf-Peter.Rohbeck@quantum.com>
> Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Thanks to both of you! I do agree that we should drop all these
heuristics when we struggle and there is an OOM risk. I have just a
small nit here. I would prefer
s@COMPACT_PRIO_SYNC_FULL@MIN_COMPACT_PRIORITY@ when disabling them
because this would be easier to follow and it would be easier for future
changes. Which brings me to another thing I was suggesting earlier. I
believe we should go to this MIN_COMPACT_PRIORITY only for !costly
requests because costly orders shouldn't get all those exceptions and
risk long term fragmentation issues. We do not have that many costly
requests (except for hugetlb) so it doesn't matter all that much right
now but long term we want to differentiate those I believe.

That being said, let's wait for the feedback on this patch + linux-next.
If it works out I will send a stable 4.7 patch which drops compaction
feedback from should_compact_retry (turn it to the !COMPACTION version)
so that 4.7 users do not suffer from the premature OOM and will ask
Andrew to sneak the compaction patches to 4.8 as they fix a real issue
and the risk is not really high.

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
> mm/compaction.c | 11 ++++++++---
> mm/internal.h | 1 +
> 2 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 0bba270f97ad..884b1baa58df 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -997,8 +997,12 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
>  #ifdef CONFIG_COMPACTION
>
>  /* Returns true if the page is within a block suitable for migration to */
> -static bool suitable_migration_target(struct page *page)
> +static bool suitable_migration_target(struct compact_control *cc,
> +							struct page *page)
>  {
> +	if (cc->ignore_block_suitable)
> +		return true;
> +
>  	/* If the page is a large free page, then disallow migration */
>  	if (PageBuddy(page)) {
>  		/*
> @@ -1083,7 +1087,7 @@ static void isolate_freepages(struct compact_control *cc)
>  			continue;
>
>  		/* Check the block is suitable for migration */
> -		if (!suitable_migration_target(page))
> +		if (!suitable_migration_target(cc, page))
>  			continue;
>
>  		/* If isolation recently failed, do not retry */
> @@ -1656,7 +1660,8 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
>  		.classzone_idx = classzone_idx,
>  		.direct_compaction = true,
>  		.whole_zone = (prio == COMPACT_PRIO_SYNC_FULL),
> -		.ignore_skip_hint = (prio == COMPACT_PRIO_SYNC_FULL)
> +		.ignore_skip_hint = (prio == COMPACT_PRIO_SYNC_FULL),
> +		.ignore_block_suitable = (prio == COMPACT_PRIO_SYNC_FULL)
>  	};
>  	INIT_LIST_HEAD(&cc.freepages);
>  	INIT_LIST_HEAD(&cc.migratepages);
> diff --git a/mm/internal.h b/mm/internal.h
> index 5214bf8e3171..537ac9951f5f 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -178,6 +178,7 @@ struct compact_control {
>  	unsigned long last_migrated_pfn;/* Not yet flushed page being freed */
>  	enum migrate_mode mode;		/* Async or sync migration mode */
>  	bool ignore_skip_hint;		/* Scan blocks even if marked skip */
> +	bool ignore_block_suitable;	/* Scan blocks considered unsuitable */
>  	bool direct_compaction;		/* False from kcompactd or /proc/... */
>  	bool whole_zone;		/* Whole zone should/has been scanned */
>  	int order;			/* order a direct compactor needs */
> --
> 2.9.2
>
>

--
Michal Hocko
SUSE Labs

^ permalink raw reply [flat|nested] 50+ messages in thread
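The effect of the ignore_block_suitable flag in the quoted patch can be illustrated with a small userspace model. This is a sketch under assumptions, not the kernel code: struct pageblock, struct scan_control, the two-valued block type and the sample numbers are all hypothetical, and the real suitable_migration_target() checks more than the movable/unmovable distinction modelled here. The point it shows is the one from the commit message: when most free memory sits in blocks marked unmovable, the free scanner finds very little unless the heuristic is bypassed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of a pageblock's migrate type. */
enum block_type { BLOCK_UNMOVABLE, BLOCK_MOVABLE };

struct pageblock {
	enum block_type type;
	int free_pages;		/* free pages the scanner could isolate */
};

struct scan_control {
	bool ignore_block_suitable;	/* models the new flag */
};

/* Simplified suitability check: only movable blocks are normally targets. */
static bool suitable_target(const struct scan_control *sc,
			    const struct pageblock *b)
{
	if (sc->ignore_block_suitable)
		return true;
	return b->type == BLOCK_MOVABLE;
}

/* Sum the free pages in blocks the scanner is allowed to take from. */
int isolate_free(const struct scan_control *sc,
		 const struct pageblock *blocks, size_t n)
{
	int total = 0;

	for (size_t i = 0; i < n; i++) {
		if (!suitable_target(sc, &blocks[i]))
			continue;	/* skipped by the heuristic */
		total += blocks[i].free_pages;
	}
	return total;
}

/* Run the scan over a made-up zone where unmovable blocks dominate. */
int demo_isolated(bool ignore)
{
	static const struct pageblock zone[] = {
		{ BLOCK_UNMOVABLE, 100 },
		{ BLOCK_MOVABLE,     5 },
		{ BLOCK_UNMOVABLE,  50 },
	};
	struct scan_control sc = { .ignore_block_suitable = ignore };

	return isolate_free(&sc, zone, sizeof(zone) / sizeof(zone[0]));
}
```

In this made-up zone, demo_isolated(false) finds 5 free pages while demo_isolated(true) finds 155. The patch sets the flag only at the most thorough compaction priority, so the fragmentation-avoidance heuristic stays in force for all cheaper attempts.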
* Re: OOM killer changes 2016-08-19 7:33 ` Michal Hocko @ 2016-08-19 7:47 ` Vlastimil Babka 2016-08-19 8:26 ` Michal Hocko 0 siblings, 1 reply; 50+ messages in thread From: Vlastimil Babka @ 2016-08-19 7:47 UTC (permalink / raw) To: Michal Hocko, Andrew Morton; +Cc: Ralf-Peter Rohbeck, linux-mm, Joonsoo Kim On 08/19/2016 09:33 AM, Michal Hocko wrote: > On Fri 19-08-16 08:27:34, Vlastimil Babka wrote: >> On 08/19/2016 04:42 AM, Ralf-Peter Rohbeck wrote: >>> On 18.08.2016 13:12, Vlastimil Babka wrote: >>>> On 18.8.2016 22:01, Ralf-Peter Rohbeck wrote: >>>>> On 17.08.2016 23:57, Vlastimil Babka wrote: >>>>>> Vlastimil >>>>> Yes, that change was in my test with linux-next-20160817. Here's the diff: >>>>> >>>>> diff --git a/mm/compaction.c b/mm/compaction.c >>>>> index f94ae67..60a9ca2 100644 >>>>> --- a/mm/compaction.c >>>>> +++ b/mm/compaction.c >>>>> @@ -1083,8 +1083,10 @@ static void isolate_freepages(struct >>>>> compact_control *cc) >>>>> continue; >>>>> >>>>> /* Check the block is suitable for migration */ >>>>> +/* >>>>> if (!suitable_migration_target(page)) >>>>> continue; >>>>> +*/ >>>> OK, could you please also try if uncommenting the above still works without OOM? >>>> Or just plain linux-next-20160817, I guess we don't need the printk's to test >>>> this difference. >>>> >>>> Thanks a lot! >>>> Vlastimil >>>> >>> With the two lines back in I had OOMs again. See the attached logs. >> >> Thanks for the confirmation. >> >> We however shouldn't disable the heuristic completely, so here's a compromise >> patch hooking into the new compaction priorities. Can you please test on top of >> linux-next? 
>> >> -----8<----- >> >From 0927cc2a4c6a3247111168eace9012c23d06f9db Mon Sep 17 00:00:00 2001 >> From: Vlastimil Babka <vbabka@suse.cz> >> Date: Thu, 18 Aug 2016 16:01:14 +0200 >> Subject: [PATCH] mm, compaction: make full priority ignore pageblock >> suitability >> >> Ralf-Peter Rohbeck has reported premature OOMs for order-2 allocations (stack) >> due to OOM rework in 4.7. In his scenario (parallel kernel build and dd writing >> to two drives) many pageblocks get marked as Unmovable and compaction free >> scanner struggles to isolate free pages. Joonsoo Kim pointed out that the free >> scanner skips pageblocks that are not movable to prevent filling them and >> forcing non-movable allocations to fallback to other pageblocks. Such heuristic >> makes sense to help prevent long-term fragmentation, but premature OOMs are >> relatively more urgent problem. As a compromise, this patch disables the >> heuristic only for the ultimate compaction priority. >> >> Reported-by: Ralf-Peter Rohbeck <Ralf-Peter.Rohbeck@quantum.com> >> Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> >> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> > > Thanks to both of you! I do agree that we should drop all these > heuristics when we struggle and there is an OOM risk. I have just a > small nit here. I would prefer > s@COMPACT_PRIO_SYNC_FULL@MIN_COMPACT_PRIORITY@ when disabling them > because this would be easier to follow and it would be easier for future > changes. OK, but then we should start with a change to mm-compaction-add-the-ultimate-direct-compaction-priority.patch (fix at the end of this e-mail) to make things consistent. Then I will apply that to the new patch if it's successfully tested. > Which brings me to another thing I was suggesting earlier. I > believe we should go to this MIN_COMPACT_PRIORITY only for !costly > requests because costly orders shouldn't get all those exceptions and > risk long term fragmentation issues. 
We do not have that many costly > requests (except for hugetlb) so it doesn't matter all that much right > now but long term we want to differentiate those I believe. I'll send such change afterwards as well. > That being said, let's wait for the feedback on this patch + linux-next. > If it works out I will send a stable 4.7 patch which drops compaction > feedback from should_compact_retry (turn it to the !COMPACTION version) > so that 4.7 users do not suffer from the premature OOM and will ask > Andrew to sneak the compaction patches to 4.8 as they fix a real issue > and the risk is not really high. Agreed. > Acked-by: Michal Hocko <mhocko@suse.com> Thanks! -----8<----- ^ permalink raw reply [flat|nested] 50+ messages in thread
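Michal's suggestion to reserve MIN_COMPACT_PRIORITY for !costly requests could look roughly like the following userspace sketch. It is only an illustration of the idea: the PRIO_* names and min_priority_for_order() are hypothetical, COSTLY_ORDER mirrors the kernel's PAGE_ALLOC_COSTLY_ORDER value of 3, and the choice of the sync-light level as the floor for costly orders is an assumption, since the thread does not say which level costly requests should stop at.

```c
/* Lower value = more thorough compaction (hypothetical stand-in names). */
enum {
	PRIO_SYNC_FULL  = 0,	/* all heuristics dropped */
	PRIO_SYNC_LIGHT = 1,
	PRIO_ASYNC      = 2
};

#define COSTLY_ORDER 3	/* mirrors PAGE_ALLOC_COSTLY_ORDER */

/*
 * Lowest (most thorough) priority an allocation of the given order may
 * escalate to. Costly orders stop one level earlier so that they never
 * bypass the fragmentation-avoidance heuristics (assumed floor).
 */
int min_priority_for_order(unsigned int order)
{
	return order > COSTLY_ORDER ? PRIO_SYNC_LIGHT : PRIO_SYNC_FULL;
}
```

Under this model an order-2 kernel stack allocation, the case reported in this thread, may still escalate all the way to the full priority, while a hugetlb-sized request would not.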
* Re: OOM killer changes 2016-08-19 7:47 ` Vlastimil Babka @ 2016-08-19 8:26 ` Michal Hocko 2016-08-24 18:13 ` Ralf-Peter Rohbeck 0 siblings, 1 reply; 50+ messages in thread From: Michal Hocko @ 2016-08-19 8:26 UTC (permalink / raw) To: Vlastimil Babka; +Cc: Andrew Morton, Ralf-Peter Rohbeck, linux-mm, Joonsoo Kim On Fri 19-08-16 09:47:59, Vlastimil Babka wrote: > On 08/19/2016 09:33 AM, Michal Hocko wrote: > > On Fri 19-08-16 08:27:34, Vlastimil Babka wrote: > >> On 08/19/2016 04:42 AM, Ralf-Peter Rohbeck wrote: > >>> On 18.08.2016 13:12, Vlastimil Babka wrote: > >>>> On 18.8.2016 22:01, Ralf-Peter Rohbeck wrote: > >>>>> On 17.08.2016 23:57, Vlastimil Babka wrote: > >>>>>> Vlastimil > >>>>> Yes, that change was in my test with linux-next-20160817. Here's the diff: > >>>>> > >>>>> diff --git a/mm/compaction.c b/mm/compaction.c > >>>>> index f94ae67..60a9ca2 100644 > >>>>> --- a/mm/compaction.c > >>>>> +++ b/mm/compaction.c > >>>>> @@ -1083,8 +1083,10 @@ static void isolate_freepages(struct > >>>>> compact_control *cc) > >>>>> continue; > >>>>> > >>>>> /* Check the block is suitable for migration */ > >>>>> +/* > >>>>> if (!suitable_migration_target(page)) > >>>>> continue; > >>>>> +*/ > >>>> OK, could you please also try if uncommenting the above still works without OOM? > >>>> Or just plain linux-next-20160817, I guess we don't need the printk's to test > >>>> this difference. > >>>> > >>>> Thanks a lot! > >>>> Vlastimil > >>>> > >>> With the two lines back in I had OOMs again. See the attached logs. > >> > >> Thanks for the confirmation. > >> > >> We however shouldn't disable the heuristic completely, so here's a compromise > >> patch hooking into the new compaction priorities. Can you please test on top of > >> linux-next? 
> >> > >> -----8<----- > >> >From 0927cc2a4c6a3247111168eace9012c23d06f9db Mon Sep 17 00:00:00 2001 > >> From: Vlastimil Babka <vbabka@suse.cz> > >> Date: Thu, 18 Aug 2016 16:01:14 +0200 > >> Subject: [PATCH] mm, compaction: make full priority ignore pageblock > >> suitability > >> > >> Ralf-Peter Rohbeck has reported premature OOMs for order-2 allocations (stack) > >> due to OOM rework in 4.7. In his scenario (parallel kernel build and dd writing > >> to two drives) many pageblocks get marked as Unmovable and compaction free > >> scanner struggles to isolate free pages. Joonsoo Kim pointed out that the free > >> scanner skips pageblocks that are not movable to prevent filling them and > >> forcing non-movable allocations to fallback to other pageblocks. Such heuristic > >> makes sense to help prevent long-term fragmentation, but premature OOMs are > >> relatively more urgent problem. As a compromise, this patch disables the > >> heuristic only for the ultimate compaction priority. > >> > >> Reported-by: Ralf-Peter Rohbeck <Ralf-Peter.Rohbeck@quantum.com> > >> Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> > >> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> > > > > Thanks to both of you! I do agree that we should drop all these > > heuristics when we struggle and there is an OOM risk. I have just a > > small nit here. I would prefer > > s@COMPACT_PRIO_SYNC_FULL@MIN_COMPACT_PRIORITY@ when disabling them > > because this would be easier to follow and it would be easier for future > > changes. > > OK, but then we should start with a change to > mm-compaction-add-the-ultimate-direct-compaction-priority.patch > (fix at the end of this e-mail) to make things consistent. > Then I will apply that to the new patch if it's successfully tested. This can go as a separate clean up patch. No need to alter previous patches sitting in the mmotm. > > Which brings me to another thing I was suggesting earlier. 
I > > believe we should go to this MIN_COMPACT_PRIORITY only for !costly > > requests because costly orders shouldn't get all those exceptions and > > risk long term fragmentation issues. We do not have that many costly > > requests (except for hugetlb) so it doesn't matter all that much right > > now but long term we want to differentiate those I believe. > > I'll send such change afterwards as well. Thanks! > > That being said, let's wait for the feedback on this patch + linux-next. > > If it works out I will send a stable 4.7 patch which drops compaction > > feedback from should_compact_retry (turn it to the !COMPACTION version) > > so that 4.7 users do not suffer from the premature OOM and will ask > > Andrew to sneak the compaction patches to 4.8 as they fix a real issue > > and the risk is not really high. > > Agreed. > > > Acked-by: Michal Hocko <mhocko@suse.com> > > Thanks! > > -----8<----- > >From c4da7022e85e52f5463055cdc474656652e7a504 Mon Sep 17 00:00:00 2001 > From: Vlastimil Babka <vbabka@suse.cz> > Date: Fri, 19 Aug 2016 09:40:31 +0200 > Subject: [PATCH] mm, compaction: add the ultimate direct compaction > priority-fix > > Use the MIN_COMPACT_PRIORITY alias instead of COMPACT_PRIO_SYNC_FULL to > disable heuristics "because this would be easier to follow and it would be > easier for future changes", per Michal. > > Suggested-by: Michal Hocko <mhocko@suse.cz> > Signed-off-by: Vlastimil Babka <vbabka@suse.cz> > Fixes: mmotm mm-compaction-add-the-ultimate-direct-compaction-priority.patch I guess Fixes is a bit misleading. This is not a bug it is a cleanup patch. Acked-by: Michal Hocko <mhocko@suse.com> Thanks! 
> --- > mm/compaction.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/mm/compaction.c b/mm/compaction.c > index ae4f40afcca1..3e35fce2cace 100644 > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -1644,8 +1644,8 @@ static enum compact_result compact_zone_order(struct zone *zone, int order, > .alloc_flags = alloc_flags, > .classzone_idx = classzone_idx, > .direct_compaction = true, > - .whole_zone = (prio == COMPACT_PRIO_SYNC_FULL), > - .ignore_skip_hint = (prio == COMPACT_PRIO_SYNC_FULL) > + .whole_zone = (prio == MIN_COMPACT_PRIORITY), > + .ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY) > }; > INIT_LIST_HEAD(&cc.freepages); > INIT_LIST_HEAD(&cc.migratepages); > @@ -1691,7 +1691,7 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order, > ac->nodemask) { > enum compact_result status; > > - if (prio > COMPACT_PRIO_SYNC_FULL > + if (prio > MIN_COMPACT_PRIORITY > && compaction_deferred(zone, order)) { > rc = max_t(enum compact_result, COMPACT_DEFERRED, rc); > continue; > -- > 2.9.2 > > -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: OOM killer changes 2016-08-19 8:26 ` Michal Hocko @ 2016-08-24 18:13 ` Ralf-Peter Rohbeck 2016-08-25 7:22 ` Michal Hocko 0 siblings, 1 reply; 50+ messages in thread From: Ralf-Peter Rohbeck @ 2016-08-24 18:13 UTC (permalink / raw) To: Michal Hocko, Vlastimil Babka; +Cc: Andrew Morton, linux-mm, Joonsoo Kim [-- Attachment #1: Type: text/plain, Size: 3303 bytes --] On 19.08.2016 01:26, Michal Hocko wrote: > >>> That being said, let's wait for the feedback on this patch + linux-next. >>> If it works out I will send a stable 4.7 patch which drops compaction >>> feedback from should_compact_retry (turn it to the !COMPACTION version) >>> so that 4.7 users do not suffer from the premature OOM and will ask >>> Andrew to sneak the compaction patches to 4.8 as they fix a real issue >>> and the risk is not really high. >> Agreed. >> >>> Acked-by: Michal Hocko <mhocko@suse.com> >> Thanks! >> >> -----8<----- >> >From c4da7022e85e52f5463055cdc474656652e7a504 Mon Sep 17 00:00:00 2001 >> From: Vlastimil Babka <vbabka@suse.cz> >> Date: Fri, 19 Aug 2016 09:40:31 +0200 >> Subject: [PATCH] mm, compaction: add the ultimate direct compaction >> priority-fix >> >> Use the MIN_COMPACT_PRIORITY alias instead of COMPACT_PRIO_SYNC_FULL to >> disable heuristics "because this would be easier to follow and it would be >> easier for future changes", per Michal. >> >> Suggested-by: Michal Hocko <mhocko@suse.cz> >> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> >> Fixes: mmotm mm-compaction-add-the-ultimate-direct-compaction-priority.patch > I guess Fixes is a bit misleading. This is not a bug it is a cleanup > patch. > > Acked-by: Michal Hocko <mhocko@suse.com> > > Thanks! 
> >> --- >> mm/compaction.c | 6 +++--- >> 1 file changed, 3 insertions(+), 3 deletions(-) >> >> diff --git a/mm/compaction.c b/mm/compaction.c >> index ae4f40afcca1..3e35fce2cace 100644 >> --- a/mm/compaction.c >> +++ b/mm/compaction.c >> @@ -1644,8 +1644,8 @@ static enum compact_result compact_zone_order(struct zone *zone, int order, >> .alloc_flags = alloc_flags, >> .classzone_idx = classzone_idx, >> .direct_compaction = true, >> - .whole_zone = (prio == COMPACT_PRIO_SYNC_FULL), >> - .ignore_skip_hint = (prio == COMPACT_PRIO_SYNC_FULL) >> + .whole_zone = (prio == MIN_COMPACT_PRIORITY), >> + .ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY) >> }; >> INIT_LIST_HEAD(&cc.freepages); >> INIT_LIST_HEAD(&cc.migratepages); >> @@ -1691,7 +1691,7 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order, >> ac->nodemask) { >> enum compact_result status; >> >> - if (prio > COMPACT_PRIO_SYNC_FULL >> + if (prio > MIN_COMPACT_PRIORITY >> && compaction_deferred(zone, order)) { >> rc = max_t(enum compact_result, COMPACT_DEFERRED, rc); >> continue; >> -- >> 2.9.2 >> >> This change was in linux-next-20160823 so I ran it unmodified. I did get an OOM, see attached. Thanks, Ralf-Peter ---------------------------------------------------------------------- The information contained in this transmission may be confidential. Any disclosure, copying, or further distribution of confidential information is not permitted unless such privilege is explicitly granted in writing by Quantum. Quantum reserves the right to have electronic communications, including email and attachments, sent across its networks filtered through anti virus and spam software programs and retain such messages in order to comply with applicable data security and retention requirements. Quantum is not responsible for the proper and complete transmission of the substance of this communication or for any delay in its receipt. 
[-- Attachment #2: OOM_4.8.0-rc3-next-20160823+.tar.bz2 --] [-- Type: application/x-bzip, Size: 1377662 bytes --] ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: OOM killer changes 2016-08-24 18:13 ` Ralf-Peter Rohbeck @ 2016-08-25 7:22 ` Michal Hocko 2016-08-25 20:35 ` Ralf-Peter Rohbeck 0 siblings, 1 reply; 50+ messages in thread From: Michal Hocko @ 2016-08-25 7:22 UTC (permalink / raw) To: Ralf-Peter Rohbeck; +Cc: Vlastimil Babka, Andrew Morton, linux-mm, Joonsoo Kim On Wed 24-08-16 11:13:31, Ralf-Peter Rohbeck wrote: > On 19.08.2016 01:26, Michal Hocko wrote: [...] > > > diff --git a/mm/compaction.c b/mm/compaction.c > > > index ae4f40afcca1..3e35fce2cace 100644 > > > --- a/mm/compaction.c > > > +++ b/mm/compaction.c > > > @@ -1644,8 +1644,8 @@ static enum compact_result compact_zone_order(struct zone *zone, int order, > > > .alloc_flags = alloc_flags, > > > .classzone_idx = classzone_idx, > > > .direct_compaction = true, > > > - .whole_zone = (prio == COMPACT_PRIO_SYNC_FULL), > > > - .ignore_skip_hint = (prio == COMPACT_PRIO_SYNC_FULL) > > > + .whole_zone = (prio == MIN_COMPACT_PRIORITY), > > > + .ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY) > > > }; > > > INIT_LIST_HEAD(&cc.freepages); > > > INIT_LIST_HEAD(&cc.migratepages); > > > @@ -1691,7 +1691,7 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order, > > > ac->nodemask) { > > > enum compact_result status; > > > - if (prio > COMPACT_PRIO_SYNC_FULL > > > + if (prio > MIN_COMPACT_PRIORITY > > > && compaction_deferred(zone, order)) { > > > rc = max_t(enum compact_result, COMPACT_DEFERRED, rc); > > > continue; > > > -- > > > 2.9.2 > > > > > > > This change was in linux-next-20160823 so I ran it unmodified. > > I did get an OOM, see attached. This patch shouldn't make any difference to the previous patch you were testing. Anyway I do not have the above linux-next tag so I cannot check what exactly was there. The current code in linux-next contains http://lkml.kernel.org/r/20160823074339.GB23577@dhcp22.suse.cz so a different approach. 
Once that patch hits the Linus tree we will try to resurrect the compaction improvements series in linux-next and continue with the testing. -- Michal Hocko SUSE Labs
* Re: OOM killer changes 2016-08-25 7:22 ` Michal Hocko @ 2016-08-25 20:35 ` Ralf-Peter Rohbeck 2016-08-26 8:35 ` Michal Hocko 0 siblings, 1 reply; 50+ messages in thread From: Ralf-Peter Rohbeck @ 2016-08-25 20:35 UTC (permalink / raw) To: Michal Hocko; +Cc: Vlastimil Babka, Andrew Morton, linux-mm, Joonsoo Kim On 25.08.2016 00:22, Michal Hocko wrote: > On Wed 24-08-16 11:13:31, Ralf-Peter Rohbeck wrote: >> On 19.08.2016 01:26, Michal Hocko wrote: > [...] >>>> diff --git a/mm/compaction.c b/mm/compaction.c >>>> index ae4f40afcca1..3e35fce2cace 100644 >>>> --- a/mm/compaction.c >>>> +++ b/mm/compaction.c >>>> @@ -1644,8 +1644,8 @@ static enum compact_result compact_zone_order(struct zone *zone, int order, >>>> .alloc_flags = alloc_flags, >>>> .classzone_idx = classzone_idx, >>>> .direct_compaction = true, >>>> - .whole_zone = (prio == COMPACT_PRIO_SYNC_FULL), >>>> - .ignore_skip_hint = (prio == COMPACT_PRIO_SYNC_FULL) >>>> + .whole_zone = (prio == MIN_COMPACT_PRIORITY), >>>> + .ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY) >>>> }; >>>> INIT_LIST_HEAD(&cc.freepages); >>>> INIT_LIST_HEAD(&cc.migratepages); >>>> @@ -1691,7 +1691,7 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order, >>>> ac->nodemask) { >>>> enum compact_result status; >>>> - if (prio > COMPACT_PRIO_SYNC_FULL >>>> + if (prio > MIN_COMPACT_PRIORITY >>>> && compaction_deferred(zone, order)) { >>>> rc = max_t(enum compact_result, COMPACT_DEFERRED, rc); >>>> continue; >>>> -- >>>> 2.9.2 >>>> >>>> >> This change was in linux-next-20160823 so I ran it unmodified. >> >> I did get an OOM, see attached. > This patch shouldn't make any difference to the previous patch you were > testing. Anyway I do not have the above linux-next tag so I cannot check > what exactly was there. 
The current code in linux-next contains > https://urldefense.proofpoint.com/v2/url?u=http-3A__lkml.kernel.org_r_20160823074339.GB23577-40dhcp22.suse.cz&d=DQIBAg&c=8S5idjlO_n28Ko3lg6lskTMwneSC-WqZ5EBTEEvDlkg&r=yGQdEpZknbtYvR0TyhkCGu-ifLklIvXIf740poRFltQ&m=CNEWNMAovbVAu8gw1UooufVBqAK0HbH5FJskyAmkR1g&s=S-eqTOP5U79awF_vqBSGNfNrvOe5l60XzVoVa6DuWx4&e= so a > different approach. Once that patch hits the Linus tree we will try to > resurrect the compaction improvements series in linux-next and continue > with the testing. Sorry, the tag was next-20160823; I called the branch linux-next-20160823. Ralf-Peter
* Re: OOM killer changes 2016-08-25 20:35 ` Ralf-Peter Rohbeck @ 2016-08-26 8:35 ` Michal Hocko 2016-09-06 11:09 ` Vlastimil Babka 0 siblings, 1 reply; 50+ messages in thread From: Michal Hocko @ 2016-08-26 8:35 UTC (permalink / raw) To: Ralf-Peter Rohbeck; +Cc: Vlastimil Babka, Andrew Morton, linux-mm, Joonsoo Kim

On Thu 25-08-16 13:35:04, Ralf-Peter Rohbeck wrote:
[...]
> Sorry, the tag was next-20160823; I called the branch linux-next-20160823.

Yeah, that is the tag I was looking for, but linux-next is quite
volatile and if you do not fetch the particular tag it won't exist in
later trees. Anyway, I have set up a branch oom-playground in my tree
git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git which
is on top of the current up-to-date mmotm tree + revert of the quick
workaround which you have already tested (thanks for that!) and with
Vlastimil's patch which was dropped due to the workaround. AFAIU this
is what you have previously tested without OOM but later on still
managed to hit OOM again. Which would suggest we are still not there
and need to investigate further. I have some ideas what to do but I
would appreciate if we can confirm this status before we try new things.

Thanks!
-- 
Michal Hocko
SUSE Labs
* Re: OOM killer changes 2016-08-26 8:35 ` Michal Hocko @ 2016-09-06 11:09 ` Vlastimil Babka 0 siblings, 0 replies; 50+ messages in thread From: Vlastimil Babka @ 2016-09-06 11:09 UTC (permalink / raw) To: Michal Hocko, Ralf-Peter Rohbeck; +Cc: Andrew Morton, linux-mm, Joonsoo Kim

On 08/26/2016 10:35 AM, Michal Hocko wrote:
> On Thu 25-08-16 13:35:04, Ralf-Peter Rohbeck wrote:
> [...]
>> Sorry, the tag was next-20160823; I called the branch linux-next-20160823.
>
> Yeah that is the tag I was looking for but the linux-next is quite
> volatile and if you do not fetch the particular tag it won't exist in
> later trees. Anyway, I have set up a branch oom-playground in my tree
> git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git which
> is on top of the current up-to-date mmotm tree + revert of the quick
> workaround which you have already tested (thanks for that!) and with
> the Vlastimil's patch which was dropped due to workaround.

This is missing the patch that introduced ignoring pageblock suitability
for the highest compaction priority [1].

> AFAIU this
> is what you have previously tested without OOM but later on still
> managed to hit OOM again.

I think the test also didn't include the patch [1] due to some confusion.
I think I'll just resend everything (in a new thread) for testing on top
of latest mmotm git.

[1] http://marc.info/?l=linux-mm&m=147158805719821

> Which would suggest we are still not there
> and need to investigate further. I have some ideas what to do but I
> would appreciate if we can confirm this status before we try new things.
>
> Thanks!
>
* Re: OOM killer changes 2016-08-19 6:27 ` Vlastimil Babka 2016-08-19 7:33 ` Michal Hocko @ 2016-08-23 5:02 ` Joonsoo Kim 2016-08-23 7:45 ` Michal Hocko 1 sibling, 1 reply; 50+ messages in thread From: Joonsoo Kim @ 2016-08-23 5:02 UTC (permalink / raw) To: Vlastimil Babka; +Cc: Ralf-Peter Rohbeck, Michal Hocko, linux-mm On Fri, Aug 19, 2016 at 08:27:34AM +0200, Vlastimil Babka wrote: > On 08/19/2016 04:42 AM, Ralf-Peter Rohbeck wrote: > > On 18.08.2016 13:12, Vlastimil Babka wrote: > >> On 18.8.2016 22:01, Ralf-Peter Rohbeck wrote: > >>> On 17.08.2016 23:57, Vlastimil Babka wrote: > >>>> Vlastimil > >>> Yes, that change was in my test with linux-next-20160817. Here's the diff: > >>> > >>> diff --git a/mm/compaction.c b/mm/compaction.c > >>> index f94ae67..60a9ca2 100644 > >>> --- a/mm/compaction.c > >>> +++ b/mm/compaction.c > >>> @@ -1083,8 +1083,10 @@ static void isolate_freepages(struct > >>> compact_control *cc) > >>> continue; > >>> > >>> /* Check the block is suitable for migration */ > >>> +/* > >>> if (!suitable_migration_target(page)) > >>> continue; > >>> +*/ > >> OK, could you please also try if uncommenting the above still works without OOM? > >> Or just plain linux-next-20160817, I guess we don't need the printk's to test > >> this difference. > >> > >> Thanks a lot! > >> Vlastimil > >> > > With the two lines back in I had OOMs again. See the attached logs. > > Thanks for the confirmation. > > We however shouldn't disable the heuristic completely, so here's a compromise > patch hooking into the new compaction priorities. Can you please test on top of > linux-next? > > -----8<----- > >From 0927cc2a4c6a3247111168eace9012c23d06f9db Mon Sep 17 00:00:00 2001 > From: Vlastimil Babka <vbabka@suse.cz> > Date: Thu, 18 Aug 2016 16:01:14 +0200 > Subject: [PATCH] mm, compaction: make full priority ignore pageblock > suitability > > Ralf-Peter Rohbeck has reported premature OOMs for order-2 allocations (stack) > due to OOM rework in 4.7. 
In his scenario (parallel kernel build and dd writing > to two drives) many pageblocks get marked as Unmovable and compaction free > scanner struggles to isolate free pages. Joonsoo Kim pointed out that the free > scanner skips pageblocks that are not movable to prevent filling them and > forcing non-movable allocations to fallback to other pageblocks. Such heuristic > makes sense to help prevent long-term fragmentation, but premature OOMs are > relatively more urgent problem. As a compromise, this patch disables the > heuristic only for the ultimate compaction priority. > > Reported-by: Ralf-Peter Rohbeck <Ralf-Peter.Rohbeck@quantum.com> > Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> > Signed-off-by: Vlastimil Babka <vbabka@suse.cz> > --- > mm/compaction.c | 11 ++++++++--- > mm/internal.h | 1 + > 2 files changed, 9 insertions(+), 3 deletions(-) > > diff --git a/mm/compaction.c b/mm/compaction.c > index 0bba270f97ad..884b1baa58df 100644 > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -997,8 +997,12 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn, > #ifdef CONFIG_COMPACTION > > /* Returns true if the page is within a block suitable for migration to */ > -static bool suitable_migration_target(struct page *page) > +static bool suitable_migration_target(struct compact_control *cc, > + struct page *page) > { > + if (cc->ignore_block_suitable) > + return true; > + > /* If the page is a large free page, then disallow migration */ > if (PageBuddy(page)) { > /* > @@ -1083,7 +1087,7 @@ static void isolate_freepages(struct compact_control *cc) > continue; > > /* Check the block is suitable for migration */ > - if (!suitable_migration_target(page)) > + if (!suitable_migration_target(cc, page)) > continue; > > /* If isolation recently failed, do not retry */ > @@ -1656,7 +1660,8 @@ static enum compact_result compact_zone_order(struct zone *zone, int order, > .classzone_idx = classzone_idx, > .direct_compaction = true, > .whole_zone = 
(prio == COMPACT_PRIO_SYNC_FULL),
> - .ignore_skip_hint = (prio == COMPACT_PRIO_SYNC_FULL)
> + .ignore_skip_hint = (prio == COMPACT_PRIO_SYNC_FULL),
> + .ignore_block_suitable = (prio == COMPACT_PRIO_SYNC_FULL)

A year ago, I tested allowing unmovable/reclaimable pageblocks for the
free scanner in a very limited situation and found that it causes
long-term fragmentation. I think that this solution is less restrictive
than mine, so I guess it will also cause long-term fragmentation.

I agree that allocation success is even more important, but it's better
to avoid causing long-term fragmentation as much as possible. So, my
suggestion is...

How about introducing one more priority (last priority) to allow scanning
unmovable/reclaimable pageblocks? If we don't reach that priority,
long-term fragmentation can be avoided.

Thanks.
* Re: OOM killer changes 2016-08-23 5:02 ` Joonsoo Kim @ 2016-08-23 7:45 ` Michal Hocko 0 siblings, 0 replies; 50+ messages in thread From: Michal Hocko @ 2016-08-23 7:45 UTC (permalink / raw) To: Joonsoo Kim; +Cc: Vlastimil Babka, Ralf-Peter Rohbeck, linux-mm

On Tue 23-08-16 14:02:52, Joonsoo Kim wrote:
[...]
> How about introducing one more priority (last priority) to allow scanning
> unmovable/reclaimable pageblock? If we don't reach that priority,
> long-term fragmentation can be avoided.

I have already suggested that. We would reach that priority only for
!costly orders. Vlastimil already has plans to cook up a patch for that
but he is on vacation...
-- 
Michal Hocko
SUSE Labs
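The idea discussed in this subthread, relaxing the pageblock suitability check only at the last compaction priority and only for !costly orders, can be modeled in userspace. What follows is a minimal sketch and not kernel code: every `model_`/`MODEL_` name is hypothetical, the suitability test is reduced to a plain migratetype comparison, and the costly-order threshold of 3 is an assumption based on the kernel's usual PAGE_ALLOC_COSTLY_ORDER.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the compaction priorities in the patches above.
 * As in those patches, a lower value means a higher (more aggressive) priority. */
enum model_prio {
	MODEL_PRIO_SYNC_FULL,
	MODEL_MIN_PRIO = MODEL_PRIO_SYNC_FULL,
	MODEL_PRIO_SYNC_LIGHT,
	MODEL_PRIO_ASYNC,
};

/* Simplified pageblock types; the real kernel has more of them. */
enum model_migratetype {
	MODEL_MIGRATE_UNMOVABLE,
	MODEL_MIGRATE_MOVABLE,
	MODEL_MIGRATE_RECLAIMABLE,
};

/* Assumption: orders above 3 are treated as "costly". */
#define MODEL_COSTLY_ORDER 3

struct model_cc {
	enum model_prio prio;
	unsigned int order;
	bool ignore_block_suitable;
};

/* Build a control structure; the heuristic is dropped only at the
 * last priority and only for !costly orders. */
static struct model_cc model_make_cc(enum model_prio prio, unsigned int order)
{
	struct model_cc cc = {
		.prio = prio,
		.order = order,
		.ignore_block_suitable =
			(prio == MODEL_MIN_PRIO) && (order <= MODEL_COSTLY_ORDER),
	};
	return cc;
}

/* Reduced form of the suitability check: normally only movable
 * pageblocks are accepted as migration targets. */
static bool model_suitable_target(const struct model_cc *cc,
				  enum model_migratetype mt)
{
	if (cc->ignore_block_suitable)
		return true;
	return mt == MODEL_MIGRATE_MOVABLE;
}
```

With this model, an order-2 request at the last priority may take free pages from an unmovable pageblock, while the same request at an earlier priority, or an order-9 request at any priority, still skips it.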
* Re: OOM killer changes 2016-08-16 7:32 ` Michal Hocko 2016-08-16 7:43 ` Michal Hocko @ 2016-08-17 0:26 ` Ralf-Peter Rohbeck 2016-08-17 7:43 ` Vlastimil Babka 1 sibling, 1 reply; 50+ messages in thread From: Ralf-Peter Rohbeck @ 2016-08-17 0:26 UTC (permalink / raw) To: Michal Hocko; +Cc: Vlastimil Babka, linux-mm

[-- Attachment #1: Type: text/plain, Size: 1418 bytes --]

No it wasn't yet in the last run. That OOM happened while I compiled the
last change.
I ran another test with the trace_printk: See attached. Again I ran only
a kernel compilation.

Ralf-Peter

On 16.08.2016 00:32, Michal Hocko wrote:
> On Mon 15-08-16 11:42:11, Ralf-Peter Rohbeck wrote:
>> This time the OOM killer hit much quicker. No btrfs balance, just compiling
>> the kernel with the new change did it.
>> Much smaller logs so I'm attaching them.
> Just to clarify. You have added the trace_printk for
> try_to_release_page, right? (after fixing it of course). If yes there is
> no single mention of that path failing which would support Joonsoo's
> theory... Could you try with his patch?

[-- Attachment #2: OOM_4.7.0_p2.tar.bz2 --]
[-- Type: application/x-bzip, Size: 761047 bytes --]
* Re: OOM killer changes 2016-08-17 0:26 ` Ralf-Peter Rohbeck @ 2016-08-17 7:43 ` Vlastimil Babka 0 siblings, 0 replies; 50+ messages in thread From: Vlastimil Babka @ 2016-08-17 7:43 UTC (permalink / raw) To: Ralf-Peter Rohbeck, Michal Hocko; +Cc: linux-mm

On 08/17/2016 02:26 AM, Ralf-Peter Rohbeck wrote:
> No it wasn't yet in the last run. That OOM happened while I compiled the
> last change.

You mean those pr_infos? From those we've got:

Aug 16 17:14:26 fs kernel: [ 1817.044778] XXX: compaction_failed
Aug 16 17:15:37 fs kernel: [ 1888.387817] XXX: compaction_failed
Aug 16 17:17:32 fs kernel: [ 2002.879726] XXX: compaction_failed

i.e. none of the "XXX: no zone suitable for compaction" lines.

I think my series in mmotm tree could help here.

> I ran another test with the trace_printk: See attached. Again I ran only
> a kernel compilation.

So, the trace_printk didn't hit that many times:

grep try_to_release trace_pipe.log | wc -l
52

and vmstat_after shows:

pgmigrate_success 851
pgmigrate_fail 817
compact_migrate_scanned 567689
compact_free_scanned 50744242
compact_isolated 19196
compact_stall 876
compact_fail 801
compact_success 75

pagetype_after:

Number of blocks type     Unmovable      Movable  Reclaimable   HighAtomic      Isolate
Node 0, zone      DMA             1            7            0            0            0
Node 0, zone    DMA32           883           91           42            0            0
Node 0, zone   Normal          2750          207          115            0            0

So while btrfs migrate failures could be real, in this run it was rather
the free scanner struggling due to unmovable blocks, as Joonsoo suggested.

> Ralf-Peter
>
> On 16.08.2016 00:32, Michal Hocko wrote:
>> On Mon 15-08-16 11:42:11, Ralf-Peter Rohbeck wrote:
>>> This time the OOM killer hit much quicker. No btrfs balance, just compiling
>>> the kernel with the new change did it.
>>> Much smaller logs so I'm attaching them.
>> Just to clarify. You have added the trace_printk for
>> try_to_release_page, right? (after fixing it of course). If yes there is
>> no single mention of that path failing which would support Joonsoo's
>> theory... Could you try with his patch?
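The counters from vmstat_after in the message above can be turned into ratios that make the "free scanner struggling" reading easier to see. This is only a rough sketch: the struct and helper names are made up for illustration, the arithmetic is integer and truncating, and if compact_isolated also includes pages taken by the migration scanner, the real cost per free page isolated is higher than the quotient computed here.

```c
#include <stdint.h>

/* Hypothetical container for the /proc/vmstat counters quoted above. */
struct compact_counters {
	uint64_t stall;        /* compact_stall */
	uint64_t fail;         /* compact_fail */
	uint64_t success;      /* compact_success */
	uint64_t free_scanned; /* compact_free_scanned */
	uint64_t isolated;     /* compact_isolated */
};

/* Truncating integer percentage; returns 0 when the whole is 0. */
static uint64_t percent_of(uint64_t part, uint64_t whole)
{
	return whole ? (part * 100) / whole : 0;
}

/* Truncating quotient; returns 0 when the divisor is 0. */
static uint64_t per_unit(uint64_t total, uint64_t units)
{
	return units ? total / units : 0;
}
```

For the run above (876 stalls, 801 failures, 75 successes), about 91% of direct compaction attempts failed, and the free scanner examined roughly 2600 pages for every page isolated, while migration itself failed a little under half of the time (817 of 1668 pages).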
* Re: OOM killer changes 2016-08-15 9:16 ` Vlastimil Babka 2016-08-15 15:01 ` Michal Hocko @ 2016-08-16 3:12 ` Joonsoo Kim 2016-08-16 7:44 ` Vlastimil Babka 2016-08-17 4:48 ` Ralf-Peter Rohbeck 1 sibling, 2 replies; 50+ messages in thread From: Joonsoo Kim @ 2016-08-16 3:12 UTC (permalink / raw) To: Vlastimil Babka; +Cc: Ralf-Peter Rohbeck, Michal Hocko, linux-mm

On Mon, Aug 15, 2016 at 11:16:36AM +0200, Vlastimil Babka wrote:
> On 08/15/2016 06:48 AM, Ralf-Peter Rohbeck wrote:
> > On 02.08.2016 12:25, Ralf-Peter Rohbeck wrote:
> >>
> > Took me a little longer than expected due to work. The failure wouldn't
> > happen for a while and so I started a couple of scripts and let them
> > run. When I checked today the server didn't respond on the network and
> > sure enough it had killed everything. This is with 4.7.0 with the config
> > based on Debian 4.7-rc7.
> >
> > trace_pipe got a little big (5GB) so I uploaded the logs to
> > https://filebin.net/box0wycfouvhl6sr/OOM_4.7.0.tar.bz2. before_btrfs is
> > before the btrfs filesystems were mounted.
> > I did run a btrfs balance because it creates IO load and I needed to
> > balance anyway. Maybe that's what caused it?
>
> pgmigrate_success 46738962
> pgmigrate_fail 135649772
> compact_migrate_scanned 309726659
> compact_free_scanned 9715615169
> compact_isolated 229689596
> compact_stall 4777
> compact_fail 3068
> compact_success 1709
> compact_daemon_wake 207834
>
> The migration failures are quite enormous. Very quick analysis of the
> trace seems to confirm that these are mostly "real", as opposed to result
> of failure to isolate free pages for migration targets, although the free
> scanner spent a lot of time:

I don't think that the main reason for the OOM is 'real' migration failure.
If that were the case, compaction would find the next migratable pages and
eventually some of the pages would be migrated successfully.

pagetypeinfo shows that there are too many unmovable pageblocks.
The freepage scanner doesn't scan those pageblocks, so there is a high
probability that it cannot find free pages even if the system has many
of them. I think that this is the root cause of the problem.

It would be good to check whether the following workaround helps.

Thanks.

------------>8-----------
diff --git a/mm/compaction.c b/mm/compaction.c
index 9affb29..965eddd 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1082,10 +1082,6 @@ static void isolate_freepages(struct compact_control *cc)
 		if (!page)
 			continue;
 
-		/* Check the block is suitable for migration */
-		if (!suitable_migration_target(page))
-			continue;
-
 		/* If isolation recently failed, do not retry */
 		if (!isolation_suitable(cc, page))
 			continue;
* Re: OOM killer changes
  2016-08-16  3:12 ` Joonsoo Kim
@ 2016-08-16  7:44 ` Vlastimil Babka
  2016-08-17  4:48 ` Ralf-Peter Rohbeck
  1 sibling, 0 replies; 50+ messages in thread
From: Vlastimil Babka @ 2016-08-16  7:44 UTC
To: Joonsoo Kim; +Cc: Ralf-Peter Rohbeck, Michal Hocko, linux-mm

On 08/16/2016 05:12 AM, Joonsoo Kim wrote:
> On Mon, Aug 15, 2016 at 11:16:36AM +0200, Vlastimil Babka wrote:
>> On 08/15/2016 06:48 AM, Ralf-Peter Rohbeck wrote:
>>> On 02.08.2016 12:25, Ralf-Peter Rohbeck wrote:
>>>>
>>> Took me a little longer than expected due to work. The failure wouldn't
>>> happen for a while and so I started a couple of scripts and let them
>>> run. When I checked today the server didn't respond on the network and
>>> sure enough it had killed everything. This is with 4.7.0 with the config
>>> based on Debian 4.7-rc7.
>>>
>>> trace_pipe got a little big (5GB) so I uploaded the logs to
>>> https://filebin.net/box0wycfouvhl6sr/OOM_4.7.0.tar.bz2. before_btrfs is
>>> before the btrfs filesystems were mounted.
>>> I did run a btrfs balance because it creates IO load and I needed to
>>> balance anyway. Maybe that's what caused it?
>>
>> pgmigrate_success 46738962
>> pgmigrate_fail 135649772
>> compact_migrate_scanned 309726659
>> compact_free_scanned 9715615169
>> compact_isolated 229689596
>> compact_stall 4777
>> compact_fail 3068
>> compact_success 1709
>> compact_daemon_wake 207834
>>
>> The migration failures are quite enormous. Very quick analysis of the
>> trace seems to confirm that these are mostly "real", as opposed to result
>> of failure to isolate free pages for migration targets, although the free
>> scanner spent a lot of time:
>
> I don't think that main reason of OOM is 'real' migration failure.
> If it is the case, compaction would find next migratable pages and
> eventually some of pages would be migrated successfully.
>
> pagetypeinfo shows that there are too many unmovable pageblock.

Hmm, well spotted. And also somewhat suspicious: I would expect
filesystem activity to result in reclaimable allocations, not unmovable
ones (not that it makes any difference for compaction). Checking
nr_slab_* in zoneinfo shows that it really should be mostly reclaimable:

                          DMA    DMA32   Normal
nr_slab_reclaimable         0    32709   101525
nr_slab_unreclaimable       0     2764    10852

Compared with:

Number of blocks type     Unmovable      Movable  Reclaimable   HighAtomic      Isolate
Node 0, zone      DMA            1            7            0            0            0
Node 0, zone    DMA32          893           72           51            0            0
Node 0, zone   Normal         2780          155          137            0            0

We have 188 reclaimable blocks, that's 96256 pages. The sum of
nr_slab_reclaimable is 134234, which suggests some fallbacks into
unmovable blocks. But the rest of all those unmovable pageblocks must
be filled by something else... some btrfs buffers maybe?

> Freepage scanner don't scan those pageblocks so there is a large
> possibility that it cannot find freepages even if the system has many
> freepages. I think that this is the root cause of the problem.
>
> It's better to check that following work-around help the problem.

Yes, this might be a good idea, at least for higher compaction
priorities.

Thanks.

> Thanks.
>
> ------------>8-----------
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 9affb29..965eddd 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1082,10 +1082,6 @@ static void isolate_freepages(struct compact_control *cc)
>  		if (!page)
>  			continue;
>
> -		/* Check the block is suitable for migration */
> -		if (!suitable_migration_target(page))
> -			continue;
> -
>  		/* If isolation recently failed, do not retry */
>  		if (!isolation_suitable(cc, page))
>  			continue;
>
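The pageblock arithmetic in this message can be re-checked with a few lines of Python. This is only a sketch: the counts are copied from the pagetypeinfo and zoneinfo excerpts above, and the 512 pages per pageblock is an assumption (2MB pageblocks of 4kB pages), chosen because it is what makes 188 blocks equal 96256 pages.

```python
# Re-check of the reclaimable pageblock arithmetic quoted in the mail.
PAGES_PER_PAGEBLOCK = 512  # assumption: 2MB pageblocks, 4kB pages

# (unmovable, movable, reclaimable) pageblock counts per zone, from pagetypeinfo
pageblocks = {
    "DMA": (1, 7, 0),
    "DMA32": (893, 72, 51),
    "Normal": (2780, 155, 137),
}

# nr_slab_reclaimable per zone, from zoneinfo
nr_slab_reclaimable = {"DMA": 0, "DMA32": 32709, "Normal": 101525}


def reclaimable_summary(blocks, slab):
    """Return (reclaimable blocks, pages they can hold, slab pages, overflow)."""
    nr_blocks = sum(counts[2] for counts in blocks.values())
    capacity = nr_blocks * PAGES_PER_PAGEBLOCK
    used = sum(slab.values())
    return nr_blocks, capacity, used, used - capacity


nr_blocks, capacity, used, overflow = reclaimable_summary(
    pageblocks, nr_slab_reclaimable
)
print(f"{nr_blocks} reclaimable pageblocks can hold {capacity} pages")
print(f"nr_slab_reclaimable sums to {used} pages, {overflow} more than fit")
```

A positive overflow means that some reclaimable slab pages must live outside reclaimable pageblocks, which is the fallback the mail refers to.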
* Re: OOM killer changes
  2016-08-16  3:12 ` Joonsoo Kim
  2016-08-16  7:44 ` Vlastimil Babka
@ 2016-08-17  4:48 ` Ralf-Peter Rohbeck
  2016-08-17  7:56 ` Vlastimil Babka
  1 sibling, 1 reply; 50+ messages in thread
From: Ralf-Peter Rohbeck @ 2016-08-17  4:48 UTC
To: Joonsoo Kim, Vlastimil Babka; +Cc: Michal Hocko, linux-mm

[-- Attachment #1: Type: text/plain, Size: 3674 bytes --]

On 15.08.2016 20:12, Joonsoo Kim wrote:
> On Mon, Aug 15, 2016 at 11:16:36AM +0200, Vlastimil Babka wrote:
>> On 08/15/2016 06:48 AM, Ralf-Peter Rohbeck wrote:
>>> On 02.08.2016 12:25, Ralf-Peter Rohbeck wrote:
>>> Took me a little longer than expected due to work. The failure wouldn't
>>> happen for a while and so I started a couple of scripts and let them
>>> run. When I checked today the server didn't respond on the network and
>>> sure enough it had killed everything. This is with 4.7.0 with the config
>>> based on Debian 4.7-rc7.
>>>
>>> trace_pipe got a little big (5GB) so I uploaded the logs to
>>> https://filebin.net/box0wycfouvhl6sr/OOM_4.7.0.tar.bz2. before_btrfs is
>>> before the btrfs filesystems were mounted.
>>> I did run a btrfs balance because it creates IO load and I needed to
>>> balance anyway. Maybe that's what caused it?
>> pgmigrate_success 46738962
>> pgmigrate_fail 135649772
>> compact_migrate_scanned 309726659
>> compact_free_scanned 9715615169
>> compact_isolated 229689596
>> compact_stall 4777
>> compact_fail 3068
>> compact_success 1709
>> compact_daemon_wake 207834
>>
>> The migration failures are quite enormous. Very quick analysis of the
>> trace seems to confirm that these are mostly "real", as opposed to result
>> of failure to isolate free pages for migration targets, although the free
>> scanner spent a lot of time:
> I don't think that main reason of OOM is 'real' migration failure.
> If it is the case, compaction would find next migratable pages and
> eventually some of pages would be migrated successfully.
>
> pagetypeinfo shows that there are too many unmovable pageblock.
> Freepage scanner don't scan those pageblocks so there is a large
> possibility that it cannot find freepages even if the system has many
> freepages. I think that this is the root cause of the problem.
>
> It's better to check that following work-around help the problem.
>
> Thanks.
>
> ------------>8-----------
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 9affb29..965eddd 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1082,10 +1082,6 @@ static void isolate_freepages(struct compact_control *cc)
>  		if (!page)
>  			continue;
>
> -		/* Check the block is suitable for migration */
> -		if (!suitable_migration_target(page))
> -			continue;
> -
>  		/* If isolation recently failed, do not retry */
>  		if (!isolation_suitable(cc, page))
>  			continue;
>

That seemed to help a little (subjectively), but a kernel build still
got OOM killed. The logs are attached.

Thanks,
Ralf-Peter

[-- Attachment #2: OOM_4.7.0_p3.tar.bz2 --]
[-- Type: application/x-bzip, Size: 670163 bytes --]
* Re: OOM killer changes
  2016-08-17  4:48 ` Ralf-Peter Rohbeck
@ 2016-08-17  7:56 ` Vlastimil Babka
  2016-08-17  8:16 ` Joonsoo Kim
  2016-08-17  9:11 ` Ralf-Peter Rohbeck
  0 siblings, 2 replies; 50+ messages in thread
From: Vlastimil Babka @ 2016-08-17  7:56 UTC
To: Ralf-Peter Rohbeck, Joonsoo Kim; +Cc: Michal Hocko, linux-mm

On 08/17/2016 06:48 AM, Ralf-Peter Rohbeck wrote:
>> ------------>8-----------
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 9affb29..965eddd 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -1082,10 +1082,6 @@ static void isolate_freepages(struct compact_control *cc)
>>  		if (!page)
>>  			continue;
>>
>> -		/* Check the block is suitable for migration */
>> -		if (!suitable_migration_target(page))
>> -			continue;
>> -
>>  		/* If isolation recently failed, do not retry */
>>  		if (!isolation_suitable(cc, page))
>>  			continue;
>>
> That seemed to help a little (subjectively) but still OOM killed a
> kernel build. The logs are attached.

> grep XXX messages
Aug 16 20:29:13 fs kernel: [ 6850.467250] XXX: compaction_failed

pagetypeinfo_after:
Number of blocks type     Unmovable      Movable  Reclaimable   HighAtomic      Isolate
Node 0, zone      DMA            1            7            0            0            0
Node 0, zone    DMA32          879           93           44            0            0
Node 0, zone   Normal         2862          136           74            0            0

vmstat_after:
pgmigrate_success 5123
pgmigrate_fail 4106
compact_migrate_scanned 62019
compact_free_scanned 44314328
compact_isolated 18572
compact_stall 327
compact_fail 236
compact_success 91
compact_daemon_wake 1162

> grep try_to_release trace_pipe.log | wc -l
0

Again, migration failures are there, but not so many, and failures to
isolate free pages stand out. I assume that's because this was the
kernel build workload and not the btrfs balance one.

I think the patches in mmotm could make compaction try harder and use
more appropriate watermarks, but it's not guaranteed that will help.
The free scanner seems to become more and more of a fundamental problem.

And I really wonder how all those unmovable pageblocks came about.
AFAICS zoneinfo shows that most of the memory is occupied by file LRU
pages. These should be movable.
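To compare runs, the vmstat counters above can be reduced to a few ratios. This is a rough sketch rather than an established metric: the names `parse_vmstat` and `compaction_ratios` are made up for illustration, and since compact_isolated counts pages isolated by both scanners, the last ratio is only a coarse indicator of how hard the free scanner had to work.

```python
VMSTAT_AFTER = """\
pgmigrate_success 5123
pgmigrate_fail 4106
compact_migrate_scanned 62019
compact_free_scanned 44314328
compact_isolated 18572
compact_stall 327
compact_fail 236
compact_success 91
compact_daemon_wake 1162
"""


def parse_vmstat(text):
    """Parse 'name value' lines, as found in /proc/vmstat, into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def compaction_ratios(stats):
    """Derive a few coarse ratios from the compaction counters."""
    migrated = stats["pgmigrate_success"] + stats["pgmigrate_fail"]
    return {
        # share of page migrations that failed
        "migrate_fail_ratio": stats["pgmigrate_fail"] / migrated,
        # share of direct compaction attempts that failed
        "compact_fail_ratio": stats["compact_fail"] / stats["compact_stall"],
        # pages examined by the free scanner per isolated page
        "free_scanned_per_isolated": (
            stats["compact_free_scanned"] / stats["compact_isolated"]
        ),
    }


ratios = compaction_ratios(parse_vmstat(VMSTAT_AFTER))
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

Running the same function on the counters from the btrfs balance run gives a much higher migration failure share, which is consistent with the difference between the two workloads described above.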
* Re: OOM killer changes
  2016-08-17  7:56 ` Vlastimil Babka
@ 2016-08-17  8:16 ` Joonsoo Kim
  2016-08-17  9:21 ` Ralf-Peter Rohbeck
  2016-08-17  9:11 ` Ralf-Peter Rohbeck
  1 sibling, 1 reply; 50+ messages in thread
From: Joonsoo Kim @ 2016-08-17  8:16 UTC
To: Vlastimil Babka; +Cc: Ralf-Peter Rohbeck, Joonsoo Kim, Michal Hocko, linux-mm

2016-08-17 16:56 GMT+09:00 Vlastimil Babka <vbabka@suse.cz>:
> On 08/17/2016 06:48 AM, Ralf-Peter Rohbeck wrote:
>>> ------------>8-----------
>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>> index 9affb29..965eddd 100644
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -1082,10 +1082,6 @@ static void isolate_freepages(struct compact_control *cc)
>>>  		if (!page)
>>>  			continue;
>>>
>>> -		/* Check the block is suitable for migration */
>>> -		if (!suitable_migration_target(page))
>>> -			continue;
>>> -
>>>  		/* If isolation recently failed, do not retry */
>>>  		if (!isolation_suitable(cc, page))
>>>  			continue;
>>>
>> That seemed to help a little (subjectively) but still OOM killed a
>> kernel build. The logs are attached.
>
>> grep XXX messages
> Aug 16 20:29:13 fs kernel: [ 6850.467250] XXX: compaction_failed
>
> pagetypeinfo_after:
> Number of blocks type     Unmovable      Movable  Reclaimable   HighAtomic      Isolate
> Node 0, zone      DMA            1            7            0            0            0
> Node 0, zone    DMA32          879           93           44            0            0
> Node 0, zone   Normal         2862          136           74            0            0
>
> vmstat_after:
> pgmigrate_success 5123
> pgmigrate_fail 4106
> compact_migrate_scanned 62019
> compact_free_scanned 44314328
> compact_isolated 18572
> compact_stall 327
> compact_fail 236
> compact_success 91
> compact_daemon_wake 1162
>
>> grep try_to_release trace_pipe.log | wc -l
> 0
>
> Again, migration failures are there but not so many, and failures to
> isolate freepages stand out. I assume it's because the kernel build
> workload and not the btrfs balance one.
>
> I think the patches in mmotm could make compaction try harder and use
> more appropriate watermarks, but it's not guaranteed that will help.
> The free scanner seems to become more and more a fundamental problem.

The following trace is the last compaction attempt before the OOM was
triggered.

The free scanner starts at 0x27fe00, but the first actual scan happens
at 0x186a00. And, although the log is snipped, compaction fails because
it doesn't find any free page.

It skips about half of the pageblocks in that zone, which would be due
to either the migratetype or the skip bit.
Both Vlastimil's recent patches and my work-around should be applied to
solve this problem.

Other parts of the trace look as if my work-around isn't applied.
Could you confirm that?

Thanks.

sh-14869 [000] .... 6850.456639: mm_compaction_try_to_compact_pages: order=2 gfp_mask=0x27000c0 mode=1
sh-14869 [000] .... 6850.456640: mm_compaction_suitable: node=0 zone=Normal order=2 ret=continue
sh-14869 [000] .... 6850.456641: mm_compaction_begin: zone_start=0x100000 migrate_pfn=0x100000 free_pfn=0x27fe00 zone_end=0x280000, mode=sync
sh-14869 [000] .... 6850.456641: mm_compaction_finished: node=0 zone=Normal order=2 ret=continue
sh-14869 [000] .... 6850.456648: mm_compaction_isolate_migratepages: range=(0x100000 ~ 0x10002d) nr_scanned=45 nr_taken=32
sh-14869 [000] .... 6850.456834: mm_compaction_isolate_freepages: range=(0x186a00 ~ 0x186c00) nr_scanned=512 nr_taken=0
sh-14869 [000] .... 6850.456842: mm_compaction_isolate_freepages: range=(0x186800 ~ 0x186a00) nr_scanned=512 nr_taken=0

> And I really wonder how did all those unmovable pageblocks happen.
> AFAICS zoneinfo shows that most of memory is occupied by file lru pages.
> These should be movable.
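The gap between where the free scanner starts and where it first scans can be quantified from the two trace events quoted above. The sketch below rests on two assumptions that are not stated in the thread: a pageblock is 512 pages, and every pageblock between the end of the zone and the first mm_compaction_isolate_freepages range was skipped without being scanned. The function name `skipped_pageblocks` is illustrative only.

```python
import re

PAGES_PER_PAGEBLOCK = 512  # assumption: 2MB pageblocks, 4kB pages

TRACE = """\
sh-14869 [000] .... 6850.456641: mm_compaction_begin: zone_start=0x100000 migrate_pfn=0x100000 free_pfn=0x27fe00 zone_end=0x280000, mode=sync
sh-14869 [000] .... 6850.456834: mm_compaction_isolate_freepages: range=(0x186a00 ~ 0x186c00) nr_scanned=512 nr_taken=0
sh-14869 [000] .... 6850.456842: mm_compaction_isolate_freepages: range=(0x186800 ~ 0x186a00) nr_scanned=512 nr_taken=0
"""

BEGIN_RE = re.compile(
    r"mm_compaction_begin: zone_start=(0x[0-9a-f]+) migrate_pfn=(0x[0-9a-f]+) "
    r"free_pfn=(0x[0-9a-f]+) zone_end=(0x[0-9a-f]+)"
)
FREE_RE = re.compile(
    r"mm_compaction_isolate_freepages: range=\((0x[0-9a-f]+) ~ (0x[0-9a-f]+)\)"
)


def skipped_pageblocks(trace):
    """Return (pageblocks in zone, pageblocks above the first free scan)."""
    begin = BEGIN_RE.search(trace)
    first_scan = FREE_RE.search(trace)
    if begin is None or first_scan is None:
        raise ValueError("trace lacks the expected compaction events")
    zone_start = int(begin.group(1), 16)
    zone_end = int(begin.group(4), 16)
    # The free scanner walks from the zone end towards lower pfns, so the
    # upper end of the first scanned range bounds the skipped area.
    first_scan_end = int(first_scan.group(2), 16)
    zone_blocks = (zone_end - zone_start) // PAGES_PER_PAGEBLOCK
    skipped = (zone_end - first_scan_end) // PAGES_PER_PAGEBLOCK
    return zone_blocks, skipped


zone_blocks, skipped = skipped_pageblocks(TRACE)
print(f"{skipped} of {zone_blocks} pageblocks lie above the first free scan")
```

The zone size computed this way matches the total of the Normal row in pagetypeinfo_after (2862 + 136 + 74 pageblocks), which gives some confidence in the 512-page assumption.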
* Re: OOM killer changes
  2016-08-17  8:16 ` Joonsoo Kim
@ 2016-08-17  9:21 ` Ralf-Peter Rohbeck
  0 siblings, 0 replies; 50+ messages in thread
From: Ralf-Peter Rohbeck @ 2016-08-17  9:21 UTC
To: Joonsoo Kim, Vlastimil Babka; +Cc: Joonsoo Kim, Michal Hocko, linux-mm

On 17.08.2016 01:16, Joonsoo Kim wrote:
>
> Free scanner start at 0x27fe00 but actual scan happens at 0x186a00.
> And, although log is snipped, compaction fails because it doesn't find
> any freepage.
>
> It skips half of pageblocks in that zone. It would be due to
> migratetype or skipbit.
> Both Vlastimil's recent patches and my work-around should be applied to solve
> this problem.
>
> Other part of trace looks like that my work-around isn't applied.
> Could you confirm
> that?
>
> Thanks.

Your patch was in my last 4.7 run with the output in
OOM_4.7.0_p3.tar.bz2 but not in _p2.

Ralf-Peter
* Re: OOM killer changes
  2016-08-17  7:56 ` Vlastimil Babka
  2016-08-17  8:16 ` Joonsoo Kim
@ 2016-08-17  9:11 ` Ralf-Peter Rohbeck
  2016-08-17  9:20 ` Vlastimil Babka
  1 sibling, 1 reply; 50+ messages in thread
From: Ralf-Peter Rohbeck @ 2016-08-17  9:11 UTC
To: Vlastimil Babka, Joonsoo Kim; +Cc: Michal Hocko, linux-mm

On 17.08.2016 00:56, Vlastimil Babka wrote:
>
> Again, migration failures are there but not so many, and failures to
> isolate freepages stand out. I assume it's because the kernel build
> workload and not the btrfs balance one.
>
> I think the patches in mmotm could make compaction try harder and use
> more appropriate watermarks, but it's not guaranteed that will help.
> The free scanner seems to become more and more a fundamental problem.
>
> And I really wonder how did all those unmovable pageblocks happen.
> AFAICS zoneinfo shows that most of memory is occupied by file lru pages.
> These should be movable.

Is it the pressure on the page cache? Don't forget that I write to some
disk drives (recently, 2) at media speed with dd if=/dev/zero bs=4M
of=/dev/SDX.
* Re: OOM killer changes
  2016-08-17  9:11 ` Ralf-Peter Rohbeck
@ 2016-08-17  9:20 ` Vlastimil Babka
  0 siblings, 0 replies; 50+ messages in thread
From: Vlastimil Babka @ 2016-08-17  9:20 UTC
To: Ralf-Peter Rohbeck, Joonsoo Kim; +Cc: Michal Hocko, linux-mm

On 08/17/2016 11:11 AM, Ralf-Peter Rohbeck wrote:
> On 17.08.2016 00:56, Vlastimil Babka wrote:
>> And I really wonder how did all those unmovable pageblocks happen.
>> AFAICS zoneinfo shows that most of memory is occupied by file lru pages.
>> These should be movable.
>
> Is it the pressure on the page cache? Don't forget that I write to some
> disk drives (recently, 2) at media speed with dd if=/dev/zero bs=4M
> of=/dev/SDX.

Hmm, page cache should be movable. But maybe it's different when writing
to block devices rather than to filesystems.
* Re: OOM killer changes
  2016-08-01 19:26 ` Michal Hocko
  2016-08-01 19:35 ` Ralf-Peter Rohbeck
@ 2016-08-02  7:11 ` Vlastimil Babka
  2016-08-02  9:02 ` Michal Hocko
  2 siblings, 0 replies; 50+ messages in thread
From: Vlastimil Babka @ 2016-08-02  7:11 UTC
To: Michal Hocko, Ralf-Peter Rohbeck; +Cc: linux-mm

On 08/01/2016 09:26 PM, Michal Hocko wrote:
> [re-adding linux-mm mailing list - please always use reply-to-all
> also CCing Vlastimil who can help with the compaction debugging]
>
> On Mon 01-08-16 11:48:53, Ralf-Peter Rohbeck wrote:
>> See the messages log attached. It has several OOM killer entries.
>> Let me know if there's anything else I can do. I'll try the disk erasing on
>> 4.6 and on 4.7.
>
> Jul 31 17:17:05 fs kernel: [11918.534744] x2golistsession invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
> [...]
> Jul 31 17:17:05 fs kernel: [11918.557356] Mem-Info:
> Jul 31 17:17:05 fs kernel: [11918.558268] active_anon:7856 inactive_anon:21924 isolated_anon:0
> Jul 31 17:17:05 fs kernel: [11918.558268] active_file:70925 inactive_file:1796707 isolated_file:0
> Jul 31 17:17:05 fs kernel: [11918.558268] unevictable:0 dirty:277675 writeback:57117 unstable:0
> Jul 31 17:17:05 fs kernel: [11918.558268] slab_reclaimable:75821 slab_unreclaimable:9490
> Jul 31 17:17:05 fs kernel: [11918.558268] mapped:12014 shmem:2414 pagetables:1497 bounce:0
> Jul 31 17:17:05 fs kernel: [11918.558268] free:37021 free_pcp:89 free_cma:0
> [...]
> Jul 31 17:17:05 fs kernel: [11918.578836] Node 0 DMA32: 2137*4kB (UME) 5043*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 48892kB
> Jul 31 17:17:05 fs kernel: [11918.580370] Node 0 Normal: 2663*4kB (UME) 7452*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 70268kB
>
> The above process is trying to allocate the kernel stack which is
> order-2 (16kB) of physically contiguous memory which is clearly
> not available as you can see. Memory compaction (assuming you have
> CONFIG_COMPACTION enabled) which is a part of the oom reclaim process
> should help to form such blocks but those retries are bound and if
> there is not much hope left we eventually hit the OOM killer. If you
> look at the above counters there is a lot of memory dirty and under the
> writeback (1.3G), this suggests that the IO is quite slow wrt. writers.
> Anyway there is a lot of anonymous memory which should be a good
> candidate for compaction.
>
> But the IO doesn't seem to be the main factor I guess. Later OOM
> invocations have a slightly different pattern (let's take the last one):
>
> Aug  1 06:30:45 fs kernel: [59536.957034] x2golistsession invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
> [...]
> Aug  1 06:30:45 fs kernel: [59536.976467] Mem-Info:
> Aug  1 06:30:45 fs kernel: [59536.977442] active_anon:16045 inactive_anon:20473 isolated_anon:0
> Aug  1 06:30:45 fs kernel: [59536.977442] active_file:169767 inactive_file:1727008 isolated_file:0
> Aug  1 06:30:45 fs kernel: [59536.977442] unevictable:0 dirty:32734 writeback:0 unstable:0
> Aug  1 06:30:45 fs kernel: [59536.977442] slab_reclaimable:41953 slab_unreclaimable:7507
> Aug  1 06:30:45 fs kernel: [59536.977442] mapped:10619 shmem:2443 pagetables:1971 bounce:0
> Aug  1 06:30:45 fs kernel: [59536.977442] free:36686 free_pcp:119 free_cma:0
> [...]
> Aug  1 06:30:45 fs kernel: [59536.996407] Node 0 DMA32: 5909*4kB (UME) 3800*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54036kB
> Aug  1 06:30:45 fs kernel: [59536.997846] Node 0 Normal: 4041*4kB (UME) 6799*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 70556kB
>
> the amount of dirty pages is much smaller as well as the anonymous
> memory. The biggest portion seems to be in the page cache. The memory
> is still hugely fragmented though. In fact if we check all the OOM
> invocations the only consistent thing is that the memory is fragmented
> and the compaction cannot make sufficient progress consistently. We can
> assume that the situation actually gets better because there are some
> holes between those OOMs so we can assume that something has unpinned a
> larger amount memory and allowed the compaction to make further progress
> or that the load has strong peaks. We would need more information from
> the compaction to know better. Vlastimil will surely tell you which
> tracepoints to enable.

Actually, a snapshot of /proc/vmstat, /proc/zoneinfo and
/proc/pagetypeinfo before and after the test would also be useful to
provide first. Then the compaction tracepoints:

echo 1 > /sys/kernel/debug/tracing/events/compaction/enable
cat /sys/kernel/debug/tracing/trace_pipe > /path/to/trace.log

or with trace-cmd:

trace-cmd record -e compaction
trace-cmd report

Vlastimil
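The before/after snapshots requested above could be taken with a small helper such as the following. It is a sketch only: the function name `snapshot`, the `<name>_<label>` file naming and the destination directory are illustrative choices, and reading /proc/pagetypeinfo may require root on some kernels.

```python
from pathlib import Path

# The three files named in the mail.
PROC_FILES = ("/proc/vmstat", "/proc/zoneinfo", "/proc/pagetypeinfo")


def snapshot(label, dest_dir, sources=PROC_FILES):
    """Copy each source file to dest_dir/<basename>_<label>.

    /proc files report a size of zero, so they are read as text and
    written out again instead of being copied by size.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for src in sources:
        target = dest / f"{Path(src).name}_{label}"
        target.write_text(Path(src).read_text())
        saved.append(target)
    return saved
```

A typical session would call `snapshot("before", "/tmp/oom-debug")` ahead of the workload and `snapshot("after", "/tmp/oom-debug")` once the OOM killer has fired, leaving files such as vmstat_before and vmstat_after to compare.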
* Re: OOM killer changes
  2016-08-01 19:26 ` Michal Hocko
  2016-08-01 19:35 ` Ralf-Peter Rohbeck
  2016-08-02  7:11 ` Vlastimil Babka
@ 2016-08-02  9:02 ` Michal Hocko
  2 siblings, 0 replies; 50+ messages in thread
From: Michal Hocko @ 2016-08-02  9:02 UTC
To: Ralf-Peter Rohbeck; +Cc: linux-mm, Vlastimil Babka

On Mon 01-08-16 21:26:20, Michal Hocko wrote:
> [re-adding linux-mm mailing list - please always use reply-to-all
> also CCing Vlastimil who can help with the compaction debugging]
>
> On Mon 01-08-16 11:48:53, Ralf-Peter Rohbeck wrote:
> > See the messages log attached. It has several OOM killer entries.
> > Let me know if there's anything else I can do. I'll try the disk erasing on
> > 4.6 and on 4.7.
>
> Jul 31 17:17:05 fs kernel: [11918.534744] x2golistsession invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
> [...]
> Jul 31 17:17:05 fs kernel: [11918.557356] Mem-Info:
> Jul 31 17:17:05 fs kernel: [11918.558268] active_anon:7856 inactive_anon:21924 isolated_anon:0
> Jul 31 17:17:05 fs kernel: [11918.558268] active_file:70925 inactive_file:1796707 isolated_file:0
> Jul 31 17:17:05 fs kernel: [11918.558268] unevictable:0 dirty:277675 writeback:57117 unstable:0
> Jul 31 17:17:05 fs kernel: [11918.558268] slab_reclaimable:75821 slab_unreclaimable:9490
> Jul 31 17:17:05 fs kernel: [11918.558268] mapped:12014 shmem:2414 pagetables:1497 bounce:0
> Jul 31 17:17:05 fs kernel: [11918.558268] free:37021 free_pcp:89 free_cma:0
> [...]
> Jul 31 17:17:05 fs kernel: [11918.578836] Node 0 DMA32: 2137*4kB (UME) 5043*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 48892kB
> Jul 31 17:17:05 fs kernel: [11918.580370] Node 0 Normal: 2663*4kB (UME) 7452*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 70268kB
>
> The above process is trying to allocate the kernel stack which is
> order-2 (16kB) of physically contiguous memory which is clearly
> not available as you can see. Memory compaction (assuming you have
> CONFIG_COMPACTION enabled) which is a part of the oom reclaim process
> should help to form such blocks but those retries are bound and if
> there is not much hope left we eventually hit the OOM killer. If you
> look at the above counters there is a lot of memory dirty and under the
> writeback (1.3G), this suggests that the IO is quite slow wrt. writers.
> Anyway there is a lot of anonymous memory which should be a good
> candidate for compaction.
>
> But the IO doesn't seem to be the main factor I guess. Later OOM
> invocations have a slightly different pattern (let's take the last one):

OK, so I've checked the anon/file counters for all of the OOM
invocations and the pattern is in fact pretty much consistent:

anon 29780 (1%)  file 1867632 (89%)  dirty 334792 (15%)  slab 85311 (4%)
anon 30215 (1%)  file 1866069 (89%)  dirty 336974 (16%)  slab 85074 (4%)
anon 32800 (1%)  file 1865752 (89%)  dirty 335470 (16%)  slab 84793 (4%)
anon 33040 (1%)  file 1850425 (88%)  dirty 349561 (16%)  slab 88997 (4%)
anon 31536 (1%)  file 1859444 (88%)  dirty 351498 (16%)  slab 87475 (4%)
anon 31540 (1%)  file 1861497 (88%)  dirty 351126 (16%)  slab 86976 (4%)
anon 28390 (1%)  file 1863807 (88%)  dirty 351404 (16%)  slab 86292 (4%)
anon 29655 (1%)  file 1863581 (88%)  dirty 351632 (16%)  slab 86295 (4%)
anon 28907 (1%)  file 1861612 (88%)  dirty 302386 (14%)  slab 88269 (4%)
anon 28475 (1%)  file 1857073 (88%)  dirty 299464 (14%)  slab 88193 (4%)
anon 29610 (1%)  file 1861161 (88%)  dirty 297911 (14%)  slab 87796 (4%)
anon 28624 (1%)  file 1862460 (88%)  dirty 300628 (14%)  slab 87650 (4%)
anon 35317 (1%)  file 1901489 (90%)  dirty  32652 (1%)   slab 47519 (2%)
anon 36518 (1%)  file 1896775 (90%)  dirty  32734 (1%)   slab 49460 (2%)

The dirty+writeback (marked as dirty above) drops at the end, but the
file LRU is consistently ~89% of the memory. That alone shouldn't be a
problem for compaction, unless those pages are pinned by the filesystem
for some reason. You have said that you are using Btrfs. Would it be
possible to retest with the same storage layout and a different fs?
That would help to rule out the FS as the source of the problems.
--
Michal Hocko
SUSE Labs
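The per-OOM rows above are sums of counters from the Mem-Info block of each report. A minimal sketch of that aggregation follows, using the first report quoted in this mail as input. The function names are illustrative and the 4kB page size is an assumption. The total used for the percentages is not given in the mail, so only the sums are reproduced here.

```python
import re

PAGE_KB = 4  # assumption: 4kB pages

MEM_INFO = """\
active_anon:7856 inactive_anon:21924 isolated_anon:0
active_file:70925 inactive_file:1796707 isolated_file:0
unevictable:0 dirty:277675 writeback:57117 unstable:0
slab_reclaimable:75821 slab_unreclaimable:9490
"""


def summarize_meminfo(report):
    """Sum the Mem-Info counters into the four groups used in the mail."""
    counters = {k: int(v) for k, v in re.findall(r"([a-z_]+):(\d+)", report)}
    return {
        "anon": counters["active_anon"] + counters["inactive_anon"],
        "file": counters["active_file"] + counters["inactive_file"],
        # the mail reports dirty and writeback together as "dirty"
        "dirty": counters["dirty"] + counters["writeback"],
        "slab": counters["slab_reclaimable"] + counters["slab_unreclaimable"],
    }


def pages_to_gib(pages):
    """Convert a page count to GiB under the PAGE_KB assumption."""
    return pages * PAGE_KB / (1024 * 1024)


summary = summarize_meminfo(MEM_INFO)
print(summary)
print(f"dirty+writeback: {pages_to_gib(summary['dirty']):.2f} GiB")
```

The sums match the first row of the table, and the dirty plus writeback figure comes out close to the 1.3G mentioned in the quoted analysis.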