linux-mm.kvack.org archive mirror
* Re: OOM killer changes
From: Michal Hocko @ 2016-08-01  6:16 UTC
  To: Ralf-Peter Rohbeck; +Cc: linux-mm

[CC linux-mm]

On Sun 31-07-16 21:29:02, Ralf-Peter Rohbeck wrote:
> Hello,
> 
> I just noticed that 4.7-rc7 apparently killed processes for no good reason,
> on a system with plenty of free memory and plenty of swap space.

Have you seen something similar with 4.6? Can you reproduce this behavior?
 
> At the time I initialized some USB3 drives by overwriting them with zeroes
> so IO was constantly busy (sync never finished.) Not sure if that was the
> reason. Still looking.

Could you share your OOM report please?
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: OOM killer changes
From: Michal Hocko @ 2016-08-01 19:26 UTC
  To: Ralf-Peter Rohbeck; +Cc: linux-mm, Vlastimil Babka

[re-adding linux-mm mailing list - please always use reply-to-all;
 also CCing Vlastimil, who can help with the compaction debugging]

On Mon 01-08-16 11:48:53, Ralf-Peter Rohbeck wrote:
> See the messages log attached. It has several OOM killer entries.
> Let me know if there's anything else I can do. I'll try the disk erasing on
> 4.6 and on 4.7.

Jul 31 17:17:05 fs kernel: [11918.534744] x2golistsession invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
[...]
Jul 31 17:17:05 fs kernel: [11918.557356] Mem-Info:
Jul 31 17:17:05 fs kernel: [11918.558268] active_anon:7856 inactive_anon:21924 isolated_anon:0
Jul 31 17:17:05 fs kernel: [11918.558268]  active_file:70925 inactive_file:1796707 isolated_file:0
Jul 31 17:17:05 fs kernel: [11918.558268]  unevictable:0 dirty:277675 writeback:57117 unstable:0
Jul 31 17:17:05 fs kernel: [11918.558268]  slab_reclaimable:75821 slab_unreclaimable:9490
Jul 31 17:17:05 fs kernel: [11918.558268]  mapped:12014 shmem:2414 pagetables:1497 bounce:0
Jul 31 17:17:05 fs kernel: [11918.558268]  free:37021 free_pcp:89 free_cma:0
[...]
Jul 31 17:17:05 fs kernel: [11918.578836] Node 0 DMA32: 2137*4kB (UME) 5043*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 48892kB
Jul 31 17:17:05 fs kernel: [11918.580370] Node 0 Normal: 2663*4kB (UME) 7452*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 70268kB

The above process is trying to allocate a kernel stack, which is an
order-2 (16kB) block of physically contiguous memory, and as you can
see no such block is available. Memory compaction (assuming you have
CONFIG_COMPACTION enabled), which is part of the reclaim process that
precedes the OOM killer, should help to form such blocks, but those
retries are bounded and if there is not much hope left we eventually
hit the OOM killer. If you look at the above counters, there is a lot
of memory dirty and under writeback (1.3G), which suggests that the IO
is quite slow with respect to the writers. Anyway, there is a lot of
anonymous memory which should be a good candidate for compaction.
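A minimal Python sketch of the reasoning above, assuming the free-list
line format shown in this report (one "count*sizekB" entry per buddy
order, starting at 4kB pages). The helper names are hypothetical. It only
checks whether any free block of the requested order or larger exists;
the real allocator also applies per-zone watermarks, so a True result is
necessary but not sufficient for the allocation to succeed. The last line
converts the dirty and writeback page counts into the ~1.3G figure,
again assuming 4kB pages.

```python
import re

PAGE_KB = 4  # assumption: x86-64 with 4kB base pages

# One "<count>*<size>kB" entry per buddy order in the zone free-list dump.
ENTRY = re.compile(r"(\d+)\*(\d+)kB")


def free_blocks_by_order(zone_line):
    """Map buddy order -> number of free blocks (4kB = order 0, 16kB = order 2)."""
    blocks = {}
    for count, size_kb in ENTRY.findall(zone_line):
        order = (int(size_kb) // PAGE_KB).bit_length() - 1
        blocks[order] = int(count)
    return blocks


def total_free_kb(zone_line):
    """Recompute the '= NNNkB' total the kernel prints at the end of the line."""
    return sum(n * (PAGE_KB << o) for o, n in free_blocks_by_order(zone_line).items())


def has_free_block(zone_line, order):
    """True if at least one free block of `order` or larger exists (watermarks ignored)."""
    return any(n > 0 for o, n in free_blocks_by_order(zone_line).items() if o >= order)


def pages_to_gib(pages):
    """Convert a page count from Mem-Info into GiB."""
    return pages * PAGE_KB / (1024 * 1024)


normal = ("Node 0 Normal: 2663*4kB (UME) 7452*8kB (U) 0*16kB 0*32kB 0*64kB "
          "0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 70268kB")
print(total_free_kb(normal))                   # 70268, same as the kernel's total
print(has_free_block(normal, 2))               # False: nothing of 16kB or more
print(round(pages_to_gib(277675 + 57117), 2))  # 1.28 (dirty + writeback, GiB)
```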

But I guess the IO is not the main factor. Later OOM invocations have
a slightly different pattern (let's take the last one):

Aug  1 06:30:45 fs kernel: [59536.957034] x2golistsession invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
[...]
Aug  1 06:30:45 fs kernel: [59536.976467] Mem-Info:
Aug  1 06:30:45 fs kernel: [59536.977442] active_anon:16045 inactive_anon:20473 isolated_anon:0
Aug  1 06:30:45 fs kernel: [59536.977442]  active_file:169767 inactive_file:1727008 isolated_file:0
Aug  1 06:30:45 fs kernel: [59536.977442]  unevictable:0 dirty:32734 writeback:0 unstable:0
Aug  1 06:30:45 fs kernel: [59536.977442]  slab_reclaimable:41953 slab_unreclaimable:7507
Aug  1 06:30:45 fs kernel: [59536.977442]  mapped:10619 shmem:2443 pagetables:1971 bounce:0
Aug  1 06:30:45 fs kernel: [59536.977442]  free:36686 free_pcp:119 free_cma:0
[...]
Aug  1 06:30:45 fs kernel: [59536.996407] Node 0 DMA32: 5909*4kB (UME) 3800*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54036kB
Aug  1 06:30:45 fs kernel: [59536.997846] Node 0 Normal: 4041*4kB (UME) 6799*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 70556kB

The amount of dirty pages is much smaller, and the anonymous memory is
small as well; the biggest portion seems to be in the page cache. The
memory is still hugely fragmented, though. In fact, if we check all the
OOM invocations, the only consistent thing is that the memory is
fragmented and compaction cannot make sufficient progress. The
situation does seem to get better at times, because there are gaps
between those OOMs, so we can assume that either something has unpinned
a larger amount of memory and allowed compaction to make further
progress, or the load has strong peaks. We would need more information
from compaction to know better. Vlastimil will surely tell you which
tracepoints to enable.

Jul 31 17:17:05 fs kernel: [11918.578836] Node 0 DMA32: 2137*4kB (UME) 5043*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 48892kB
Jul 31 17:17:05 fs kernel: [11918.580370] Node 0 Normal: 2663*4kB (UME) 7452*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 70268kB
Jul 31 20:17:51 fs kernel: [22764.494449] Node 0 DMA32: 2568*4kB (UME) 5472*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54048kB
Jul 31 20:17:51 fs kernel: [22764.495510] Node 0 Normal: 6109*4kB (UME) 6651*8kB (UM) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 77660kB
Jul 31 20:57:18 fs kernel: [25131.260737] Node 0 DMA32: 2139*4kB (UME) 5114*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 49468kB
Jul 31 20:57:18 fs kernel: [25131.262060] Node 0 Normal: 3611*4kB (UME) 7312*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 72940kB
Jul 31 23:36:25 fs kernel: [34677.849133] Node 0 DMA32: 10276*4kB (UME) 3565*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 69624kB
Jul 31 23:36:25 fs kernel: [34677.850547] Node 0 Normal: 19080*4kB (UE) 1361*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 87208kB
Jul 31 23:36:35 fs kernel: [34688.300852] Node 0 DMA32: 2291*4kB (UME) 5208*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 50828kB
Jul 31 23:36:35 fs kernel: [34688.301959] Node 0 Normal: 5519*4kB (UME) 7338*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 80780kB
Jul 31 23:36:40 fs kernel: [34692.902932] Node 0 DMA32: 3163*4kB (UE) 4566*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 49180kB
Jul 31 23:36:40 fs kernel: [34692.904897] Node 0 Normal: 5833*4kB (UE) 6387*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 74428kB
Jul 31 23:36:47 fs kernel: [34699.517079] Node 0 DMA32: 3068*4kB (UME) 4889*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 51384kB
Jul 31 23:36:47 fs kernel: [34699.518537] Node 0 Normal: 5935*4kB (UME) 7324*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 82332kB
Jul 31 23:36:50 fs kernel: [34702.755342] Node 0 DMA32: 4975*4kB (UME) 4500*8kB (UM) 3*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 55948kB
Jul 31 23:36:50 fs kernel: [34702.757018] Node 0 Normal: 7171*4kB (UE) 6047*8kB (U) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 77076kB
Jul 31 23:39:39 fs kernel: [34871.854243] Node 0 DMA32: 14269*4kB (UME) 1547*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 69452kB
Jul 31 23:39:39 fs kernel: [34871.855525] Node 0 Normal: 19081*4kB (UME) 28*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 76548kB
Jul 31 23:39:44 fs kernel: [34876.491809] Node 0 DMA32: 11368*4kB (UME) 4265*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 79592kB
Jul 31 23:39:44 fs kernel: [34876.493233] Node 0 Normal: 20088*4kB (UME) 236*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 82240kB
Jul 31 23:39:53 fs kernel: [34885.459361] Node 0 DMA32: 13302*4kB (UME) 2180*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 70648kB
Jul 31 23:39:53 fs kernel: [34885.461011] Node 0 Normal: 18393*4kB (UE) 512*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 77668kB
Jul 31 23:39:55 fs kernel: [34887.848712] Node 0 DMA32: 14180*4kB (UE) 1690*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 70240kB
Jul 31 23:39:55 fs kernel: [34887.850194] Node 0 Normal: 19598*4kB (UM) 21*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 78560kB
Aug  1 06:30:42 fs kernel: [59534.373842] Node 0 DMA32: 4458*4kB (UME) 4252*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 51848kB
Aug  1 06:30:42 fs kernel: [59534.375266] Node 0 Normal: 2265*4kB (U) 7168*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 66404kB
Aug  1 06:30:45 fs kernel: [59536.996407] Node 0 DMA32: 5909*4kB (UME) 3800*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54036kB
Aug  1 06:30:45 fs kernel: [59536.997846] Node 0 Normal: 4041*4kB (UME) 6799*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 70556kB
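The zone lines above can be summarized mechanically. The Python sketch
below (the script name and log path are hypothetical) reports, for each
per-zone free-list line in a log, the highest buddy order that still
has a free block. Going by the lines above, only three of the 28 zone
lines have any free order-2 (16kB) block, and none has anything larger.

```python
import re
import sys

# One "<count>*<size>kB" entry per buddy order (4kB = order 0, 16kB = order 2).
ENTRY = re.compile(r"(\d+)\*(\d+)kB")
ZONE = re.compile(r"Node \d+ (\w+): ")


def highest_free_order(zone_line):
    """Largest buddy order with at least one free block, or -1 if none."""
    best = -1
    for count, size_kb in ENTRY.findall(zone_line):
        if int(count) > 0:
            best = max(best, (int(size_kb) // 4).bit_length() - 1)
    return best


def summarize(lines, wanted_order=2):
    """Yield (zone, highest free order, block of >= wanted_order available?)."""
    for line in lines:
        zone = ZONE.search(line)
        if zone is None or "*4kB" not in line:
            continue  # not a per-zone free-list line
        top = highest_free_order(line)
        yield zone.group(1), top, top >= wanted_order


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 frag_summary.py /var/log/messages
    with open(sys.argv[1], errors="replace") as log:
        for zone, top, ok in summarize(log):
            print(f"{zone:8} highest free order {top:2}  order-2 available: {ok}")
```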
-- 
Michal Hocko
SUSE Labs

* Re: OOM killer changes
From: Ralf-Peter Rohbeck @ 2016-08-01 19:35 UTC
  To: Michal Hocko; +Cc: linux-mm, Vlastimil Babka

On 01.08.2016 12:26, Michal Hocko wrote:
> [re-adding linux-mm mailing list - please always use reply-to-all
>   also CCing Vlastimil who can help with the compaction debugging]
>
> On Mon 01-08-16 11:48:53, Ralf-Peter Rohbeck wrote:
>> See the messages log attached. It has several OOM killer entries.
>> Let me know if there's anything else I can do. I'll try the disk erasing on
>> 4.6 and on 4.7.
> Jul 31 17:17:05 fs kernel: [11918.534744] x2golistsession invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
> [...]
> Jul 31 17:17:05 fs kernel: [11918.557356] Mem-Info:
> Jul 31 17:17:05 fs kernel: [11918.558268] active_anon:7856 inactive_anon:21924 isolated_anon:0
> Jul 31 17:17:05 fs kernel: [11918.558268]  active_file:70925 inactive_file:1796707 isolated_file:0
> Jul 31 17:17:05 fs kernel: [11918.558268]  unevictable:0 dirty:277675 writeback:57117 unstable:0
> Jul 31 17:17:05 fs kernel: [11918.558268]  slab_reclaimable:75821 slab_unreclaimable:9490
> Jul 31 17:17:05 fs kernel: [11918.558268]  mapped:12014 shmem:2414 pagetables:1497 bounce:0
> Jul 31 17:17:05 fs kernel: [11918.558268]  free:37021 free_pcp:89 free_cma:0
> [...]
> Jul 31 17:17:05 fs kernel: [11918.578836] Node 0 DMA32: 2137*4kB (UME) 5043*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 48892kB
> Jul 31 17:17:05 fs kernel: [11918.580370] Node 0 Normal: 2663*4kB (UME) 7452*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 70268kB
>
> The above process is trying to allocate the kernel stack which is
> order-2 (16kB) of physically contiguous memory which is clearly
> not available as you can see. Memory compaction (assuming you have
> CONFIG_COMPACTION enabled) which is a part of the oom reclaim process
I'm using the Debian kernel from experimental. CONFIG_COMPACTION is enabled:
root@fs:~# fgrep CONFIG_COMPACTION /boot/config-4.7.0-rc7-amd64
CONFIG_COMPACTION=y


> should help to form such blocks but those retries are bounded and if
> there is not much hope left we eventually hit the OOM killer. If you
> look at the above counters there is a lot of memory dirty and under the
> writeback (1.3G), this suggests that the IO is quite slow wrt. writers.
> Anyway there is a lot of anonymous memory which should be a good
> candidate for compaction.
>
> But the IO doesn't seem to be the main factor I guess. Later OOM
> invocations have a slightly different pattern (let's take the last one):
>
> Aug  1 06:30:45 fs kernel: [59536.957034] x2golistsession invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
> [...]
> Aug  1 06:30:45 fs kernel: [59536.976467] Mem-Info:
> Aug  1 06:30:45 fs kernel: [59536.977442] active_anon:16045 inactive_anon:20473 isolated_anon:0
> Aug  1 06:30:45 fs kernel: [59536.977442]  active_file:169767 inactive_file:1727008 isolated_file:0
> Aug  1 06:30:45 fs kernel: [59536.977442]  unevictable:0 dirty:32734 writeback:0 unstable:0
> Aug  1 06:30:45 fs kernel: [59536.977442]  slab_reclaimable:41953 slab_unreclaimable:7507
> Aug  1 06:30:45 fs kernel: [59536.977442]  mapped:10619 shmem:2443 pagetables:1971 bounce:0
> Aug  1 06:30:45 fs kernel: [59536.977442]  free:36686 free_pcp:119 free_cma:0
> [...]
> Aug  1 06:30:45 fs kernel: [59536.996407] Node 0 DMA32: 5909*4kB (UME) 3800*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54036kB
> Aug  1 06:30:45 fs kernel: [59536.997846] Node 0 Normal: 4041*4kB (UME) 6799*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 70556kB
>
> the amount of dirty pages is much smaller as well as the anonymous
> memory. The biggest portion seems to be in the page cache. The memory
The page cache will always be full if I'm writing at full steam to 
multiple drives, no?
> is still hugely fragmented though. In fact if we check all the OOM
> invocations the only consistent thing is that the memory is fragmented
> and the compaction cannot make sufficient progress consistently. We can
> assume that the situation actually gets better because there are some
> holes between those OOMs so we can assume that something has unpinned a
> larger amount of memory and allowed the compaction to make further progress
> or that the load has strong peaks. We would need more information from
> the compaction to know better. Vlastimil will surely tell you which
> tracepoints to enable.
> [...]


----------------------------------------------------------------------
The information contained in this transmission may be confidential. Any disclosure, copying, or further distribution of confidential information is not permitted unless such privilege is explicitly granted in writing by Quantum. Quantum reserves the right to have electronic communications, including email and attachments, sent across its networks filtered through anti virus and spam software programs and retain such messages in order to comply with applicable data security and retention requirements. Quantum is not responsible for the proper and complete transmission of the substance of this communication or for any delay in its receipt.

* Re: OOM killer changes
From: Michal Hocko @ 2016-08-01 19:43 UTC
  To: Ralf-Peter Rohbeck; +Cc: linux-mm, Vlastimil Babka

On Mon 01-08-16 12:35:51, Ralf-Peter Rohbeck wrote:
> On 01.08.2016 12:26, Michal Hocko wrote:
[...]
> > the amount of dirty pages is much smaller as well as the anonymous
> > memory. The biggest portion seems to be in the page cache. The memory
>
> The page cache will always be full if I'm writing at full steam to multiple
> drives, no?

Yes, memory full of page cache is not unusual. A large portion of that
memory being dirty or under writeback can be a problem, though. That is
why we have dirty memory throttling, which slows down (throttles)
writers to keep the amount reasonable. What is your dirty throttling
setup?
$ grep . /proc/sys/vm/dirty*

and what is your storage setup?
-- 
Michal Hocko
SUSE Labs

* Re: OOM killer changes
From: Ralf-Peter Rohbeck @ 2016-08-01 19:52 UTC
  To: Michal Hocko; +Cc: linux-mm, Vlastimil Babka

On 01.08.2016 12:43, Michal Hocko wrote:
> On Mon 01-08-16 12:35:51, Ralf-Peter Rohbeck wrote:
>> On 01.08.2016 12:26, Michal Hocko wrote:
> [...]
>>> the amount of dirty pages is much smaller as well as the anonymous
>>> memory. The biggest portion seems to be in the page cache. The memory
>> The page cache will always be full if I'm writing at full steam to multiple
>> drives, no?
> Yes, the memory full of page cache is not unusual. The large portion of
> that memory being dirty/writeback can be a problem. That is why we have
> a dirty memory throttling which slows down (throttles) writers to keep
> the amount reasonable. What is your dirty throttling setup?
> $ grep . /proc/sys/vm/dirty*
>
> and what is your storage setup?

root@fs:~# grep . /proc/sys/vm/dirty*
/proc/sys/vm/dirty_background_bytes:0
/proc/sys/vm/dirty_background_ratio:10
/proc/sys/vm/dirty_bytes:0
/proc/sys/vm/dirty_expire_centisecs:3000
/proc/sys/vm/dirty_ratio:20
/proc/sys/vm/dirtytime_expire_seconds:43200
/proc/sys/vm/dirty_writeback_centisecs:500


Storage setup:

root@fs:~# lsscsi
[0:2:0:0]    disk    LSI      MR9271-8iCC      3.29  /dev/sda
[0:2:1:0]    disk    LSI      MR9271-8iCC      3.29  /dev/sdb
[9:0:0:0]    disk    TOSHIBA  External USB 3.0 5438  /dev/sdf
[10:0:0:0]   disk    Seagate  Backup+ Desk     050B  /dev/sdc
[11:0:0:0]   disk    Seagate  Expansion Desk   9400  /dev/sdd
[12:0:0:0]   disk    Seagate  Backup+ Desk     050B /dev/sde
[13:0:0:0]   disk    Seagate  Expansion Desk   9400 /dev/sdg
[14:0:0:0]   disk    TOSHIBA  External USB 3.0 5438 /dev/sdl
[15:0:0:0]   disk    Seagate  Expansion Desk   9400 /dev/sdh
[16:0:0:0]   disk    Seagate  Expansion Desk   9400 /dev/sdi
[17:0:0:0]   disk    TOSHIBA  External USB 3.0 5438 /dev/sdm
[18:0:0:0]   disk    Seagate  Expansion Desk   9400 /dev/sdj
[19:0:0:0]   disk    Seagate  Expansion Desk   9400  /dev/sdk

sda is a 6x 1TB RAID5 and sdb is a single 480GB SSD, both on a MegaRAID 
controller.

The rest are 4TB USB drives that I'm experimenting with.

* Re: OOM killer changes
From: Michal Hocko @ 2016-08-01 20:09 UTC
  To: Ralf-Peter Rohbeck; +Cc: linux-mm, Vlastimil Babka

On Mon 01-08-16 12:52:40, Ralf-Peter Rohbeck wrote:
> On 01.08.2016 12:43, Michal Hocko wrote:
> > On Mon 01-08-16 12:35:51, Ralf-Peter Rohbeck wrote:
> > > On 01.08.2016 12:26, Michal Hocko wrote:
> > [...]
> > > > the amount of dirty pages is much smaller as well as the anonymous
> > > > memory. The biggest portion seems to be in the page cache. The memory
> > > The page cache will always be full if I'm writing at full steam to multiple
> > > drives, no?
> > Yes, the memory full of page cache is not unusual. The large portion of
> > that memory being dirty/writeback can be a problem. That is why we have
> > a dirty memory throttling which slows down (throttles) writers to keep
> > the amount reasonable. What is your dirty throttling setup?
> > $ grep . /proc/sys/vm/dirty*
> > 
> > and what is your storage setup?
> 
> root@fs:~# grep . /proc/sys/vm/dirty*
> /proc/sys/vm/dirty_background_bytes:0
> /proc/sys/vm/dirty_background_ratio:10
> /proc/sys/vm/dirty_bytes:0
> /proc/sys/vm/dirty_expire_centisecs:3000
> /proc/sys/vm/dirty_ratio:20

With your 8G of RAM this can be quite a lot of dirty data at once. Is
your storage able to write that back in a reasonable time? This
shouldn't trigger the OOM killer, but AFAIU it can lead to some
unexpected stalls, especially when there are a lot of writers. The
dirty_bytes knob should help to define a better cap.
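For a rough idea of the numbers, here is a Python sketch of how the two
ratio knobs turn into byte limits, and how a non-zero dirty_bytes
overrides dirty_ratio. It is a simplification: the kernel applies the
ratios to "dirtyable" memory (roughly free pages plus file page cache),
not to total RAM, so 8G is only an upper bound; it also works in pages
and ignores per-task adjustments. The function name is hypothetical and
the 512M cap in the example is an arbitrary illustration, not a
recommendation.

```python
GIB = 1 << 30
MIB = 1 << 20


def dirty_limits(dirtyable, background_ratio=10, dirty_ratio=20,
                 background_bytes=0, dirty_bytes=0):
    """Return (background, hard) dirty thresholds in bytes.

    A non-zero *_bytes value takes precedence over the matching *_ratio,
    and the background threshold is halved if it would reach the hard one.
    """
    hard = dirty_bytes or dirtyable * dirty_ratio // 100
    background = background_bytes or dirtyable * background_ratio // 100
    if background >= hard:
        background = hard // 2
    return background, hard


# Settings reported above (ratios 10/20, *_bytes = 0), 8G as an upper bound.
bg, hard = dirty_limits(8 * GIB)
print(round(bg / GIB, 1), round(hard / GIB, 1))  # 0.8 1.6

# Hypothetical cap via dirty_bytes (e.g. `sysctl -w vm.dirty_bytes=536870912`).
bg, hard = dirty_limits(8 * GIB, dirty_bytes=512 * MIB)
print(bg // MIB, hard // MIB)                    # 256 512
```

Note that dirty_bytes and dirty_ratio are counterparts: writing one of
them makes the other read back as 0.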

> /proc/sys/vm/dirtytime_expire_seconds:43200
> /proc/sys/vm/dirty_writeback_centisecs:500
> 
> 
> Storage setup:
> 
> root@fs:~# lsscsi
> [0:2:0:0]    disk    LSI      MR9271-8iCC      3.29  /dev/sda
> [0:2:1:0]    disk    LSI      MR9271-8iCC      3.29  /dev/sdb
> [9:0:0:0]    disk    TOSHIBA  External USB 3.0 5438  /dev/sdf
> [10:0:0:0]   disk    Seagate  Backup+ Desk     050B  /dev/sdc
> [11:0:0:0]   disk    Seagate  Expansion Desk   9400  /dev/sdd
> [12:0:0:0]   disk    Seagate  Backup+ Desk     050B /dev/sde
> [13:0:0:0]   disk    Seagate  Expansion Desk   9400 /dev/sdg
> [14:0:0:0]   disk    TOSHIBA  External USB 3.0 5438 /dev/sdl
> [15:0:0:0]   disk    Seagate  Expansion Desk   9400 /dev/sdh
> [16:0:0:0]   disk    Seagate  Expansion Desk   9400 /dev/sdi
> [17:0:0:0]   disk    TOSHIBA  External USB 3.0 5438 /dev/sdm
> [18:0:0:0]   disk    Seagate  Expansion Desk   9400 /dev/sdj
> [19:0:0:0]   disk    Seagate  Expansion Desk   9400  /dev/sdk
> 
> sda is a 6x 1TB RAID5 and sdb is a single 480GB SSD, both on a MegaRAID
> controller.
> 
> The rest are 4TB USB drives that I'm experimenting with.

Which devices were you writing to when you hit the OOM killer?
-- 
Michal Hocko
SUSE Labs

* Re: OOM killer changes
From: Ralf-Peter Rohbeck @ 2016-08-01 20:16 UTC
  To: Michal Hocko; +Cc: linux-mm, Vlastimil Babka



On 08/01/16 13:09, Michal Hocko wrote:
> On Mon 01-08-16 12:52:40, Ralf-Peter Rohbeck wrote:
>> On 01.08.2016 12:43, Michal Hocko wrote:
>>> On Mon 01-08-16 12:35:51, Ralf-Peter Rohbeck wrote:
>>>> On 01.08.2016 12:26, Michal Hocko wrote:
>>> [...]
>>>>> the amount of dirty pages is much smaller as well as the anonymous
>>>>> memory. The biggest portion seems to be in the page cache. The memory
>>>> The page cache will always be full if I'm writing at full steam to multiple
>>>> drives, no?
>>> Yes, the memory full of page cache is not unusual. The large portion of
>>> that memory being dirty/writeback can be a problem. That is why we have
>>> a dirty memory throttling which slows down (throttles) writers to keep
>>> the amount reasonable. What is your dirty throttling setup?
>>> $ grep . /proc/sys/vm/dirty*
>>>
>>> and what is your storage setup?
>> root@fs:~# grep . /proc/sys/vm/dirty*
>> /proc/sys/vm/dirty_background_bytes:0
>> /proc/sys/vm/dirty_background_ratio:10
>> /proc/sys/vm/dirty_bytes:0
>> /proc/sys/vm/dirty_expire_centisecs:3000
>> /proc/sys/vm/dirty_ratio:20
> With your 8G of RAM this can be quite a lot of dirty data at once. Is
> your storage able to write that back in a reasonable time? I mean this
> shouldn't cause the OOM killer but it can lead to some unexpected stalls
> especially when there are a lot of writers AFAIU. dirty_bytes knob
> should help to define a better cap.
The main filesystems are on the MegaRAID and can do 500-600 MB/s. 
Writing to the USB drives only pushes about 90MB/s per drive.
>
>> /proc/sys/vm/dirtytime_expire_seconds:43200
>> /proc/sys/vm/dirty_writeback_centisecs:500
>>
>>
>> Storage setup:
>>
>> root@fs:~# lsscsi
>> [0:2:0:0]    disk    LSI      MR9271-8iCC      3.29  /dev/sda
>> [0:2:1:0]    disk    LSI      MR9271-8iCC      3.29  /dev/sdb
>> [9:0:0:0]    disk    TOSHIBA  External USB 3.0 5438  /dev/sdf
>> [10:0:0:0]   disk    Seagate  Backup+ Desk     050B  /dev/sdc
>> [11:0:0:0]   disk    Seagate  Expansion Desk   9400  /dev/sdd
>> [12:0:0:0]   disk    Seagate  Backup+ Desk     050B /dev/sde
>> [13:0:0:0]   disk    Seagate  Expansion Desk   9400 /dev/sdg
>> [14:0:0:0]   disk    TOSHIBA  External USB 3.0 5438 /dev/sdl
>> [15:0:0:0]   disk    Seagate  Expansion Desk   9400 /dev/sdh
>> [16:0:0:0]   disk    Seagate  Expansion Desk   9400 /dev/sdi
>> [17:0:0:0]   disk    TOSHIBA  External USB 3.0 5438 /dev/sdm
>> [18:0:0:0]   disk    Seagate  Expansion Desk   9400 /dev/sdj
>> [19:0:0:0]   disk    Seagate  Expansion Desk   9400  /dev/sdk
>>
>> sda is a 6x 1TB RAID5 and sdb is a single 480GB SSD, both on a MegaRAID
>> controller.
>>
>> The rest are 4TB USB drives that I'm experimenting with.
> Which devices did you write when hitting the OOM killer?
sdc, sdd and sde each at max speed, with a little bit of garden variety 
IO on sda and sdb.
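Michal's "quite a lot of dirty data" can be put into rough numbers. This is a minimal sketch that rests on two loud assumptions. First, the ratio is applied to the whole 8 GiB, whereas the kernel applies it to "dirtyable" memory (roughly free pages plus file page cache), so the result is an upper bound. Second, all dirty data drains to the USB drives at the ~90 MB/s per drive quoted above. Both helper names are hypothetical.

```c
#include <stdint.h>

/* Rough global dirty threshold in bytes. Assumption: the ratio is taken
 * of all RAM; the kernel really uses "dirtyable" memory, so this is an
 * upper bound. A nonzero dirty_bytes takes precedence over dirty_ratio. */
static uint64_t dirty_limit_bytes(uint64_t mem_bytes, unsigned ratio,
                                  uint64_t dirty_bytes)
{
    if (dirty_bytes != 0)
        return dirty_bytes;
    return mem_bytes * ratio / 100;
}

/* Seconds to write back 'bytes' if 'drives' devices each sustain
 * 'mb_per_s' (decimal MB/s) and nothing else competes for them. */
static double drain_seconds(uint64_t bytes, unsigned drives, double mb_per_s)
{
    return (double)bytes / (drives * mb_per_s * 1e6);
}
```

With 8 GiB and dirty_ratio=20 this gives about 1.7 GB (1717986918 bytes). Spread over three drives at 90 MB/s, that takes roughly 6.4 s to drain. If it all targets a single drive, it takes about 19 s.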

----------------------------------------------------------------------
The information contained in this transmission may be confidential. Any disclosure, copying, or further distribution of confidential information is not permitted unless such privilege is explicitly granted in writing by Quantum. Quantum reserves the right to have electronic communications, including email and attachments, sent across its networks filtered through anti virus and spam software programs and retain such messages in order to comply with applicable data security and retention requirements. Quantum is not responsible for the proper and complete transmission of the substance of this communication or for any delay in its receipt.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-01 20:16                   ` Ralf-Peter Rohbeck
@ 2016-08-01 20:26                     ` Michal Hocko
  2016-08-01 21:14                       ` Ralf-Peter Rohbeck
  0 siblings, 1 reply; 50+ messages in thread
From: Michal Hocko @ 2016-08-01 20:26 UTC (permalink / raw)
  To: Ralf-Peter Rohbeck; +Cc: linux-mm, Vlastimil Babka

On Mon 01-08-16 13:16:49, Ralf-Peter Rohbeck wrote:
> 
> 
> On 08/01/16 13:09, Michal Hocko wrote:
> > On Mon 01-08-16 12:52:40, Ralf-Peter Rohbeck wrote:
[...]
> > > root@fs:~# lsscsi
> > > [0:2:0:0]    disk    LSI      MR9271-8iCC      3.29  /dev/sda
> > > [0:2:1:0]    disk    LSI      MR9271-8iCC      3.29  /dev/sdb
> > > [9:0:0:0]    disk    TOSHIBA  External USB 3.0 5438  /dev/sdf
> > > [10:0:0:0]   disk    Seagate  Backup+ Desk     050B  /dev/sdc
> > > [11:0:0:0]   disk    Seagate  Expansion Desk   9400  /dev/sdd
> > > [12:0:0:0]   disk    Seagate  Backup+ Desk     050B /dev/sde
> > > [13:0:0:0]   disk    Seagate  Expansion Desk   9400 /dev/sdg
> > > [14:0:0:0]   disk    TOSHIBA  External USB 3.0 5438 /dev/sdl
> > > [15:0:0:0]   disk    Seagate  Expansion Desk   9400 /dev/sdh
> > > [16:0:0:0]   disk    Seagate  Expansion Desk   9400 /dev/sdi
> > > [17:0:0:0]   disk    TOSHIBA  External USB 3.0 5438 /dev/sdm
> > > [18:0:0:0]   disk    Seagate  Expansion Desk   9400 /dev/sdj
> > > [19:0:0:0]   disk    Seagate  Expansion Desk   9400  /dev/sdk
> > > 
> > > sda is a 6x 1TB RAID5 and sdb is a single 480GB SSD, both on a MegaRAID
> > > controller.
> > > 
> > > The rest are 4TB USB drives that I'm experimenting with.
> > Which devices did you write when hitting the OOM killer?
> sdc, sdd and sde each at max speed, with a little bit of garden variety IO
> on sda and sdb.

So do I get it right that the majority of the IO is to those slower USB
disks?  If yes then does lowering the dirty_bytes to something smaller
help?
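Trying a smaller cap only requires writing the value to the sysctl file as root. Below is a minimal C sketch of doing that programmatically. set_vm_sysctl is a hypothetical helper, and the caller supplies the path, e.g. /proc/sys/vm/dirty_bytes.

```c
#include <stdio.h>

/* Write a byte count to a vm sysctl file such as /proc/sys/vm/dirty_bytes
 * (requires root for the real file). Returns 0 on success, -1 on error. */
static int set_vm_sysctl(const char *path, unsigned long long value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%llu\n", value) > 0;
    /* fclose flushes the buffer; a value the kernel rejects shows up as
     * a write error here */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, set_vm_sysctl("/proc/sys/vm/dirty_bytes", 128ULL << 20) sets the 128MiB cap tried below. Writing a nonzero dirty_bytes resets dirty_ratio to 0, and writing dirty_ratio later resets dirty_bytes, so only one of the two limits is in effect at a time.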
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-01 20:26                     ` Michal Hocko
@ 2016-08-01 21:14                       ` Ralf-Peter Rohbeck
  2016-08-01 21:27                         ` Ralf-Peter Rohbeck
  0 siblings, 1 reply; 50+ messages in thread
From: Ralf-Peter Rohbeck @ 2016-08-01 21:14 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Vlastimil Babka

On 01.08.2016 13:26, Michal Hocko wrote:
>
>> sdc, sdd and sde each at max speed, with a little bit of garden variety IO
>> on sda and sdb.
> So do I get it right that the majority of the IO is to those slower USB
> disks?  If yes then does lowering the dirty_bytes to something smaller
> help?

Yes, the vast majority.

I set dirty_bytes to 128MiB and started a fairly IO and memory intensive 
process and the OOM killer kicked in within a few seconds.

Same with 16MiB dirty_bytes and 1MiB.

Some additional IO load from my fast subsystem is enough:

At 1MiB dirty_bytes,

find /btrfs0/ -type f -exec md5sum {} \;

was enough (where /btrfs0 is on a LVM2 LV and the PV is on sda.) It read 
a few dozen files (random stuff with very mixed file sizes, none very 
big) until the OOM killer kicked in.

I'll try 4.6.


Ralf-Peter


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-01 21:14                       ` Ralf-Peter Rohbeck
@ 2016-08-01 21:27                         ` Ralf-Peter Rohbeck
  2016-08-02  7:10                           ` Michal Hocko
  0 siblings, 1 reply; 50+ messages in thread
From: Ralf-Peter Rohbeck @ 2016-08-01 21:27 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Vlastimil Babka

On 01.08.2016 14:14, Ralf-Peter Rohbeck wrote:
> On 01.08.2016 13:26, Michal Hocko wrote:
>>
>>> sdc, sdd and sde each at max speed, with a little bit of garden
>>> variety IO on sda and sdb.
>> So do I get it right that the majority of the IO is to those slower USB
>> disks?  If yes then does lowering the dirty_bytes to something smaller
>> help?
>
> Yes, the vast majority.
>
> I set dirty_bytes to 128MiB and started a fairly IO and memory 
> intensive process and the OOM killer kicked in within a few seconds.
>
> Same with 16MiB dirty_bytes and 1MiB.
>
> Some additional IO load from my fast subsystem is enough:
>
> At 1MiB dirty_bytes,
>
> find /btrfs0/ -type f -exec md5sum {} \;
>
> was enough (where /btrfs0 is on a LVM2 LV and the PV is on sda.) It 
> read a few dozen files (random stuff with very mixed file sizes, none 
> very big) until the OOM killer kicked in.
>
> I'll try 4.6.
With Debian 4.6.0.1 (4.6.4-1) it works: Writing to 3 USB drives and 
running each of the 3 tests that triggered the OOM killer in parallel, 
with default dirty settings.

Ralf-Peter

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-01 21:27                         ` Ralf-Peter Rohbeck
@ 2016-08-02  7:10                           ` Michal Hocko
  2016-08-02 19:25                             ` Ralf-Peter Rohbeck
  0 siblings, 1 reply; 50+ messages in thread
From: Michal Hocko @ 2016-08-02  7:10 UTC (permalink / raw)
  To: Ralf-Peter Rohbeck; +Cc: linux-mm, Vlastimil Babka

On Mon 01-08-16 14:27:51, Ralf-Peter Rohbeck wrote:
> On 01.08.2016 14:14, Ralf-Peter Rohbeck wrote:
> > On 01.08.2016 13:26, Michal Hocko wrote:
> > > 
> > > > sdc, sdd and sde each at max speed, with a little bit of garden
> > > > variety IO on sda and sdb.
> > > So do I get it right that the majority of the IO is to those slower USB
> > > disks?  If yes then does lowering the dirty_bytes to something smaller
> > > help?
> > 
> > Yes, the vast majority.
> > 
> > I set dirty_bytes to 128MiB and started a fairly IO and memory intensive
> > process and the OOM killer kicked in within a few seconds.
> > 
> > Same with 16MiB dirty_bytes and 1MiB.
> > 
> > Some additional IO load from my fast subsystem is enough:
> > 
> > At 1MiB dirty_bytes,
> > 
> > find /btrfs0/ -type f -exec md5sum {} \;
> > 
> > was enough (where /btrfs0 is on a LVM2 LV and the PV is on sda.) It read
> > a few dozen files (random stuff with very mixed file sizes, none very
> > big) until the OOM killer kicked in.
> > 
> > I'll try 4.6.
>
> With Debian 4.6.0.1 (4.6.4-1) it works: Writing to 3 USB drives and running
> each of the 3 tests that triggered the OOM killer in parallel, with default
> dirty settings.

Thanks for retesting! Now that it seems you are able to reproduce this,
could you do some experiments, please? First of all it would be great to
find out why we do not retry the compaction and whether it could make
some progress. The patch below will tell us the first part. Tracepoints 
can tell us the other part. Vlastimil, could you recommend some which
would give us some hints without generating way too much output?
---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8b3e1341b754..a10b29a918d4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3274,6 +3274,7 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 			*migrate_mode = MIGRATE_SYNC_LIGHT;
 			return true;
 		}
+		pr_info("XXX: compaction_failed\n");
 		return false;
 	}
 
@@ -3283,8 +3284,12 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 	 * But do not retry if the given zonelist is not suitable for
 	 * compaction.
 	 */
-	if (compaction_withdrawn(compact_result))
-		return compaction_zonelist_suitable(ac, order, alloc_flags);
+	if (compaction_withdrawn(compact_result)) {
+		int ret = compaction_zonelist_suitable(ac, order, alloc_flags);
+		if (!ret)
+			pr_info("XXX: no zone suitable for compaction\n");
+		return ret;
+	}
 
 	/*
 	 * !costly requests are much more important than __GFP_REPEAT
@@ -3299,6 +3304,7 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 	if (compaction_retries <= max_retries)
 		return true;
 
+	pr_info("XXX: compaction retries fail after %d\n", compaction_retries);
 	return false;
 }
 #else
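
When reading the results, the key question is which of the three markers from the patch above appears right before each OOM report. The following minimal C sketch tallies them in a saved kernel log, for example dmesg output redirected to a file. struct xxx_counts and count_xxx_markers are hypothetical names. Only the marker strings come from the patch.

```c
#include <stdio.h>
#include <string.h>

/* Tallies of the three debug markers printed by the patch. */
struct xxx_counts {
    unsigned long compaction_failed;
    unsigned long no_zone_suitable;
    unsigned long retries_exhausted;
};

/* Scan a log stream line by line. Assumes lines shorter than the buffer;
 * longer lines are split by fgets and could be miscounted. */
static struct xxx_counts count_xxx_markers(FILE *in)
{
    struct xxx_counts c = {0, 0, 0};
    char line[1024];

    while (fgets(line, sizeof(line), in)) {
        if (strstr(line, "XXX: compaction_failed"))
            c.compaction_failed++;
        else if (strstr(line, "XXX: no zone suitable for compaction"))
            c.no_zone_suitable++;
        else if (strstr(line, "XXX: compaction retries fail after"))
            c.retries_exhausted++;
    }
    return c;
}
```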

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-01 19:26         ` Michal Hocko
  2016-08-01 19:35           ` Ralf-Peter Rohbeck
@ 2016-08-02  7:11           ` Vlastimil Babka
  2016-08-02  9:02           ` Michal Hocko
  2 siblings, 0 replies; 50+ messages in thread
From: Vlastimil Babka @ 2016-08-02  7:11 UTC (permalink / raw)
  To: Michal Hocko, Ralf-Peter Rohbeck; +Cc: linux-mm

On 08/01/2016 09:26 PM, Michal Hocko wrote:
> [re-adding linux-mm mailing list - please always use reply-to-all
>  also CCing Vlastimil who can help with the compaction debugging]
>
> On Mon 01-08-16 11:48:53, Ralf-Peter Rohbeck wrote:
>> See the messages log attached. It has several OOM killer entries.
>> Let me know if there's anything else I can do. I'll try the disk erasing on
>> 4.6 and on 4.7.
>
> Jul 31 17:17:05 fs kernel: [11918.534744] x2golistsession invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
> [...]
> Jul 31 17:17:05 fs kernel: [11918.557356] Mem-Info:
> Jul 31 17:17:05 fs kernel: [11918.558268] active_anon:7856 inactive_anon:21924 isolated_anon:0
> Jul 31 17:17:05 fs kernel: [11918.558268]  active_file:70925 inactive_file:1796707 isolated_file:0
> Jul 31 17:17:05 fs kernel: [11918.558268]  unevictable:0 dirty:277675 writeback:57117 unstable:0
> Jul 31 17:17:05 fs kernel: [11918.558268]  slab_reclaimable:75821 slab_unreclaimable:9490
> Jul 31 17:17:05 fs kernel: [11918.558268]  mapped:12014 shmem:2414 pagetables:1497 bounce:0
> Jul 31 17:17:05 fs kernel: [11918.558268]  free:37021 free_pcp:89 free_cma:0
> [...]
> Jul 31 17:17:05 fs kernel: [11918.578836] Node 0 DMA32: 2137*4kB (UME) 5043*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 48892kB
> Jul 31 17:17:05 fs kernel: [11918.580370] Node 0 Normal: 2663*4kB (UME) 7452*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 70268kB
>
> The above process is trying to allocate the kernel stack which is
> order-2 (16kB) of physically contiguous memory which is clearly
> not available as you can see. Memory compaction (assuming you have
> CONFIG_COMPACTION enabled) which is a part of the oom reclaim process
> should help to form such blocks, but those retries are bounded and if
> there is not much hope left we eventually hit the OOM killer. If you
> look at the above counters, a lot of memory is dirty or under
> writeback (1.3G), which suggests that the IO is quite slow wrt. writers.
> Anyway there is a lot of anonymous memory which should be a good
> candidate for compaction.
>
> But the IO doesn't seem to be the main factor I guess. Later OOM
> invocations have a slightly different pattern (let's take the last one):
>
> Aug  1 06:30:45 fs kernel: [59536.957034] x2golistsession invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
> [...]
> Aug  1 06:30:45 fs kernel: [59536.976467] Mem-Info:
> Aug  1 06:30:45 fs kernel: [59536.977442] active_anon:16045 inactive_anon:20473 isolated_anon:0
> Aug  1 06:30:45 fs kernel: [59536.977442]  active_file:169767 inactive_file:1727008 isolated_file:0
> Aug  1 06:30:45 fs kernel: [59536.977442]  unevictable:0 dirty:32734 writeback:0 unstable:0
> Aug  1 06:30:45 fs kernel: [59536.977442]  slab_reclaimable:41953 slab_unreclaimable:7507
> Aug  1 06:30:45 fs kernel: [59536.977442]  mapped:10619 shmem:2443 pagetables:1971 bounce:0
> Aug  1 06:30:45 fs kernel: [59536.977442]  free:36686 free_pcp:119 free_cma:0
> [...]
> Aug  1 06:30:45 fs kernel: [59536.996407] Node 0 DMA32: 5909*4kB (UME) 3800*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54036kB
> Aug  1 06:30:45 fs kernel: [59536.997846] Node 0 Normal: 4041*4kB (UME) 6799*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 70556kB
>
> the amount of dirty pages is much smaller as well as the anonymous
> memory. The biggest portion seems to be in the page cache. The memory
> is still hugely fragmented though. In fact, if we check all the OOM
> invocations, the only consistent thing is that the memory is fragmented
> and the compaction cannot make sufficient progress consistently. We can
> assume that the situation actually gets better because there are some
> holes between those OOMs, so we can assume that something has unpinned a
> larger amount of memory and allowed the compaction to make further progress
> or that the load has strong peaks. We would need more information from
> the compaction to know better. Vlastimil will surely tell you which
> tracepoints to enable.
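
The arithmetic behind "clearly not available" works like this. With 4 kB pages, an order-n block is 4 kB << n, so an order-2 allocation needs a free 16 kB block. Both zones above list only 4 kB and 8 kB blocks. Below is a minimal C sketch that parses such a free-area line. buddy_has_block is a hypothetical helper, and it assumes the "count*sizekB" format shown in the report.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Parse the "count*sizekB" entries of a show_mem free-area line, store
 * the total free kB, and report whether any free block of at least
 * min_kb exists. Other tokens ("Node", "(UME)", "= 48892kB") are skipped. */
static bool buddy_has_block(const char *line, unsigned long min_kb,
                            unsigned long *total_kb)
{
    char buf[512];
    bool found = false;

    *total_kb = 0;
    snprintf(buf, sizeof(buf), "%s", line); /* strtok needs a copy */
    for (char *tok = strtok(buf, " "); tok; tok = strtok(NULL, " ")) {
        unsigned long count, size_kb;
        /* only tokens of the form <count>*<size>kB yield two values */
        if (sscanf(tok, "%lu*%lukB", &count, &size_kb) != 2)
            continue;
        *total_kb += count * size_kb;
        if (count > 0 && size_kb >= min_kb)
            found = true;
    }
    return found;
}
```

Given the DMA32 line above and min_kb = 16, it returns false and reports a total of 48892 kB, which matches the kernel's own sum.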

Actually, snapshots of /proc/vmstat, /proc/zoneinfo and
/proc/pagetypeinfo taken before and after the test would also be useful
to provide first. Then the compaction tracepoints:

echo 1 > /sys/kernel/debug/tracing/events/compaction/enable
cat /sys/kernel/debug/tracing/trace_pipe > /path/to/trace.log

or with trace-cmd
trace-cmd record -e compaction
trace-cmd report
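
Of the three snapshots, /proc/vmstat is the easiest to compare mechanically because it is plain "name value" text. Below is a minimal C sketch, assuming the before/after copies were saved to files, e.g. with cat /proc/vmstat > vmstat.before. vmstat_get and vmstat_delta are hypothetical helper names. zoneinfo and pagetypeinfo have a different layout and are not handled here.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Look up one counter in /proc/vmstat-style "name value" text. */
static bool vmstat_get(FILE *f, const char *name, unsigned long long *val)
{
    char key[128];
    unsigned long long v;

    rewind(f);
    while (fscanf(f, "%127s %llu", key, &v) == 2) {
        if (strcmp(key, name) == 0) {
            *val = v;
            return true;
        }
    }
    return false;
}

/* Change of a counter between two snapshots; false if it is missing. */
static bool vmstat_delta(FILE *before, FILE *after, const char *name,
                         long long *delta)
{
    unsigned long long b, a;

    if (!vmstat_get(before, name, &b) || !vmstat_get(after, name, &a))
        return false;
    *delta = (long long)a - (long long)b;
    return true;
}
```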

Vlastimil

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-01 19:26         ` Michal Hocko
  2016-08-01 19:35           ` Ralf-Peter Rohbeck
  2016-08-02  7:11           ` Vlastimil Babka
@ 2016-08-02  9:02           ` Michal Hocko
  2 siblings, 0 replies; 50+ messages in thread
From: Michal Hocko @ 2016-08-02  9:02 UTC (permalink / raw)
  To: Ralf-Peter Rohbeck; +Cc: linux-mm, Vlastimil Babka

On Mon 01-08-16 21:26:20, Michal Hocko wrote:
> [re-adding linux-mm mailing list - please always use reply-to-all
>  also CCing Vlastimil who can help with the compaction debugging]
> 
> On Mon 01-08-16 11:48:53, Ralf-Peter Rohbeck wrote:
> > See the messages log attached. It has several OOM killer entries.
> > Let me know if there's anything else I can do. I'll try the disk erasing on
> > 4.6 and on 4.7.
> 
> Jul 31 17:17:05 fs kernel: [11918.534744] x2golistsession invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
> [...]
> Jul 31 17:17:05 fs kernel: [11918.557356] Mem-Info:
> Jul 31 17:17:05 fs kernel: [11918.558268] active_anon:7856 inactive_anon:21924 isolated_anon:0
> Jul 31 17:17:05 fs kernel: [11918.558268]  active_file:70925 inactive_file:1796707 isolated_file:0
> Jul 31 17:17:05 fs kernel: [11918.558268]  unevictable:0 dirty:277675 writeback:57117 unstable:0
> Jul 31 17:17:05 fs kernel: [11918.558268]  slab_reclaimable:75821 slab_unreclaimable:9490
> Jul 31 17:17:05 fs kernel: [11918.558268]  mapped:12014 shmem:2414 pagetables:1497 bounce:0
> Jul 31 17:17:05 fs kernel: [11918.558268]  free:37021 free_pcp:89 free_cma:0
> [...]
> Jul 31 17:17:05 fs kernel: [11918.578836] Node 0 DMA32: 2137*4kB (UME) 5043*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 48892kB
> Jul 31 17:17:05 fs kernel: [11918.580370] Node 0 Normal: 2663*4kB (UME) 7452*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 70268kB
> 
> The above process is trying to allocate the kernel stack which is
> order-2 (16kB) of physically contiguous memory which is clearly
> not available as you can see. Memory compaction (assuming you have
> CONFIG_COMPACTION enabled) which is a part of the oom reclaim process
> should help to form such blocks, but those retries are bounded and if
> there is not much hope left we eventually hit the OOM killer. If you
> look at the above counters, a lot of memory is dirty or under
> writeback (1.3G), which suggests that the IO is quite slow wrt. writers.
> Anyway there is a lot of anonymous memory which should be a good
> candidate for compaction.
> 
> But the IO doesn't seem to be the main factor I guess. Later OOM
> invocations have a slightly different pattern (let's take the last one):

OK, so I've checked anon/file counters for all of OOM invocations and
the pattern is in fact pretty much consistent:
anon 29780 (1%) file 1867632 (89%) dirty 334792 (15%) slab 85311 (4%)
anon 30215 (1%) file 1866069 (89%) dirty 336974 (16%) slab 85074 (4%)
anon 32800 (1%) file 1865752 (89%) dirty 335470 (16%) slab 84793 (4%)
anon 33040 (1%) file 1850425 (88%) dirty 349561 (16%) slab 88997 (4%)
anon 31536 (1%) file 1859444 (88%) dirty 351498 (16%) slab 87475 (4%)
anon 31540 (1%) file 1861497 (88%) dirty 351126 (16%) slab 86976 (4%)
anon 28390 (1%) file 1863807 (88%) dirty 351404 (16%) slab 86292 (4%)
anon 29655 (1%) file 1863581 (88%) dirty 351632 (16%) slab 86295 (4%)
anon 28907 (1%) file 1861612 (88%) dirty 302386 (14%) slab 88269 (4%)
anon 28475 (1%) file 1857073 (88%) dirty 299464 (14%) slab 88193 (4%)
anon 29610 (1%) file 1861161 (88%) dirty 297911 (14%) slab 87796 (4%)
anon 28624 (1%) file 1862460 (88%) dirty 300628 (14%) slab 87650 (4%)
anon 35317 (1%) file 1901489 (90%) dirty 32652 (1%) slab 47519 (2%)
anon 36518 (1%) file 1896775 (90%) dirty 32734 (1%) slab 49460 (2%)

the dirty+writeback (marked as dirty above) drops down in the end but
file LRU is consistently ~89% of the memory. That alone shouldn't be a
problem for the compaction to proceed, except when those pages are pinned
by the filesystem for some reason. You have said that you are using
Btrfs. Would it be possible to retest with the same storage layout and
a different fs? That would help to rule out the FS as the source of the
problems.
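
Each row in the table above can be recomputed from a Mem-Info block, with all counts in pages:
- anon = active_anon + inactive_anon
- file = active_file + inactive_file
- "dirty" = dirty + writeback
- slab = slab_reclaimable + slab_unreclaimable

Below is a minimal C sketch. The structs and functions are hypothetical, and the denominator is an assumption: 8 GiB of 4 KiB pages (2097152). The exact total behind the table is not stated. With that value and truncating division, the two Mem-Info blocks quoted earlier in this thread reproduce the first and last rows.

```c
/* Counters (in pages) copied from a Mem-Info block of an OOM report. */
struct meminfo_pages {
    unsigned long active_anon, inactive_anon;
    unsigned long active_file, inactive_file;
    unsigned long dirty, writeback;
    unsigned long slab_reclaimable, slab_unreclaimable;
};

struct oom_summary {
    unsigned long anon, file, dirty, slab; /* pages */
    unsigned anon_pct, file_pct, dirty_pct, slab_pct;
};

/* Integer percentage, truncated toward zero. */
static unsigned pct(unsigned long part, unsigned long total)
{
    return total ? (unsigned)((unsigned long long)part * 100 / total) : 0;
}

/* total_pages is an assumption supplied by the caller. */
static struct oom_summary summarize(const struct meminfo_pages *m,
                                    unsigned long total_pages)
{
    struct oom_summary s;

    s.anon = m->active_anon + m->inactive_anon;
    s.file = m->active_file + m->inactive_file;
    s.dirty = m->dirty + m->writeback; /* "dirty" column = dirty + writeback */
    s.slab = m->slab_reclaimable + m->slab_unreclaimable;
    s.anon_pct = pct(s.anon, total_pages);
    s.file_pct = pct(s.file, total_pages);
    s.dirty_pct = pct(s.dirty, total_pages);
    s.slab_pct = pct(s.slab, total_pages);
    return s;
}
```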
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-02  7:10                           ` Michal Hocko
@ 2016-08-02 19:25                             ` Ralf-Peter Rohbeck
  2016-08-15  4:48                               ` Ralf-Peter Rohbeck
  0 siblings, 1 reply; 50+ messages in thread
From: Ralf-Peter Rohbeck @ 2016-08-02 19:25 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Vlastimil Babka

I can do that but it'll be later this week.

Ralf-Peter
On 08/02/2016 12:10 AM, Michal Hocko wrote:
> On Mon 01-08-16 14:27:51, Ralf-Peter Rohbeck wrote:
>> On 01.08.2016 14:14, Ralf-Peter Rohbeck wrote:
>>> On 01.08.2016 13:26, Michal Hocko wrote:
>>>>> sdc, sdd and sde each at max speed, with a little bit of garden
>>>>> variety IO on sda and sdb.
>>>> So do I get it right that the majority of the IO is to those slower USB
>>>> disks?  If yes then does lowering the dirty_bytes to something smaller
>>>> help?
>>> Yes, the vast majority.
>>>
>>> I set dirty_bytes to 128MiB and started a fairly IO and memory intensive
>>> process and the OOM killer kicked in within a few seconds.
>>>
>>> Same with 16MiB dirty_bytes and 1MiB.
>>>
>>> Some additional IO load from my fast subsystem is enough:
>>>
>>> At 1MiB dirty_bytes,
>>>
>>> find /btrfs0/ -type f -exec md5sum {} \;
>>>
>>> was enough (where /btrfs0 is on a LVM2 LV and the PV is on sda.) It read
>>> a few dozen files (random stuff with very mixed file sizes, none very
>>> big) until the OOM killer kicked in.
>>>
>>> I'll try 4.6.
>> With Debian 4.6.0.1 (4.6.4-1) it works: Writing to 3 USB drives and running
>> each of the 3 tests that triggered the OOM killer in parallel, with default
>> dirty settings.
> Thanks for retesting! Now that it seems you are able to reproduce this,
> could you do some experiments, please? First of all it would be great to
> find out why we do not retry the compaction and whether it could make
> some progress. The patch below will tell us the first part. Tracepoints
> can tell us the other part. Vlastimil, could you recommend some which
> would give us some hints without generating way too much output?
> ---
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8b3e1341b754..a10b29a918d4 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3274,6 +3274,7 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
>   			*migrate_mode = MIGRATE_SYNC_LIGHT;
>   			return true;
>   		}
> +		pr_info("XXX: compaction_failed\n");
>   		return false;
>   	}
>   
> @@ -3283,8 +3284,12 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
>   	 * But do not retry if the given zonelist is not suitable for
>   	 * compaction.
>   	 */
> -	if (compaction_withdrawn(compact_result))
> -		return compaction_zonelist_suitable(ac, order, alloc_flags);
> +	if (compaction_withdrawn(compact_result)) {
> +		int ret = compaction_zonelist_suitable(ac, order, alloc_flags);
> +		if (!ret)
> +			pr_info("XXX: no zone suitable for compaction\n");
> +		return ret;
> +	}
>   
>   	/*
>   	 * !costly requests are much more important than __GFP_REPEAT
> @@ -3299,6 +3304,7 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
>   	if (compaction_retries <= max_retries)
>   		return true;
>   
> +	pr_info("XXX: compaction retries fail after %d\n", compaction_retries);
>   	return false;
>   }
>   #else
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-02 19:25                             ` Ralf-Peter Rohbeck
@ 2016-08-15  4:48                               ` Ralf-Peter Rohbeck
  2016-08-15  9:16                                 ` Vlastimil Babka
  0 siblings, 1 reply; 50+ messages in thread
From: Ralf-Peter Rohbeck @ 2016-08-15  4:48 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Vlastimil Babka

On 02.08.2016 12:25, Ralf-Peter Rohbeck wrote:
> I can do that but it'll be later this week.
>
> Ralf-Peter
> On 08/02/2016 12:10 AM, Michal Hocko wrote:
>> On Mon 01-08-16 14:27:51, Ralf-Peter Rohbeck wrote:
>>> On 01.08.2016 14:14, Ralf-Peter Rohbeck wrote:
>>>> On 01.08.2016 13:26, Michal Hocko wrote:
>>>>>> sdc, sdd and sde each at max speed, with a little bit of garden
>>>>>> variety IO on sda and sdb.
>>>>> So do I get it right that the majority of the IO is to those slower USB
>>>>> disks?  If yes then does lowering the dirty_bytes to something smaller
>>>>> help?
>>>> Yes, the vast majority.
>>>>
>>>> I set dirty_bytes to 128MiB and started a fairly IO and memory intensive
>>>> process and the OOM killer kicked in within a few seconds.
>>>>
>>>> Same with 16MiB dirty_bytes and 1MiB.
>>>>
>>>> Some additional IO load from my fast subsystem is enough:
>>>>
>>>> At 1MiB dirty_bytes,
>>>>
>>>> find /btrfs0/ -type f -exec md5sum {} \;
>>>>
>>>> was enough (where /btrfs0 is on a LVM2 LV and the PV is on sda.) It read
>>>> a few dozen files (random stuff with very mixed file sizes, none very
>>>> big) until the OOM killer kicked in.
>>>>
>>>> I'll try 4.6.
>>> With Debian 4.6.0.1 (4.6.4-1) it works: Writing to 3 USB drives and running
>>> each of the 3 tests that triggered the OOM killer in parallel, with default
>>> dirty settings.
>> Thanks for retesting! Now that it seems you are able to reproduce this,
>> could you do some experiments, please? First of all it would be great to
>> find out why we do not retry the compaction and whether it could make
>> some progress. The patch below will tell us the first part. Tracepoints
>> can tell us the other part. Vlastimil, could you recommend some which
>> would give us some hints without generating way too much output?
>> ---
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 8b3e1341b754..a10b29a918d4 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -3274,6 +3274,7 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
>>               *migrate_mode = MIGRATE_SYNC_LIGHT;
>>               return true;
>>           }
>> +        pr_info("XXX: compaction_failed\n");
>>           return false;
>>       }
>>
>> @@ -3283,8 +3284,12 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
>>        * But do not retry if the given zonelist is not suitable for
>>        * compaction.
>>        */
>> -    if (compaction_withdrawn(compact_result))
>> -        return compaction_zonelist_suitable(ac, order, alloc_flags);
>> +    if (compaction_withdrawn(compact_result)) {
>> +        int ret = compaction_zonelist_suitable(ac, order, alloc_flags);
>> +        if (!ret)
>> +            pr_info("XXX: no zone suitable for compaction\n");
>> +        return ret;
>> +    }
>>
>>       /*
>>        * !costly requests are much more important than __GFP_REPEAT
>> @@ -3299,6 +3304,7 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
>>       if (compaction_retries <= max_retries)
>>           return true;
>>
>> +    pr_info("XXX: compaction retries fail after %d\n", compaction_retries);
>>       return false;
>>   }
>>   #else
>>
>
Took me a little longer than expected due to work. The failure wouldn't 
happen for a while and so I started a couple of scripts and let them 
run. When I checked today the server didn't respond on the network and 
sure enough it had killed everything. This is with 4.7.0 with the config 
based on Debian 4.7-rc7.

trace_pipe got a little big (5GB) so I uploaded the logs to 
https://filebin.net/box0wycfouvhl6sr/OOM_4.7.0.tar.bz2. before_btrfs is 
before the btrfs filesystems were mounted.
I did run a btrfs balance because it creates IO load and I needed to 
balance anyway. Maybe that's what caused it?

I'll make the changes requested by Michal and try again.

Thanks,
Ralf-Peter


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-15  4:48                               ` Ralf-Peter Rohbeck
@ 2016-08-15  9:16                                 ` Vlastimil Babka
  2016-08-15 15:01                                   ` Michal Hocko
  2016-08-16  3:12                                   ` Joonsoo Kim
  0 siblings, 2 replies; 50+ messages in thread
From: Vlastimil Babka @ 2016-08-15  9:16 UTC (permalink / raw)
  To: Ralf-Peter Rohbeck, Michal Hocko; +Cc: linux-mm

On 08/15/2016 06:48 AM, Ralf-Peter Rohbeck wrote:
> On 02.08.2016 12:25, Ralf-Peter Rohbeck wrote:
>>
> Took me a little longer than expected due to work. The failure wouldn't 
> happen for a while and so I started a couple of scripts and let them 
> run. When I checked today the server didn't respond on the network and 
> sure enough it had killed everything. This is with 4.7.0 with the config 
> based on Debian 4.7-rc7.
> 
> trace_pipe got a little big (5GB) so I uploaded the logs to 
> https://filebin.net/box0wycfouvhl6sr/OOM_4.7.0.tar.bz2. before_btrfs is 
> before the btrfs filesystems were mounted.
> I did run a btrfs balance because it creates IO load and I needed to 
> balance anyway. Maybe that's what caused it?

pgmigrate_success        46738962
pgmigrate_fail          135649772
compact_migrate_scanned 309726659
compact_free_scanned   9715615169
compact_isolated        229689596
compact_stall 4777
compact_fail 3068
compact_success 1709
compact_daemon_wake 207834
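For a sense of proportion, the counters above can be turned into a few ratios. The following is a small userspace C sketch added for illustration only; the struct and function names are hypothetical, and the inputs are simply the /proc/vmstat values quoted in this thread.

```c
#include <assert.h>

/* Hypothetical container for the /proc/vmstat counters quoted above. */
struct compact_stats {
	unsigned long long pgmigrate_success;
	unsigned long long pgmigrate_fail;
	unsigned long long compact_migrate_scanned;
	unsigned long long compact_free_scanned;
	unsigned long long compact_stall;
	unsigned long long compact_fail;
	unsigned long long compact_success;
};

/* Percentage (rounded down) of page migrations that failed. */
static unsigned long long migrate_fail_pct(const struct compact_stats *s)
{
	unsigned long long total = s->pgmigrate_success + s->pgmigrate_fail;

	assert(total > 0);
	return s->pgmigrate_fail * 100 / total;
}

/* Pages scanned by the free scanner per page scanned by the migration scanner. */
static unsigned long long free_per_migrate_scanned(const struct compact_stats *s)
{
	assert(s->compact_migrate_scanned > 0);
	return s->compact_free_scanned / s->compact_migrate_scanned;
}

/* Whether the stall count equals failures plus successes in these numbers. */
static int stalls_accounted(const struct compact_stats *s)
{
	return s->compact_stall == s->compact_fail + s->compact_success;
}
```

Fed with the numbers above, it reports that roughly 74% of the page migrations failed, that the free scanner scanned about 31 pages for every page the migration scanner scanned, and that compact_stall equals compact_fail plus compact_success.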

The migration failures are quite enormous. A very quick analysis of the
trace seems to confirm that these are mostly "real", as opposed to the
result of failing to isolate free pages for migration targets, although
the free scanner did spend a lot of time:

> grep "nr_failed=32" -B1 trace_pipe.log | grep "isolate_freepages.*nr_taken=0" | wc -l
3246

So is it one of the cases where the fs is unable to migrate dirty/writeback pages?
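The grep pipeline above relies on -B1 semantics that are easy to misread. As a sketch (hypothetical helper names, plain substring matching in place of grep's regex), this C function counts the same thing: an isolate_freepages line with nr_taken=0 is counted when it, or the line right after it, contains nr_failed=32. It assumes one trace event per input line.

```c
#include <stddef.h>
#include <string.h>

/* Mirrors `grep "isolate_freepages.*nr_taken=0"` on a single line. */
static int is_empty_isolation(const char *line)
{
	const char *p = strstr(line, "isolate_freepages");

	return p != NULL && strstr(p, "nr_taken=0") != NULL;
}

/* Mirrors `grep "nr_failed=32"` on a single line. */
static int is_full_batch_failure(const char *line)
{
	return strstr(line, "nr_failed=32") != NULL;
}

/*
 * Counts what the pipeline would print: grep -B1 emits each matching line
 * plus the line before it, so an isolation line survives the second grep
 * if it, or the line following it, contains nr_failed=32.
 */
static size_t count_failed_after_empty_isolation(const char *const *lines,
						 size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (!is_empty_isolation(lines[i]))
			continue;
		if (is_full_batch_failure(lines[i]) ||
		    (i + 1 < n && is_full_batch_failure(lines[i + 1])))
			count++;
	}
	return count;
}
```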

Vlastimil

> I'll make the changes requested by Michal and try again.
> 
> Thanks,
> Ralf-Peter


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-15  9:16                                 ` Vlastimil Babka
@ 2016-08-15 15:01                                   ` Michal Hocko
  2016-08-15 18:42                                     ` Ralf-Peter Rohbeck
  2016-08-16  3:12                                   ` Joonsoo Kim
  1 sibling, 1 reply; 50+ messages in thread
From: Michal Hocko @ 2016-08-15 15:01 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: Ralf-Peter Rohbeck, linux-mm

On Mon 15-08-16 11:16:36, Vlastimil Babka wrote:
> On 08/15/2016 06:48 AM, Ralf-Peter Rohbeck wrote:
> > On 02.08.2016 12:25, Ralf-Peter Rohbeck wrote:
> >>
> > Took me a little longer than expected due to work. The failure wouldn't 
> > happen for a while and so I started a couple of scripts and let them 
> > run. When I checked today the server didn't respond on the network and 
> > sure enough it had killed everything. This is with 4.7.0 with the config 
> > based on Debian 4.7-rc7.
> > 
> > trace_pipe got a little big (5GB) so I uploaded the logs to 
> > https://filebin.net/box0wycfouvhl6sr/OOM_4.7.0.tar.bz2. before_btrfs is 
> > before the btrfs filesystems were mounted.
> > I did run a btrfs balance because it creates IO load and I needed to 
> > balance anyway. Maybe that's what caused it?
> 
> pgmigrate_success        46738962
> pgmigrate_fail          135649772
> compact_migrate_scanned 309726659
> compact_free_scanned   9715615169
> compact_isolated        229689596
> compact_stall 4777
> compact_fail 3068
> compact_success 1709
> compact_daemon_wake 207834
> 
> The migration failures are quite enormous. A very quick analysis of the
> trace seems to confirm that these are mostly "real", as opposed to the
> result of failing to isolate free pages for migration targets, although
> the free scanner did spend a lot of time:
> 
> > grep "nr_failed=32" -B1 trace_pipe.log | grep "isolate_freepages.*nr_taken=0" | wc -l
> 3246
> 
> So is it one of the cases where the fs is unable to migrate dirty/writeback pages?

It smells that way. Now we should find out why and what we can do about
that. I suspect that try_to_release_page is not able to release the page
for migration. Btrfs doesn't seem to have a ->migratepage callback for page
cache pages, so it should go via fallback_migrate_page.

The following diff should tell us whether this is really the case. Just
open trace_pipe and see whether this path is actually triggered.
---
diff --git a/mm/migrate.c b/mm/migrate.c
index 72c09dea6526..120e2e5fcbea 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -729,8 +729,10 @@ static int fallback_migrate_page(struct address_space *mapping,
 	 * We must have no buffers or drop them.
 	 */
 	if (page_has_private(page) &&
-	    !try_to_release_page(page, GFP_KERNEL))
+	    !try_to_release_page(page, GFP_KERNEL)) {
+		trace_printk("try_to_release_page failed for a_ops:%pS\n", page->a_ops);
 		return -EAGAIN;
+	}
 
 	return migrate_page(mapping, newpage, page, mode);
 }
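The decision this debug hunk instruments can be modelled in userspace to make the failure mode concrete. This is a simplified sketch with hypothetical stand-in types, not kernel code: the real fallback_migrate_page() handles dirty pages before this point and calls migrate_page() afterwards. Note that the a_ops pointer belongs to struct address_space (the mapping), not to struct page, so the model prints it through mapping->a_ops.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel structures involved. */
struct model_aops {
	const char *name;		/* what %pS would resolve to */
};

struct model_mapping {
	const struct model_aops *a_ops;	/* a_ops hangs off the mapping */
};

struct model_page {
	bool has_private;		/* models page_has_private() */
	bool release_ok;		/* models try_to_release_page() succeeding */
};

/*
 * Simplified model of the buffer-release step: a page with private data
 * (e.g. buffer heads) must have it released first; otherwise the attempt
 * fails with -EAGAIN so that the caller may retry it later.
 */
static int model_fallback_migrate(const struct model_mapping *mapping,
				  const struct model_page *page)
{
	if (page->has_private && !page->release_ok) {
		fprintf(stderr, "try_to_release_page failed for a_ops:%s\n",
			mapping->a_ops->name);
		return -EAGAIN;
	}
	return 0;	/* stands in for migrate_page() succeeding */
}
```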
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-15 15:01                                   ` Michal Hocko
@ 2016-08-15 18:42                                     ` Ralf-Peter Rohbeck
  2016-08-16  7:32                                       ` Michal Hocko
  0 siblings, 1 reply; 50+ messages in thread
From: Ralf-Peter Rohbeck @ 2016-08-15 18:42 UTC (permalink / raw)
  To: Michal Hocko, Vlastimil Babka; +Cc: linux-mm

[-- Attachment #1: Type: text/plain, Size: 3689 bytes --]

This time the OOM killer hit much quicker. No btrfs balance; just 
compiling the kernel with the new change did it.
The logs are much smaller, so I'm attaching them.

Ralf-Peter
On 15.08.2016 08:01, Michal Hocko wrote:
> On Mon 15-08-16 11:16:36, Vlastimil Babka wrote:
>> On 08/15/2016 06:48 AM, Ralf-Peter Rohbeck wrote:
>>> On 02.08.2016 12:25, Ralf-Peter Rohbeck wrote:
>>> Took me a little longer than expected due to work. The failure wouldn't
>>> happen for a while and so I started a couple of scripts and let them
>>> run. When I checked today the server didn't respond on the network and
>>> sure enough it had killed everything. This is with 4.7.0 with the config
>>> based on Debian 4.7-rc7.
>>>
>>> trace_pipe got a little big (5GB) so I uploaded the logs to
>>> https://filebin.net/box0wycfouvhl6sr/OOM_4.7.0.tar.bz2. before_btrfs is
>>> before the btrfs filesystems were mounted.
>>> I did run a btrfs balance because it creates IO load and I needed to
>>> balance anyway. Maybe that's what caused it?
>> pgmigrate_success        46738962
>> pgmigrate_fail          135649772
>> compact_migrate_scanned 309726659
>> compact_free_scanned   9715615169
>> compact_isolated        229689596
>> compact_stall 4777
>> compact_fail 3068
>> compact_success 1709
>> compact_daemon_wake 207834
>>
>> The migration failures are quite enormous. A very quick analysis of the
>> trace seems to confirm that these are mostly "real", as opposed to the
>> result of failing to isolate free pages for migration targets, although
>> the free scanner did spend a lot of time:
>>
>>> grep "nr_failed=32" -B1 trace_pipe.log | grep "isolate_freepages.*nr_taken=0" | wc -l
>> 3246
>>
>> So is it one of the cases where the fs is unable to migrate dirty/writeback pages?
> It smells that way. Now we should find out why and what we can do about
> that. I suspect that try_to_release_page is not able to release the page
> for migration. Btrfs doesn't seem to have a ->migratepage callback for page
> cache pages, so it should go via fallback_migrate_page.
>
> The following diff should tell us whether this is really the case. Just
> open trace_pipe and see whether this path is actually triggered.
> ---
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 72c09dea6526..120e2e5fcbea 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -729,8 +729,10 @@ static int fallback_migrate_page(struct address_space *mapping,
>   	 * We must have no buffers or drop them.
>   	 */
>   	if (page_has_private(page) &&
> -	    !try_to_release_page(page, GFP_KERNEL))
> +	    !try_to_release_page(page, GFP_KERNEL)) {
> +		trace_printk("try_to_release_page failed for a_ops:%pS\n", page->a_ops);
>   		return -EAGAIN;
> +	}
>   
>   	return migrate_page(mapping, newpage, page, mode);
>   }



[-- Attachment #2: OOM_4.7.0_p1.tar.bz2 --]
[-- Type: application/x-bzip, Size: 2325210 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-15  9:16                                 ` Vlastimil Babka
  2016-08-15 15:01                                   ` Michal Hocko
@ 2016-08-16  3:12                                   ` Joonsoo Kim
  2016-08-16  7:44                                     ` Vlastimil Babka
  2016-08-17  4:48                                     ` Ralf-Peter Rohbeck
  1 sibling, 2 replies; 50+ messages in thread
From: Joonsoo Kim @ 2016-08-16  3:12 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: Ralf-Peter Rohbeck, Michal Hocko, linux-mm

On Mon, Aug 15, 2016 at 11:16:36AM +0200, Vlastimil Babka wrote:
> On 08/15/2016 06:48 AM, Ralf-Peter Rohbeck wrote:
> > On 02.08.2016 12:25, Ralf-Peter Rohbeck wrote:
> >>
> > Took me a little longer than expected due to work. The failure wouldn't 
> > happen for a while and so I started a couple of scripts and let them 
> > run. When I checked today the server didn't respond on the network and 
> > sure enough it had killed everything. This is with 4.7.0 with the config 
> > based on Debian 4.7-rc7.
> > 
> > trace_pipe got a little big (5GB) so I uploaded the logs to 
> > https://filebin.net/box0wycfouvhl6sr/OOM_4.7.0.tar.bz2. before_btrfs is 
> > before the btrfs filesystems were mounted.
> > I did run a btrfs balance because it creates IO load and I needed to 
> > balance anyway. Maybe that's what caused it?
> 
> pgmigrate_success        46738962
> pgmigrate_fail          135649772
> compact_migrate_scanned 309726659
> compact_free_scanned   9715615169
> compact_isolated        229689596
> compact_stall 4777
> compact_fail 3068
> compact_success 1709
> compact_daemon_wake 207834
> 
> The migration failures are quite enormous. Very quick analysis of the
> trace seems to confirm that these are mostly "real", as opposed to result
> of failure to isolate free pages for migration targets, although the free
> scanner spent a lot of time:

I don't think the main reason for the OOM is a 'real' migration failure.
If that were the case, compaction would move on to the next migratable
pages and eventually some of them would be migrated successfully.

pagetypeinfo shows that there are too many unmovable pageblocks. The
free page scanner doesn't scan those pageblocks, so it is quite likely
that it cannot find free pages even though the system has many free
pages. I think that this is the root cause of the problem.

It would be good to check whether the following workaround helps.

Thanks.

------------>8-----------
diff --git a/mm/compaction.c b/mm/compaction.c
index 9affb29..965eddd 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1082,10 +1082,6 @@ static void isolate_freepages(struct compact_control *cc)
                if (!page)
                        continue;
 
-               /* Check the block is suitable for migration */
-               if (!suitable_migration_target(page))
-                       continue;
-
                /* If isolation recently failed, do not retry */
                if (!isolation_suitable(cc, page))
                        continue;
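To illustrate the theory, here is a toy userspace model of the free scanner with and without the suitable_migration_target() check that the workaround removes. It is a sketch under stated assumptions, not kernel code: only movable pageblocks count as suitable here (the real check has further conditions), and it ignores scanner positions and per-page isolation. All names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical, simplified pageblock migratetypes. */
enum model_migratetype { MODEL_UNMOVABLE, MODEL_MOVABLE, MODEL_RECLAIMABLE };

struct model_pageblock {
	enum model_migratetype type;
	unsigned long free_pages;	/* free pages the scanner could isolate */
};

/* Simplified stand-in for suitable_migration_target(): movable blocks only. */
static int model_suitable_target(const struct model_pageblock *pb)
{
	return pb->type == MODEL_MOVABLE;
}

/*
 * Sum the free pages the free scanner would see. With check_suitable set,
 * non-movable pageblocks are skipped entirely, as in the unpatched code.
 */
static unsigned long model_scan_free(const struct model_pageblock *blocks,
				     size_t n, int check_suitable)
{
	unsigned long found = 0;

	for (size_t i = 0; i < n; i++) {
		if (check_suitable && !model_suitable_target(&blocks[i]))
			continue;
		found += blocks[i].free_pages;
	}
	return found;
}
```

With blocks that are mostly unmovable or reclaimable but still contain free pages, the filtered scan finds almost nothing while the unfiltered one finds plenty, which is the situation described above.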


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-15 18:42                                     ` Ralf-Peter Rohbeck
@ 2016-08-16  7:32                                       ` Michal Hocko
  2016-08-16  7:43                                         ` Michal Hocko
  2016-08-17  0:26                                         ` Ralf-Peter Rohbeck
  0 siblings, 2 replies; 50+ messages in thread
From: Michal Hocko @ 2016-08-16  7:32 UTC (permalink / raw)
  To: Ralf-Peter Rohbeck; +Cc: Vlastimil Babka, linux-mm

On Mon 15-08-16 11:42:11, Ralf-Peter Rohbeck wrote:
> This time the OOM killer hit much quicker. No btrfs balance; just compiling
> the kernel with the new change did it.
> The logs are much smaller, so I'm attaching them.

Just to clarify: you have added the trace_printk for
try_to_release_page, right (after fixing it, of course)? If so, there is
not a single mention of that path failing, which would support Joonsoo's
theory... Could you try with his patch?
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-16  7:32                                       ` Michal Hocko
@ 2016-08-16  7:43                                         ` Michal Hocko
  2016-08-17  9:14                                           ` Ralf-Peter Rohbeck
  2016-08-17  0:26                                         ` Ralf-Peter Rohbeck
  1 sibling, 1 reply; 50+ messages in thread
From: Michal Hocko @ 2016-08-16  7:43 UTC (permalink / raw)
  To: Ralf-Peter Rohbeck; +Cc: Vlastimil Babka, linux-mm

On Tue 16-08-16 09:32:46, Michal Hocko wrote:
> On Mon 15-08-16 11:42:11, Ralf-Peter Rohbeck wrote:
> > This time the OOM killer hit much quicker. No btrfs balance; just compiling
> > the kernel with the new change did it.
> > The logs are much smaller, so I'm attaching them.
> 
> Just to clarify: you have added the trace_printk for
> try_to_release_page, right (after fixing it, of course)? If so, there is
> not a single mention of that path failing, which would support Joonsoo's
> theory... Could you try with his patch?

And then it would be great if you could test with the current linux-next
tree. Vlastimil has made some changes there which might help. Even if they
don't, it would be better to build further changes on top of them.
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-16  3:12                                   ` Joonsoo Kim
@ 2016-08-16  7:44                                     ` Vlastimil Babka
  2016-08-17  4:48                                     ` Ralf-Peter Rohbeck
  1 sibling, 0 replies; 50+ messages in thread
From: Vlastimil Babka @ 2016-08-16  7:44 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: Ralf-Peter Rohbeck, Michal Hocko, linux-mm

On 08/16/2016 05:12 AM, Joonsoo Kim wrote:
> On Mon, Aug 15, 2016 at 11:16:36AM +0200, Vlastimil Babka wrote:
>> On 08/15/2016 06:48 AM, Ralf-Peter Rohbeck wrote:
>>> On 02.08.2016 12:25, Ralf-Peter Rohbeck wrote:
>>>>
>>> Took me a little longer than expected due to work. The failure wouldn't 
>>> happen for a while and so I started a couple of scripts and let them 
>>> run. When I checked today the server didn't respond on the network and 
>>> sure enough it had killed everything. This is with 4.7.0 with the config 
>>> based on Debian 4.7-rc7.
>>>
>>> trace_pipe got a little big (5GB) so I uploaded the logs to 
>>> https://filebin.net/box0wycfouvhl6sr/OOM_4.7.0.tar.bz2. before_btrfs is 
>>> before the btrfs filesystems were mounted.
>>> I did run a btrfs balance because it creates IO load and I needed to 
>>> balance anyway. Maybe that's what caused it?
>>
>> pgmigrate_success        46738962
>> pgmigrate_fail          135649772
>> compact_migrate_scanned 309726659
>> compact_free_scanned   9715615169
>> compact_isolated        229689596
>> compact_stall 4777
>> compact_fail 3068
>> compact_success 1709
>> compact_daemon_wake 207834
>>
>> The migration failures are quite enormous. A very quick analysis of the
>> trace seems to confirm that these are mostly "real", as opposed to the
>> result of failing to isolate free pages for migration targets, although
>> the free scanner did spend a lot of time:
> 
> I don't think the main reason for the OOM is a 'real' migration failure.
> If that were the case, compaction would move on to the next migratable
> pages and eventually some of them would be migrated successfully.
> 
> pagetypeinfo shows that there are too many unmovable pageblocks.

Hmm, well spotted. It's also somewhat suspicious: I would expect
filesystem activity to result in reclaimable allocations, not unmovable
ones (not that it makes any difference for compaction).

Checking nr_slab_* in zoneinfo shows that it really should be mostly
reclaimable:

nr_slab_reclaimable 0
nr_slab_unreclaimable 0
nr_slab_reclaimable 32709
nr_slab_unreclaimable 2764
nr_slab_reclaimable 101525
nr_slab_unreclaimable 10852

Compared with:

Number of blocks type     Unmovable      Movable  Reclaimable   HighAtomic      Isolate 
Node 0, zone      DMA            1            7            0            0            0 
Node 0, zone    DMA32          893           72           51            0            0 
Node 0, zone   Normal         2780          155          137            0            0 

We have 188 reclaimable blocks, i.e. 96256 pages (512 pages per block).
The sum of nr_slab_reclaimable is 134234, which suggests some fallbacks
into unmovable blocks. But the rest of all those unmovable pageblocks
must be filled by something else... some btrfs buffers maybe?
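The arithmetic above can be reproduced with a few lines of C. The sketch assumes 512 pages per pageblock (4 KiB pages with pageblock_order 9, as on x86-64), which is the factor implied by 188 blocks being 96256 pages; the macro and function names are hypothetical.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages and pageblock_order 9, i.e. 512 pages per block. */
#define MODEL_PAGES_PER_PAGEBLOCK 512UL

/* Sum of per-zone values, e.g. one pagetypeinfo column or one zoneinfo field. */
static unsigned long sum_ul(const unsigned long *v, size_t n)
{
	unsigned long s = 0;

	for (size_t i = 0; i < n; i++)
		s += v[i];
	return s;
}

/* Pages covered by the given per-zone pageblock counts. */
static unsigned long pageblocks_to_pages(const unsigned long *blocks, size_t n)
{
	return sum_ul(blocks, n) * MODEL_PAGES_PER_PAGEBLOCK;
}
```

Summing the Reclaimable column (0 + 51 + 137) and the three nr_slab_reclaimable values gives 188 blocks (96256 pages) against 134234 slab pages, a surplus of 37978 pages. Under the same assumption, the 3674 Unmovable blocks cover 1881088 pages, roughly 7.2 GiB.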

> The free page scanner doesn't scan those pageblocks, so it is quite likely
> that it cannot find free pages even though the system has many free
> pages. I think that this is the root cause of the problem.
> 
> It would be good to check whether the following workaround helps.

Yes, this might be a good idea, at least for higher compaction priorities.

Thanks.

> Thanks.
> 
> ------------>8-----------
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 9affb29..965eddd 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1082,10 +1082,6 @@ static void isolate_freepages(struct compact_control *cc)
>                 if (!page)
>                         continue;
>  
> -               /* Check the block is suitable for migration */
> -               if (!suitable_migration_target(page))
> -                       continue;
> -
>                 /* If isolation recently failed, do not retry */
>                 if (!isolation_suitable(cc, page))
>                         continue;
> 


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-16  7:32                                       ` Michal Hocko
  2016-08-16  7:43                                         ` Michal Hocko
@ 2016-08-17  0:26                                         ` Ralf-Peter Rohbeck
  2016-08-17  7:43                                           ` Vlastimil Babka
  1 sibling, 1 reply; 50+ messages in thread
From: Ralf-Peter Rohbeck @ 2016-08-17  0:26 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Vlastimil Babka, linux-mm

[-- Attachment #1: Type: text/plain, Size: 1418 bytes --]

No, it wasn't in the last run yet. That OOM happened while I was compiling
the last change.
I ran another test with the trace_printk: see attached. Again, I only ran
a kernel compilation.

Ralf-Peter

On 16.08.2016 00:32, Michal Hocko wrote:
> On Mon 15-08-16 11:42:11, Ralf-Peter Rohbeck wrote:
>> This time the OOM killer hit much quicker. No btrfs balance; just compiling
>> the kernel with the new change did it.
>> The logs are much smaller, so I'm attaching them.
> Just to clarify: you have added the trace_printk for
> try_to_release_page, right (after fixing it, of course)? If so, there is
> not a single mention of that path failing, which would support Joonsoo's
> theory... Could you try with his patch?



[-- Attachment #2: OOM_4.7.0_p2.tar.bz2 --]
[-- Type: application/x-bzip, Size: 761047 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-16  3:12                                   ` Joonsoo Kim
  2016-08-16  7:44                                     ` Vlastimil Babka
@ 2016-08-17  4:48                                     ` Ralf-Peter Rohbeck
  2016-08-17  7:56                                       ` Vlastimil Babka
  1 sibling, 1 reply; 50+ messages in thread
From: Ralf-Peter Rohbeck @ 2016-08-17  4:48 UTC (permalink / raw)
  To: Joonsoo Kim, Vlastimil Babka; +Cc: Michal Hocko, linux-mm

[-- Attachment #1: Type: text/plain, Size: 3674 bytes --]

On 15.08.2016 20:12, Joonsoo Kim wrote:
> On Mon, Aug 15, 2016 at 11:16:36AM +0200, Vlastimil Babka wrote:
>> On 08/15/2016 06:48 AM, Ralf-Peter Rohbeck wrote:
>>> On 02.08.2016 12:25, Ralf-Peter Rohbeck wrote:
>>> Took me a little longer than expected due to work. The failure wouldn't
>>> happen for a while and so I started a couple of scripts and let them
>>> run. When I checked today the server didn't respond on the network and
>>> sure enough it had killed everything. This is with 4.7.0 with the config
>>> based on Debian 4.7-rc7.
>>>
>>> trace_pipe got a little big (5GB) so I uploaded the logs to
>>> https://filebin.net/box0wycfouvhl6sr/OOM_4.7.0.tar.bz2. before_btrfs is
>>> before the btrfs filesystems were mounted.
>>> I did run a btrfs balance because it creates IO load and I needed to
>>> balance anyway. Maybe that's what caused it?
>> pgmigrate_success        46738962
>> pgmigrate_fail          135649772
>> compact_migrate_scanned 309726659
>> compact_free_scanned   9715615169
>> compact_isolated        229689596
>> compact_stall 4777
>> compact_fail 3068
>> compact_success 1709
>> compact_daemon_wake 207834
>>
>> The migration failures are quite enormous. A very quick analysis of the
>> trace seems to confirm that these are mostly "real", as opposed to the
>> result of failing to isolate free pages for migration targets, although
>> the free scanner did spend a lot of time:
> I don't think the main reason for the OOM is a 'real' migration failure.
> If that were the case, compaction would move on to the next migratable
> pages and eventually some of them would be migrated successfully.
>
> pagetypeinfo shows that there are too many unmovable pageblocks. The
> free page scanner doesn't scan those pageblocks, so it is quite likely
> that it cannot find free pages even though the system has many free
> pages. I think that this is the root cause of the problem.
>
> It would be good to check whether the following workaround helps.
>
> Thanks.
>
> ------------>8-----------
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 9affb29..965eddd 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1082,10 +1082,6 @@ static void isolate_freepages(struct compact_control *cc)
>                  if (!page)
>                          continue;
>   
> -               /* Check the block is suitable for migration */
> -               if (!suitable_migration_target(page))
> -                       continue;
> -
>                  /* If isolation recently failed, do not retry */
>                  if (!isolation_suitable(cc, page))
>                          continue;
>
That seemed to help a little (subjectively), but the OOM killer still
killed a kernel build. The logs are attached.

Thanks,
Ralf-Peter



[-- Attachment #2: OOM_4.7.0_p3.tar.bz2 --]
[-- Type: application/x-bzip, Size: 670163 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-17  0:26                                         ` Ralf-Peter Rohbeck
@ 2016-08-17  7:43                                           ` Vlastimil Babka
  0 siblings, 0 replies; 50+ messages in thread
From: Vlastimil Babka @ 2016-08-17  7:43 UTC (permalink / raw)
  To: Ralf-Peter Rohbeck, Michal Hocko; +Cc: linux-mm

On 08/17/2016 02:26 AM, Ralf-Peter Rohbeck wrote:
> No, it wasn't in the last run yet. That OOM happened while I was compiling
> the last change.

You mean those pr_infos?

From those we've got:

Aug 16 17:14:26 fs kernel: [ 1817.044778] XXX: compaction_failed
Aug 16 17:15:37 fs kernel: [ 1888.387817] XXX: compaction_failed
Aug 16 17:17:32 fs kernel: [ 2002.879726] XXX: compaction_failed

i.e. none of the "XXX: no zone suitable for compaction" lines.

I think my series in mmotm tree could help here.

> I ran another test with the trace_printk: see attached. Again, I only ran
> a kernel compilation.

So the trace_printk didn't fire that many times:

grep try_to_release trace_pipe.log | wc -l
52

and vmstat_after shows:

pgmigrate_success 851
pgmigrate_fail 817
compact_migrate_scanned 567689
compact_free_scanned 50744242
compact_isolated 19196
compact_stall 876
compact_fail 801
compact_success 75

pagetype_after:

Number of blocks type     Unmovable      Movable  Reclaimable   HighAtomic      Isolate 
Node 0, zone      DMA            1            7            0            0            0 
Node 0, zone    DMA32          883           91           42            0            0 
Node 0, zone   Normal         2750          207          115            0            0 

So while the btrfs migration failures could be real, in this run it was
rather the free scanner struggling due to the unmovable blocks, as Joonsoo suggested.
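To compare pagetypeinfo snapshots, a small C helper can compute the share of unmovable pageblocks per zone. The struct mirrors the columns of the "Number of blocks type" table; the names are hypothetical and the result is rounded down.

```c
#include <assert.h>

/* One zone row of the "Number of blocks type" table in /proc/pagetypeinfo. */
struct zone_blocks {
	unsigned long unmovable;
	unsigned long movable;
	unsigned long reclaimable;
	unsigned long highatomic;
	unsigned long isolate;
};

/* Percentage (rounded down) of the zone's pageblocks marked unmovable. */
static unsigned long unmovable_pct(const struct zone_blocks *z)
{
	unsigned long total = z->unmovable + z->movable + z->reclaimable +
			      z->highatomic + z->isolate;

	assert(total > 0);
	return z->unmovable * 100 / total;
}
```

For the Normal zone it gives 90% in the earlier snapshot and 89% in this one. In both, the Normal zone has 3072 pageblocks, of which only 155 and 207 respectively are movable.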

> Ralf-Peter
> 
> On 16.08.2016 00:32, Michal Hocko wrote:
>> On Mon 15-08-16 11:42:11, Ralf-Peter Rohbeck wrote:
>>> This time the OOM killer hit much quicker. No btrfs balance; just compiling
>>> the kernel with the new change did it.
>>> The logs are much smaller, so I'm attaching them.
>> Just to clarify: you have added the trace_printk for
>> try_to_release_page, right (after fixing it, of course)? If so, there is
>> not a single mention of that path failing, which would support Joonsoo's
>> theory... Could you try with his patch?


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-17  4:48                                     ` Ralf-Peter Rohbeck
@ 2016-08-17  7:56                                       ` Vlastimil Babka
  2016-08-17  8:16                                         ` Joonsoo Kim
  2016-08-17  9:11                                         ` Ralf-Peter Rohbeck
  0 siblings, 2 replies; 50+ messages in thread
From: Vlastimil Babka @ 2016-08-17  7:56 UTC (permalink / raw)
  To: Ralf-Peter Rohbeck, Joonsoo Kim; +Cc: Michal Hocko, linux-mm

On 08/17/2016 06:48 AM, Ralf-Peter Rohbeck wrote:
>> ------------>8-----------
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 9affb29..965eddd 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -1082,10 +1082,6 @@ static void isolate_freepages(struct compact_control *cc)
>>                  if (!page)
>>                          continue;
>>   
>> -               /* Check the block is suitable for migration */
>> -               if (!suitable_migration_target(page))
>> -                       continue;
>> -
>>                  /* If isolation recently failed, do not retry */
>>                  if (!isolation_suitable(cc, page))
>>                          continue;
>>
> That seemed to help a little (subjectively), but the OOM killer still
> killed a kernel build. The logs are attached.

> grep XXX messages 
Aug 16 20:29:13 fs kernel: [ 6850.467250] XXX: compaction_failed

pagetypeinfo_after:
Number of blocks type     Unmovable      Movable  Reclaimable   HighAtomic      Isolate 
Node 0, zone      DMA            1            7            0            0            0 
Node 0, zone    DMA32          879           93           44            0            0 
Node 0, zone   Normal         2862          136           74            0            0 

vmstat_after:
pgmigrate_success 5123
pgmigrate_fail 4106
compact_migrate_scanned 62019
compact_free_scanned 44314328
compact_isolated 18572
compact_stall 327
compact_fail 236
compact_success 91
compact_daemon_wake 1162

> grep try_to_release trace_pipe.log | wc -l
0
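
The two grep checks above only count matching lines in the collected logs. If the attached logs are post-processed by a program, a minimal C equivalent of "grep -F PATTERN | wc -l" could look like the sketch below. count_matching_lines() is a hypothetical helper: it works on an in-memory buffer and matches a plain substring, not a regular expression.

```c
#include <assert.h>
#include <string.h>

/* Count the lines of `text` that contain `needle` as a plain substring,
 * roughly what `grep -F needle | wc -l` reports. */
static size_t count_matching_lines(const char *text, const char *needle)
{
    size_t count = 0;
    size_t nlen;

    assert(text != NULL && needle != NULL && needle[0] != '\0');
    nlen = strlen(needle);

    for (const char *line = text; *line != '\0'; ) {
        const char *eol = strchr(line, '\n');
        const char *end = eol ? eol : line + strlen(line);
        const char *hit = strstr(line, needle);

        /* Count the line only if the first match lies within it. */
        if (hit != NULL && hit + nlen <= end)
            count++;
        if (eol == NULL)
            break;
        line = eol + 1;
    }
    return count;
}
```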

Again, there are migration failures, but not many, and failures to
isolate free pages stand out. I assume that's because this is the kernel
build workload and not the btrfs balance one.

I think the patches in mmotm could make compaction try harder and use
more appropriate watermarks, but it's not guaranteed that will help.
The free scanner seems to become more and more a fundamental problem.
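
Putting numbers on the vmstat_after counters above: about 44% of migrated pages failed (4106 of 9229), 236 of the 327 direct compaction stalls (about 72%) ended in compact_fail, and the free scanner examined roughly 714 pages for every page the migration scanner examined (44314328 vs. 62019). A small C sketch that derives these ratios, assuming the counters are pasted in the "name value" line format of /proc/vmstat; vmstat_get() and pct() are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Counters copied from the vmstat_after output in this thread. */
static const char vmstat_after[] =
    "pgmigrate_success 5123\n"
    "pgmigrate_fail 4106\n"
    "compact_migrate_scanned 62019\n"
    "compact_free_scanned 44314328\n"
    "compact_isolated 18572\n"
    "compact_stall 327\n"
    "compact_fail 236\n"
    "compact_success 91\n"
    "compact_daemon_wake 1162\n";

/* Return the value of counter `name` from "name value" lines, or -1. */
static long long vmstat_get(const char *text, const char *name)
{
    size_t len = strlen(name);

    assert(len > 0);
    for (const char *p = text; p != NULL && *p != '\0'; ) {
        long long value;

        /* Require a space after the name so "compact_fail" does not
         * match a longer counter name with the same prefix. */
        if (strncmp(p, name, len) == 0 && p[len] == ' ' &&
            sscanf(p + len, "%lld", &value) == 1)
            return value;
        p = strchr(p, '\n');
        if (p != NULL)
            p++;
    }
    return -1;
}

/* Integer percentage part/whole * 100, truncated; 0 if whole is 0. */
static long long pct(long long part, long long whole)
{
    return whole > 0 ? part * 100 / whole : 0;
}
```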

And I really wonder how all those unmovable pageblocks came about.
AFAICS zoneinfo shows that most of memory is occupied by file lru pages.
These should be movable.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-17  7:56                                       ` Vlastimil Babka
@ 2016-08-17  8:16                                         ` Joonsoo Kim
  2016-08-17  9:21                                           ` Ralf-Peter Rohbeck
  2016-08-17  9:11                                         ` Ralf-Peter Rohbeck
  1 sibling, 1 reply; 50+ messages in thread
From: Joonsoo Kim @ 2016-08-17  8:16 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: Ralf-Peter Rohbeck, Joonsoo Kim, Michal Hocko, linux-mm

2016-08-17 16:56 GMT+09:00 Vlastimil Babka <vbabka@suse.cz>:
> On 08/17/2016 06:48 AM, Ralf-Peter Rohbeck wrote:
>>> ------------>8-----------
>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>> index 9affb29..965eddd 100644
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -1082,10 +1082,6 @@ static void isolate_freepages(struct compact_control *cc)
>>>                  if (!page)
>>>                          continue;
>>>
>>> -               /* Check the block is suitable for migration */
>>> -               if (!suitable_migration_target(page))
>>> -                       continue;
>>> -
>>>                  /* If isolation recently failed, do not retry */
>>>                  if (!isolation_suitable(cc, page))
>>>                          continue;
>>>
>> That seemed to help a little (subjectively) but still OOM killed a
>> kernel build. The logs are attached.
>
>> grep XXX messages
> Aug 16 20:29:13 fs kernel: [ 6850.467250] XXX: compaction_failed
>
> pagetypeinfo_after:
> Number of blocks type     Unmovable      Movable  Reclaimable   HighAtomic      Isolate
> Node 0, zone      DMA            1            7            0            0            0
> Node 0, zone    DMA32          879           93           44            0            0
> Node 0, zone   Normal         2862          136           74            0            0
>
> vmstat_after:
> pgmigrate_success 5123
> pgmigrate_fail 4106
> compact_migrate_scanned 62019
> compact_free_scanned 44314328
> compact_isolated 18572
> compact_stall 327
> compact_fail 236
> compact_success 91
> compact_daemon_wake 1162
>
>> grep try_to_release trace_pipe.log | wc -l
> 0
>
> Again, migration failures are there but not so many, and failures to
> isolate freepages stand out. I assume it's because the kernel build
> workload and not the btrfs balance one.
>
> I think the patches in mmotm could make compaction try harder and use
> more appropriate watermarks, but it's not guaranteed that will help.
> The free scanner seems to become more and more a fundamental problem.

The following trace is the last compaction attempt before the OOM was
triggered. The free scanner starts at 0x27fe00, but the actual scan
happens at 0x186a00. And, although the log is snipped, compaction fails
because it doesn't find any free page.

It skips half of the pageblocks in that zone, probably due to their
migratetype or the skip bit.
Both Vlastimil's recent patches and my work-around should be applied to
solve this problem.

Other parts of the trace suggest that my work-around wasn't applied.
Could you confirm that?

Thanks.

              sh-14869 [000] ....  6850.456639: mm_compaction_try_to_compact_pages: order=2 gfp_mask=0x27000c0 mode=1
              sh-14869 [000] ....  6850.456640: mm_compaction_suitable: node=0 zone=Normal   order=2 ret=continue
              sh-14869 [000] ....  6850.456641: mm_compaction_begin: zone_start=0x100000 migrate_pfn=0x100000 free_pfn=0x27fe00 zone_end=0x280000, mode=sync
              sh-14869 [000] ....  6850.456641: mm_compaction_finished: node=0 zone=Normal   order=2 ret=continue
              sh-14869 [000] ....  6850.456648: mm_compaction_isolate_migratepages: range=(0x100000 ~ 0x10002d) nr_scanned=45 nr_taken=32
              sh-14869 [000] ....  6850.456834: mm_compaction_isolate_freepages: range=(0x186a00 ~ 0x186c00) nr_scanned=512 nr_taken=0
              sh-14869 [000] ....  6850.456842: mm_compaction_isolate_freepages: range=(0x186800 ~ 0x186a00) nr_scanned=512 nr_taken=0
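
The observation above can be checked with a little pfn arithmetic. Assuming 512-page pageblocks (each traced free-scanner range spans 0x200 pfns with nr_scanned=512), the Normal zone [0x100000, 0x280000) holds 3072 pageblocks, which matches the Normal row of pagetypeinfo (2862 + 136 + 74). The first block actually scanned, [0x186a00, 0x186c00), lies 1994 blocks below the zone end, so the free scanner passed over just under 65% of the zone's pageblocks without a traced isolation attempt. A sketch; the helper names are hypothetical.

```c
#include <assert.h>

/* Assumption: 512 pages per pageblock (order 9 with 4 KiB pages), matching
 * the 0x200-pfn ranges and nr_scanned=512 in the trace above. */
#define PAGEBLOCK_NR_PAGES 512UL

/* Number of whole pageblocks in the pfn range [start, end). */
static unsigned long pageblocks_between(unsigned long start, unsigned long end)
{
    assert(end >= start);
    return (end - start) / PAGEBLOCK_NR_PAGES;
}

/* Percentage (truncated) of the zone's pageblocks lying above the end of
 * the first block the free scanner actually scanned. */
static unsigned long skipped_percent(unsigned long zone_start,
                                     unsigned long zone_end,
                                     unsigned long first_scanned_end)
{
    unsigned long total = pageblocks_between(zone_start, zone_end);

    assert(total > 0);
    return pageblocks_between(first_scanned_end, zone_end) * 100 / total;
}
```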


> And I really wonder how did all those unmovable pageblocks happen.
> AFAICS zoneinfo shows that most of memory is occupied by file lru pages.
> These should be movable.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-17  7:56                                       ` Vlastimil Babka
  2016-08-17  8:16                                         ` Joonsoo Kim
@ 2016-08-17  9:11                                         ` Ralf-Peter Rohbeck
  2016-08-17  9:20                                           ` Vlastimil Babka
  1 sibling, 1 reply; 50+ messages in thread
From: Ralf-Peter Rohbeck @ 2016-08-17  9:11 UTC (permalink / raw)
  To: Vlastimil Babka, Joonsoo Kim; +Cc: Michal Hocko, linux-mm

On 17.08.2016 00:56, Vlastimil Babka wrote:
>
> Again, migration failures are there but not so many, and failures to
> isolate freepages stand out. I assume it's because the kernel build
> workload and not the btrfs balance one.
>
> I think the patches in mmotm could make compaction try harder and use
> more appropriate watermarks, but it's not guaranteed that will help.
> The free scanner seems to become more and more a fundamental problem.
>
> And I really wonder how did all those unmovable pageblocks happen.
> AFAICS zoneinfo shows that most of memory is occupied by file lru pages.
> These should be movable.

Could it be the pressure on the page cache? Don't forget that I'm writing
to some disk drives (recently two) at media speed with dd if=/dev/zero bs=4M
of=/dev/sdX.
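
For anyone trying to reproduce this, the write side of the workload is a plain buffered sequential write of zeros, as dd does without oflag=direct. A minimal C sketch of that loop follows. write_zeros() is hypothetical and takes an explicit block count (dd as used here runs until the device is full); point it at a scratch file for testing, never at a disk whose contents matter.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Write `count` blocks of `bs` zero bytes to `path` using buffered I/O
 * (no O_DIRECT), roughly like dd if=/dev/zero bs=<bs> count=<count>.
 * Returns the number of bytes written, or -1 on error. */
static long long write_zeros(const char *path, size_t bs, size_t count)
{
    assert(path != NULL && bs > 0);

    char *buf = calloc(1, bs);              /* calloc zero-fills */
    if (buf == NULL)
        return -1;

    /* O_CREAT | O_TRUNC suit the scratch-file case. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    long long total = 0;
    for (size_t i = 0; i < count && total >= 0; i++) {
        size_t done = 0;
        while (done < bs) {                 /* retry short writes */
            ssize_t n = write(fd, buf + done, bs - done);
            if (n <= 0) {
                if (n < 0 && errno == EINTR)
                    continue;
                total = -1;
                break;
            }
            done += (size_t)n;
        }
        if (total >= 0)
            total += (long long)bs;
    }

    if (close(fd) != 0)
        total = -1;
    free(buf);
    return total;
}
```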


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-16  7:43                                         ` Michal Hocko
@ 2016-08-17  9:14                                           ` Ralf-Peter Rohbeck
  2016-08-17  9:23                                             ` Vlastimil Babka
  0 siblings, 1 reply; 50+ messages in thread
From: Ralf-Peter Rohbeck @ 2016-08-17  9:14 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Vlastimil Babka, linux-mm

[-- Attachment #1: Type: text/plain, Size: 1575 bytes --]

On 16.08.2016 00:43, Michal Hocko wrote:
> On Tue 16-08-16 09:32:46, Michal Hocko wrote:
>> On Mon 15-08-16 11:42:11, Ralf-Peter Rohbeck wrote:
>>> This time the OOM killer hit much quicker. No btrfs balance, just compiling
>>> the kernel with the new change did it.
>>> Much smaller logs so I'm attaching them.
>> Just to clarify. You have added the trace_printk for
>> try_to_release_page, right? (after fixing it of course). If yes there is
>> no single mention of that path failing which would support Joonsoo's
>> theory... Could you try with his patch?
> And then it would be great if you could test with the current linux-next
> tree. Vlastimil has done some changes which might help. But even if they
> don't then it would be better to add more changes on top of them.

Results with 4.8.0-rc2 are attached. OOM happened rather quickly.


Ralf-Peter

[-- Attachment #2: OOM_4.8.0-rc2.tar.bz2 --]
[-- Type: application/x-bzip, Size: 83610 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-17  9:11                                         ` Ralf-Peter Rohbeck
@ 2016-08-17  9:20                                           ` Vlastimil Babka
  0 siblings, 0 replies; 50+ messages in thread
From: Vlastimil Babka @ 2016-08-17  9:20 UTC (permalink / raw)
  To: Ralf-Peter Rohbeck, Joonsoo Kim; +Cc: Michal Hocko, linux-mm

On 08/17/2016 11:11 AM, Ralf-Peter Rohbeck wrote:
> On 17.08.2016 00:56, Vlastimil Babka wrote:
>> And I really wonder how did all those unmovable pageblocks happen.
>> AFAICS zoneinfo shows that most of memory is occupied by file lru pages.
>> These should be movable.
>
> Is it the pressure on the page cache? Don't forget that I write to some
> disk drives (recently, 2) at media speed with dd if=/dev/zero bs=4M
> of=/dev/SDX.

Hmm, page cache should be movable. But maybe it's different when writing
to block devices rather than to filesystems.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-17  8:16                                         ` Joonsoo Kim
@ 2016-08-17  9:21                                           ` Ralf-Peter Rohbeck
  0 siblings, 0 replies; 50+ messages in thread
From: Ralf-Peter Rohbeck @ 2016-08-17  9:21 UTC (permalink / raw)
  To: Joonsoo Kim, Vlastimil Babka; +Cc: Joonsoo Kim, Michal Hocko, linux-mm

On 17.08.2016 01:16, Joonsoo Kim wrote:
>
> Free scanner start at 0x27fe00 but actual scan happens at 0x186a00.
> And, although log is snipped, compaction fails because it doesn't find
> any freepage.
>
> It skips half of pageblocks in that zone. It would be due to
> migratetype or skipbit.
> Both Vlastimil's recent patches and my work-around should be applied to solve
> this problem.
>
> Other part of trace looks like that my work-around isn't applied.
> Could you confirm
> that?
>
> Thanks.
Your patch was applied in my last 4.7 run, whose output is in
OOM_4.7.0_p3.tar.bz2, but not in the _p2 run.

Ralf-Peter

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-17  9:14                                           ` Ralf-Peter Rohbeck
@ 2016-08-17  9:23                                             ` Vlastimil Babka
  2016-08-17  9:28                                               ` Ralf-Peter Rohbeck
  0 siblings, 1 reply; 50+ messages in thread
From: Vlastimil Babka @ 2016-08-17  9:23 UTC (permalink / raw)
  To: Ralf-Peter Rohbeck, Michal Hocko; +Cc: linux-mm

On 08/17/2016 11:14 AM, Ralf-Peter Rohbeck wrote:
> On 16.08.2016 00:43, Michal Hocko wrote:
>> On Tue 16-08-16 09:32:46, Michal Hocko wrote:
>>> On Mon 15-08-16 11:42:11, Ralf-Peter Rohbeck wrote:
>>>> This time the OOM killer hit much quicker. No btrfs balance, just compiling
>>>> the kernel with the new change did it.
>>>> Much smaller logs so I'm attaching them.
>>> Just to clarify. You have added the trace_printk for
>>> try_to_release_page, right? (after fixing it of course). If yes there is
>>> no single mention of that path failing which would support Joonsoo's
>>> theory... Could you try with his patch?
>> And then it would be great if you could test with the current linux-next
>> tree. Vlastimil has done some changes which might help. But even if they
>> don't then it would be better to add more changes on top of them.
>
> Results with 4.8.0-rc2 are attached. OOM happened rather quickly.

4.8.0-rc2 is not "linux-next". What Michal meant is the linux-next git 
(there's no tarball on kernel.org for it):
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git

> Ralf-Peter

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-17  9:23                                             ` Vlastimil Babka
@ 2016-08-17  9:28                                               ` Ralf-Peter Rohbeck
  2016-08-17  9:33                                                 ` Michal Hocko
  0 siblings, 1 reply; 50+ messages in thread
From: Ralf-Peter Rohbeck @ 2016-08-17  9:28 UTC (permalink / raw)
  To: Vlastimil Babka, Michal Hocko; +Cc: linux-mm

On 17.08.2016 02:23, Vlastimil Babka wrote:
> On 08/17/2016 11:14 AM, Ralf-Peter Rohbeck wrote:
>> On 16.08.2016 00:43, Michal Hocko wrote:
>>> On Tue 16-08-16 09:32:46, Michal Hocko wrote:
>>>> On Mon 15-08-16 11:42:11, Ralf-Peter Rohbeck wrote:
>>>>> This time the OOM killer hit much quicker. No btrfs balance, just 
>>>>> compiling
>>>>> the kernel with the new change did it.
>>>>> Much smaller logs so I'm attaching them.
>>>> Just to clarify. You have added the trace_printk for
>>>> try_to_release_page, right? (after fixing it of course). If yes 
>>>> there is
>>>> no single mention of that path failing which would support Joonsoo's
>>>> theory... Could you try with his patch?
>>> And then it would be great if you could test with the current 
>>> linux-next
>>> tree. Vlastimil has done some changes which might help. But even if 
>>> they
>>> don't then it would be better to add more changes on top of them.
>>
>> Results with 4.8.0-rc2 are attached. OOM happened rather quickly.
>
> 4.8.0-rc2 is not "linux-next". What Michal meant is the linux-next git 
> (there's no tarball on kernel.org for it):
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git

Hmm. I added the linux-next git tree, fetched it, etc., but apparently I
didn't check out the right branch. Do you want next-20160817?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-17  9:28                                               ` Ralf-Peter Rohbeck
@ 2016-08-17  9:33                                                 ` Michal Hocko
  2016-08-17 23:37                                                   ` Ralf-Peter Rohbeck
  0 siblings, 1 reply; 50+ messages in thread
From: Michal Hocko @ 2016-08-17  9:33 UTC (permalink / raw)
  To: Ralf-Peter Rohbeck; +Cc: Vlastimil Babka, linux-mm

On Wed 17-08-16 02:28:35, Ralf-Peter Rohbeck wrote:
> On 17.08.2016 02:23, Vlastimil Babka wrote:
[...]
> > 4.8.0-rc2 is not "linux-next". What Michal meant is the linux-next git
> > (there's no tarball on kernel.org for it):
> > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> 
> Hmm. I added linux-next git, fetched it etc but apparently I didn't check
> out the right branch. Do you want next-20160817?

Yes this one should be OK. It contains Vlastimil's patches.

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-17  9:33                                                 ` Michal Hocko
@ 2016-08-17 23:37                                                   ` Ralf-Peter Rohbeck
  2016-08-18  6:57                                                     ` Vlastimil Babka
  0 siblings, 1 reply; 50+ messages in thread
From: Ralf-Peter Rohbeck @ 2016-08-17 23:37 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Vlastimil Babka, linux-mm

On 17.08.2016 02:33, Michal Hocko wrote:
> On Wed 17-08-16 02:28:35, Ralf-Peter Rohbeck wrote:
>> On 17.08.2016 02:23, Vlastimil Babka wrote:
> [...]
>>> 4.8.0-rc2 is not "linux-next". What Michal meant is the linux-next git
>>> (there's no tarball on kernel.org for it):
>>> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>> Hmm. I added linux-next git, fetched it etc but apparently I didn't check
>> out the right branch. Do you want next-20160817?
> Yes this one should be OK. It contains Vlastimil's patches.
>
> Thanks!

This has been working so far. I built a kernel successfully with dd
writing to two drives. There were a number of messages in the trace pipe,
but it seems compaction/migration always succeeded.
I'll run the big torture test overnight.

Ralf-Peter

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-17 23:37                                                   ` Ralf-Peter Rohbeck
@ 2016-08-18  6:57                                                     ` Vlastimil Babka
  2016-08-18 20:01                                                       ` Ralf-Peter Rohbeck
  0 siblings, 1 reply; 50+ messages in thread
From: Vlastimil Babka @ 2016-08-18  6:57 UTC (permalink / raw)
  To: Ralf-Peter Rohbeck, Michal Hocko; +Cc: linux-mm

On 08/18/2016 01:37 AM, Ralf-Peter Rohbeck wrote:
> On 17.08.2016 02:33, Michal Hocko wrote:
>> On Wed 17-08-16 02:28:35, Ralf-Peter Rohbeck wrote:
>>> On 17.08.2016 02:23, Vlastimil Babka wrote:
>> [...]
>>>> 4.8.0-rc2 is not "linux-next". What Michal meant is the linux-next git
>>>> (there's no tarball on kernel.org for it):
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>>> Hmm. I added linux-next git, fetched it etc but apparently I didn't check
>>> out the right branch. Do you want next-20160817?
>> Yes this one should be OK. It contains Vlastimil's patches.
>>
>> Thanks!
>
> This has been working so far. I built a kernel successfully, with dd
> writing to two drives. There were a number of messages in the trace pipe
> but compaction/migration always succeeded it seems.
> I'll run the big torture test overnight.

Good news, thanks. Did you also apply Joonsoo's suggested removal of 
suitable_migration_target() check, or is this just the linux-next 
version with added trace_printk()/pr_info()?

Vlastimil

> Ralf-Peter

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-18  6:57                                                     ` Vlastimil Babka
@ 2016-08-18 20:01                                                       ` Ralf-Peter Rohbeck
  2016-08-18 20:12                                                         ` Vlastimil Babka
  0 siblings, 1 reply; 50+ messages in thread
From: Ralf-Peter Rohbeck @ 2016-08-18 20:01 UTC (permalink / raw)
  To: Vlastimil Babka, Michal Hocko; +Cc: linux-mm

On 17.08.2016 23:57, Vlastimil Babka wrote:
>>>> Hmm. I added linux-next git, fetched it etc but apparently I didn't check
>>>> out the right branch. Do you want next-20160817?
>>> Yes this one should be OK. It contains Vlastimil's patches.
>>>
>>> Thanks!
>> This has been working so far. I built a kernel successfully, with dd
>> writing to two drives. There were a number of messages in the trace pipe
>> but compaction/migration always succeeded it seems.
>> I'll run the big torture test overnight.
> Good news, thanks. Did you also apply Joonsoo's suggested removal of
> suitable_migration_target() check, or is this just the linux-next
> version with added trace_printk()/pr_info()?
>
> Vlastimil
Yes, that change was in my test with linux-next-20160817. Here's the diff:

diff --git a/mm/compaction.c b/mm/compaction.c
index f94ae67..60a9ca2 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1083,8 +1083,10 @@ static void isolate_freepages(struct compact_control *cc)
                         continue;

                 /* Check the block is suitable for migration */
+/*
                 if (!suitable_migration_target(page))
                         continue;
+*/

                 /* If isolation recently failed, do not retry */
                 if (!isolation_suitable(cc, page))
diff --git a/mm/migrate.c b/mm/migrate.c
index f7ee04a..b1176a4 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -827,8 +827,10 @@ static int fallback_migrate_page(struct address_space *mapping,
          * We must have no buffers or drop them.
          */
         if (page_has_private(page) &&
-           !try_to_release_page(page, GFP_KERNEL))
+           !try_to_release_page(page, GFP_KERNEL)) {
+               trace_printk("try_to_release_page failed for a_ops:%pS\n", page->mapping->a_ops);
                 return -EAGAIN;
+       }

         return migrate_page(mapping, newpage, page, mode);
  }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5637733..b443652 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3202,8 +3202,12 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
          * But do not retry if the given zonelist is not suitable for
          * compaction.
          */
-       if (compaction_withdrawn(compact_result))
-               return compaction_zonelist_suitable(ac, order, alloc_flags);
+       if (compaction_withdrawn(compact_result)) {
+               int ret = compaction_zonelist_suitable(ac, order, alloc_flags);
+               if (!ret)
+                       pr_info("XXX: no zone suitable for compaction\n");
+               return ret;
+       }

         /*
          * !costly requests are much more important than __GFP_REPEAT
@@ -3227,6 +3231,7 @@ check_priority:
                 (*compact_priority)--;
                 return true;
         }
+       pr_info("XXX: compaction retries fail after %d\n", compaction_retries);
         return false;
  }
  #else

It ran the whole night with continuous torture tests while writing to two
drives. No OOM.
Logs are at 
https://filebin.net/l2kp3iit8dj0fq6q/OOM_4.8.0-next-20160817.tar.bz2.

Thanks for fixing this!
Ralf-Peter

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-18 20:01                                                       ` Ralf-Peter Rohbeck
@ 2016-08-18 20:12                                                         ` Vlastimil Babka
  2016-08-19  2:42                                                           ` Ralf-Peter Rohbeck
  0 siblings, 1 reply; 50+ messages in thread
From: Vlastimil Babka @ 2016-08-18 20:12 UTC (permalink / raw)
  To: Ralf-Peter Rohbeck, Michal Hocko; +Cc: linux-mm

On 18.8.2016 22:01, Ralf-Peter Rohbeck wrote:
> On 17.08.2016 23:57, Vlastimil Babka wrote:
>>>>> Hmm. I added linux-next git, fetched it etc but apparently I didn't check
>>>>> out the right branch. Do you want next-20160817?
>>>> Yes this one should be OK. It contains Vlastimil's patches.
>>>>
>>>> Thanks!
>>> This has been working so far. I built a kernel successfully, with dd
>>> writing to two drives. There were a number of messages in the trace pipe
>>> but compaction/migration always succeeded it seems.
>>> I'll run the big torture test overnight.
>> Good news, thanks. Did you also apply Joonsoo's suggested removal of
>> suitable_migration_target() check, or is this just the linux-next
>> version with added trace_printk()/pr_info()?
>>
>> Vlastimil
> Yes, that change was in my test with linux-next-20160817. Here's the diff:
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index f94ae67..60a9ca2 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1083,8 +1083,10 @@ static void isolate_freepages(struct compact_control *cc)
>                          continue;
> 
>                  /* Check the block is suitable for migration */
> +/*
>                  if (!suitable_migration_target(page))
>                          continue;
> +*/

OK, could you please also check whether it still runs without OOM with the
above lines uncommented? Or just test plain linux-next-20160817; I guess we
don't need the printks to test this difference.

Thanks a lot!
Vlastimil

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-18 20:12                                                         ` Vlastimil Babka
@ 2016-08-19  2:42                                                           ` Ralf-Peter Rohbeck
  2016-08-19  6:27                                                             ` Vlastimil Babka
  0 siblings, 1 reply; 50+ messages in thread
From: Ralf-Peter Rohbeck @ 2016-08-19  2:42 UTC (permalink / raw)
  To: Vlastimil Babka, Michal Hocko; +Cc: linux-mm

[-- Attachment #1: Type: text/plain, Size: 2365 bytes --]

On 18.08.2016 13:12, Vlastimil Babka wrote:
> On 18.8.2016 22:01, Ralf-Peter Rohbeck wrote:
>> On 17.08.2016 23:57, Vlastimil Babka wrote:
>>>>>> Hmm. I added linux-next git, fetched it etc but apparently I didn't check
>>>>>> out the right branch. Do you want next-20160817?
>>>>> Yes this one should be OK. It contains Vlastimil's patches.
>>>>>
>>>>> Thanks!
>>>> This has been working so far. I built a kernel successfully, with dd
>>>> writing to two drives. There were a number of messages in the trace pipe
>>>> but compaction/migration always succeeded it seems.
>>>> I'll run the big torture test overnight.
>>> Good news, thanks. Did you also apply Joonsoo's suggested removal of
>>> suitable_migration_target() check, or is this just the linux-next
>>> version with added trace_printk()/pr_info()?
>>>
>>> Vlastimil
>> Yes, that change was in my test with linux-next-20160817. Here's the diff:
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index f94ae67..60a9ca2 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -1083,8 +1083,10 @@ static void isolate_freepages(struct compact_control *cc)
>>                           continue;
>>
>>                   /* Check the block is suitable for migration */
>> +/*
>>                   if (!suitable_migration_target(page))
>>                           continue;
>> +*/
> OK, could you please also try if uncommenting the above still works without OOM?
> Or just plain linux-next-20160817, I guess we don't need the printk's to test
> this difference.
>
> Thanks a lot!
> Vlastimil
>
With the two lines back in I had OOMs again. See the attached logs.


Ralf-Peter


[-- Attachment #2: OOM_4.8.0-next-20160817_p2.tar.bz2 --]
[-- Type: application/x-bzip, Size: 59669 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-19  2:42                                                           ` Ralf-Peter Rohbeck
@ 2016-08-19  6:27                                                             ` Vlastimil Babka
  2016-08-19  7:33                                                               ` Michal Hocko
  2016-08-23  5:02                                                               ` Joonsoo Kim
  0 siblings, 2 replies; 50+ messages in thread
From: Vlastimil Babka @ 2016-08-19  6:27 UTC (permalink / raw)
  To: Ralf-Peter Rohbeck, Michal Hocko; +Cc: linux-mm

On 08/19/2016 04:42 AM, Ralf-Peter Rohbeck wrote:
> On 18.08.2016 13:12, Vlastimil Babka wrote:
>> On 18.8.2016 22:01, Ralf-Peter Rohbeck wrote:
>>> On 17.08.2016 23:57, Vlastimil Babka wrote:
>>>> Vlastimil
>>> Yes, that change was in my test with linux-next-20160817. Here's the diff:
>>>
>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>> index f94ae67..60a9ca2 100644
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -1083,8 +1083,10 @@ static void isolate_freepages(struct compact_control *cc)
>>>                           continue;
>>>
>>>                   /* Check the block is suitable for migration */
>>> +/*
>>>                   if (!suitable_migration_target(page))
>>>                           continue;
>>> +*/
>> OK, could you please also try if uncommenting the above still works without OOM?
>> Or just plain linux-next-20160817, I guess we don't need the printk's to test
>> this difference.
>>
>> Thanks a lot!
>> Vlastimil
>>
> With the two lines back in I had OOMs again. See the attached logs.

Thanks for the confirmation.

We shouldn't disable the heuristic completely, however, so here's a compromise
patch hooking into the new compaction priorities. Could you please test it on
top of linux-next?

-----8<-----

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-19  6:27                                                             ` Vlastimil Babka
@ 2016-08-19  7:33                                                               ` Michal Hocko
  2016-08-19  7:47                                                                 ` Vlastimil Babka
  2016-08-23  5:02                                                               ` Joonsoo Kim
  1 sibling, 1 reply; 50+ messages in thread
From: Michal Hocko @ 2016-08-19  7:33 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: Ralf-Peter Rohbeck, linux-mm

On Fri 19-08-16 08:27:34, Vlastimil Babka wrote:
> On 08/19/2016 04:42 AM, Ralf-Peter Rohbeck wrote:
> > On 18.08.2016 13:12, Vlastimil Babka wrote:
> >> On 18.8.2016 22:01, Ralf-Peter Rohbeck wrote:
> >>> On 17.08.2016 23:57, Vlastimil Babka wrote:
> >>>> Vlastimil
> >>> Yes, that change was in my test with linux-next-20160817. Here's the diff:
> >>>
> >>> diff --git a/mm/compaction.c b/mm/compaction.c
> >>> index f94ae67..60a9ca2 100644
> >>> --- a/mm/compaction.c
> >>> +++ b/mm/compaction.c
> >>> @@ -1083,8 +1083,10 @@ static void isolate_freepages(struct compact_control *cc)
> >>>                           continue;
> >>>
> >>>                   /* Check the block is suitable for migration */
> >>> +/*
> >>>                   if (!suitable_migration_target(page))
> >>>                           continue;
> >>> +*/
> >> OK, could you please also try if uncommenting the above still works without OOM?
> >> Or just plain linux-next-20160817, I guess we don't need the printk's to test
> >> this difference.
> >>
> >> Thanks a lot!
> >> Vlastimil
> >>
> > With the two lines back in I had OOMs again. See the attached logs.
> 
> Thanks for the confirmation.
> 
> We however shouldn't disable the heuristic completely, so here's a compromise
> patch hooking into the new compaction priorities. Can you please test on top of
> linux-next?
> 
> -----8<-----
> >From 0927cc2a4c6a3247111168eace9012c23d06f9db Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Thu, 18 Aug 2016 16:01:14 +0200
> Subject: [PATCH] mm, compaction: make full priority ignore pageblock
>  suitability
> 
> Ralf-Peter Rohbeck has reported premature OOMs for order-2 allocations (stack)
> due to the OOM rework in 4.7. In his scenario (parallel kernel build and dd
> writing to two drives) many pageblocks get marked as Unmovable and the
> compaction free scanner struggles to isolate free pages. Joonsoo Kim pointed
> out that the free scanner skips pageblocks that are not movable to prevent
> filling them and forcing non-movable allocations to fall back to other
> pageblocks. Such a heuristic makes sense to help prevent long-term
> fragmentation, but premature OOMs are a relatively more urgent problem. As a
> compromise, this patch disables the heuristic only for the ultimate
> compaction priority.
> 
> Reported-by: Ralf-Peter Rohbeck <Ralf-Peter.Rohbeck@quantum.com>
> Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Thanks to both of you! I do agree that we should drop all these
heuristics when we struggle and there is an OOM risk. I have just a
small nit here. I would prefer
s@COMPACT_PRIO_SYNC_FULL@MIN_COMPACT_PRIORITY@ when disabling them
because this would be easier to follow and it would be easier for future
changes. Which brings me to another thing I was suggesting earlier. I
believe we should go to this MIN_COMPACT_PRIORITY only for !costly
requests, because costly orders shouldn't get all those exceptions and
risk long-term fragmentation issues. We do not have that many costly
requests (except for hugetlb), so it doesn't matter all that much right
now, but in the long term I believe we want to differentiate them.

That being said, let's wait for the feedback on this patch + linux-next.
If it works out, I will send a stable 4.7 patch which drops the compaction
feedback from should_compact_retry (turning it into the !COMPACTION version)
so that 4.7 users do not suffer from premature OOMs, and I will ask
Andrew to sneak the compaction patches into 4.8, as they fix a real issue
and the risk is not really high.

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/compaction.c | 11 ++++++++---
>  mm/internal.h   |  1 +
>  2 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 0bba270f97ad..884b1baa58df 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -997,8 +997,12 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
>  #ifdef CONFIG_COMPACTION
>  
>  /* Returns true if the page is within a block suitable for migration to */
> -static bool suitable_migration_target(struct page *page)
> +static bool suitable_migration_target(struct compact_control *cc,
> +							struct page *page)
>  {
> +	if (cc->ignore_block_suitable)
> +		return true;
> +
>  	/* If the page is a large free page, then disallow migration */
>  	if (PageBuddy(page)) {
>  		/*
> @@ -1083,7 +1087,7 @@ static void isolate_freepages(struct compact_control *cc)
>  			continue;
>  
>  		/* Check the block is suitable for migration */
> -		if (!suitable_migration_target(page))
> +		if (!suitable_migration_target(cc, page))
>  			continue;
>  
>  		/* If isolation recently failed, do not retry */
> @@ -1656,7 +1660,8 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
>  		.classzone_idx = classzone_idx,
>  		.direct_compaction = true,
>  		.whole_zone = (prio == COMPACT_PRIO_SYNC_FULL),
> -		.ignore_skip_hint = (prio == COMPACT_PRIO_SYNC_FULL)
> +		.ignore_skip_hint = (prio == COMPACT_PRIO_SYNC_FULL),
> +		.ignore_block_suitable = (prio == COMPACT_PRIO_SYNC_FULL)
>  	};
>  	INIT_LIST_HEAD(&cc.freepages);
>  	INIT_LIST_HEAD(&cc.migratepages);
> diff --git a/mm/internal.h b/mm/internal.h
> index 5214bf8e3171..537ac9951f5f 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -178,6 +178,7 @@ struct compact_control {
>  	unsigned long last_migrated_pfn;/* Not yet flushed page being freed */
>  	enum migrate_mode mode;		/* Async or sync migration mode */
>  	bool ignore_skip_hint;		/* Scan blocks even if marked skip */
> +	bool ignore_block_suitable;	/* Scan blocks considered unsuitable */
>  	bool direct_compaction;		/* False from kcompactd or /proc/... */
>  	bool whole_zone;		/* Whole zone should/has been scanned */
>  	int order;			/* order a direct compactor needs */
> -- 
> 2.9.2
> 
> 
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-19  7:33                                                               ` Michal Hocko
@ 2016-08-19  7:47                                                                 ` Vlastimil Babka
  2016-08-19  8:26                                                                   ` Michal Hocko
  0 siblings, 1 reply; 50+ messages in thread
From: Vlastimil Babka @ 2016-08-19  7:47 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton; +Cc: Ralf-Peter Rohbeck, linux-mm, Joonsoo Kim

On 08/19/2016 09:33 AM, Michal Hocko wrote:
> On Fri 19-08-16 08:27:34, Vlastimil Babka wrote:
>> On 08/19/2016 04:42 AM, Ralf-Peter Rohbeck wrote:
>>> On 18.08.2016 13:12, Vlastimil Babka wrote:
>>>> On 18.8.2016 22:01, Ralf-Peter Rohbeck wrote:
>>>>> On 17.08.2016 23:57, Vlastimil Babka wrote:
>>>>>> Vlastimil
>>>>> Yes, that change was in my test with linux-next-20160817. Here's the diff:
>>>>>
>>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>>> index f94ae67..60a9ca2 100644
>>>>> --- a/mm/compaction.c
>>>>> +++ b/mm/compaction.c
>>>>> @@ -1083,8 +1083,10 @@ static void isolate_freepages(struct compact_control *cc)
>>>>>                           continue;
>>>>>
>>>>>                   /* Check the block is suitable for migration */
>>>>> +/*
>>>>>                   if (!suitable_migration_target(page))
>>>>>                           continue;
>>>>> +*/
>>>> OK, could you please also try if uncommenting the above still works without OOM?
>>>> Or just plain linux-next-20160817, I guess we don't need the printk's to test
>>>> this difference.
>>>>
>>>> Thanks a lot!
>>>> Vlastimil
>>>>
>>> With the two lines back in I had OOMs again. See the attached logs.
>>
>> Thanks for the confirmation.
>>
>> We however shouldn't disable the heuristic completely, so here's a compromise
>> patch hooking into the new compaction priorities. Can you please test on top of
>> linux-next?
>>
>> -----8<-----
>> >From 0927cc2a4c6a3247111168eace9012c23d06f9db Mon Sep 17 00:00:00 2001
>> From: Vlastimil Babka <vbabka@suse.cz>
>> Date: Thu, 18 Aug 2016 16:01:14 +0200
>> Subject: [PATCH] mm, compaction: make full priority ignore pageblock
>>  suitability
>>
>> Ralf-Peter Rohbeck has reported premature OOMs for order-2 allocations (stack)
>> due to the OOM rework in 4.7. In his scenario (parallel kernel build and dd
>> writing to two drives) many pageblocks get marked as Unmovable and the
>> compaction free scanner struggles to isolate free pages. Joonsoo Kim pointed
>> out that the free scanner skips pageblocks that are not movable to prevent
>> filling them and forcing non-movable allocations to fall back to other
>> pageblocks. Such a heuristic makes sense to help prevent long-term
>> fragmentation, but premature OOMs are a relatively more urgent problem. As a
>> compromise, this patch disables the heuristic only for the ultimate
>> compaction priority.
>>
>> Reported-by: Ralf-Peter Rohbeck <Ralf-Peter.Rohbeck@quantum.com>
>> Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> 
> Thanks to both of you! I do agree that we should drop all these
> heuristics when we struggle and there is an OOM risk. I have just a
> small nit here. I would prefer
> s@COMPACT_PRIO_SYNC_FULL@MIN_COMPACT_PRIORITY@ when disabling them
> because this would be easier to follow and it would be easier for future
> changes.

OK, but then we should start with a change to
mm-compaction-add-the-ultimate-direct-compaction-priority.patch
(fix at the end of this e-mail) to make things consistent.
Then I will apply that to the new patch if it's successfully tested.

> Which brings me to another thing I was suggesting earlier. I
> believe we should go to this MIN_COMPACT_PRIORITY only for !costly
> requests because costly orders shouldn't get all those exceptions and
> risk long term fragmentation issues. We do not have that many costly
> requests (except for hugetlb) so it doesn't matter all that much right
> now but long term we want to differentiate those I believe.

I'll send such a change afterwards as well.

> That being said, let's wait for the feedback on this patch + linux-next.
> If it works out I will send a stable 4.7 patch which drops compaction
> feedback from should_compact_retry (turn it to the !COMPACTION version)
> so that 4.7 users do not suffer from the premature OOM and will ask
> Andrew to sneak the compaction patches to 4.8 as they fix a real issue
> and the risk is not really high.

Agreed.

> Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

-----8<-----

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-19  7:47                                                                 ` Vlastimil Babka
@ 2016-08-19  8:26                                                                   ` Michal Hocko
  2016-08-24 18:13                                                                     ` Ralf-Peter Rohbeck
  0 siblings, 1 reply; 50+ messages in thread
From: Michal Hocko @ 2016-08-19  8:26 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: Andrew Morton, Ralf-Peter Rohbeck, linux-mm, Joonsoo Kim

On Fri 19-08-16 09:47:59, Vlastimil Babka wrote:
> On 08/19/2016 09:33 AM, Michal Hocko wrote:
> > On Fri 19-08-16 08:27:34, Vlastimil Babka wrote:
> >> On 08/19/2016 04:42 AM, Ralf-Peter Rohbeck wrote:
> >>> On 18.08.2016 13:12, Vlastimil Babka wrote:
> >>>> On 18.8.2016 22:01, Ralf-Peter Rohbeck wrote:
> >>>>> On 17.08.2016 23:57, Vlastimil Babka wrote:
> >>>>>> Vlastimil
> >>>>> Yes, that change was in my test with linux-next-20160817. Here's the diff:
> >>>>>
> >>>>> diff --git a/mm/compaction.c b/mm/compaction.c
> >>>>> index f94ae67..60a9ca2 100644
> >>>>> --- a/mm/compaction.c
> >>>>> +++ b/mm/compaction.c
> >>>>> @@ -1083,8 +1083,10 @@ static void isolate_freepages(struct compact_control *cc)
> >>>>>                           continue;
> >>>>>
> >>>>>                   /* Check the block is suitable for migration */
> >>>>> +/*
> >>>>>                   if (!suitable_migration_target(page))
> >>>>>                           continue;
> >>>>> +*/
> >>>> OK, could you please also try if uncommenting the above still works without OOM?
> >>>> Or just plain linux-next-20160817, I guess we don't need the printk's to test
> >>>> this difference.
> >>>>
> >>>> Thanks a lot!
> >>>> Vlastimil
> >>>>
> >>> With the two lines back in I had OOMs again. See the attached logs.
> >>
> >> Thanks for the confirmation.
> >>
> >> We however shouldn't disable the heuristic completely, so here's a compromise
> >> patch hooking into the new compaction priorities. Can you please test on top of
> >> linux-next?
> >>
> >> -----8<-----
> >> >From 0927cc2a4c6a3247111168eace9012c23d06f9db Mon Sep 17 00:00:00 2001
> >> From: Vlastimil Babka <vbabka@suse.cz>
> >> Date: Thu, 18 Aug 2016 16:01:14 +0200
> >> Subject: [PATCH] mm, compaction: make full priority ignore pageblock
> >>  suitability
> >>
> >> Ralf-Peter Rohbeck has reported premature OOMs for order-2 allocations (stack)
> >> due to the OOM rework in 4.7. In his scenario (parallel kernel build and dd
> >> writing to two drives) many pageblocks get marked as Unmovable and the
> >> compaction free scanner struggles to isolate free pages. Joonsoo Kim pointed
> >> out that the free scanner skips pageblocks that are not movable to prevent
> >> filling them and forcing non-movable allocations to fall back to other
> >> pageblocks. Such a heuristic makes sense to help prevent long-term
> >> fragmentation, but premature OOMs are a relatively more urgent problem. As a
> >> compromise, this patch disables the heuristic only for the ultimate
> >> compaction priority.
> >>
> >> Reported-by: Ralf-Peter Rohbeck <Ralf-Peter.Rohbeck@quantum.com>
> >> Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> > 
> > Thanks to both of you! I do agree that we should drop all these
> > heuristics when we struggle and there is an OOM risk. I have just a
> > small nit here. I would prefer
> > s@COMPACT_PRIO_SYNC_FULL@MIN_COMPACT_PRIORITY@ when disabling them
> > because this would be easier to follow and it would be easier for future
> > changes.
> 
> OK, but then we should start with a change to
> mm-compaction-add-the-ultimate-direct-compaction-priority.patch
> (fix at the end of this e-mail) to make things consistent.
> Then I will apply that to the new patch if it's successfully tested.

This can go in as a separate cleanup patch. There is no need to alter the
previous patches sitting in mmotm.
 
> > Which brings me to another thing I was suggesting earlier. I
> > believe we should go to this MIN_COMPACT_PRIORITY only for !costly
> > requests because costly orders shouldn't get all those exceptions and
> > risk long term fragmentation issues. We do not have that many costly
> > requests (except for hugetlb) so it doesn't matter all that much right
> > now but long term we want to differentiate those I believe.
> 
> I'll send such a change afterwards as well.

Thanks!

> > That being said, let's wait for the feedback on this patch + linux-next.
> > If it works out I will send a stable 4.7 patch which drops compaction
> > feedback from should_compact_retry (turn it to the !COMPACTION version)
> > so that 4.7 users do not suffer from the premature OOM and will ask
> > Andrew to sneak the compaction patches to 4.8 as they fix a real issue
> > and the risk is not really high.
> 
> Agreed.
> 
> > Acked-by: Michal Hocko <mhocko@suse.com>
> 
> Thanks!
> 
> -----8<-----
> >From c4da7022e85e52f5463055cdc474656652e7a504 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Fri, 19 Aug 2016 09:40:31 +0200
> Subject: [PATCH] mm, compaction: add the ultimate direct compaction
>  priority-fix
> 
> Use the MIN_COMPACT_PRIORITY alias instead of COMPACT_PRIO_SYNC_FULL to
> disable heuristics "because this would be easier to follow and it would be
> easier for future changes", per Michal.
> 
> Suggested-by: Michal Hocko <mhocko@suse.cz>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Fixes: mmotm mm-compaction-add-the-ultimate-direct-compaction-priority.patch

I guess the Fixes tag is a bit misleading. This is not a bug fix; it is a
cleanup patch.

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
>  mm/compaction.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index ae4f40afcca1..3e35fce2cace 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1644,8 +1644,8 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
>  		.alloc_flags = alloc_flags,
>  		.classzone_idx = classzone_idx,
>  		.direct_compaction = true,
> -		.whole_zone = (prio == COMPACT_PRIO_SYNC_FULL),
> -		.ignore_skip_hint = (prio == COMPACT_PRIO_SYNC_FULL)
> +		.whole_zone = (prio == MIN_COMPACT_PRIORITY),
> +		.ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY)
>  	};
>  	INIT_LIST_HEAD(&cc.freepages);
>  	INIT_LIST_HEAD(&cc.migratepages);
> @@ -1691,7 +1691,7 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
>  								ac->nodemask) {
>  		enum compact_result status;
>  
> -		if (prio > COMPACT_PRIO_SYNC_FULL
> +		if (prio > MIN_COMPACT_PRIORITY
>  					&& compaction_deferred(zone, order)) {
>  			rc = max_t(enum compact_result, COMPACT_DEFERRED, rc);
>  			continue;
> -- 
> 2.9.2
> 
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-19  6:27                                                             ` Vlastimil Babka
  2016-08-19  7:33                                                               ` Michal Hocko
@ 2016-08-23  5:02                                                               ` Joonsoo Kim
  2016-08-23  7:45                                                                 ` Michal Hocko
  1 sibling, 1 reply; 50+ messages in thread
From: Joonsoo Kim @ 2016-08-23  5:02 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: Ralf-Peter Rohbeck, Michal Hocko, linux-mm

On Fri, Aug 19, 2016 at 08:27:34AM +0200, Vlastimil Babka wrote:
> On 08/19/2016 04:42 AM, Ralf-Peter Rohbeck wrote:
> > On 18.08.2016 13:12, Vlastimil Babka wrote:
> >> On 18.8.2016 22:01, Ralf-Peter Rohbeck wrote:
> >>> On 17.08.2016 23:57, Vlastimil Babka wrote:
> >>>> Vlastimil
> >>> Yes, that change was in my test with linux-next-20160817. Here's the diff:
> >>>
> >>> diff --git a/mm/compaction.c b/mm/compaction.c
> >>> index f94ae67..60a9ca2 100644
> >>> --- a/mm/compaction.c
> >>> +++ b/mm/compaction.c
> >>> @@ -1083,8 +1083,10 @@ static void isolate_freepages(struct compact_control *cc)
> >>>                           continue;
> >>>
> >>>                   /* Check the block is suitable for migration */
> >>> +/*
> >>>                   if (!suitable_migration_target(page))
> >>>                           continue;
> >>> +*/
> >> OK, could you please also try if uncommenting the above still works without OOM?
> >> Or just plain linux-next-20160817, I guess we don't need the printk's to test
> >> this difference.
> >>
> >> Thanks a lot!
> >> Vlastimil
> >>
> > With the two lines back in I had OOMs again. See the attached logs.
> 
> Thanks for the confirmation.
> 
> We however shouldn't disable the heuristic completely, so here's a compromise
> patch hooking into the new compaction priorities. Can you please test on top of
> linux-next?
> 
> -----8<-----
> >From 0927cc2a4c6a3247111168eace9012c23d06f9db Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Thu, 18 Aug 2016 16:01:14 +0200
> Subject: [PATCH] mm, compaction: make full priority ignore pageblock
>  suitability
> 
> Ralf-Peter Rohbeck has reported premature OOMs for order-2 allocations (stack)
> due to the OOM rework in 4.7. In his scenario (parallel kernel build and dd
> writing to two drives) many pageblocks get marked as Unmovable and the
> compaction free scanner struggles to isolate free pages. Joonsoo Kim pointed
> out that the free scanner skips pageblocks that are not movable to prevent
> filling them and forcing non-movable allocations to fall back to other
> pageblocks. Such a heuristic makes sense to help prevent long-term
> fragmentation, but premature OOMs are a relatively more urgent problem. As a
> compromise, this patch disables the heuristic only for the ultimate
> compaction priority.
> 
> Reported-by: Ralf-Peter Rohbeck <Ralf-Peter.Rohbeck@quantum.com>
> Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/compaction.c | 11 ++++++++---
>  mm/internal.h   |  1 +
>  2 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 0bba270f97ad..884b1baa58df 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -997,8 +997,12 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
>  #ifdef CONFIG_COMPACTION
>  
>  /* Returns true if the page is within a block suitable for migration to */
> -static bool suitable_migration_target(struct page *page)
> +static bool suitable_migration_target(struct compact_control *cc,
> +							struct page *page)
>  {
> +	if (cc->ignore_block_suitable)
> +		return true;
> +
>  	/* If the page is a large free page, then disallow migration */
>  	if (PageBuddy(page)) {
>  		/*
> @@ -1083,7 +1087,7 @@ static void isolate_freepages(struct compact_control *cc)
>  			continue;
>  
>  		/* Check the block is suitable for migration */
> -		if (!suitable_migration_target(page))
> +		if (!suitable_migration_target(cc, page))
>  			continue;
>  
>  		/* If isolation recently failed, do not retry */
> @@ -1656,7 +1660,8 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
>  		.classzone_idx = classzone_idx,
>  		.direct_compaction = true,
>  		.whole_zone = (prio == COMPACT_PRIO_SYNC_FULL),
> -		.ignore_skip_hint = (prio == COMPACT_PRIO_SYNC_FULL)
> +		.ignore_skip_hint = (prio == COMPACT_PRIO_SYNC_FULL),
> +		.ignore_block_suitable = (prio == COMPACT_PRIO_SYNC_FULL)

A year ago, I tested allowing the free scanner to use unmovable/reclaimable
pageblocks in a very limited situation and found that it caused long-term
fragmentation. This solution is less strict than mine, so I expect it will
cause long-term fragmentation as well. I agree that allocation success is
even more important, but it's better to avoid long-term fragmentation as
much as possible. So, my suggestion is...

How about introducing one more priority (the last one) that allows scanning
unmovable/reclaimable pageblocks? As long as we don't reach that priority,
long-term fragmentation can be avoided.

Thanks.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-23  5:02                                                               ` Joonsoo Kim
@ 2016-08-23  7:45                                                                 ` Michal Hocko
  0 siblings, 0 replies; 50+ messages in thread
From: Michal Hocko @ 2016-08-23  7:45 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: Vlastimil Babka, Ralf-Peter Rohbeck, linux-mm

On Tue 23-08-16 14:02:52, Joonsoo Kim wrote:
[...]
> How about introducing one more priority (the last one) that allows scanning
> unmovable/reclaimable pageblocks? As long as we don't reach that priority,
> long-term fragmentation can be avoided.

I have already suggested that. We would reach that priority only for
!costly orders. Vlastimil already has plans to cook up a patch for that
but he is on vacation...
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: OOM killer changes
  2016-08-19  8:26                                                                   ` Michal Hocko
@ 2016-08-24 18:13                                                                     ` Ralf-Peter Rohbeck
  2016-08-25  7:22                                                                       ` Michal Hocko
  0 siblings, 1 reply; 50+ messages in thread
From: Ralf-Peter Rohbeck @ 2016-08-24 18:13 UTC (permalink / raw)
  To: Michal Hocko, Vlastimil Babka; +Cc: Andrew Morton, linux-mm, Joonsoo Kim

[-- Attachment #1: Type: text/plain, Size: 3303 bytes --]

On 19.08.2016 01:26, Michal Hocko wrote:
>
>>> That being said, let's wait for the feedback on this patch + linux-next.
>>> If it works out I will send a stable 4.7 patch which drops compaction
>>> feedback from should_compact_retry (turn it to the !COMPACTION version)
>>> so that 4.7 users do not suffer from the premature OOM and will ask
>>> Andrew to sneak the compaction patches to 4.8 as they fix a real issue
>>> and the risk is not really high.
>> Agreed.
>>
>>> Acked-by: Michal Hocko <mhocko@suse.com>
>> Thanks!
>>
>> -----8<-----
>> >From c4da7022e85e52f5463055cdc474656652e7a504 Mon Sep 17 00:00:00 2001
>> From: Vlastimil Babka <vbabka@suse.cz>
>> Date: Fri, 19 Aug 2016 09:40:31 +0200
>> Subject: [PATCH] mm, compaction: add the ultimate direct compaction
>>   priority-fix
>>
>> Use the MIN_COMPACT_PRIORITY alias instead of COMPACT_PRIO_SYNC_FULL to
>> disable heuristics "because this would be easier to follow and it would be
>> easier for future changes", per Michal.
>>
>> Suggested-by: Michal Hocko <mhocko@suse.cz>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> Fixes: mmotm mm-compaction-add-the-ultimate-direct-compaction-priority.patch
> I guess Fixes is a bit misleading. This is not a bug it is a cleanup
> patch.
>
> Acked-by: Michal Hocko <mhocko@suse.com>
>
> Thanks!
>
>> ---
>>   mm/compaction.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index ae4f40afcca1..3e35fce2cace 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -1644,8 +1644,8 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
>>   		.alloc_flags = alloc_flags,
>>   		.classzone_idx = classzone_idx,
>>   		.direct_compaction = true,
>> -		.whole_zone = (prio == COMPACT_PRIO_SYNC_FULL),
>> -		.ignore_skip_hint = (prio == COMPACT_PRIO_SYNC_FULL)
>> +		.whole_zone = (prio == MIN_COMPACT_PRIORITY),
>> +		.ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY)
>>   	};
>>   	INIT_LIST_HEAD(&cc.freepages);
>>   	INIT_LIST_HEAD(&cc.migratepages);
>> @@ -1691,7 +1691,7 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
>>   								ac->nodemask) {
>>   		enum compact_result status;
>>   
>> -		if (prio > COMPACT_PRIO_SYNC_FULL
>> +		if (prio > MIN_COMPACT_PRIORITY
>>   					&& compaction_deferred(zone, order)) {
>>   			rc = max_t(enum compact_result, COMPACT_DEFERRED, rc);
>>   			continue;
>> -- 
>> 2.9.2
>>
>>
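A rough model of the stable 4.7 plan quoted above: should_compact_retry() stops looking at compaction feedback and behaves like its !CONFIG_COMPACTION variant. The sketch assumes that variant keeps retrying only non-costly high-order requests (0 < order <= PAGE_ALLOC_COSTLY_ORDER) while some eligible zone is above its order-0 min watermark. struct zone_sketch and should_compact_retry_sketch() are hypothetical simplifications, not kernel code; check mm/page_alloc.c for the real logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same value as the kernel's PAGE_ALLOC_COSTLY_ORDER. */
#define PAGE_ALLOC_COSTLY_ORDER 3

/* Hypothetical, heavily simplified stand-in for struct zone. */
struct zone_sketch {
	unsigned long free_pages; /* pages currently free in the zone */
	unsigned long min_wmark;  /* order-0 min watermark, in pages */
};

/*
 * Hypothetical model of a should_compact_retry() that ignores compaction
 * feedback: the result of the last compaction attempt is not an input at
 * all. A non-costly high-order request keeps retrying as long as at least
 * one eligible zone is above its order-0 min watermark.
 */
static bool should_compact_retry_sketch(const struct zone_sketch *zones,
					size_t nr_zones, unsigned int order)
{
	assert(zones != NULL || nr_zones == 0);

	/* Order-0 and costly requests never retry through this path. */
	if (order == 0 || order > PAGE_ALLOC_COSTLY_ORDER)
		return false;

	for (size_t i = 0; i < nr_zones; i++) {
		/* Order-0 check only; fragmentation is not considered. */
		if (zones[i].free_pages > zones[i].min_wmark)
			return true;
	}
	return false;
}
```

In this model fragmentation can never make the function return false, so a non-costly request keeps looping in the allocator while order-0 memory is plentiful instead of giving up early.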
This change was in linux-next-20160823 so I ran it unmodified.

I did get an OOM, see attached.


Thanks,

Ralf-Peter


[-- Attachment #2: OOM_4.8.0-rc3-next-20160823+.tar.bz2 --]
[-- Type: application/x-bzip, Size: 1377662 bytes --]

* Re: OOM killer changes
  2016-08-24 18:13                                                                     ` Ralf-Peter Rohbeck
@ 2016-08-25  7:22                                                                       ` Michal Hocko
  2016-08-25 20:35                                                                         ` Ralf-Peter Rohbeck
  0 siblings, 1 reply; 50+ messages in thread
From: Michal Hocko @ 2016-08-25  7:22 UTC
  To: Ralf-Peter Rohbeck; +Cc: Vlastimil Babka, Andrew Morton, linux-mm, Joonsoo Kim

On Wed 24-08-16 11:13:31, Ralf-Peter Rohbeck wrote:
> On 19.08.2016 01:26, Michal Hocko wrote:
[...]
> > > diff --git a/mm/compaction.c b/mm/compaction.c
> > > index ae4f40afcca1..3e35fce2cace 100644
> > > --- a/mm/compaction.c
> > > +++ b/mm/compaction.c
> > > @@ -1644,8 +1644,8 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
> > >   		.alloc_flags = alloc_flags,
> > >   		.classzone_idx = classzone_idx,
> > >   		.direct_compaction = true,
> > > -		.whole_zone = (prio == COMPACT_PRIO_SYNC_FULL),
> > > -		.ignore_skip_hint = (prio == COMPACT_PRIO_SYNC_FULL)
> > > +		.whole_zone = (prio == MIN_COMPACT_PRIORITY),
> > > +		.ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY)
> > >   	};
> > >   	INIT_LIST_HEAD(&cc.freepages);
> > >   	INIT_LIST_HEAD(&cc.migratepages);
> > > @@ -1691,7 +1691,7 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
> > >   								ac->nodemask) {
> > >   		enum compact_result status;
> > > -		if (prio > COMPACT_PRIO_SYNC_FULL
> > > +		if (prio > MIN_COMPACT_PRIORITY
> > >   					&& compaction_deferred(zone, order)) {
> > >   			rc = max_t(enum compact_result, COMPACT_DEFERRED, rc);
> > >   			continue;
> > > -- 
> > > 2.9.2
> > > 
> > > 
> This change was in linux-next-20160823 so I ran it unmodified.
> 
> I did get an OOM, see attached.

This patch shouldn't make any difference compared to the previous patch
you were testing, as it is only a cleanup. Anyway, I do not have the above
linux-next tag, so I cannot check what exactly was there. The current code
in linux-next contains
http://lkml.kernel.org/r/20160823074339.GB23577@dhcp22.suse.cz, which is a
different approach. Once that patch hits Linus' tree, we will try to
resurrect the compaction improvements series in linux-next and continue
with the testing.
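Why the cleanup patch cannot change behaviour, as a small userspace sketch: MIN_COMPACT_PRIORITY is assumed to be an enum alias with the same value as COMPACT_PRIO_SYNC_FULL, so every comparison the patch rewrites evaluates identically. The enum is modeled on the one in the mmotm compaction series (lower value means more aggressive compaction; exact members are an assumption). struct compact_flags and flags_for_prio() are hypothetical stand-ins for the compact_control fields set in compact_zone_order() and the deferral check in try_to_compact_pages().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Modeled on enum compact_priority from the mmotm compaction series
 * (exact members assumed). A lower value means more aggressive compaction.
 */
enum compact_priority {
	COMPACT_PRIO_SYNC_FULL,
	MIN_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_FULL,
	COMPACT_PRIO_SYNC_LIGHT,
	DEF_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
	COMPACT_PRIO_ASYNC,
	INIT_COMPACT_PRIORITY = COMPACT_PRIO_ASYNC,
};

/* The rename is only safe because both names denote the same value. */
static_assert(MIN_COMPACT_PRIORITY == COMPACT_PRIO_SYNC_FULL,
	      "MIN_COMPACT_PRIORITY must alias COMPACT_PRIO_SYNC_FULL");

/* Hypothetical subset of the decisions that depend on the priority. */
struct compact_flags {
	bool whole_zone;       /* scan the whole zone */
	bool ignore_skip_hint; /* ignore cached pageblock skip bits */
	bool honor_deferral;   /* consult compaction_deferred() at all */
};

static struct compact_flags flags_for_prio(enum compact_priority prio)
{
	struct compact_flags f = {
		/* Both heuristics are disabled only at the ultimate priority. */
		.whole_zone = (prio == MIN_COMPACT_PRIORITY),
		.ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY),
		/* Deferral is honored for every less aggressive priority. */
		.honor_deferral = (prio > MIN_COMPACT_PRIORITY),
	};
	return f;
}
```

Naming the alias rather than a concrete mode means a future reordering of the enum only has to move the alias, not every comparison.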
-- 
Michal Hocko
SUSE Labs

* Re: OOM killer changes
  2016-08-25  7:22                                                                       ` Michal Hocko
@ 2016-08-25 20:35                                                                         ` Ralf-Peter Rohbeck
  2016-08-26  8:35                                                                           ` Michal Hocko
  0 siblings, 1 reply; 50+ messages in thread
From: Ralf-Peter Rohbeck @ 2016-08-25 20:35 UTC
  To: Michal Hocko; +Cc: Vlastimil Babka, Andrew Morton, linux-mm, Joonsoo Kim

On 25.08.2016 00:22, Michal Hocko wrote:
> On Wed 24-08-16 11:13:31, Ralf-Peter Rohbeck wrote:
>> On 19.08.2016 01:26, Michal Hocko wrote:
> [...]
>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>> index ae4f40afcca1..3e35fce2cace 100644
>>>> --- a/mm/compaction.c
>>>> +++ b/mm/compaction.c
>>>> @@ -1644,8 +1644,8 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
>>>>    		.alloc_flags = alloc_flags,
>>>>    		.classzone_idx = classzone_idx,
>>>>    		.direct_compaction = true,
>>>> -		.whole_zone = (prio == COMPACT_PRIO_SYNC_FULL),
>>>> -		.ignore_skip_hint = (prio == COMPACT_PRIO_SYNC_FULL)
>>>> +		.whole_zone = (prio == MIN_COMPACT_PRIORITY),
>>>> +		.ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY)
>>>>    	};
>>>>    	INIT_LIST_HEAD(&cc.freepages);
>>>>    	INIT_LIST_HEAD(&cc.migratepages);
>>>> @@ -1691,7 +1691,7 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
>>>>    								ac->nodemask) {
>>>>    		enum compact_result status;
>>>> -		if (prio > COMPACT_PRIO_SYNC_FULL
>>>> +		if (prio > MIN_COMPACT_PRIORITY
>>>>    					&& compaction_deferred(zone, order)) {
>>>>    			rc = max_t(enum compact_result, COMPACT_DEFERRED, rc);
>>>>    			continue;
>>>> -- 
>>>> 2.9.2
>>>>
>>>>
>> This change was in linux-next-20160823 so I ran it unmodified.
>>
>> I did get an OOM, see attached.
> This patch shouldn't make any difference compared to the previous patch
> you were testing, as it is only a cleanup. Anyway, I do not have the above
> linux-next tag, so I cannot check what exactly was there. The current code
> in linux-next contains
> http://lkml.kernel.org/r/20160823074339.GB23577@dhcp22.suse.cz, which is a
> different approach. Once that patch hits Linus' tree, we will try to
> resurrect the compaction improvements series in linux-next and continue
> with the testing.

Sorry, the tag was next-20160823; I called the branch linux-next-20160823.


Ralf-Peter

* Re: OOM killer changes
  2016-08-25 20:35                                                                         ` Ralf-Peter Rohbeck
@ 2016-08-26  8:35                                                                           ` Michal Hocko
  2016-09-06 11:09                                                                             ` Vlastimil Babka
  0 siblings, 1 reply; 50+ messages in thread
From: Michal Hocko @ 2016-08-26  8:35 UTC
  To: Ralf-Peter Rohbeck; +Cc: Vlastimil Babka, Andrew Morton, linux-mm, Joonsoo Kim

On Thu 25-08-16 13:35:04, Ralf-Peter Rohbeck wrote:
[...]
> Sorry, the tag was next-20160823; I called the branch linux-next-20160823.

Yeah, that is the tag I was looking for, but linux-next is quite
volatile, and if you do not fetch a particular tag it won't exist in
later trees. Anyway, I have set up a branch oom-playground in my tree
git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git which
is on top of the current up-to-date mmotm tree + a revert of the quick
workaround which you have already tested (thanks for that!) and with
Vlastimil's patch which was dropped due to the workaround. AFAIU this
is what you previously tested without an OOM, but later on you still
managed to hit an OOM again. That would suggest we are still not there
and need to investigate further. I have some ideas what to do, but I
would appreciate it if we could confirm this status before we try new things.

Thanks!
-- 
Michal Hocko
SUSE Labs

* Re: OOM killer changes
  2016-08-26  8:35                                                                           ` Michal Hocko
@ 2016-09-06 11:09                                                                             ` Vlastimil Babka
  0 siblings, 0 replies; 50+ messages in thread
From: Vlastimil Babka @ 2016-09-06 11:09 UTC
  To: Michal Hocko, Ralf-Peter Rohbeck; +Cc: Andrew Morton, linux-mm, Joonsoo Kim

On 08/26/2016 10:35 AM, Michal Hocko wrote:
> On Thu 25-08-16 13:35:04, Ralf-Peter Rohbeck wrote:
> [...]
>> Sorry, the tag was next-20160823; I called the branch linux-next-20160823.
>
> Yeah, that is the tag I was looking for, but linux-next is quite
> volatile, and if you do not fetch a particular tag it won't exist in
> later trees. Anyway, I have set up a branch oom-playground in my tree
> git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git which
> is on top of the current up-to-date mmotm tree + a revert of the quick
> workaround which you have already tested (thanks for that!) and with
> Vlastimil's patch which was dropped due to the workaround.

This is missing the patch that makes the highest compaction priority
ignore pageblock suitability [1].

> AFAIU this
> is what you previously tested without an OOM, but later on you still
> managed to hit an OOM again.

I think the earlier test also didn't include patch [1], due to some
confusion. I'll just resend everything (in a new thread) for
testing on top of the latest mmotm git tree.

[1] http://marc.info/?l=linux-mm&m=147158805719821

> That would suggest we are still not there
> and need to investigate further. I have some ideas what to do, but I
> would appreciate it if we could confirm this status before we try new things.
>
> Thanks!
>

Thread overview: 50+ messages
     [not found] <d8f3adcc-3607-1ef6-9ec5-82b2e125eef2@quantum.com>
2016-08-01  6:16 ` OOM killer changes Michal Hocko
     [not found]   ` <b1a39756-a0b5-1900-6575-d6e1f502cb26@Quantum.com>
     [not found]     ` <20160801182358.GB31957@dhcp22.suse.cz>
     [not found]       ` <30dbabc4-585c-55a5-9f3a-4e243c28356a@Quantum.com>
2016-08-01 19:26         ` Michal Hocko
2016-08-01 19:35           ` Ralf-Peter Rohbeck
2016-08-01 19:43             ` Michal Hocko
2016-08-01 19:52               ` Ralf-Peter Rohbeck
2016-08-01 20:09                 ` Michal Hocko
2016-08-01 20:16                   ` Ralf-Peter Rohbeck
2016-08-01 20:26                     ` Michal Hocko
2016-08-01 21:14                       ` Ralf-Peter Rohbeck
2016-08-01 21:27                         ` Ralf-Peter Rohbeck
2016-08-02  7:10                           ` Michal Hocko
2016-08-02 19:25                             ` Ralf-Peter Rohbeck
2016-08-15  4:48                               ` Ralf-Peter Rohbeck
2016-08-15  9:16                                 ` Vlastimil Babka
2016-08-15 15:01                                   ` Michal Hocko
2016-08-15 18:42                                     ` Ralf-Peter Rohbeck
2016-08-16  7:32                                       ` Michal Hocko
2016-08-16  7:43                                         ` Michal Hocko
2016-08-17  9:14                                           ` Ralf-Peter Rohbeck
2016-08-17  9:23                                             ` Vlastimil Babka
2016-08-17  9:28                                               ` Ralf-Peter Rohbeck
2016-08-17  9:33                                                 ` Michal Hocko
2016-08-17 23:37                                                   ` Ralf-Peter Rohbeck
2016-08-18  6:57                                                     ` Vlastimil Babka
2016-08-18 20:01                                                       ` Ralf-Peter Rohbeck
2016-08-18 20:12                                                         ` Vlastimil Babka
2016-08-19  2:42                                                           ` Ralf-Peter Rohbeck
2016-08-19  6:27                                                             ` Vlastimil Babka
2016-08-19  7:33                                                               ` Michal Hocko
2016-08-19  7:47                                                                 ` Vlastimil Babka
2016-08-19  8:26                                                                   ` Michal Hocko
2016-08-24 18:13                                                                     ` Ralf-Peter Rohbeck
2016-08-25  7:22                                                                       ` Michal Hocko
2016-08-25 20:35                                                                         ` Ralf-Peter Rohbeck
2016-08-26  8:35                                                                           ` Michal Hocko
2016-09-06 11:09                                                                             ` Vlastimil Babka
2016-08-23  5:02                                                               ` Joonsoo Kim
2016-08-23  7:45                                                                 ` Michal Hocko
2016-08-17  0:26                                         ` Ralf-Peter Rohbeck
2016-08-17  7:43                                           ` Vlastimil Babka
2016-08-16  3:12                                   ` Joonsoo Kim
2016-08-16  7:44                                     ` Vlastimil Babka
2016-08-17  4:48                                     ` Ralf-Peter Rohbeck
2016-08-17  7:56                                       ` Vlastimil Babka
2016-08-17  8:16                                         ` Joonsoo Kim
2016-08-17  9:21                                           ` Ralf-Peter Rohbeck
2016-08-17  9:11                                         ` Ralf-Peter Rohbeck
2016-08-17  9:20                                           ` Vlastimil Babka
2016-08-02  7:11           ` Vlastimil Babka
2016-08-02  9:02           ` Michal Hocko
