linux-mm.kvack.org archive mirror
* OOM kills with lots of free swap
@ 2017-06-23 23:29 Luigi Semenzato
  2017-06-27  6:50 ` Vlastimil Babka
  2017-06-27  7:11 ` Michal Hocko
  0 siblings, 2 replies; 8+ messages in thread
From: Luigi Semenzato @ 2017-06-23 23:29 UTC (permalink / raw)
  To: Linux Memory Management List

It is fairly easy to trigger OOM-kills with almost empty swap, by
running several fast-allocating processes in parallel.  I can
reproduce this on many 3.x kernels (I think I tried also on 4.4 but am
not sure).  I am hoping this is a known problem.
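
[A minimal sketch of the kind of fast-allocating "hog" described above --
an illustration only, not the actual test program; running several
instances in parallel creates the memory pressure in question:]

/* hog.c -- illustration only: allocate and touch memory as fast as possible. */
#include <stdlib.h>
#include <string.h>

#define CHUNK (1 << 20)                 /* 1 MiB per allocation */

int main(void)
{
        for (;;) {
                char *p = malloc(CHUNK);
                if (!p)
                        continue;       /* keep pressuring the allocator */
                /* touch every byte so the pages are actually backed */
                memset(p, 0xaa, CHUNK);
        }
}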

I tried to debug this in the past, by backtracking from the call to
the OOM code, and adding instrumentation to understand why the task
failed to allocate (or even make progress, apparently), but my effort
did not yield results within reasonable time.

I believe that it is possible that one task succeeds in reclaiming
pages, and then another task takes those pages before the first task
has a chance to get them.  But in that case the first task should
still notice progress and should retry, correct?  Is it possible in
theory that one task fails to allocate AND fails to make progress
while other tasks succeed?

(I asked this question, in not so many words, in 2013, but received no answers.)

Thanks!


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: OOM kills with lots of free swap
  2017-06-23 23:29 OOM kills with lots of free swap Luigi Semenzato
@ 2017-06-27  6:50 ` Vlastimil Babka
  2017-06-27  7:11 ` Michal Hocko
  1 sibling, 0 replies; 8+ messages in thread
From: Vlastimil Babka @ 2017-06-27  6:50 UTC (permalink / raw)
  To: Luigi Semenzato, Linux Memory Management List, Michal Hocko

[+CC Michal]

On 06/24/2017 01:29 AM, Luigi Semenzato wrote:
> It is fairly easy to trigger OOM-kills with almost empty swap, by
> running several fast-allocating processes in parallel.  I can
> reproduce this on many 3.x kernels (I think I tried also on 4.4 but am
> not sure).  I am hoping this is a known problem.

There was a notable OOM rework by Michal around 4.6(?), so knowing the
state on recent kernels would be really useful.

In any case, please include the actual oom reports.

> I tried to debug this in the past, by backtracking from the call to
> the OOM code, and adding instrumentation to understand why the task
> failed to allocate (or even make progress, apparently), but my effort
> did not yield results within reasonable time.
> 
> I believe that it is possible that one task succeeds in reclaiming
> pages, and then another task takes those pages before the first task
> has a chance to get them.  But in that case the first task should
> still notice progress and should retry, correct?  Is it possible in
> theory that one task fails to allocate AND fails to make progress
> while other tasks succeed?
> 
> (I asked this question, in not so many words, in 2013, but received no answers.)
> 
> Thanks!
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: OOM kills with lots of free swap
  2017-06-23 23:29 OOM kills with lots of free swap Luigi Semenzato
  2017-06-27  6:50 ` Vlastimil Babka
@ 2017-06-27  7:11 ` Michal Hocko
  2017-06-27 15:21   ` Luigi Semenzato
  1 sibling, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2017-06-27  7:11 UTC (permalink / raw)
  To: Luigi Semenzato; +Cc: Linux Memory Management List

On Fri 23-06-17 16:29:39, Luigi Semenzato wrote:
> It is fairly easy to trigger OOM-kills with almost empty swap, by
> running several fast-allocating processes in parallel.  I can
> reproduce this on many 3.x kernels (I think I tried also on 4.4 but am
> not sure).  I am hoping this is a known problem.

The oom detection code has been reworked considerably in 4.7, so I would
like to see whether your problem is still present with more up-to-date
kernels. Also, an OOM report is really necessary to get any clue what
might have been going on.

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: OOM kills with lots of free swap
  2017-06-27  7:11 ` Michal Hocko
@ 2017-06-27 15:21   ` Luigi Semenzato
  2017-06-27 15:22     ` Luigi Semenzato
  0 siblings, 1 reply; 8+ messages in thread
From: Luigi Semenzato @ 2017-06-27 15:21 UTC (permalink / raw)
  To: Michal Hocko, Minchan Kim; +Cc: Linux Memory Management List

[-- Attachment #1: Type: text/plain, Size: 1563 bytes --]

(copying Minchan because I just asked him the same question.)

Thank you, I can try this on ToT, although I think that the problem is not
with the OOM killer itself but earlier---i.e. invoking the OOM killer seems
unnecessary and wrong.  Here's the question.

The general strategy for page allocation seems to be (please correct me as
needed):

1. look in the free lists
2. if that did not succeed, try to reclaim, then try again to allocate
3. keep trying as long as progress is made (i.e. something was reclaimed)
4. if no progress was made and no pages were found, invoke the OOM killer.

I'd like to know if that "progress is made" notion is possibly buggy.
Specifically, does it mean "progress is made by this task"?  Is it possible
that resource contention creates a situation where most tasks in most cases
can reclaim and allocate, but one task randomly fails to make progress?


On Tue, Jun 27, 2017 at 12:11 AM, Michal Hocko <mhocko@kernel.org> wrote:

> On Fri 23-06-17 16:29:39, Luigi Semenzato wrote:
> > It is fairly easy to trigger OOM-kills with almost empty swap, by
> > running several fast-allocating processes in parallel.  I can
> > reproduce this on many 3.x kernels (I think I tried also on 4.4 but am
> > not sure).  I am hoping this is a known problem.
>
> The oom detection code has been reworked considerably in 4.7, so I would
> like to see whether your problem is still present with more up-to-date
> kernels. Also, an OOM report is really necessary to get any clue what
> might have been going on.
>
> --
> Michal Hocko
> SUSE Labs
>

[-- Attachment #2: Type: text/html, Size: 2162 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: OOM kills with lots of free swap
  2017-06-27 15:21   ` Luigi Semenzato
@ 2017-06-27 15:22     ` Luigi Semenzato
  2017-06-27 15:50       ` Michal Hocko
  0 siblings, 1 reply; 8+ messages in thread
From: Luigi Semenzato @ 2017-06-27 15:22 UTC (permalink / raw)
  To: Michal Hocko, Minchan Kim; +Cc: Linux Memory Management List

(sorry, I forgot to turn off HTML formatting)

Thank you, I can try this on ToT, although I think that the problem is
not with the OOM killer itself but earlier---i.e. invoking the OOM
killer seems unnecessary and wrong.  Here's the question.

The general strategy for page allocation seems to be (please correct
me as needed):

1. look in the free lists
2. if that did not succeed, try to reclaim, then try again to allocate
3. keep trying as long as progress is made (i.e. something was reclaimed)
4. if no progress was made and no pages were found, invoke the OOM killer.

I'd like to know if that "progress is made" notion is possibly buggy.
Specifically, does it mean "progress is made by this task"?  Is it
possible that resource contention creates a situation where most tasks
in most cases can reclaim and allocate, but one task randomly fails to
make progress?
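
[In rough C-like pseudocode, the strategy above -- a sketch for discussion
only; the helper names are made up and this is not the actual
mm/page_alloc.c code:]

struct page *alloc_slowpath_sketch(gfp_t gfp_mask, unsigned int order)
{
        struct page *page;
        unsigned long did_some_progress;

        for (;;) {
                /* 1. look in the free lists */
                page = get_from_freelists(gfp_mask, order);
                if (page)
                        return page;

                /* 2. that failed: try to reclaim, then try again to allocate */
                did_some_progress = try_direct_reclaim(gfp_mask, order);
                page = get_from_freelists(gfp_mask, order);
                if (page)
                        return page;

                /* 3. keep trying as long as progress was made */
                if (did_some_progress)
                        continue;

                /* 4. no progress and no page found: invoke the OOM killer */
                invoke_oom_killer(gfp_mask, order);
        }
}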

On Tue, Jun 27, 2017 at 8:21 AM, Luigi Semenzato <semenzato@google.com> wrote:
> (copying Minchan because I just asked him the same question.)
>
> Thank you, I can try this on ToT, although I think that the problem is not
> with the OOM killer itself but earlier---i.e. invoking the OOM killer seems
> unnecessary and wrong.  Here's the question.
>
> The general strategy for page allocation seems to be (please correct me as
> needed):
>
> 1. look in the free lists
> 2. if that did not succeed, try to reclaim, then try again to allocate
> 3. keep trying as long as progress is made (i.e. something was reclaimed)
> 4. if no progress was made and no pages were found, invoke the OOM killer.
>
> I'd like to know if that "progress is made" notion is possibly buggy.
> Specifically, does it mean "progress is made by this task"?  Is it possible
> that resource contention creates a situation where most tasks in most cases
> can reclaim and allocate, but one task randomly fails to make progress?
>
>
> On Tue, Jun 27, 2017 at 12:11 AM, Michal Hocko <mhocko@kernel.org> wrote:
>>
>> On Fri 23-06-17 16:29:39, Luigi Semenzato wrote:
>> > It is fairly easy to trigger OOM-kills with almost empty swap, by
>> > running several fast-allocating processes in parallel.  I can
>> > reproduce this on many 3.x kernels (I think I tried also on 4.4 but am
>> > not sure).  I am hoping this is a known problem.
>>
>> The oom detection code has been reworked considerably in 4.7, so I would
>> like to see whether your problem is still present with more up-to-date
>> kernels. Also, an OOM report is really necessary to get any clue what
>> might have been going on.
>>
>> --
>> Michal Hocko
>> SUSE Labs
>
>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: OOM kills with lots of free swap
  2017-06-27 15:22     ` Luigi Semenzato
@ 2017-06-27 15:50       ` Michal Hocko
  2017-06-29 17:46         ` Luigi Semenzato
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2017-06-27 15:50 UTC (permalink / raw)
  To: Luigi Semenzato; +Cc: Minchan Kim, Linux Memory Management List

On Tue 27-06-17 08:22:36, Luigi Semenzato wrote:
> (sorry, I forgot to turn off HTML formatting)
> 
> Thank you, I can try this on ToT, although I think that the problem is
> not with the OOM killer itself but earlier---i.e. invoking the OOM
> killer seems unnecessary and wrong.  Here's the question.
> 
> The general strategy for page allocation seems to be (please correct
> me as needed):
> 
> 1. look in the free lists
> 2. if that did not succeed, try to reclaim, then try again to allocate
> 3. keep trying as long as progress is made (i.e. something was reclaimed)
> 4. if no progress was made and no pages were found, invoke the OOM killer.

Yes, that is the case, very broadly speaking. The hard question really is
what "no progress" actually means. We use "no pages could be reclaimed"
as the indicator. We cannot blow up at the first such instance, of
course, because that could be too early (e.g. data under writeback and
many other details). With 4.7+ kernels this is implemented in
should_reclaim_retry. Prior to the rework we used to rely on
zone_reclaimable, which simply checked how many pages we had scanned
since the last page was freed; if that was 6 times the reclaimable
memory, we simply gave up. It had some issues, described in
0a0337e0d1d1 ("mm, oom: rework oom detection").
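
[For reference, a paraphrase of the pre-4.7 zone_reclaimable() heuristic
mentioned above -- roughly what mm/vmscan.c did at the time, not a
verbatim copy:]

static bool zone_reclaimable(struct zone *zone)
{
        /*
         * The zone is still considered reclaimable as long as we have
         * scanned fewer than 6x its reclaimable pages since the last
         * page was freed from it; past that, give up on it.
         */
        return zone_page_state(zone, NR_PAGES_SCANNED) <
                zone_reclaimable_pages(zone) * 6;
}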

> I'd like to know if that "progress is made" notion is possibly buggy.
> Specifically, does it mean "progress is made by this task"?  Is it
> possible that resource contention creates a situation where most tasks
> in most cases can reclaim and allocate, but one task randomly fails to
> make progress?

This can happen, although it is quite unlikely. We are trying to
throttle allocations, but you can hardly fight consistent bad luck ;)

In order to see what is going on in your particular case we need an oom
report though.
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: OOM kills with lots of free swap
  2017-06-27 15:50       ` Michal Hocko
@ 2017-06-29 17:46         ` Luigi Semenzato
  2017-06-29 18:02           ` Luigi Semenzato
  0 siblings, 1 reply; 8+ messages in thread
From: Luigi Semenzato @ 2017-06-29 17:46 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Minchan Kim, Linux Memory Management List

Well, my apologies, I haven't been able to reproduce the problem, so
there's nothing to go on here.

We had a bug (a local patch) which caused this, then I had a bug in my
test case, so I was confused.  I also have a recollection of this
happening in older kernels (3.8 I think), but I am not going to go
back that far since even if the problem exists, we have no evidence it
happens frequently.

Thanks!


On Tue, Jun 27, 2017 at 8:50 AM, Michal Hocko <mhocko@kernel.org> wrote:
> On Tue 27-06-17 08:22:36, Luigi Semenzato wrote:
>> (sorry, I forgot to turn off HTML formatting)
>>
>> Thank you, I can try this on ToT, although I think that the problem is
>> not with the OOM killer itself but earlier---i.e. invoking the OOM
>> killer seems unnecessary and wrong.  Here's the question.
>>
>> The general strategy for page allocation seems to be (please correct
>> me as needed):
>>
>> 1. look in the free lists
>> 2. if that did not succeed, try to reclaim, then try again to allocate
>> 3. keep trying as long as progress is made (i.e. something was reclaimed)
>> 4. if no progress was made and no pages were found, invoke the OOM killer.
>
> Yes, that is the case, very broadly speaking. The hard question really is
> what "no progress" actually means. We use "no pages could be reclaimed"
> as the indicator. We cannot blow up at the first such instance, of
> course, because that could be too early (e.g. data under writeback and
> many other details). With 4.7+ kernels this is implemented in
> should_reclaim_retry. Prior to the rework we used to rely on
> zone_reclaimable, which simply checked how many pages we had scanned
> since the last page was freed; if that was 6 times the reclaimable
> memory, we simply gave up. It had some issues, described in
> 0a0337e0d1d1 ("mm, oom: rework oom detection").
>
>> I'd like to know if that "progress is made" notion is possibly buggy.
>> Specifically, does it mean "progress is made by this task"?  Is it
>> possible that resource contention creates a situation where most tasks
>> in most cases can reclaim and allocate, but one task randomly fails to
>> make progress?
>
> This can happen, although it is quite unlikely. We are trying to
> throttle allocations, but you can hardly fight consistent bad luck ;)
>
> In order to see what is going on in your particular case we need an oom
> report though.
> --
> Michal Hocko
> SUSE Labs


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: OOM kills with lots of free swap
  2017-06-29 17:46         ` Luigi Semenzato
@ 2017-06-29 18:02           ` Luigi Semenzato
  0 siblings, 0 replies; 8+ messages in thread
From: Luigi Semenzato @ 2017-06-29 18:02 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Minchan Kim, Linux Memory Management List

[-- Attachment #1: Type: text/plain, Size: 11207 bytes --]

Just to make sure I wasn't dreaming this, I searched and found some old
logs (from 5 years ago, kernel 3.4) from when I was trying to debug the
problem.  Thanks!

1999-12-31T17:00:07.697712-08:00 localhost kernel: [    0.000000] Linux version 3.4.0 (semenzato@luigi.mtv.corp.google.com) (gcc version 4.6.x-google 20120301 (prerelease) (gcc-4.6.3_cos_gg_2a32ae6) ) #31 SMP Tue Nov 27 16:44:46 PST 2012
1999-12-31T17:00:07.698992-08:00 localhost kernel: [    0.000000] CPU: ARMv7 Processor [410fc0f4] revision 4 (ARMv7), cr=10c5387d


On Thu, Jun 29, 2017 at 10:46 AM, Luigi Semenzato <semenzato@google.com> wrote:
> Well, my apologies, I haven't been able to reproduce the problem, so
> there's nothing to go on here.
>
> We had a bug (a local patch) which caused this, then I had a bug in my
> test case, so I was confused.  I also have a recollection of this
> happening in older kernels (3.8 I think), but I am not going to go
> back that far since even if the problem exists, we have no evidence it
> happens frequently.
>
> Thanks!
>
>
> On Tue, Jun 27, 2017 at 8:50 AM, Michal Hocko <mhocko@kernel.org> wrote:
>> On Tue 27-06-17 08:22:36, Luigi Semenzato wrote:
>>> (sorry, I forgot to turn off HTML formatting)
>>>
>>> Thank you, I can try this on ToT, although I think that the problem is
>>> not with the OOM killer itself but earlier---i.e. invoking the OOM
>>> killer seems unnecessary and wrong.  Here's the question.
>>>
>>> The general strategy for page allocation seems to be (please correct
>>> me as needed):
>>>
>>> 1. look in the free lists
>>> 2. if that did not succeed, try to reclaim, then try again to allocate
>>> 3. keep trying as long as progress is made (i.e. something was reclaimed)
>>> 4. if no progress was made and no pages were found, invoke the OOM killer.
>>
>> Yes, that is the case, very broadly speaking. The hard question really is
>> what "no progress" actually means. We use "no pages could be reclaimed"
>> as the indicator. We cannot blow up at the first such instance, of
>> course, because that could be too early (e.g. data under writeback and
>> many other details). With 4.7+ kernels this is implemented in
>> should_reclaim_retry. Prior to the rework we used to rely on
>> zone_reclaimable, which simply checked how many pages we had scanned
>> since the last page was freed; if that was 6 times the reclaimable
>> memory, we simply gave up. It had some issues, described in
>> 0a0337e0d1d1 ("mm, oom: rework oom detection").
>>
>>> I'd like to know if that "progress is made" notion is possibly buggy.
>>> Specifically, does it mean "progress is made by this task"?  Is it
>>> possible that resource contention creates a situation where most tasks
>>> in most cases can reclaim and allocate, but one task randomly fails to
>>> make progress?
>>
>> This can happen, although it is quite unlikely. We are trying to
>> throttle allocations, but you can hardly fight consistent bad luck ;)
>>
>> In order to see what is going on in your particular case we need an oom
>> report though.
>> --
>> Michal Hocko
>> SUSE Labs

[-- Attachment #2: OOM.txt --]
[-- Type: text/plain, Size: 8151 bytes --]

2012-11-28T10:21:56.879950-08:00 localhost kernel: [ 1652.710107] hog invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=15, oom_score_adj=1000
2012-11-28T10:21:56.879993-08:00 localhost kernel: [ 1652.710137] [<800154ac>] (unwind_backtrace+0x0/0xec) from [<804ebcac>] (dump_stack+0x20/0x24)
2012-11-28T10:21:56.913206-08:00 localhost kernel: [ 1652.710156] [<804ebcac>] (dump_stack+0x20/0x24) from [<800b91b4>] (dump_header.isra.10+0x7c/0x174)
2012-11-28T10:21:56.913242-08:00 localhost kernel: [ 1652.710173] [<800b91b4>] (dump_header.isra.10+0x7c/0x174) from [<800b93f0>] (oom_kill_process.part.13.constprop.14+0x4c/0x20c)
2012-11-28T10:21:56.913252-08:00 localhost kernel: [ 1652.710187] [<800b93f0>] (oom_kill_process.part.13.constprop.14+0x4c/0x20c) from [<800b9a00>] (out_of_memory+0x2dc/0x38c)
2012-11-28T10:21:56.913261-08:00 localhost kernel: [ 1652.710201] [<800b9a00>] (out_of_memory+0x2dc/0x38c) from [<800bccf4>] (__alloc_pages_nodemask+0x6b8/0x8a0)
2012-11-28T10:21:56.913270-08:00 localhost kernel: [ 1652.710216] [<800bccf4>] (__alloc_pages_nodemask+0x6b8/0x8a0) from [<800e44e0>] (read_swap_cache_async+0x54/0x11c)
2012-11-28T10:21:56.913278-08:00 localhost kernel: [ 1652.710233] [<800e44e0>] (read_swap_cache_async+0x54/0x11c) from [<800e460c>] (swapin_readahead+0x64/0x9c)
2012-11-28T10:21:56.913288-08:00 localhost kernel: [ 1652.710249] [<800e460c>] (swapin_readahead+0x64/0x9c) from [<800d6fcc>] (handle_pte_fault+0x2d8/0x668)
2012-11-28T10:21:56.913296-08:00 localhost kernel: [ 1652.710265] [<800d6fcc>] (handle_pte_fault+0x2d8/0x668) from [<800d7420>] (handle_mm_fault+0xc4/0xdc)
2012-11-28T10:21:56.913314-08:00 localhost kernel: [ 1652.710281] [<800d7420>] (handle_mm_fault+0xc4/0xdc) from [<8001b080>] (do_page_fault+0x114/0x354)
2012-11-28T10:21:56.913324-08:00 localhost kernel: [ 1652.710296] [<8001b080>] (do_page_fault+0x114/0x354) from [<800083d8>] (do_DataAbort+0x44/0xa8)
2012-11-28T10:21:56.913332-08:00 localhost kernel: [ 1652.710309] [<800083d8>] (do_DataAbort+0x44/0xa8) from [<8000dc78>] (__dabt_usr+0x38/0x40)
2012-11-28T10:21:56.913338-08:00 localhost kernel: [ 1652.710319] Exception stack(0xed97bfb0 to 0xed97bff8)
2012-11-28T10:21:56.913344-08:00 localhost kernel: [ 1652.710328] bfa0:                                     00000004 76682008 7673d303 7673d303
2012-11-28T10:21:56.913351-08:00 localhost kernel: [ 1652.710338] bfc0: 76fa2f8c 76da29a1 76f997f1 7eaac518 00000000 00000000 76fa2f8c 00000000
2012-11-28T10:21:56.913358-08:00 localhost kernel: [ 1652.710349] bfe0: 00000000 7eaac518 76f99abd 76f99ad8 00000030 ffffffff
2012-11-28T10:21:56.913363-08:00 localhost kernel: [ 1652.710358] Mem-info:
2012-11-28T10:21:56.913369-08:00 localhost kernel: [ 1652.710366] Normal per-cpu:
2012-11-28T10:21:56.913385-08:00 localhost kernel: [ 1652.710374] CPU    0: hi:  186, btch:  31 usd:   0
2012-11-28T10:21:56.913392-08:00 localhost kernel: [ 1652.710381] CPU    1: hi:  186, btch:  31 usd:   0
2012-11-28T10:21:56.913399-08:00 localhost kernel: [ 1652.710389] HighMem per-cpu:
2012-11-28T10:21:56.913405-08:00 localhost kernel: [ 1652.710396] CPU    0: hi:   90, btch:  15 usd:   0
2012-11-28T10:21:56.913412-08:00 localhost kernel: [ 1652.710404] CPU    1: hi:   90, btch:  15 usd:   0
2012-11-28T10:21:56.913419-08:00 localhost kernel: [ 1652.710416] active_anon:332491 inactive_anon:125198 isolated_anon:2
2012-11-28T10:21:56.913426-08:00 localhost kernel: [ 1652.710420]  active_file:9210 inactive_file:13058 isolated_file:0
2012-11-28T10:21:56.913432-08:00 localhost kernel: [ 1652.710424]  unevictable:0 dirty:3 writeback:0 unstable:0
2012-11-28T10:21:56.913439-08:00 localhost kernel: [ 1652.710427]  free:3497 slab_reclaimable:2097 slab_unreclaimable:2166
2012-11-28T10:21:56.913454-08:00 localhost kernel: [ 1652.710431]  mapped:3349 shmem:8 pagetables:1527 bounce:0
2012-11-28T10:21:56.913467-08:00 localhost kernel: [ 1652.710451] Normal free:13728kB min:5380kB low:6724kB high:8068kB active_anon:1244580kB inactive_anon:415092kB active_file:11640kB inactive_file:27544kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1811520kB mlocked:0kB dirty:0kB writeback:0kB mapped:2012kB shmem:32kB slab_reclaimable:8388kB slab_unreclaimable:8664kB kernel_stack:1328kB pagetables:6108kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2641916 all_unreclaimable? no
2012-11-28T10:21:56.913477-08:00 localhost kernel: [ 1652.710475] lowmem_reserve[]: 0 2095 2095
2012-11-28T10:21:56.913491-08:00 localhost kernel: [ 1652.710502] HighMem free:260kB min:260kB low:456kB high:656kB active_anon:85384kB inactive_anon:85700kB active_file:25200kB inactive_file:24688kB unevictable:0kB isolated(anon):8kB isolated(file):0kB present:268224kB mlocked:0kB dirty:12kB writeback:0kB mapped:11384kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:290707 all_unreclaimable? yes
2012-11-28T10:21:56.913500-08:00 localhost kernel: [ 1652.710526] lowmem_reserve[]: 0 0 0
2012-11-28T10:21:56.913508-08:00 localhost kernel: [ 1652.710545] Normal: 6*4kB 4*8kB 3*16kB 24*32kB 89*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 1*4096kB = 13736kB
2012-11-28T10:21:56.913518-08:00 localhost kernel: [ 1652.710597] HighMem: 61*4kB 4*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 276kB
2012-11-28T10:21:56.913535-08:00 localhost kernel: [ 1652.710648] 112925 total pagecache pages
2012-11-28T10:21:56.913541-08:00 localhost kernel: [ 1652.710655] 90646 pages in swap cache
2012-11-28T10:21:56.913548-08:00 localhost kernel: [ 1652.710663] Swap cache stats: add 733583, delete 642937, find 97864/161478
2012-11-28T10:21:56.913554-08:00 localhost kernel: [ 1652.710671] Free swap  = 2005380kB
2012-11-28T10:21:56.913559-08:00 localhost kernel: [ 1652.710677] Total swap = 3028768kB
2012-11-28T10:21:56.913564-08:00 localhost kernel: [ 1652.710683] luigi_nr_reclaimed 495948
2012-11-28T10:21:56.913570-08:00 localhost kernel: [ 1652.710690] luigi_nr_reclaims 8820
2012-11-28T10:21:56.913577-08:00 localhost kernel: [ 1652.710696] luigi_aborted_reclaim 0
2012-11-28T10:21:56.914694-08:00 localhost kernel: [ 1652.710702] luigi_more_to_do 237
2012-11-28T10:21:56.914704-08:00 localhost kernel: [ 1652.710708] luigi_direct_reclaims 9086
2012-11-28T10:21:56.914718-08:00 localhost kernel: [ 1652.710714] luigi_failed_direct_reclaims 98
2012-11-28T10:21:56.914725-08:00 localhost kernel: [ 1652.710721] luigi_no_progress 1
2012-11-28T10:21:56.914730-08:00 localhost kernel: [ 1652.710727] luigi_restarts 0
2012-11-28T10:21:56.914734-08:00 localhost kernel: [ 1652.710733] luigi_should_alloc_retry 97
2012-11-28T10:21:56.914740-08:00 localhost kernel: [ 1652.710739] luigi_direct_compact 0
2012-11-28T10:21:56.914747-08:00 localhost kernel: [ 1652.710745] luigi_alloc_failed 111
2012-11-28T10:21:56.914752-08:00 localhost kernel: [ 1652.710751] luigi_gfp_nofail 0
2012-11-28T10:21:56.914757-08:00 localhost kernel: [ 1652.710757] luigi_costly_order 97
2012-11-28T10:21:56.914762-08:00 localhost kernel: [ 1652.710763] luigi_repeat 0
2012-11-28T10:21:56.914768-08:00 localhost kernel: [ 1652.710769] luigi_kswapd_nap 71
2012-11-28T10:21:56.914779-08:00 localhost kernel: [ 1652.710775] luigi_kswapd_sleep 4
2012-11-28T10:21:56.914785-08:00 localhost kernel: [ 1652.710781] luigi_kswapd_loop 72
2012-11-28T10:21:56.914791-08:00 localhost kernel: [ 1652.710787] luigi_kswapd_try_to_sleep 71
2012-11-28T10:21:56.914797-08:00 localhost kernel: [ 1652.710793] luigi_slowpath 530966
2012-11-28T10:21:56.914803-08:00 localhost kernel: [ 1652.710799] luigi_wake_all_kswapd 530961
2012-11-28T10:21:56.914808-08:00 localhost kernel: [ 1652.721084] 524288 pages of RAM
2012-11-28T10:21:56.914815-08:00 localhost kernel: [ 1652.721094] 4256 free pages
2012-11-28T10:21:56.914822-08:00 localhost kernel: [ 1652.721100] 7122 reserved pages
2012-11-28T10:21:56.914828-08:00 localhost kernel: [ 1652.721106] 2924 slab pages
2012-11-28T10:21:56.914837-08:00 localhost kernel: [ 1652.721112] 23510 pages shared
2012-11-28T10:21:56.914843-08:00 localhost kernel: [ 1652.721118] 90646 pages swap cached

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-06-29 18:02 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-23 23:29 OOM kills with lots of free swap Luigi Semenzato
2017-06-27  6:50 ` Vlastimil Babka
2017-06-27  7:11 ` Michal Hocko
2017-06-27 15:21   ` Luigi Semenzato
2017-06-27 15:22     ` Luigi Semenzato
2017-06-27 15:50       ` Michal Hocko
2017-06-29 17:46         ` Luigi Semenzato
2017-06-29 18:02           ` Luigi Semenzato
