* Re: More OOM problems
[not found] ` <982671bd-5733-0cd5-c15d-112648ff14c5@Quantum.com>
@ 2016-10-11 6:44 ` Michal Hocko
2016-10-11 7:10 ` Vlastimil Babka
0 siblings, 1 reply; 31+ messages in thread
From: Michal Hocko @ 2016-10-11 6:44 UTC (permalink / raw)
To: Ralf-Peter Rohbeck
Cc: Vlastimil Babka, Linus Torvalds, Tetsuo Handa, Oleg Nesterov,
Vladimir Davydov, Andrew Morton, Markus Trippelsdorf,
Arkadiusz Miskiewicz, Jiri Slaby, Olaf Hering, Joonsoo Kim,
linux-mm
[Let's restore the CC list]
On Mon 10-10-16 10:20:27, Ralf-Peter Rohbeck wrote:
> I ran my torture test overnight (after finding the last linux-next branch
> that compiled, sigh...):
> Wrote two 4TB USB3 drives, compiled a kernel and ran my btrfs dedup script
> in parallel.
Thanks for testing, and good to hear that the premature OOMs are gone.
> There were a few allocation failures but I didn't notice anything amiss but
> the log entries.
> Logs are at
> https://filebin.net/duj4c1bv64uohm5q/OOM_4.8.0-rc7-next-20160920.tar.bz2.
Oct 10 03:35:18 fs kernel: kworker/1:202: page allocation failure: order:0, mode:0x2204000(GFP_NOWAIT|__GFP_COMP|__GFP_NOTRACK)
Oct 10 03:35:18 fs kernel: kworker/1:214: page allocation failure: order:0, mode:0x2204000(GFP_NOWAIT|__GFP_COMP|__GFP_NOTRACK)
Oct 10 03:35:18 fs kernel: kworker/1:236: page allocation failure: order:0, mode:0x2204000(GFP_NOWAIT|__GFP_COMP|__GFP_NOTRACK)
Oct 10 03:35:18 fs kernel: kworker/1:236: page allocation failure: order:0, mode:0x2204000(GFP_NOWAIT|__GFP_COMP|__GFP_NOTRACK)
Oct 10 03:35:18 fs kernel: kworker/1:224: page allocation failure: order:0, mode:0x2204000(GFP_NOWAIT|__GFP_COMP|__GFP_NOTRACK)
Oct 10 03:35:18 fs kernel: kworker/1:224: page allocation failure: order:0, mode:0x2204000(GFP_NOWAIT|__GFP_COMP|__GFP_NOTRACK)
Oct 10 03:35:18 fs kernel: kworker/1:172: page allocation failure: order:0, mode:0x2204000(GFP_NOWAIT|__GFP_COMP|__GFP_NOTRACK)
Oct 10 03:35:18 fs kernel: kworker/1:227: page allocation failure: order:0, mode:0x2204000(GFP_NOWAIT|__GFP_COMP|__GFP_NOTRACK)
Oct 10 03:35:18 fs kernel: kworker/1:226: page allocation failure: order:0, mode:0x2204000(GFP_NOWAIT|__GFP_COMP|__GFP_NOTRACK)
Oct 10 03:35:18 fs kernel: kworker/1:229: page allocation failure: order:0, mode:0x2204000(GFP_NOWAIT|__GFP_COMP|__GFP_NOTRACK)
Oct 10 06:45:54 fs kernel: kworker/3:91: page allocation failure: order:0, mode:0x2204000(GFP_NOWAIT|__GFP_COMP|__GFP_NOTRACK)
Oct 10 06:45:54 fs kernel: kworker/3:91: page allocation failure: order:0, mode:0x2204000(GFP_NOWAIT|__GFP_COMP|__GFP_NOTRACK)
So those are all atomic (i.e. non-sleeping) 4kB allocations failing
because you are running low on memory, and this kind of allocation
request cannot reclaim any memory.
: Oct 10 03:35:18 fs kernel: Node 0 active_anon:28004kB inactive_anon:532404kB active_file:5665056kB inactive_file:1290052kB unevictable:64kB isolated(anon):0kB isolated(file):128kB mapped:46196kB dirty:686200kB writeback:124196kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 17920kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
: Oct 10 03:35:18 fs kernel: Node 0 DMA free:14236kB min:128kB low:160kB high:192kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15980kB managed:15896kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:1660kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
: Oct 10 03:35:18 fs kernel: lowmem_reserve[]: 0 1939 7939 7939 7939
: Oct 10 03:35:18 fs kernel: Node 0 DMA32 free:40476kB min:16480kB low:20600kB high:24720kB active_anon:6472kB inactive_anon:14408kB active_file:1073784kB inactive_file:740536kB unevictable:0kB writepending:470432kB present:2072256kB managed:2006688kB mlocked:0kB slab_reclaimable:60376kB slab_unreclaimable:32844kB kernel_stack:8352kB pagetables:1984kB bounce:0kB free_pcp:164kB local_pcp:0kB free_cma:0kB
: Oct 10 03:35:18 fs kernel: lowmem_reserve[]: 0 0 5999 5999 5999
These two zones are above the min watermark, but still below it once the
lowmem reserves are taken into account.
: Oct 10 03:35:18 fs kernel: Node 0 Normal free:50928kB min:50968kB low:63708kB high:76448kB active_anon:21532kB inactive_anon:517996kB active_file:4591272kB inactive_file:549636kB unevictable:64kB writepending:339940kB present:6291456kB managed:6147908kB mlocked:64kB slab_reclaimable:105320kB slab_unreclaimable:146140kB kernel_stack:17664kB pagetables:43872kB bounce:0kB free_pcp:340kB local_pcp:0kB free_cma:0kB
: Oct 10 03:35:18 fs kernel: lowmem_reserve[]: 0 0 0 0 0
and this zone is below the min watermark. I haven't checked the other
allocation failures, but I assume a similar situation. It looks like you
have a peak memory-pressure load and kswapd just cannot catch up with it
for a moment. Note that most of those failures come within a second. You
can ignore these warnings.
I will just note that all those failures come from bcache.
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: More OOM problems
2016-10-11 6:44 ` More OOM problems Michal Hocko
@ 2016-10-11 7:10 ` Vlastimil Babka
2016-10-30 4:17 ` Simon Kirby
0 siblings, 1 reply; 31+ messages in thread
From: Vlastimil Babka @ 2016-10-11 7:10 UTC (permalink / raw)
To: Michal Hocko, Ralf-Peter Rohbeck
Cc: Linus Torvalds, Tetsuo Handa, Oleg Nesterov, Vladimir Davydov,
Andrew Morton, Markus Trippelsdorf, Arkadiusz Miskiewicz,
Jiri Slaby, Olaf Hering, Joonsoo Kim, linux-mm
On 10/11/2016 08:44 AM, Michal Hocko wrote:
> [Let's restore the CC list]
>
> On Mon 10-10-16 10:20:27, Ralf-Peter Rohbeck wrote:
>> I ran my torture test overnight (after finding the last linux-next branch
>> that compiled, sigh...):
>> Wrote two 4TB USB3 drives, compiled a kernel and ran my btrfs dedup script
>> in parallel.
>
> Thanks for testing and good to hear that premature OOMs are gone
Great indeed. Note that meanwhile the patches went to mainline, so we'd
definitely welcome testing from the rest of you who originally had problems
with 4.7/4.8 and didn't try linux-next recently. A good point would be to
test 4.9-rc1 when it's released. I hope you don't want to discover
regressions again too late, in the 4.9 final release :)
Vlastimil
* Re: More OOM problems
2016-10-11 7:10 ` Vlastimil Babka
@ 2016-10-30 4:17 ` Simon Kirby
2016-10-31 21:41 ` Vlastimil Babka
0 siblings, 1 reply; 31+ messages in thread
From: Simon Kirby @ 2016-10-30 4:17 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Michal Hocko, Ralf-Peter Rohbeck, Linus Torvalds, Tetsuo Handa,
Oleg Nesterov, Vladimir Davydov, Andrew Morton,
Markus Trippelsdorf, Arkadiusz Miskiewicz, Jiri Slaby,
Olaf Hering, Joonsoo Kim, linux-mm
On Tue, Oct 11, 2016 at 09:10:13AM +0200, Vlastimil Babka wrote:
> Great indeed. Note that meanwhile the patches went to mainline so
> we'd definitely welcome testing from the rest of you who had
> originally problems with 4.7/4.8 and didn't try the linux-next
> recently. So a good point would be to test 4.9-rc1 when it's
> released. I hope you don't want to discover regressions again too
> late, in the 4.9 final release :)
Hello!
I have a mixed-purpose HTPCish box running MythTV, etc. that I recently
upgraded from 4.6.7 to 4.8.4. This upgrade started OOM killing of various
processes even when there is plenty (gigabytes) of memory as page cache.
This is with CONFIG_COMPACTION=y, and it occurs with or without swap on.
I'm not able to confirm on 4.9-rc2 since nouveau doesn't support NV117
and binary blob nvidia doesn't yet like the changes to get_user_pages.
4.8 includes "prevent premature OOM killer invocation for high order
request" which sounds like it should fix the issue, but this certainly
does not seem to be the case for me. I copied kern.log and .config here:
http://0x.ca/sim/ref/4.8.4/
I see that this is reverted in 4.9-rc and replaced with something else.
Unfortunately, I can't test this workload without the nvidia tainting,
and "git log --oneline v4.8..v4.9-rc2 mm | grep oom | wc -l" returns 13.
Is there some stuff I should cherry-pick to try?
Simon-
* Re: More OOM problems
2016-10-30 4:17 ` Simon Kirby
@ 2016-10-31 21:41 ` Vlastimil Babka
2016-10-31 21:51 ` Vlastimil Babka
0 siblings, 1 reply; 31+ messages in thread
From: Vlastimil Babka @ 2016-10-31 21:41 UTC (permalink / raw)
To: Simon Kirby
Cc: Michal Hocko, Ralf-Peter Rohbeck, Linus Torvalds, Tetsuo Handa,
Oleg Nesterov, Vladimir Davydov, Andrew Morton,
Markus Trippelsdorf, Arkadiusz Miskiewicz, Jiri Slaby,
Olaf Hering, Joonsoo Kim, linux-mm
On 10/30/2016 05:17 AM, Simon Kirby wrote:
> On Tue, Oct 11, 2016 at 09:10:13AM +0200, Vlastimil Babka wrote:
>
>> Great indeed. Note that meanwhile the patches went to mainline so
>> we'd definitely welcome testing from the rest of you who had
>> originally problems with 4.7/4.8 and didn't try the linux-next
>> recently. So a good point would be to test 4.9-rc1 when it's
>> released. I hope you don't want to discover regressions again too
>> late, in the 4.9 final release :)
>
> Hello!
>
> I have a mixed-purpose HTPCish box running MythTV, etc. that I recently
> upgraded from 4.6.7 to 4.8.4. This upgrade started OOM killing of various
> processes even when there is plenty (gigabytes) of memory as page cache.
Hmm, that's too bad.
> This is with CONFIG_COMPACTION=y, and it occurs with or without swap on.
> I'm not able to confirm on 4.9-rc2 since nouveau doesn't support NV117
> and binary blob nvidia doesn't yet like the changes to get_user_pages.
Please try once it starts liking the changes.
Actually, this kernel-interface part of the driver isn't a binary blob
AFAIK, so it should be possible to adapt it?
> 4.8 includes "prevent premature OOM killer invocation for high order
> request" which sounds like it should fix the issue, but this certainly
> does not seem to be the case for me. I copied kern.log and .config here:
> http://0x.ca/sim/ref/4.8.4/
Looks like the only available high-order pages are part of the
highatomic reserves. I've checked whether there might be some error in
the functions deciding to reclaim/compact, where they would wrongly
decide that these pages are available, but it seems fine to me.
> I see that this is reverted in 4.9-rc and replaced with something else.
> Unfortunately, I can't test this workload without the nvidia tainting,
> and "git log --oneline v4.8..v4.9-rc2 mm | grep oom | wc -l" returns 13.
> Is there some stuff I should cherry-pick to try?
Well, there were around 10 related patches, so I would rather try to
adapt the nvidia code first, if possible.
In any case, it's still bad for 4.8 then.
Can you send /proc/vmstat from the system with an uptime that already
experienced at least one such oom?
> Simon-
>
* Re: More OOM problems
2016-10-31 21:41 ` Vlastimil Babka
@ 2016-10-31 21:51 ` Vlastimil Babka
0 siblings, 0 replies; 31+ messages in thread
From: Vlastimil Babka @ 2016-10-31 21:51 UTC (permalink / raw)
To: Simon Kirby
Cc: Michal Hocko, Ralf-Peter Rohbeck, Linus Torvalds, Tetsuo Handa,
Oleg Nesterov, Vladimir Davydov, Andrew Morton,
Markus Trippelsdorf, Arkadiusz Miskiewicz, Jiri Slaby,
Olaf Hering, Joonsoo Kim, linux-mm
On 10/31/2016 10:41 PM, Vlastimil Babka wrote:
> In any case, it's still bad for 4.8 then.
> Can you send /proc/vmstat from the system with an uptime that already
> experienced at least one such oom?
Oh, and it might make sense to try the patch at the end of this e-mail:
https://marc.info/?l=linux-mm&m=147423605024993
* Re: More OOM problems
2016-09-25 21:48 ` Lorenzo Stoakes
@ 2016-09-26 7:48 ` Michal Hocko
0 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2016-09-26 7:48 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Linus Torvalds, Tetsuo Handa, Oleg Nesterov, Vladimir Davydov,
Vlastimil Babka, Andrew Morton, Markus Trippelsdorf,
Arkadiusz Miskiewicz, Ralf-Peter Rohbeck, Jiri Slaby,
Olaf Hering, Joonsoo Kim, linux-mm
On Sun 25-09-16 22:48:23, Lorenzo Stoakes wrote:
> On Mon, Sep 19, 2016 at 10:53:48AM +0200, Michal Hocko wrote:
> > On Mon 19-09-16 09:42:37, Lorenzo Stoakes wrote:
> > > On Mon, Sep 19, 2016 at 10:32:15AM +0200, Michal Hocko wrote:
> > > >
> > > > so this is the same thing as in Linus case. All the zones are hitting
> > > > min wmark so the should_compact_retry() gave up. As mentioned in other
> > > > email [1] this is inherent limitation of the workaround. Your system is
> > > > swapless but there is a lot of the reclaimable page cache so Vlastimil's
> > > > patches should help.
> > >
> > > I will experiment with a linux-next kernel and see if the problem
> > > recurs. I've attempted to see if there is a way to manually reproduce
> > > on the mainline kernel by performing workloads that triggered the
> > > OOM (loading google sheets tabs, compiling a kernel, playing a video
> > > on youtube), but to no avail - it seems the system needs to be
> > > sufficiently fragmented first before it'll trigger.
> > >
> > > Given that's the case, I'll just have to try using the linux-next
> > > kernel and if you don't hear from me you can assume it did not repro
> > > again :)
> >
> > OK, fair deal ;)
>
> Actually, I'll break the deal :) I've been running workloads similar to previous
> weeks when I encountered the issue - including kernel builds, video playing,
> lotsa tabs, etc. and also tried to intentionally eat up a bit of RAM from
> time-to-time and have not seen a single OOM, so it looks like this has sorted it
> out for my system, notwithstanding Murphy's law.
Thanks for the feedback. Your testing is highly appreciated! I guess
Andrew can add your Tested-by to the latest Vlastimil patches to credit
your effort.
--
Michal Hocko
SUSE Labs
* Re: More OOM problems
2016-09-19 8:53 ` Michal Hocko
@ 2016-09-25 21:48 ` Lorenzo Stoakes
2016-09-26 7:48 ` Michal Hocko
0 siblings, 1 reply; 31+ messages in thread
From: Lorenzo Stoakes @ 2016-09-25 21:48 UTC (permalink / raw)
To: Michal Hocko
Cc: Linus Torvalds, Tetsuo Handa, Oleg Nesterov, Vladimir Davydov,
Vlastimil Babka, Andrew Morton, Markus Trippelsdorf,
Arkadiusz Miskiewicz, Ralf-Peter Rohbeck, Jiri Slaby,
Olaf Hering, Joonsoo Kim, linux-mm
On Mon, Sep 19, 2016 at 10:53:48AM +0200, Michal Hocko wrote:
> On Mon 19-09-16 09:42:37, Lorenzo Stoakes wrote:
> > On Mon, Sep 19, 2016 at 10:32:15AM +0200, Michal Hocko wrote:
> > >
> > > so this is the same thing as in Linus case. All the zones are hitting
> > > min wmark so the should_compact_retry() gave up. As mentioned in other
> > > email [1] this is inherent limitation of the workaround. Your system is
> > > swapless but there is a lot of the reclaimable page cache so Vlastimil's
> > > patches should help.
> >
> > I will experiment with a linux-next kernel and see if the problem
> > recurs. I've attempted to see if there is a way to manually reproduce
> > on the mainline kernel by performing workloads that triggered the
> > OOM (loading google sheets tabs, compiling a kernel, playing a video
> > on youtube), but to no avail - it seems the system needs to be
> > sufficiently fragmented first before it'll trigger.
> >
> > Given that's the case, I'll just have to try using the linux-next
> > kernel and if you don't hear from me you can assume it did not repro
> > again :)
>
> OK, fair deal ;)
Actually, I'll break the deal :) I've been running workloads similar to previous
weeks when I encountered the issue - including kernel builds, video playing,
lotsa tabs, etc. - and also tried to intentionally eat up a bit of RAM from
time to time, and have not seen a single OOM, so it looks like this has sorted it
out for my system, notwithstanding Murphy's law.
(I ended up using the mm tree as irritatingly I couldn't get linux-next working
with the arch linux build system, but it definitely includes Vlastimil's
patches.)
* Re: More OOM problems
2016-09-21 7:04 ` Raymond Jennings
@ 2016-09-21 7:29 ` Michal Hocko
0 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2016-09-21 7:29 UTC (permalink / raw)
To: Raymond Jennings
Cc: Linus Torvalds, Tetsuo Handa, Oleg Nesterov, Vladimir Davydov,
Vlastimil Babka, Andrew Morton, Markus Trippelsdorf,
Arkadiusz Miskiewicz, Ralf-Peter Rohbeck, Jiri Slaby,
Olaf Hering, Joonsoo Kim, linux-mm
On Wed 21-09-16 00:04:58, Raymond Jennings wrote:
> On Sun, 18 Sep 2016 13:03:01 -0700
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> > [ More or less random collection of people from previous oom patches
> > and/or discussions, if you feel you shouldn't have been cc'd, blame me
> > for just picking things from earlier threads and/or commits ]
> >
> > I'm afraid that the oom situation is still not fixed, and the "let's
> > die quickly" patches are still a nasty regression.
> >
> > I have a 16GB desktop that I just noticed killed one of the chrome
> > tabs yesterday. The machine had *tons* of freeable memory, with
> > something like 7GB of page cache at the time, if I read this right.
>
> Suggestions:
>
> * Live compaction?
>
> Have a background process that actively defragments free memory by
> bubbling movable pages to one end of the zone and the free holes to the
> other end?
>
> Same spirit perhaps as khugepaged, periodically walk a zone from one
> end and migrate any used movable pages into the hole closest to the
> other end?
We have something like that already; it's called kcompactd.
> I dunno, doing this manually with /proc/sys/vm/compact_blah seems a
> little hamfisted to me, and maybe a background process doing it
> incrementally would be better?
>
> Also, question (for myself but also for the curious):
>
> If you're allocating memory, can you synchronously reclaim, or does the
> memory have to be free already?
Yes, we do direct reclaim if we are hitting the watermarks. kswapd will
start earlier to prevent direct reclaim, because direct reclaim incurs
latencies.
[...]
> > And yes, CONFIG_COMPACTION was enabled.
>
> Does this compact manually or automatically?
Without this option there is no compaction at all, and reclaim is the
only source of high-order pages.
> > So quite honestly, I *really* don't think that a 1kB allocation should
> > have reasonably failed and killed anything at all (ok, it could have
> > been an 8kB one, who knows - but it really looks like it *could* have
> > been just 1kB).
> >
> > Considering that kmalloc() pattern, I suspect that we need to consider
> > order-3 allocations "small", and try a lot harder.
> >
> > Because killing processes due to "out of memory" in this situation is
> > unquestionably a bug.
>
> In this case I'd wonder why the freeable-but-still-used-in-pagecache
> memory isn't being reaped at alloc time.
I've tried to explain in the other email, but let me try again. The
compaction code will back off and refrain from doing anything if we are
close to the watermarks. This was your case, as I've pointed out in the
other email. The workaround (retry as long as we are above the order-0
watermark) which is sitting in Linus' tree will prevent high-order OOMs
only if there is some memory left, which should normally be the case
because reclaim should free up something; but if you hit a parallel
allocation during reclaim, somebody might have eaten up that memory.
That's why I've said it's far from ideal, but it should at least plug
the biggest hole.
The patches from Vlastimil get us back to the compaction feedback route
which was my original design. That means we keep reclaiming while the
compaction backs off, and keep retrying as long as the compaction
doesn't fail. His changes get rid of some heuristics when we are getting
close to an OOM situation, so it should work much more reliably than my
original implementation. He doesn't have to change the detection code,
but rather changes compaction implementation details.
HTH
--
Michal Hocko
SUSE Labs
* Re: More OOM problems
2016-09-18 20:03 Linus Torvalds
` (3 preceding siblings ...)
2016-09-19 6:48 ` Michal Hocko
@ 2016-09-21 7:04 ` Raymond Jennings
2016-09-21 7:29 ` Michal Hocko
4 siblings, 1 reply; 31+ messages in thread
From: Raymond Jennings @ 2016-09-21 7:04 UTC (permalink / raw)
To: Linus Torvalds
Cc: Michal Hocko, Tetsuo Handa, Oleg Nesterov, Vladimir Davydov,
Vlastimil Babka, Andrew Morton, Markus Trippelsdorf,
Arkadiusz Miskiewicz, Ralf-Peter Rohbeck, Jiri Slaby,
Olaf Hering, Joonsoo Kim, linux-mm
On Sun, 18 Sep 2016 13:03:01 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:
> [ More or less random collection of people from previous oom patches
> and/or discussions, if you feel you shouldn't have been cc'd, blame me
> for just picking things from earlier threads and/or commits ]
>
> I'm afraid that the oom situation is still not fixed, and the "let's
> die quickly" patches are still a nasty regression.
>
> I have a 16GB desktop that I just noticed killed one of the chrome
> tabs yesterday. The machine had *tons* of freeable memory, with
> something like 7GB of page cache at the time, if I read this right.
Suggestions:
* Live compaction?
Have a background process that actively defragments free memory by
bubbling movable pages to one end of the zone and the free holes to the
other end?
Same spirit perhaps as khugepaged, periodically walk a zone from one
end and migrate any used movable pages into the hole closest to the
other end?
I dunno, doing this manually with /proc/sys/vm/compact_blah seems a
little hamfisted to me, and maybe a background process doing it
incrementally would be better?
Also, question (for myself but also for the curious):
If you're allocating memory, can you synchronously reclaim, or does the
memory have to be free already? I have a hunch that if you get caught
with freeable memory that's still being used as clean pagecache, you
should be able to free it immediately if memory is scarce...but then
again it might choke because a process in userland could always touch
it through vfs or something like that.
> The trigger is a kcalloc() in the i915 driver:
>
> Xorg invoked oom-killer:
> gfp_mask=0x240c0d0(GFP_TEMPORARY|__GFP_COMP|__GFP_ZERO), order=3,
> oom_score_adj=0
>
> __kmalloc+0x1cd/0x1f0
> alloc_gen8_temp_bitmaps+0x47/0x80 [i915]
>
> which looks like it is one of these:
>
> slabinfo - version: 2.1
> # name <active_objs> <num_objs> <objsize> <objperslab>
> <pagesperslab>
> kmalloc-8192 268 268 8192 4 8
> kmalloc-4096 732 786 4096 8 8
> kmalloc-2048 1402 1456 2048 16 8
> kmalloc-1024 2505 2976 1024 32 8
>
> so even just a 1kB allocation can cause an order-3 page allocation.
>
> And yeah, I had what, 137MB free memory, it's just that it's all
> fairly fragmented. There's actually even order-4 pages, but they are
> in low DMA memory and the system tries to protect them:
>
> Node 0 DMA: 0*4kB 1*8kB (U) 2*16kB (U) 1*32kB (U) 3*64kB (U) 2*128kB
> (U) 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB
> Node 0 DMA32: 11110*4kB (UMEH) 2929*8kB (UMEH) 44*16kB (MH) 1*32kB
> (H) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB =
> 68608kB
> Node 0 Normal: 14031*4kB (UMEH) 49*8kB (UMEH) 18*16kB (UH) 0*32kB
> 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 56804kB
> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
> hugepages_size=1048576kB
> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
> hugepages_size=2048kB
> 2084682 total pagecache pages
> 11 pages in swap cache
> Swap cache stats: add 35, delete 24, find 2/3
> Free swap = 8191868kB
> Total swap = 8191996kB
> 4168499 pages RAM
>
> And it looks like there's a fair amount of memory busy under writeback
> (470MB or so)
>
> active_anon:1539159 inactive_anon:374915 isolated_anon:0
> active_file:1251771 inactive_file:450068
> isolated_file:0
> unevictable:175 dirty:26 writeback:118690
> unstable:0 slab_reclaimable:220784 slab_unreclaimable:39819
> mapped:491617 shmem:382891
> pagetables:20439 bounce:0 free:35301 free_pcp:895 free_cma:0
>
> And yes, CONFIG_COMPACTION was enabled.
Does this compact manually or automatically?
> So quite honestly, I *really* don't think that a 1kB allocation should
> have reasonably failed and killed anything at all (ok, it could have
> been an 8kB one, who knows - but it really looks like it *could* have
> been just 1kB).
>
> Considering that kmalloc() pattern, I suspect that we need to consider
> order-3 allocations "small", and try a lot harder.
>
> Because killing processes due to "out of memory" in this situation is
> unquestionably a bug.
In this case I'd wonder why the freeable-but-still-used-in-pagecache
memory isn't being reaped at alloc time.
> And no, I can't recreate this, obviously.
>
> I think there's a series in -mm that hasn't been merged and that is
> pending (presumably for 4.9). I think Arkadiusz tested it for his
> (repeatable) workload. It may need to be considered for 4.8, because
> the above is ridiculously bad, imho.
>
> Andrew? Vlastimil? Michal? Others?
>
> Linus
>
* Re: More OOM problems
2016-09-19 14:41 ` Vlastimil Babka
2016-09-19 18:18 ` Linus Torvalds
@ 2016-09-19 19:57 ` Christoph Lameter
1 sibling, 0 replies; 31+ messages in thread
From: Christoph Lameter @ 2016-09-19 19:57 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Andi Kleen, Linus Torvalds, Michal Hocko, Tetsuo Handa,
Oleg Nesterov, Vladimir Davydov, Andrew Morton,
Markus Trippelsdorf, Arkadiusz Miskiewicz, Ralf-Peter Rohbeck,
Jiri Slaby, Olaf Hering, Joonsoo Kim, linux-mm
On Mon, 19 Sep 2016, Vlastimil Babka wrote:
> There's no __GFP_NOWARN | __GFP_NORETRY, so it clearly wasn't the
> opportunistic "initial higher-order allocation". The logical conclusion is
> that it was a genuine order-3 allocation. 1kB allocation using order-3 would
> silently fail without OOM or warning, and then fallback to order-0.
Sorry, but if you really want an object that is greater than page size
then the slab allocators won't be able to satisfy that with an order-0
allocation.
* Re: More OOM problems
2016-09-19 14:41 ` Vlastimil Babka
@ 2016-09-19 18:18 ` Linus Torvalds
2016-09-19 19:57 ` Christoph Lameter
1 sibling, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2016-09-19 18:18 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Andi Kleen, Christoph Lameter, Michal Hocko, Tetsuo Handa,
Oleg Nesterov, Vladimir Davydov, Andrew Morton,
Markus Trippelsdorf, Arkadiusz Miskiewicz, Ralf-Peter Rohbeck,
Jiri Slaby, Olaf Hering, Joonsoo Kim, linux-mm
On Mon, Sep 19, 2016 at 7:41 AM, Vlastimil Babka <vbabka@suse.cz> wrote:
>
> There's no __GFP_NOWARN | __GFP_NORETRY, so it clearly wasn't the
> opportunistic "initial higher-order allocation". The logical conclusion is
> that it was a genuine order-3 allocation. 1kB allocation using order-3 would
> silently fail without OOM or warning, and then fallback to order-0.
Yes, I think you're right. The kcalloc() probably *was* a 32kB
allocation. In which case it's really more of a i915 driver issue.
I'll talk to the drm people and see if they can perhaps fix their
allocation patterns.
Linus
* Re: More OOM problems
2016-09-19 14:31 ` Andi Kleen
2016-09-19 14:39 ` Michal Hocko
@ 2016-09-19 14:41 ` Vlastimil Babka
2016-09-19 18:18 ` Linus Torvalds
2016-09-19 19:57 ` Christoph Lameter
1 sibling, 2 replies; 31+ messages in thread
From: Vlastimil Babka @ 2016-09-19 14:41 UTC (permalink / raw)
To: Andi Kleen, Christoph Lameter
Cc: Linus Torvalds, Michal Hocko, Tetsuo Handa, Oleg Nesterov,
Vladimir Davydov, Andrew Morton, Markus Trippelsdorf,
Arkadiusz Miskiewicz, Ralf-Peter Rohbeck, Jiri Slaby,
Olaf Hering, Joonsoo Kim, linux-mm
On 09/19/2016 04:31 PM, Andi Kleen wrote:
> On Mon, Sep 19, 2016 at 08:37:36AM -0500, Christoph Lameter wrote:
>> On Sun, 18 Sep 2016, Andi Kleen wrote:
>>
>>>> Sounds like SLUB. SLAB would use order-0 as long as things fit. I would
>>>> hope for SLUB to fallback to order-0 (or order-1 for 8kB) instead of
>>>> OOM, though. Guess not...
>>>
>>> It's already trying to do that, perhaps just some flags need to be
>>> changed?
>>
>> SLUB tries order-N and falls back to order 0 on failure.
>
> Right it tries, but Linus apparently got an OOM in the order-N
> allocation. So somehow the flag combination that it passes first
> is not preventing the OOM killer.
But Linus' error was:
Xorg invoked oom-killer:
gfp_mask=0x240c0d0(GFP_TEMPORARY|__GFP_COMP|__GFP_ZERO), order=3,
oom_score_adj=0
There's no __GFP_NOWARN | __GFP_NORETRY, so it clearly wasn't the
opportunistic "initial higher-order allocation". The logical conclusion
is that it was a genuine order-3 allocation. 1kB allocation using
order-3 would silently fail without OOM or warning, and then fallback to
order-0.
> -Andi
>
* Re: More OOM problems
2016-09-19 14:31 ` Andi Kleen
@ 2016-09-19 14:39 ` Michal Hocko
2016-09-19 14:41 ` Vlastimil Babka
1 sibling, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2016-09-19 14:39 UTC (permalink / raw)
To: Andi Kleen
Cc: Christoph Lameter, Vlastimil Babka, Linus Torvalds, Tetsuo Handa,
Oleg Nesterov, Vladimir Davydov, Andrew Morton,
Markus Trippelsdorf, Arkadiusz Miskiewicz, Ralf-Peter Rohbeck,
Jiri Slaby, Olaf Hering, Joonsoo Kim, linux-mm
On Mon 19-09-16 07:31:06, Andi Kleen wrote:
> On Mon, Sep 19, 2016 at 08:37:36AM -0500, Christoph Lameter wrote:
> > On Sun, 18 Sep 2016, Andi Kleen wrote:
> >
> > > > Sounds like SLUB. SLAB would use order-0 as long as things fit. I would
> > > > hope for SLUB to fallback to order-0 (or order-1 for 8kB) instead of
> > > > OOM, though. Guess not...
> > >
> > > It's already trying to do that, perhaps just some flags need to be
> > > changed?
> >
> > SLUB tries order-N and falls back to order 0 on failure.
>
> Right, it tries, but Linus apparently got an OOM in the order-N
> allocation. So somehow the flag combination that it passes first
> is not preventing the OOM killer.
It does AFAICS:
	alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
	if ((alloc_gfp & __GFP_DIRECT_RECLAIM) && oo_order(oo) > oo_order(s->min))
		alloc_gfp = (alloc_gfp | __GFP_NOMEMALLOC) & ~(__GFP_RECLAIM|__GFP_NOFAIL);

	page = alloc_slab_page(s, alloc_gfp, node, oo);
	if (unlikely(!page)) {
		oo = s->min;
		alloc_gfp = flags;
		/*
		 * Allocation may have failed due to fragmentation.
		 * Try a lower order alloc if possible
		 */
		page = alloc_slab_page(s, alloc_gfp, node, oo);
I think that Linus just saw a genuine order-3 request
--
Michal Hocko
SUSE Labs
* Re: More OOM problems
[not found] ` <alpine.DEB.2.20.1609190836540.12121@east.gentwo.org>
@ 2016-09-19 14:31 ` Andi Kleen
2016-09-19 14:39 ` Michal Hocko
2016-09-19 14:41 ` Vlastimil Babka
0 siblings, 2 replies; 31+ messages in thread
From: Andi Kleen @ 2016-09-19 14:31 UTC (permalink / raw)
To: Christoph Lameter
Cc: Andi Kleen, Vlastimil Babka, Linus Torvalds, Michal Hocko,
Tetsuo Handa, Oleg Nesterov, Vladimir Davydov, Andrew Morton,
Markus Trippelsdorf, Arkadiusz Miskiewicz, Ralf-Peter Rohbeck,
Jiri Slaby, Olaf Hering, Joonsoo Kim, linux-mm
On Mon, Sep 19, 2016 at 08:37:36AM -0500, Christoph Lameter wrote:
> On Sun, 18 Sep 2016, Andi Kleen wrote:
>
> > > Sounds like SLUB. SLAB would use order-0 as long as things fit. I would
> > > hope for SLUB to fallback to order-0 (or order-1 for 8kB) instead of
> > > OOM, though. Guess not...
> >
> > It's already trying to do that, perhaps just some flags need to be
> > changed?
>
> SLUB tries order-N and falls back to order 0 on failure.
Right, it tries, but Linus apparently got an OOM in the order-N
allocation. So somehow the flag combination that it passes first
is not preventing the OOM killer.
-Andi
* Re: More OOM problems
2016-09-19 8:42 ` Lorenzo Stoakes
@ 2016-09-19 8:53 ` Michal Hocko
2016-09-25 21:48 ` Lorenzo Stoakes
0 siblings, 1 reply; 31+ messages in thread
From: Michal Hocko @ 2016-09-19 8:53 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Linus Torvalds, Tetsuo Handa, Oleg Nesterov, Vladimir Davydov,
Vlastimil Babka, Andrew Morton, Markus Trippelsdorf,
Arkadiusz Miskiewicz, Ralf-Peter Rohbeck, Jiri Slaby,
Olaf Hering, Joonsoo Kim, linux-mm
On Mon 19-09-16 09:42:37, Lorenzo Stoakes wrote:
> On Mon, Sep 19, 2016 at 10:32:15AM +0200, Michal Hocko wrote:
> >
> > so this is the same thing as in Linus' case. All the zones are hitting
> > the min wmark, so should_compact_retry() gave up. As mentioned in the other
> > email [1], this is an inherent limitation of the workaround. Your system is
> > swapless but there is a lot of reclaimable page cache, so Vlastimil's
> > patches should help.
>
> I will experiment with a linux-next kernel and see if the problem
> recurs. I've attempted to see if there is a way to manually reproduce
> on the mainline kernel by performing workloads that triggered the
> OOM (loading google sheets tabs, compiling a kernel, playing a video
> on youtube), but to no avail - it seems the system needs to be
> sufficiently fragmented first before it'll trigger.
>
> Given that's the case, I'll just have to try using the linux-next
> kernel and if you don't hear from me you can assume it did not repro
> again :)
OK, fair deal ;)
> I actually have a whole bunch of other OOM kill logs that I saved
> from previous occurrences of this issue, would it be useful for me to
> pastebin them, or would they not add anything of use beyond what's
> been shown in this thread?
If they are from before the workaround then it probably won't be that
useful.
--
Michal Hocko
SUSE Labs
* Re: More OOM problems
2016-09-19 8:32 ` Michal Hocko
@ 2016-09-19 8:42 ` Lorenzo Stoakes
2016-09-19 8:53 ` Michal Hocko
0 siblings, 1 reply; 31+ messages in thread
From: Lorenzo Stoakes @ 2016-09-19 8:42 UTC (permalink / raw)
To: Michal Hocko
Cc: Linus Torvalds, Tetsuo Handa, Oleg Nesterov, Vladimir Davydov,
Vlastimil Babka, Andrew Morton, Markus Trippelsdorf,
Arkadiusz Miskiewicz, Ralf-Peter Rohbeck, Jiri Slaby,
Olaf Hering, Joonsoo Kim, linux-mm
On Mon, Sep 19, 2016 at 10:32:15AM +0200, Michal Hocko wrote:
>
> so this is the same thing as in Linus' case. All the zones are hitting
> the min wmark, so should_compact_retry() gave up. As mentioned in the other
> email [1], this is an inherent limitation of the workaround. Your system is
> swapless but there is a lot of reclaimable page cache, so Vlastimil's
> patches should help.
I will experiment with a linux-next kernel and see if the problem recurs. I've attempted to see if there is a way to manually reproduce on the mainline kernel by performing workloads that triggered the OOM (loading google sheets tabs, compiling a kernel, playing a video on youtube), but to no avail - it seems the system needs to be sufficiently fragmented first before it'll trigger.
Given that's the case, I'll just have to try using the linux-next kernel and if you don't hear from me you can assume it did not repro again :)
I actually have a whole bunch of other OOM kill logs that I saved from previous occurrences of this issue, would it be useful for me to pastebin them, or would they not add anything of use beyond what's been shown in this thread?
* Re: More OOM problems
2016-09-18 20:26 ` Lorenzo Stoakes
2016-09-18 20:58 ` Linus Torvalds
@ 2016-09-19 8:32 ` Michal Hocko
2016-09-19 8:42 ` Lorenzo Stoakes
1 sibling, 1 reply; 31+ messages in thread
From: Michal Hocko @ 2016-09-19 8:32 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Linus Torvalds, Tetsuo Handa, Oleg Nesterov, Vladimir Davydov,
Vlastimil Babka, Andrew Morton, Markus Trippelsdorf,
Arkadiusz Miskiewicz, Ralf-Peter Rohbeck, Jiri Slaby,
Olaf Hering, Joonsoo Kim, linux-mm
On Sun 18-09-16 21:26:14, Lorenzo Stoakes wrote:
> Hi all,
>
> In case it's helpful - I have experienced these OOM issues invoked
> in my case via the nvidia driver and similarly to Linus an order
> 3 allocation resulted in killed chromium tabs. I encountered this
> even after applying the patch discussed in the original thread at
> https://lkml.org/lkml/2016/8/22/184. It's not easily reproducible
> but it is happening enough that I could probably check some specific
> state when it next occurs or test out a patch to see if it stops it if
> that'd be useful.
>
> I saved a couple OOM's from the last time it occurred, this is on a
> 8GiB system with plenty of reclaimable memory:
Just for the reference
> [350085.038693] Xorg invoked oom-killer: gfp_mask=0x24040c0(GFP_KERNEL|__GFP_COMP), order=3, oom_score_adj=0
> [350085.038696] Xorg cpuset=/ mems_allowed=0
> [350085.038699] CPU: 0 PID: 2119 Comm: Xorg Tainted: P O 4.7.2-1-custom #1
[...]
> [350085.039048] Mem-Info:
> [350085.039051] active_anon:861397 inactive_anon:23397 isolated_anon:0
> active_file:146274 inactive_file:144248 isolated_file:0
> unevictable:8 dirty:14587 writeback:0 unstable:0
> slab_reclaimable:697630 slab_unreclaimable:24397
> mapped:79655 shmem:26548 pagetables:7211 bounce:0
> free:25159 free_pcp:235 free_cma:0
> [350085.039054] Node 0 DMA free:15516kB min:136kB low:168kB high:200kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
> [350085.039058] lowmem_reserve[]: 0 3196 7658 7658
> [350085.039060] Node 0 DMA32 free:45980kB min:28148kB low:35184kB high:42220kB active_anon:1466208kB inactive_anon:43120kB active_file:239740kB inactive_file:234920kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3617864kB managed:3280092kB mlocked:0kB dirty:21692kB writeback:0kB mapped:131184kB shmem:47588kB slab_reclaimable:1147984kB slab_unreclaimable:37484kB kernel_stack:2976kB pagetables:11512kB unstable:0kB bounce:0kB free_pcp:188kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> [350085.039064] lowmem_reserve[]: 0 0 4462 4462
45980-(4462*4) = 28132
> [350085.039065] Node 0 Normal free:39140kB min:39296kB low:49120kB high:58944kB active_anon:1979380kB inactive_anon:50468kB active_file:345356kB inactive_file:342072kB unevictable:32kB isolated(anon):0kB isolated(file):0kB present:4702208kB managed:4569312kB mlocked:32kB dirty:36656kB writeback:0kB mapped:187436kB shmem:58604kB slab_reclaimable:1642536kB slab_unreclaimable:60104kB kernel_stack:5040kB pagetables:17332kB unstable:0kB bounce:0kB free_pcp:752kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:136 all_unreclaimable? no
so this is the same thing as in Linus' case. All the zones are hitting
the min wmark, so should_compact_retry() gave up. As mentioned in the other
email [1], this is an inherent limitation of the workaround. Your system is
swapless but there is a lot of reclaimable page cache, so Vlastimil's
patches should help.
[1] http://lkml.kernel.org/r/20160919075230.GE10785@dhcp22.suse.cz
--
Michal Hocko
SUSE Labs
* Re: More OOM problems
2016-09-19 7:01 ` Michal Hocko
@ 2016-09-19 7:52 ` Michal Hocko
0 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2016-09-19 7:52 UTC (permalink / raw)
To: Linus Torvalds
Cc: Vlastimil Babka, Tetsuo Handa, Oleg Nesterov, Vladimir Davydov,
Andrew Morton, Markus Trippelsdorf, Arkadiusz Miskiewicz,
Ralf-Peter Rohbeck, Jiri Slaby, Olaf Hering, Joonsoo Kim,
linux-mm
On Mon 19-09-16 09:01:06, Michal Hocko wrote:
> On Sun 18-09-16 14:18:22, Linus Torvalds wrote:
[...]
> > I'm not saying the code should fail and return NULL either, of course.
> >
> > So PAGE_ALLOC_COSTLY_ORDER should *not* mean "oom rather than return
> > NULL". It really has to mean "try a _lot_ harder".
>
> Agreed and Vlastimil's patches go that route. We just do not try
> sufficiently hard with the compaction.
And just to clarify why I think that Vlastimil's patches might help
here. Your allocation fails because you seem to be hitting the min watermark
even for order-0 with my workaround which is sitting in 4.8. If this is
a longer-term state then compaction doesn't even try to do anything.
With the original should_compact_retry() we would keep retrying based on
compaction_withdrawn() feedback. That would get us over the order-0
watermarks and kick the compaction in. Without Vlastimil's patches we could
still give up too early due to some of the back-off heuristics in the
compaction code. But most of those should be gone with his patches. So I
believe that they should really help here. Maybe there are still some
places to look at - I didn't get to fully review his patches (I plan to
do that this week).
So in short, the workaround we have in 4.8 currently tries to plug the
biggest hole, but the situation is not ideal. That's why I originally
hoped for the compaction feedback already in 4.8.
I fully realize this is a lot of code for the late 4.8 cycle, though. So if
this turns out to be really critical for 4.8 then what Vlastimil was
suggesting in
http://lkml.kernel.org/r/6aa81fe3-7f04-78d7-d477-609a7acd351a@suse.cz
might be another workaround on top. We can even consider completely
disabling the OOM killer for !costly orders.
--
Michal Hocko
SUSE Labs
* Re: More OOM problems
2016-09-18 21:18 ` Linus Torvalds
2016-09-19 6:27 ` Jiri Slaby
@ 2016-09-19 7:01 ` Michal Hocko
2016-09-19 7:52 ` Michal Hocko
1 sibling, 1 reply; 31+ messages in thread
From: Michal Hocko @ 2016-09-19 7:01 UTC (permalink / raw)
To: Linus Torvalds
Cc: Vlastimil Babka, Tetsuo Handa, Oleg Nesterov, Vladimir Davydov,
Andrew Morton, Markus Trippelsdorf, Arkadiusz Miskiewicz,
Ralf-Peter Rohbeck, Jiri Slaby, Olaf Hering, Joonsoo Kim,
linux-mm
On Sun 18-09-16 14:18:22, Linus Torvalds wrote:
> On Sun, Sep 18, 2016 at 2:00 PM, Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> > Sounds like SLUB. SLAB would use order-0 as long as things fit. I would
> > hope for SLUB to fallback to order-0 (or order-1 for 8kB) instead of
> > OOM, though. Guess not...
>
> SLUB it is - and I think that's pretty much all the world these days.
> SLAB is largely deprecated.
It seems there is no general consensus on that:
http://lkml.kernel.org/r/20160823153807.GN23577@dhcp22.suse.cz
> We should probably start to remove SLAB entirely, and I definitely
> hope that no oom people run with it. SLUB is marked default in our
> config files, and I think most distros follow that (I know Fedora
> does, didn't check others).
>
> > Well, order-3 is actually PAGE_ALLOC_COSTLY_ORDER, and costly orders
> > have to be strictly larger in all the tests. So order-3 is in fact still
> > considered "small", and thus it actually results in OOM instead of
> > allocation failure.
>
> Yeah, but I do think that "oom when you have 156MB free and 7GB
> reclaimable, and haven't even tried swapping" counts as obviously
> wrong.
The thing is that swapping doesn't really help. You can easily migrate
anonymous pages to create larger blocks even without reclaiming them.
So I still believe compaction is giving up too easily.
> I'm not saying the code should fail and return NULL either, of course.
>
> So PAGE_ALLOC_COSTLY_ORDER should *not* mean "oom rather than return
> NULL". It really has to mean "try a _lot_ harder".
Agreed and Vlastimil's patches go that route. We just do not try
sufficiently hard with the compaction.
--
Michal Hocko
SUSE Labs
* Re: More OOM problems
2016-09-18 22:00 ` Vlastimil Babka
@ 2016-09-19 6:56 ` Michal Hocko
0 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2016-09-19 6:56 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Linus Torvalds, Tetsuo Handa, Oleg Nesterov, Vladimir Davydov,
Andrew Morton, Markus Trippelsdorf, Arkadiusz Miskiewicz,
Ralf-Peter Rohbeck, Jiri Slaby, Olaf Hering, Joonsoo Kim,
linux-mm
On Mon 19-09-16 00:00:24, Vlastimil Babka wrote:
[...]
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a2214c64ed3c..9b3b3a79c58a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3347,17 +3347,24 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
>  					ac->nodemask) {
>  		unsigned long available;
>  		unsigned long reclaimable;
> +		int check_order = order;
> +		unsigned long watermark = min_wmark_pages(zone);
>  
>  		available = reclaimable = zone_reclaimable_pages(zone);
>  		available -= DIV_ROUND_UP(no_progress_loops * available,
>  					  MAX_RECLAIM_RETRIES);
>  		available += zone_page_state_snapshot(zone, NR_FREE_PAGES);
>  
> +		if (order > 0 && order <= PAGE_ALLOC_COSTLY_ORDER) {
> +			check_order = 0;
> +			watermark += 1UL << order;
> +		}
> +
>  		/*
>  		 * Would the allocation succeed if we reclaimed the whole
>  		 * available?
>  		 */
> -		if (__zone_watermark_ok(zone, order, min_wmark_pages(zone),
> +		if (__zone_watermark_ok(zone, check_order, watermark,
>  				ac_classzone_idx(ac), alloc_flags, available)) {
>  			/*
>  			 * If we didn't make any progress and have a lot of
Joonsoo was suggesting something like this before and I really hated
that. We can very well just not invoke the OOM killer for those requests
at all and rely on a smaller order request to trigger it for us. But
who knows, maybe we will have no other option than to bite the bullet,
declare defeat, and do something special for !costly orders.
--
Michal Hocko
SUSE Labs
* Re: More OOM problems
2016-09-18 20:03 Linus Torvalds
` (2 preceding siblings ...)
2016-09-18 22:00 ` Vlastimil Babka
@ 2016-09-19 6:48 ` Michal Hocko
2016-09-21 7:04 ` Raymond Jennings
4 siblings, 0 replies; 31+ messages in thread
From: Michal Hocko @ 2016-09-19 6:48 UTC (permalink / raw)
To: Linus Torvalds
Cc: Tetsuo Handa, Oleg Nesterov, Vladimir Davydov, Vlastimil Babka,
Andrew Morton, Markus Trippelsdorf, Arkadiusz Miskiewicz,
Ralf-Peter Rohbeck, Jiri Slaby, Olaf Hering, Joonsoo Kim,
linux-mm
On Sun 18-09-16 13:03:01, Linus Torvalds wrote:
> [ More or less random collection of people from previous oom patches
> and/or discussions, if you feel you shouldn't have been cc'd, blame me
> for just picking things from earlier threads and/or commits ]
>
> I'm afraid that the oom situation is still not fixed, and the "let's
> die quickly" patches are still a nasty regression.
>
> I have a 16GB desktop that I just noticed killed one of the chrome
> tabs yesterday. Tha machine had *tons* of freeable memory, with
> something like 7GB of page cache at the time, if I read this right.
>
> The trigger is a kcalloc() in the i915 driver:
>
> Xorg invoked oom-killer:
> gfp_mask=0x240c0d0(GFP_TEMPORARY|__GFP_COMP|__GFP_ZERO), order=3,
> oom_score_adj=0
>
> __kmalloc+0x1cd/0x1f0
> alloc_gen8_temp_bitmaps+0x47/0x80 [i915]
>
> which looks like it is one of these:
>
> slabinfo - version: 2.1
> # name <active_objs> <num_objs> <objsize> <objperslab>
> <pagesperslab>
> kmalloc-8192 268 268 8192 4 8
> kmalloc-4096 732 786 4096 8 8
> kmalloc-2048 1402 1456 2048 16 8
> kmalloc-1024 2505 2976 1024 32 8
>
> so even just a 1kB allocation can cause an order-3 page allocation.
Yes, it can trigger order-3, but that should be just
	alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
so not triggering OOM and failing early rather than retrying really hard.
Considering the above gfp_mask, this seems like a real order-3 sized
request.
> And yeah, I had what, 137MB free memory, it's just that it's all
> fairly fragmented.
137MB in your case means that all usable zones are not meeting the min
wmark so 6b4e3181d7bd ("mm, oom: prevent premature OOM killer invocation
for high order request") didn't stop the OOM.
[...]
> So quite honestly, I *really* don't think that a 1kB allocation should
> have reasonably failed and killed anything at all (ok, it could have
> been an 8kB one, who knows - but it really looks like it *could* have
> been just 1kB).
Unless I am missing something, this should really be a 32k request. It is
true that retrying somewhat or much harder might help here; indeed, this is
really hard to tell. Vlastimil's patches you have mentioned might really
help here because they get rid of most of the heuristics that
would give up just too early. But I am also wondering whether a more
pragmatic approach in this case would be to simply use __GFP_NORETRY and
fall back to vmalloc. Note that I am not familiar with the code and
vmalloc might be a no-go, but it is at least worth exploring this option.
--
Michal Hocko
SUSE Labs
* Re: More OOM problems
2016-09-18 21:18 ` Linus Torvalds
@ 2016-09-19 6:27 ` Jiri Slaby
2016-09-19 7:01 ` Michal Hocko
1 sibling, 0 replies; 31+ messages in thread
From: Jiri Slaby @ 2016-09-19 6:27 UTC (permalink / raw)
To: Linus Torvalds, Vlastimil Babka
Cc: Olaf Hering, Arkadiusz Miskiewicz, Joonsoo Kim, Tetsuo Handa,
Michal Hocko, linux-mm, Andrew Morton, Vladimir Davydov,
Ralf-Peter Rohbeck, Oleg Nesterov, Markus Trippelsdorf
On 09/18/2016, 11:18 PM, Linus Torvalds wrote:
> SLUB is marked default in our
> config files, and I think most distros follow that (I know Fedora
> does, didn't check others).
For the reference, all active SUSE kernels use SLAB.
thanks,
--
js
suse labs
* Re: More OOM problems
2016-09-18 21:00 ` Vlastimil Babka
2016-09-18 21:18 ` Linus Torvalds
@ 2016-09-19 1:07 ` Andi Kleen
[not found] ` <alpine.DEB.2.20.1609190836540.12121@east.gentwo.org>
1 sibling, 1 reply; 31+ messages in thread
From: Andi Kleen @ 2016-09-19 1:07 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Linus Torvalds, Michal Hocko, Tetsuo Handa, Oleg Nesterov,
Vladimir Davydov, Andrew Morton, Markus Trippelsdorf,
Arkadiusz Miskiewicz, Ralf-Peter Rohbeck, Jiri Slaby,
Olaf Hering, Joonsoo Kim, linux-mm, cl
Vlastimil Babka <vbabka@suse.cz> writes:
>>
>> The trigger is a kcalloc() in the i915 driver:
>>
>> Xorg invoked oom-killer:
>> gfp_mask=0x240c0d0(GFP_TEMPORARY|__GFP_COMP|__GFP_ZERO), order=3,
>> oom_score_adj=0
>>
>> __kmalloc+0x1cd/0x1f0
>> alloc_gen8_temp_bitmaps+0x47/0x80 [i915]
>>
>> which looks like it is one of these:
>>
>> slabinfo - version: 2.1
>> # name <active_objs> <num_objs> <objsize> <objperslab>
>> <pagesperslab>
>> kmalloc-8192 268 268 8192 4 8
>> kmalloc-4096 732 786 4096 8 8
>> kmalloc-2048 1402 1456 2048 16 8
>> kmalloc-1024 2505 2976 1024 32 8
>>
>> so even just a 1kB allocation can cause an order-3 page allocation.
>
> Sounds like SLUB. SLAB would use order-0 as long as things fit. I would
> hope for SLUB to fallback to order-0 (or order-1 for 8kB) instead of
> OOM, though. Guess not...
It's already trying to do that, perhaps just some flags need to be
changed?
Adding Christoph.
	flags |= s->allocflags;

	/*
	 * Let the initial higher-order allocation fail under memory pressure
	 * so we fall-back to the minimum order allocation.
	 */
	alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;

	page = alloc_slab_page(alloc_gfp, node, oo);
	if (unlikely(!page)) {
		oo = s->min;
		/*
		 * Allocation may have failed due to fragmentation.
		 * Try a lower order alloc if possible
		 */
		page = alloc_slab_page(flags, node, oo);

		if (page)
			stat(s, ORDER_FALLBACK);
	}
-Andi
* Re: More OOM problems
2016-09-18 20:03 Linus Torvalds
2016-09-18 20:26 ` Lorenzo Stoakes
2016-09-18 21:00 ` Vlastimil Babka
@ 2016-09-18 22:00 ` Vlastimil Babka
2016-09-19 6:56 ` Michal Hocko
2016-09-19 6:48 ` Michal Hocko
2016-09-21 7:04 ` Raymond Jennings
4 siblings, 1 reply; 31+ messages in thread
From: Vlastimil Babka @ 2016-09-18 22:00 UTC (permalink / raw)
To: Linus Torvalds, Michal Hocko, Tetsuo Handa, Oleg Nesterov,
Vladimir Davydov
Cc: Andrew Morton, Markus Trippelsdorf, Arkadiusz Miskiewicz,
Ralf-Peter Rohbeck, Jiri Slaby, Olaf Hering, Joonsoo Kim,
linux-mm
On 09/18/2016 10:03 PM, Linus Torvalds wrote:
> [ More or less random collection of people from previous oom patches
> and/or discussions, if you feel you shouldn't have been cc'd, blame
> me for just picking things from earlier threads and/or commits ]
>
> I'm afraid that the oom situation is still not fixed, and the "let's
> die quickly" patches are still a nasty regression.
So I'm trying to understand the core of the regression compared to
pre-4.7. It can't be the compaction feedback, as that was reverted, and
compaction itself shouldn't perform worse than pre-4.7. This leaves us
with should_reclaim_retry() returning false. It can return false if:
But we have this in __alloc_pages_slowpath():
	if (did_some_progress && order <= PAGE_ALLOC_COSTLY_ORDER)
		no_progress_loops = 0;
I doubt reclaim makes no progress in your case, and the non-costly order
is also true. So, unlikely.
2) The watermark check that includes estimate for pages available for
reclaim fails.
Could be that the backoff in the calculation of "available" in
should_reclaim_retry() is too aggressive. But it depends on
no_progress_loops, which I think is 0 (see above). Again, unlikely.
But the watermark check doesn't actually work for order-1+ allocations;
the "available" estimate only affects the order-0 check. For higher orders
it will be false if a page of sufficient order doesn't already exist.
That's fine if we trust should_compact_retry() in such a case.
But Joonsoo already had a theoretical scenario where this can fall apart:
http://lkml.kernel.org/r/<20160824050157.GA22781@js1304-P5Q-DELUXE>
See the part that starts at "Assume following situation:". I suspect
something like that happened here.
I think at least temporarily we'll have to make the watermark check
an order-0 check for non-costly orders.
Something like below (untested)?
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a2214c64ed3c..9b3b3a79c58a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3347,17 +3347,24 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
 					ac->nodemask) {
 		unsigned long available;
 		unsigned long reclaimable;
+		int check_order = order;
+		unsigned long watermark = min_wmark_pages(zone);
 
 		available = reclaimable = zone_reclaimable_pages(zone);
 		available -= DIV_ROUND_UP(no_progress_loops * available,
 					  MAX_RECLAIM_RETRIES);
 		available += zone_page_state_snapshot(zone, NR_FREE_PAGES);
 
+		if (order > 0 && order <= PAGE_ALLOC_COSTLY_ORDER) {
+			check_order = 0;
+			watermark += 1UL << order;
+		}
+
 		/*
 		 * Would the allocation succeed if we reclaimed the whole
 		 * available?
 		 */
-		if (__zone_watermark_ok(zone, order, min_wmark_pages(zone),
+		if (__zone_watermark_ok(zone, check_order, watermark,
 				ac_classzone_idx(ac), alloc_flags, available)) {
 			/*
 			 * If we didn't make any progress and have a lot of
* Re: More OOM problems
2016-09-18 21:13 ` Vlastimil Babka
@ 2016-09-18 21:34 ` Lorenzo Stoakes
0 siblings, 0 replies; 31+ messages in thread
From: Lorenzo Stoakes @ 2016-09-18 21:34 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Linus Torvalds, Michal Hocko, Tetsuo Handa, Oleg Nesterov,
Vladimir Davydov, Andrew Morton, Markus Trippelsdorf,
Arkadiusz Miskiewicz, Ralf-Peter Rohbeck, Jiri Slaby,
Olaf Hering, Joonsoo Kim, linux-mm
On Sun, Sep 18, 2016 at 11:13:36PM +0200, Vlastimil Babka wrote:
>
> The 4 patches above had more as prerequisities already in -mm. So one
> way to test is the whole tree:
> git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
> tag mmotm-2016-09-14-16-49
>
> or just a recent -next.
>
Thanks, I will try this out (probably using a recent -next.)
* Re: More OOM problems
2016-09-18 21:00 ` Vlastimil Babka
@ 2016-09-18 21:18 ` Linus Torvalds
2016-09-19 6:27 ` Jiri Slaby
2016-09-19 7:01 ` Michal Hocko
2016-09-19 1:07 ` Andi Kleen
1 sibling, 2 replies; 31+ messages in thread
From: Linus Torvalds @ 2016-09-18 21:18 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Michal Hocko, Tetsuo Handa, Oleg Nesterov, Vladimir Davydov,
Andrew Morton, Markus Trippelsdorf, Arkadiusz Miskiewicz,
Ralf-Peter Rohbeck, Jiri Slaby, Olaf Hering, Joonsoo Kim,
linux-mm
On Sun, Sep 18, 2016 at 2:00 PM, Vlastimil Babka <vbabka@suse.cz> wrote:
>
> Sounds like SLUB. SLAB would use order-0 as long as things fit. I would
> hope for SLUB to fallback to order-0 (or order-1 for 8kB) instead of
> OOM, though. Guess not...
SLUB it is - and I think that's pretty much all the world these days.
SLAB is largely deprecated.
We should probably start to remove SLAB entirely, and I definitely
hope that no oom people run with it. SLUB is marked default in our
config files, and I think most distros follow that (I know Fedora
does, didn't check others).
> Well, order-3 is actually PAGE_ALLOC_COSTLY_ORDER, and costly orders
> have to be strictly larger in all the tests. So order-3 is in fact still
> considered "small", and thus it actually results in OOM instead of
> allocation failure.
Yeah, but I do think that "oom when you have 156MB free and 7GB
reclaimable, and haven't even tried swapping" counts as obviously
wrong.
I'm not saying the code should fail and return NULL either, of course.
So PAGE_ALLOC_COSTLY_ORDER should *not* mean "oom rather than return
NULL". It really has to mean "try a _lot_ harder".
Linus
* Re: More OOM problems
2016-09-18 20:58 ` Linus Torvalds
@ 2016-09-18 21:13 ` Vlastimil Babka
2016-09-18 21:34 ` Lorenzo Stoakes
0 siblings, 1 reply; 31+ messages in thread
From: Vlastimil Babka @ 2016-09-18 21:13 UTC (permalink / raw)
To: Linus Torvalds, Lorenzo Stoakes
Cc: Michal Hocko, Tetsuo Handa, Oleg Nesterov, Vladimir Davydov,
Andrew Morton, Markus Trippelsdorf, Arkadiusz Miskiewicz,
Ralf-Peter Rohbeck, Jiri Slaby, Olaf Hering, Joonsoo Kim,
linux-mm
On 09/18/2016 10:58 PM, Linus Torvalds wrote:
> On Sun, Sep 18, 2016 at 1:26 PM, Lorenzo Stoakes <lstoakes@gmail.com> wrote:
>>
>> I encountered this even after applying the patch discussed in the
>> original thread at https://lkml.org/lkml/2016/8/22/184. It's not easily
>> reproducible but it is happening enough that I could probably check some
>> specific state when it next occurs or test out a patch to see if it
>> stops it if that'd be useful.
>
> Since you can at least try to recreate it, how about the series in -mm
> by Vlastimil? The series was called "reintroduce compaction feedback
> for OOM decisions", and is in -mm right now:
>
> Vlastimil Babka (4):
> Revert "mm, oom: prevent premature OOM killer invocation for high
> order request"
> mm, compaction: more reliably increase direct compaction priority
> mm, compaction: restrict full priority to non-costly orders
> mm, compaction: make full priority ignore pageblock suitability
>
> I'm not sure if Andrew has any other ones pending that are relevant to oom.
The 4 patches above have further prerequisites already in -mm. So one
way to test is the whole tree:
git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
tag mmotm-2016-09-14-16-49
or just a recent -next.
* Re: More OOM problems
2016-09-18 20:03 Linus Torvalds
2016-09-18 20:26 ` Lorenzo Stoakes
@ 2016-09-18 21:00 ` Vlastimil Babka
2016-09-18 21:18 ` Linus Torvalds
2016-09-19 1:07 ` Andi Kleen
2016-09-18 22:00 ` Vlastimil Babka
` (2 subsequent siblings)
4 siblings, 2 replies; 31+ messages in thread
From: Vlastimil Babka @ 2016-09-18 21:00 UTC (permalink / raw)
To: Linus Torvalds, Michal Hocko, Tetsuo Handa, Oleg Nesterov,
Vladimir Davydov
Cc: Andrew Morton, Markus Trippelsdorf, Arkadiusz Miskiewicz,
Ralf-Peter Rohbeck, Jiri Slaby, Olaf Hering, Joonsoo Kim,
linux-mm
On 09/18/2016 10:03 PM, Linus Torvalds wrote:
> [ More or less random collection of people from previous oom patches
> and/or discussions, if you feel you shouldn't have been cc'd, blame me
> for just picking things from earlier threads and/or commits ]
>
> I'm afraid that the oom situation is still not fixed, and the "let's
> die quickly" patches are still a nasty regression.
>
> I have a 16GB desktop that I just noticed killed one of the chrome
tabs yesterday. The machine had *tons* of freeable memory, with
> something like 7GB of page cache at the time, if I read this right.
>
> The trigger is a kcalloc() in the i915 driver:
>
> Xorg invoked oom-killer:
> gfp_mask=0x240c0d0(GFP_TEMPORARY|__GFP_COMP|__GFP_ZERO), order=3,
> oom_score_adj=0
>
> __kmalloc+0x1cd/0x1f0
> alloc_gen8_temp_bitmaps+0x47/0x80 [i915]
>
> which looks like it is one of these:
>
> slabinfo - version: 2.1
> # name <active_objs> <num_objs> <objsize> <objperslab>
> <pagesperslab>
> kmalloc-8192 268 268 8192 4 8
> kmalloc-4096 732 786 4096 8 8
> kmalloc-2048 1402 1456 2048 16 8
> kmalloc-1024 2505 2976 1024 32 8
>
> so even just a 1kB allocation can cause an order-3 page allocation.
Sounds like SLUB. SLAB would use order-0 as long as things fit. I would
hope for SLUB to fallback to order-0 (or order-1 for 8kB) instead of
OOM, though. Guess not...
> And yeah, I had what, 137MB free memory, it's just that it's all
> fairly fragmented. There's actually even order-4 pages, but they are
> in low DMA memory and the system tries to protect them:
>
> Node 0 DMA: 0*4kB 1*8kB (U) 2*16kB (U) 1*32kB (U) 3*64kB (U) 2*128kB
> (U) 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB
> Node 0 DMA32: 11110*4kB (UMEH) 2929*8kB (UMEH) 44*16kB (MH) 1*32kB
> (H) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB =
> 68608kB
> Node 0 Normal: 14031*4kB (UMEH) 49*8kB (UMEH) 18*16kB (UH) 0*32kB
> 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 56804kB
> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
> hugepages_size=1048576kB
> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
> hugepages_size=2048kB
> 2084682 total pagecache pages
> 11 pages in swap cache
> Swap cache stats: add 35, delete 24, find 2/3
> Free swap = 8191868kB
> Total swap = 8191996kB
> 4168499 pages RAM
>
> And it looks like there's a fair amount of memory busy under writeback
> (470MB or so)
>
> active_anon:1539159 inactive_anon:374915 isolated_anon:0
> active_file:1251771 inactive_file:450068
> isolated_file:0
> unevictable:175 dirty:26 writeback:118690 unstable:0
> slab_reclaimable:220784 slab_unreclaimable:39819
> mapped:491617 shmem:382891 pagetables:20439 bounce:0
> free:35301 free_pcp:895 free_cma:0
>
> And yes, CONFIG_COMPACTION was enabled.
>
> So quite honestly, I *really* don't think that a 1kB allocation should
> have reasonably failed and killed anything at all (ok, it could have
> been an 8kB one, who knows - but it really looks like it *could* have
> been just 1kB).
>
> Considering that kmalloc() pattern, I suspect that we need to consider
> order-3 allocations "small", and try a lot harder.
Well, order-3 is actually PAGE_ALLOC_COSTLY_ORDER, and costly orders
have to be strictly larger in all the tests. So order-3 is in fact still
considered "small", and thus it actually results in OOM instead of
allocation failure.
> Because killing processes due to "out of memory" in this situation is
> unquestionably a bug.
>
> And no, I can't recreate this, obviously.
>
> I think there's a series in -mm that hasn't been merged and that is
> pending (presumably for 4.9). I think Arkadiusz tested it for his
> (repeatable) workload. It may need to be considered for 4.8, because
> the above is ridiculously bad, imho.
So this series will make compaction ignore most of its heuristics
intended for reducing latency, when we keep repeating reclaim/compaction
long enough without success. This should help. But it also restores the
feedback from compaction to the retry loop (that Michal disabled for 4.8
and 4.7.x stable due to the earlier reports). So the result might not be
a clear win and that's why I hoped for more testing (thanks Arkadiusz,
though).
> Andrew? Vlastimil? Michal? Others?
>
> Linus
>
* Re: More OOM problems
2016-09-18 20:26 ` Lorenzo Stoakes
@ 2016-09-18 20:58 ` Linus Torvalds
2016-09-18 21:13 ` Vlastimil Babka
2016-09-19 8:32 ` Michal Hocko
1 sibling, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2016-09-18 20:58 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Michal Hocko, Tetsuo Handa, Oleg Nesterov, Vladimir Davydov,
Vlastimil Babka, Andrew Morton, Markus Trippelsdorf,
Arkadiusz Miskiewicz, Ralf-Peter Rohbeck, Jiri Slaby,
Olaf Hering, Joonsoo Kim, linux-mm
On Sun, Sep 18, 2016 at 1:26 PM, Lorenzo Stoakes <lstoakes@gmail.com> wrote:
>
> I encountered this even after applying the patch discussed in the
> original thread at https://lkml.org/lkml/2016/8/22/184. It's not easily
> reproducible but it is happening enough that I could probably check some
> specific state when it next occurs or test out a patch to see if it
> stops it if that'd be useful.
Since you can at least try to recreate it, how about the series in -mm
by Vlastimil? The series was called "reintroduce compaction feedback
for OOM decisions", and is in -mm right now:
Vlastimil Babka (4):
Revert "mm, oom: prevent premature OOM killer invocation for high
order request"
mm, compaction: more reliably increase direct compaction priority
mm, compaction: restrict full priority to non-costly orders
mm, compaction: make full priority ignore pageblock suitability
I'm not sure if Andrew has any other ones pending that are relevant to oom.
A lot of the oom discussion seemed to be about the task stack
allocation (order-2), but kmalloc() really can and does trigger those
order-3 allocations even for small allocations.
Just as an example, these are the slab entries for me that are order-3:
bio-1, UDPv6, TCPv6, kcopyd_job, dm_uevent, mqueue_inode_cache,
ext4_inode_cache, pid_namespace, PING, UDP, TCP, request_queue,
net_namespace, bdev_cache, mm_struct, signal_cache, sighand_cache,
task_struct, idr_layer_cache, dma-kmalloc-8192, dma-kmalloc-4096,
dma-kmalloc-2048, dma-kmalloc-1024, kmalloc-8192, kmalloc-4096,
kmalloc-2048, kmalloc-1024
and most of those are 1-2kB in size.
Of course, any slab allocation failure is harder to trigger just
because slab itself ends up often having empty cache entries, so only
a small percentage makes it to the page allocator itself. But the page
allocator failure case really needs to treat PAGE_ALLOC_COSTLY_ORDER
specially.
Which implies that if compaction is what makes these page allocations
succeed, then compaction needs to treat that order specially too.
Linus
* Re: More OOM problems
2016-09-18 20:03 Linus Torvalds
@ 2016-09-18 20:26 ` Lorenzo Stoakes
2016-09-18 20:58 ` Linus Torvalds
2016-09-19 8:32 ` Michal Hocko
2016-09-18 21:00 ` Vlastimil Babka
` (3 subsequent siblings)
4 siblings, 2 replies; 31+ messages in thread
From: Lorenzo Stoakes @ 2016-09-18 20:26 UTC (permalink / raw)
To: Linus Torvalds
Cc: Michal Hocko, Tetsuo Handa, Oleg Nesterov, Vladimir Davydov,
Vlastimil Babka, Andrew Morton, Markus Trippelsdorf,
Arkadiusz Miskiewicz, Ralf-Peter Rohbeck, Jiri Slaby,
Olaf Hering, Joonsoo Kim, linux-mm
Hi all,
In case it's helpful: I have experienced these OOM issues too, invoked in my case via the nvidia driver; similarly to Linus, an order-3 allocation resulted in killed chromium tabs. I encountered this even after applying the patch discussed in the original thread at https://lkml.org/lkml/2016/8/22/184. It's not easily reproducible but it is happening enough that I could probably check some specific state when it next occurs or test out a patch to see if it stops it if that'd be useful.
I saved a couple of OOMs from the last time it occurred; this is on an 8GiB system with plenty of reclaimable memory:
[350085.038693] Xorg invoked oom-killer: gfp_mask=0x24040c0(GFP_KERNEL|__GFP_COMP), order=3, oom_score_adj=0
[350085.038696] Xorg cpuset=/ mems_allowed=0
[350085.038699] CPU: 0 PID: 2119 Comm: Xorg Tainted: P O 4.7.2-1-custom #1
[350085.038701] Hardware name: MSI MS-7850/Z97 PC Mate(MS-7850), BIOS V4.10 08/11/2015
[350085.038702] 0000000000000286 000000009fd6569c ffff88020c60f940 ffffffff812eb122
[350085.038704] ffff88020c60fb18 ffff8800b4cfaac0 ffff88020c60f9b0 ffffffff811f6e4c
[350085.038706] 0000000000000246 ffff880200000000 ffff88020c60f970 ffffffff00000001
[350085.038708] Call Trace:
[350085.038712] [<ffffffff812eb122>] dump_stack+0x63/0x81
[350085.038715] [<ffffffff811f6e4c>] dump_header+0x60/0x1e8
[350085.038718] [<ffffffff811762fa>] oom_kill_process+0x22a/0x440
[350085.038720] [<ffffffff8117696a>] out_of_memory+0x40a/0x4b0
[350085.038723] [<ffffffff812ffdf8>] ? find_next_bit+0x18/0x20
[350085.038725] [<ffffffff8117c034>] __alloc_pages_nodemask+0xee4/0xf20
[350085.038727] [<ffffffff811cb835>] alloc_pages_current+0x95/0x140
[350085.038729] [<ffffffff8117c2f9>] alloc_kmem_pages+0x19/0x90
[350085.038731] [<ffffffff8119a79e>] kmalloc_order_trace+0x2e/0x100
[350085.038733] [<ffffffff811d6bd3>] __kmalloc+0x213/0x230
[350085.038745] [<ffffffffa147d2c7>] nvkms_alloc+0x27/0x60 [nvidia_modeset]
[350085.038752] [<ffffffffa147e540>] ? _nv000318kms+0x40/0x40 [nvidia_modeset]
[350085.038760] [<ffffffffa14b7eea>] _nv001929kms+0x1a/0x30 [nvidia_modeset]
[350085.038767] [<ffffffffa14a4242>] ? _nv001878kms+0x32/0xcf0 [nvidia_modeset]
[350085.038768] [<ffffffff8117c2f9>] ? alloc_kmem_pages+0x19/0x90
[350085.038770] [<ffffffff811d6bd3>] ? __kmalloc+0x213/0x230
[350085.038776] [<ffffffffa147d2c7>] ? nvkms_alloc+0x27/0x60 [nvidia_modeset]
[350085.038782] [<ffffffffa147e540>] ? _nv000318kms+0x40/0x40 [nvidia_modeset]
[350085.038788] [<ffffffffa147e56e>] ? _nv000169kms+0x2e/0x40 [nvidia_modeset]
[350085.038794] [<ffffffffa147f0c1>] ? nvKmsIoctl+0x161/0x1e0 [nvidia_modeset]
[350085.038800] [<ffffffffa147dd65>] ? nvkms_ioctl_common+0x45/0x80 [nvidia_modeset]
[350085.038806] [<ffffffffa147de11>] ? nvkms_ioctl+0x71/0xa0 [nvidia_modeset]
[350085.038962] [<ffffffffa0831080>] ? nvidia_frontend_compat_ioctl+0x40/0x50 [nvidia]
[350085.039032] [<ffffffffa083109e>] ? nvidia_frontend_unlocked_ioctl+0xe/0x10 [nvidia]
[350085.039035] [<ffffffff8120cd62>] ? do_vfs_ioctl+0xa2/0x5d0
[350085.039037] [<ffffffff8120d309>] ? SyS_ioctl+0x79/0x90
[350085.039039] [<ffffffff815de7b2>] ? entry_SYSCALL_64_fastpath+0x1a/0xa4
[350085.039048] Mem-Info:
[350085.039051] active_anon:861397 inactive_anon:23397 isolated_anon:0
active_file:146274 inactive_file:144248 isolated_file:0
unevictable:8 dirty:14587 writeback:0 unstable:0
slab_reclaimable:697630 slab_unreclaimable:24397
mapped:79655 shmem:26548 pagetables:7211 bounce:0
free:25159 free_pcp:235 free_cma:0
[350085.039054] Node 0 DMA free:15516kB min:136kB low:168kB high:200kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[350085.039058] lowmem_reserve[]: 0 3196 7658 7658
[350085.039060] Node 0 DMA32 free:45980kB min:28148kB low:35184kB high:42220kB active_anon:1466208kB inactive_anon:43120kB active_file:239740kB inactive_file:234920kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3617864kB managed:3280092kB mlocked:0kB dirty:21692kB writeback:0kB mapped:131184kB shmem:47588kB slab_reclaimable:1147984kB slab_unreclaimable:37484kB kernel_stack:2976kB pagetables:11512kB unstable:0kB bounce:0kB free_pcp:188kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[350085.039064] lowmem_reserve[]: 0 0 4462 4462
[350085.039065] Node 0 Normal free:39140kB min:39296kB low:49120kB high:58944kB active_anon:1979380kB inactive_anon:50468kB active_file:345356kB inactive_file:342072kB unevictable:32kB isolated(anon):0kB isolated(file):0kB present:4702208kB managed:4569312kB mlocked:32kB dirty:36656kB writeback:0kB mapped:187436kB shmem:58604kB slab_reclaimable:1642536kB slab_unreclaimable:60104kB kernel_stack:5040kB pagetables:17332kB unstable:0kB bounce:0kB free_pcp:752kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:136 all_unreclaimable? no
[350085.039069] lowmem_reserve[]: 0 0 0 0
[350085.039071] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15516kB
[350085.039077] Node 0 DMA32: 11569*4kB (UME) 50*8kB (M) 2*16kB (M) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 46708kB
[350085.039083] Node 0 Normal: 9282*4kB (UE) 0*8kB 4*16kB (H) 2*32kB (H) 3*64kB (H) 1*128kB (H) 2*256kB (H) 2*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 39112kB
[350085.039090] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[350085.039092] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[350085.039092] 316873 total pagecache pages
[350085.039093] 0 pages in swap cache
[350085.039094] Swap cache stats: add 0, delete 0, find 0/0
[350085.039095] Free swap = 0kB
[350085.039096] Total swap = 0kB
[350085.039097] 2084014 pages RAM
[350085.039097] 0 pages HighMem/MovableOnly
[350085.039098] 117688 pages reserved
[350085.039099] 0 pages hwpoisoned
[350085.039099] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[350085.039107] [ 208] 0 208 19325 3639 35 3 0 0 systemd-journal
[350085.039109] [ 265] 0 265 9102 818 18 3 0 -1000 systemd-udevd
[350085.039113] [ 1809] 192 1809 28539 542 25 3 0 0 systemd-timesyn
[350085.039115] [ 1822] 81 1822 8296 653 21 4 0 -900 dbus-daemon
[350085.039117] [ 1832] 0 1832 9629 511 22 3 0 0 systemd-logind
[350085.039119] [ 1862] 0 1862 3260 611 10 3 0 0 crond
[350085.039121] [ 1910] 1000 1910 3505 776 11 3 0 0 devmon
[350085.039122] [ 2041] 1000 2041 6453 707 18 3 0 0 udevil
[350085.039124] [ 2050] 0 2050 1685 77 9 3 0 0 dhcpcd
[350085.039126] [ 2051] 0 2051 132774 4634 65 6 0 -500 dockerd
[350085.039128] [ 2057] 0 2057 10099 746 25 3 0 -1000 sshd
[350085.039130] [ 2085] 0 2085 108773 1180 30 5 0 -500 docker-containe
[350085.039132] [ 2100] 0 2100 66532 937 32 3 0 0 lightdm
[350085.039134] [ 2119] 0 2119 50761 17423 97 3 0 0 Xorg
[350085.039136] [ 2123] 0 2123 68666 1431 36 3 0 0 accounts-daemon
[350085.039138] [ 2135] 102 2135 129825 2707 50 4 0 0 polkitd
[350085.039142] [ 2562] 0 2562 65038 1104 59 3 0 0 lightdm
[350085.039144] [ 2572] 1000 2572 13695 1020 29 3 0 0 systemd
[350085.039146] [ 2577] 1000 2577 24641 397 48 3 0 0 (sd-pam)
[350085.039147] [ 2584] 1000 2584 32170 1929 66 3 0 0 i3
[350085.039149] [ 2596] 1000 2596 2788 184 10 3 0 0 ssh-agent
[350085.039151] [ 2603] 1000 2603 8212 576 20 3 0 0 dbus-daemon
[350085.039153] [ 2605] 1000 2605 27268 1497 56 3 0 0 i3bar
[350085.039155] [ 2606] 1000 2606 3406 459 10 3 0 0 measure-net-spe
[350085.039157] [ 2607] 1000 2607 17782 505 40 3 0 0 i3status
[350085.039159] [ 2608] 1000 2608 3406 585 10 3 0 0 measure-net-spe
[350085.039161] [ 2658] 1000 2658 67811 849 34 3 0 0 gvfsd
[350085.039163] [ 2663] 1000 2663 84725 1246 31 3 0 0 gvfsd-fuse
[350085.039164] [ 2671] 1000 2671 84401 651 33 3 0 0 at-spi-bus-laun
[350085.039166] [ 2676] 1000 2676 8186 714 20 3 0 0 dbus-daemon
[350085.039168] [ 2678] 1000 2678 53563 667 40 3 0 0 at-spi2-registr
[350085.039169] [ 2682] 1000 2682 14718 829 32 3 0 0 gconfd-2
[350085.039171] [ 2690] 1000 2690 222566 5206 122 4 0 0 pulseaudio
[350085.039173] [ 2691] 133 2691 44462 530 22 3 0 0 rtkit-daemon
[350085.039174] [ 2743] 1000 2743 4649 755 13 3 0 0 zsh
[350085.039176] [ 2748] 1000 2748 437286 84044 478 6 0 0 chromium
[350085.039178] [ 2752] 1000 2752 1585 191 9 3 0 0 chrome-sandbox
[350085.039179] [ 2753] 1000 2753 113527 5589 166 4 0 0 chromium
[350085.039181] [ 2756] 1000 2756 1585 177 8 3 0 0 chrome-sandbox
[350085.039182] [ 2757] 1000 2757 7909 840 22 4 0 0 nacl_helper
[350085.039184] [ 2759] 1000 2759 113527 2847 127 4 0 0 chromium
[350085.039186] [ 2866] 1000 2866 340267 219730 629 7 0 200 chromium
[350085.039187] [ 2881] 1000 2881 114831 4858 144 5 0 200 chromium
[350085.039189] [ 2891] 1000 2891 258525 43032 338 68 0 300 chromium
[350085.039191] [ 2908] 1000 2908 216776 17487 220 31 0 300 chromium
[350085.039193] [ 3096] 0 3096 73383 1417 42 3 0 0 upowerd
[350085.039194] [ 4273] 1000 4273 4649 761 13 3 0 0 zsh
[350085.039196] [ 4276] 1000 4276 206798 6849 144 4 0 0 pavucontrol
[350085.039198] [ 6647] 1000 6647 250470 37756 295 54 0 300 chromium
[350085.039200] [ 6658] 1000 6658 214211 17257 215 29 0 300 chromium
[350085.039201] [ 7390] 1000 7390 216243 17154 217 29 0 300 chromium
[350085.039204] [23007] 1000 23007 113232 2020 54 4 0 0 gvfs-udisks2-vo
[350085.039205] [23010] 0 23010 91532 2142 44 3 0 0 udisksd
[350085.039207] [ 6558] 1000 6558 20485 2858 42 3 0 0 urxvt
[350085.039209] [ 6559] 1000 6559 9121 1722 22 3 0 0 zsh
[350085.039210] [ 6581] 1000 6581 39165 25124 80 4 0 0 mutt
[350085.039213] [18246] 1000 18246 4649 848 12 3 0 0 zsh
[350085.039215] [18251] 1000 18251 191866 14934 175 4 0 0 emacs
[350085.039216] [18256] 1000 18256 4004 813 12 3 0 0 bash
[350085.039218] [18261] 1000 18261 20305 2924 43 3 0 0 urxvt
[350085.039220] [18262] 1000 18262 9121 1714 23 3 0 0 zsh
[350085.039223] [ 7362] 1000 7362 319274 102294 527 164 0 300 chromium
[350085.039225] [ 9185] 1000 9185 400186 164602 672 161 0 300 chromium
[350085.039227] [10839] 1000 10839 253464 41492 303 50 0 300 chromium
[350085.039228] [10957] 0 10957 17509 1231 37 3 0 0 sudo
[350085.039230] [10960] 0 10960 55798 21075 81 4 0 0 pacman
[350085.039232] [15262] 0 15262 3438 787 11 3 0 0 alpm-hook
[350085.039234] [15263] 0 15263 3868 1244 11 3 0 0 dkms
[350085.039236] [15278] 0 15278 3869 1168 11 3 0 0 dkms
[350085.039237] [15611] 0 15611 3869 916 11 3 0 0 dkms
[350085.039239] [15612] 0 15612 3869 947 11 3 0 0 dkms
[350085.039241] [15613] 0 15613 8562 865 20 3 0 0 make
[350085.039242] [15619] 0 15619 8793 1078 20 3 0 0 make
[350085.039244] [15889] 0 15889 9148 1498 22 3 0 0 make
[350085.039246] [18079] 0 18079 3442 779 11 3 0 0 sh
[350085.039248] [18080] 0 18080 2490 227 9 3 0 0 cc
[350085.039249] [18081] 0 18081 68687 38388 97 3 0 0 cc1
[350085.039251] [18082] 0 18082 4786 1977 14 3 0 0 as
[350085.039253] [18091] 0 18091 3442 808 11 3 0 0 sh
[350085.039255] [18093] 0 18093 1454 165 8 3 0 0 sleep
[350085.039257] [18094] 0 18094 2490 253 9 3 0 0 cc
[350085.039259] [18095] 0 18095 68650 38238 96 3 0 0 cc1
[350085.039261] [18101] 0 18101 4786 1964 14 3 0 0 as
[350085.039263] [18104] 0 18104 3442 814 11 3 0 0 sh
[350085.039264] [18106] 0 18106 2490 248 9 3 0 0 cc
[350085.039266] [18107] 0 18107 67906 36050 93 3 0 0 cc1
[350085.039268] [18108] 0 18108 4786 2030 14 3 0 0 as
[350085.039270] [18130] 0 18130 3442 790 12 3 0 0 sh
[350085.039271] [18133] 0 18133 2490 235 8 3 0 0 cc
[350085.039273] [18134] 0 18134 3442 781 12 3 0 0 sh
[350085.039275] [18135] 0 18135 67911 36623 95 3 0 0 cc1
[350085.039277] [18136] 0 18136 4786 1935 15 3 0 0 as
[350085.039278] [18137] 0 18137 3442 786 10 3 0 0 sh
[350085.039280] [18138] 0 18138 2490 229 9 3 0 0 cc
[350085.039282] [18139] 0 18139 2490 242 9 3 0 0 cc
[350085.039284] [18140] 0 18140 67922 20214 63 3 0 0 cc1
[350085.039286] [18141] 0 18141 66967 36993 94 3 0 0 cc1
[350085.039288] [18142] 0 18142 4786 1952 14 4 0 0 as
[350085.039289] [18143] 0 18143 4786 2012 13 3 0 0 as
[350085.039291] [18152] 0 18152 3442 778 10 3 0 0 sh
[350085.039293] [18153] 0 18153 2490 226 9 3 0 0 cc
[350085.039295] [18154] 0 18154 22881 13677 47 3 0 0 cc1
[350085.039296] [18155] 0 18155 4786 2012 15 3 0 0 as
[350085.039298] [18166] 0 18166 3442 809 10 3 0 0 sh
[350085.039300] [18167] 0 18167 3442 137 8 3 0 0 sh
[350085.039301] Out of memory: Kill process 9185 (chromium) score 384 or sacrifice child
[350085.039346] Killed process 9185 (chromium) total-vm:1600744kB, anon-rss:548240kB, file-rss:71988kB, shmem-rss:38180kB
[350085.075980] oom_reaper: reaped process 9185 (chromium), now anon-rss:0kB, file-rss:0kB, shmem-rss:38480kB
[350086.337625] Xorg invoked oom-killer: gfp_mask=0x24040c0(GFP_KERNEL|__GFP_COMP), order=3, oom_score_adj=0
[350086.337628] Xorg cpuset=/ mems_allowed=0
[350086.337633] CPU: 0 PID: 2119 Comm: Xorg Tainted: P O 4.7.2-1-custom #1
[350086.337634] Hardware name: MSI MS-7850/Z97 PC Mate(MS-7850), BIOS V4.10 08/11/2015
[350086.337635] 0000000000000286 000000009fd6569c ffff88020c60f940 ffffffff812eb122
[350086.337637] ffff88020c60fb18 ffff8800cb5ae3c0 ffff88020c60f9b0 ffffffff811f6e4c
[350086.337639] 0000000000000246 ffff880200000000 ffff88020c60f970 ffffffff00000002
[350086.337640] Call Trace:
[350086.337646] [<ffffffff812eb122>] dump_stack+0x63/0x81
[350086.337649] [<ffffffff811f6e4c>] dump_header+0x60/0x1e8
[350086.337653] [<ffffffff811762fa>] oom_kill_process+0x22a/0x440
[350086.337655] [<ffffffff8117696a>] out_of_memory+0x40a/0x4b0
[350086.337657] [<ffffffff812ffdf8>] ? find_next_bit+0x18/0x20
[350086.337659] [<ffffffff8117c034>] __alloc_pages_nodemask+0xee4/0xf20
[350086.337662] [<ffffffff811cb835>] alloc_pages_current+0x95/0x140
[350086.337663] [<ffffffff8117c2f9>] alloc_kmem_pages+0x19/0x90
[350086.337666] [<ffffffff8119a79e>] kmalloc_order_trace+0x2e/0x100
[350086.337668] [<ffffffff811d6bd3>] __kmalloc+0x213/0x230
[350086.337681] [<ffffffffa147d2c7>] nvkms_alloc+0x27/0x60 [nvidia_modeset]
[350086.337687] [<ffffffffa147e540>] ? _nv000318kms+0x40/0x40 [nvidia_modeset]
[350086.337695] [<ffffffffa14b7eea>] _nv001929kms+0x1a/0x30 [nvidia_modeset]
[350086.337702] [<ffffffffa14a4242>] ? _nv001878kms+0x32/0xcf0 [nvidia_modeset]
[350086.337703] [<ffffffff8117c2f9>] ? alloc_kmem_pages+0x19/0x90
[350086.337705] [<ffffffff811d6bd3>] ? __kmalloc+0x213/0x230
[350086.337711] [<ffffffffa147d2c7>] ? nvkms_alloc+0x27/0x60 [nvidia_modeset]
[350086.337716] [<ffffffffa147e540>] ? _nv000318kms+0x40/0x40 [nvidia_modeset]
[350086.337722] [<ffffffffa147e56e>] ? _nv000169kms+0x2e/0x40 [nvidia_modeset]
[350086.337728] [<ffffffffa147f0c1>] ? nvKmsIoctl+0x161/0x1e0 [nvidia_modeset]
[350086.337734] [<ffffffffa147dd65>] ? nvkms_ioctl_common+0x45/0x80 [nvidia_modeset]
[350086.337740] [<ffffffffa147de11>] ? nvkms_ioctl+0x71/0xa0 [nvidia_modeset]
[350086.337838] [<ffffffffa0831080>] ? nvidia_frontend_compat_ioctl+0x40/0x50 [nvidia]
[350086.337911] [<ffffffffa083109e>] ? nvidia_frontend_unlocked_ioctl+0xe/0x10 [nvidia]
[350086.337915] [<ffffffff8120cd62>] ? do_vfs_ioctl+0xa2/0x5d0
[350086.337917] [<ffffffff8120d309>] ? SyS_ioctl+0x79/0x90
[350086.337920] [<ffffffff815de7b2>] ? entry_SYSCALL_64_fastpath+0x1a/0xa4
[350086.337933] Mem-Info:
[350086.337936] active_anon:926090 inactive_anon:14054 isolated_anon:0
active_file:127217 inactive_file:124640 isolated_file:0
unevictable:8 dirty:14757 writeback:0 unstable:0
slab_reclaimable:685505 slab_unreclaimable:20594
mapped:69794 shmem:17206 pagetables:7032 bounce:0
free:25275 free_pcp:114 free_cma:0
[350086.337939] Node 0 DMA free:15516kB min:136kB low:168kB high:200kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[350086.337944] lowmem_reserve[]: 0 3196 7658 7658
[350086.337946] Node 0 DMA32 free:46168kB min:28148kB low:35184kB high:42220kB active_anon:1571968kB inactive_anon:33316kB active_file:206232kB inactive_file:198884kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3617864kB managed:3280092kB mlocked:0kB dirty:21952kB writeback:0kB mapped:120868kB shmem:37784kB slab_reclaimable:1128300kB slab_unreclaimable:31216kB kernel_stack:2848kB pagetables:11300kB unstable:0kB bounce:0kB free_pcp:4kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:92 all_unreclaimable? no
[350086.337950] lowmem_reserve[]: 0 0 4462 4462
[350086.337952] Node 0 Normal free:39416kB min:39296kB low:49120kB high:58944kB active_anon:2132392kB inactive_anon:22900kB active_file:302636kB inactive_file:299676kB unevictable:32kB isolated(anon):0kB isolated(file):0kB present:4702208kB managed:4569312kB mlocked:32kB dirty:37076kB writeback:0kB mapped:158308kB shmem:31040kB slab_reclaimable:1613720kB slab_unreclaimable:51160kB kernel_stack:4752kB pagetables:16828kB unstable:0kB bounce:0kB free_pcp:440kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:400 all_unreclaimable? no
[350086.337956] lowmem_reserve[]: 0 0 0 0
[350086.337958] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15516kB
[350086.337984] Node 0 DMA32: 11350*4kB (UME) 172*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 46776kB
[350086.337989] Node 0 Normal: 9232*4kB (UME) 54*8kB (ME) 62*16kB (MEH) 2*32kB (H) 3*64kB (H) 1*128kB (H) 2*256kB (H) 2*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 40272kB
[350086.337997] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[350086.337998] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[350086.337999] 269040 total pagecache pages
[350086.338011] 0 pages in swap cache
[350086.338012] Swap cache stats: add 0, delete 0, find 0/0
[350086.338013] Free swap = 0kB
[350086.338013] Total swap = 0kB
[350086.338014] 2084014 pages RAM
[350086.338015] 0 pages HighMem/MovableOnly
[350086.338016] 117688 pages reserved
[350086.338016] 0 pages hwpoisoned
[350086.338017] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[350086.338027] [ 208] 0 208 19325 3815 35 3 0 0 systemd-journal
[350086.338029] [ 265] 0 265 9102 818 18 3 0 -1000 systemd-udevd
[350086.338033] [ 1809] 192 1809 28539 542 25 3 0 0 systemd-timesyn
[350086.338035] [ 1822] 81 1822 8296 653 21 4 0 -900 dbus-daemon
[350086.338037] [ 1832] 0 1832 9629 511 22 3 0 0 systemd-logind
[350086.338039] [ 1862] 0 1862 3260 611 10 3 0 0 crond
[350086.338041] [ 1910] 1000 1910 3505 776 11 3 0 0 devmon
[350086.338043] [ 2041] 1000 2041 6453 707 18 3 0 0 udevil
[350086.338045] [ 2050] 0 2050 1685 77 9 3 0 0 dhcpcd
[350086.338047] [ 2051] 0 2051 132774 4634 65 6 0 -500 dockerd
[350086.338049] [ 2057] 0 2057 10099 746 25 3 0 -1000 sshd
[350086.338051] [ 2085] 0 2085 108773 1180 30 5 0 -500 docker-containe
[350086.338053] [ 2100] 0 2100 66532 937 32 3 0 0 lightdm
[350086.338055] [ 2119] 0 2119 50761 17520 97 3 0 0 Xorg
[350086.338057] [ 2123] 0 2123 68666 1431 36 3 0 0 accounts-daemon
[350086.338058] [ 2135] 102 2135 129825 2705 50 4 0 0 polkitd
[350086.338062] [ 2562] 0 2562 65038 1104 59 3 0 0 lightdm
[350086.338064] [ 2572] 1000 2572 13695 1020 29 3 0 0 systemd
[350086.338066] [ 2577] 1000 2577 24641 397 48 3 0 0 (sd-pam)
[350086.338069] [ 2584] 1000 2584 32170 1929 66 3 0 0 i3
[350086.338070] [ 2596] 1000 2596 2788 184 10 3 0 0 ssh-agent
[350086.338073] [ 2603] 1000 2603 8212 576 20 3 0 0 dbus-daemon
[350086.338075] [ 2605] 1000 2605 27268 1497 56 3 0 0 i3bar
[350086.338077] [ 2606] 1000 2606 3406 459 10 3 0 0 measure-net-spe
[350086.338079] [ 2607] 1000 2607 17782 505 40 3 0 0 i3status
[350086.338081] [ 2608] 1000 2608 3406 585 10 3 0 0 measure-net-spe
[350086.338084] [ 2658] 1000 2658 67811 849 34 3 0 0 gvfsd
[350086.338086] [ 2663] 1000 2663 84725 1246 31 3 0 0 gvfsd-fuse
[350086.338088] [ 2671] 1000 2671 84401 651 33 3 0 0 at-spi-bus-laun
[350086.338091] [ 2676] 1000 2676 8186 714 20 3 0 0 dbus-daemon
[350086.338093] [ 2678] 1000 2678 53563 667 40 3 0 0 at-spi2-registr
[350086.338095] [ 2682] 1000 2682 14718 829 32 3 0 0 gconfd-2
[350086.338098] [ 2690] 1000 2690 222566 5206 122 4 0 0 pulseaudio
[350086.338100] [ 2691] 133 2691 44462 530 22 3 0 0 rtkit-daemon
[350086.338103] [ 2743] 1000 2743 4649 755 13 3 0 0 zsh
[350086.338106] [ 2748] 1000 2748 433311 84260 475 6 0 0 chromium
[350086.338108] [ 2752] 1000 2752 1585 191 9 3 0 0 chrome-sandbox
[350086.338110] [ 2753] 1000 2753 113527 5589 166 4 0 0 chromium
[350086.338112] [ 2756] 1000 2756 1585 177 8 3 0 0 chrome-sandbox
[350086.338114] [ 2757] 1000 2757 7909 840 22 4 0 0 nacl_helper
[350086.338117] [ 2759] 1000 2759 113527 2847 127 4 0 0 chromium
[350086.338120] [ 2866] 1000 2866 332680 213705 612 7 0 200 chromium
[350086.338122] [ 2881] 1000 2881 114831 4858 144 5 0 200 chromium
[350086.338124] [ 2891] 1000 2891 258525 43032 338 68 0 300 chromium
[350086.338126] [ 2908] 1000 2908 216776 17487 220 31 0 300 chromium
[350086.338129] [ 3096] 0 3096 73383 1417 42 3 0 0 upowerd
[350086.338131] [ 4273] 1000 4273 4649 761 13 3 0 0 zsh
[350086.338134] [ 4276] 1000 4276 206984 7921 144 4 0 0 pavucontrol
[350086.338136] [ 6647] 1000 6647 250470 37756 295 54 0 300 chromium
[350086.338138] [ 6658] 1000 6658 214211 17257 215 29 0 300 chromium
[350086.338140] [ 7390] 1000 7390 216243 17154 217 29 0 300 chromium
[350086.338143] [23007] 1000 23007 113232 2020 54 4 0 0 gvfs-udisks2-vo
[350086.338145] [23010] 0 23010 91532 2140 44 3 0 0 udisksd
[350086.338147] [ 6558] 1000 6558 20485 2858 42 3 0 0 urxvt
[350086.338150] [ 6559] 1000 6559 9121 1722 22 3 0 0 zsh
[350086.338152] [ 6581] 1000 6581 39165 25124 80 4 0 0 mutt
[350086.338155] [18246] 1000 18246 4649 848 12 3 0 0 zsh
[350086.338157] [18251] 1000 18251 191866 14934 175 4 0 0 emacs
[350086.338159] [18256] 1000 18256 4004 813 12 3 0 0 bash
[350086.338161] [18261] 1000 18261 20305 2924 43 3 0 0 urxvt
[350086.338163] [18262] 1000 18262 9121 1714 23 3 0 0 zsh
[350086.338168] [ 7362] 1000 7362 319274 102294 527 164 0 300 chromium
[350086.338171] [10839] 1000 10839 253464 41492 303 50 0 300 chromium
[350086.338173] [10957] 0 10957 17509 1231 37 3 0 0 sudo
[350086.338175] [10960] 0 10960 55798 21075 81 4 0 0 pacman
[350086.338178] [15262] 0 15262 3438 787 11 3 0 0 alpm-hook
[350086.338180] [15263] 0 15263 3868 1244 11 3 0 0 dkms
[350086.338182] [15278] 0 15278 3869 1168 11 3 0 0 dkms
[350086.338184] [15611] 0 15611 3869 916 11 3 0 0 dkms
[350086.338186] [15612] 0 15612 3869 947 11 3 0 0 dkms
[350086.338189] [15613] 0 15613 8562 865 20 3 0 0 make
[350086.338191] [15619] 0 15619 8793 1078 20 3 0 0 make
[350086.338193] [15889] 0 15889 9148 1498 22 3 0 0 make
[350086.338196] [18079] 0 18079 3442 779 11 3 0 0 sh
[350086.338198] [18080] 0 18080 2490 227 9 3 0 0 cc
[350086.338201] [18081] 0 18081 99723 59927 144 3 0 0 cc1
[350086.338203] [18082] 0 18082 4786 1977 14 3 0 0 as
[350086.338205] [18091] 0 18091 3442 808 11 3 0 0 sh
[350086.338208] [18093] 0 18093 1454 165 8 3 0 0 sleep
[350086.338210] [18094] 0 18094 2490 253 9 3 0 0 cc
[350086.338211] [18095] 0 18095 100238 60442 146 3 0 0 cc1
[350086.338213] [18101] 0 18101 4786 1964 14 3 0 0 as
[350086.338215] [18104] 0 18104 3442 814 11 3 0 0 sh
[350086.338217] [18106] 0 18106 2490 248 9 3 0 0 cc
[350086.338219] [18107] 0 18107 99725 57639 141 4 0 0 cc1
[350086.338221] [18108] 0 18108 4786 2030 14 3 0 0 as
[350086.338223] [18130] 0 18130 3442 790 12 3 0 0 sh
[350086.338226] [18133] 0 18133 2490 235 8 3 0 0 cc
[350086.338228] [18134] 0 18134 3442 781 12 3 0 0 sh
[350086.338230] [18135] 0 18135 81008 48498 121 3 0 0 cc1
[350086.338232] [18136] 0 18136 4786 1935 15 3 0 0 as
[350086.338234] [18137] 0 18137 3442 786 10 3 0 0 sh
[350086.338236] [18138] 0 18138 2490 229 9 3 0 0 cc
[350086.338238] [18139] 0 18139 2490 242 9 3 0 0 cc
[350086.338240] [18140] 0 18140 80993 48202 118 3 0 0 cc1
[350086.338242] [18141] 0 18141 99713 53841 132 3 0 0 cc1
[350086.338243] [18142] 0 18142 4786 1952 14 4 0 0 as
[350086.338245] [18143] 0 18143 4786 2012 13 3 0 0 as
[350086.338247] [18152] 0 18152 3442 778 10 3 0 0 sh
[350086.338249] [18153] 0 18153 2490 226 9 3 0 0 cc
[350086.338251] [18154] 0 18154 83047 50126 121 3 0 0 cc1
[350086.338253] [18155] 0 18155 4786 2012 15 3 0 0 as
[350086.338255] [18166] 0 18166 3442 809 10 3 0 0 sh
[350086.338257] [18167] 0 18167 2490 236 9 3 0 0 cc
[350086.338259] [18168] 0 18168 73800 42887 106 4 0 0 cc1
[350086.338260] [18169] 0 18169 4786 1952 14 3 0 0 as
[350086.338262] Out of memory: Kill process 7362 (chromium) score 352 or sacrifice child
[350086.338298] Killed process 7362 (chromium) total-vm:1277096kB, anon-rss:313392kB, file-rss:68416kB, shmem-rss:27368kB
[350086.360581] oom_reaper: reaped process 7362 (chromium), now anon-rss:0kB, file-rss:0kB, shmem-rss:27268kB
[~]$ free -h
total used free shared buff/cache available
Mem: 7.5G 3.3G 810M 39M 3.4G 3.9G
Swap: 0B 0B 0B
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
* More OOM problems
@ 2016-09-18 20:03 Linus Torvalds
2016-09-18 20:26 ` Lorenzo Stoakes
` (4 more replies)
0 siblings, 5 replies; 31+ messages in thread
From: Linus Torvalds @ 2016-09-18 20:03 UTC (permalink / raw)
To: Michal Hocko, Tetsuo Handa, Oleg Nesterov, Vladimir Davydov,
Vlastimil Babka
Cc: Andrew Morton, Markus Trippelsdorf, Arkadiusz Miskiewicz,
Ralf-Peter Rohbeck, Jiri Slaby, Olaf Hering, Joonsoo Kim,
linux-mm
[ More or less random collection of people from previous oom patches
and/or discussions, if you feel you shouldn't have been cc'd, blame me
for just picking things from earlier threads and/or commits ]
I'm afraid that the oom situation is still not fixed, and the "let's
die quickly" patches are still a nasty regression.
I have a 16GB desktop that I just noticed killed one of the chrome
tabs yesterday. The machine had *tons* of freeable memory, with
something like 7GB of page cache at the time, if I read this right.
The trigger is a kcalloc() in the i915 driver:
Xorg invoked oom-killer:
gfp_mask=0x240c0d0(GFP_TEMPORARY|__GFP_COMP|__GFP_ZERO), order=3,
oom_score_adj=0
__kmalloc+0x1cd/0x1f0
alloc_gen8_temp_bitmaps+0x47/0x80 [i915]
which looks like it is one of these:
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
kmalloc-8192 268 268 8192 4 8
kmalloc-4096 732 786 4096 8 8
kmalloc-2048 1402 1456 2048 16 8
kmalloc-1024 2505 2976 1024 32 8
so even just a 1kB allocation can cause an order-3 page allocation.
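[Editor's sketch: the order can be read off the slabinfo columns above. Assuming 4 kB pages and that SLUB backs each slab with a single contiguous buddy allocation of pagesperslab pages (which the quoted numbers are consistent with), every kmalloc cache shown here sits on order-3 slabs:]

```python
import math

def slab_alloc_order(pages_per_slab):
    # One slab = pages_per_slab contiguous 4 kB pages,
    # i.e. a buddy allocation of order log2(pages_per_slab).
    return int(math.log2(pages_per_slab))

# (objsize, objperslab, pagesperslab) taken from the quoted slabinfo
slabinfo = {
    "kmalloc-8192": (8192, 4, 8),
    "kmalloc-4096": (4096, 8, 8),
    "kmalloc-2048": (2048, 16, 8),
    "kmalloc-1024": (1024, 32, 8),
}

for name, (objsize, objperslab, pagesperslab) in slabinfo.items():
    # sanity check: the objects exactly fill the slab
    assert objsize * objperslab == pagesperslab * 4096
    print(name, "-> slab order", slab_alloc_order(pagesperslab))
```

[So a 1 kB kmalloc that needs a fresh kmalloc-1024 slab triggers an 8-page, order-3 page allocation.]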
And yeah, I had what, 137MB free memory, it's just that it's all
fairly fragmented. There's actually even order-4 pages, but they are
in low DMA memory and the system tries to protect them:
Node 0 DMA: 0*4kB 1*8kB (U) 2*16kB (U) 1*32kB (U) 3*64kB (U) 2*128kB
(U) 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB
Node 0 DMA32: 11110*4kB (UMEH) 2929*8kB (UMEH) 44*16kB (MH) 1*32kB
(H) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB =
68608kB
Node 0 Normal: 14031*4kB (UMEH) 49*8kB (UMEH) 18*16kB (UH) 0*32kB
0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 56804kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
hugepages_size=1048576kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
hugepages_size=2048kB
2084682 total pagecache pages
11 pages in swap cache
Swap cache stats: add 35, delete 24, find 2/3
Free swap = 8191868kB
Total swap = 8191996kB
4168499 pages RAM
And it looks like there's a fair amount of memory busy under writeback
(470MB or so)
active_anon:1539159 inactive_anon:374915 isolated_anon:0
active_file:1251771 inactive_file:450068
isolated_file:0
unevictable:175 dirty:26 writeback:118690 unstable:0
slab_reclaimable:220784 slab_unreclaimable:39819
mapped:491617 shmem:382891 pagetables:20439 bounce:0
free:35301 free_pcp:895 free_cma:0
And yes, CONFIG_COMPACTION was enabled.
So quite honestly, I *really* don't think that a 1kB allocation should
have reasonably failed and killed anything at all (ok, it could have
been an 8kB one, who knows - but it really looks like it *could* have
been just 1kB).
Considering that kmalloc() pattern, I suspect that we need to consider
order-3 allocations "small", and try a lot harder.
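[Editor's note: the page allocator already draws its "small" line at order 3 via PAGE_ALLOC_COSTLY_ORDER in include/linux/mmzone.h — only orders *above* that are "costly" and allowed to fail easily, which is part of why this order-3 OOM kill looks like a bug. A minimal sketch of the boundary:]

```python
PAGE_ALLOC_COSTLY_ORDER = 3  # include/linux/mmzone.h

def is_costly(order):
    # Only allocations above the costly order are expected to
    # give up without heroic reclaim/compaction effort.
    return order > PAGE_ALLOC_COSTLY_ORDER

print(is_costly(3))  # the order of the i915 kcalloc() in question
```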
Because killing processes due to "out of memory" in this situation is
unquestionably a bug.
And no, I can't recreate this, obviously.
I think there's a series in -mm that hasn't been merged and that is
pending (presumably for 4.9). I think Arkadiusz tested it for his
(repeatable) workload. It may need to be considered for 4.8, because
the above is ridiculously bad, imho.
Andrew? Vlastimil? Michal? Others?
Linus
end of thread, other threads:[~2016-10-31 21:51 UTC | newest]
Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <eafb59b5-0a2b-0e28-ca79-f044470a2851@Quantum.com>
[not found] ` <20160930214448.GB28379@dhcp22.suse.cz>
[not found] ` <982671bd-5733-0cd5-c15d-112648ff14c5@Quantum.com>
2016-10-11 6:44 ` More OOM problems Michal Hocko
2016-10-11 7:10 ` Vlastimil Babka
2016-10-30 4:17 ` Simon Kirby
2016-10-31 21:41 ` Vlastimil Babka
2016-10-31 21:51 ` Vlastimil Babka
2016-09-18 20:03 Linus Torvalds
2016-09-18 20:26 ` Lorenzo Stoakes
2016-09-18 20:58 ` Linus Torvalds
2016-09-18 21:13 ` Vlastimil Babka
2016-09-18 21:34 ` Lorenzo Stoakes
2016-09-19 8:32 ` Michal Hocko
2016-09-19 8:42 ` Lorenzo Stoakes
2016-09-19 8:53 ` Michal Hocko
2016-09-25 21:48 ` Lorenzo Stoakes
2016-09-26 7:48 ` Michal Hocko
2016-09-18 21:00 ` Vlastimil Babka
2016-09-18 21:18 ` Linus Torvalds
2016-09-19 6:27 ` Jiri Slaby
2016-09-19 7:01 ` Michal Hocko
2016-09-19 7:52 ` Michal Hocko
2016-09-19 1:07 ` Andi Kleen
[not found] ` <alpine.DEB.2.20.1609190836540.12121@east.gentwo.org>
2016-09-19 14:31 ` Andi Kleen
2016-09-19 14:39 ` Michal Hocko
2016-09-19 14:41 ` Vlastimil Babka
2016-09-19 18:18 ` Linus Torvalds
2016-09-19 19:57 ` Christoph Lameter
2016-09-18 22:00 ` Vlastimil Babka
2016-09-19 6:56 ` Michal Hocko
2016-09-19 6:48 ` Michal Hocko
2016-09-21 7:04 ` Raymond Jennings
2016-09-21 7:29 ` Michal Hocko