From: Vlastimil Babka <vbabka@suse.cz>
To: Gerhard Wiesinger <lists@wiesinger.com>,
	Michal Hocko <mhocko@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: Still OOM problems with 4.9er kernels
Date: Fri, 9 Dec 2016 22:42:09 +0100
Message-ID: <dce6a53e-9c13-2a17-ecef-824883506f72@suse.cz>
In-Reply-To: <a7ebcdbe-9feb-a88f-594c-161e7daa5818@wiesinger.com>

On 12/09/2016 07:01 PM, Gerhard Wiesinger wrote:
> On 09.12.2016 18:30, Michal Hocko wrote:
>> On Fri 09-12-16 17:58:14, Gerhard Wiesinger wrote:
>>> On 09.12.2016 17:09, Michal Hocko wrote:
>> [...]
>>>>> [97883.882611] Mem-Info:
>>>>> [97883.883747] active_anon:2915 inactive_anon:3376 isolated_anon:0
>>>>>                   active_file:3902 inactive_file:3639 isolated_file:0
>>>>>                   unevictable:0 dirty:205 writeback:0 unstable:0
>>>>>                   slab_reclaimable:9856 slab_unreclaimable:9682
>>>>>                   mapped:3722 shmem:59 pagetables:2080 bounce:0
>>>>>                   free:748 free_pcp:15 free_cma:0
>>>> there is still some page cache which seems to be neither dirty
>>>> nor under writeback. So it should be theoretically reclaimable but for
>>>> some reason we cannot seem to reclaim that memory.
>>>> There is still some anonymous memory and free swap so we could reclaim
>>>> it as well but it all seems pretty down and the memory pressure is
>>>> really large
>>> Yes, it might be large on the update situation, but that should be handled
>>> by a virtual memory system by the kernel, right?
>> Well this is what we try and call it memory reclaim. But if we are not
>> able to reclaim anything then we eventually have to give up and trigger
>> the OOM killer.
> 
> I'm not familiar with the Linux implementation of the VM system in 
> detail. But can't you reserve as much memory for the kernel (non 
> pageable) at least that you can swap everything out (even without 
> killing a process at least as long there is enough swap available, which 
> should be in all of my cases)?

We don't have such bulletproof reserves. In this case the amount of
anonymous memory that can be swapped out is relatively low, and either
something is pinning it in memory, or it's being swapped back in quickly.
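
For scale, the page counts in the Mem-Info dump quoted above can be
converted to sizes. A minimal sketch, assuming 4 KiB pages (the report
does not state the page size; `getconf PAGESIZE` would confirm it);
`pages_to_kib` is just an illustrative helper:

```shell
#!/bin/sh
# ASSUMPTION: 4 KiB pages; the report does not state the page size.
PAGE_KIB=4

# Convert a page count ($1) to KiB.
pages_to_kib() {
    echo $(( $1 * PAGE_KIB ))
}

# Counts copied from the Mem-Info dump quoted above.
anon_pages=$(( 2915 + 3376 ))   # active_anon + inactive_anon
file_pages=$(( 3902 + 3639 ))   # active_file + inactive_file
slab_pages=$(( 9856 + 9682 ))   # slab_reclaimable + slab_unreclaimable
free_pages=748

echo "anon: $(pages_to_kib "$anon_pages") KiB"
echo "file: $(pages_to_kib "$file_pages") KiB"
echo "slab: $(pages_to_kib "$slab_pages") KiB"
echo "free: $(pages_to_kib "$free_pages") KiB"
```

Under that assumption the anonymous LRU lists hold roughly 25 MB, less
than either the page cache or the slab total.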

>>   Now the information that 4.4 made a difference is
>> interesting. I do not really see any major differences in the reclaim
>> between 4.3 and 4.4 kernels. The reason might be somewhere else as well.
>> E.g. some of the subsystem consumes much more memory than before.
>>
>> Just curious, what kind of filesystem are you using?
> 
> I'm using ext4 only with virt-* drivers (storage, network). But it is 
> definitely a virtual memory allocation/swap usage issue.
> 
>>   Could you try some
>> additional debugging. Enabling reclaim related tracepoints might tell us
>> more. The following should tell us more
>> mount -t tracefs none /trace
>> echo 1 > /trace/events/vmscan/enable
>> echo 1 > /trace/events/writeback/writeback_congestion_wait/enable
>> cat /trace/trace_pipe > trace.log
>>
>> Collecting /proc/vmstat over time might be helpful as well
>> mkdir logs
>> while true
>> do
>> 	cp /proc/vmstat vmstat.$(date +%s)
>> 	sleep 1s
>> done
> 
> Activated it. But I think it should be very easy to trigger also on your 
> side. A very small configured VM with a program running RAM 
> allocations/writes (I guess you have some testing programs already) 
> should be sufficient to trigger it. You can also use the attached 
> program which I used to trigger such situations some years ago. If it 
> doesn't help try to reduce the available CPU for the VM and also I/O 
> (e.g. use all CPU/IO on the host or other VMs).

Well it's not really a surprise that if the VM is small enough and the
workload large enough, the OOM killer will kick in. The exact threshold
might have changed between kernel versions for a number of possible reasons.
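
To compare behaviour around that threshold, the /proc/vmstat snapshots
from the loop quoted above can be diffed pairwise. A minimal sketch,
assuming the snapshots are the vmstat.<epoch> files that loop writes
into the current directory; `vmstat_delta` is a hypothetical helper,
not an existing tool:

```shell
#!/bin/sh
# Print every counter whose value changed between two /proc/vmstat
# snapshots, together with the difference (newer minus older).
# $1: older snapshot file, $2: newer snapshot file
vmstat_delta() {
    awk 'NR == FNR { old[$1] = $2; next }
         ($1 in old) && ($2 != old[$1]) { print $1, $2 - old[$1] }' "$1" "$2"
}

# Usage (file names are placeholders for two collected snapshots):
#   vmstat_delta vmstat.<older-epoch> vmstat.<newer-epoch>
```

Running this on snapshots taken just before an OOM kill, on two kernel
versions, may show which reclaim counters behave differently.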

> 
> BTW: Don't know if you have also seen my original message on the kernel 
> mailing list only:
> 
> Linus had also OOM problems with 1kB RAM requests and a lot of free RAM 
> (use a translation service for the German page):
> https://lkml.org/lkml/2016/11/30/64
> https://marius.bloggt-in-braunschweig.de/2016/11/17/linuxkernel-4-74-8-und-der-oom-killer/
> https://www.spinics.net/lists/linux-mm/msg113661.html

Yeah we were involved in the last one. The regressions there were about
high-order allocations though (the 1kB premise turned out to be a
misinterpretation), and there were regressions for those in 4.7/4.8.
But yours are order-0.
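
For reference, an order-n allocation asks for 2^n physically contiguous
pages, so order-0 is a single page. A small sketch of the sizes
involved, again assuming 4 KiB pages; `order_to_kib` is a hypothetical
helper name:

```shell
#!/bin/sh
# ASSUMPTION: 4 KiB pages; not stated in the report.
PAGE_KIB=4

# Size in KiB of one allocation of the given order ($1): 2^order pages.
order_to_kib() {
    echo $(( (1 << $1) * PAGE_KIB ))
}

for order in 0 1 2 3; do
    echo "order-$order: $(order_to_kib "$order") KiB"
done
```

An order-0 failure therefore means that not even a single free page
could be found or reclaimed, so fragmentation is not the explanation.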
