From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
To: Michal Hocko <mhocko@kernel.org>
Cc: David Rientjes <rientjes@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: Re: [patch] mm, oom: prevent additional oom kills before memory is freed
Date: Fri, 16 Jun 2017 09:54:34 +0900	[thread overview]
Message-ID: <201706160054.v5G0sY7c064781@www262.sakura.ne.jp> (raw)
In-Reply-To: <20170615221236.GB22341@dhcp22.suse.cz>

Michal Hocko wrote:
> On Thu 15-06-17 15:03:17, David Rientjes wrote:
> > On Thu, 15 Jun 2017, Michal Hocko wrote:
> > 
> > > > Yes, quite a bit in testing.
> > > > 
> > > > One oom kill shows the system to be oom:
> > > > 
> > > > [22999.488705] Node 0 Normal free:90484kB min:90500kB ...
> > > > [22999.488711] Node 1 Normal free:91536kB min:91948kB ...
> > > > 
> > > > followed up by one or more unnecessary oom kills showing the oom killer 
> > > > racing with memory freeing of the victim:
> > > > 
> > > > [22999.510329] Node 0 Normal free:229588kB min:90500kB ...
> > > > [22999.510334] Node 1 Normal free:600036kB min:91948kB ...
> > > > 
> > > > The patch is absolutely required for us to prevent continuous oom killing 
> > > > of processes after a single process has been oom killed and its memory is 
> > > > in the process of being freed.
> > > 
> > > OK, could you play with the patch/idea suggested in
> > > http://lkml.kernel.org/r/20170615122031.GL1486@dhcp22.suse.cz?
> > > 
> > 
> > I cannot, I am trying to unblock a stable kernel release to my production 
> > that is obviously fixed with this patch and cannot experiment with 
> > uncompiled and untested patches that introduce otherwise unnecessary 
> > locking into the __mmput() path and is based on speculation rather than 
> > hard data that __mmput() for some reason stalls for the oom victim's mm.  
> > I was hoping that this fix could make it in time for 4.12 since 4.12 kills 
> > 1-4 processes unnecessarily for each oom condition and then can review any 
> > tested solution you may propose at a later time.
> 
> I am sorry but I have worked really hard to make the oom reaper a reliable
> way to make all the potential oom lockups go away. I do not want to
> reintroduce another potential lockup now. I also do not see why any
> solution should be rushed into. I have proposed a way to go and unless
> it is clear that this is not a way forward then I simply do not agree
> with any partial workarounds or shortcuts.

And the patch you proposed is broken.

----------
[  161.846202] Out of memory: Kill process 6331 (a.out) score 999 or sacrifice child
[  161.850327] Killed process 6331 (a.out) total-vm:4172kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
[  161.858503] ------------[ cut here ]------------
[  161.861512] kernel BUG at mm/memory.c:1381!
[  161.864154] invalid opcode: 0000 [#1] SMP
[  161.866599] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel vmw_balloon pcspkr ppdev shpchp parport_pc i2c_piix4 parport vmw_vmci xfs libcrc32c vmwgfx crc32c_intel drm_kms_helper serio_raw ttm drm e1000 mptspi scsi_transport_spi mptscsih mptbase ata_generic pata_acpi floppy
[  161.896811] CPU: 1 PID: 43 Comm: oom_reaper Not tainted 4.12.0-rc5+ #221
[  161.900458] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[  161.905588] task: ffff937bb1c13200 task.stack: ffffa13cc0b94000
[  161.908876] RIP: 0010:unmap_page_range+0xa19/0xa60
[  161.911739] RSP: 0000:ffffa13cc0b97d08 EFLAGS: 00010282
[  161.914767] RAX: 0000000000000000 RBX: ffff937ba9e89300 RCX: 0000000000401000
[  161.918543] RDX: ffff937baf707440 RSI: ffff937baf707680 RDI: ffffa13cc0b97df0
[  161.922314] RBP: ffffa13cc0b97de0 R08: 0000000000000000 R09: 0000000000000000
[  161.926059] R10: 0000000000000000 R11: 000000001f1e8b15 R12: ffff937ba9e893c0
[  161.929789] R13: ffff937ba4198000 R14: ffff937baf707440 R15: ffff937ba9e89300
[  161.933509] FS:  0000000000000000(0000) GS:ffff937bb3800000(0000) knlGS:0000000000000000
[  161.937615] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  161.940774] CR2: 0000561fb93c1b00 CR3: 000000009ee11000 CR4: 00000000001406e0
[  161.944477] Call Trace:
[  161.946333]  ? __mutex_lock+0x574/0x950
[  161.948678]  ? __mutex_lock+0xce/0x950
[  161.950996]  ? __oom_reap_task_mm+0x49/0x170
[  161.953485]  __oom_reap_task_mm+0xd8/0x170
[  161.955893]  oom_reaper+0xac/0x1c0
[  161.957992]  ? remove_wait_queue+0x60/0x60
[  161.960688]  kthread+0x117/0x150
[  161.962719]  ? trace_event_raw_event_oom_score_adj_update+0xe0/0xe0
[  161.965920]  ? kthread_create_on_node+0x70/0x70
[  161.968417]  ret_from_fork+0x2a/0x40
[  161.970530] Code: 13 fb ff ff e9 25 fc ff ff 48 83 e8 01 e9 77 fc ff ff 48 83 e8 01 e9 62 fe ff ff e8 22 0a e6 ff 48 8b 7d 98 e8 09 ba ff ff 0f 0b <0f> 0b 48 83 e9 01 e9 a1 fb ff ff e8 03 a5 06 00 48 83 e9 01 e9 
[  161.979386] RIP: unmap_page_range+0xa19/0xa60 RSP: ffffa13cc0b97d08
[  161.982611] ---[ end trace ef2b349884b0aaa4 ]---
----------

Please carefully consider why there is a VM_BUG_ON() in __mmput(). In your
patch, please clarify what the possible side effects are of racing
uprobe_clear_state()/exit_aio()/ksm_exit()/exit_mmap() etc. with
__oom_reap_task_mm(), and clarify that there is no possibility of waiting
for direct/indirect memory allocation inside free_pgtables(), in addition
to fixing the bug above.

----------
	VM_BUG_ON(atomic_read(&mm->mm_users));

	uprobe_clear_state(mm);
	exit_aio(mm);
	ksm_exit(mm);
	khugepaged_exit(mm); /* must run before exit_mmap */
	exit_mmap(mm);
----------
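
As a rough userspace illustration (not kernel code, and not taken from either
patch in this thread) of the invariant that VM_BUG_ON() asserts: teardown is
only legal once mm_users has dropped to zero, so any other context that wants
to touch the mm has to take a reference that fails once the count is already
zero. The names model_mm, model_get_not_zero(), model_put() and
model_teardown() are made up for this sketch; the closest kernel primitive to
model_get_not_zero() is atomic_inc_not_zero() applied to mm->mm_users.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the mm_users counter of an mm. */
struct model_mm {
	atomic_int users;
	bool torn_down;
};

/*
 * Take a reference only if the count is still non-zero, in the spirit of
 * the kernel's atomic_inc_not_zero(). Returns false when the last user is
 * already gone, meaning teardown may be running and the caller must back off.
 */
static bool model_get_not_zero(struct model_mm *mm)
{
	int old = atomic_load(&mm->users);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&mm->users, &old, old + 1))
			return true;
	}
	return false;
}

/* Models the teardown path: legal only once no users remain. */
static void model_teardown(struct model_mm *mm)
{
	assert(atomic_load(&mm->users) == 0);	/* plays the role of VM_BUG_ON() */
	mm->torn_down = true;
}

/* Drop a reference; whoever drops the last one runs teardown. */
static void model_put(struct model_mm *mm)
{
	if (atomic_fetch_sub(&mm->users, 1) == 1)
		model_teardown(mm);
}
```

In this model, a concurrent reader that obtained its reference through
model_get_not_zero() delays teardown until its own model_put(), and a reader
that arrives after the count has reached zero is refused. It says nothing
about whether the individual steps of the real teardown path are safe to race
with, which is the question raised above.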
