All of lore.kernel.org
 help / color / mirror / Atom feed
* [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab
@ 2016-01-18 15:38 ` Jan Stancek
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Stancek @ 2016-01-18 15:38 UTC (permalink / raw)
  To: linux-mm; +Cc: ltp

Hi,

I'm seeing system occasionally hanging after "oom01" testcase
from LTP triggers OOM.

Here's a console log obtained from v4.4-8606 (shows oom, followed
by blocked task messages, followed by me triggering sysrq-t):
  http://jan.stancek.eu/tmp/oom_hangs/oom_hang_v4.4-8606.txt
  http://jan.stancek.eu/tmp/oom_hangs/config-v4.4-8606.txt

I'm running this patch on top, to trigger sysrq-t (system is in remote location):

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 36e2697..f1a27f3 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -77,6 +77,7 @@
 #include <linux/string.h>
 #include <linux/netfilter_ipv4.h>
 #include <linux/slab.h>
+#include <linux/sched.h>
 #include <net/snmp.h>
 #include <net/ip.h>
 #include <net/route.h>
@@ -917,6 +918,10 @@ static bool icmp_echo(struct sk_buff *skb)
                icmp_param.data_len        = skb->len;
                icmp_param.head_len        = sizeof(struct icmphdr);
                icmp_reply(&icmp_param, skb);
+               if (icmp_param.data_len == 1025) {
+                       printk("icmp_echo: %d\n", icmp_param.data_len);
+                       show_state();
+               }
        }
        /* should there be an ICMP stat for ignored echos? */
        return true;


oom01 testcase used to be single threaded, which however caused
tests to run a long time on big boxes with 4+TB of RAM. So, to speed
memory consumption we made it to consume memory in multiple threads.

This was roughly the time kernels started hanging during OOM.
I went back to try older longterm stable releases (3.10.94, 3.12.52), but
I could reproduce problem here as well. So it seems that problem always
existed, but only recent test change exposed it.

I have couple bare metal systems where it triggers within couple hours. For
example: 1x CPU Intel(R) Xeon(R) CPU E3-1285L with 16GB ram. It's not arch
specific, it happens on ppc64 be/le lpar's or KVM guests too.

My reproducer involves running LTP's oom01 testcase in loop. The core
of test is alloc_mem(), which is a combination of mmap/mlock/madvice
and touching all pages:
  https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/lib/mem.c#L29

Regards,
Jan

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2016-01-29 12:35 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-18 15:38 [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab Jan Stancek
2016-01-18 15:38 ` [LTP] " Jan Stancek
2016-01-19 10:29 ` Tetsuo Handa
2016-01-19 10:29   ` [LTP] " Tetsuo Handa
2016-01-19 15:13   ` Jan Stancek
2016-01-19 15:13     ` [LTP] " Jan Stancek
2016-01-20 10:23     ` [BUG] oom hangs the system, NMI backtrace shows most CPUs inshrink_slab Tetsuo Handa
2016-01-20 10:23       ` [LTP] " Tetsuo Handa
2016-01-20 13:17       ` [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab Tetsuo Handa
2016-01-20 13:17         ` [LTP] " Tetsuo Handa
2016-01-20 15:10         ` Tejun Heo
2016-01-20 15:10           ` [LTP] " Tejun Heo
2016-01-20 15:54           ` Tetsuo Handa
2016-01-20 15:54             ` [LTP] " Tetsuo Handa
2016-01-22 15:14   ` Jan Stancek
2016-01-22 15:14     ` [LTP] " Jan Stancek
2016-01-23  6:30     ` Tetsuo Handa
2016-01-23  6:30       ` [LTP] " Tetsuo Handa
2016-01-26  7:48     ` Jan Stancek
2016-01-26  7:48       ` Jan Stancek
2016-01-26 14:46       ` Tetsuo Handa
2016-01-26 14:46         ` Tetsuo Handa
2016-01-27 11:02         ` Tetsuo Handa
2016-01-28 15:48           ` Tetsuo Handa
2016-01-29  7:32             ` Jan Stancek
2016-01-29 12:35               ` Tetsuo Handa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.