* [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab
@ 2016-01-18 15:38 ` Jan Stancek
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Stancek @ 2016-01-18 15:38 UTC (permalink / raw)
  To: linux-mm; +Cc: ltp

Hi,

I'm seeing the system occasionally hang after the "oom01" testcase
from LTP triggers an OOM.

Here's a console log obtained from v4.4-8606 (shows oom, followed
by blocked task messages, followed by me triggering sysrq-t):
  http://jan.stancek.eu/tmp/oom_hangs/oom_hang_v4.4-8606.txt
  http://jan.stancek.eu/tmp/oom_hangs/config-v4.4-8606.txt

I'm running this patch on top to trigger the equivalent of sysrq-t (the system is in
a remote location); an incoming ICMP echo with data_len == 1025 makes the kernel
call show_state():

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 36e2697..f1a27f3 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -77,6 +77,7 @@
 #include <linux/string.h>
 #include <linux/netfilter_ipv4.h>
 #include <linux/slab.h>
+#include <linux/sched.h>
 #include <net/snmp.h>
 #include <net/ip.h>
 #include <net/route.h>
@@ -917,6 +918,10 @@ static bool icmp_echo(struct sk_buff *skb)
                icmp_param.data_len        = skb->len;
                icmp_param.head_len        = sizeof(struct icmphdr);
                icmp_reply(&icmp_param, skb);
+               if (icmp_param.data_len == 1025) {
+                       printk("icmp_echo: %d\n", icmp_param.data_len);
+                       show_state();
+               }
        }
        /* should there be an ICMP stat for ignored echos? */
        return true;


The oom01 testcase used to be single-threaded, which caused it to run
for a long time on big boxes with 4+ TB of RAM. So, to speed up memory
consumption, we changed it to consume memory in multiple threads.

That is roughly when kernels started hanging during OOM.
I went back and tried older longterm stable releases (3.10.94, 3.12.52), but
I could reproduce the problem there as well. So it seems the problem has always
existed, and only the recent test change exposed it.

I have a couple of bare metal systems where it triggers within a couple of hours.
For example: 1x Intel(R) Xeon(R) CPU E3-1285L with 16GB of RAM. It's not arch
specific; it happens on ppc64 be/le LPARs and KVM guests too.

My reproducer involves running LTP's oom01 testcase in a loop. The core
of the test is alloc_mem(), which is a combination of mmap/mlock/madvise
and touching all pages:
  https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/lib/mem.c#L29
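
To give an idea of the pattern, here is a minimal sketch of what the allocation
loop does (my simplification; the size, the exact madvise advice and the error
handling are assumptions, see the real code at the URL above):

#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

/* Touch "size" bytes of anonymous memory, page by page (simplified). */
static void alloc_mem_sketch(long size)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	char *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		exit(1);
	mlock(p, size);                   /* may fail without privileges; ignored here */
	madvise(p, size, MADV_WILLNEED);  /* a hint; the exact advice in LTP differs */
	for (long i = 0; i < size; i += pagesz)
		p[i] = 'a';               /* fault in and dirty every page */
}

int main(void)
{
	alloc_mem_sketch(64L * 1024 * 1024);	/* 64MB, for illustration */
	return 0;
}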

Regards,
Jan

* Re: [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab
  2016-01-18 15:38 ` [LTP] " Jan Stancek
@ 2016-01-19 10:29   ` Tetsuo Handa
  -1 siblings, 0 replies; 26+ messages in thread
From: Tetsuo Handa @ 2016-01-19 10:29 UTC (permalink / raw)
  To: Jan Stancek, linux-mm; +Cc: ltp

Jan Stancek wrote:
> I'm seeing the system occasionally hang after the "oom01" testcase
> from LTP triggers an OOM.
>
> Here's a console log obtained from v4.4-8606 (shows oom, followed
> by blocked task messages, followed by me triggering sysrq-t):
>   http://jan.stancek.eu/tmp/oom_hangs/oom_hang_v4.4-8606.txt
>   http://jan.stancek.eu/tmp/oom_hangs/config-v4.4-8606.txt

I think this console log reports an OOM livelock.

dump_tasks() shows that there are 2 oom01 tasks (10482 and 10528), but
show_state() shows that there are 8 oom01 tasks whose parent is 10482
(10528 10529 10530 10531 10532 10533 10534 10535). Thus, I guess the oom01
process called fork() and the child process created 7 threads.

----------
[  495.322098] [10482]     0 10482     2692       10      11       3       27             0 oom01
[  495.341231] [10528]     0 10528  8667787  3731327   11002      28  1890661             0 oom01
[ 1736.414809] oom01           S ffff880453dd3e48 13008 10482  10479 0x00000080
[ 1737.025258] oom01           x ffff88044ff9bc98 13080 10528  10482 0x00000084
[ 1737.129553] oom01           R  running task    11616 10529  10482 0x00000084
[ 1737.309520] oom01           R  running task    11544 10530  10482 0x00000084
[ 1737.417212] oom01           R  running task    11616 10531  10482 0x00000084
[ 1737.551904] oom01           R  running task    11616 10532  10482 0x00000084
[ 1737.725174] oom01           R  running task    11616 10533  10482 0x00000084
[ 1737.898452] oom01           R  running task    11616 10534  10482 0x00000084
[ 1738.078431] oom01           R  running task    11616 10535  10482 0x00000084
----------

10528 got both SIGKILL and TIF_MEMDIE at uptime = 495 and has exited by uptime = 1737.

----------
[  495.350845] Out of memory: Kill process 10528 (oom01) score 952 or sacrifice child
[  495.359301] Killed process 10528 (oom01) total-vm:34671148kB, anon-rss:14925308kB, file-rss:0kB, shmem-rss:0kB
----------

Since 10529 to 10535 got only SIGKILL, all of them are looping inside __alloc_pages_slowpath().

----------
[ 1737.129553] oom01           R  running task    11616 10529  10482 0x00000084
[ 1737.172049]  [<ffffffff81778a8c>] _cond_resched+0x1c/0x30
[ 1737.178072]  [<ffffffff811e6daf>] shrink_slab.part.42+0x2cf/0x540
[ 1737.184873]  [<ffffffff811176ae>] ? rcu_read_lock_held+0x5e/0x70
[ 1737.191577]  [<ffffffff811ec4ba>] shrink_zone+0x30a/0x330
[ 1737.197602]  [<ffffffff811ec884>] do_try_to_free_pages+0x174/0x440
[ 1737.204499]  [<ffffffff811ecc50>] try_to_free_pages+0x100/0x2c0
[ 1737.211107]  [<ffffffff81268ae0>] __alloc_pages_slowpath.constprop.85+0x3c6/0x74a
[ 1737.309520] oom01           R  running task    11544 10530  10482 0x00000084
[ 1737.352024]  [<ffffffff81778a8c>] _cond_resched+0x1c/0x30
[ 1737.358049]  [<ffffffff81268b12>] __alloc_pages_slowpath.constprop.85+0x3f8/0x74a
[ 1737.417212] oom01           R  running task    11616 10531  10482 0x00000084
[ 1737.459715]  [<ffffffff81778a8c>] _cond_resched+0x1c/0x30
[ 1737.465739]  [<ffffffff811e6daf>] shrink_slab.part.42+0x2cf/0x540
[ 1737.472540]  [<ffffffff811176ae>] ? rcu_read_lock_held+0x5e/0x70
[ 1737.479242]  [<ffffffff811ec4ba>] shrink_zone+0x30a/0x330
[ 1737.485266]  [<ffffffff811ec884>] do_try_to_free_pages+0x174/0x440
[ 1737.492162]  [<ffffffff811ecc50>] try_to_free_pages+0x100/0x2c0
[ 1737.498768]  [<ffffffff81268ae0>] __alloc_pages_slowpath.constprop.85+0x3c6/0x74a
[ 1737.551904] oom01           R  running task    11616 10532  10482 0x00000084
[ 1737.594403]  [<ffffffff81778a8c>] _cond_resched+0x1c/0x30
[ 1737.600428]  [<ffffffff811e6cf0>] shrink_slab.part.42+0x210/0x540
[ 1737.607228]  [<ffffffff811ec4ba>] shrink_zone+0x30a/0x330
[ 1737.613252]  [<ffffffff811ec884>] do_try_to_free_pages+0x174/0x440
[ 1737.620150]  [<ffffffff811ecc50>] try_to_free_pages+0x100/0x2c0
[ 1737.626755]  [<ffffffff81268ae0>] __alloc_pages_slowpath.constprop.85+0x3c6/0x74a
[ 1737.725174] oom01           R  running task    11616 10533  10482 0x00000084
[ 1737.767682]  [<ffffffff81778a8c>] _cond_resched+0x1c/0x30
[ 1737.773705]  [<ffffffff811e6cf0>] shrink_slab.part.42+0x210/0x540
[ 1737.780506]  [<ffffffff811ec4ba>] shrink_zone+0x30a/0x330
[ 1737.786532]  [<ffffffff811ec884>] do_try_to_free_pages+0x174/0x440
[ 1737.793428]  [<ffffffff811ecc50>] try_to_free_pages+0x100/0x2c0
[ 1737.800037]  [<ffffffff81268ae0>] __alloc_pages_slowpath.constprop.85+0x3c6/0x74a
[ 1737.898452] oom01           R  running task    11616 10534  10482 0x00000084
[ 1737.940961]  [<ffffffff81778a8c>] _cond_resched+0x1c/0x30
[ 1737.946985]  [<ffffffff811e6daf>] shrink_slab.part.42+0x2cf/0x540
[ 1737.953785]  [<ffffffff811176ae>] ? rcu_read_lock_held+0x5e/0x70
[ 1737.960488]  [<ffffffff811ec4ba>] shrink_zone+0x30a/0x330
[ 1737.966512]  [<ffffffff811ec884>] do_try_to_free_pages+0x174/0x440
[ 1737.973408]  [<ffffffff811ecc50>] try_to_free_pages+0x100/0x2c0
[ 1737.980015]  [<ffffffff81268ae0>] __alloc_pages_slowpath.constprop.85+0x3c6/0x74a
[ 1738.078431] oom01           R  running task    11616 10535  10482 0x00000084
[ 1738.120938]  [<ffffffff81778a8c>] _cond_resched+0x1c/0x30
[ 1738.126961]  [<ffffffff81268b12>] __alloc_pages_slowpath.constprop.85+0x3f8/0x74a
----------

As a result, 10482 remains sleeping at wait().

----------
[ 1736.414809] oom01           S ffff880453dd3e48 13008 10482  10479 0x00000080
[ 1736.450320]  [<ffffffff817787ec>] schedule+0x3c/0x90
[ 1736.455859]  [<ffffffff8109ba93>] do_wait+0x213/0x2f0
[ 1736.461495]  [<ffffffff8109ce60>] SyS_wait4+0x80/0x100
----------

Also, 10519 shows a typical trace of a task looping inside __alloc_pages_slowpath(),
the kind we observe when we hit an OOM livelock. Since this allocation request
includes neither __GFP_NOFAIL nor __GFP_FS, out_of_memory() cannot be called.

----------
[ 1736.716427] kworker/2:1     R  running task    12616 10519      2 0x00000088
[ 1736.724319] Workqueue: events_freezable_power_ disk_events_workfn
[ 1736.731128]  00000000ca948786 ffff88035997b8e8 0000000000000000 0000000000000000
[ 1736.739427]  0000000000000000 0000000000000000 ffff88035997b850 ffffffff811ec4ba
[ 1736.747725]  ffff88035997b9d0 0000000000000000 0000000000000000 0001000000000000
[ 1736.756022] Call Trace:
[ 1736.758749]  [<ffffffff811ec4ba>] ? shrink_zone+0x30a/0x330
[ 1736.764969]  [<ffffffff811ec884>] ? do_try_to_free_pages+0x174/0x440
[ 1736.772060]  [<ffffffff811ecc50>] ? try_to_free_pages+0x100/0x2c0
[ 1736.778861]  [<ffffffff81268ae0>] ? __alloc_pages_slowpath.constprop.85+0x3c6/0x74a
[ 1736.787408]  [<ffffffff811dc666>] ? __alloc_pages_nodemask+0x456/0x460
[ 1736.794690]  [<ffffffff810f51e4>] ? __lock_is_held+0x54/0x70
[ 1736.801004]  [<ffffffff81231c17>] ? alloc_pages_current+0x97/0x1b0
[ 1736.807902]  [<ffffffff8137ebd9>] ? bio_copy_kern+0xc9/0x180
[ 1736.814217]  [<ffffffff8138db15>] ? blk_rq_map_kern+0x75/0x130
[ 1736.820726]  [<ffffffff815269e2>] ? scsi_execute+0x132/0x160
[ 1736.827041]  [<ffffffff81528f7e>] ? scsi_execute_req_flags+0x8e/0xf0
[ 1736.834133]  [<ffffffffa01956e7>] ? sr_check_events+0xb7/0x2a0 [sr_mod]
[ 1736.841514]  [<ffffffffa00d7058>] ? cdrom_check_events+0x18/0x30 [cdrom]
[ 1736.848992]  [<ffffffffa0195b2a>] ? sr_block_check_events+0x2a/0x30 [sr_mod]
[ 1736.856859]  [<ffffffff81399bb0>] ? disk_check_events+0x60/0x170
[ 1736.863561]  [<ffffffff81399cdc>] ? disk_events_workfn+0x1c/0x20
[ 1736.870264]  [<ffffffff810b7055>] ? process_one_work+0x215/0x650
[ 1736.876967]  [<ffffffff810b6fc1>] ? process_one_work+0x181/0x650
[ 1736.883671]  [<ffffffff810b75b5>] ? worker_thread+0x125/0x4a0
[ 1736.890082]  [<ffffffff810b7490>] ? process_one_work+0x650/0x650
[ 1736.896785]  [<ffffffff810be191>] ? kthread+0x101/0x120
[ 1736.902615]  [<ffffffff810f8c19>] ? trace_hardirqs_on_caller+0xf9/0x1c0
[ 1736.909997]  [<ffffffff810be090>] ? kthread_create_on_node+0x250/0x250
[ 1736.917281]  [<ffffffff8177ef9f>] ? ret_from_fork+0x3f/0x70
[ 1736.923499]  [<ffffffff810be090>] ? kthread_create_on_node+0x250/0x250
----------

What is strange is that 10529 to 10535 should be able to get TIF_MEMDIE and exit
__alloc_pages_slowpath(). There are three possibilities. The first is that they are
too unlucky to take the oom_lock mutex, which is needed for calling out_of_memory();
but they should not keep failing to take the oom_lock mutex for such a long time.
The second is that their allocation requests include neither __GFP_NOFAIL nor
__GFP_FS, one of which is needed for calling out_of_memory(). The third is that,
although I couldn't find evidence that mlock() and madvise() are related to this
hangup, something is preventing __GFP_NOFAIL or __GFP_FS allocation requests from
calling out_of_memory(). If you can reproduce this with the kmallocwd patch at
https://lkml.kernel.org/r/201511250024.AAE78692.QVOtFFOSFOMLJH@I-love.SAKURA.ne.jp
applied, I think you will be able to tell which possibility you are hitting.
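
(For reference, the gate I have in mind looks roughly like this; a simplified
paraphrase of the 4.4-era __alloc_pages_may_oom() logic, not the exact source:)

	if (!mutex_trylock(&oom_lock)) {
		/* first possibility: somebody else holds oom_lock;
		 * pretend progress and let the caller retry */
		*did_some_progress = 1;
		return NULL;
	}
	if (!(gfp_mask & __GFP_NOFAIL)) {
		if (order > PAGE_ALLOC_COSTLY_ORDER)
			goto out;	/* no OOM kill for costly orders */
		if (!(gfp_mask & __GFP_FS))
			goto out;	/* second possibility: IO-less reclaim
					 * does not invoke the OOM killer */
	}
	if (out_of_memory(&oc))		/* may give a victim TIF_MEMDIE */
		*did_some_progress = 1;
out:
	mutex_unlock(&oom_lock);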

* Re: [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab
  2016-01-19 10:29   ` [LTP] " Tetsuo Handa
@ 2016-01-19 15:13     ` Jan Stancek
  -1 siblings, 0 replies; 26+ messages in thread
From: Jan Stancek @ 2016-01-19 15:13 UTC (permalink / raw)
  To: Tetsuo Handa, linux-mm; +Cc: ltp

On 01/19/2016 11:29 AM, Tetsuo Handa wrote:
> Jan Stancek wrote:
>> I'm seeing the system occasionally hang after the "oom01" testcase
>> from LTP triggers an OOM.
>>
>> Here's a console log obtained from v4.4-8606 (shows oom, followed
>> by blocked task messages, followed by me triggering sysrq-t):
>>   http://jan.stancek.eu/tmp/oom_hangs/oom_hang_v4.4-8606.txt
>>   http://jan.stancek.eu/tmp/oom_hangs/config-v4.4-8606.txt
> 
> I think this console log reports an OOM livelock.
> 
> dump_tasks() shows that there are 2 oom01 tasks (10482 and 10528), but
> show_state() shows that there are 8 oom01 tasks whose parent is 10482
> (10528 10529 10530 10531 10532 10533 10534 10535). Thus, I guess the oom01
> process called fork() and the child process created 7 threads.

Correct, the parent forks a child and expects that child to be killed by the
OOM killer. The child spawns NCPU-1 threads, which all keep allocating/touching
memory for as long as they can.
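
Roughly like this (a simplified sketch of the structure described above, not the
actual LTP code; the thread count, allocation size and details are assumptions):

#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

static void *alloc_and_touch(void *arg)
{
	/* keep allocating and dirtying memory until the OOM killer stops us */
	for (;;) {
		char *p = malloc(1 << 20);
		if (p)
			memset(p, 'a', 1 << 20);
	}
	return NULL;
}

int main(void)
{
	long nthreads = sysconf(_SC_NPROCESSORS_ONLN) - 1;
	pid_t child = fork();

	if (child == 0) {			/* child: the expected OOM victim */
		pthread_t t;
		for (long i = 0; i < nthreads; i++)
			pthread_create(&t, NULL, alloc_and_touch, NULL);
		pause();
	} else {				/* parent: waits for the child to be killed */
		waitpid(child, NULL, 0);
	}
	return 0;
}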

> 
> ----------
> [  495.322098] [10482]     0 10482     2692       10      11       3       27             0 oom01
> [  495.341231] [10528]     0 10528  8667787  3731327   11002      28  1890661             0 oom01
> [ 1736.414809] oom01           S ffff880453dd3e48 13008 10482  10479 0x00000080
> [ 1737.025258] oom01           x ffff88044ff9bc98 13080 10528  10482 0x00000084
> [ 1737.129553] oom01           R  running task    11616 10529  10482 0x00000084
> [ 1737.309520] oom01           R  running task    11544 10530  10482 0x00000084
> [ 1737.417212] oom01           R  running task    11616 10531  10482 0x00000084
> [ 1737.551904] oom01           R  running task    11616 10532  10482 0x00000084
> [ 1737.725174] oom01           R  running task    11616 10533  10482 0x00000084
> [ 1737.898452] oom01           R  running task    11616 10534  10482 0x00000084
> [ 1738.078431] oom01           R  running task    11616 10535  10482 0x00000084
> ----------
> 
> 10528 got both SIGKILL and TIF_MEMDIE at uptime = 495 and has exited by uptime = 1737.
> 
> ----------
> [  495.350845] Out of memory: Kill process 10528 (oom01) score 952 or sacrifice child
> [  495.359301] Killed process 10528 (oom01) total-vm:34671148kB, anon-rss:14925308kB, file-rss:0kB, shmem-rss:0kB
> ----------
> 
> Since 10529 to 10535 got only SIGKILL, all of them are looping inside __alloc_pages_slowpath().
> 
> ----------
> [ 1737.129553] oom01           R  running task    11616 10529  10482 0x00000084
> [ 1737.172049]  [<ffffffff81778a8c>] _cond_resched+0x1c/0x30
> [ 1737.178072]  [<ffffffff811e6daf>] shrink_slab.part.42+0x2cf/0x540
> [ 1737.184873]  [<ffffffff811176ae>] ? rcu_read_lock_held+0x5e/0x70
> [ 1737.191577]  [<ffffffff811ec4ba>] shrink_zone+0x30a/0x330
> [ 1737.197602]  [<ffffffff811ec884>] do_try_to_free_pages+0x174/0x440
> [ 1737.204499]  [<ffffffff811ecc50>] try_to_free_pages+0x100/0x2c0
> [ 1737.211107]  [<ffffffff81268ae0>] __alloc_pages_slowpath.constprop.85+0x3c6/0x74a
> [ 1737.309520] oom01           R  running task    11544 10530  10482 0x00000084
> [ 1737.352024]  [<ffffffff81778a8c>] _cond_resched+0x1c/0x30
> [ 1737.358049]  [<ffffffff81268b12>] __alloc_pages_slowpath.constprop.85+0x3f8/0x74a
> [ 1737.417212] oom01           R  running task    11616 10531  10482 0x00000084
> [ 1737.459715]  [<ffffffff81778a8c>] _cond_resched+0x1c/0x30
> [ 1737.465739]  [<ffffffff811e6daf>] shrink_slab.part.42+0x2cf/0x540
> [ 1737.472540]  [<ffffffff811176ae>] ? rcu_read_lock_held+0x5e/0x70
> [ 1737.479242]  [<ffffffff811ec4ba>] shrink_zone+0x30a/0x330
> [ 1737.485266]  [<ffffffff811ec884>] do_try_to_free_pages+0x174/0x440
> [ 1737.492162]  [<ffffffff811ecc50>] try_to_free_pages+0x100/0x2c0
> [ 1737.498768]  [<ffffffff81268ae0>] __alloc_pages_slowpath.constprop.85+0x3c6/0x74a
> [ 1737.551904] oom01           R  running task    11616 10532  10482 0x00000084
> [ 1737.594403]  [<ffffffff81778a8c>] _cond_resched+0x1c/0x30
> [ 1737.600428]  [<ffffffff811e6cf0>] shrink_slab.part.42+0x210/0x540
> [ 1737.607228]  [<ffffffff811ec4ba>] shrink_zone+0x30a/0x330
> [ 1737.613252]  [<ffffffff811ec884>] do_try_to_free_pages+0x174/0x440
> [ 1737.620150]  [<ffffffff811ecc50>] try_to_free_pages+0x100/0x2c0
> [ 1737.626755]  [<ffffffff81268ae0>] __alloc_pages_slowpath.constprop.85+0x3c6/0x74a
> [ 1737.725174] oom01           R  running task    11616 10533  10482 0x00000084
> [ 1737.767682]  [<ffffffff81778a8c>] _cond_resched+0x1c/0x30
> [ 1737.773705]  [<ffffffff811e6cf0>] shrink_slab.part.42+0x210/0x540
> [ 1737.780506]  [<ffffffff811ec4ba>] shrink_zone+0x30a/0x330
> [ 1737.786532]  [<ffffffff811ec884>] do_try_to_free_pages+0x174/0x440
> [ 1737.793428]  [<ffffffff811ecc50>] try_to_free_pages+0x100/0x2c0
> [ 1737.800037]  [<ffffffff81268ae0>] __alloc_pages_slowpath.constprop.85+0x3c6/0x74a
> [ 1737.898452] oom01           R  running task    11616 10534  10482 0x00000084
> [ 1737.940961]  [<ffffffff81778a8c>] _cond_resched+0x1c/0x30
> [ 1737.946985]  [<ffffffff811e6daf>] shrink_slab.part.42+0x2cf/0x540
> [ 1737.953785]  [<ffffffff811176ae>] ? rcu_read_lock_held+0x5e/0x70
> [ 1737.960488]  [<ffffffff811ec4ba>] shrink_zone+0x30a/0x330
> [ 1737.966512]  [<ffffffff811ec884>] do_try_to_free_pages+0x174/0x440
> [ 1737.973408]  [<ffffffff811ecc50>] try_to_free_pages+0x100/0x2c0
> [ 1737.980015]  [<ffffffff81268ae0>] __alloc_pages_slowpath.constprop.85+0x3c6/0x74a
> [ 1738.078431] oom01           R  running task    11616 10535  10482 0x00000084
> [ 1738.120938]  [<ffffffff81778a8c>] _cond_resched+0x1c/0x30
> [ 1738.126961]  [<ffffffff81268b12>] __alloc_pages_slowpath.constprop.85+0x3f8/0x74a
> ----------
> 
> As a result, 10482 remains sleeping at wait().
> 
> ----------
> [ 1736.414809] oom01           S ffff880453dd3e48 13008 10482  10479 0x00000080
> [ 1736.450320]  [<ffffffff817787ec>] schedule+0x3c/0x90
> [ 1736.455859]  [<ffffffff8109ba93>] do_wait+0x213/0x2f0
> [ 1736.461495]  [<ffffffff8109ce60>] SyS_wait4+0x80/0x100
> ----------
> 
> Also, 10519 shows a typical trace of a task looping inside __alloc_pages_slowpath(),
> the kind we observe when we hit an OOM livelock. Since this allocation request
> includes neither __GFP_NOFAIL nor __GFP_FS, out_of_memory() cannot be called.
> 
> ----------
> [ 1736.716427] kworker/2:1     R  running task    12616 10519      2 0x00000088
> [ 1736.724319] Workqueue: events_freezable_power_ disk_events_workfn
> [ 1736.731128]  00000000ca948786 ffff88035997b8e8 0000000000000000 0000000000000000
> [ 1736.739427]  0000000000000000 0000000000000000 ffff88035997b850 ffffffff811ec4ba
> [ 1736.747725]  ffff88035997b9d0 0000000000000000 0000000000000000 0001000000000000
> [ 1736.756022] Call Trace:
> [ 1736.758749]  [<ffffffff811ec4ba>] ? shrink_zone+0x30a/0x330
> [ 1736.764969]  [<ffffffff811ec884>] ? do_try_to_free_pages+0x174/0x440
> [ 1736.772060]  [<ffffffff811ecc50>] ? try_to_free_pages+0x100/0x2c0
> [ 1736.778861]  [<ffffffff81268ae0>] ? __alloc_pages_slowpath.constprop.85+0x3c6/0x74a
> [ 1736.787408]  [<ffffffff811dc666>] ? __alloc_pages_nodemask+0x456/0x460
> [ 1736.794690]  [<ffffffff810f51e4>] ? __lock_is_held+0x54/0x70
> [ 1736.801004]  [<ffffffff81231c17>] ? alloc_pages_current+0x97/0x1b0
> [ 1736.807902]  [<ffffffff8137ebd9>] ? bio_copy_kern+0xc9/0x180
> [ 1736.814217]  [<ffffffff8138db15>] ? blk_rq_map_kern+0x75/0x130
> [ 1736.820726]  [<ffffffff815269e2>] ? scsi_execute+0x132/0x160
> [ 1736.827041]  [<ffffffff81528f7e>] ? scsi_execute_req_flags+0x8e/0xf0
> [ 1736.834133]  [<ffffffffa01956e7>] ? sr_check_events+0xb7/0x2a0 [sr_mod]
> [ 1736.841514]  [<ffffffffa00d7058>] ? cdrom_check_events+0x18/0x30 [cdrom]
> [ 1736.848992]  [<ffffffffa0195b2a>] ? sr_block_check_events+0x2a/0x30 [sr_mod]
> [ 1736.856859]  [<ffffffff81399bb0>] ? disk_check_events+0x60/0x170
> [ 1736.863561]  [<ffffffff81399cdc>] ? disk_events_workfn+0x1c/0x20
> [ 1736.870264]  [<ffffffff810b7055>] ? process_one_work+0x215/0x650
> [ 1736.876967]  [<ffffffff810b6fc1>] ? process_one_work+0x181/0x650
> [ 1736.883671]  [<ffffffff810b75b5>] ? worker_thread+0x125/0x4a0
> [ 1736.890082]  [<ffffffff810b7490>] ? process_one_work+0x650/0x650
> [ 1736.896785]  [<ffffffff810be191>] ? kthread+0x101/0x120
> [ 1736.902615]  [<ffffffff810f8c19>] ? trace_hardirqs_on_caller+0xf9/0x1c0
> [ 1736.909997]  [<ffffffff810be090>] ? kthread_create_on_node+0x250/0x250
> [ 1736.917281]  [<ffffffff8177ef9f>] ? ret_from_fork+0x3f/0x70
> [ 1736.923499]  [<ffffffff810be090>] ? kthread_create_on_node+0x250/0x250
> ----------
> 
> What is strange is that 10529 to 10535 should be able to get TIF_MEMDIE and exit
> __alloc_pages_slowpath(). There are three possibilities. The first is that they are
> too unlucky to take the oom_lock mutex, which is needed for calling out_of_memory();
> but they should not keep failing to take the oom_lock mutex for such a long time.
> The second is that their allocation requests include neither __GFP_NOFAIL nor
> __GFP_FS, one of which is needed for calling out_of_memory(). The third is that,
> although I couldn't find evidence that mlock() and madvise() are related to this
> hangup, something is preventing __GFP_NOFAIL or __GFP_FS allocation requests from
> calling out_of_memory(). If you can reproduce this with the kmallocwd patch at
> https://lkml.kernel.org/r/201511250024.AAE78692.QVOtFFOSFOMLJH@I-love.SAKURA.ne.jp
> applied, I think you will be able to tell which possibility you are hitting.
> 

It took slightly longer this time. Here's the full console log:
  http://jan.stancek.eu/tmp/oom_hangs/console.log.2-v4.4-8606-with-memalloc_wc.txt.bz2

The last OOM message happens at:
[ 6845.765153] oom01 invoked oom-killer: gfp_mask=0x24280ca, order=0, oom_score_adj=0

Moments later, the first MemAlloc output starts appearing:
[ 6904.555880] MemAlloc-Info: 2 stalling task, 0 dying task, 0 victim task.
[ 6904.563387] MemAlloc: oom01(22011) seq=5135 gfp=0x24280ca order=0 delay=10001
[ 6904.571353] MemAlloc: oom01(22013) seq=5101 gfp=0x24280ca order=0 delay=10001

[ 6915.195869] MemAlloc-Info: 16 stalling task, 0 dying task, 0 victim task.
[ 6915.203458] MemAlloc: systemd-journal(592) seq=33409 gfp=0x24201ca order=0 delay=20495
[ 6915.212300] MemAlloc: NetworkManager(807) seq=42042 gfp=0x24200ca order=0 delay=12030
[ 6915.221042] MemAlloc: gssproxy(815) seq=1551 gfp=0x24201ca order=0 delay=19414
[ 6915.229104] MemAlloc: irqbalance(825) seq=6763 gfp=0x24201ca order=0 delay=11234
[ 6915.237363] MemAlloc: tuned(1339) seq=74664 gfp=0x24201ca order=0 delay=20354
[ 6915.245329] MemAlloc: top(10485) seq=486624 gfp=0x24201ca order=0 delay=20124
[ 6915.253288] MemAlloc: kworker/1:1(20708) seq=48 gfp=0x2400000 order=0 delay=20248
[ 6915.261640] MemAlloc: sendmail(21855) seq=207 gfp=0x24201ca order=0 delay=19977
[ 6915.269800] MemAlloc: oom01(22007) seq=2 gfp=0x24201ca order=0 delay=20269
[ 6915.277466] MemAlloc: oom01(22008) seq=5659 gfp=0x24280ca order=0 delay=20502
[ 6915.285432] MemAlloc: oom01(22009) seq=5189 gfp=0x24280ca order=0 delay=20502
[ 6915.293389] MemAlloc: oom01(22010) seq=4795 gfp=0x24280ca order=0 delay=20502
[ 6915.301353] MemAlloc: oom01(22011) seq=5135 gfp=0x24280ca order=0 delay=20641
[ 6915.309316] MemAlloc: oom01(22012) seq=3828 gfp=0x24280ca order=0 delay=20502
[ 6915.317280] MemAlloc: oom01(22013) seq=5101 gfp=0x24280ca order=0 delay=20641
[ 6915.325244] MemAlloc: oom01(22014) seq=3633 gfp=0x24280ca order=0 delay=20502

The seq numbers above appear to hold steady for another ~40 minutes, after which I power cycled the system.

Regards,
Jan

* Re: [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab
  2016-01-19 15:13     ` [LTP] " Jan Stancek
@ 2016-01-20 10:23       ` Tetsuo Handa
  -1 siblings, 0 replies; 26+ messages in thread
From: Tetsuo Handa @ 2016-01-20 10:23 UTC (permalink / raw)
  To: jstancek, linux-mm; +Cc: ltp

Jan Stancek wrote:
> It took slightly longer this time. Here's the full console log:
>   http://jan.stancek.eu/tmp/oom_hangs/console.log.2-v4.4-8606-with-memalloc_wc.txt.bz2

Thank you. This time, tasks are unable to make forward progress because
memory used by oom01 (21979) cannot be reclaimed, even though there are no
longer any OOM victims or dying tasks.

----------
[ 6822.445821] Swap cache stats: add 442518058, delete 442518058, find 382713/717869
[ 6846.196919] Swap cache stats: add 444426248, delete 444419195, find 384554/720831
[ 6846.575207] Out of memory: Kill process 21986 (oom01) score 953 or sacrifice child
[ 6846.583665] Killed process 21979 (oom01) total-vm:31525420kB, anon-rss:14936316kB, file-rss:0kB, shmem-rss:0kB
[ 6904.555880] MemAlloc-Info: 2 stalling task, 0 dying task, 0 victim task.
[ 6904.563387] MemAlloc: oom01(22011) seq=5135 gfp=0x24280ca order=0 delay=10001
[ 6904.571353] MemAlloc: oom01(22013) seq=5101 gfp=0x24280ca order=0 delay=10001
[ 6905.162365] Swap cache stats: add 448028091, delete 448003852, find 386328/724035
[ 6915.195869] MemAlloc-Info: 16 stalling task, 0 dying task, 0 victim task.
[ 6929.233835] MemAlloc-Info: 16 stalling task, 0 dying task, 0 victim task.
[ 6942.387848] MemAlloc-Info: 17 stalling task, 0 dying task, 0 victim task.
[ 6953.243802] MemAlloc-Info: 18 stalling task, 0 dying task, 0 victim task.
[ 6964.150790] MemAlloc-Info: 22 stalling task, 0 dying task, 0 victim task.
[ 7117.623703] MemAlloc-Info: 22 stalling task, 0 dying task, 0 victim task.
[ 7128.348659] MemAlloc-Info: 23 stalling task, 0 dying task, 0 victim task.
[ 7215.508583] MemAlloc-Info: 23 stalling task, 0 dying task, 0 victim task.
[ 7216.221140] Swap cache stats: add 448028091, delete 448003852, find 386328/724035
[ 7226.262927] MemAlloc-Info: 24 stalling task, 0 dying task, 0 victim task.
[ 7227.518106] Swap cache stats: add 448028091, delete 448003852, find 386328/724036
[ 7237.551543] MemAlloc-Info: 25 stalling task, 0 dying task, 0 victim task.
[ 8414.846196] Swap cache stats: add 448028091, delete 448003852, find 386328/724036
[ 8424.879423] MemAlloc-Info: 25 stalling task, 0 dying task, 0 victim task.
[ 8425.679183] Swap cache stats: add 448028091, delete 448003852, find 386328/724037
[ 8435.712419] MemAlloc-Info: 26 stalling task, 0 dying task, 0 victim task.
[ 9549.714223] Swap cache stats: add 448028091, delete 448003852, find 386328/724037
[ 9559.747360] MemAlloc-Info: 26 stalling task, 0 dying task, 0 victim task.
[ 9560.546201] Swap cache stats: add 448028091, delete 448003852, find 386328/724038
[ 9570.579371] MemAlloc-Info: 27 stalling task, 0 dying task, 0 victim task.
[ 9648.210276] MemAlloc-Info: 27 stalling task, 0 dying task, 0 victim task.
[ 9649.057129] Swap cache stats: add 448028091, delete 448003852, find 386328/724038
----------

I noticed three strange things.

(1) oom01 (22011 and 22013) did not declare out of memory.

    0x24280ca is ___GFP_KSWAPD_RECLAIM | ___GFP_DIRECT_RECLAIM | ___GFP_HARDWALL |
    ___GFP_ZERO | ___GFP_FS | ___GFP_IO | ___GFP_MOVABLE | ___GFP_HIGHMEM. Thus, if
    oom01 (22011 and 22013) declares out of memory, the OOM killer will be invoked.

    This time you did not hit the first or the second possibility, for 22011 and
    22013 did not even declare out of memory. I guess the current vmscan logic
    prevented them from declaring out of memory because one of the zones was still
    considered reclaimable:

	/* Any of the zones still reclaimable?  Don't OOM. */
	if (zones_reclaimable)
		return 1;
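
    (As long as that check keeps try_to_free_pages() reporting progress, the
    allocator retries reclaim instead of falling through to the OOM path.
    Roughly, paraphrasing the 4.4-era __alloc_pages_slowpath(); simplified,
    not the exact source:)

	pages_reclaimed += did_some_progress;
	if (did_some_progress && order <= PAGE_ALLOC_COSTLY_ORDER) {
		/* wait briefly for writeback, then try reclaim again */
		wait_iff_congested(ac->preferred_zone, BLK_RW_ASYNC, HZ/50);
		goto retry;	/* __alloc_pages_may_oom() is never reached */
	}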

(Please ignore mlocked: field, for that value is bogus.)
----------
[ 6904.919732] Node 0 DMA free:15860kB min:16kB low:20kB high:24kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
present:15996kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB
free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 6904.971210] Node 0 DMA32 free:56072kB min:1132kB low:1412kB high:1696kB active_anon:523468kB inactive_anon:524296kB active_file:8kB inactive_file:12kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:1214568kB managed:1117824kB mlocked:707844970119168kB dirty:0kB writeback:0kB mapped:24kB shmem:80kB slab_reclaimable:636kB slab_unreclaimable:4868kB kernel_stack:336kB
pagetables:3776kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 6905.026961] Node 0 Normal free:13836kB min:14428kB low:18032kB high:21640kB active_anon:12764756kB inactive_anon:1161944kB active_file:52kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:14426112kB managed:14145740kB mlocked:8627678764597248kB dirty:0kB writeback:0kB mapped:208kB shmem:144kB slab_reclaimable:27720kB slab_unreclaimable:60416kB
kernel_stack:3408kB pagetables:44436kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:14084 all_unreclaimable? yes

[ 9648.814118] Node 0 DMA free:15860kB min:16kB low:20kB high:24kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
present:15996kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB
free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 9648.865599] Node 0 DMA32 free:56072kB min:1132kB low:1412kB high:1696kB active_anon:523468kB inactive_anon:524296kB active_file:8kB inactive_file:12kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:1214568kB managed:1117824kB mlocked:707844970119168kB dirty:0kB writeback:0kB mapped:24kB shmem:80kB slab_reclaimable:636kB slab_unreclaimable:4868kB kernel_stack:336kB
pagetables:3776kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 9648.921343] Node 0 Normal free:13260kB min:14428kB low:18032kB high:21640kB active_anon:12764756kB inactive_anon:1161944kB active_file:12kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:14426112kB managed:14145740kB mlocked:8627678764597248kB dirty:0kB writeback:0kB mapped:208kB shmem:144kB slab_reclaimable:27720kB slab_unreclaimable:60416kB
kernel_stack:3344kB pagetables:44436kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:108547704 all_unreclaimable? yes
----------

(2) The free: field of the Normal zone remained smaller than the min: field.

    After oom01, which was killed at uptime = 6846, terminated, the free: field
    should have recovered, because there are no longer any OOM victims or dying
    tasks. Something (the third possibility) is preventing that.

(3) Swap I/O had effectively stopped by uptime = 6904.

    I don't know why the kernel cannot access swap. To access swap, some memory
    allocation is needed; is it failing because the free: field of the Normal
    zone is smaller than the min: field? If accessing swap depends on workqueue
    items, are they created with WQ_MEM_RECLAIM?
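
    (For reference, a workqueue that must make forward progress under memory
    pressure is supposed to be created with WQ_MEM_RECLAIM so that it gets a
    rescuer thread. An illustrative fragment, not taken from any particular
    driver:)

	struct workqueue_struct *wq;

	/* WQ_MEM_RECLAIM gives the workqueue a rescuer thread, so queued
	 * work can still run even when new kworkers cannot be created
	 * because of memory pressure. */
	wq = alloc_workqueue("example_reclaim_wq", WQ_MEM_RECLAIM, 0);
	if (!wq)
		return -ENOMEM;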

Anyway, help from MM people is needed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [LTP] [BUG] oom hangs the system, NMI backtrace shows most CPUs inshrink_slab
@ 2016-01-20 10:23       ` Tetsuo Handa
  0 siblings, 0 replies; 26+ messages in thread
From: Tetsuo Handa @ 2016-01-20 10:23 UTC (permalink / raw)
  To: ltp

Jan Stancek wrote:
> It took slightly longer this time. Here's full console log:
>   http://jan.stancek.eu/tmp/oom_hangs/console.log.2-v4.4-8606-with-memalloc_wc.txt.bz2

Thank you. This time, tasks are unable to make forward progress because
memory used by oom01 (21979) cannot be reclaimed despite there is no longer
OOM victims or dying tasks.

----------
[ 6822.445821] Swap cache stats: add 442518058, delete 442518058, find 382713/717869
[ 6846.196919] Swap cache stats: add 444426248, delete 444419195, find 384554/720831
[ 6846.575207] Out of memory: Kill process 21986 (oom01) score 953 or sacrifice child
[ 6846.583665] Killed process 21979 (oom01) total-vm:31525420kB, anon-rss:14936316kB, file-rss:0kB, shmem-rss:0kB
[ 6904.555880] MemAlloc-Info: 2 stalling task, 0 dying task, 0 victim task.
[ 6904.563387] MemAlloc: oom01(22011) seq=5135 gfp=0x24280ca order=0 delay=10001
[ 6904.571353] MemAlloc: oom01(22013) seq=5101 gfp=0x24280ca order=0 delay=10001
[ 6905.162365] Swap cache stats: add 448028091, delete 448003852, find 386328/724035
[ 6915.195869] MemAlloc-Info: 16 stalling task, 0 dying task, 0 victim task.
[ 6929.233835] MemAlloc-Info: 16 stalling task, 0 dying task, 0 victim task.
[ 6942.387848] MemAlloc-Info: 17 stalling task, 0 dying task, 0 victim task.
[ 6953.243802] MemAlloc-Info: 18 stalling task, 0 dying task, 0 victim task.
[ 6964.150790] MemAlloc-Info: 22 stalling task, 0 dying task, 0 victim task.
[ 7117.623703] MemAlloc-Info: 22 stalling task, 0 dying task, 0 victim task.
[ 7128.348659] MemAlloc-Info: 23 stalling task, 0 dying task, 0 victim task.
[ 7215.508583] MemAlloc-Info: 23 stalling task, 0 dying task, 0 victim task.
[ 7216.221140] Swap cache stats: add 448028091, delete 448003852, find 386328/724035
[ 7226.262927] MemAlloc-Info: 24 stalling task, 0 dying task, 0 victim task.
[ 7227.518106] Swap cache stats: add 448028091, delete 448003852, find 386328/724036
[ 7237.551543] MemAlloc-Info: 25 stalling task, 0 dying task, 0 victim task.
[ 8414.846196] Swap cache stats: add 448028091, delete 448003852, find 386328/724036
[ 8424.879423] MemAlloc-Info: 25 stalling task, 0 dying task, 0 victim task.
[ 8425.679183] Swap cache stats: add 448028091, delete 448003852, find 386328/724037
[ 8435.712419] MemAlloc-Info: 26 stalling task, 0 dying task, 0 victim task.
[ 9549.714223] Swap cache stats: add 448028091, delete 448003852, find 386328/724037
[ 9559.747360] MemAlloc-Info: 26 stalling task, 0 dying task, 0 victim task.
[ 9560.546201] Swap cache stats: add 448028091, delete 448003852, find 386328/724038
[ 9570.579371] MemAlloc-Info: 27 stalling task, 0 dying task, 0 victim task.
[ 9648.210276] MemAlloc-Info: 27 stalling task, 0 dying task, 0 victim task.
[ 9649.057129] Swap cache stats: add 448028091, delete 448003852, find 386328/724038
----------

I noticed three strange things.

(1) oom01 (22011 and 22013) did not declare out of memory.

    0x24280ca is ___GFP_KSWAPD_RECLAIM | ___GFP_DIRECT_RECLAIM | ___GFP_HARDWALL |
    ___GFP_ZERO | ___GFP_FS | ___GFP_IO | ___GFP_MOVABLE | ___GFP_HIGHMEM. Thus, if
    oom01 (22011 and 22013) declares out of memory, the OOM killer will be invoked.

    This time you did not hit the first or second possibility, because 22011 and
    22013 never declared out of memory. I guess the current vmscan logic prevented
    them from doing so because one of the zones was still considered reclaimable.

	/* Any of the zones still reclaimable?  Don't OOM. */
	if (zones_reclaimable)
		return 1;
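
    For reference, the heuristic behind that check in kernels of this vintage
    looks roughly like the following (paraphrased from mm/vmscan.c, not a
    verbatim quote, and the function name below is mine). It stays true for
    DMA32 in the dumps below because that zone still shows pages_scanned:0:

	/* A zone keeps counting as reclaimable until about six times its
	 * reclaimable pages have been scanned without making progress. */
	static bool zone_still_reclaimable(struct zone *zone)
	{
		return zone_page_state(zone, NR_PAGES_SCANNED) <
		       zone_reclaimable_pages(zone) * 6;
	}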

(Please ignore mlocked: field, for that value is bogus.)
----------
[ 6904.919732] Node 0 DMA free:15860kB min:16kB low:20kB high:24kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
present:15996kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB
free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 6904.971210] Node 0 DMA32 free:56072kB min:1132kB low:1412kB high:1696kB active_anon:523468kB inactive_anon:524296kB active_file:8kB inactive_file:12kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:1214568kB managed:1117824kB mlocked:707844970119168kB dirty:0kB writeback:0kB mapped:24kB shmem:80kB slab_reclaimable:636kB slab_unreclaimable:4868kB kernel_stack:336kB
pagetables:3776kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 6905.026961] Node 0 Normal free:13836kB min:14428kB low:18032kB high:21640kB active_anon:12764756kB inactive_anon:1161944kB active_file:52kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:14426112kB managed:14145740kB mlocked:8627678764597248kB dirty:0kB writeback:0kB mapped:208kB shmem:144kB slab_reclaimable:27720kB slab_unreclaimable:60416kB
kernel_stack:3408kB pagetables:44436kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:14084 all_unreclaimable? yes

[ 9648.814118] Node 0 DMA free:15860kB min:16kB low:20kB high:24kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
present:15996kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB
free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 9648.865599] Node 0 DMA32 free:56072kB min:1132kB low:1412kB high:1696kB active_anon:523468kB inactive_anon:524296kB active_file:8kB inactive_file:12kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:1214568kB managed:1117824kB mlocked:707844970119168kB dirty:0kB writeback:0kB mapped:24kB shmem:80kB slab_reclaimable:636kB slab_unreclaimable:4868kB kernel_stack:336kB
pagetables:3776kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 9648.921343] Node 0 Normal free:13260kB min:14428kB low:18032kB high:21640kB active_anon:12764756kB inactive_anon:1161944kB active_file:12kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:14426112kB managed:14145740kB mlocked:8627678764597248kB dirty:0kB writeback:0kB mapped:208kB shmem:144kB slab_reclaimable:27720kB slab_unreclaimable:60416kB
kernel_stack:3344kB pagetables:44436kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:108547704 all_unreclaimable? yes
----------

(2) The free: field of the Normal zone remained smaller than the min: field.

    After oom01, which was killed at uptime = 6846, terminated, the free:
    field should have recovered because there are no longer any OOM victims
    or dying tasks. Something (the third possibility) is preventing it.

(3) I/O to swap was effectively disabled by uptime = 6904.

    I don't know why the kernel can no longer access swap. Is some memory
    allocation needed for accessing swap failing because the free: field of
    the Normal zone is below the min: field? If accessing swap depends on
    workqueue items, are those workqueues created with WQ_MEM_RECLAIM?

Anyway, help from MM people is needed.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab
  2016-01-20 10:23       ` [LTP] " Tetsuo Handa
@ 2016-01-20 13:17         ` Tetsuo Handa
  -1 siblings, 0 replies; 26+ messages in thread
From: Tetsuo Handa @ 2016-01-20 13:17 UTC (permalink / raw)
  To: tj, jiangshanlai; +Cc: jstancek, linux-mm, ltp

Tetsuo Handa wrote:
> (3) I/O for swap memory was effectively disabled by uptime = 6904.
> 
>     I don't know the reason why the kernel cannot access swap memory.
>     To access swap memory, some memory allocation is needed which is
>     failing due to free: field of Normal zone smaller than min: field?
>     If accessing swap memory depends on workqueue items, are they
>     created with WQ_MEM_RECLAIM?
> 

Tejun, I have a question about system_freezable_power_efficient_wq.

        system_freezable_power_efficient_wq = alloc_workqueue("events_freezable_power_efficient",
                                              WQ_FREEZABLE | WQ_POWER_EFFICIENT,
                                              0);

This workqueue is used by the cdrom driver for periodically checking status.
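
For illustration only (a hypothetical fragment, not the cdrom code itself):
anything that queues onto this system workqueue shares its worker pool, and
therefore shares its fate if that pool gets stuck in reclaim:

	static void poll_fn(struct work_struct *work);
	static DECLARE_DELAYED_WORK(poll_work, poll_fn);

	static void poll_fn(struct work_struct *work)
	{
		/* periodic status check would go here */
		queue_delayed_work(system_freezable_power_efficient_wq,
				   &poll_work, HZ);	/* re-arm after ~1s */
	}
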
----------
[ 6916.734842] kworker/1:1     R  running task    12592 20708      2 0x00000080 
[ 6916.742739] Workqueue: events_freezable_power_ disk_events_workfn 
[ 6916.749541]  ffff8801cf5877f8 ffff8801cf5878e8 0000000000000000 0000000000000002 
[ 6916.757840]  0000000000000000 ffff88045e822800 ffff88045a138000 ffff8801cf588000 
[ 6916.766139]  ffff8801cf5879d0 0000000000000000 0000000000000000 0000000000000000 
[ 6916.774440] Call Trace: 
[ 6916.777169]  [<ffffffff8177e946>] ? _raw_spin_unlock_irqrestore+0x36/0x60 
[ 6916.784746]  [<ffffffff811ecaaf>] ? shrink_zone+0x18f/0x330 
[ 6916.790966]  [<ffffffff811ecff4>] ? do_try_to_free_pages+0x174/0x440 
[ 6916.798058]  [<ffffffff811ed3c0>] ? try_to_free_pages+0x100/0x2c0 
[ 6916.804860]  [<ffffffff81269102>] ? __alloc_pages_slowpath+0x278/0x78c 
[ 6916.812147]  [<ffffffff811dcdb1>] ? __alloc_pages_nodemask+0x4a1/0x4d0 
[ 6916.819426]  [<ffffffff810f51f4>] ? __lock_is_held+0x54/0x70 
[ 6916.825742]  [<ffffffff81232387>] ? alloc_pages_current+0x97/0x1b0 
[ 6916.832641]  [<ffffffff8137f389>] ? bio_copy_kern+0xc9/0x180 
[ 6916.838957]  [<ffffffff8138e2c5>] ? blk_rq_map_kern+0x75/0x130 
[ 6916.845469]  [<ffffffff81527192>] ? scsi_execute+0x132/0x160 
[ 6916.851787]  [<ffffffff8152972e>] ? scsi_execute_req_flags+0x8e/0xf0 
[ 6916.858880]  [<ffffffffa012d6e7>] ? sr_check_events+0xb7/0x2a0 [sr_mod] 
[ 6916.866257]  [<ffffffffa0085058>] ? cdrom_check_events+0x18/0x30 [cdrom] 
[ 6916.873736]  [<ffffffffa012db2a>] ? sr_block_check_events+0x2a/0x30 [sr_mod] 
[ 6916.881603]  [<ffffffff8139a360>] ? disk_check_events+0x60/0x170 
[ 6916.888308]  [<ffffffff8139a48c>] ? disk_events_workfn+0x1c/0x20 
[ 6916.895013]  [<ffffffff810b7065>] ? process_one_work+0x215/0x650 
[ 6916.901721]  [<ffffffff810b6fd1>] ? process_one_work+0x181/0x650 
[ 6916.908428]  [<ffffffff810b75c5>] ? worker_thread+0x125/0x4a0 
[ 6916.914840]  [<ffffffff810b74a0>] ? process_one_work+0x650/0x650 
[ 6916.921544]  [<ffffffff810be1a1>] ? kthread+0x101/0x120 
[ 6916.927377]  [<ffffffff810f8c29>] ? trace_hardirqs_on_caller+0xf9/0x1c0 
[ 6916.934754]  [<ffffffff810be0a0>] ? kthread_create_on_node+0x250/0x250 
[ 6916.942040]  [<ffffffff8177f75f>] ? ret_from_fork+0x3f/0x70 
[ 6916.948260]  [<ffffffff810be0a0>] ? kthread_create_on_node+0x250/0x250 
[ 6916.955546] 3 locks held by kworker/1:1/20708: 
[ 6916.960504]  #0:  ("events_freezable_power_efficient"){.+.+.+}, at: [<ffffffff810b6fd1>] process_one_work+0x181/0x650 
[ 6916.972402]  #1:  ((&(&ev->dwork)->work)){+.+.+.}, at: [<ffffffff810b6fd1>] process_one_work+0x181/0x650 
[ 6916.983044]  #2:  (shrinker_rwsem){++++..}, at: [<ffffffff8125712f>] mem_cgroup_iter+0x14f/0x870 
----------

What happens if memory allocation requests from items running on this
workqueue get stuck due to an OOM livelock? Can pending items in this
workqueue then no longer be processed, because the workqueue was created
without WQ_MEM_RECLAIM?
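
For comparison, a minimal sketch (my example with a made-up name, not a
proposed patch) of a rescuer-backed workqueue; WQ_MEM_RECLAIM reserves one
dedicated rescuer thread so queued items can still run when normal worker
creation is blocked on memory allocation:

	static struct workqueue_struct *disk_events_wq;

	static int __init disk_events_wq_init(void)
	{
		disk_events_wq = alloc_workqueue("disk_events_reclaim",
						 WQ_MEM_RECLAIM, 0);
		return disk_events_wq ? 0 : -ENOMEM;
	}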

I don't know whether accessing swap depends on this workqueue.
But if the disk driver depends on this workqueue for accessing the swap
partition on the disk, would an item looping inside the memory allocator
make it impossible to process the disk I/O requests needed to access that
swap partition?


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab
  2016-01-20 13:17         ` [LTP] " Tetsuo Handa
@ 2016-01-20 15:10           ` Tejun Heo
  -1 siblings, 0 replies; 26+ messages in thread
From: Tejun Heo @ 2016-01-20 15:10 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: jiangshanlai, jstancek, linux-mm, ltp

On Wed, Jan 20, 2016 at 10:17:23PM +0900, Tetsuo Handa wrote:
> What happens if memory allocation requests from items running on this
> workqueue get stuck due to an OOM livelock? Can pending items in this
> workqueue then no longer be processed, because the workqueue was created
> without WQ_MEM_RECLAIM?

If something gets stuck due to OOM livelock, anything which tries to
allocate memory can hang.  That's why it's called a livelock.
WQ_MEM_RECLAIM or not wouldn't make any difference.

> I don't know whether accessing swap depends on this workqueue.
> But if the disk driver depends on this workqueue for accessing the swap
> partition on the disk, would an item looping inside the memory allocator
> make it impossible to process the disk I/O requests needed to access that
> swap partition?

What you're saying is too vague for me to decipher exactly what you
have in mind.  Can you please elaborate?

-- 
tejun


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab
  2016-01-20 15:10           ` [LTP] " Tejun Heo
@ 2016-01-20 15:54             ` Tetsuo Handa
  -1 siblings, 0 replies; 26+ messages in thread
From: Tetsuo Handa @ 2016-01-20 15:54 UTC (permalink / raw)
  To: tj; +Cc: jiangshanlai, jstancek, linux-mm, ltp

Tejun Heo wrote:
> On Wed, Jan 20, 2016 at 10:17:23PM +0900, Tetsuo Handa wrote:
> > What happens if memory allocation requests from items running on this
> > workqueue get stuck due to an OOM livelock? Can pending items in this
> > workqueue then no longer be processed, because the workqueue was created
> > without WQ_MEM_RECLAIM?
> 
> If something gets stuck due to OOM livelock, anything which tries to
> allocate memory can hang.  That's why it's called a livelock.
> WQ_MEM_RECLAIM or not wouldn't make any difference.
> 
> > I don't know whether accessing swap depends on this workqueue.
> > But if the disk driver depends on this workqueue for accessing the swap
> > partition on the disk, would an item looping inside the memory allocator
> > make it impossible to process the disk I/O requests needed to access that
> > swap partition?
> 
> What you're saying is too vague for me to decipher exactly what you
> have in mind.  Can you please elaborate?
> 

In this thread ( http://lkml.kernel.org/r/569D06F8.4040209@redhat.com )
Jan hit an OOM stall where free memory does not increase even after the OOM
victim and dying tasks have terminated. I'm wondering why such a thing can
happen.

Since "Swap cache stats:" stopped increasing immediately after the OOM
stall began, I suspect that a disk I/O event needed for accessing swap is
deferred behind a cdrom I/O event that is stalling on a memory allocation,
at exactly the time that disk I/O event is needed to increase free memory.

  [ 6915.253288] MemAlloc: kworker/1:1(20708) seq=48 gfp=0x2400000 order=0 delay=20248 
  [ 6915.301353] MemAlloc: oom01(22011) seq=5135 gfp=0x24280ca order=0 delay=20641 
  [ 6915.317280] MemAlloc: oom01(22013) seq=5101 gfp=0x24280ca order=0 delay=20641 

Maybe retesting with show_workqueue_state() added will answer my question.
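
Something along these lines (a sketch of the instrumentation I mean, using a
hypothetical helper name rather than the actual MemAlloc patch) would dump
the busy workqueues and worker pools together with each stall report:

	#include <linux/kernel.h>
	#include <linux/workqueue.h>

	static void memalloc_report_stall(void)
	{
		/* ...existing "MemAlloc-Info:" / "MemAlloc:" printing... */
		show_workqueue_state();	/* dumps busy workqueues and pools */
	}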


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab
  2016-01-19 10:29   ` [LTP] " Tetsuo Handa
@ 2016-01-22 15:14     ` Jan Stancek
  -1 siblings, 0 replies; 26+ messages in thread
From: Jan Stancek @ 2016-01-22 15:14 UTC (permalink / raw)
  To: Tetsuo Handa, linux-mm; +Cc: ltp

On 01/19/2016 11:29 AM, Tetsuo Handa wrote:
> although I
> couldn't find evidence that mlock() and madvice() are related with this hangup,

I simplified the reproducer by having only a single thread allocating
memory when OOM triggers:
  http://jan.stancek.eu/tmp/oom_hangs/console.log.3-v4.4-8606-with-memalloc.txt

In this instance it was mmap + mlock, as you can see from the oom call trace.
It made it to do_exit(), but couldn't complete it:

[19419.667714] oom01           R  running task    11568 32704  32676 0x00000084
[19419.675606]  0000000000000000 0000000000000000 ffffffff81c8dfc0 0000000000000007
[19419.683910]  ffff88045fbd7ad8 ffff880451c88000 ffff88045aac8000 ffff880456f18000
[19419.692207]  ffff88044fb239c0 0000000000000000 ffffffff81205113 ffff880456f17428
[19419.700507] Call Trace:
[19419.703235]  [<ffffffff8177e8af>] ? _raw_spin_unlock+0x1f/0x40
[19419.709746]  [<ffffffff81779207>] ? preempt_schedule_common+0x1f/0x38
[19419.716936]  [<ffffffff8177923c>] ? _cond_resched+0x1c/0x30
[19419.723146]  [<ffffffff811e7369>] ? shrink_slab.part.42+0x119/0x540
[19419.730141]  [<ffffffff81027c59>] ? sched_clock+0x9/0x10
[19419.736070]  [<ffffffff811ecc2a>] ? shrink_zone+0x30a/0x330
[19419.742290]  [<ffffffff811ecff4>] ? do_try_to_free_pages+0x174/0x440
[19419.749381]  [<ffffffff811ed3c0>] ? try_to_free_pages+0x100/0x2c0
[19419.756182]  [<ffffffff81269102>] ? __alloc_pages_slowpath+0x278/0x78c
[19419.763468]  [<ffffffff811dcdb1>] ? __alloc_pages_nodemask+0x4a1/0x4d0
[19419.770755]  [<ffffffff810d592c>] ? local_clock+0x1c/0x30
[19419.776782]  [<ffffffff8123482e>] ? alloc_pages_vma+0xbe/0x2d0
[19419.783294]  [<ffffffff812224d8>] ? __read_swap_cache_async+0x118/0x200
[19419.790666]  [<ffffffff812225e6>] ? read_swap_cache_async+0x26/0x60
[19419.797665]  [<ffffffff81222785>] ? swapin_readahead+0x165/0x1f0
[19419.804367]  [<ffffffff811d0e15>] ? find_get_entry+0x5/0x220
[19419.810686]  [<ffffffff811d1c4c>] ? pagecache_get_page+0x2c/0x270
[19419.817487]  [<ffffffff810f930a>] ? __lock_acquire+0x2aa/0x1130
[19419.824094]  [<ffffffff8120aeac>] ? do_swap_page.isra.61+0x48c/0x830
[19419.831185]  [<ffffffff810f51f4>] ? __lock_is_held+0x54/0x70
[19419.837502]  [<ffffffff8120d3b9>] ? handle_mm_fault+0xa99/0x1720
[19419.844207]  [<ffffffff8120c979>] ? handle_mm_fault+0x59/0x1720
[19419.850814]  [<ffffffff810f51f4>] ? __lock_is_held+0x54/0x70
[19419.857130]  [<ffffffff8107480a>] ? __do_page_fault+0x1ca/0x470
[19419.863737]  [<ffffffff81074ae0>] ? do_page_fault+0x30/0x80
[19419.869955]  [<ffffffff8177ff87>] ? native_iret+0x7/0x7
[19419.875788]  [<ffffffff81781838>] ? page_fault+0x28/0x30
[19419.881708]  [<ffffffff813c77ef>] ? __get_user_8+0x1f/0x29
[19419.887831]  [<ffffffff8113b072>] ? exit_robust_list+0x52/0x1a0
[19419.894437]  [<ffffffff810f8cfd>] ? trace_hardirqs_on+0xd/0x10
[19419.900952]  [<ffffffff81095913>] ? mm_release+0x143/0x160
[19419.907075]  [<ffffffff8109bf36>] ? do_exit+0x166/0xce0
[19419.912899]  [<ffffffff8109cb3c>] ? do_group_exit+0x4c/0xc0
[19419.919118]  [<ffffffff810aaaf1>] ? get_signal+0x331/0x8f0
[19419.925243]  [<ffffffff8101d3c7>] ? do_signal+0x37/0x680
[19419.931172]  [<ffffffff810d592c>] ? local_clock+0x1c/0x30
[19419.937197]  [<ffffffff81117653>] ? rcu_read_lock_sched_held+0x93/0xa0
[19419.944485]  [<ffffffff8123f11d>] ? kfree+0x1bd/0x290
[19419.950124]  [<ffffffff810926a4>] ? exit_to_usermode_loop+0x33/0xac
[19419.957110]  [<ffffffff810926cf>] ? exit_to_usermode_loop+0x5e/0xac
[19419.964107]  [<ffffffff81003d0b>] ? syscall_return_slowpath+0xbb/0x130
[19419.971383]  [<ffffffff8177f55a>] ? int_ret_from_sys_call+0x25/0x9f

[19430.309232] MemAlloc-Info: 12 stalling task, 0 dying task, 0 victim task.
[19430.316821] MemAlloc: auditd(783) seq=615 gfp=0x24201ca order=0 delay=40786
[19430.324593] MemAlloc: irqbalance(806) seq=8107 gfp=0x24201ca order=0 delay=44125
[19430.332847] MemAlloc: chronyd(808) seq=3155 gfp=0x24200ca order=0 delay=10259
[19430.340812] MemAlloc: systemd-logind(818) seq=2191 gfp=0x24200ca order=0 delay=21762
[19430.349456] MemAlloc: NetworkManager(820) seq=10854 gfp=0x24200ca order=0 delay=45212
[19430.358190] MemAlloc: gssproxy(826) seq=586 gfp=0x24201ca order=0 delay=44172
[19430.366157] MemAlloc: tuned(1337) seq=40098 gfp=0x24201ca order=0 delay=45585
[19430.374121] MemAlloc: crond(2242) seq=5612 gfp=0x24201ca order=0 delay=41014
[19430.381989] MemAlloc: systemd-journal(22961) seq=151917 gfp=0x24201ca order=0 delay=46264
[19430.391118] MemAlloc: sendmail(31908) seq=7256 gfp=0x24200ca order=0 delay=43318
[19430.399365] MemAlloc: kworker/2:2(32161) seq=9 gfp=0x2400000 order=0 delay=45574
[19430.407619] MemAlloc: oom01(32704) seq=6391 gfp=0x24200ca order=0 delay=44849 exiting


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab
  2016-01-22 15:14     ` [LTP] " Jan Stancek
@ 2016-01-23  6:30       ` Tetsuo Handa
  -1 siblings, 0 replies; 26+ messages in thread
From: Tetsuo Handa @ 2016-01-23  6:30 UTC (permalink / raw)
  To: jstancek, linux-mm; +Cc: ltp

Jan Stancek wrote:
> On 01/19/2016 11:29 AM, Tetsuo Handa wrote:
> > although I
> > couldn't find evidence that mlock() and madvice() are related with this hangup,
> 
> I simplified the reproducer by having only a single thread allocating
> memory when OOM triggers:
>   http://jan.stancek.eu/tmp/oom_hangs/console.log.3-v4.4-8606-with-memalloc.txt
> 
> In this instance it was mmap + mlock, as you can see from the oom call trace.
> It made it to do_exit(), but couldn't complete it:

Thank you for retesting.

Comparing console.log.2-v4.4-8606-with-memalloc_wc.txt.bz2 and
console.log.3-v4.4-8606-with-memalloc.txt :

  different things

    Free swap  = 0kB for the former
    Free swap  = 7556632kB for the latter

  common things

    All stalling allocations are order 0.
    Swap cache stats: stopped increasing
    Node 0 Normal free: remained below min:
    A kworker got stuck inside 0x2400000 (GFP_NOIO) allocation within 1 second
    after other allocations (0x24280ca (GFP_HIGHUSER_MOVABLE) or 0x24201ca
    (GFP_HIGHUSER_MOVABLE | __GFP_COLD)) got stuck.

----------
[ 6904.555880] MemAlloc-Info: 2 stalling task, 0 dying task, 0 victim task.
[ 6904.563387] MemAlloc: oom01(22011) seq=5135 gfp=0x24280ca order=0 delay=10001
[ 6904.571353] MemAlloc: oom01(22013) seq=5101 gfp=0x24280ca order=0 delay=10001
[ 6915.195869] MemAlloc-Info: 16 stalling task, 0 dying task, 0 victim task.
[ 6915.203458] MemAlloc: systemd-journal(592) seq=33409 gfp=0x24201ca order=0 delay=20495
[ 6915.212300] MemAlloc: NetworkManager(807) seq=42042 gfp=0x24200ca order=0 delay=12030
[ 6915.221042] MemAlloc: gssproxy(815) seq=1551 gfp=0x24201ca order=0 delay=19414
[ 6915.229104] MemAlloc: irqbalance(825) seq=6763 gfp=0x24201ca order=0 delay=11234
[ 6915.237363] MemAlloc: tuned(1339) seq=74664 gfp=0x24201ca order=0 delay=20354
[ 6915.245329] MemAlloc: top(10485) seq=486624 gfp=0x24201ca order=0 delay=20124
[ 6915.253288] MemAlloc: kworker/1:1(20708) seq=48 gfp=0x2400000 order=0 delay=20248
[ 6915.261640] MemAlloc: sendmail(21855) seq=207 gfp=0x24201ca order=0 delay=19977
[ 6915.269800] MemAlloc: oom01(22007) seq=2 gfp=0x24201ca order=0 delay=20269
[ 6915.277466] MemAlloc: oom01(22008) seq=5659 gfp=0x24280ca order=0 delay=20502
[ 6915.285432] MemAlloc: oom01(22009) seq=5189 gfp=0x24280ca order=0 delay=20502
[ 6915.293389] MemAlloc: oom01(22010) seq=4795 gfp=0x24280ca order=0 delay=20502
[ 6915.301353] MemAlloc: oom01(22011) seq=5135 gfp=0x24280ca order=0 delay=20641
[ 6915.309316] MemAlloc: oom01(22012) seq=3828 gfp=0x24280ca order=0 delay=20502
[ 6915.317280] MemAlloc: oom01(22013) seq=5101 gfp=0x24280ca order=0 delay=20641
[ 6915.325244] MemAlloc: oom01(22014) seq=3633 gfp=0x24280ca order=0 delay=20502
----------
[19394.048063] MemAlloc-Info: 1 stalling task, 0 dying task, 0 victim task.
[19394.055562] MemAlloc: systemd-journal(22961) seq=151917 gfp=0x24201ca order=0 delay=10001
[19404.625516] MemAlloc-Info: 10 stalling task, 0 dying task, 0 victim task.
[19404.633107] MemAlloc: auditd(783) seq=615 gfp=0x24201ca order=0 delay=15101
[19404.640877] MemAlloc: irqbalance(806) seq=8107 gfp=0x24201ca order=0 delay=18440
[19404.649135] MemAlloc: NetworkManager(820) seq=10854 gfp=0x24200ca order=0 delay=19527
[19404.657874] MemAlloc: gssproxy(826) seq=586 gfp=0x24201ca order=0 delay=18487
[19404.665841] MemAlloc: tuned(1337) seq=40098 gfp=0x24201ca order=0 delay=19900
[19404.673805] MemAlloc: crond(2242) seq=5612 gfp=0x24201ca order=0 delay=15329
[19404.681674] MemAlloc: systemd-journal(22961) seq=151917 gfp=0x24201ca order=0 delay=20579
[19404.690796] MemAlloc: sendmail(31908) seq=7256 gfp=0x24200ca order=0 delay=17633
[19404.699051] MemAlloc: kworker/2:2(32161) seq=9 gfp=0x2400000 order=0 delay=19889
[19404.707306] MemAlloc: oom01(32704) seq=6391 gfp=0x24200ca order=0 delay=19164 exiting
----------

Does somebody know whether GFP_HIGHUSER_MOVABLE allocations depend on workqueue status?

   * GFP_HIGHUSER_MOVABLE is for userspace allocations that the kernel does not
   *   need direct access to but can use kmap() when access is required. They
   *   are expected to be movable via page reclaim or page migration. Typically,
   *   pages on the LRU would also be allocated with GFP_HIGHUSER_MOVABLE.
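
For reference, this is roughly how it expands in include/linux/gfp.h of this
era (paraphrased from memory, may not be verbatim); it is the 0x24280ca mask
decoded earlier in this thread minus ___GFP_ZERO:

	#define GFP_USER	(__GFP_RECLAIM | __GFP_IO | __GFP_FS | __GFP_HARDWALL)
	#define GFP_HIGHUSER	(GFP_USER | __GFP_HIGHMEM)
	#define GFP_HIGHUSER_MOVABLE	(GFP_HIGHUSER | __GFP_MOVABLE)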

I don't have a reproducer environment. But if this problem involves workqueues,
running the kernel module below, which requests GFP_NOIO allocations more
frequently than disk_check_events() does, might help reproduce this problem.

---------- test/wq_test.c ----------
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/kthread.h>
#include <linux/delay.h>

static void wq_test_fn(struct work_struct *work);
static struct task_struct *task;
static bool pending;
static DECLARE_WORK(wq_test_work, wq_test_fn);

static void wq_test_fn(struct work_struct *unused)
{
	kfree(kmalloc(PAGE_SIZE, GFP_NOIO));
	pending = false;
}

static int wq_test_thread(void *unused)
{
	while (!kthread_should_stop()) {
		msleep(HZ / 10);
		pending = true;
		queue_work(system_freezable_power_efficient_wq, &wq_test_work);
		while (pending)
			msleep(1);
	}
	return 0;
}

static int __init wq_test_init(void)
{
	task = kthread_run(wq_test_thread, NULL, "wq_test");
	return IS_ERR(task) ? -ENOMEM : 0;
}

static void __exit wq_test_exit(void)
{
	kthread_stop(task);
	ssleep(1);
}

module_init(wq_test_init);
module_exit(wq_test_exit);
MODULE_LICENSE("GPL");
---------- test/wq_test.c ----------
---------- test/Makefile ----------
obj-m += wq_test.o
---------- test/Makefile ----------

$ make SUBDIRS=$PWD/test
# insmod test/wq_test.ko


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [LTP] [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab
  2016-01-22 15:14     ` [LTP] " Jan Stancek
@ 2016-01-26  7:48       ` Jan Stancek
  -1 siblings, 0 replies; 26+ messages in thread
From: Jan Stancek @ 2016-01-26  7:48 UTC (permalink / raw)
  To: Tetsuo Handa, linux-mm; +Cc: ltp

[-- Attachment #1: Type: text/plain, Size: 1058 bytes --]

On 01/22/2016 04:14 PM, Jan Stancek wrote:
> On 01/19/2016 11:29 AM, Tetsuo Handa wrote:
>> although I
>> couldn't find evidence that mlock() and madvice() are related with this hangup,
> 
> I simplified the reproducer by having only a single thread allocating
> memory when OOM triggers:
>   http://jan.stancek.eu/tmp/oom_hangs/console.log.3-v4.4-8606-with-memalloc.txt
> 
> In this instance it was mmap + mlock, as you can see from the oom call trace.
> It made it to do_exit(), but couldn't complete it:

I have extracted the test from LTP into a standalone reproducer (attached),
if you want to give it a try. It usually hangs my system within ~30
minutes. If it takes too long, you can try disabling swap. In my past
experience this usually helped to reproduce it faster on small KVM guests.

# gcc oom_mlock.c -pthread -O2
# echo 1 > /proc/sys/vm/overcommit_memory
(optionally) # swapoff -a
# ./a.out

Also, it's interesting to note that when I disabled the mlock() calls,
the test ran fine overnight. I'll look into confirming this observation
on more systems.

Regards,
Jan

[-- Attachment #2: oom_mlock.c --]
[-- Type: text/x-csrc, Size: 1974 bytes --]

#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <signal.h>	/* for SIGKILL used with WTERMSIG() below */

/*
 * oom hang reproducer v1
 *
 * # gcc oom_mlock.c -pthread -O2
 * # echo 1 > /proc/sys/vm/overcommit_memory
 * (optionally) # swapoff -a
 * # ./a.out
 */

#define _1GB (1024L*1024*1024)

static int do_mlock = 1;

static int alloc_mem(long int length)
{
	char *s;
	long i, pagesz = getpagesize();
	int loop = 10;

	printf("thread (%lx), allocating %ld bytes, do_mlock: %d\n",
		(unsigned long) pthread_self(), length, do_mlock);

	s = mmap(NULL, length, PROT_READ | PROT_WRITE,
		 MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (s == MAP_FAILED)
		return errno;

	if (do_mlock) {
		while (mlock(s, length) == -1 && loop > 0) {
			if (EAGAIN != errno)
				return errno;
			usleep(300000);
			loop--;
		}
	}

	for (i = 0; i < length; i += pagesz)
		s[i] = '\a';

	return 0;
}

void *alloc_thread(void *args)
{
	int ret;

	do {
		ret = alloc_mem(3 * _1GB);
	} while (ret == 0);

	exit(ret);
}

int trigger_oom(void)
{
	int i, ret, child, status, threads;
	pthread_t *th;

	threads = sysconf(_SC_NPROCESSORS_ONLN) - 1;
	th = malloc(sizeof(pthread_t) * threads);
	if (!th) {
		printf("malloc failed\n");
		exit(2);
	}

	do_mlock = !do_mlock;
	child = fork();
	if (child == 0) {
		for (i = 0; i < threads - 1; i++) {
			ret = pthread_create(&th[i], NULL, alloc_thread, NULL);
			if (ret) {
				printf("pthread_create failed with %d\n", ret);
				exit(3);
			}
		}
		pause();
	}
	
	if (waitpid(-1, &status, 0) == -1) {
		perror("waitpid");
		exit(1);
	}

	if (WIFSIGNALED(status)) {
		printf("child killed by %d\n", WTERMSIG(status));
		if (WTERMSIG(status) != SIGKILL)
			exit(1);
	}
	
	if (WIFEXITED(status)) {
		printf("child exited with %d\n", WEXITSTATUS(status));
		if (WEXITSTATUS(status) != ENOMEM)
			exit(1);
	}

	return 0;
}

int main(void)
{
	int i = 1;

	while (1) {
		printf("starting iteration %d\n", i++);
		trigger_oom();
	}

	return 0;
}


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [LTP] [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab
  2016-01-26  7:48       ` Jan Stancek
@ 2016-01-26 14:46         ` Tetsuo Handa
  -1 siblings, 0 replies; 26+ messages in thread
From: Tetsuo Handa @ 2016-01-26 14:46 UTC (permalink / raw)
  To: jstancek, linux-mm; +Cc: ltp

Jan Stancek wrote:
> On 01/22/2016 04:14 PM, Jan Stancek wrote:
> > On 01/19/2016 11:29 AM, Tetsuo Handa wrote:
> >> although I
> >> couldn't find evidence that mlock() and madvice() are related with this hangup,
> > 
> > I simplified the reproducer by having only a single thread allocating
> > memory when OOM triggers:
> >   http://jan.stancek.eu/tmp/oom_hangs/console.log.3-v4.4-8606-with-memalloc.txt
> > 
> > In this instance it was mmap + mlock, as you can see from the oom call trace.
> > It made it to do_exit(), but couldn't complete it:
> 
> I have extracted the test from LTP into a standalone reproducer (attached),
> if you want to give it a try. It usually hangs my system within ~30
> minutes. If it takes too long, you can try disabling swap. In my past
> experience this usually helped to reproduce it faster on small KVM guests.
> 
> # gcc oom_mlock.c -pthread -O2
> # echo 1 > /proc/sys/vm/overcommit_memory
> (optionally) # swapoff -a
> # ./a.out
> 
> Also, it's interesting to note that when I disabled the mlock() calls,
> the test ran fine overnight. I'll look into confirming this observation
> on more systems.
> 

Thank you for the reproducer. I tried it with

----------
--- oom_mlock.c
+++ oom_mlock.c
@@ -33,7 +33,7 @@
 	if (s == MAP_FAILED)
 		return errno;
 
-	if (do_mlock) {
+	if (0 && do_mlock) {
 		while (mlock(s, length) == -1 && loop > 0) {
 			if (EAGAIN != errno)
 				return errno;
----------

applied (i.e. with the mlock() calls disabled) on a VM with 4 CPUs / 5120MB RAM, and
successfully reproduced a livelock. Therefore, I think mlock() is irrelevant.

What I observed is that while the disk_events_workfn workqueue item was
looping, "Node 0 Normal free:" remained smaller than "min:", but the
"Node 0 Normal:" buddy-list total was larger than "Node 0 Normal free:".

Is this difference caused by the pending vmstat_update, vmstat_shepherd, and
vmpressure_work_fn items? Can we somehow check how long these workqueue items
have remained pending?
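
One rough way to answer that last question (an illustration of the idea, not
existing kernel infrastructure): queue a probe item on the suspect CPU's pool
and print how long it stayed pending before it got to run.

---------- test/wq_probe.c (sketch) ----------
#include <linux/jiffies.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/workqueue.h>

static unsigned long probe_queued_at;

static void wq_probe_fn(struct work_struct *work)
{
	pr_info("wq probe ran %u ms after being queued\n",
		jiffies_to_msecs(jiffies - probe_queued_at));
}
static DECLARE_WORK(wq_probe_work, wq_probe_fn);

static int __init wq_probe_init(void)
{
	probe_queued_at = jiffies;
	/* CPU 1 is where the stuck events_freezable_power_ worker is running */
	queue_work_on(1, system_wq, &wq_probe_work);
	return 0;
}

static void __exit wq_probe_exit(void)
{
	cancel_work_sync(&wq_probe_work);
}

module_init(wq_probe_init);
module_exit(wq_probe_exit);
MODULE_LICENSE("GPL");
---------- test/wq_probe.c (sketch) ----------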

Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20160126.txt.xz .
----------
[  312.018243] Out of memory: Kill process 9785 (a.out) score 937 or sacrifice child
[  312.020469] Killed process 9785 (a.out) total-vm:6314312kB, anon-rss:4637568kB, file-rss:0kB
[  323.867935] MemAlloc-Info: 4 stalling task, 0 dying task, 0 victim task.
[  323.870075] MemAlloc: kworker/1:8(9698) seq=10 gfp=0x2400000 order=0 delay=9828
[  323.872259] kworker/1:8     R  running task        0  9698      2 0x00000080
[  323.874452] Workqueue: events_freezable_power_ disk_events_workfn
(...snipped...)
[  324.194104] Mem-Info:
[  324.195095] active_anon:1165831 inactive_anon:3486 isolated_anon:0
[  324.195095]  active_file:1 inactive_file:0 isolated_file:0
[  324.195095]  unevictable:0 dirty:0 writeback:0 unstable:0
[  324.195095]  slab_reclaimable:1906 slab_unreclaimable:5202
[  324.195095]  mapped:1175 shmem:4204 pagetables:2555 bounce:0
[  324.195095]  free:8087 free_pcp:0 free_cma:0
[  324.204574] Node 0 DMA free:15888kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[  324.213795] lowmem_reserve[]: 0 2708 4673 4673
[  324.215372] Node 0 DMA32 free:12836kB min:5008kB low:6260kB high:7512kB active_anon:2712464kB inactive_anon:7984kB active_file:4kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:2776588kB mlocked:0kB dirty:0kB writeback:0kB mapped:2880kB shmem:9704kB slab_reclaimable:4556kB slab_unreclaimable:9116kB kernel_stack:1184kB pagetables:4644kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[  324.226159] lowmem_reserve[]: 0 0 1965 1965
[  324.227795] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:184 all_unreclaimable? yes
[  324.238987] lowmem_reserve[]: 0 0 0 0
[  324.240614] Node 0 DMA: 0*4kB 0*8kB 1*16kB (U) 2*32kB (U) 3*64kB (U) 0*128kB 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15888kB
[  324.244417] Node 0 DMA32: 38*4kB (UM) 24*8kB (UME) 146*16kB (UME) 139*32kB (UME) 52*64kB (UME) 11*128kB (UME) 2*256kB (UM) 1*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 12888kB
[  324.249520] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
[  324.253481] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[  324.255910] 4219 total pagecache pages
[  324.257497] 0 pages in swap cache
[  324.258996] Swap cache stats: add 0, delete 0, find 0/0
[  324.260835] Free swap  = 0kB
[  324.262243] Total swap = 0kB
[  324.263648] 1310589 pages RAM
[  324.265051] 0 pages HighMem/MovableOnly
[  324.266607] 109257 pages reserved
[  324.268079] 0 pages hwpoisoned
[  324.269503] Showing busy workqueues and worker pools:
[  324.271285] workqueue events: flags=0x0
[  324.272865]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=3/256
[  324.274933]     pending: vmstat_shepherd, vmpressure_work_fn, vmw_fb_dirty_flush [vmwgfx]
[  324.277426] workqueue events_freezable_power_: flags=0x84
[  324.279303]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[  324.281398]     in-flight: 9698:disk_events_workfn
[  324.283242] workqueue vmstat: flags=0xc
[  324.284849]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[  324.286947]     pending: vmstat_update
[  324.288647] pool 2: cpus=1 node=0 flags=0x0 nice=0 workers=10 idle: 9696 14 9665 9701 9675 9664 407 46 9691
(...snipped...)
[  334.771342] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:184 all_unreclaimable? yes
(...snipped...)
[  334.792901] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
(...snipped...)
[  345.351647] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:184 all_unreclaimable? yes
(...snipped...)
[  345.373938] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
(...snipped...)
[  355.983345] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:184 all_unreclaimable? yes
(...snipped...)
[  356.008948] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
(...snipped...)
[  366.569312] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
(...snipped...)
[  366.591131] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
(...snipped...)
[  377.255040] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
(...snipped...)
[  377.276782] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
(...snipped...)
[  387.948890] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
(...snipped...)
[  387.970629] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
(...snipped...)
[  398.582824] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
(...snipped...)
[  398.604575] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
(...snipped...)
[  409.307406] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
(...snipped...)
[  409.329012] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
(...snipped...)
[  419.866932] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
(...snipped...)
[  419.888642] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
(...snipped...)
[  430.444391] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
(...snipped...)
[  430.469660] Node 0 Normal: 24*4kB (UM) 0*8kB 3*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3760kB
(...snipped...)
[  441.055258] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
(...snipped...)
[  441.078312] Node 0 Normal: 24*4kB (UM) 0*8kB 3*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3760kB
(...snipped...)
[  451.624394] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
(...snipped...)
[  451.646614] Node 0 Normal: 24*4kB (UM) 0*8kB 3*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3760kB
(...snipped...)
[  461.701021] MemAlloc-Info: 6 stalling task, 0 dying task, 0 victim task.
(...snipped...)
[  461.798182] MemAlloc: kworker/1:8(9698) seq=10 gfp=0x2400000 order=0 delay=147661
[  461.800145] kworker/1:8     R  running task        0  9698      2 0x00000080
[  461.802061] Workqueue: events_freezable_power_ disk_events_workfn
(...snipped...)
[  462.141865] Mem-Info:
[  462.142877] active_anon:1165831 inactive_anon:3486 isolated_anon:0
[  462.142877]  active_file:1 inactive_file:0 isolated_file:0
[  462.142877]  unevictable:0 dirty:0 writeback:0 unstable:0
[  462.142877]  slab_reclaimable:1906 slab_unreclaimable:5202
[  462.142877]  mapped:1175 shmem:4204 pagetables:2555 bounce:0
[  462.142877]  free:8087 free_pcp:0 free_cma:0
[  462.152641] Node 0 DMA free:15888kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[  462.161919] lowmem_reserve[]: 0 2708 4673 4673
[  462.163515] Node 0 DMA32 free:12836kB min:5008kB low:6260kB high:7512kB active_anon:2712464kB inactive_anon:7984kB active_file:4kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:2776588kB mlocked:0kB dirty:0kB writeback:0kB mapped:2880kB shmem:9704kB slab_reclaimable:4556kB slab_unreclaimable:9116kB kernel_stack:1184kB pagetables:4644kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[  462.174286] lowmem_reserve[]: 0 0 1965 1965
[  462.175919] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[  462.187058] lowmem_reserve[]: 0 0 0 0
[  462.188704] Node 0 DMA: 0*4kB 0*8kB 1*16kB (U) 2*32kB (U) 3*64kB (U) 0*128kB 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15888kB
[  462.192473] Node 0 DMA32: 38*4kB (UM) 24*8kB (UME) 146*16kB (UME) 139*32kB (UME) 52*64kB (UME) 11*128kB (UME) 2*256kB (UM) 1*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 12888kB
[  462.198054] Node 0 Normal: 24*4kB (UM) 0*8kB 3*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3760kB
[  462.202046] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[  462.204481] 4219 total pagecache pages
[  462.206068] 0 pages in swap cache
[  462.207569] Swap cache stats: add 0, delete 0, find 0/0
[  462.209416] Free swap  = 0kB
[  462.210833] Total swap = 0kB
[  462.212240] 1310589 pages RAM
[  462.213656] 0 pages HighMem/MovableOnly
[  462.215229] 109257 pages reserved
[  462.216696] 0 pages hwpoisoned
[  462.218116] Showing busy workqueues and worker pools:
[  462.219928] workqueue events: flags=0x0
[  462.221501]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=3/256
[  462.223582]     pending: vmstat_shepherd, vmpressure_work_fn, vmw_fb_dirty_flush [vmwgfx]
[  462.226085] workqueue events_power_efficient: flags=0x80
[  462.227943]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[  462.230056]     pending: check_lifetime
[  462.231713] workqueue events_freezable_power_: flags=0x84
[  462.233577]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[  462.235693]     in-flight: 9698:disk_events_workfn
[  462.237540] workqueue vmstat: flags=0xc
[  462.239172]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[  462.241287]     pending: vmstat_update
[  462.242970] pool 2: cpus=1 node=0 flags=0x0 nice=0 workers=10 idle: 9696 14 9665 9701 9675 9664 407 46 9691
----------

The above result was obtained with the patch below on Linux 4.4.

----------
diff --git a/include/linux/sched.h b/include/linux/sched.h
index fa39434..daf2a1a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1375,6 +1375,28 @@ struct tlbflush_unmap_batch {
 	bool writable;
 };
 
+struct memalloc_info {
+	/* For locking and progress monitoring. */
+	unsigned int sequence;
+	/*
+	 * 0: not doing __GFP_RECLAIM allocation.
+	 * 1: doing non-recursive __GFP_RECLAIM allocation.
+	 * 2: doing recursive __GFP_RECLAIM allocation.
+	 */
+	u8 valid;
+	/*
+	 * bit 0: Will be reported as OOM victim.
+	 * bit 1: Will be reported as dying task.
+	 * bit 2: Will be reported as stalling task.
+	 */
+	u8 type;
+	/* Started time in jiffies as of valid == 1. */
+	unsigned long start;
+	/* Requested order and gfp flags as of valid == 1. */
+	unsigned int order;
+	gfp_t gfp;
+};
+
 struct task_struct {
 	volatile long state;	/* -1 unrunnable, 0 runnable, >0 stopped */
 	void *stack;
@@ -1813,6 +1835,9 @@ struct task_struct {
 	unsigned long	task_state_change;
 #endif
 	int pagefault_disabled;
+#ifdef CONFIG_DETECT_MEMALLOC_STALL_TASK
+	struct memalloc_info memalloc;
+#endif
 /* CPU-specific state of this task */
 	struct thread_struct thread;
 /*
diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index c9e4731..fb3004a 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -9,6 +9,9 @@ extern int sysctl_hung_task_warnings;
 extern int proc_dohung_task_timeout_secs(struct ctl_table *table, int write,
 					 void __user *buffer,
 					 size_t *lenp, loff_t *ppos);
+#ifdef CONFIG_DETECT_MEMALLOC_STALL_TASK
+extern unsigned long sysctl_memalloc_task_timeout_secs;
+#endif
 #else
 /* Avoid need for ifdefs elsewhere in the code */
 enum { sysctl_hung_task_timeout_secs = 0 };
diff --git a/kernel/fork.c b/kernel/fork.c
index 1155eac..894221df 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1416,6 +1416,10 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	p->sequential_io_avg	= 0;
 #endif
 
+#ifdef CONFIG_DETECT_MEMALLOC_STALL_TASK
+	p->memalloc.sequence = 0;
+#endif
+
 	/* Perform scheduler related setup. Assign this task to a CPU. */
 	retval = sched_fork(clone_flags, p);
 	if (retval)
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index e0f90c2..2abebb9 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -16,6 +16,7 @@
 #include <linux/export.h>
 #include <linux/sysctl.h>
 #include <linux/utsname.h>
+#include <linux/console.h>
 #include <trace/events/sched.h>
 
 /*
@@ -72,6 +73,207 @@ static struct notifier_block panic_block = {
 	.notifier_call = hung_task_panic,
 };
 
+#ifdef CONFIG_DETECT_MEMALLOC_STALL_TASK
+/*
+ * Zero means infinite timeout - no checking done:
+ */
+unsigned long __read_mostly sysctl_memalloc_task_timeout_secs =
+	CONFIG_DEFAULT_MEMALLOC_TASK_TIMEOUT;
+static struct memalloc_info memalloc; /* Filled by is_stalling_task(). */
+
+static long memalloc_timeout_jiffies(unsigned long last_checked, long timeout)
+{
+	struct task_struct *g, *p;
+	long t;
+	unsigned long delta;
+
+	/* timeout of 0 will disable the watchdog */
+	if (!timeout)
+		return MAX_SCHEDULE_TIMEOUT;
+	/* At least wait for timeout duration. */
+	t = last_checked - jiffies + timeout * HZ;
+	if (t > 0)
+		return t;
+	/* Calculate how long to wait more. */
+	t = timeout * HZ;
+	delta = t - jiffies;
+
+	/*
+	 * We might see outdated values in "struct memalloc_info" here.
+	 * We will recheck later using is_stalling_task().
+	 */
+	preempt_disable();
+	rcu_read_lock();
+	for_each_process_thread(g, p) {
+		if (likely(!p->memalloc.valid))
+			continue;
+		t = min_t(long, t, p->memalloc.start + delta);
+		if (unlikely(t <= 0))
+			goto stalling;
+	}
+ stalling:
+	rcu_read_unlock();
+	preempt_enable();
+	return t;
+}
+
+/**
+ * is_stalling_task - Check and copy a task's memalloc variable.
+ *
+ * @task:   A task to check.
+ * @expire: Timeout in jiffies.
+ *
+ * Returns true if a task is stalling, false otherwise.
+ */
+static bool is_stalling_task(const struct task_struct *task,
+			     const unsigned long expire)
+{
+	const struct memalloc_info *m = &task->memalloc;
+
+	/*
+	 * If start_memalloc_timer() is updating "struct memalloc_info" now,
+	 * we can ignore it because timeout jiffies cannot be expired as soon
+	 * as updating it completes.
+	 */
+	if (!m->valid || (m->sequence & 1))
+		return false;
+	smp_rmb(); /* Block start_memalloc_timer(). */
+	memalloc.start = m->start;
+	memalloc.order = m->order;
+	memalloc.gfp = m->gfp;
+	smp_rmb(); /* Unblock start_memalloc_timer(). */
+	memalloc.sequence = m->sequence;
+	/*
+	 * If start_memalloc_timer() started updating it while we read it,
+	 * we can ignore it for the same reason.
+	 */
+	if (!m->valid || (memalloc.sequence & 1))
+		return false;
+	/* This is a valid "struct memalloc_info". Check for timeout. */
+	return time_after_eq(expire, memalloc.start);
+}
+
+/* Check for memory allocation stalls. */
+static void check_memalloc_stalling_tasks(unsigned long timeout)
+{
+	char buf[128];
+	struct task_struct *g, *p;
+	unsigned long now;
+	unsigned long expire;
+	unsigned int sigkill_pending;
+	unsigned int memdie_pending;
+	unsigned int stalling_tasks;
+
+	cond_resched();
+	now = jiffies;
+	/*
+	 * Report tasks that stalled for more than half of timeout duration
+	 * because such tasks might be correlated with tasks that already
+	 * stalled for full timeout duration.
+	 */
+	expire = now - timeout * (HZ / 2);
+	/* Count stalling tasks, dying and victim tasks. */
+	sigkill_pending = 0;
+	memdie_pending = 0;
+	stalling_tasks = 0;
+	preempt_disable();
+	rcu_read_lock();
+	for_each_process_thread(g, p) {
+		u8 type = 0;
+
+		if (test_tsk_thread_flag(p, TIF_MEMDIE)) {
+			type |= 1;
+			memdie_pending++;
+		}
+		if (fatal_signal_pending(p)) {
+			type |= 2;
+			sigkill_pending++;
+		}
+		if (is_stalling_task(p, expire)) {
+			type |= 4;
+			stalling_tasks++;
+		}
+		p->memalloc.type = type;
+	}
+	rcu_read_unlock();
+	preempt_enable();
+	if (!stalling_tasks)
+		return;
+	/* Report stalling tasks, dying and victim tasks. */
+	pr_warn("MemAlloc-Info: %u stalling task, %u dying task, %u victim task.\n",
+		stalling_tasks, sigkill_pending, memdie_pending);
+	cond_resched();
+	preempt_disable();
+	rcu_read_lock();
+ restart_report:
+	for_each_process_thread(g, p) {
+		bool can_cont;
+		u8 type;
+
+		if (likely(!p->memalloc.type))
+			continue;
+		p->memalloc.type = 0;
+		/* Recheck in case state changed meanwhile. */
+		type = 0;
+		if (test_tsk_thread_flag(p, TIF_MEMDIE))
+			type |= 1;
+		if (fatal_signal_pending(p))
+			type |= 2;
+		if (is_stalling_task(p, expire)) {
+			type |= 4;
+			snprintf(buf, sizeof(buf),
+				 " seq=%u gfp=0x%x order=%u delay=%lu",
+				 memalloc.sequence >> 1, memalloc.gfp,
+				 memalloc.order, now - memalloc.start);
+		} else {
+			buf[0] = '\0';
+		}
+		if (unlikely(!type))
+			continue;
+		/*
+		 * Victim tasks get pending SIGKILL removed before arriving at
+		 * do_exit(). Therefore, print " exiting" instead for " dying".
+		 */
+		pr_warn("MemAlloc: %s(%u)%s%s%s%s%s\n", p->comm, p->pid, buf,
+			(p->state & TASK_UNINTERRUPTIBLE) ?
+			" uninterruptible" : "",
+			(p->flags & PF_EXITING) ? " exiting" : "",
+			(type & 2) ? " dying" : "",
+			(type & 1) ? " victim" : "");
+		sched_show_task(p);
+		debug_show_held_locks(p);
+		/*
+		 * Since there could be thousands of tasks to report, we always
+		 * sleep and try to flush printk() buffer after each report, in
+		 * order to avoid RCU stalls and reduce possibility of messages
+		 * being dropped by continuous printk() flood.
+		 *
+		 * Since not yet reported tasks have p->memalloc.type > 0, we
+		 * can simply restart this loop in case "g" or "p" went away.
+		 */
+		get_task_struct(g);
+		get_task_struct(p);
+		rcu_read_unlock();
+		preempt_enable();
+		schedule_timeout_interruptible(1);
+		preempt_disable();
+		rcu_read_lock();
+		can_cont = pid_alive(g) && pid_alive(p);
+		put_task_struct(p);
+		put_task_struct(g);
+		if (!can_cont)
+			goto restart_report;
+	}
+	rcu_read_unlock();
+	preempt_enable();
+	cond_resched();
+	/* Show memory information. (SysRq-m) */
+	show_mem(0);
+	/* Show workqueue state. */
+	show_workqueue_state();
+}
+#endif /* CONFIG_DETECT_MEMALLOC_STALL_TASK */
+
 static void check_hung_task(struct task_struct *t, unsigned long timeout)
 {
 	unsigned long switch_count = t->nvcsw + t->nivcsw;
@@ -185,10 +387,12 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	rcu_read_unlock();
 }
 
-static unsigned long timeout_jiffies(unsigned long timeout)
+static unsigned long hung_timeout_jiffies(long last_checked, long timeout)
 {
 	/* timeout of 0 will disable the watchdog */
-	return timeout ? timeout * HZ : MAX_SCHEDULE_TIMEOUT;
+	if (!timeout)
+		return MAX_SCHEDULE_TIMEOUT;
+	return last_checked - jiffies + timeout * HZ;
 }
 
 /*
@@ -224,18 +428,36 @@ EXPORT_SYMBOL_GPL(reset_hung_task_detector);
  */
 static int watchdog(void *dummy)
 {
+	unsigned long hung_last_checked = jiffies;
+#ifdef CONFIG_DETECT_MEMALLOC_STALL_TASK
+	unsigned long stall_last_checked = hung_last_checked;
+#endif
+
 	set_user_nice(current, 0);
 
 	for ( ; ; ) {
 		unsigned long timeout = sysctl_hung_task_timeout_secs;
-
-		while (schedule_timeout_interruptible(timeout_jiffies(timeout)))
-			timeout = sysctl_hung_task_timeout_secs;
-
-		if (atomic_xchg(&reset_hung_task, 0))
+		long t = hung_timeout_jiffies(hung_last_checked, timeout);
+#ifdef CONFIG_DETECT_MEMALLOC_STALL_TASK
+		unsigned long timeout2 = sysctl_memalloc_task_timeout_secs;
+		long t2 = memalloc_timeout_jiffies(stall_last_checked,
+						   timeout2);
+
+		if (t2 <= 0) {
+			check_memalloc_stalling_tasks(timeout2);
+			stall_last_checked = jiffies;
 			continue;
-
-		check_hung_uninterruptible_tasks(timeout);
+		}
+#else
+		long t2 = t;
+#endif
+		if (t <= 0) {
+			if (!atomic_xchg(&reset_hung_task, 0))
+				check_hung_uninterruptible_tasks(timeout);
+			hung_last_checked = jiffies;
+			continue;
+		}
+		schedule_timeout_interruptible(min(t, t2));
 	}
 
 	return 0;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index dc6858d..96460aa 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1061,6 +1061,16 @@ static struct ctl_table kern_table[] = {
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= &neg_one,
 	},
+#ifdef CONFIG_DETECT_MEMALLOC_STALL_TASK
+	{
+		.procname	= "memalloc_task_timeout_secs",
+		.data		= &sysctl_memalloc_task_timeout_secs,
+		.maxlen		= sizeof(unsigned long),
+		.mode		= 0644,
+		.proc_handler	= proc_dohung_task_timeout_secs,
+		.extra2		= &hung_task_timeout_max,
+	},
+#endif
 #endif
 #ifdef CONFIG_COMPAT
 	{
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 8c15b29..26d2c91 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -812,6 +812,30 @@ config BOOTPARAM_HUNG_TASK_PANIC_VALUE
 	default 0 if !BOOTPARAM_HUNG_TASK_PANIC
 	default 1 if BOOTPARAM_HUNG_TASK_PANIC
 
+config DETECT_MEMALLOC_STALL_TASK
+	bool "Detect tasks stalling inside memory allocator"
+	default n
+	depends on DETECT_HUNG_TASK
+	help
+	  This option emits warning messages and traces when memory
+	  allocation requests are stalling, in order to catch unexplained
+	  hangups/reboots caused by memory allocation stalls.
+
+config DEFAULT_MEMALLOC_TASK_TIMEOUT
+	int "Default timeout for stalling task detection (in seconds)"
+	depends on DETECT_MEMALLOC_STALL_TASK
+	default 10
+	help
+	  This option controls the default timeout (in seconds) used
+	  to determine when a task has become non-responsive and should
+	  be considered stalling inside memory allocator.
+
+	  It can be adjusted at runtime via the kernel.memalloc_task_timeout_secs
+	  sysctl or by writing a value to
+	  /proc/sys/kernel/memalloc_task_timeout_secs.
+
+	  A timeout of 0 disables the check. The default is 10 seconds.
+
 endmenu # "Debug lockups and hangs"
 
 config PANIC_ON_OOPS
diff --git a/mm/mlock.c b/mm/mlock.c
index 339d9e0..d6006b1 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -172,7 +172,7 @@ static void __munlock_isolation_failed(struct page *page)
  */
 unsigned int munlock_vma_page(struct page *page)
 {
-	unsigned int nr_pages;
+	int nr_pages;
 	struct zone *zone = page_zone(page);
 
 	/* For try_to_munlock() and to serialize with page migration */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9d666df..4e4e4b6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3165,6 +3165,37 @@ got_pg:
 	return page;
 }
 
+#ifdef CONFIG_DETECT_MEMALLOC_STALL_TASK
+static void start_memalloc_timer(const gfp_t gfp_mask, const int order)
+{
+	struct memalloc_info *m = &current->memalloc;
+
+	/* We don't check for stalls for !__GFP_RECLAIM allocations. */
+	if (!(gfp_mask & __GFP_RECLAIM))
+		return;
+	/* We don't check for stalls for nested __GFP_RECLAIM allocations */
+	if (!m->valid) {
+		m->sequence++;
+		smp_wmb(); /* Block is_stalling_task(). */
+		m->start = jiffies;
+		m->order = order;
+		m->gfp = gfp_mask;
+		smp_wmb(); /* Unblock is_stalling_task(). */
+		m->sequence++;
+	}
+	m->valid++;
+}
+
+static void stop_memalloc_timer(const gfp_t gfp_mask)
+{
+	if (gfp_mask & __GFP_RECLAIM)
+		current->memalloc.valid--;
+}
+#else
+#define start_memalloc_timer(gfp_mask, order) do { } while (0)
+#define stop_memalloc_timer(gfp_mask) do { } while (0)
+#endif
+
 /*
  * This is the 'heart' of the zoned buddy allocator.
  */
@@ -3232,7 +3263,9 @@ retry_cpuset:
 		alloc_mask = memalloc_noio_flags(gfp_mask);
 		ac.spread_dirty_pages = false;
 
+		start_memalloc_timer(alloc_mask, order);
 		page = __alloc_pages_slowpath(alloc_mask, order, &ac);
+		stop_memalloc_timer(alloc_mask);
 	}
 
 	if (kmemcheck_enabled && page)
----------


* [LTP] [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab
@ 2016-01-26 14:46         ` Tetsuo Handa
  0 siblings, 0 replies; 26+ messages in thread
From: Tetsuo Handa @ 2016-01-26 14:46 UTC (permalink / raw)
  To: ltp

Jan Stancek wrote:
> On 01/22/2016 04:14 PM, Jan Stancek wrote:
> > On 01/19/2016 11:29 AM, Tetsuo Handa wrote:
> >> although I
> >> couldn't find evidence that mlock() and madvice() are related with this hangup,
> > 
> > I simplified reproducer by having only single thread allocating
> > memory when OOM triggers:
> >   http://jan.stancek.eu/tmp/oom_hangs/console.log.3-v4.4-8606-with-memalloc.txt
> > 
> > In this instance it was mmap + mlock, as you can see from oom call trace.
> > It made it to do_exit(), but couldn't complete it:
> 
> I have extracted test from LTP into standalone reproducer (attached),
> if you want to give a try. It usually hangs my system within ~30
> minutes. If it takes too long, you can try disabling swap. From my past
> experience this usually helped to reproduce it faster on small KVM guests.
> 
> # gcc oom_mlock.c -pthread -O2
> # echo 1 > /proc/sys/vm/overcommit_memory
> (optionally) # swapoff -a
> # ./a.out
> 
> Also, it's interesting to note, that when I disabled mlock() calls
> test ran fine over night. I'll look into confirming this observation
> on more systems.
> 

Thank you for a reproducer. I tried it with

----------
--- oom_mlock.c
+++ oom_mlock.c
@@ -33,7 +33,7 @@
 	if (s == MAP_FAILED)
 		return errno;
 
-	if (do_mlock) {
+	if (0 && do_mlock) {
 		while (mlock(s, length) == -1 && loop > 0) {
 			if (EAGAIN != errno)
 				return errno;
----------

applied (i.e. disabled mlock() calls) on a VM with 4CPUs / 5120MB RAM, and
successfully reproduced a livelock. Therefore, I think mlock() is irrelevant.

What I observed is that while disk_events_workfn workqueue item was looping,
"Node 0 Normal free:" remained smaller than "min:" but "Node 0 Normal:" was
larger than "Node 0 Normal free:".

Is this difference caused by pending vmstat_update, vmstat_shepherd, vmpressure_work_fn ?
Can we somehow check how long these workqueue items remained pending?

Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20160126.txt.xz .
----------
[  312.018243] Out of memory: Kill process 9785 (a.out) score 937 or sacrifice child
[  312.020469] Killed process 9785 (a.out) total-vm:6314312kB, anon-rss:4637568kB, file-rss:0kB
[  323.867935] MemAlloc-Info: 4 stalling task, 0 dying task, 0 victim task.
[  323.870075] MemAlloc: kworker/1:8(9698) seq=10 gfp=0x2400000 order=0 delay=9828
[  323.872259] kworker/1:8     R  running task        0  9698      2 0x00000080
[  323.874452] Workqueue: events_freezable_power_ disk_events_workfn
(...snipped...)
[  324.194104] Mem-Info:
[  324.195095] active_anon:1165831 inactive_anon:3486 isolated_anon:0
[  324.195095]  active_file:1 inactive_file:0 isolated_file:0
[  324.195095]  unevictable:0 dirty:0 writeback:0 unstable:0
[  324.195095]  slab_reclaimable:1906 slab_unreclaimable:5202
[  324.195095]  mapped:1175 shmem:4204 pagetables:2555 bounce:0
[  324.195095]  free:8087 free_pcp:0 free_cma:0
[  324.204574] Node 0 DMA free:15888kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[  324.213795] lowmem_reserve[]: 0 2708 4673 4673
[  324.215372] Node 0 DMA32 free:12836kB min:5008kB low:6260kB high:7512kB active_anon:2712464kB inactive_anon:7984kB active_file:4kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:2776588kB mlocked:0kB dirty:0kB writeback:0kB mapped:2880kB shmem:9704kB slab_reclaimable:4556kB slab_unreclaimable:9116kB kernel_stack:1184kB pagetables:4644kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[  324.226159] lowmem_reserve[]: 0 0 1965 1965
[  324.227795] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:184 all_unreclaimable? yes
[  324.238987] lowmem_reserve[]: 0 0 0 0
[  324.240614] Node 0 DMA: 0*4kB 0*8kB 1*16kB (U) 2*32kB (U) 3*64kB (U) 0*128kB 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15888kB
[  324.244417] Node 0 DMA32: 38*4kB (UM) 24*8kB (UME) 146*16kB (UME) 139*32kB (UME) 52*64kB (UME) 11*128kB (UME) 2*256kB (UM) 1*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 12888kB
[  324.249520] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
[  324.253481] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[  324.255910] 4219 total pagecache pages
[  324.257497] 0 pages in swap cache
[  324.258996] Swap cache stats: add 0, delete 0, find 0/0
[  324.260835] Free swap  = 0kB
[  324.262243] Total swap = 0kB
[  324.263648] 1310589 pages RAM
[  324.265051] 0 pages HighMem/MovableOnly
[  324.266607] 109257 pages reserved
[  324.268079] 0 pages hwpoisoned
[  324.269503] Showing busy workqueues and worker pools:
[  324.271285] workqueue events: flags=0x0
[  324.272865]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=3/256
[  324.274933]     pending: vmstat_shepherd, vmpressure_work_fn, vmw_fb_dirty_flush [vmwgfx]
[  324.277426] workqueue events_freezable_power_: flags=0x84
[  324.279303]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[  324.281398]     in-flight: 9698:disk_events_workfn
[  324.283242] workqueue vmstat: flags=0xc
[  324.284849]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[  324.286947]     pending: vmstat_update
[  324.288647] pool 2: cpus=1 node=0 flags=0x0 nice=0 workers=10 idle: 9696 14 9665 9701 9675 9664 407 46 9691
(...snipped...)
[  334.771342] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:184 all_unreclaimable? yes
(...snipped...)
[  334.792901] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
(...snipped...)
[  345.351647] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:184 all_unreclaimable? yes
(...snipped...)
[  345.373938] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
(...snipped...)
[  355.983345] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:184 all_unreclaimable? yes
(...snipped...)
[  356.008948] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
(...snipped...)
[  366.569312] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
(...snipped...)
[  366.591131] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
(...snipped...)
[  377.255040] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
(...snipped...)
[  377.276782] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
(...snipped...)
[  387.948890] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
(...snipped...)
[  387.970629] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
(...snipped...)
[  398.582824] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
(...snipped...)
[  398.604575] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
(...snipped...)
[  409.307406] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
(...snipped...)
[  409.329012] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
(...snipped...)
[  419.866932] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
(...snipped...)
[  419.888642] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
(...snipped...)
[  430.444391] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
(...snipped...)
[  430.469660] Node 0 Normal: 24*4kB (UM) 0*8kB 3*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3760kB
(...snipped...)
[  441.055258] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
(...snipped...)
[  441.078312] Node 0 Normal: 24*4kB (UM) 0*8kB 3*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3760kB
(...snipped...)
[  451.624394] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
(...snipped...)
[  451.646614] Node 0 Normal: 24*4kB (UM) 0*8kB 3*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3760kB
(...snipped...)
[  461.701021] MemAlloc-Info: 6 stalling task, 0 dying task, 0 victim task.
(...snipped...)
[  461.798182] MemAlloc: kworker/1:8(9698) seq=10 gfp=0x2400000 order=0 delay=147661
[  461.800145] kworker/1:8     R  running task        0  9698      2 0x00000080
[  461.802061] Workqueue: events_freezable_power_ disk_events_workfn
(...snipped...)
[  462.141865] Mem-Info:
[  462.142877] active_anon:1165831 inactive_anon:3486 isolated_anon:0
[  462.142877]  active_file:1 inactive_file:0 isolated_file:0
[  462.142877]  unevictable:0 dirty:0 writeback:0 unstable:0
[  462.142877]  slab_reclaimable:1906 slab_unreclaimable:5202
[  462.142877]  mapped:1175 shmem:4204 pagetables:2555 bounce:0
[  462.142877]  free:8087 free_pcp:0 free_cma:0
[  462.152641] Node 0 DMA free:15888kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[  462.161919] lowmem_reserve[]: 0 2708 4673 4673
[  462.163515] Node 0 DMA32 free:12836kB min:5008kB low:6260kB high:7512kB active_anon:2712464kB inactive_anon:7984kB active_file:4kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:2776588kB mlocked:0kB dirty:0kB writeback:0kB mapped:2880kB shmem:9704kB slab_reclaimable:4556kB slab_unreclaimable:9116kB kernel_stack:1184kB pagetables:4644kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[  462.174286] lowmem_reserve[]: 0 0 1965 1965
[  462.175919] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[  462.187058] lowmem_reserve[]: 0 0 0 0
[  462.188704] Node 0 DMA: 0*4kB 0*8kB 1*16kB (U) 2*32kB (U) 3*64kB (U) 0*128kB 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15888kB
[  462.192473] Node 0 DMA32: 38*4kB (UM) 24*8kB (UME) 146*16kB (UME) 139*32kB (UME) 52*64kB (UME) 11*128kB (UME) 2*256kB (UM) 1*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 12888kB
[  462.198054] Node 0 Normal: 24*4kB (UM) 0*8kB 3*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3760kB
[  462.202046] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[  462.204481] 4219 total pagecache pages
[  462.206068] 0 pages in swap cache
[  462.207569] Swap cache stats: add 0, delete 0, find 0/0
[  462.209416] Free swap  = 0kB
[  462.210833] Total swap = 0kB
[  462.212240] 1310589 pages RAM
[  462.213656] 0 pages HighMem/MovableOnly
[  462.215229] 109257 pages reserved
[  462.216696] 0 pages hwpoisoned
[  462.218116] Showing busy workqueues and worker pools:
[  462.219928] workqueue events: flags=0x0
[  462.221501]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=3/256
[  462.223582]     pending: vmstat_shepherd, vmpressure_work_fn, vmw_fb_dirty_flush [vmwgfx]
[  462.226085] workqueue events_power_efficient: flags=0x80
[  462.227943]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[  462.230056]     pending: check_lifetime
[  462.231713] workqueue events_freezable_power_: flags=0x84
[  462.233577]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[  462.235693]     in-flight: 9698:disk_events_workfn
[  462.237540] workqueue vmstat: flags=0xc
[  462.239172]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[  462.241287]     pending: vmstat_update
[  462.242970] pool 2: cpus=1 node=0 flags=0x0 nice=0 workers=10 idle: 9696 14 9665 9701 9675 9664 407 46 9691
----------

Above result was obtained using below patch on Linux 4.4.

----------
diff --git a/include/linux/sched.h b/include/linux/sched.h
index fa39434..daf2a1a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1375,6 +1375,28 @@ struct tlbflush_unmap_batch {
 	bool writable;
 };
 
+struct memalloc_info {
+	/* For locking and progress monitoring. */
+	unsigned int sequence;
+	/*
+	 * 0: not doing __GFP_RECLAIM allocation.
+	 * 1: doing non-recursive __GFP_RECLAIM allocation.
+	 * 2: doing recursive __GFP_RECLAIM allocation.
+	 */
+	u8 valid;
+	/*
+	 * bit 0: Will be reported as OOM victim.
+	 * bit 1: Will be reported as dying task.
+	 * bit 2: Will be reported as stalling task.
+	 */
+	u8 type;
+	/* Started time in jiffies as of valid == 1. */
+	unsigned long start;
+	/* Requested order and gfp flags as of valid == 1. */
+	unsigned int order;
+	gfp_t gfp;
+};
+
 struct task_struct {
 	volatile long state;	/* -1 unrunnable, 0 runnable, >0 stopped */
 	void *stack;
@@ -1813,6 +1835,9 @@ struct task_struct {
 	unsigned long	task_state_change;
 #endif
 	int pagefault_disabled;
+#ifdef CONFIG_DETECT_MEMALLOC_STALL_TASK
+	struct memalloc_info memalloc;
+#endif
 /* CPU-specific state of this task */
 	struct thread_struct thread;
 /*
diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index c9e4731..fb3004a 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -9,6 +9,9 @@ extern int sysctl_hung_task_warnings;
 extern int proc_dohung_task_timeout_secs(struct ctl_table *table, int write,
 					 void __user *buffer,
 					 size_t *lenp, loff_t *ppos);
+#ifdef CONFIG_DETECT_MEMALLOC_STALL_TASK
+extern unsigned long sysctl_memalloc_task_timeout_secs;
+#endif
 #else
 /* Avoid need for ifdefs elsewhere in the code */
 enum { sysctl_hung_task_timeout_secs = 0 };
diff --git a/kernel/fork.c b/kernel/fork.c
index 1155eac..894221df 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1416,6 +1416,10 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	p->sequential_io_avg	= 0;
 #endif
 
+#ifdef CONFIG_DETECT_MEMALLOC_STALL_TASK
+	p->memalloc.sequence = 0;
+#endif
+
 	/* Perform scheduler related setup. Assign this task to a CPU. */
 	retval = sched_fork(clone_flags, p);
 	if (retval)
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index e0f90c2..2abebb9 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -16,6 +16,7 @@
 #include <linux/export.h>
 #include <linux/sysctl.h>
 #include <linux/utsname.h>
+#include <linux/console.h>
 #include <trace/events/sched.h>
 
 /*
@@ -72,6 +73,207 @@ static struct notifier_block panic_block = {
 	.notifier_call = hung_task_panic,
 };
 
+#ifdef CONFIG_DETECT_MEMALLOC_STALL_TASK
+/*
+ * Zero means infinite timeout - no checking done:
+ */
+unsigned long __read_mostly sysctl_memalloc_task_timeout_secs =
+	CONFIG_DEFAULT_MEMALLOC_TASK_TIMEOUT;
+static struct memalloc_info memalloc; /* Filled by is_stalling_task(). */
+
+static long memalloc_timeout_jiffies(unsigned long last_checked, long timeout)
+{
+	struct task_struct *g, *p;
+	long t;
+	unsigned long delta;
+
+	/* timeout of 0 will disable the watchdog */
+	if (!timeout)
+		return MAX_SCHEDULE_TIMEOUT;
+	/* At least wait for timeout duration. */
+	t = last_checked - jiffies + timeout * HZ;
+	if (t > 0)
+		return t;
+	/* Calculate how long to wait more. */
+	t = timeout * HZ;
+	delta = t - jiffies;
+
+	/*
+	 * We might see outdated values in "struct memalloc_info" here.
+	 * We will recheck later using is_stalling_task().
+	 */
+	preempt_disable();
+	rcu_read_lock();
+	for_each_process_thread(g, p) {
+		if (likely(!p->memalloc.valid))
+			continue;
+		t = min_t(long, t, p->memalloc.start + delta);
+		if (unlikely(t <= 0))
+			goto stalling;
+	}
+ stalling:
+	rcu_read_unlock();
+	preempt_enable();
+	return t;
+}
+
+/**
+ * is_stalling_task - Check and copy a task's memalloc variable.
+ *
+ * @task:   A task to check.
+ * @expire: Timeout in jiffies.
+ *
+ * Returns true if a task is stalling, false otherwise.
+ */
+static bool is_stalling_task(const struct task_struct *task,
+			     const unsigned long expire)
+{
+	const struct memalloc_info *m = &task->memalloc;
+
+	/*
+	 * If start_memalloc_timer() is updating "struct memalloc_info" now,
+	 * we can ignore it because timeout jiffies cannot be expired as soon
+	 * as updating it completes.
+	 */
+	if (!m->valid || (m->sequence & 1))
+		return false;
+	smp_rmb(); /* Block start_memalloc_timer(). */
+	memalloc.start = m->start;
+	memalloc.order = m->order;
+	memalloc.gfp = m->gfp;
+	smp_rmb(); /* Unblock start_memalloc_timer(). */
+	memalloc.sequence = m->sequence;
+	/*
+	 * If start_memalloc_timer() started updating it while we read it,
+	 * we can ignore it for the same reason.
+	 */
+	if (!m->valid || (memalloc.sequence & 1))
+		return false;
+	/* This is a valid "struct memalloc_info". Check for timeout. */
+	return time_after_eq(expire, memalloc.start);
+}
+
+/* Check for memory allocation stalls. */
+static void check_memalloc_stalling_tasks(unsigned long timeout)
+{
+	char buf[128];
+	struct task_struct *g, *p;
+	unsigned long now;
+	unsigned long expire;
+	unsigned int sigkill_pending;
+	unsigned int memdie_pending;
+	unsigned int stalling_tasks;
+
+	cond_resched();
+	now = jiffies;
+	/*
+	 * Report tasks that stalled for more than half of timeout duration
+	 * because such tasks might be correlated with tasks that already
+	 * stalled for full timeout duration.
+	 */
+	expire = now - timeout * (HZ / 2);
+	/* Count stalling tasks, dying and victim tasks. */
+	sigkill_pending = 0;
+	memdie_pending = 0;
+	stalling_tasks = 0;
+	preempt_disable();
+	rcu_read_lock();
+	for_each_process_thread(g, p) {
+		u8 type = 0;
+
+		if (test_tsk_thread_flag(p, TIF_MEMDIE)) {
+			type |= 1;
+			memdie_pending++;
+		}
+		if (fatal_signal_pending(p)) {
+			type |= 2;
+			sigkill_pending++;
+		}
+		if (is_stalling_task(p, expire)) {
+			type |= 4;
+			stalling_tasks++;
+		}
+		p->memalloc.type = type;
+	}
+	rcu_read_unlock();
+	preempt_enable();
+	if (!stalling_tasks)
+		return;
+	/* Report stalling tasks, dying and victim tasks. */
+	pr_warn("MemAlloc-Info: %u stalling task, %u dying task, %u victim task.\n",
+		stalling_tasks, sigkill_pending, memdie_pending);
+	cond_resched();
+	preempt_disable();
+	rcu_read_lock();
+ restart_report:
+	for_each_process_thread(g, p) {
+		bool can_cont;
+		u8 type;
+
+		if (likely(!p->memalloc.type))
+			continue;
+		p->memalloc.type = 0;
+		/* Recheck in case state changed meanwhile. */
+		type = 0;
+		if (test_tsk_thread_flag(p, TIF_MEMDIE))
+			type |= 1;
+		if (fatal_signal_pending(p))
+			type |= 2;
+		if (is_stalling_task(p, expire)) {
+			type |= 4;
+			snprintf(buf, sizeof(buf),
+				 " seq=%u gfp=0x%x order=%u delay=%lu",
+				 memalloc.sequence >> 1, memalloc.gfp,
+				 memalloc.order, now - memalloc.start);
+		} else {
+			buf[0] = '\0';
+		}
+		if (unlikely(!type))
+			continue;
+		/*
+		 * Victim tasks get pending SIGKILL removed before arriving at
+		 * do_exit(). Therefore, print " exiting" instead for " dying".
+		 */
+		pr_warn("MemAlloc: %s(%u)%s%s%s%s%s\n", p->comm, p->pid, buf,
+			(p->state & TASK_UNINTERRUPTIBLE) ?
+			" uninterruptible" : "",
+			(p->flags & PF_EXITING) ? " exiting" : "",
+			(type & 2) ? " dying" : "",
+			(type & 1) ? " victim" : "");
+		sched_show_task(p);
+		debug_show_held_locks(p);
+		/*
+		 * Since there could be thousands of tasks to report, we always
+		 * sleep and try to flush printk() buffer after each report, in
+		 * order to avoid RCU stalls and reduce possibility of messages
+		 * being dropped by continuous printk() flood.
+		 *
+		 * Since not yet reported tasks have p->memalloc.type > 0, we
+		 * can simply restart this loop in case "g" or "p" went away.
+		 */
+		get_task_struct(g);
+		get_task_struct(p);
+		rcu_read_unlock();
+		preempt_enable();
+		schedule_timeout_interruptible(1);
+		preempt_disable();
+		rcu_read_lock();
+		can_cont = pid_alive(g) && pid_alive(p);
+		put_task_struct(p);
+		put_task_struct(g);
+		if (!can_cont)
+			goto restart_report;
+	}
+	rcu_read_unlock();
+	preempt_enable();
+	cond_resched();
+	/* Show memory information. (SysRq-m) */
+	show_mem(0);
+	/* Show workqueue state. */
+	show_workqueue_state();
+}
+#endif /* CONFIG_DETECT_MEMALLOC_STALL_TASK */
+
 static void check_hung_task(struct task_struct *t, unsigned long timeout)
 {
 	unsigned long switch_count = t->nvcsw + t->nivcsw;
@@ -185,10 +387,12 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 	rcu_read_unlock();
 }
 
-static unsigned long timeout_jiffies(unsigned long timeout)
+static unsigned long hung_timeout_jiffies(long last_checked, long timeout)
 {
 	/* timeout of 0 will disable the watchdog */
-	return timeout ? timeout * HZ : MAX_SCHEDULE_TIMEOUT;
+	if (!timeout)
+		return MAX_SCHEDULE_TIMEOUT;
+	return last_checked - jiffies + timeout * HZ;
 }
 
 /*
@@ -224,18 +428,36 @@ EXPORT_SYMBOL_GPL(reset_hung_task_detector);
  */
 static int watchdog(void *dummy)
 {
+	unsigned long hung_last_checked = jiffies;
+#ifdef CONFIG_DETECT_MEMALLOC_STALL_TASK
+	unsigned long stall_last_checked = hung_last_checked;
+#endif
+
 	set_user_nice(current, 0);
 
 	for ( ; ; ) {
 		unsigned long timeout = sysctl_hung_task_timeout_secs;
-
-		while (schedule_timeout_interruptible(timeout_jiffies(timeout)))
-			timeout = sysctl_hung_task_timeout_secs;
-
-		if (atomic_xchg(&reset_hung_task, 0))
+		long t = hung_timeout_jiffies(hung_last_checked, timeout);
+#ifdef CONFIG_DETECT_MEMALLOC_STALL_TASK
+		unsigned long timeout2 = sysctl_memalloc_task_timeout_secs;
+		long t2 = memalloc_timeout_jiffies(stall_last_checked,
+						   timeout2);
+
+		if (t2 <= 0) {
+			check_memalloc_stalling_tasks(timeout2);
+			stall_last_checked = jiffies;
 			continue;
-
-		check_hung_uninterruptible_tasks(timeout);
+		}
+#else
+		long t2 = t;
+#endif
+		if (t <= 0) {
+			if (!atomic_xchg(&reset_hung_task, 0))
+				check_hung_uninterruptible_tasks(timeout);
+			hung_last_checked = jiffies;
+			continue;
+		}
+		schedule_timeout_interruptible(min(t, t2));
 	}
 
 	return 0;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index dc6858d..96460aa 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1061,6 +1061,16 @@ static struct ctl_table kern_table[] = {
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= &neg_one,
 	},
+#ifdef CONFIG_DETECT_MEMALLOC_STALL_TASK
+	{
+		.procname	= "memalloc_task_timeout_secs",
+		.data		= &sysctl_memalloc_task_timeout_secs,
+		.maxlen		= sizeof(unsigned long),
+		.mode		= 0644,
+		.proc_handler	= proc_dohung_task_timeout_secs,
+		.extra2		= &hung_task_timeout_max,
+	},
+#endif
 #endif
 #ifdef CONFIG_COMPAT
 	{
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 8c15b29..26d2c91 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -812,6 +812,30 @@ config BOOTPARAM_HUNG_TASK_PANIC_VALUE
 	default 0 if !BOOTPARAM_HUNG_TASK_PANIC
 	default 1 if BOOTPARAM_HUNG_TASK_PANIC
 
+config DETECT_MEMALLOC_STALL_TASK
+	bool "Detect tasks stalling inside memory allocator"
+	default n
+	depends on DETECT_HUNG_TASK
+	help
+	  This option emits warning messages and traces when memory
+	  allocation requests are stalling, in order to catch unexplained
+	  hangups/reboots caused by memory allocation stalls.
+
+config DEFAULT_MEMALLOC_TASK_TIMEOUT
+	int "Default timeout for stalling task detection (in seconds)"
+	depends on DETECT_MEMALLOC_STALL_TASK
+	default 10
+	help
+	  This option controls the default timeout (in seconds) used
+	  to determine when a task has become non-responsive and should
+	  be considered stalling inside memory allocator.
+
+	  It can be adjusted@runtime via the kernel.memalloc_task_timeout_secs
+	  sysctl or by writing a value to
+	  /proc/sys/kernel/memalloc_task_timeout_secs.
+
+	  A timeout of 0 disables the check. The default is 10 seconds.
+
 endmenu # "Debug lockups and hangs"
 
 config PANIC_ON_OOPS
diff --git a/mm/mlock.c b/mm/mlock.c
index 339d9e0..d6006b1 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -172,7 +172,7 @@ static void __munlock_isolation_failed(struct page *page)
  */
 unsigned int munlock_vma_page(struct page *page)
 {
-	unsigned int nr_pages;
+	int nr_pages;
 	struct zone *zone = page_zone(page);
 
 	/* For try_to_munlock() and to serialize with page migration */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9d666df..4e4e4b6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3165,6 +3165,37 @@ got_pg:
 	return page;
 }
 
+#ifdef CONFIG_DETECT_MEMALLOC_STALL_TASK
+static void start_memalloc_timer(const gfp_t gfp_mask, const int order)
+{
+	struct memalloc_info *m = &current->memalloc;
+
+	/* We don't check for stalls for !__GFP_RECLAIM allocations. */
+	if (!(gfp_mask & __GFP_RECLAIM))
+		return;
+	/* We don't check for stalls for nested __GFP_RECLAIM allocations */
+	if (!m->valid) {
+		m->sequence++;
+		smp_wmb(); /* Block is_stalling_task(). */
+		m->start = jiffies;
+		m->order = order;
+		m->gfp = gfp_mask;
+		smp_wmb(); /* Unblock is_stalling_task(). */
+		m->sequence++;
+	}
+	m->valid++;
+}
+
+static void stop_memalloc_timer(const gfp_t gfp_mask)
+{
+	if (gfp_mask & __GFP_RECLAIM)
+		current->memalloc.valid--;
+}
+#else
+#define start_memalloc_timer(gfp_mask, order) do { } while (0)
+#define stop_memalloc_timer(gfp_mask) do { } while (0)
+#endif
+
 /*
  * This is the 'heart' of the zoned buddy allocator.
  */
@@ -3232,7 +3263,9 @@ retry_cpuset:
 		alloc_mask = memalloc_noio_flags(gfp_mask);
 		ac.spread_dirty_pages = false;
 
+		start_memalloc_timer(alloc_mask, order);
 		page = __alloc_pages_slowpath(alloc_mask, order, &ac);
+		stop_memalloc_timer(alloc_mask);
 	}
 
 	if (kmemcheck_enabled && page)
----------
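
For readers following the open-coded sequence counter in start_memalloc_timer()
above: the is_stalling_task() reader referenced in the comments is not shown in
this excerpt, but a minimal sketch of how a reader could pair with the two
smp_wmb()/sequence updates might look as follows (the function name, argument
list and field types here are assumed, not taken from the patch):

----------
/* Sketch only, not the patch's actual is_stalling_task(). */
static bool is_stalling_task_sketch(const struct task_struct *p,
				    unsigned long timeout_jiffies)
{
	const struct memalloc_info *m = &p->memalloc;
	unsigned int seq;
	unsigned long start;

	if (!READ_ONCE(m->valid))	/* no __GFP_RECLAIM allocation in flight */
		return false;
	do {
		seq = READ_ONCE(m->sequence);
		if (seq & 1)		/* writer between the two sequence++ */
			return false;
		smp_rmb();		/* pairs with the writer's smp_wmb() */
		start = READ_ONCE(m->start);
		smp_rmb();
	} while (seq != READ_ONCE(m->sequence));

	return time_after(jiffies, start + timeout_jiffies);
}
----------

The same pairing could also be expressed with a seqcount_t from
<linux/seqlock.h>; the open-coded version keeps struct memalloc_info
self-contained.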

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [LTP] [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab
  2016-01-26 14:46         ` Tetsuo Handa
  (?)
@ 2016-01-27 11:02         ` Tetsuo Handa
  2016-01-28 15:48           ` Tetsuo Handa
  -1 siblings, 1 reply; 26+ messages in thread
From: Tetsuo Handa @ 2016-01-27 11:02 UTC (permalink / raw)
  To: mhocko; +Cc: tj, clameter, js1304, arekm, akpm, torvalds, jstancek, linux-mm

Inviting people who were involved in commit 373ccbe5927034b5 "mm, vmstat: allow
WQ concurrency to discover memory reclaim doesn't make any progress".

In this thread, Jan hit an OOM stall where free memory does not increase
even after the OOM victim and dying tasks have terminated. I'm wondering why
such a thing can happen. Jan provided a reproducer and I tried it.

I'm observing that the vmstat_update workqueue item remains pending forever.
Didn't we make sure that vmstat_update is processed when memory allocation
is stalling?
----------------------------------------
Tetsuo Handa wrote:
> Jan Stancek wrote:
> > On 01/22/2016 04:14 PM, Jan Stancek wrote:
> > > On 01/19/2016 11:29 AM, Tetsuo Handa wrote:
> > >> although I
> > >> couldn't find evidence that mlock() and madvice() are related with this hangup,
> > > 
> > > I simplified reproducer by having only single thread allocating
> > > memory when OOM triggers:
> > >   http://jan.stancek.eu/tmp/oom_hangs/console.log.3-v4.4-8606-with-memalloc.txt
> > > 
> > > In this instance it was mmap + mlock, as you can see from oom call trace.
> > > It made it to do_exit(), but couldn't complete it:
> > 
> > I have extracted test from LTP into standalone reproducer (attached),
> > if you want to give a try. It usually hangs my system within ~30
> > minutes. If it takes too long, you can try disabling swap. From my past
> > experience this usually helped to reproduce it faster on small KVM guests.
> > 
> > # gcc oom_mlock.c -pthread -O2
> > # echo 1 > /proc/sys/vm/overcommit_memory
> > (optionally) # swapoff -a
> > # ./a.out
> > 
> > Also, it's interesting to note, that when I disabled mlock() calls
> > test ran fine over night. I'll look into confirming this observation
> > on more systems.
> > 
> 
> Thank you for a reproducer. I tried it with
> 
> ----------
> --- oom_mlock.c
> +++ oom_mlock.c
> @@ -33,7 +33,7 @@
>  	if (s == MAP_FAILED)
>  		return errno;
>  
> -	if (do_mlock) {
> +	if (0 && do_mlock) {
>  		while (mlock(s, length) == -1 && loop > 0) {
>  			if (EAGAIN != errno)
>  				return errno;
> ----------
> 
> applied (i.e. disabled mlock() calls) on a VM with 4CPUs / 5120MB RAM, and
> successfully reproduced a livelock. Therefore, I think mlock() is irrelevant.
> 
> What I observed is that while disk_events_workfn workqueue item was looping,
> "Node 0 Normal free:" remained smaller than "min:" but "Node 0 Normal:" was
> larger than "Node 0 Normal free:".
> 
> Is this difference caused by pending vmstat_update, vmstat_shepherd, vmpressure_work_fn ?
> Can we somehow check how long these workqueue items remained pending?
> 
I added a timestamp to each workqueue item (on top of yesterday's patch) to check
how long workqueue items remain pending.

----------
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 0197358..fb1ebfc 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -101,6 +101,7 @@ struct work_struct {
 	atomic_long_t data;
 	struct list_head entry;
 	work_func_t func;
+	unsigned long inserted_time;
 #ifdef CONFIG_LOCKDEP
 	struct lockdep_map lockdep_map;
 #endif
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c579dba..579ea82 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -617,6 +617,7 @@ static inline void set_work_data(struct work_struct *work, unsigned long data,
 static void set_work_pwq(struct work_struct *work, struct pool_workqueue *pwq,
 			 unsigned long extra_flags)
 {
+	work->inserted_time = jiffies;
 	set_work_data(work, (unsigned long)pwq,
 		      WORK_STRUCT_PENDING | WORK_STRUCT_PWQ | extra_flags);
 }
@@ -624,6 +625,7 @@ static void set_work_pwq(struct work_struct *work, struct pool_workqueue *pwq,
 static void set_work_pool_and_keep_pending(struct work_struct *work,
 					   int pool_id)
 {
+	work->inserted_time = jiffies;
 	set_work_data(work, (unsigned long)pool_id << WORK_OFFQ_POOL_SHIFT,
 		      WORK_STRUCT_PENDING);
 }
@@ -4173,7 +4175,7 @@ static void pr_cont_pool_info(struct worker_pool *pool)
 	pr_cont(" flags=0x%x nice=%d", pool->flags, pool->attrs->nice);
 }

-static void pr_cont_work(bool comma, struct work_struct *work)
+static void pr_cont_work(bool comma, struct work_struct *work, const unsigned long current_time)
 {
 	if (work->func == wq_barrier_func) {
 		struct wq_barrier *barr;
@@ -4185,9 +4187,10 @@ static void pr_cont_work(bool comma, struct work_struct *work)
 	} else {
 		pr_cont("%s %pf", comma ? "," : "", work->func);
 	}
+	pr_cont("(delay=%lu)", current_time - work->inserted_time);
 }

-static void show_pwq(struct pool_workqueue *pwq)
+static void show_pwq(struct pool_workqueue *pwq, const unsigned long current_time)
 {
 	struct worker_pool *pool = pwq->pool;
 	struct work_struct *work;
@@ -4215,12 +4218,13 @@ static void show_pwq(struct pool_workqueue *pwq)
 			if (worker->current_pwq != pwq)
 				continue;

-			pr_cont("%s %d%s:%pf", comma ? "," : "",
+			work = READ_ONCE(worker->current_work);
+			pr_cont("%s %d%s:%pf(delay=%lu)", comma ? "," : "",
 				task_pid_nr(worker->task),
 				worker == pwq->wq->rescuer ? "(RESCUER)" : "",
-				worker->current_func);
+				worker->current_func, work ? current_time - work->inserted_time : 0);
 			list_for_each_entry(work, &worker->scheduled, entry)
-				pr_cont_work(false, work);
+				pr_cont_work(false, work, current_time);
 			comma = true;
 		}
 		pr_cont("\n");
@@ -4240,7 +4244,7 @@ static void show_pwq(struct pool_workqueue *pwq)
 			if (get_work_pwq(work) != pwq)
 				continue;

-			pr_cont_work(comma, work);
+			pr_cont_work(comma, work, current_time);
 			comma = !(*work_data_bits(work) & WORK_STRUCT_LINKED);
 		}
 		pr_cont("\n");
@@ -4251,7 +4255,7 @@ static void show_pwq(struct pool_workqueue *pwq)

 		pr_info("    delayed:");
 		list_for_each_entry(work, &pwq->delayed_works, entry) {
-			pr_cont_work(comma, work);
+			pr_cont_work(comma, work, current_time);
 			comma = !(*work_data_bits(work) & WORK_STRUCT_LINKED);
 		}
 		pr_cont("\n");
@@ -4270,6 +4274,7 @@ void show_workqueue_state(void)
 	struct worker_pool *pool;
 	unsigned long flags;
 	int pi;
+	const unsigned long current_time = jiffies;

 	rcu_read_lock_sched();

@@ -4293,7 +4298,7 @@ void show_workqueue_state(void)
 		for_each_pwq(pwq, wq) {
 			spin_lock_irqsave(&pwq->pool->lock, flags);
 			if (pwq->nr_active || !list_empty(&pwq->delayed_works))
-				show_pwq(pwq);
+				show_pwq(pwq, current_time);
 			spin_unlock_irqrestore(&pwq->pool->lock, flags);
 		}
 	}
----------

Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20160127.txt.xz .
Today's log used /proc/sys/vm/overcommit_memory == 0. (Yesterday's log used 1.)
----------
[  103.148077] Out of memory: Kill process 9609 (a.out) score 925 or sacrifice child
[  103.149866] Killed process 9609 (a.out) total-vm:6314312kB, anon-rss:4578056kB, file-rss:0kB
[  113.848798] MemAlloc-Info: 8 stalling task, 0 dying task, 0 victim task.
[  113.850536] MemAlloc: kworker/3:1(86) seq=5 gfp=0x2400000 order=0 delay=9793
(...snipped...)
[  114.997391] Mem-Info:
[  114.998421] active_anon:1164299 inactive_anon:2092 isolated_anon:0
[  114.998421]  active_file:9 inactive_file:14 isolated_file:0
[  114.998421]  unevictable:0 dirty:0 writeback:0 unstable:0
[  114.998421]  slab_reclaimable:2909 slab_unreclaimable:5767
[  114.998421]  mapped:751 shmem:2160 pagetables:3631 bounce:0
[  114.998421]  free:8045 free_pcp:0 free_cma:0
[  115.007451] Node 0 DMA free:15888kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[  115.016836] lowmem_reserve[]: 0 2708 4673 4673
[  115.018444] Node 0 DMA32 free:12724kB min:5008kB low:6260kB high:7512kB active_anon:2709028kB inactive_anon:4596kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:2776588kB mlocked:0kB dirty:0kB writeback:0kB mapped:1692kB shmem:4716kB slab_reclaimable:6812kB slab_unreclaimable:10880kB kernel_stack:2032kB pagetables:7412kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[  115.029268] lowmem_reserve[]: 0 0 1965 1965
[  115.030880] Node 0 Normal free:3568kB min:3632kB low:4540kB high:5448kB active_anon:1948168kB inactive_anon:3772kB active_file:36kB inactive_file:56kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1312kB shmem:3924kB slab_reclaimable:4824kB slab_unreclaimable:12172kB kernel_stack:1872kB pagetables:7112kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[  115.042223] lowmem_reserve[]: 0 0 0 0
[  115.044295] Node 0 DMA: 0*4kB 0*8kB 1*16kB (U) 2*32kB (U) 3*64kB (U) 0*128kB 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15888kB
[  115.048168] Node 0 DMA32: 33*4kB (UM) 345*8kB (UE) 192*16kB (UME) 95*32kB (UME) 20*64kB (UE) 10*128kB (UME) 3*256kB (UM) 1*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 12844kB
[  115.053309] Node 0 Normal: 44*4kB (UME) 35*8kB (UME) 20*16kB (UME) 42*32kB (UME) 7*64kB (E) 4*128kB (M) 2*256kB (UE) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3592kB
[  115.058500] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[  115.061009] 2199 total pagecache pages
[  115.062616] 0 pages in swap cache
[  115.064130] Swap cache stats: add 0, delete 0, find 0/0
[  115.065964] Free swap  = 0kB
[  115.067414] Total swap = 0kB
[  115.068840] 1310589 pages RAM
[  115.070273] 0 pages HighMem/MovableOnly
[  115.071847] 109257 pages reserved
[  115.073345] 0 pages hwpoisoned
[  115.074806] Showing busy workqueues and worker pools:
[  115.076607] workqueue events: flags=0x0
[  115.078220]   pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=2/256
[  115.080314]     pending: vmpressure_work_fn(delay=11001), vmw_fb_dirty_flush [vmwgfx](delay=1192)
[  115.083003] workqueue events_power_efficient: flags=0x80
[  115.084874]   pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256
[  115.086986]     pending: neigh_periodic_work(delay=7661)
[  115.089044]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[  115.091204]     pending: fb_flashcursor(delay=25)
[  115.093136] workqueue events_freezable_power_: flags=0x84
[  115.095052]   pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=3/256
[  115.097177]     in-flight: 86:disk_events_workfn(delay=11024)
[  115.099765]     pending: disk_events_workfn(delay=9707), disk_events_workfn(delay=9707)
[  115.102397] workqueue vmstat: flags=0xc
[  115.104004]   pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256
[  115.106111]     pending: vmstat_update(delay=10241)
[  115.108064] pool 6: cpus=3 node=0 flags=0x0 nice=0 workers=3 idle: 88 26
(...snipped...)
[  125.111557] MemAlloc-Info: 11 stalling task, 0 dying task, 0 victim task.
(...snipped...)
[  126.919780] Node 0 Normal free:3580kB min:3632kB low:4540kB high:5448kB active_anon:1948168kB inactive_anon:3772kB active_file:36kB inactive_file:56kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1312kB shmem:3924kB slab_reclaimable:4824kB slab_unreclaimable:12172kB kernel_stack:1872kB pagetables:7112kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
(...snipped...)
[  126.941519] Node 0 Normal: 47*4kB (UME) 34*8kB (ME) 20*16kB (UME) 41*32kB (ME) 7*64kB (E) 4*128kB (M) 2*256kB (UE) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3564kB
(...snipped...)
[  139.276624] Node 0 Normal free:3580kB min:3632kB low:4540kB high:5448kB active_anon:1948168kB inactive_anon:3772kB active_file:36kB inactive_file:56kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1312kB shmem:3924kB slab_reclaimable:4824kB slab_unreclaimable:12172kB kernel_stack:1872kB pagetables:7112kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
(...snipped...)
[  139.298331] Node 0 Normal: 47*4kB (UME) 34*8kB (ME) 20*16kB (UME) 41*32kB (ME) 7*64kB (E) 4*128kB (M) 2*256kB (UE) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3564kB
(...snipped...)
[  151.557912] Node 0 Normal free:3580kB min:3632kB low:4540kB high:5448kB active_anon:1948168kB inactive_anon:3772kB active_file:36kB inactive_file:56kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1312kB shmem:3924kB slab_reclaimable:4824kB slab_unreclaimable:12172kB kernel_stack:1872kB pagetables:7112kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
(...snipped...)
[  151.579565] Node 0 Normal: 47*4kB (UME) 34*8kB (ME) 20*16kB (UME) 41*32kB (ME) 7*64kB (E) 4*128kB (M) 2*256kB (UE) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3564kB
(...snipped...)
[  164.184580] Node 0 Normal free:3580kB min:3632kB low:4540kB high:5448kB active_anon:1948168kB inactive_anon:3772kB active_file:36kB inactive_file:56kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1312kB shmem:3924kB slab_reclaimable:4824kB slab_unreclaimable:12172kB kernel_stack:1872kB pagetables:7112kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
(...snipped...)
[  164.206423] Node 0 Normal: 47*4kB (UME) 34*8kB (ME) 20*16kB (UME) 41*32kB (ME) 7*64kB (E) 4*128kB (M) 2*256kB (UE) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3564kB
(...snipped...)
[  177.314557] Node 0 Normal free:3580kB min:3632kB low:4540kB high:5448kB active_anon:1948168kB inactive_anon:3772kB active_file:36kB inactive_file:56kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1312kB shmem:3924kB slab_reclaimable:4824kB slab_unreclaimable:12172kB kernel_stack:1872kB pagetables:7112kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
(...snipped...)
[  177.337059] Node 0 Normal: 47*4kB (UME) 34*8kB (ME) 20*16kB (UME) 41*32kB (ME) 7*64kB (E) 4*128kB (M) 2*256kB (UE) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3564kB
(...snipped...)
[  190.804567] Node 0 Normal free:3580kB min:3632kB low:4540kB high:5448kB active_anon:1948168kB inactive_anon:3772kB active_file:36kB inactive_file:56kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1312kB shmem:3924kB slab_reclaimable:4824kB slab_unreclaimable:12172kB kernel_stack:1872kB pagetables:7112kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
(...snipped...)
[  190.826437] Node 0 Normal: 47*4kB (UME) 34*8kB (ME) 20*16kB (UME) 41*32kB (ME) 7*64kB (E) 4*128kB (M) 2*256kB (UE) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3564kB
(...snipped...)
[  204.260591] Node 0 Normal free:3580kB min:3632kB low:4540kB high:5448kB active_anon:1948168kB inactive_anon:3772kB active_file:36kB inactive_file:56kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1312kB shmem:3924kB slab_reclaimable:4824kB slab_unreclaimable:12172kB kernel_stack:1872kB pagetables:7112kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
(...snipped...)
[  204.282413] Node 0 Normal: 47*4kB (UME) 34*8kB (ME) 20*16kB (UME) 41*32kB (ME) 7*64kB (E) 4*128kB (M) 2*256kB (UE) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3564kB
(...snipped...)
[  218.454026] Node 0 Normal free:3580kB min:3632kB low:4540kB high:5448kB active_anon:1948168kB inactive_anon:3772kB active_file:36kB inactive_file:56kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1312kB shmem:3924kB slab_reclaimable:4824kB slab_unreclaimable:12172kB kernel_stack:1872kB pagetables:7112kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
(...snipped...)
[  218.475787] Node 0 Normal: 47*4kB (UME) 34*8kB (ME) 20*16kB (UME) 41*32kB (ME) 7*64kB (E) 4*128kB (M) 2*256kB (UE) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3564kB
(...snipped...)
[  232.040600] Node 0 Normal free:3580kB min:3632kB low:4540kB high:5448kB active_anon:1948168kB inactive_anon:3772kB active_file:36kB inactive_file:56kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1312kB shmem:3924kB slab_reclaimable:4824kB slab_unreclaimable:12172kB kernel_stack:1872kB pagetables:7112kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
(...snipped...)
[  232.062342] Node 0 Normal: 47*4kB (UME) 34*8kB (ME) 20*16kB (UME) 41*32kB (ME) 7*64kB (E) 4*128kB (M) 2*256kB (UE) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3564kB
(...snipped...)
[  242.126476] MemAlloc-Info: 18 stalling task, 0 dying task, 0 victim task.
(...snipped...)
[  245.612136] Mem-Info:
[  245.613096] active_anon:1164299 inactive_anon:2092 isolated_anon:0
[  245.613096]  active_file:9 inactive_file:14 isolated_file:0
[  245.613096]  unevictable:0 dirty:0 writeback:0 unstable:0
[  245.613096]  slab_reclaimable:2909 slab_unreclaimable:5767
[  245.613096]  mapped:751 shmem:2160 pagetables:3631 bounce:0
[  245.613096]  free:8050 free_pcp:0 free_cma:0
[  245.621912] Node 0 DMA free:15888kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[  245.630960] lowmem_reserve[]: 0 2708 4673 4673
[  245.632517] Node 0 DMA32 free:12732kB min:5008kB low:6260kB high:7512kB active_anon:2709028kB inactive_anon:4596kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:2776588kB mlocked:0kB dirty:0kB writeback:0kB mapped:1692kB shmem:4716kB slab_reclaimable:6812kB slab_unreclaimable:10880kB kernel_stack:2032kB pagetables:7412kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[  245.643221] lowmem_reserve[]: 0 0 1965 1965
[  245.644889] Node 0 Normal free:3580kB min:3632kB low:4540kB high:5448kB active_anon:1948168kB inactive_anon:3772kB active_file:36kB inactive_file:56kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1312kB shmem:3924kB slab_reclaimable:4824kB slab_unreclaimable:12172kB kernel_stack:1872kB pagetables:7112kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[  245.656041] lowmem_reserve[]: 0 0 0 0
[  245.657666] Node 0 DMA: 0*4kB 0*8kB 1*16kB (U) 2*32kB (U) 3*64kB (U) 0*128kB 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15888kB
[  245.661481] Node 0 DMA32: 36*4kB (UM) 345*8kB (UE) 192*16kB (UME) 95*32kB (UME) 20*64kB (UE) 10*128kB (UME) 3*256kB (UM) 1*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 12856kB
[  245.666575] Node 0 Normal: 47*4kB (UME) 34*8kB (ME) 20*16kB (UME) 41*32kB (ME) 7*64kB (E) 4*128kB (M) 2*256kB (UE) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3564kB
[  245.671649] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[  245.674101] 2199 total pagecache pages
[  245.675677] 0 pages in swap cache
[  245.677165] Swap cache stats: add 0, delete 0, find 0/0
[  245.679033] Free swap  = 0kB
[  245.680442] Total swap = 0kB
[  245.681841] 1310589 pages RAM
[  245.683255] 0 pages HighMem/MovableOnly
[  245.684837] 109257 pages reserved
[  245.686305] 0 pages hwpoisoned
[  245.687730] Showing busy workqueues and worker pools:
[  245.689549] workqueue events: flags=0x0
[  245.691125]   pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=3/256
[  245.693188]     pending: vmpressure_work_fn(delay=141614), vmw_fb_dirty_flush [vmwgfx](delay=131805), push_to_pool(delay=53092)
[  245.696407] workqueue events_power_efficient: flags=0x80
[  245.698280]   pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256
[  245.700401]     pending: neigh_periodic_work(delay=138274)
[  245.702475]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[  245.704586]     pending: fb_flashcursor(delay=13)
[  245.706490] workqueue events_freezable_power_: flags=0x84
[  245.708440]   pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=3/256
[  245.710607]     in-flight: 86:disk_events_workfn(delay=141637)
[  245.712649]     pending: disk_events_workfn(delay=140320), disk_events_workfn(delay=140320)
[  245.715265] workqueue vmstat: flags=0xc
[  245.716898]   pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256
[  245.719036]     pending: vmstat_update(delay=140854)
[  245.721018] pool 6: cpus=3 node=0 flags=0x0 nice=0 workers=3 idle: 88 26
----------


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [LTP] [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab
  2016-01-27 11:02         ` Tetsuo Handa
@ 2016-01-28 15:48           ` Tetsuo Handa
  2016-01-29  7:32             ` Jan Stancek
  0 siblings, 1 reply; 26+ messages in thread
From: Tetsuo Handa @ 2016-01-28 15:48 UTC (permalink / raw)
  To: mhocko, jstancek; +Cc: tj, clameter, js1304, arekm, akpm, torvalds, linux-mm

Tetsuo Handa wrote:
> Inviting people who involved in commit 373ccbe5927034b5 "mm, vmstat: allow
> WQ concurrency to discover memory reclaim doesn't make any progress".
> 
> In this thread, Jan hit an OOM stall where free memory does not increase
> even after OOM victim and dying tasks terminated. I'm wondering why such
> thing can happen. Jan established a reproducer and I tried it.
> 
> I'm observing vmstat_update workqueue item forever remains pending.
> Didn't we make sure that vmstat_update is processed when memory allocation
> is stalling?

I confirmed that a forced sleep patch solves this problem.

----------------------------------------
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 7340353..b986216 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -979,6 +979,12 @@ long wait_iff_congested(struct zone *zone, int sync, long timeout)
 	 */
 	if (atomic_read(&nr_wb_congested[sync]) == 0 ||
 	    !test_bit(ZONE_CONGESTED, &zone->flags)) {
+		const struct memalloc_info *m = &current->memalloc;
+		if (m->valid && time_after_eq(jiffies, m->start + 30 * HZ)) {
+			pr_err("********** %s(%u) Forced sleep **********\n",
+			       current->comm, current->pid);
+			schedule_timeout_uninterruptible(HZ);
+		}
 
 		/*
 		 * Memory allocation/reclaim might be called from a WQ
----------------------------------------

----------------------------------------
[  939.038719] Showing busy workqueues and worker pools:
[  939.040519] workqueue events: flags=0x0
[  939.042142]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
[  939.044350]     pending: vmpressure_work_fn(delay=20659)
[  939.046302]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
[  939.048392]     pending: vmw_fb_dirty_flush [vmwgfx](delay=42), vmstat_shepherd(delay=10)
[  939.050946] workqueue events_power_efficient: flags=0x80
[  939.052844]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
[  939.054980]     pending: fb_flashcursor(delay=20573)
[  939.056939] workqueue events_freezable_power_: flags=0x84
[  939.058872]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
[  939.060994]     in-flight: 9571:disk_events_workfn(delay=20719)
[  939.063069] workqueue vmstat: flags=0xc
[  939.064667]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
[  939.066795]     pending: vmstat_update(delay=20016)
[  939.068752] workqueue xfs-eofblocks/sda1: flags=0xc
[  939.070546]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
[  939.072675]     pending: xfs_eofblocks_worker(delay=5574)
[  939.074660] pool 4: cpus=2 node=0 flags=0x0 nice=0 workers=8 idle: 20 10098 10100 505 10099 10364 10363
[  948.026046] ********** a.out(10423) Forced sleep **********
[  948.036318] ********** a.out(10424) Forced sleep **********
[  948.323267] ********** kworker/2:3(9571) Forced sleep **********
[  949.030045] a.out invoked oom-killer: gfp_mask=0x24280ca, order=0, oom_score_adj=0
[  949.032320] a.out cpuset=/ mems_allowed=0
[  949.033976] CPU: 3 PID: 10423 Comm: a.out Not tainted 4.4.0+ #39
----------------------------------------
[ 1255.809372] Showing busy workqueues and worker pools:
[ 1255.811163] workqueue events: flags=0x0
[ 1255.812744]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
[ 1255.814877]     pending: vmpressure_work_fn(delay=10713)
[ 1255.816837]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[ 1255.818960]     pending: vmw_fb_dirty_flush [vmwgfx](delay=42)
[ 1255.821025] workqueue events_power_efficient: flags=0x80
[ 1255.822937]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
[ 1255.825136]     pending: fb_flashcursor(delay=20673)
[ 1255.827069] workqueue events_freezable_power_: flags=0x84
[ 1255.828953]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
[ 1255.831050]     in-flight: 20:disk_events_workfn(delay=20777)
[ 1255.833063]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[ 1255.835168]     pending: disk_events_workfn(delay=7)
[ 1255.837084] workqueue vmstat: flags=0xc
[ 1255.838707]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
[ 1255.840827]     pending: vmstat_update(delay=19787)
[ 1255.842794] pool 4: cpus=2 node=0 flags=0x0 nice=0 workers=3 idle: 9571 10098
[ 1265.036032] ********** kworker/2:0(20) Forced sleep **********
[ 1265.038131] a.out invoked oom-killer: gfp_mask=0x24280ca, order=0, oom_score_adj=0
[ 1265.041018] a.out cpuset=/ mems_allowed=0
[ 1265.043008] CPU: 2 PID: 10622 Comm: a.out Not tainted 4.4.0+ #39
----------------------------------------

In the post "[PATCH 1/2] mm, oom: introduce oom reaper", Andrew Morton said
that "schedule_timeout() in state TASK_RUNNING doesn't do anything".

Looking at commit 373ccbe5927034b5, it is indeed using schedule_timeout(1)
instead of one of the schedule_timeout_*(1) variants. What!? We meant to force
the kworker to sleep, but the kworker did not sleep at all? That explains why
the forced sleep patch above solves the OOM livelock.
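
For context, the only difference in the *_uninterruptible variant is that it
sets the task state before arming the timer, which is what makes the following
schedule() actually sleep. Roughly (paraphrased from kernel/time/timer.c; the
exact code may differ between versions):

----------
/* Paraphrased sketch, not copied verbatim from the tree. */
signed long __sched schedule_timeout_uninterruptible(signed long timeout)
{
	__set_current_state(TASK_UNINTERRUPTIBLE);
	return schedule_timeout(timeout);
}
----------

A bare schedule_timeout(1) issued while still in TASK_RUNNING arms the timer,
but the caller stays runnable and is rescheduled without waiting for it, so the
call amounts to little more than a yield rather than a one-jiffy sleep.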

Jan, can you reproduce your problem with the patch below applied?

----------
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 7340353..cbe6f0b 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -989,7 +989,7 @@ long wait_iff_congested(struct zone *zone, int sync, long timeout)
 		 * here rather than calling cond_resched().
 		 */
 		if (current->flags & PF_WQ_WORKER)
-			schedule_timeout(1);
+			schedule_timeout_uninterruptible(1);
 		else
 			cond_resched();
 
----------


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [LTP] [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab
  2016-01-28 15:48           ` Tetsuo Handa
@ 2016-01-29  7:32             ` Jan Stancek
  2016-01-29 12:35               ` Tetsuo Handa
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Stancek @ 2016-01-29  7:32 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: mhocko, tj, clameter, js1304, arekm, akpm, torvalds, linux-mm



----- Original Message -----
> From: "Tetsuo Handa" <penguin-kernel@I-love.SAKURA.ne.jp>
> To: mhocko@suse.com, jstancek@redhat.com
> Cc: tj@kernel.org, clameter@sgi.com, js1304@gmail.com, arekm@maven.pl, akpm@linux-foundation.org,
> torvalds@linux-foundation.org, linux-mm@kvack.org
> Sent: Thursday, 28 January, 2016 4:48:36 PM
> Subject: Re: [LTP] [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab
> 
> Tetsuo Handa wrote:
> > Inviting people who involved in commit 373ccbe5927034b5 "mm, vmstat: allow
> > WQ concurrency to discover memory reclaim doesn't make any progress".
> > 
> > In this thread, Jan hit an OOM stall where free memory does not increase
> > even after OOM victim and dying tasks terminated. I'm wondering why such
> > thing can happen. Jan established a reproducer and I tried it.
> > 
> > I'm observing vmstat_update workqueue item forever remains pending.
> > Didn't we make sure that vmstat_update is processed when memory allocation
> > is stalling?
> 
> I confirmed that a forced sleep patch solves this problem.
> 
> ----------------------------------------
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index 7340353..b986216 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -979,6 +979,12 @@ long wait_iff_congested(struct zone *zone, int sync,
> long timeout)
>  	 */
>  	if (atomic_read(&nr_wb_congested[sync]) == 0 ||
>  	    !test_bit(ZONE_CONGESTED, &zone->flags)) {
> +		const struct memalloc_info *m = &current->memalloc;
> +		if (m->valid && time_after_eq(jiffies, m->start + 30 * HZ)) {
> +			pr_err("********** %s(%u) Forced sleep **********\n",
> +			       current->comm, current->pid);
> +			schedule_timeout_uninterruptible(HZ);
> +		}
>  
>  		/*
>  		 * Memory allocation/reclaim might be called from a WQ
> ----------------------------------------
> 
> ----------------------------------------
> [  939.038719] Showing busy workqueues and worker pools:
> [  939.040519] workqueue events: flags=0x0
> [  939.042142]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [  939.044350]     pending: vmpressure_work_fn(delay=20659)
> [  939.046302]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
> [  939.048392]     pending: vmw_fb_dirty_flush [vmwgfx](delay=42),
> vmstat_shepherd(delay=10)
> [  939.050946] workqueue events_power_efficient: flags=0x80
> [  939.052844]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [  939.054980]     pending: fb_flashcursor(delay=20573)
> [  939.056939] workqueue events_freezable_power_: flags=0x84
> [  939.058872]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [  939.060994]     in-flight: 9571:disk_events_workfn(delay=20719)
> [  939.063069] workqueue vmstat: flags=0xc
> [  939.064667]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [  939.066795]     pending: vmstat_update(delay=20016)
> [  939.068752] workqueue xfs-eofblocks/sda1: flags=0xc
> [  939.070546]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [  939.072675]     pending: xfs_eofblocks_worker(delay=5574)
> [  939.074660] pool 4: cpus=2 node=0 flags=0x0 nice=0 workers=8 idle: 20
> 10098 10100 505 10099 10364 10363
> [  948.026046] ********** a.out(10423) Forced sleep **********
> [  948.036318] ********** a.out(10424) Forced sleep **********
> [  948.323267] ********** kworker/2:3(9571) Forced sleep **********
> [  949.030045] a.out invoked oom-killer: gfp_mask=0x24280ca, order=0,
> oom_score_adj=0
> [  949.032320] a.out cpuset=/ mems_allowed=0
> [  949.033976] CPU: 3 PID: 10423 Comm: a.out Not tainted 4.4.0+ #39
> ----------------------------------------
> [ 1255.809372] Showing busy workqueues and worker pools:
> [ 1255.811163] workqueue events: flags=0x0
> [ 1255.812744]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [ 1255.814877]     pending: vmpressure_work_fn(delay=10713)
> [ 1255.816837]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
> [ 1255.818960]     pending: vmw_fb_dirty_flush [vmwgfx](delay=42)
> [ 1255.821025] workqueue events_power_efficient: flags=0x80
> [ 1255.822937]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [ 1255.825136]     pending: fb_flashcursor(delay=20673)
> [ 1255.827069] workqueue events_freezable_power_: flags=0x84
> [ 1255.828953]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [ 1255.831050]     in-flight: 20:disk_events_workfn(delay=20777)
> [ 1255.833063]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
> [ 1255.835168]     pending: disk_events_workfn(delay=7)
> [ 1255.837084] workqueue vmstat: flags=0xc
> [ 1255.838707]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
> [ 1255.840827]     pending: vmstat_update(delay=19787)
> [ 1255.842794] pool 4: cpus=2 node=0 flags=0x0 nice=0 workers=3 idle: 9571
> 10098
> [ 1265.036032] ********** kworker/2:0(20) Forced sleep **********
> [ 1265.038131] a.out invoked oom-killer: gfp_mask=0x24280ca, order=0,
> oom_score_adj=0
> [ 1265.041018] a.out cpuset=/ mems_allowed=0
> [ 1265.043008] CPU: 2 PID: 10622 Comm: a.out Not tainted 4.4.0+ #39
> ----------------------------------------
> 
> In the post "[PATCH 1/2] mm, oom: introduce oom reaper", Andrew Morton said
> that "schedule_timeout() in state TASK_RUNNING doesn't do anything".
> 
> Looking at commit 373ccbe5927034b5, it is indeed using schedule_timeout(1)
> instead of schedule_timeout_*(1). What!? We meant to force the kworker to
> sleep but the kworker did not sleep at all? Then, that explains why the
> forced sleep patch above solves the OOM livelock.
> 
> Jan, can you reproduce your problem with below patch applied?

I took v4.5-rc1, applied your memalloc patch and then the patch below.

I have mixed results so far. The first attempt hung after ~15 minutes,
the second is still running (for 12+ hours).

The way it hung is different from previous ones; I don't recall seeing
messages like these before:
  SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
  NMI watchdog: Watchdog detected hard LOCKUP on cpu 0

Full log from one that hanged:
  http://jan.stancek.eu/tmp/oom_hangs/console.log.4-v4.5-rc1_and_wait_iff_congested_patch.txt

I'll let it run through the weekend.

Regards,
Jan

> 
> ----------
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index 7340353..cbe6f0b 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -989,7 +989,7 @@ long wait_iff_congested(struct zone *zone, int sync, long
> timeout)
>  		 * here rather than calling cond_resched().
>  		 */
>  		if (current->flags & PF_WQ_WORKER)
> -			schedule_timeout(1);
> +			schedule_timeout_uninterruptible(1);
>  		else
>  			cond_resched();
>  
> ----------
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [LTP] [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab
  2016-01-29  7:32             ` Jan Stancek
@ 2016-01-29 12:35               ` Tetsuo Handa
  0 siblings, 0 replies; 26+ messages in thread
From: Tetsuo Handa @ 2016-01-29 12:35 UTC (permalink / raw)
  To: jstancek; +Cc: mhocko, tj, clameter, js1304, arekm, akpm, torvalds, linux-mm

Jan Stancek wrote:
> > Jan, can you reproduce your problem with below patch applied?
> 
> I took v4.5-rc1, applied your memalloc patch and then patch below.
> 
> I have mixed results so far. First attempt hanged after ~15 minutes,
> second is still running (for 12+ hours).
> 
> The way it hanged is different from previous ones, I don't recall seeing
> messages like these before:
>   SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
>   NMI watchdog: Watchdog detected hard LOCKUP on cpu 0
> 
> Full log from one that hanged:
>   http://jan.stancek.eu/tmp/oom_hangs/console.log.4-v4.5-rc1_and_wait_iff_congested_patch.txt
> 

The first attempt's failure is not an OOM bug. It's a hard lockup due to a
flood of memory allocation failure messages which lasted for 10 seconds
with IRQs disabled. The caller which requested these atomic allocations
did not expect such a situation. I think adding __GFP_NOWARN to
dma_active_cacheline is worth considering. Please consult the lib/dma-debug.c
maintainers.

  static RADIX_TREE(dma_active_cacheline, GFP_NOWAIT);

----------
int ata_scsi_queuecmd(struct Scsi_Host *shost, struct scsi_cmnd *cmd) {
  spin_lock_irqsave(ap->lock, irq_flags); /* Disable IRQ. */
  __ata_scsi_queuecmd(cmd, dev) {
    ata_scsi_translate(dev, scmd, xlat_func) {
      ata_qc_issue(qc) {
        ata_sg_setup(qc) {
          dma_map_sg(ap->dev, qc->sg, qc->n_elem, qc->dma_dir) {
            debug_dma_map_sg(dev, sg, nents, ents, dir) {
              add_dma_entry(entry) { /* Iterate the loop for "ents" times. */
                rc = active_cacheline_insert(entry); /* "SLUB: Unable to allocate memory" message */
                if (rc == -ENOMEM) {
                        pr_err("DMA-API: cacheline tracking ENOMEM, dma-debug disabled\n");
                        global_disable = true;
                }
              }
            }
          }
        }
      }
    }
  }
  spin_unlock_irqrestore(ap->lock, irq_flags); /* Enable IRQ */
}
----------
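
Concretely, the suggestion would amount to something like the following
one-liner (a sketch only; whether this is acceptable is for the lib/dma-debug.c
maintainers to judge):

----------
/* Sketch: let the opportunistic radix-tree node allocations fail silently. */
static RADIX_TREE(dma_active_cacheline, GFP_NOWAIT | __GFP_NOWARN);
----------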

By the way, I think there is no need to keep printing these error messages
after global_disable has become true.

----------
[ 1053.123934] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1053.147529] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1053.796970] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1053.820563] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1054.469776] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1054.493371] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1055.142562] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1055.166156] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1055.815330] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1055.838924] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1056.495796] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1056.519400] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1057.168741] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1057.192333] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1057.841671] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1057.865264] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1058.514604] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1058.538200] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1059.187551] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1059.211142] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1059.860486] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1059.884080] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1060.533430] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1060.557023] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1061.206393] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1061.229984] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1061.879330] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1061.902924] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1062.552266] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1062.575857] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1063.219374] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1063.242967] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
[ 1063.892314] SLUB: Unable to allocate memory on node -1 (gfp=0x2000000)
[ 1063.915908] DMA-API: cacheline tracking ENOMEM, dma-debug disabled
----------
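
One way to express the remark above in code (a sketch, assuming the message is
emitted from add_dma_entry() as in the call chain shown earlier) would be to
limit it to a single report:

----------
	rc = active_cacheline_insert(entry);
	if (rc == -ENOMEM) {
		/* Report once; dma-debug disables itself anyway. */
		pr_err_once("DMA-API: cacheline tracking ENOMEM, dma-debug disabled\n");
		global_disable = true;
	}
----------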


^ permalink raw reply	[flat|nested] 26+ messages in thread


Thread overview: 26+ messages
2016-01-18 15:38 [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab Jan Stancek
2016-01-18 15:38 ` [LTP] " Jan Stancek
2016-01-19 10:29 ` Tetsuo Handa
2016-01-19 10:29   ` [LTP] " Tetsuo Handa
2016-01-19 15:13   ` Jan Stancek
2016-01-19 15:13     ` [LTP] " Jan Stancek
2016-01-20 10:23     ` [BUG] oom hangs the system, NMI backtrace shows most CPUs inshrink_slab Tetsuo Handa
2016-01-20 10:23       ` [LTP] " Tetsuo Handa
2016-01-20 13:17       ` [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab Tetsuo Handa
2016-01-20 13:17         ` [LTP] " Tetsuo Handa
2016-01-20 15:10         ` Tejun Heo
2016-01-20 15:10           ` [LTP] " Tejun Heo
2016-01-20 15:54           ` Tetsuo Handa
2016-01-20 15:54             ` [LTP] " Tetsuo Handa
2016-01-22 15:14   ` Jan Stancek
2016-01-22 15:14     ` [LTP] " Jan Stancek
2016-01-23  6:30     ` Tetsuo Handa
2016-01-23  6:30       ` [LTP] " Tetsuo Handa
2016-01-26  7:48     ` Jan Stancek
2016-01-26  7:48       ` Jan Stancek
2016-01-26 14:46       ` Tetsuo Handa
2016-01-26 14:46         ` Tetsuo Handa
2016-01-27 11:02         ` Tetsuo Handa
2016-01-28 15:48           ` Tetsuo Handa
2016-01-29  7:32             ` Jan Stancek
2016-01-29 12:35               ` Tetsuo Handa
