* Re: INFO: task hung in ext4_da_get_block_prep
       [not found] ` <CACT4Y+aPRGUqAdJCMDWM=Zcy8ZQcHyrsB1ZuWS4VB_+wvLfeaQ@mail.gmail.com>
@ 2018-09-05 10:53   ` Tetsuo Handa
  2018-09-05 11:06     ` Dmitry Vyukov
  0 siblings, 1 reply; 25+ messages in thread
From: Tetsuo Handa @ 2018-09-05 10:53 UTC (permalink / raw)
  To: Dmitry Vyukov, syzbot
  Cc: 'Dmitry Vyukov' via syzkaller-upstream-moderation,
	linux-mm, Michal Hocko

On 2018/09/05 16:22, Dmitry Vyukov wrote:
> On Wed, Sep 5, 2018 at 5:41 AM, syzbot
> <syzbot+f0fc7f62e88b1de99af3@syzkaller.appspotmail.com> wrote:
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:    f2b6e66e9885 Add linux-next specific files for 20180904
>> git tree:       linux-next
>> console output: https://syzkaller.appspot.com/x/log.txt?x=1735dc92400000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=15ad48400e39c1b3
>> dashboard link: https://syzkaller.appspot.com/bug?extid=f0fc7f62e88b1de99af3
>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>> CC:             [adilger.kernel@dilger.ca linux-ext4@vger.kernel.org
>> linux-kernel@vger.kernel.org tytso@mit.edu]
>>
>> Unfortunately, I don't have any reproducer for this crash yet.
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+f0fc7f62e88b1de99af3@syzkaller.appspotmail.com
>>
>> [   7961]     0  7961    17585     8737   131072        0             0
>> syz-executor3
> 
> Hi Tetsuo,
> 
> Maybe you know what are these repeated lines with numbers?
> We started getting them on linux-next recently, also:
> https://syzkaller.appspot.com/bug?extid=f8fa79b458bcae4d913d
> They seem to cause various hangs/stalls.

Yes, these lines are from the OOM killer. (Thus, if we can, I want to drop
the ext4 people from the CC list before upstreaming this report.)

  dump_tasks mm/oom_kill.c:420 [inline]
  dump_header+0xf0d/0xf70 mm/oom_kill.c:450
  oom_kill_process.cold.28+0x10/0x95a mm/oom_kill.c:953
  out_of_memory+0xa88/0x1430 mm/oom_kill.c:1120

What is annoying is that a single for_each_process() traversal with printk()
is taking 52 seconds, which is far too long to spend inside an RCU read-side
section. In such a situation, invoking the OOM killer three times is enough
to exceed the 140-second khungtaskd threshold. Was syzbot trying to test a
fork bomb situation?

Anyway, we might need to introduce rcu_lock_break() like
check_hung_uninterruptible_tasks() does...
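
For reference, this is roughly the pattern kernel/hung_task.c already uses
(a simplified sketch only; the sysctl limits and the actual per-task hung
check are elided, and details may differ between kernel versions):

  /* Drop and reacquire the RCU read lock every HUNG_TASK_BATCHING tasks. */
  static bool rcu_lock_break(struct task_struct *g, struct task_struct *t)
  {
          bool can_cont;

          get_task_struct(g);
          get_task_struct(t);
          rcu_read_unlock();      /* let grace periods and other tasks run */
          cond_resched();
          rcu_read_lock();
          can_cont = pid_alive(g) && pid_alive(t);
          put_task_struct(t);
          put_task_struct(g);

          return can_cont;        /* false: our anchor tasks exited, stop */
  }

  static void check_hung_uninterruptible_tasks(unsigned long timeout)
  {
          int batch_count = HUNG_TASK_BATCHING;   /* 1024 in hung_task.c */
          struct task_struct *g, *t;

          rcu_read_lock();
          for_each_process_thread(g, t) {
                  if (!--batch_count) {
                          batch_count = HUNG_TASK_BATCHING;
                          if (!rcu_lock_break(g, t))
                                  goto unlock;
                  }
                  /* ... check whether this task is hung and report it ... */
          }
  unlock:
          rcu_read_unlock();
  }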

[  999.629589] [  16497]     0 16497    17585     8739   126976        0             0 syz-executor5
[ 1026.435955] [  32764]     0 32764    17585     8739   126976        0             0 syz-executor5
[ 1026.445027] [    311]     0   311    17585     8737   131072        0             0 syz-executor3
[ 1047.914324] [  10315]     0 10315    17585     8271   126976        0             0 syz-executor0
[ 1047.923384] Out of memory: Kill process 4670 (syz-fuzzer) score 53 or sacrifice child
[ 1047.931934] Killed process 5032 (syz-executor1) total-vm:70212kB, anon-rss:60kB, file-rss:0kB, shmem-rss:0kB
[ 1047.988138] syz-executor2 invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=1, oom_score_adj=0
[ 1048.000015] syz-executor2 cpuset=syz2 mems_allowed=0
[ 1048.005199] CPU: 0 PID: 4700 Comm: syz-executor2 Not tainted 4.19.0-rc2-next-20180904+ #55
[ 1048.740679] [   2347]     0  2347      278      186    32768        0             0 none
[ 1051.319928] [  16497]     0 16497    17585     8739   126976        0             0 syz-executor5
[ 1096.740878] [   8841]     0  8841    17585     8232   126976        0             0 syz-executor5
[ 1078.140677] [  32764]     0 32764    17585     8739   126976        0             0 syz-executor5
[ 1078.149807] [    311]     0   311    17585     8737   131072        0             0 syz-executor3
[ 1096.740878] [   8841]     0  8841    17585     8232   126976        0             0 syz-executor5

Also, another notable thing is that the backtrace for some reason includes

[ 1048.211540]  ? oom_killer_disable+0x3a0/0x3a0

line. Was syzbot testing process freezing functionality?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: INFO: task hung in ext4_da_get_block_prep
  2018-09-05 10:53   ` INFO: task hung in ext4_da_get_block_prep Tetsuo Handa
@ 2018-09-05 11:06     ` Dmitry Vyukov
  2018-09-06  5:53       ` Tetsuo Handa
  0 siblings, 1 reply; 25+ messages in thread
From: Dmitry Vyukov @ 2018-09-05 11:06 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: syzbot,
	'Dmitry Vyukov' via syzkaller-upstream-moderation,
	linux-mm, Michal Hocko

On Wed, Sep 5, 2018 at 12:53 PM, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
> On 2018/09/05 16:22, Dmitry Vyukov wrote:
>> On Wed, Sep 5, 2018 at 5:41 AM, syzbot
>> <syzbot+f0fc7f62e88b1de99af3@syzkaller.appspotmail.com> wrote:
>>> Hello,
>>>
>>> syzbot found the following crash on:
>>>
>>> HEAD commit:    f2b6e66e9885 Add linux-next specific files for 20180904
>>> git tree:       linux-next
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=1735dc92400000
>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=15ad48400e39c1b3
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=f0fc7f62e88b1de99af3
>>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>>> CC:             [adilger.kernel@dilger.ca linux-ext4@vger.kernel.org
>>> linux-kernel@vger.kernel.org tytso@mit.edu]
>>>
>>> Unfortunately, I don't have any reproducer for this crash yet.
>>>
>>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>>> Reported-by: syzbot+f0fc7f62e88b1de99af3@syzkaller.appspotmail.com
>>>
>>> [   7961]     0  7961    17585     8737   131072        0             0
>>> syz-executor3
>>
>> Hi Tetsuo,
>>
>> Maybe you know what are these repeated lines with numbers?
>> We started getting them on linux-next recently, also:
>> https://syzkaller.appspot.com/bug?extid=f8fa79b458bcae4d913d
>> They seem to cause various hangs/stalls.
>
> Yes, these lines are from the OOM killer. (Thus, if we can, I want to
> remove ext4 people before upstreaming this report.)

This is not possible at the moment.

>   dump_tasks mm/oom_kill.c:420 [inline]
>   dump_header+0xf0d/0xf70 mm/oom_kill.c:450
>   oom_kill_process.cold.28+0x10/0x95a mm/oom_kill.c:953
>   out_of_memory+0xa88/0x1430 mm/oom_kill.c:1120
>
> What is annoying is that one for_each_process() traversal with printk() is
> taking 52 seconds which is too long to do under RCU section. Under such
> situation, invoking the OOM killer for three times will exceed khungtaskd
> threshold 140 seconds. Was syzbot trying to test fork bomb situation?

Hard to tell. I only know what's captured in the console output.

> Anyway, we might need to introduce rcu_lock_break() like
> check_hung_uninterruptible_tasks() does...
>
> [  999.629589] [  16497]     0 16497    17585     8739   126976        0             0 syz-executor5
> [ 1026.435955] [  32764]     0 32764    17585     8739   126976        0             0 syz-executor5
> [ 1026.445027] [    311]     0   311    17585     8737   131072        0             0 syz-executor3
> [ 1047.914324] [  10315]     0 10315    17585     8271   126976        0             0 syz-executor0
> [ 1047.923384] Out of memory: Kill process 4670 (syz-fuzzer) score 53 or sacrifice child
> [ 1047.931934] Killed process 5032 (syz-executor1) total-vm:70212kB, anon-rss:60kB, file-rss:0kB, shmem-rss:0kB
> [ 1047.988138] syz-executor2 invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=1, oom_score_adj=0
> [ 1048.000015] syz-executor2 cpuset=syz2 mems_allowed=0
> [ 1048.005199] CPU: 0 PID: 4700 Comm: syz-executor2 Not tainted 4.19.0-rc2-next-20180904+ #55
> [ 1048.740679] [   2347]     0  2347      278      186    32768        0             0 none
> [ 1051.319928] [  16497]     0 16497    17585     8739   126976        0             0 syz-executor5
> [ 1096.740878] [   8841]     0  8841    17585     8232   126976        0             0 syz-executor5
> [ 1078.140677] [  32764]     0 32764    17585     8739   126976        0             0 syz-executor5
> [ 1078.149807] [    311]     0   311    17585     8737   131072        0             0 syz-executor3
> [ 1096.740878] [   8841]     0  8841    17585     8232   126976        0             0 syz-executor5
>
> Also, another notable thing is that the backtrace for some reason includes
>
> [ 1048.211540]  ? oom_killer_disable+0x3a0/0x3a0
>
> line. Was syzbot testing process freezing functionality?

What's the API for this?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: INFO: task hung in ext4_da_get_block_prep
  2018-09-05 11:06     ` Dmitry Vyukov
@ 2018-09-06  5:53       ` Tetsuo Handa
  2018-09-06  9:54         ` Dmitry Vyukov
  0 siblings, 1 reply; 25+ messages in thread
From: Tetsuo Handa @ 2018-09-06  5:53 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzbot,
	'Dmitry Vyukov' via syzkaller-upstream-moderation,
	linux-mm, Michal Hocko

Dmitry Vyukov wrote:
> > Also, another notable thing is that the backtrace for some reason includes
> >
> > [ 1048.211540]  ? oom_killer_disable+0x3a0/0x3a0
> >
> > line. Was syzbot testing process freezing functionality?
> 
> What's the API for this?
> 

I'm not a user of suspend/hibernation. But it seems that the way to use the
API is to write one of the words listed in /sys/power/state back into
/sys/power/state.

# echo suspend > /sys/power/state

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: INFO: task hung in ext4_da_get_block_prep
  2018-09-06  5:53       ` Tetsuo Handa
@ 2018-09-06  9:54         ` Dmitry Vyukov
  2018-09-06 10:58           ` [PATCH] mm, oom: Introduce time limit for dump_tasks duration Tetsuo Handa
  0 siblings, 1 reply; 25+ messages in thread
From: Dmitry Vyukov @ 2018-09-06  9:54 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: syzbot,
	'Dmitry Vyukov' via syzkaller-upstream-moderation,
	linux-mm, Michal Hocko

On Thu, Sep 6, 2018 at 7:53 AM, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
> Dmitry Vyukov wrote:
>> > Also, another notable thing is that the backtrace for some reason includes
>> >
>> > [ 1048.211540]  ? oom_killer_disable+0x3a0/0x3a0
>> >
>> > line. Was syzbot testing process freezing functionality?
>>
>> What's the API for this?
>>
>
> I'm not a user of suspend/hibernation. But it seems that usage of the API
> is to write one of words listed in /sys/power/state into /sys/power/state .
>
> # echo suspend > /sys/power/state

syzkaller should not write to /sys/power/state. The only mention of
"power" is in some selinux contexts.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] mm, oom: Introduce time limit for dump_tasks duration.
  2018-09-06  9:54         ` Dmitry Vyukov
@ 2018-09-06 10:58           ` Tetsuo Handa
  2018-09-06 11:07             ` Dmitry Vyukov
  2018-09-06 11:23             ` Michal Hocko
  0 siblings, 2 replies; 25+ messages in thread
From: Tetsuo Handa @ 2018-09-06 10:58 UTC (permalink / raw)
  To: Dmitry Vyukov, Michal Hocko, Andrew Morton, David Rientjes
  Cc: syzbot,
	'Dmitry Vyukov' via syzkaller-upstream-moderation,
	linux-mm

On 2018/09/06 18:54, Dmitry Vyukov wrote:
> On Thu, Sep 6, 2018 at 7:53 AM, Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>> Dmitry Vyukov wrote:
>>>> Also, another notable thing is that the backtrace for some reason includes
>>>>
>>>> [ 1048.211540]  ? oom_killer_disable+0x3a0/0x3a0
>>>>
>>>> line. Was syzbot testing process freezing functionality?
>>>
>>> What's the API for this?
>>>
>>
>> I'm not a user of suspend/hibernation. But it seems that usage of the API
>> is to write one of words listed in /sys/power/state into /sys/power/state .
>>
>> # echo suspend > /sys/power/state
> 
> syzkaller should not write to /sys/power/state. The only mention of
> "power" is in some selinux contexts.
> 

OK. Then, I have no idea.
Anyway, I think we can apply this patch.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] mm, oom: Introduce time limit for dump_tasks duration.
  2018-09-06 10:58           ` [PATCH] mm, oom: Introduce time limit for dump_tasks duration Tetsuo Handa
@ 2018-09-06 11:07             ` Dmitry Vyukov
  2018-09-06 11:25               ` Tetsuo Handa
  2018-09-06 11:23             ` Michal Hocko
  1 sibling, 1 reply; 25+ messages in thread
From: Dmitry Vyukov @ 2018-09-06 11:07 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Michal Hocko, Andrew Morton, David Rientjes, syzbot,
	'Dmitry Vyukov' via syzkaller-upstream-moderation,
	linux-mm

On Thu, Sep 6, 2018 at 12:58 PM, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
> On 2018/09/06 18:54, Dmitry Vyukov wrote:
>> On Thu, Sep 6, 2018 at 7:53 AM, Tetsuo Handa
>> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>>> Dmitry Vyukov wrote:
>>>>> Also, another notable thing is that the backtrace for some reason includes
>>>>>
>>>>> [ 1048.211540]  ? oom_killer_disable+0x3a0/0x3a0
>>>>>
>>>>> line. Was syzbot testing process freezing functionality?
>>>>
>>>> What's the API for this?
>>>>
>>>
>>> I'm not a user of suspend/hibernation. But it seems that usage of the API
>>> is to write one of words listed in /sys/power/state into /sys/power/state .
>>>
>>> # echo suspend > /sys/power/state
>>
>> syzkaller should not write to /sys/power/state. The only mention of
>> "power" is in some selinux contexts.
>>
>
> OK. Then, I have no idea.
> Anyway, I think we can apply this patch.
>
> From 18876f287dd69a7c33f65c91cfcda3564233f55e Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Thu, 6 Sep 2018 19:53:18 +0900
> Subject: [PATCH] mm, oom: Introduce time limit for dump_tasks duration.
>
> Since printk() is slow, printing one line takes nearly 0.01 second.
> As a result, syzbot is stalling for 52 seconds trying to dump 5600

I wonder why there are so many of them?
We have at most 8 test processes (each having no more than 16 threads,
if that matters).
There should be no more than 1 instance of syz-executor1 at a time. But we
see output like the one below, with lots of instances of syz-executor1
with different pids. So does it print all tasks that ever existed (the
kernel does not store that info, right)? Or does it livelock, picking up
new tasks because they are printed more slowly than they are created? Or
do we have tons of zombies?

...
[   8037]     0  8037    17618     8738   131072        0             0 syz-executor1
[   8039]     0  8039    17585     8737   131072        0             0 syz-executor3
[   8040]     0  8040    17618     8738   131072        0             0 syz-executor1
 schedule+0xfb/0x450 kernel/sched/core.c:3517
[   8056]     0  8056    17585     8738   126976        0             0 syz-executor4
[   8055]     0  8055    17618     8741   126976        0             0 syz-executor5
[   8060]     0  8060    17585     8740   126976        0             0 syz-executor0
[   8062]     0  8062    17585     8739   126976        0             0 syz-executor7
[   8063]     0  8063    17618     8741   126976        0             0 syz-executor5
[   8066]     0  8066    17585     8740   126976        0             0 syz-executor0
[   8067]     0  8067    17585     8737   126976        0             0 syz-executor6
[   8070]     0  8070    17618     8739   131072        0             0 syz-executor3
[   8073]     0  8073    17618     8738   131072        0             0 syz-executor1
[   8074]     0  8074    17585     8737   126976        0             0 syz-executor6
 __rwsem_down_read_failed_common kernel/locking/rwsem-xadd.c:269 [inline]
 rwsem_down_read_failed+0x362/0x610 kernel/locking/rwsem-xadd.c:286
[   8075]     0  8075    17618     8739   131072        0             0 syz-executor3
[   8077]     0  8077    17618     8738   131072        0             0 syz-executor1
[   8079]     0  8079    17585     8739   126976        0             0 syz-executor7
[   8092]     0  8092    17618     8738   131072        0             0 syz-executor1
...


> tasks at for_each_process() under RCU. Since such situation is almost
> inflight fork bomb attack (the OOM killer will print similar tasks for
> so many times), it makes little sense to print all candidate tasks.
> Thus, this patch introduces 3 seconds limit for printing.
>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> ---
>  mm/oom_kill.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index f10aa53..48e5bf6 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -399,14 +399,22 @@ static void dump_tasks(struct mem_cgroup *memcg, const nodemask_t *nodemask)
>  {
>         struct task_struct *p;
>         struct task_struct *task;
> +       unsigned long start;
> +       unsigned int skipped = 0;
>
>         pr_info("Tasks state (memory values in pages):\n");
>         pr_info("[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name\n");
>         rcu_read_lock();
> +       start = jiffies;
>         for_each_process(p) {
>                 if (oom_unkillable_task(p, memcg, nodemask))
>                         continue;
>
> +               if (time_after(jiffies, start + 3 * HZ)) {
> +                       skipped++;
> +                       continue;
> +               }
> +
>                 task = find_lock_task_mm(p);
>                 if (!task) {
>                         /*
> @@ -426,6 +434,8 @@ static void dump_tasks(struct mem_cgroup *memcg, const nodemask_t *nodemask)
>                 task_unlock(task);
>         }
>         rcu_read_unlock();
> +       if (skipped)
> +               pr_info("Printing %u tasks omitted.\n", skipped);
>  }
>
>  static void dump_header(struct oom_control *oc, struct task_struct *p)
> --
> 1.8.3.1
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] mm, oom: Introduce time limit for dump_tasks duration.
  2018-09-06 10:58           ` [PATCH] mm, oom: Introduce time limit for dump_tasks duration Tetsuo Handa
  2018-09-06 11:07             ` Dmitry Vyukov
@ 2018-09-06 11:23             ` Michal Hocko
  2018-09-06 11:40               ` Tetsuo Handa
  1 sibling, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2018-09-06 11:23 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Dmitry Vyukov, Andrew Morton, David Rientjes, syzbot,
	'Dmitry Vyukov' via syzkaller-upstream-moderation,
	linux-mm

On Thu 06-09-18 19:58:25, Tetsuo Handa wrote:
[...]
> >From 18876f287dd69a7c33f65c91cfcda3564233f55e Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Thu, 6 Sep 2018 19:53:18 +0900
> Subject: [PATCH] mm, oom: Introduce time limit for dump_tasks duration.
> 
> Since printk() is slow, printing one line takes nearly 0.01 second.
> As a result, syzbot is stalling for 52 seconds trying to dump 5600
> tasks at for_each_process() under RCU. Since such situation is almost
> inflight fork bomb attack (the OOM killer will print similar tasks for
> so many times), it makes little sense to print all candidate tasks.
> Thus, this patch introduces 3 seconds limit for printing.
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Dmitry Vyukov <dvyukov@google.com>

You really love timeout based solutions with randomly chosen timeouts,
don't you. This is just ugly as hell. We already have means to disable
tasks dumping (see /proc/sys/vm/oom_dump_tasks).
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] mm, oom: Introduce time limit for dump_tasks duration.
  2018-09-06 11:07             ` Dmitry Vyukov
@ 2018-09-06 11:25               ` Tetsuo Handa
  0 siblings, 0 replies; 25+ messages in thread
From: Tetsuo Handa @ 2018-09-06 11:25 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Michal Hocko, Andrew Morton, David Rientjes, syzbot,
	'Dmitry Vyukov' via syzkaller-upstream-moderation,
	linux-mm

On 2018/09/06 20:07, Dmitry Vyukov wrote:
>> Since printk() is slow, printing one line takes nearly 0.01 second.
>> As a result, syzbot is stalling for 52 seconds trying to dump 5600
> 
> I wonder why there are so many of them?
> We have at most 8 test processes (each having no more than 16 threads
> if that matters).
> No more than 1 instance of syz-executor1 at a time. But we see output
> like the one below. It has lots of instances of syz-executor1 with
> different pid's. So does it print all tasks that ever existed (kernel
> does not store that info, right)? Or it livelocks picking up new and
> new tasks as they are created slower than they are created? Or we have
> tons of zombies?
> 
> ...

I don't think they are zombies. Tasks which have already released their ->mm
are not printed, so these tasks are still alive.

  [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
> [   8037]     0  8037    17618     8738   131072        0             0 syz-executor1

Maybe some signal / fork() / exit() / wait() related regression?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] mm, oom: Introduce time limit for dump_tasks duration.
  2018-09-06 11:23             ` Michal Hocko
@ 2018-09-06 11:40               ` Tetsuo Handa
  2018-09-06 11:53                 ` Michal Hocko
  0 siblings, 1 reply; 25+ messages in thread
From: Tetsuo Handa @ 2018-09-06 11:40 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Dmitry Vyukov, Andrew Morton, David Rientjes, syzbot,
	'Dmitry Vyukov' via syzkaller-upstream-moderation,
	linux-mm

On 2018/09/06 20:23, Michal Hocko wrote:
> On Thu 06-09-18 19:58:25, Tetsuo Handa wrote:
> [...]
>> >From 18876f287dd69a7c33f65c91cfcda3564233f55e Mon Sep 17 00:00:00 2001
>> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>> Date: Thu, 6 Sep 2018 19:53:18 +0900
>> Subject: [PATCH] mm, oom: Introduce time limit for dump_tasks duration.
>>
>> Since printk() is slow, printing one line takes nearly 0.01 second.
>> As a result, syzbot is stalling for 52 seconds trying to dump 5600
>> tasks at for_each_process() under RCU. Since such situation is almost
>> inflight fork bomb attack (the OOM killer will print similar tasks for
>> so many times), it makes little sense to print all candidate tasks.
>> Thus, this patch introduces 3 seconds limit for printing.
>>
>> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>> Cc: Dmitry Vyukov <dvyukov@google.com>
> 
> You really love timeout based solutions with randomly chosen timeouts,
> don't you. This is just ugly as hell. We already have means to disable
> tasks dumping (see /proc/sys/vm/oom_dump_tasks).

I know about /proc/sys/vm/oom_dump_tasks . Still, showing some entries while
not always printing all of them might be helpful. For example, we could allow
/proc/sys/vm/oom_dump_tasks > 1 and use that value as a timeout in seconds.
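
A rough sketch of that idea (hypothetical, not a tested patch;
sysctl_oom_dump_tasks is the existing variable behind
/proc/sys/vm/oom_dump_tasks, while the "values > 1 mean a timeout in
seconds" interpretation is the new part, and p/skipped/memcg/nodemask are
the same as in the patch above):

          /*
           * Hypothetical reuse of the oom_dump_tasks sysctl: values greater
           * than 1 act as a printing time limit in seconds instead of the
           * hard-coded 3 * HZ above.
           */
          unsigned long end = jiffies + sysctl_oom_dump_tasks * HZ;

          for_each_process(p) {
                  if (oom_unkillable_task(p, memcg, nodemask))
                          continue;

                  if (sysctl_oom_dump_tasks > 1 && time_after(jiffies, end)) {
                          skipped++;
                          continue;
                  }
                  /* ... print the entry exactly as dump_tasks() does today ... */
          }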

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] mm, oom: Introduce time limit for dump_tasks duration.
  2018-09-06 11:40               ` Tetsuo Handa
@ 2018-09-06 11:53                 ` Michal Hocko
  2018-09-06 12:08                   ` Dmitry Vyukov
  2018-09-06 13:45                   ` Tetsuo Handa
  0 siblings, 2 replies; 25+ messages in thread
From: Michal Hocko @ 2018-09-06 11:53 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Dmitry Vyukov, Andrew Morton, David Rientjes, syzbot,
	'Dmitry Vyukov' via syzkaller-upstream-moderation,
	linux-mm

On Thu 06-09-18 20:40:34, Tetsuo Handa wrote:
> On 2018/09/06 20:23, Michal Hocko wrote:
> > On Thu 06-09-18 19:58:25, Tetsuo Handa wrote:
> > [...]
> >> >From 18876f287dd69a7c33f65c91cfcda3564233f55e Mon Sep 17 00:00:00 2001
> >> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> >> Date: Thu, 6 Sep 2018 19:53:18 +0900
> >> Subject: [PATCH] mm, oom: Introduce time limit for dump_tasks duration.
> >>
> >> Since printk() is slow, printing one line takes nearly 0.01 second.
> >> As a result, syzbot is stalling for 52 seconds trying to dump 5600
> >> tasks at for_each_process() under RCU. Since such situation is almost
> >> inflight fork bomb attack (the OOM killer will print similar tasks for
> >> so many times), it makes little sense to print all candidate tasks.
> >> Thus, this patch introduces 3 seconds limit for printing.
> >>
> >> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> >> Cc: Dmitry Vyukov <dvyukov@google.com>
> > 
> > You really love timeout based solutions with randomly chosen timeouts,
> > don't you. This is just ugly as hell. We already have means to disable
> > tasks dumping (see /proc/sys/vm/oom_dump_tasks).
> 
> I know /proc/sys/vm/oom_dump_tasks . Showing some entries while not always
> printing all entries might be helpful.

Not really. It could be more confusing than helpful. The main purpose of
the listing is to double check the list to understand the oom victim
selection. If you have a partial list you simply cannot do that.

If the iteration takes too long and I can imagine it does with zillions
of tasks then the proper way around it is either release the lock
periodically after N tasks is processed or outright skip the whole thing
if there are too many tasks. The first option is obviously tricky to
prevent from duplicate entries or other artifacts.
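
For illustration only, the first option applied to dump_tasks() could look
something like the sketch below (DUMP_TASKS_BATCHING is a made-up constant,
the per-task printing is elided, and the duplicate/missed-entry problem
mentioned above is not solved here):

          int batch_count = DUMP_TASKS_BATCHING;  /* hypothetical, e.g. 1024 */
          bool can_cont;

          rcu_read_lock();
          for_each_process(p) {
                  if (oom_unkillable_task(p, memcg, nodemask))
                          continue;

                  if (!--batch_count) {
                          batch_count = DUMP_TASKS_BATCHING;
                          /* periodically drop the RCU read lock, as hung_task.c does */
                          get_task_struct(p);
                          rcu_read_unlock();
                          cond_resched();
                          rcu_read_lock();
                          can_cont = pid_alive(p);
                          put_task_struct(p);
                          if (!can_cont)
                                  break;  /* p exited meanwhile; cannot walk on safely */
                  }

                  task = find_lock_task_mm(p);
                  /* ... print the entry as dump_tasks() already does ... */
          }
          rcu_read_unlock();
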
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] mm, oom: Introduce time limit for dump_tasks duration.
  2018-09-06 11:53                 ` Michal Hocko
@ 2018-09-06 12:08                   ` Dmitry Vyukov
  2018-09-06 12:16                     ` Michal Hocko
  2018-09-06 13:45                   ` Tetsuo Handa
  1 sibling, 1 reply; 25+ messages in thread
From: Dmitry Vyukov @ 2018-09-06 12:08 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, Andrew Morton, David Rientjes, syzbot,
	'Dmitry Vyukov' via syzkaller-upstream-moderation,
	linux-mm

On Thu, Sep 6, 2018 at 1:53 PM, Michal Hocko <mhocko@kernel.org> wrote:
> On Thu 06-09-18 20:40:34, Tetsuo Handa wrote:
>> On 2018/09/06 20:23, Michal Hocko wrote:
>> > On Thu 06-09-18 19:58:25, Tetsuo Handa wrote:
>> > [...]
>> >> >From 18876f287dd69a7c33f65c91cfcda3564233f55e Mon Sep 17 00:00:00 2001
>> >> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>> >> Date: Thu, 6 Sep 2018 19:53:18 +0900
>> >> Subject: [PATCH] mm, oom: Introduce time limit for dump_tasks duration.
>> >>
>> >> Since printk() is slow, printing one line takes nearly 0.01 second.
>> >> As a result, syzbot is stalling for 52 seconds trying to dump 5600
>> >> tasks at for_each_process() under RCU. Since such situation is almost
>> >> inflight fork bomb attack (the OOM killer will print similar tasks for
>> >> so many times), it makes little sense to print all candidate tasks.
>> >> Thus, this patch introduces 3 seconds limit for printing.
>> >>
>> >> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>> >> Cc: Dmitry Vyukov <dvyukov@google.com>
>> >
>> > You really love timeout based solutions with randomly chosen timeouts,
>> > don't you. This is just ugly as hell. We already have means to disable
>> > tasks dumping (see /proc/sys/vm/oom_dump_tasks).
>>
>> I know /proc/sys/vm/oom_dump_tasks . Showing some entries while not always
>> printing all entries might be helpful.
>
> Not really. It could be more confusing than helpful. The main purpose of
> the listing is to double check the list to understand the oom victim
> selection. If you have a partial list you simply cannot do that.
>
> If the iteration takes too long and I can imagine it does with zillions
> of tasks then the proper way around it is either release the lock
> periodically after N tasks is processed or outright skip the whole thing
> if there are too many tasks. The first option is obviously tricky to
> prevent from duplicate entries or other artifacts.


So does anybody know if it can live lock picking up new tasks all the
time? That's what it looks like at first glance. I also don't remember
seeing anything similar in the past.
If it's a live lock and we resolve it, then we don't need to solve the
problem of too many tasks here.
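
For context, here is why that is at least conceivable (simplified excerpts;
exact definitions and file locations may differ by kernel version). New
thread-group leaders are linked at the tail of the very list that
for_each_process() walks, so a walker that prints more slowly than tasks
are forked can keep finding new entries before it gets back to init_task:

  /* kernel/fork.c, copy_process(): a new process is appended at the tail */
          list_add_tail_rcu(&p->tasks, &init_task.tasks);

  /* include/linux/sched/signal.h: for_each_process() walks that same list */
  #define next_task(p) \
          list_entry_rcu((p)->tasks.next, struct task_struct, tasks)

  #define for_each_process(p) \
          for (p = &init_task ; (p = next_task(p)) != &init_task ; )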

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] mm, oom: Introduce time limit for dump_tasks duration.
  2018-09-06 12:08                   ` Dmitry Vyukov
@ 2018-09-06 12:16                     ` Michal Hocko
  2018-09-11 16:37                       ` Oleg Nesterov
  0 siblings, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2018-09-06 12:16 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Tetsuo Handa, Andrew Morton, David Rientjes, syzbot,
	'Dmitry Vyukov' via syzkaller-upstream-moderation,
	linux-mm, Oleg Nesterov

Ccing Oleg.

On Thu 06-09-18 14:08:43, Dmitry Vyukov wrote:
[...]
> So does anybody know if it can live lock picking up new tasks all the
> time? That's what it looks like at first glance. I also don't remember
> seeing anything similar in the past.

That is an interesting question. I find it unlikely here because it is
quite hard to get new tasks spawned while you are genuinely OOM. But we
do have these for_each_process loops in other places as well. Some of
them are even controlled from userspace. Some of them, like the exit path
(zap_threads), sound even more interesting, even though that is a rare path.

So a question for Oleg I guess. Is it possible that for_each_process
live locks (or stalls for way too long/unbounded amount of time) under
heavy fork/exit loads? Is there any protection from that?

> If it's a live lock and we resolve it, then we don't need to solve the
> problem of too many tasks here.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] mm, oom: Introduce time limit for dump_tasks duration.
  2018-09-06 11:53                 ` Michal Hocko
  2018-09-06 12:08                   ` Dmitry Vyukov
@ 2018-09-06 13:45                   ` Tetsuo Handa
  2018-09-06 14:39                     ` Michal Hocko
  1 sibling, 1 reply; 25+ messages in thread
From: Tetsuo Handa @ 2018-09-06 13:45 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Dmitry Vyukov, Andrew Morton, David Rientjes, syzbot,
	'Dmitry Vyukov' via syzkaller-upstream-moderation,
	linux-mm

On 2018/09/06 20:53, Michal Hocko wrote:
> On Thu 06-09-18 20:40:34, Tetsuo Handa wrote:
>> On 2018/09/06 20:23, Michal Hocko wrote:
>>> On Thu 06-09-18 19:58:25, Tetsuo Handa wrote:
>>> [...]
>>>> >From 18876f287dd69a7c33f65c91cfcda3564233f55e Mon Sep 17 00:00:00 2001
>>>> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>>>> Date: Thu, 6 Sep 2018 19:53:18 +0900
>>>> Subject: [PATCH] mm, oom: Introduce time limit for dump_tasks duration.
>>>>
>>>> Since printk() is slow, printing one line takes nearly 0.01 second.
>>>> As a result, syzbot is stalling for 52 seconds trying to dump 5600
>>>> tasks at for_each_process() under RCU. Since such situation is almost
>>>> inflight fork bomb attack (the OOM killer will print similar tasks for
>>>> so many times), it makes little sense to print all candidate tasks.
>>>> Thus, this patch introduces 3 seconds limit for printing.
>>>>
>>>> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>>>> Cc: Dmitry Vyukov <dvyukov@google.com>
>>>
>>> You really love timeout based solutions with randomly chosen timeouts,
>>> don't you. This is just ugly as hell. We already have means to disable
>>> tasks dumping (see /proc/sys/vm/oom_dump_tasks).
>>
>> I know /proc/sys/vm/oom_dump_tasks . Showing some entries while not always
>> printing all entries might be helpful.
> 
> Not really. It could be more confusing than helpful. The main purpose of
> the listing is to double check the list to understand the oom victim
> selection. If you have a partial list you simply cannot do that.

It serves as a safeguard for avoiding RCU stall warnings.

> 
> If the iteration takes too long and I can imagine it does with zillions
> of tasks then the proper way around it is either release the lock
> periodically after N tasks is processed or outright skip the whole thing
> if there are too many tasks. The first option is obviously tricky to
> prevent from duplicate entries or other artifacts.
> 

Can we add rcu_lock_break() like check_hung_uninterruptible_tasks() does?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] mm, oom: Introduce time limit for dump_tasks duration.
  2018-09-06 13:45                   ` Tetsuo Handa
@ 2018-09-06 14:39                     ` Michal Hocko
  2018-09-06 20:58                       ` Tetsuo Handa
  0 siblings, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2018-09-06 14:39 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Dmitry Vyukov, Andrew Morton, David Rientjes, syzbot,
	'Dmitry Vyukov' via syzkaller-upstream-moderation,
	linux-mm

On Thu 06-09-18 22:45:26, Tetsuo Handa wrote:
> On 2018/09/06 20:53, Michal Hocko wrote:
> > On Thu 06-09-18 20:40:34, Tetsuo Handa wrote:
> >> On 2018/09/06 20:23, Michal Hocko wrote:
> >>> On Thu 06-09-18 19:58:25, Tetsuo Handa wrote:
> >>> [...]
> >>>> >From 18876f287dd69a7c33f65c91cfcda3564233f55e Mon Sep 17 00:00:00 2001
> >>>> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> >>>> Date: Thu, 6 Sep 2018 19:53:18 +0900
> >>>> Subject: [PATCH] mm, oom: Introduce time limit for dump_tasks duration.
> >>>>
> >>>> Since printk() is slow, printing one line takes nearly 0.01 second.
> >>>> As a result, syzbot is stalling for 52 seconds trying to dump 5600
> >>>> tasks at for_each_process() under RCU. Since such situation is almost
> >>>> inflight fork bomb attack (the OOM killer will print similar tasks for
> >>>> so many times), it makes little sense to print all candidate tasks.
> >>>> Thus, this patch introduces 3 seconds limit for printing.
> >>>>
> >>>> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> >>>> Cc: Dmitry Vyukov <dvyukov@google.com>
> >>>
> >>> You really love timeout based solutions with randomly chosen timeouts,
> >>> don't you. This is just ugly as hell. We already have means to disable
> >>> tasks dumping (see /proc/sys/vm/oom_dump_tasks).
> >>
> >> I know /proc/sys/vm/oom_dump_tasks . Showing some entries while not always
> >> printing all entries might be helpful.
> > 
> > Not really. It could be more confusing than helpful. The main purpose of
> > the listing is to double check the list to understand the oom victim
> > selection. If you have a partial list you simply cannot do that.
> 
> It serves as a safeguard for avoiding RCU stall warnings.
> 
> > 
> > If the iteration takes too long and I can imagine it does with zillions
> > of tasks then the proper way around it is either release the lock
> > periodically after N tasks is processed or outright skip the whole thing
> > if there are too many tasks. The first option is obviously tricky to
> > prevent from duplicate entries or other artifacts.
> > 
> 
> Can we add rcu_lock_break() like check_hung_uninterruptible_tasks() does?

This would be a better variant of your timeout based approach. But it
can still produce an incomplete task list so it still consumes a lot of
resources to print a long list of tasks potentially while that list is not
useful for any evaluation. Maybe that is good enough. I don't know. I
would generally recommend to disable the whole thing with workloads with
many tasks though.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] mm, oom: Introduce time limit for dump_tasks duration.
  2018-09-06 14:39                     ` Michal Hocko
@ 2018-09-06 20:58                       ` Tetsuo Handa
  2018-09-07  8:27                         ` Michal Hocko
  0 siblings, 1 reply; 25+ messages in thread
From: Tetsuo Handa @ 2018-09-06 20:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Dmitry Vyukov, Andrew Morton, David Rientjes, syzbot,
	'Dmitry Vyukov' via syzkaller-upstream-moderation,
	linux-mm

On 2018/09/06 23:39, Michal Hocko wrote:
>>>> I know /proc/sys/vm/oom_dump_tasks . Showing some entries while not always
>>>> printing all entries might be helpful.
>>>
>>> Not really. It could be more confusing than helpful. The main purpose of
>>> the listing is to double check the list to understand the oom victim
>>> selection. If you have a partial list you simply cannot do that.
>>
>> It serves as a safeguard for avoiding RCU stall warnings.
>>
>>>
>>> If the iteration takes too long and I can imagine it does with zillions
>>> of tasks then the proper way around it is either release the lock
>>> periodically after N tasks is processed or outright skip the whole thing
>>> if there are too many tasks. The first option is obviously tricky to
>>> prevent from duplicate entries or other artifacts.
>>>
>>
>> Can we add rcu_lock_break() like check_hung_uninterruptible_tasks() does?
> 
> This would be a better variant of your timeout based approach. But it
> can still produce an incomplete task list so it still consumes a lot of
> resources to print a long list of tasks potentially while that list is not
> useful for any evaluation. Maybe that is good enough. I don't know. I
> would generally recommend to disable the whole thing with workloads with
> many tasks though.
> 

The "safeguard" is useful when there are _unexpectedly_ many tasks (like
syzbot in this case). Why not to allow those who want to avoid lockup to
avoid lockup rather than forcing them to disable the whole thing?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] mm, oom: Introduce time limit for dump_tasks duration.
  2018-09-06 20:58                       ` Tetsuo Handa
@ 2018-09-07  8:27                         ` Michal Hocko
  2018-09-07  9:36                           ` Dmitry Vyukov
  2018-09-07 10:20                           ` Tetsuo Handa
  0 siblings, 2 replies; 25+ messages in thread
From: Michal Hocko @ 2018-09-07  8:27 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Dmitry Vyukov, Andrew Morton, David Rientjes, syzbot,
	'Dmitry Vyukov' via syzkaller-upstream-moderation,
	linux-mm

On Fri 07-09-18 05:58:06, Tetsuo Handa wrote:
> On 2018/09/06 23:39, Michal Hocko wrote:
> >>>> I know /proc/sys/vm/oom_dump_tasks . Showing some entries while not always
> >>>> printing all entries might be helpful.
> >>>
> >>> Not really. It could be more confusing than helpful. The main purpose of
> >>> the listing is to double check the list to understand the oom victim
> >>> selection. If you have a partial list you simply cannot do that.
> >>
> >> It serves as a safeguard for avoiding RCU stall warnings.
> >>
> >>>
> >>> If the iteration takes too long and I can imagine it does with zillions
> >>> of tasks then the proper way around it is either release the lock
> >>> periodically after N tasks is processed or outright skip the whole thing
> >>> if there are too many tasks. The first option is obviously tricky to
> >>> prevent from duplicate entries or other artifacts.
> >>>
> >>
> >> Can we add rcu_lock_break() like check_hung_uninterruptible_tasks() does?
> > 
> > This would be a better variant of your timeout based approach. But it
> > can still produce an incomplete task list so it still consumes a lot of
> > resources to print a long list of tasks potentially while that list is not
> > useful for any evaluation. Maybe that is good enough. I don't know. I
> > would generally recommend to disable the whole thing with workloads with
> > many tasks though.
> > 
> 
> The "safeguard" is useful when there are _unexpectedly_ many tasks (like
> syzbot in this case). Why not to allow those who want to avoid lockup to
> avoid lockup rather than forcing them to disable the whole thing?

So you get an rcu lockup splat, and then what? Unless you have
panic_on_rcu_stall, this should be a recoverable thing (assuming we cannot
really livelock as described by Dmitry).

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] mm, oom: Introduce time limit for dump_tasks duration.
  2018-09-07  8:27                         ` Michal Hocko
@ 2018-09-07  9:36                           ` Dmitry Vyukov
  2018-09-07 10:49                             ` Tetsuo Handa
  2018-09-07 11:08                             ` Michal Hocko
  2018-09-07 10:20                           ` Tetsuo Handa
  1 sibling, 2 replies; 25+ messages in thread
From: Dmitry Vyukov @ 2018-09-07  9:36 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, Andrew Morton, David Rientjes, syzbot,
	'Dmitry Vyukov' via syzkaller-upstream-moderation,
	linux-mm

On Fri, Sep 7, 2018 at 10:27 AM, Michal Hocko <mhocko@kernel.org> wrote:
> On Fri 07-09-18 05:58:06, Tetsuo Handa wrote:
>> On 2018/09/06 23:39, Michal Hocko wrote:
>> >>>> I know /proc/sys/vm/oom_dump_tasks . Showing some entries while not always
>> >>>> printing all entries might be helpful.
>> >>>
>> >>> Not really. It could be more confusing than helpful. The main purpose of
>> >>> the listing is to double check the list to understand the oom victim
>> >>> selection. If you have a partial list you simply cannot do that.
>> >>
>> >> It serves as a safeguard for avoiding RCU stall warnings.
>> >>
>> >>>
>> >>> If the iteration takes too long and I can imagine it does with zillions
>> >>> of tasks then the proper way around it is either release the lock
>> >>> periodically after N tasks is processed or outright skip the whole thing
>> >>> if there are too many tasks. The first option is obviously tricky to
>> >>> prevent from duplicate entries or other artifacts.
>> >>>
>> >>
>> >> Can we add rcu_lock_break() like check_hung_uninterruptible_tasks() does?
>> >
>> > This would be a better variant of your timeout based approach. But it
>> > can still produce an incomplete task list so it still consumes a lot of
>> > resources to print a long list of tasks potentially while that list is not
>> > useful for any evaluation. Maybe that is good enough. I don't know. I
>> > would generally recommend to disable the whole thing with workloads with
>> > many tasks though.
>> >
>>
>> The "safeguard" is useful when there are _unexpectedly_ many tasks (like
>> syzbot in this case). Why not to allow those who want to avoid lockup to
>> avoid lockup rather than forcing them to disable the whole thing?
>
> So you get an rcu lockup splat and what? Unless you have panic_on_rcu_stall
> then this should be recoverable thing (assuming we cannot really
> livelock as described by Dmitry).


Should I add "vm.oom_dump_tasks = 0" to /etc/sysctl.conf on syzbot?
It looks like it will make things faster, not pollute console output,
prevent these stalls and that output does not seem to be too useful
for debugging.

But I am still concerned as to what has changed recently. Potentially
this happens only on linux-next, at least that's where I saw all
existing reports.
New tasks seem to be added to the tail of the tasks list, but this
part does not seem to be changed recently in linux-next..

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] mm, oom: Introduce time limit for dump_tasks duration.
  2018-09-07  8:27                         ` Michal Hocko
  2018-09-07  9:36                           ` Dmitry Vyukov
@ 2018-09-07 10:20                           ` Tetsuo Handa
  1 sibling, 0 replies; 25+ messages in thread
From: Tetsuo Handa @ 2018-09-07 10:20 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Dmitry Vyukov, Andrew Morton, David Rientjes, syzbot,
	'Dmitry Vyukov' via syzkaller-upstream-moderation,
	linux-mm

On 2018/09/07 17:27, Michal Hocko wrote:
> On Fri 07-09-18 05:58:06, Tetsuo Handa wrote:
>> On 2018/09/06 23:39, Michal Hocko wrote:
>>>>>> I know /proc/sys/vm/oom_dump_tasks . Showing some entries while not always
>>>>>> printing all entries might be helpful.
>>>>>
>>>>> Not really. It could be more confusing than helpful. The main purpose of
>>>>> the listing is to double check the list to understand the oom victim
>>>>> selection. If you have a partial list you simply cannot do that.
>>>>
>>>> It serves as a safeguard for avoiding RCU stall warnings.
>>>>
>>>>>
>>>>> If the iteration takes too long and I can imagine it does with zillions
>>>>> of tasks then the proper way around it is either release the lock
>>>>> periodically after N tasks is processed or outright skip the whole thing
>>>>> if there are too many tasks. The first option is obviously tricky to
>>>>> prevent from duplicate entries or other artifacts.
>>>>>
>>>>
>>>> Can we add rcu_lock_break() like check_hung_uninterruptible_tasks() does?
>>>
>>> This would be a better variant of your timeout based approach. But it
>>> can still produce an incomplete task list so it still consumes a lot of
>>> resources to print a long list of tasks potentially while that list is not
>>> useful for any evaluation. Maybe that is good enough. I don't know. I
>>> would generally recommend to disable the whole thing with workloads with
>>> many tasks though.
>>>
>>
>> The "safeguard" is useful when there are _unexpectedly_ many tasks (like
>> syzbot in this case). Why not to allow those who want to avoid lockup to
>> avoid lockup rather than forcing them to disable the whole thing?
> 
> So you get an rcu lockup splat and what? Unless you have panic_on_rcu_stall
> then this should be recoverable thing (assuming we cannot really
> livelock as described by Dmitry).
> 

syzbot is getting the hung task panic (140 seconds) because a single
dump_tasks() call from out_of_memory() consumes 52 seconds on a 2-CPU machine,
and cond_resched() is the only point at which the CPU can be yielded to the
tasks that need it. This is similar to the bugs shown below.

  [upstream] INFO: task hung in fsnotify_mark_destroy_workfn
  https://syzkaller.appspot.com/bug?id=0e75779a6f0faac461510c6330514e8f0e893038

  [upstream] INFO: task hung in fsnotify_connector_destroy_workfn
  https://syzkaller.appspot.com/bug?id=aa11d2d767f3750ef9a40d156a149e9cfa735b73

Continuing printk() until khungtaskd fires is a stupid behavior.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] mm, oom: Introduce time limit for dump_tasks duration.
  2018-09-07  9:36                           ` Dmitry Vyukov
@ 2018-09-07 10:49                             ` Tetsuo Handa
  2018-09-07 11:08                             ` Michal Hocko
  1 sibling, 0 replies; 25+ messages in thread
From: Tetsuo Handa @ 2018-09-07 10:49 UTC (permalink / raw)
  To: Dmitry Vyukov, Andrew Morton
  Cc: Michal Hocko, David Rientjes, syzbot,
	'Dmitry Vyukov' via syzkaller-upstream-moderation,
	linux-mm

On 2018/09/07 18:36, Dmitry Vyukov wrote:
> But I am still concerned as to what has changed recently. Potentially
> this happens only on linux-next, at least that's where I saw all
> existing reports.
> New tasks seem to be added to the tail of the tasks list, but this
> part does not seem to be changed recently in linux-next..
> 

As far as dump_tasks() shows, these tasks are alive. Thus, I want to know
what these tasks are doing (i.e. SysRq-t output). Since this is occurring in
linux-next, we could try a CONFIG_DEBUG_AID_FOR_SYZBOT=y build like
https://lkml.org/lkml/2018/9/3/353 does.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] mm, oom: Introduce time limit for dump_tasks duration.
  2018-09-07  9:36                           ` Dmitry Vyukov
  2018-09-07 10:49                             ` Tetsuo Handa
@ 2018-09-07 11:08                             ` Michal Hocko
  2018-09-08 14:00                               ` Dmitry Vyukov
  1 sibling, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2018-09-07 11:08 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Tetsuo Handa, Andrew Morton, David Rientjes, syzbot,
	'Dmitry Vyukov' via syzkaller-upstream-moderation,
	linux-mm

On Fri 07-09-18 11:36:55, Dmitry Vyukov wrote:
> On Fri, Sep 7, 2018 at 10:27 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > On Fri 07-09-18 05:58:06, Tetsuo Handa wrote:
> >> On 2018/09/06 23:39, Michal Hocko wrote:
> >> >>>> I know /proc/sys/vm/oom_dump_tasks . Showing some entries while not always
> >> >>>> printing all entries might be helpful.
> >> >>>
> >> >>> Not really. It could be more confusing than helpful. The main purpose of
> >> >>> the listing is to double check the list to understand the oom victim
> >> >>> selection. If you have a partial list you simply cannot do that.
> >> >>
> >> >> It serves as a safeguard for avoiding RCU stall warnings.
> >> >>
> >> >>>
> >> >>> If the iteration takes too long and I can imagine it does with zillions
> >> >>> of tasks then the proper way around it is either release the lock
> >> >>> periodically after N tasks is processed or outright skip the whole thing
> >> >>> if there are too many tasks. The first option is obviously tricky to
> >> >>> prevent from duplicate entries or other artifacts.
> >> >>>
> >> >>
> >> >> Can we add rcu_lock_break() like check_hung_uninterruptible_tasks() does?
> >> >
> >> > This would be a better variant of your timeout based approach. But it
> >> > can still produce an incomplete task list so it still consumes a lot of
> >> > resources to print a long list of tasks potentially while that list is not
> >> > useful for any evaluation. Maybe that is good enough. I don't know. I
> >> > would generally recommend to disable the whole thing with workloads with
> >> > many tasks though.
> >> >
> >>
> >> The "safeguard" is useful when there are _unexpectedly_ many tasks (like
> >> syzbot in this case). Why not to allow those who want to avoid lockup to
> >> avoid lockup rather than forcing them to disable the whole thing?
> >
> > So you get an rcu lockup splat and what? Unless you have panic_on_rcu_stall
> > then this should be recoverable thing (assuming we cannot really
> > livelock as described by Dmitry).
> 
> 
> Should I add "vm.oom_dump_tasks = 0" to /etc/sysctl.conf on syzbot?
> It looks like it will make things faster, not pollute console output,
> prevent these stalls and that output does not seem to be too useful
> for debugging.

I think that oom_dump_tasks has only very limited usefulness for your
testing.

> But I am still concerned as to what has changed recently. Potentially
> this happens only on linux-next, at least that's where I saw all
> existing reports.
> New tasks seem to be added to the tail of the tasks list, but this
> part does not seem to be changed recently in linux-next..

Yes, that would be interesting to find out.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] mm, oom: Introduce time limit for dump_tasks duration.
  2018-09-07 11:08                             ` Michal Hocko
@ 2018-09-08 14:00                               ` Dmitry Vyukov
  2018-09-10 14:36                                 ` Dmitry Vyukov
  0 siblings, 1 reply; 25+ messages in thread
From: Dmitry Vyukov @ 2018-09-08 14:00 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, Andrew Morton, David Rientjes, syzbot,
	'Dmitry Vyukov' via syzkaller-upstream-moderation,
	linux-mm

On Fri, Sep 7, 2018 at 1:08 PM, Michal Hocko <mhocko@kernel.org> wrote:
>> >> >>>> I know /proc/sys/vm/oom_dump_tasks . Showing some entries while not always
>> >> >>>> printing all entries might be helpful.
>> >> >>>
>> >> >>> Not really. It could be more confusing than helpful. The main purpose of
>> >> >>> the listing is to double check the list to understand the oom victim
>> >> >>> selection. If you have a partial list you simply cannot do that.
>> >> >>
>> >> >> It serves as a safeguard for avoiding RCU stall warnings.
>> >> >>
>> >> >>>
>> >> >>> If the iteration takes too long and I can imagine it does with zillions
>> >> >>> of tasks then the proper way around it is either release the lock
>> >> >>> periodically after N tasks is processed or outright skip the whole thing
>> >> >>> if there are too many tasks. The first option is obviously tricky to
>> >> >>> prevent from duplicate entries or other artifacts.
>> >> >>>
>> >> >>
>> >> >> Can we add rcu_lock_break() like check_hung_uninterruptible_tasks() does?
>> >> >
>> >> > This would be a better variant of your timeout based approach. But it
>> >> > can still produce an incomplete task list so it still consumes a lot of
>> >> > resources to print a long list of tasks potentially while that list is not
>> >> > useful for any evaluation. Maybe that is good enough. I don't know. I
>> >> > would generally recommend to disable the whole thing with workloads with
>> >> > many tasks though.
>> >> >
>> >>
>> >> The "safeguard" is useful when there are _unexpectedly_ many tasks (like
>> >> syzbot in this case). Why not to allow those who want to avoid lockup to
>> >> avoid lockup rather than forcing them to disable the whole thing?
>> >
>> > So you get an rcu lockup splat and what? Unless you have panic_on_rcu_stall
>> > then this should be recoverable thing (assuming we cannot really
>> > livelock as described by Dmitry).
>>
>>
>> Should I add "vm.oom_dump_tasks = 0" to /etc/sysctl.conf on syzbot?
>> It looks like it will make things faster, not pollute console output,
>> prevent these stalls and that output does not seem to be too useful
>> for debugging.
>
> I think that oom_dump_tasks has only very limited usefulness for your
> testing.
>
>> But I am still concerned as to what has changed recently. Potentially
>> this happens only on linux-next, at least that's where I saw all
>> existing reports.
>> New tasks seem to be added to the tail of the tasks list, but this
>> part does not seem to be changed recently in linux-next..
>
> Yes, that would be interesting to find out.


Looking at another similar report:
https://syzkaller.appspot.com/bug?extid=0d867757fdc016c0157e
It looks like it may just be syzkaller learning how to do fork bombs
after all (the same binary multiplied without bound). That probably
required some creativity, because test programs do not contain loops
per se and the clone syscall does not accept a start function pc.
I will set vm.oom_dump_tasks = 0 and try to additionally restrict it
with cgroups.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] mm, oom: Introduce time limit for dump_tasks duration.
  2018-09-08 14:00                               ` Dmitry Vyukov
@ 2018-09-10 14:36                                 ` Dmitry Vyukov
  0 siblings, 0 replies; 25+ messages in thread
From: Dmitry Vyukov @ 2018-09-10 14:36 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, Andrew Morton, David Rientjes, syzbot,
	'Dmitry Vyukov' via syzkaller-upstream-moderation,
	linux-mm

On Sat, Sep 8, 2018 at 4:00 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Fri, Sep 7, 2018 at 1:08 PM, Michal Hocko <mhocko@kernel.org> wrote:
>>> >> >>>> I know /proc/sys/vm/oom_dump_tasks . Showing some entries while not always
>>> >> >>>> printing all entries might be helpful.
>>> >> >>>
>>> >> >>> Not really. It could be more confusing than helpful. The main purpose of
>>> >> >>> the listing is to double check the list to understand the oom victim
>>> >> >>> selection. If you have a partial list you simply cannot do that.
>>> >> >>
>>> >> >> It serves as a safeguard for avoiding RCU stall warnings.
>>> >> >>
>>> >> >>>
>>> >> >>> If the iteration takes too long and I can imagine it does with zillions
>>> >> >>> of tasks then the proper way around it is either release the lock
>>> >> >>> periodically after N tasks is processed or outright skip the whole thing
>>> >> >>> if there are too many tasks. The first option is obviously tricky to
>>> >> >>> prevent from duplicate entries or other artifacts.
>>> >> >>>
>>> >> >>
>>> >> >> Can we add rcu_lock_break() like check_hung_uninterruptible_tasks() does?
>>> >> >
>>> >> > This would be a better variant of your timeout based approach. But it
>>> >> > can still produce an incomplete task list so it still consumes a lot of
>>> >> > resources to print a long list of tasks potentially while that list is not
>>> >> > useful for any evaluation. Maybe that is good enough. I don't know. I
>>> >> > would generally recommend to disable the whole thing with workloads with
>>> >> > many tasks though.
>>> >> >
>>> >>
>>> >> The "safeguard" is useful when there are _unexpectedly_ many tasks (like
>>> >> syzbot in this case). Why not to allow those who want to avoid lockup to
>>> >> avoid lockup rather than forcing them to disable the whole thing?
>>> >
>>> > So you get an rcu lockup splat and what? Unless you have panic_on_rcu_stall
>>> > then this should be recoverable thing (assuming we cannot really
>>> > livelock as described by Dmitry).
>>>
>>>
>>> Should I add "vm.oom_dump_tasks = 0" to /etc/sysctl.conf on syzbot?
>>> It looks like it will make things faster, not pollute console output,
>>> prevent these stalls and that output does not seem to be too useful
>>> for debugging.
>>
>> I think that oom_dump_tasks has only very limited usefulness for your
>> testing.
>>
>>> But I am still concerned as to what has changed recently. Potentially
>>> this happens only on linux-next, at least that's where I saw all
>>> existing reports.
>>> New tasks seem to be added to the tail of the tasks list, but this
>>> part does not seem to be changed recently in linux-next..
>>
>> Yes, that would be interesting to find out.
>
>
> Looking at another similar report:
> https://syzkaller.appspot.com/bug?extid=0d867757fdc016c0157e
> It looks like it can be just syzkaller learning how to do fork bombs
> after all (same binary multiplied infinite amount of times). Probably
> required some creativity because test programs do not contain loops
> per se and clone syscall does not accept start function pc.
> I will set vm.oom_dump_tasks = 0 and try to additionally restrict it
> with cgroups.


FTR, syzkaller now restricts test processes with pids.max=32. This
should prevent any fork bombs.
https://github.com/google/syzkaller/commit/f167cb6b0957d34f95b1067525aa87083f264035

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] mm, oom: Introduce time limit for dump_tasks duration.
  2018-09-06 12:16                     ` Michal Hocko
@ 2018-09-11 16:37                       ` Oleg Nesterov
  2018-09-12 16:45                         ` Oleg Nesterov
  0 siblings, 1 reply; 25+ messages in thread
From: Oleg Nesterov @ 2018-09-11 16:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Dmitry Vyukov, Tetsuo Handa, Andrew Morton, David Rientjes,
	syzbot,
	'Dmitry Vyukov' via syzkaller-upstream-moderation,
	linux-mm

On 09/06, Michal Hocko wrote:
>
> Ccing Oleg.

Thanks, but somehow I can't find this patch on marc.info ...

> So a question for Oleg I guess. Is it possible that for_each_process
> live locks (or stalls for way too long/unbounded amount of time) under
> heavy fork/exit loads?

Oh yes, it can... plus other problems.

I even sent the initial patches which introduce for_each_process_break/continue
a long time ago... I'll try to find them tomorrow and resend.

Oleg.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] mm, oom: Introduce time limit for dump_tasks duration.
  2018-09-11 16:37                       ` Oleg Nesterov
@ 2018-09-12 16:45                         ` Oleg Nesterov
  0 siblings, 0 replies; 25+ messages in thread
From: Oleg Nesterov @ 2018-09-12 16:45 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Dmitry Vyukov, Tetsuo Handa, Andrew Morton, David Rientjes,
	syzbot,
	'Dmitry Vyukov' via syzkaller-upstream-moderation,
	linux-mm

On 09/11, Oleg Nesterov wrote:
>
> On 09/06, Michal Hocko wrote:
> >
> > So a question for Oleg I guess. Is it possible that for_each_process
> > live locks (or stalls for way too long/unbounded amount of time) under
> > heavy fork/exit loads?
>
> Oh yes, it can... plus other problems.
>
> I even sent the initial patches which introduce for_each_process_break/continue
> a long time ago... I'll try to find them tomorrow and resend.

Two years ago ;) I don't understand why they were ignored; please see
"[PATCH 0/2] introduce for_each_process_thread_break() and for_each_process_thread_continue()"
which I sent a minute ago.

However, I didn't notice that the subject mentions oom/dump_tasks... As for
dump_tasks() it probably doesn't need the new helpers, I'll write another email
tomorrow, but perhaps the time limit is all we need.

Oleg.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: INFO: task hung in ext4_da_get_block_prep
       [not found] <0000000000004a6b700575178b5a@google.com>
       [not found] ` <CACT4Y+aPRGUqAdJCMDWM=Zcy8ZQcHyrsB1ZuWS4VB_+wvLfeaQ@mail.gmail.com>
@ 2019-03-03 11:33 ` syzbot
  1 sibling, 0 replies; 25+ messages in thread
From: syzbot @ 2019-03-03 11:33 UTC (permalink / raw)
  To: akpm, dvyukov, linux-mm, mhocko, oleg, penguin-kernel, rientjes,
	syzkaller-upstream-moderation

Auto-closing this bug as obsolete.
Crashes did not happen for a while, no reproducer and no activity.


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2019-03-03 11:33 UTC | newest]

Thread overview: 25+ messages
     [not found] <0000000000004a6b700575178b5a@google.com>
     [not found] ` <CACT4Y+aPRGUqAdJCMDWM=Zcy8ZQcHyrsB1ZuWS4VB_+wvLfeaQ@mail.gmail.com>
2018-09-05 10:53   ` INFO: task hung in ext4_da_get_block_prep Tetsuo Handa
2018-09-05 11:06     ` Dmitry Vyukov
2018-09-06  5:53       ` Tetsuo Handa
2018-09-06  9:54         ` Dmitry Vyukov
2018-09-06 10:58           ` [PATCH] mm, oom: Introduce time limit for dump_tasks duration Tetsuo Handa
2018-09-06 11:07             ` Dmitry Vyukov
2018-09-06 11:25               ` Tetsuo Handa
2018-09-06 11:23             ` Michal Hocko
2018-09-06 11:40               ` Tetsuo Handa
2018-09-06 11:53                 ` Michal Hocko
2018-09-06 12:08                   ` Dmitry Vyukov
2018-09-06 12:16                     ` Michal Hocko
2018-09-11 16:37                       ` Oleg Nesterov
2018-09-12 16:45                         ` Oleg Nesterov
2018-09-06 13:45                   ` Tetsuo Handa
2018-09-06 14:39                     ` Michal Hocko
2018-09-06 20:58                       ` Tetsuo Handa
2018-09-07  8:27                         ` Michal Hocko
2018-09-07  9:36                           ` Dmitry Vyukov
2018-09-07 10:49                             ` Tetsuo Handa
2018-09-07 11:08                             ` Michal Hocko
2018-09-08 14:00                               ` Dmitry Vyukov
2018-09-10 14:36                                 ` Dmitry Vyukov
2018-09-07 10:20                           ` Tetsuo Handa
2019-03-03 11:33 ` INFO: task hung in ext4_da_get_block_prep syzbot
