linux-mm.kvack.org archive mirror
* general protection fault in oom_unkillable_task
@ 2019-06-15  1:08 syzbot
  2019-06-15  1:10 ` Tetsuo Handa
  2019-06-15  3:15 ` Shakeel Butt
  0 siblings, 2 replies; 19+ messages in thread
From: syzbot @ 2019-06-15  1:08 UTC (permalink / raw)
  To: akpm, ebiederm, guro, hannes, jglisse, linux-kernel, linux-mm,
	mhocko, penguin-kernel, shakeelb, syzkaller-bugs, yuzhoujian

Hello,

syzbot found the following crash on:

HEAD commit:    3f310e51 Add linux-next specific files for 20190607
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=15ab8771a00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=5d176e1849bbc45
dashboard link: https://syzkaller.appspot.com/bug?extid=d0fc9d3c166bc5e4a94b
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+d0fc9d3c166bc5e4a94b@syzkaller.appspotmail.com

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 28426 Comm: syz-executor.5 Not tainted 5.2.0-rc3-next-20190607  
#11
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
RIP: 0010:__read_once_size include/linux/compiler.h:194 [inline]
RIP: 0010:has_intersects_mems_allowed mm/oom_kill.c:84 [inline]
RIP: 0010:oom_unkillable_task mm/oom_kill.c:168 [inline]
RIP: 0010:oom_unkillable_task+0x180/0x400 mm/oom_kill.c:155
Code: c1 ea 03 80 3c 02 00 0f 85 80 02 00 00 4c 8b a3 10 07 00 00 48 b8 00  
00 00 00 00 fc ff df 4d 8d 74 24 10 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f  
85 67 02 00 00 49 8b 44 24 10 4c 8d a0 68 fa ff ff
RSP: 0018:ffff888000127490 EFLAGS: 00010a03
RAX: dffffc0000000000 RBX: ffff8880a4cd5438 RCX: ffffffff818dae9c
RDX: 100000000c3cc602 RSI: ffffffff818dac8d RDI: 0000000000000001
RBP: ffff8880001274d0 R08: ffff888000086180 R09: ffffed1015d26be0
R10: ffffed1015d26bdf R11: ffff8880ae935efb R12: 8000000061e63007
R13: 0000000000000000 R14: 8000000061e63017 R15: 1ffff11000024ea6
FS:  00005555561f5940(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000607304 CR3: 000000009237e000 CR4: 00000000001426f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Call Trace:
  oom_evaluate_task+0x49/0x520 mm/oom_kill.c:321
  mem_cgroup_scan_tasks+0xcc/0x180 mm/memcontrol.c:1169
  select_bad_process mm/oom_kill.c:374 [inline]
  out_of_memory mm/oom_kill.c:1088 [inline]
  out_of_memory+0x6b2/0x1280 mm/oom_kill.c:1035
  mem_cgroup_out_of_memory+0x1ca/0x230 mm/memcontrol.c:1573
  mem_cgroup_oom mm/memcontrol.c:1905 [inline]
  try_charge+0xfbe/0x1480 mm/memcontrol.c:2468
  mem_cgroup_try_charge+0x24d/0x5e0 mm/memcontrol.c:6073
  mem_cgroup_try_charge_delay+0x1f/0xa0 mm/memcontrol.c:6088
  do_huge_pmd_wp_page_fallback+0x24f/0x1680 mm/huge_memory.c:1201
  do_huge_pmd_wp_page+0x7fc/0x2160 mm/huge_memory.c:1359
  wp_huge_pmd mm/memory.c:3793 [inline]
  __handle_mm_fault+0x164c/0x3eb0 mm/memory.c:4006
  handle_mm_fault+0x3b7/0xa90 mm/memory.c:4053
  do_user_addr_fault arch/x86/mm/fault.c:1455 [inline]
  __do_page_fault+0x5ef/0xda0 arch/x86/mm/fault.c:1521
  do_page_fault+0x71/0x57d arch/x86/mm/fault.c:1552
  page_fault+0x1e/0x30 arch/x86/entry/entry_64.S:1156
RIP: 0033:0x400590
Code: 06 e9 49 01 00 00 48 8b 44 24 10 48 0b 44 24 28 75 1f 48 8b 14 24 48  
8b 7c 24 20 be 04 00 00 00 e8 f5 56 00 00 48 8b 74 24 08 <89> 06 e9 1e 01  
00 00 48 8b 44 24 08 48 8b 14 24 be 04 00 00 00 8b
RSP: 002b:00007fff7bc49780 EFLAGS: 00010206
RAX: 0000000000000001 RBX: 0000000000760000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000002000cffc RDI: 0000000000000001
RBP: fffffffffffffffe R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000075 R11: 0000000000000246 R12: 0000000000760008
R13: 00000000004c55f2 R14: 0000000000000000 R15: 00007fff7bc499b0
Modules linked in:
---[ end trace a65689219582ffff ]---
RIP: 0010:__read_once_size include/linux/compiler.h:194 [inline]
RIP: 0010:has_intersects_mems_allowed mm/oom_kill.c:84 [inline]
RIP: 0010:oom_unkillable_task mm/oom_kill.c:168 [inline]
RIP: 0010:oom_unkillable_task+0x180/0x400 mm/oom_kill.c:155
Code: c1 ea 03 80 3c 02 00 0f 85 80 02 00 00 4c 8b a3 10 07 00 00 48 b8 00  
00 00 00 00 fc ff df 4d 8d 74 24 10 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f  
85 67 02 00 00 49 8b 44 24 10 4c 8d a0 68 fa ff ff
RSP: 0018:ffff888000127490 EFLAGS: 00010a03
RAX: dffffc0000000000 RBX: ffff8880a4cd5438 RCX: ffffffff818dae9c
RDX: 100000000c3cc602 RSI: ffffffff818dac8d RDI: 0000000000000001
RBP: ffff8880001274d0 R08: ffff888000086180 R09: ffffed1015d26be0
R10: ffffed1015d26bdf R11: ffff8880ae935efb R12: 8000000061e63007
R13: 0000000000000000 R14: 8000000061e63017 R15: 1ffff11000024ea6
FS:  00005555561f5940(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2f823000 CR3: 000000009237e000 CR4: 00000000001426f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: general protection fault in oom_unkillable_task
  2019-06-15  1:08 general protection fault in oom_unkillable_task syzbot
@ 2019-06-15  1:10 ` Tetsuo Handa
  2019-06-15 15:59   ` Tetsuo Handa
  2019-06-15  3:15 ` Shakeel Butt
  1 sibling, 1 reply; 19+ messages in thread
From: Tetsuo Handa @ 2019-06-15  1:10 UTC (permalink / raw)
  To: syzbot, akpm, mhocko
  Cc: ebiederm, guro, hannes, jglisse, linux-kernel, linux-mm,
	shakeelb, syzkaller-bugs, yuzhoujian

I'm not sure this patch is correct/safe. Can you try a memcg OOM torture
test (with memcg group OOM killing enabled) with this patch applied?
----------------------------------------
From a436624c73d106fad9b880a6cef5abd83b2329a2 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Sat, 15 Jun 2019 10:06:00 +0900
Subject: [PATCH] memcg: Protect whole mem_cgroup_scan_tasks() using RCU.

syzbot is reporting a GPF at for_each_thread() from a memcg OOM event [1].
While select_bad_process() in a global OOM event traverses all threads
under RCU, select_bad_process() in a memcg OOM event traverses threads
without RCU, and I guess that this can result in following a bogus pointer.

Suppose a process containing three threads T1, T2, T3 is in a memcg.
T3 invokes the memcg OOM killer and starts traversing from T1. T3 elevates
the refcount on T1, but T3 is preempted before the oom_unkillable_task(T1)
check. Then, T1 reaches do_exit() and does list_del_rcu(&T1->thread_node).

  do_exit() {
    cgroup_exit() {
      css_set_move_task(tsk, cset, NULL, false);
    }
    exit_notify() {
      release_task() {
        __exit_signal() {
          __unhash_process() {
            list_del_rcu(&p->thread_node);
          }
        }
      }
    }
  }

Then, T2 also reaches do_exit() and does list_del_rcu(&T2->thread_node).
Since T3 kept T1's refcount elevated, T1 cannot be freed. But since T3 did
not elevate T2's refcount, T2 can complete do_exit() and is freed as soon
as an RCU grace period has elapsed. At this point, since T1 was removed
from the thread group before T2 was, T1's next-thread pointer still refers
to the already-freed T2. If the memory used for T2 is reallocated before
T3 resumes execution, accessing T1's next thread will not be reported as a
use-after-free, but the memory referenced as T1's next thread contains
bogus values.

Thus, I think the rule is: when traversing threads inside a section
between css_task_iter_start() and css_task_iter_end(), no thread may be
passed to e.g. for_each_thread() unless the whole section is protected by
RCU.
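
A minimal sketch of that rule (illustrative only; this mirrors
mem_cgroup_scan_tasks() with this patch applied, where fn() is a
callback that may walk a task's thread list the way oom_evaluate_task()
does via has_intersects_mems_allowed()):

	struct css_task_iter it;
	struct task_struct *task;

	rcu_read_lock();	/* covers for_each_thread() inside fn() */
	css_task_iter_start(&iter->css, 0, &it);
	while ((task = css_task_iter_next(&it)))
		fn(task, arg);	/* fn() may traverse task's threads */
	css_task_iter_end(&it);
	rcu_read_unlock();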

[1] https://syzkaller.appspot.com/bug?id=4559bc383e7c73a35bc6c8336557635459fb7a62

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+d0fc9d3c166bc5e4a94b@syzkaller.appspotmail.com>
---
 mm/memcontrol.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ba9138a..8e01f01 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1159,6 +1159,7 @@ int mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
 
 	BUG_ON(memcg == root_mem_cgroup);
 
+	rcu_read_lock();
 	for_each_mem_cgroup_tree(iter, memcg) {
 		struct css_task_iter it;
 		struct task_struct *task;
@@ -1172,6 +1173,7 @@ int mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
 			break;
 		}
 	}
+	rcu_read_unlock();
 	return ret;
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: general protection fault in oom_unkillable_task
  2019-06-15  1:08 general protection fault in oom_unkillable_task syzbot
  2019-06-15  1:10 ` Tetsuo Handa
@ 2019-06-15  3:15 ` Shakeel Butt
  2019-06-15 13:49   ` Michal Hocko
  1 sibling, 1 reply; 19+ messages in thread
From: Shakeel Butt @ 2019-06-15  3:15 UTC (permalink / raw)
  To: syzbot
  Cc: Andrew Morton, Eric W. Biederman, Roman Gushchin,
	Johannes Weiner, Jérôme Glisse, LKML, Linux MM,
	Michal Hocko, Tetsuo Handa, syzkaller-bugs, yuzhoujian

On Fri, Jun 14, 2019 at 6:08 PM syzbot
<syzbot+d0fc9d3c166bc5e4a94b@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:    3f310e51 Add linux-next specific files for 20190607
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=15ab8771a00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=5d176e1849bbc45
> dashboard link: https://syzkaller.appspot.com/bug?extid=d0fc9d3c166bc5e4a94b
> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
>
> Unfortunately, I don't have any reproducer for this crash yet.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+d0fc9d3c166bc5e4a94b@syzkaller.appspotmail.com
>
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault: 0000 [#1] PREEMPT SMP KASAN
> CPU: 0 PID: 28426 Comm: syz-executor.5 Not tainted 5.2.0-rc3-next-20190607
> #11
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:__read_once_size include/linux/compiler.h:194 [inline]
> RIP: 0010:has_intersects_mems_allowed mm/oom_kill.c:84 [inline]

It seems like oom_unkillable_task() is broken for memcg OOMs. It
should not be calling has_intersects_mems_allowed() for memcg OOMs.

> RIP: 0010:oom_unkillable_task mm/oom_kill.c:168 [inline]
> RIP: 0010:oom_unkillable_task+0x180/0x400 mm/oom_kill.c:155
> Code: c1 ea 03 80 3c 02 00 0f 85 80 02 00 00 4c 8b a3 10 07 00 00 48 b8 00
> 00 00 00 00 fc ff df 4d 8d 74 24 10 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f
> 85 67 02 00 00 49 8b 44 24 10 4c 8d a0 68 fa ff ff
> RSP: 0018:ffff888000127490 EFLAGS: 00010a03
> RAX: dffffc0000000000 RBX: ffff8880a4cd5438 RCX: ffffffff818dae9c
> RDX: 100000000c3cc602 RSI: ffffffff818dac8d RDI: 0000000000000001
> RBP: ffff8880001274d0 R08: ffff888000086180 R09: ffffed1015d26be0
> R10: ffffed1015d26bdf R11: ffff8880ae935efb R12: 8000000061e63007
> R13: 0000000000000000 R14: 8000000061e63017 R15: 1ffff11000024ea6
> FS:  00005555561f5940(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000607304 CR3: 000000009237e000 CR4: 00000000001426f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> Call Trace:
>   oom_evaluate_task+0x49/0x520 mm/oom_kill.c:321
>   mem_cgroup_scan_tasks+0xcc/0x180 mm/memcontrol.c:1169
>   select_bad_process mm/oom_kill.c:374 [inline]
>   out_of_memory mm/oom_kill.c:1088 [inline]
>   out_of_memory+0x6b2/0x1280 mm/oom_kill.c:1035
>   mem_cgroup_out_of_memory+0x1ca/0x230 mm/memcontrol.c:1573
>   mem_cgroup_oom mm/memcontrol.c:1905 [inline]
>   try_charge+0xfbe/0x1480 mm/memcontrol.c:2468
>   mem_cgroup_try_charge+0x24d/0x5e0 mm/memcontrol.c:6073
>   mem_cgroup_try_charge_delay+0x1f/0xa0 mm/memcontrol.c:6088
>   do_huge_pmd_wp_page_fallback+0x24f/0x1680 mm/huge_memory.c:1201
>   do_huge_pmd_wp_page+0x7fc/0x2160 mm/huge_memory.c:1359
>   wp_huge_pmd mm/memory.c:3793 [inline]
>   __handle_mm_fault+0x164c/0x3eb0 mm/memory.c:4006
>   handle_mm_fault+0x3b7/0xa90 mm/memory.c:4053
>   do_user_addr_fault arch/x86/mm/fault.c:1455 [inline]
>   __do_page_fault+0x5ef/0xda0 arch/x86/mm/fault.c:1521
>   do_page_fault+0x71/0x57d arch/x86/mm/fault.c:1552
>   page_fault+0x1e/0x30 arch/x86/entry/entry_64.S:1156
> RIP: 0033:0x400590
> Code: 06 e9 49 01 00 00 48 8b 44 24 10 48 0b 44 24 28 75 1f 48 8b 14 24 48
> 8b 7c 24 20 be 04 00 00 00 e8 f5 56 00 00 48 8b 74 24 08 <89> 06 e9 1e 01
> 00 00 48 8b 44 24 08 48 8b 14 24 be 04 00 00 00 8b
> RSP: 002b:00007fff7bc49780 EFLAGS: 00010206
> RAX: 0000000000000001 RBX: 0000000000760000 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 000000002000cffc RDI: 0000000000000001
> RBP: fffffffffffffffe R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000075 R11: 0000000000000246 R12: 0000000000760008
> R13: 00000000004c55f2 R14: 0000000000000000 R15: 00007fff7bc499b0
> Modules linked in:
> ---[ end trace a65689219582ffff ]---
> RIP: 0010:__read_once_size include/linux/compiler.h:194 [inline]
> RIP: 0010:has_intersects_mems_allowed mm/oom_kill.c:84 [inline]
> RIP: 0010:oom_unkillable_task mm/oom_kill.c:168 [inline]
> RIP: 0010:oom_unkillable_task+0x180/0x400 mm/oom_kill.c:155
> Code: c1 ea 03 80 3c 02 00 0f 85 80 02 00 00 4c 8b a3 10 07 00 00 48 b8 00
> 00 00 00 00 fc ff df 4d 8d 74 24 10 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f
> 85 67 02 00 00 49 8b 44 24 10 4c 8d a0 68 fa ff ff
> RSP: 0018:ffff888000127490 EFLAGS: 00010a03
> RAX: dffffc0000000000 RBX: ffff8880a4cd5438 RCX: ffffffff818dae9c
> RDX: 100000000c3cc602 RSI: ffffffff818dac8d RDI: 0000000000000001
> RBP: ffff8880001274d0 R08: ffff888000086180 R09: ffffed1015d26be0
> R10: ffffed1015d26bdf R11: ffff8880ae935efb R12: 8000000061e63007
> R13: 0000000000000000 R14: 8000000061e63017 R15: 1ffff11000024ea6
> FS:  00005555561f5940(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000001b2f823000 CR3: 000000009237e000 CR4: 00000000001426f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
>
>
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: general protection fault in oom_unkillable_task
  2019-06-15  3:15 ` Shakeel Butt
@ 2019-06-15 13:49   ` Michal Hocko
  2019-06-15 16:11     ` Shakeel Butt
  2019-06-16  5:48     ` Hillf Danton
  0 siblings, 2 replies; 19+ messages in thread
From: Michal Hocko @ 2019-06-15 13:49 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: syzbot, Andrew Morton, Eric W. Biederman, Roman Gushchin,
	Johannes Weiner, Jérôme Glisse, LKML, Linux MM,
	Tetsuo Handa, syzkaller-bugs, yuzhoujian

On Fri 14-06-19 20:15:31, Shakeel Butt wrote:
> On Fri, Jun 14, 2019 at 6:08 PM syzbot
> <syzbot+d0fc9d3c166bc5e4a94b@syzkaller.appspotmail.com> wrote:
> >
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit:    3f310e51 Add linux-next specific files for 20190607
> > git tree:       linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=15ab8771a00000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=5d176e1849bbc45
> > dashboard link: https://syzkaller.appspot.com/bug?extid=d0fc9d3c166bc5e4a94b
> > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> >
> > Unfortunately, I don't have any reproducer for this crash yet.
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+d0fc9d3c166bc5e4a94b@syzkaller.appspotmail.com
> >
> > kasan: CONFIG_KASAN_INLINE enabled
> > kasan: GPF could be caused by NULL-ptr deref or user memory access
> > general protection fault: 0000 [#1] PREEMPT SMP KASAN
> > CPU: 0 PID: 28426 Comm: syz-executor.5 Not tainted 5.2.0-rc3-next-20190607
> > #11
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> > RIP: 0010:__read_once_size include/linux/compiler.h:194 [inline]
> > RIP: 0010:has_intersects_mems_allowed mm/oom_kill.c:84 [inline]
> 
> It seems like oom_unkillable_task() is broken for memcg OOMs. It
> should not be calling has_intersects_mems_allowed() for memcg OOMs.

You are right. It doesn't really make much sense to check for the NUMA
policy/cpusets when the memcg oom is NUMA agnostic. Now that I am
looking at the code, I am really wondering why we even call
oom_unkillable_task from oom_badness. proc_oom_score shouldn't care
about NUMA either.

In other words, the following should fix this unless I am missing
something (task_in_mem_cgroup seems to be a relic from before the group
oom handling). But please note that I am still not fully operational and
lying in bed.

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 5a58778c91d4..43eb479a5dc7 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -161,8 +161,8 @@ static bool oom_unkillable_task(struct task_struct *p,
 		return true;
 
 	/* When mem_cgroup_out_of_memory() and p is not member of the group */
-	if (memcg && !task_in_mem_cgroup(p, memcg))
-		return true;
+	if (memcg)
+		return false;
 
 	/* p may not have freeable memory in nodemask */
 	if (!has_intersects_mems_allowed(p, nodemask))
@@ -318,7 +318,7 @@ static int oom_evaluate_task(struct task_struct *task, void *arg)
 	struct oom_control *oc = arg;
 	unsigned long points;
 
-	if (oom_unkillable_task(task, NULL, oc->nodemask))
+	if (oom_unkillable_task(task, oc->memcg, oc->nodemask))
 		goto next;
 
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: general protection fault in oom_unkillable_task
  2019-06-15  1:10 ` Tetsuo Handa
@ 2019-06-15 15:59   ` Tetsuo Handa
  0 siblings, 0 replies; 19+ messages in thread
From: Tetsuo Handa @ 2019-06-15 15:59 UTC (permalink / raw)
  To: syzbot, akpm, mhocko
  Cc: ebiederm, guro, hannes, jglisse, linux-kernel, linux-mm,
	shakeelb, syzkaller-bugs, yuzhoujian

On 2019/06/15 10:10, Tetsuo Handa wrote:
> I'm not sure this patch is correct/safe. Can you try a memcg OOM torture
> test (with memcg group OOM killing enabled) with this patch applied?

Well, I guess this patch was wrong. The ordering of removing threads
does not matter as long as we start traversing via signal_struct.
The reason why it crashed at for_each_thread() is still unknown...


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: general protection fault in oom_unkillable_task
  2019-06-15 13:49   ` Michal Hocko
@ 2019-06-15 16:11     ` Shakeel Butt
  2019-06-15 16:48       ` Tetsuo Handa
  2019-06-17  6:33       ` Michal Hocko
  2019-06-16  5:48     ` Hillf Danton
  1 sibling, 2 replies; 19+ messages in thread
From: Shakeel Butt @ 2019-06-15 16:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: syzbot, Andrew Morton, Eric W. Biederman, Roman Gushchin,
	Johannes Weiner, Jérôme Glisse, LKML, Linux MM,
	Tetsuo Handa, syzkaller-bugs, yuzhoujian

On Sat, Jun 15, 2019 at 6:50 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Fri 14-06-19 20:15:31, Shakeel Butt wrote:
> > On Fri, Jun 14, 2019 at 6:08 PM syzbot
> > <syzbot+d0fc9d3c166bc5e4a94b@syzkaller.appspotmail.com> wrote:
> > >
> > > Hello,
> > >
> > > syzbot found the following crash on:
> > >
> > > HEAD commit:    3f310e51 Add linux-next specific files for 20190607
> > > git tree:       linux-next
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=15ab8771a00000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=5d176e1849bbc45
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=d0fc9d3c166bc5e4a94b
> > > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> > >
> > > Unfortunately, I don't have any reproducer for this crash yet.
> > >
> > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > Reported-by: syzbot+d0fc9d3c166bc5e4a94b@syzkaller.appspotmail.com
> > >
> > > kasan: CONFIG_KASAN_INLINE enabled
> > > kasan: GPF could be caused by NULL-ptr deref or user memory access
> > > general protection fault: 0000 [#1] PREEMPT SMP KASAN
> > > CPU: 0 PID: 28426 Comm: syz-executor.5 Not tainted 5.2.0-rc3-next-20190607
> > > #11
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > > Google 01/01/2011
> > > RIP: 0010:__read_once_size include/linux/compiler.h:194 [inline]
> > > RIP: 0010:has_intersects_mems_allowed mm/oom_kill.c:84 [inline]
> >
> > It seems like oom_unkillable_task() is broken for memcg OOMs. It
> > should not be calling has_intersects_mems_allowed() for memcg OOMs.
>
> You are right. It doesn't really make much sense to check for the NUMA
> policy/cpusets when the memcg oom is NUMA agnostic. Now that I am
> looking at the code, I am really wondering why we even call
> oom_unkillable_task from oom_badness. proc_oom_score shouldn't care
> about NUMA either.
>
> In other words, the following should fix this unless I am missing
> something (task_in_mem_cgroup seems to be a relic from before the group
> oom handling). But please note that I am still not fully operational and
> lying in bed.
>

Yes, we need something like this but not exactly.

> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 5a58778c91d4..43eb479a5dc7 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -161,8 +161,8 @@ static bool oom_unkillable_task(struct task_struct *p,
>                 return true;
>
>         /* When mem_cgroup_out_of_memory() and p is not member of the group */
> -       if (memcg && !task_in_mem_cgroup(p, memcg))
> -               return true;
> +       if (memcg)
> +               return false;

This will break the dump_tasks() usage of oom_unkillable_task(). We
can change dump_tasks() to traverse processes like
mem_cgroup_scan_tasks() for memcg OOMs.
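
A hedged sketch of that direction (is_memcg_oom() is an existing helper
in mm/oom_kill.c; dump_task_line() is a hypothetical per-task printer
matching the int (*)(struct task_struct *, void *) callback signature):

	if (is_memcg_oom(oc)) {
		/* targeted scan: only tasks in the OOMing memcg */
		mem_cgroup_scan_tasks(oc->memcg, dump_task_line, oc);
	} else {
		struct task_struct *p;

		rcu_read_lock();
		for_each_process(p)
			dump_task_line(p, oc);
		rcu_read_unlock();
	}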

>
>         /* p may not have freeable memory in nodemask */
>         if (!has_intersects_mems_allowed(p, nodemask))
> @@ -318,7 +318,7 @@ static int oom_evaluate_task(struct task_struct *task, void *arg)
>         struct oom_control *oc = arg;
>         unsigned long points;
>
> -       if (oom_unkillable_task(task, NULL, oc->nodemask))
> +       if (oom_unkillable_task(task, oc->memcg, oc->nodemask))
>                 goto next;
>
> --
> Michal Hocko
> SUSE Labs


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: general protection fault in oom_unkillable_task
  2019-06-15 16:11     ` Shakeel Butt
@ 2019-06-15 16:48       ` Tetsuo Handa
  2019-06-15 18:50         ` Shakeel Butt
  2019-06-17  6:33       ` Michal Hocko
  1 sibling, 1 reply; 19+ messages in thread
From: Tetsuo Handa @ 2019-06-15 16:48 UTC (permalink / raw)
  To: Shakeel Butt, Michal Hocko
  Cc: syzbot, Andrew Morton, Eric W. Biederman, Roman Gushchin,
	Johannes Weiner, Jérôme Glisse, LKML, Linux MM,
	syzkaller-bugs, yuzhoujian

On 2019/06/16 1:11, Shakeel Butt wrote:
> On Sat, Jun 15, 2019 at 6:50 AM Michal Hocko <mhocko@kernel.org> wrote:
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index 5a58778c91d4..43eb479a5dc7 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -161,8 +161,8 @@ static bool oom_unkillable_task(struct task_struct *p,
>>                 return true;
>>
>>         /* When mem_cgroup_out_of_memory() and p is not member of the group */
>> -       if (memcg && !task_in_mem_cgroup(p, memcg))
>> -               return true;
>> +       if (memcg)
>> +               return false;
> 
> This will break the dump_tasks() usage of oom_unkillable_task(). We
> can change dump_tasks() to traverse processes like
> mem_cgroup_scan_tasks() for memcg OOMs.

While dump_tasks() traverses only each thread group, mem_cgroup_scan_tasks()
traverses each thread. To avoid printk()ing all threads in a thread group,
moving that check to

	if (memcg && !task_in_mem_cgroup(p, memcg))
		continue;

in dump_tasks() is better?
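
A hedged sketch of that placement (the loop shape follows dump_tasks()
in mm/oom_kill.c of this era; only the relevant checks are shown):

	rcu_read_lock();
	for_each_process(p) {
		/* memcg membership checked here, not in
		 * oom_unkillable_task() */
		if (memcg && !task_in_mem_cgroup(p, memcg))
			continue;
		if (oom_unkillable_task(p, NULL, nodemask))
			continue;
		/* ... print one line for this thread group ... */
	}
	rcu_read_unlock();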

> 
>>
>>         /* p may not have freeable memory in nodemask */
>>         if (!has_intersects_mems_allowed(p, nodemask))


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: general protection fault in oom_unkillable_task
  2019-06-15 16:48       ` Tetsuo Handa
@ 2019-06-15 18:50         ` Shakeel Butt
  2019-06-15 21:33           ` Tetsuo Handa
  0 siblings, 1 reply; 19+ messages in thread
From: Shakeel Butt @ 2019-06-15 18:50 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Michal Hocko, syzbot, Andrew Morton, Eric W. Biederman,
	Roman Gushchin, Johannes Weiner, Jérôme Glisse, LKML,
	Linux MM, syzkaller-bugs, yuzhoujian

On Sat, Jun 15, 2019 at 9:49 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2019/06/16 1:11, Shakeel Butt wrote:
> > On Sat, Jun 15, 2019 at 6:50 AM Michal Hocko <mhocko@kernel.org> wrote:
> >> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> >> index 5a58778c91d4..43eb479a5dc7 100644
> >> --- a/mm/oom_kill.c
> >> +++ b/mm/oom_kill.c
> >> @@ -161,8 +161,8 @@ static bool oom_unkillable_task(struct task_struct *p,
> >>                 return true;
> >>
> >>         /* When mem_cgroup_out_of_memory() and p is not member of the group */
> >> -       if (memcg && !task_in_mem_cgroup(p, memcg))
> >> -               return true;
> >> +       if (memcg)
> >> +               return false;
> >
> > This will break the dump_tasks() usage of oom_unkillable_task(). We
> > can change dump_tasks() to traverse processes like
> > mem_cgroup_scan_tasks() for memcg OOMs.
>
> While dump_tasks() traverses only each thread group, mem_cgroup_scan_tasks()
> traverses each thread.

I think mem_cgroup_scan_tasks() traversing threads is not intentional
and css_task_iter_start in it should use CSS_TASK_ITER_PROCS as the
oom killer only cares about the processes or more specifically
mm_struct (though two different thread groups can have same mm_struct
but that is fine).

> To avoid printk()ing all threads in a thread group,
> moving that check to
>
>         if (memcg && !task_in_mem_cgroup(p, memcg))
>                 continue;
>
> in dump_tasks() is better?
>
> >
> >>
> >>         /* p may not have freeable memory in nodemask */
> >>         if (!has_intersects_mems_allowed(p, nodemask))
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: general protection fault in oom_unkillable_task
  2019-06-15 18:50         ` Shakeel Butt
@ 2019-06-15 21:33           ` Tetsuo Handa
  2019-06-16  7:37             ` Tetsuo Handa
  0 siblings, 1 reply; 19+ messages in thread
From: Tetsuo Handa @ 2019-06-15 21:33 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Michal Hocko, syzbot, Andrew Morton, Eric W. Biederman,
	Roman Gushchin, Johannes Weiner, Jérôme Glisse, LKML,
	Linux MM, syzkaller-bugs, yuzhoujian

On 2019/06/16 3:50, Shakeel Butt wrote:
>> While dump_tasks() traverses only each thread group, mem_cgroup_scan_tasks()
>> traverses each thread.
> 
> I think mem_cgroup_scan_tasks() traversing threads is not intentional
> and css_task_iter_start in it should use CSS_TASK_ITER_PROCS as the
> oom killer only cares about the processes or more specifically
> mm_struct (though two different thread groups can have same mm_struct
> but that is fine).

We can't use CSS_TASK_ITER_PROCS from mem_cgroup_scan_tasks(). I've tried
CSS_TASK_ITER_PROCS in an attempt to evaluate only one thread from each
thread group, but I found that CSS_TASK_ITER_PROCS causes skipping whole
threads in a thread group (and trivially allowing "Out of memory and no
killable processes...\n" flood) if thread group leader has already exited.

If we can agree on using a flag in mm_struct to track whether each
mm_struct was already evaluated during each out_of_memory() call, we can
printk() only one thread from all the thread groups sharing that mm_struct...
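
A hedged sketch of that idea (MMF_OOM_EVALUATED is a hypothetical flag
bit and the wrapper name is made up; a real version would also need to
clear the bit once the scan finishes, and use find_lock_task_mm() for
tasks whose ->mm is already NULL):

	static int evaluate_task_once(struct task_struct *task, void *arg)
	{
		struct mm_struct *mm = task->mm;

		/* skip tasks without an mm or whose mm was already scored */
		if (!mm || test_and_set_bit(MMF_OOM_EVALUATED, &mm->flags))
			return 0;
		return oom_evaluate_task(task, arg);
	}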


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: general protection fault in oom_unkillable_task
  2019-06-15 13:49   ` Michal Hocko
  2019-06-15 16:11     ` Shakeel Butt
@ 2019-06-16  5:48     ` Hillf Danton
  1 sibling, 0 replies; 19+ messages in thread
From: Hillf Danton @ 2019-06-16  5:48 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Shakeel Butt, syzbot, Andrew Morton, Eric W. Biederman,
	Roman Gushchin, Johannes Weiner, jglisse, LKML, Linux MM,
	Tetsuo Handa, syzkaller-bugs, yuzhoujian


Hello Michal

On Sat, 15 Jun 2019 13:49:57 +0000 (UTC) Michal Hocko wrote:
> On Fri 14-06-19 20:15:31, Shakeel Butt wrote:
> > On Fri, Jun 14, 2019 at 6:08 PM syzbot
> > <syzbot+d0fc9d3c166bc5e4a94b@syzkaller.appspotmail.com> wrote:
> > >
> > > Hello,
> > >
> > > syzbot found the following crash on:
> > >
> > > HEAD commit:    3f310e51 Add linux-next specific files for 20190607
> > > git tree:       linux-next
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=15ab8771a00000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=5d176e1849bbc45
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=d0fc9d3c166bc5e4a94b
> > > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> > >
> > > Unfortunately, I don't have any reproducer for this crash yet.
> > >
> > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > Reported-by: syzbot+d0fc9d3c166bc5e4a94b@syzkaller.appspotmail.com
> > >
> > > kasan: CONFIG_KASAN_INLINE enabled
> > > kasan: GPF could be caused by NULL-ptr deref or user memory access
> > > general protection fault: 0000 [#1] PREEMPT SMP KASAN
> > > CPU: 0 PID: 28426 Comm: syz-executor.5 Not tainted 5.2.0-rc3-next-20190607
> > > #11
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > > Google 01/01/2011
> > > RIP: 0010:__read_once_size include/linux/compiler.h:194 [inline]
> > > RIP: 0010:has_intersects_mems_allowed mm/oom_kill.c:84 [inline]
> > 
> > It seems like oom_unkillable_task() is broken for memcg OOMs. It
> > should not be calling has_intersects_mems_allowed() for memcg OOMs.
> 
> You are right. It doesn't really make much sense to check for the NUMA
> policy/cpusets when the memcg oom is NUMA agnostic. Now that I am
> looking at the code, I am really wondering why we even call
> oom_unkillable_task from oom_badness. proc_oom_score shouldn't care
> about NUMA either.
> 
> In other words, the following should fix this unless I am missing
> something (task_in_mem_cgroup seems to be a relic from before the group
> oom handling). But please note that I am still not fully operational and
> lying in bed.
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 5a58778c91d4..43eb479a5dc7 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -161,8 +161,8 @@ static bool oom_unkillable_task(struct task_struct *p,
>  		return true;
>  
>  	/* When mem_cgroup_out_of_memory() and p is not member of the group */
> -	if (memcg && !task_in_mem_cgroup(p, memcg))
> -		return true;
> +	if (memcg)
> +		return false;
>
Given the members of the memcg:
1> tasks with PF_EXITING set in their flags.
2> tasks without a memory footprint on NUMA nodes A-B.
3> tasks with a memory footprint on NUMA nodes A-B-C.

IMO we should try hard to avoid killing tasks of kinds 1> and 2> to satisfy
the current memory allocation that only wants pages from node-A.

--
Hillf
>  	/* p may not have freeable memory in nodemask */
>  	if (!has_intersects_mems_allowed(p, nodemask))
> @@ -318,7 +318,7 @@ static int oom_evaluate_task(struct task_struct *task, void *arg)
>  	struct oom_control *oc = arg;
>  	unsigned long points;
>  
> -	if (oom_unkillable_task(task, NULL, oc->nodemask))
> +	if (oom_unkillable_task(task, oc->memcg, oc->nodemask))
>  		goto next;
>  
> -- 
> Michal Hocko
> SUSE Labs
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: general protection fault in oom_unkillable_task
  2019-06-15 21:33           ` Tetsuo Handa
@ 2019-06-16  7:37             ` Tetsuo Handa
  2019-06-16 15:13               ` Tetsuo Handa
  0 siblings, 1 reply; 19+ messages in thread
From: Tetsuo Handa @ 2019-06-16  7:37 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Michal Hocko, syzbot, Andrew Morton, Eric W. Biederman,
	Roman Gushchin, Johannes Weiner, Jérôme Glisse, LKML,
	Linux MM, syzkaller-bugs, yuzhoujian

On 2019/06/16 6:33, Tetsuo Handa wrote:
> On 2019/06/16 3:50, Shakeel Butt wrote:
>>> While dump_tasks() traverses only each thread group, mem_cgroup_scan_tasks()
>>> traverses each thread.
>>
>> I think mem_cgroup_scan_tasks() traversing threads is not intentional
>> and css_task_iter_start in it should use CSS_TASK_ITER_PROCS as the
>> oom killer only cares about the processes or more specifically
>> mm_struct (though two different thread groups can have same mm_struct
>> but that is fine).
> 
> We can't use CSS_TASK_ITER_PROCS from mem_cgroup_scan_tasks(). I've tried
> CSS_TASK_ITER_PROCS in an attempt to evaluate only one thread from each
> thread group, but I found that CSS_TASK_ITER_PROCS causes skipping whole
> threads in a thread group (and trivially allowing "Out of memory and no
> killable processes...\n" flood) if thread group leader has already exited.

Seems that CSS_TASK_ITER_PROCS from mem_cgroup_scan_tasks() is now working.
Maybe I was confused because I was testing without commit 7775face207922ea
("memcg: killed threads should not invoke memcg OOM killer"). We can scan
one thread from each thread group, and remove the

	/* Prefer thread group leaders for display purposes */
	if (points == oc->chosen_points && thread_group_leader(oc->chosen))
		goto next;

check.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: general protection fault in oom_unkillable_task
  2019-06-16  7:37             ` Tetsuo Handa
@ 2019-06-16 15:13               ` Tetsuo Handa
  2019-06-17  6:31                 ` Michal Hocko
  2019-06-17 13:23                 ` Shakeel Butt
  0 siblings, 2 replies; 19+ messages in thread
From: Tetsuo Handa @ 2019-06-16 15:13 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Michal Hocko, syzbot, Andrew Morton, Eric W. Biederman,
	Roman Gushchin, Johannes Weiner, Jérôme Glisse, LKML,
	Linux MM, syzkaller-bugs, yuzhoujian

On 2019/06/16 16:37, Tetsuo Handa wrote:
> On 2019/06/16 6:33, Tetsuo Handa wrote:
>> On 2019/06/16 3:50, Shakeel Butt wrote:
>>>> While dump_tasks() traverses only each thread group, mem_cgroup_scan_tasks()
>>>> traverses each thread.
>>>
>>> I think mem_cgroup_scan_tasks() traversing threads is not intentional
>>> and css_task_iter_start in it should use CSS_TASK_ITER_PROCS as the
>>> oom killer only cares about the processes or more specifically
>>> mm_struct (though two different thread groups can have same mm_struct
>>> but that is fine).
>>
>> We can't use CSS_TASK_ITER_PROCS from mem_cgroup_scan_tasks(). I've tried
>> CSS_TASK_ITER_PROCS in an attempt to evaluate only one thread from each
>> thread group, but I found that CSS_TASK_ITER_PROCS causes skipping whole
>> threads in a thread group (and trivially allowing "Out of memory and no
>> killable processes...\n" flood) if thread group leader has already exited.
> 
> Seems that CSS_TASK_ITER_PROCS from mem_cgroup_scan_tasks() is now working.


I found a reproducer and the commit.

----------------------------------------
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sched.h>
#include <sys/mman.h>
#include <asm/unistd.h>

static const unsigned long size = 1048576 * 200;
static int thread(void *unused)
{
        int fd = open("/dev/zero", O_RDONLY);
        char *buf = mmap(NULL, size, PROT_WRITE | PROT_READ,
                         MAP_ANONYMOUS | MAP_SHARED, EOF, 0);
        sleep(1);
        read(fd, buf, size);
        return syscall(__NR_exit, 0);
}
int main(int argc, char *argv[])
{
        FILE *fp;
        mkdir("/sys/fs/cgroup/memory/test1", 0755);
        fp = fopen("/sys/fs/cgroup/memory/test1/memory.limit_in_bytes", "w");
        fprintf(fp, "%lu\n", size);
        fclose(fp);
        fp = fopen("/sys/fs/cgroup/memory/test1/tasks", "w");
        fprintf(fp, "%u\n", getpid());
        fclose(fp);
        clone(thread, malloc(8192) + 4096, CLONE_SIGHAND | CLONE_THREAD | CLONE_VM, NULL);
        return syscall(__NR_exit, 0);
}
----------------------------------------
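
(To run the reproducer: it assumes the cgroup v1 memory controller is
mounted at /sys/fs/cgroup/memory and needs root to create the "test1"
group; the clone()d thread then faults 200MB of shared anonymous memory
against the 200MB limit, triggering a memcg OOM after the thread group
leader has already exited.)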

Here is a patch to use CSS_TASK_ITER_PROCS.

From 415e52cf55bc4ad931e4f005421b827f0b02693d Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Mon, 17 Jun 2019 00:09:38 +0900
Subject: [PATCH] mm: memcontrol: Use CSS_TASK_ITER_PROCS at mem_cgroup_scan_tasks().

Since commit c03cd7738a83b137 ("cgroup: Include dying leaders with live
threads in PROCS iterations") corrected how CSS_TASK_ITER_PROCS works,
mem_cgroup_scan_tasks() can use CSS_TASK_ITER_PROCS in order to check
only one thread from each thread group.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 mm/memcontrol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ba9138a..b09ff45 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1163,7 +1163,7 @@ int mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
 		struct css_task_iter it;
 		struct task_struct *task;
 
-		css_task_iter_start(&iter->css, 0, &it);
+		css_task_iter_start(&iter->css, CSS_TASK_ITER_PROCS, &it);
 		while (!ret && (task = css_task_iter_next(&it)))
 			ret = fn(task, arg);
 		css_task_iter_end(&it);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: general protection fault in oom_unkillable_task
  2019-06-16 15:13               ` Tetsuo Handa
@ 2019-06-17  6:31                 ` Michal Hocko
  2019-06-17 13:23                 ` Shakeel Butt
  1 sibling, 0 replies; 19+ messages in thread
From: Michal Hocko @ 2019-06-17  6:31 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Shakeel Butt, syzbot, Andrew Morton, Eric W. Biederman,
	Roman Gushchin, Johannes Weiner, Jérôme Glisse, LKML,
	Linux MM, syzkaller-bugs, yuzhoujian

On Mon 17-06-19 00:13:47, Tetsuo Handa wrote:
[...]
> >From 415e52cf55bc4ad931e4f005421b827f0b02693d Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Mon, 17 Jun 2019 00:09:38 +0900
> Subject: [PATCH] mm: memcontrol: Use CSS_TASK_ITER_PROCS at mem_cgroup_scan_tasks().
> 
> Since commit c03cd7738a83b137 ("cgroup: Include dying leaders with live
> threads in PROCS iterations") corrected how CSS_TASK_ITER_PROCS works,
> mem_cgroup_scan_tasks() can use CSS_TASK_ITER_PROCS in order to check
> only one thread from each thread group.

O(Threads#) is definitely much worse than O(proc#)

> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
>  mm/memcontrol.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index ba9138a..b09ff45 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1163,7 +1163,7 @@ int mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
>  		struct css_task_iter it;
>  		struct task_struct *task;
>  
> -		css_task_iter_start(&iter->css, 0, &it);
> +		css_task_iter_start(&iter->css, CSS_TASK_ITER_PROCS, &it);
>  		while (!ret && (task = css_task_iter_next(&it)))
>  			ret = fn(task, arg);
>  		css_task_iter_end(&it);
> -- 
> 1.8.3.1

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: general protection fault in oom_unkillable_task
  2019-06-15 16:11     ` Shakeel Butt
  2019-06-15 16:48       ` Tetsuo Handa
@ 2019-06-17  6:33       ` Michal Hocko
  2019-06-17  9:56         ` Tetsuo Handa
  1 sibling, 1 reply; 19+ messages in thread
From: Michal Hocko @ 2019-06-17  6:33 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: syzbot, Andrew Morton, Eric W. Biederman, Roman Gushchin,
	Johannes Weiner, Jérôme Glisse, LKML, Linux MM,
	Tetsuo Handa, syzkaller-bugs, yuzhoujian

On Sat 15-06-19 09:11:37, Shakeel Butt wrote:
> On Sat, Jun 15, 2019 at 6:50 AM Michal Hocko <mhocko@kernel.org> wrote:
[...]
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 5a58778c91d4..43eb479a5dc7 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -161,8 +161,8 @@ static bool oom_unkillable_task(struct task_struct *p,
> >                 return true;
> >
> >         /* When mem_cgroup_out_of_memory() and p is not member of the group */
> > -       if (memcg && !task_in_mem_cgroup(p, memcg))
> > -               return true;
> > +       if (memcg)
> > +               return false;
> 
> This will break the dump_tasks() usage of oom_unkillable_task(). We
> can change dump_tasks() to traverse processes like
> mem_cgroup_scan_tasks() for memcg OOMs.

Right you are. Doing a similar trick to the oom victim selection is
indeed better. We should really strive not to do a global process
iteration when we can do a targeted scan. Care to send a patch?

Thanks!
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: general protection fault in oom_unkillable_task
  2019-06-17  6:33       ` Michal Hocko
@ 2019-06-17  9:56         ` Tetsuo Handa
  2019-06-17 10:13           ` Michal Hocko
  0 siblings, 1 reply; 19+ messages in thread
From: Tetsuo Handa @ 2019-06-17  9:56 UTC (permalink / raw)
  To: Michal Hocko, Shakeel Butt
  Cc: syzbot, Andrew Morton, Eric W. Biederman, Roman Gushchin,
	Johannes Weiner, Jérôme Glisse, LKML, Linux MM,
	syzkaller-bugs, yuzhoujian

On 2019/06/17 15:33, Michal Hocko wrote:
> On Sat 15-06-19 09:11:37, Shakeel Butt wrote:
>> On Sat, Jun 15, 2019 at 6:50 AM Michal Hocko <mhocko@kernel.org> wrote:
> [...]
>>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>>> index 5a58778c91d4..43eb479a5dc7 100644
>>> --- a/mm/oom_kill.c
>>> +++ b/mm/oom_kill.c
>>> @@ -161,8 +161,8 @@ static bool oom_unkillable_task(struct task_struct *p,
>>>                 return true;
>>>
>>>         /* When mem_cgroup_out_of_memory() and p is not member of the group */
>>> -       if (memcg && !task_in_mem_cgroup(p, memcg))
>>> -               return true;
>>> +       if (memcg)
>>> +               return false;
>>
>> This will break the dump_tasks() usage of oom_unkillable_task(). We
>> can change dump_tasks() to traverse processes like
>> mem_cgroup_scan_tasks() for memcg OOMs.
> 
> Right you are. Doing a similar trick to the oom victim selection is
> indeed better. We should really strive to not doing a global process
> iteration when we can do a targeted scan. Care to send a patch?

I posted a patch that (as a side effect) avoids calling oom_unkillable_task() from dump_tasks() at
https://lore.kernel.org/linux-mm/1558519686-16057-2-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp/ .
What do you think?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: general protection fault in oom_unkillable_task
  2019-06-17  9:56         ` Tetsuo Handa
@ 2019-06-17 10:13           ` Michal Hocko
  0 siblings, 0 replies; 19+ messages in thread
From: Michal Hocko @ 2019-06-17 10:13 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Shakeel Butt, syzbot, Andrew Morton, Eric W. Biederman,
	Roman Gushchin, Johannes Weiner, Jérôme Glisse, LKML,
	Linux MM, syzkaller-bugs, yuzhoujian

On Mon 17-06-19 18:56:47, Tetsuo Handa wrote:
> On 2019/06/17 15:33, Michal Hocko wrote:
> > On Sat 15-06-19 09:11:37, Shakeel Butt wrote:
> >> On Sat, Jun 15, 2019 at 6:50 AM Michal Hocko <mhocko@kernel.org> wrote:
> > [...]
> >>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> >>> index 5a58778c91d4..43eb479a5dc7 100644
> >>> --- a/mm/oom_kill.c
> >>> +++ b/mm/oom_kill.c
> >>> @@ -161,8 +161,8 @@ static bool oom_unkillable_task(struct task_struct *p,
> >>>                 return true;
> >>>
> >>>         /* When mem_cgroup_out_of_memory() and p is not member of the group */
> >>> -       if (memcg && !task_in_mem_cgroup(p, memcg))
> >>> -               return true;
> >>> +       if (memcg)
> >>> +               return false;
> >>
> >> This will break the dump_tasks() usage of oom_unkillable_task(). We
> >> can change dump_tasks() to traverse processes like
> >> mem_cgroup_scan_tasks() for memcg OOMs.
> > 
> > Right you are. Doing a similar trick to the oom victim selection is
> > indeed better. We should really strive to not doing a global process
> > iteration when we can do a targeted scan. Care to send a patch?
> 
> I posted a patch that (as a side effect) avoids calling oom_unkillable_task() from dump_tasks() at
> https://lore.kernel.org/linux-mm/1558519686-16057-2-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp/ .
> What do you think?

I am sorry but I didn't get to look at this series yet. Anyway, changing
dump_tasks to use a cgroup iterator for the memcg oom sounds like a
straightforward thing to do without many more changes around it.
Global task list iteration under a single RCU is a more complex problem
that is not limited to the OOM path.

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: general protection fault in oom_unkillable_task
  2019-06-16 15:13               ` Tetsuo Handa
  2019-06-17  6:31                 ` Michal Hocko
@ 2019-06-17 13:23                 ` Shakeel Butt
  2019-06-18  1:45                   ` Andrew Morton
  1 sibling, 1 reply; 19+ messages in thread
From: Shakeel Butt @ 2019-06-17 13:23 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Michal Hocko, syzbot, Andrew Morton, Eric W. Biederman,
	Roman Gushchin, Johannes Weiner, Jérôme Glisse, LKML,
	Linux MM, syzkaller-bugs, yuzhoujian

On Sun, Jun 16, 2019 at 8:14 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2019/06/16 16:37, Tetsuo Handa wrote:
> > On 2019/06/16 6:33, Tetsuo Handa wrote:
> >> On 2019/06/16 3:50, Shakeel Butt wrote:
> >>>> While dump_tasks() traverses only each thread group, mem_cgroup_scan_tasks()
> >>>> traverses each thread.
> >>>
> >>> I think mem_cgroup_scan_tasks() traversing threads is not intentional
> >>> and css_task_iter_start in it should use CSS_TASK_ITER_PROCS as the
> >>> oom killer only cares about the processes or more specifically
> >>> mm_struct (though two different thread groups can have same mm_struct
> >>> but that is fine).
> >>
> >> We can't use CSS_TASK_ITER_PROCS from mem_cgroup_scan_tasks(). I've tried
> >> CSS_TASK_ITER_PROCS in an attempt to evaluate only one thread from each
> >> thread group, but I found that CSS_TASK_ITER_PROCS causes skipping whole
> >> threads in a thread group (and trivially allowing "Out of memory and no
> >> killable processes...\n" flood) if thread group leader has already exited.
> >
> > Seems that CSS_TASK_ITER_PROCS from mem_cgroup_scan_tasks() is now working.
>
>
> I found a reproducer and the commit.
>
> ----------------------------------------
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <sched.h>
> #include <sys/mman.h>
> #include <asm/unistd.h>
>
> static const unsigned long size = 1048576 * 200;
> static int thread(void *unused)
> {
>         int fd = open("/dev/zero", O_RDONLY);
>         char *buf = mmap(NULL, size, PROT_WRITE | PROT_READ,
>                          MAP_ANONYMOUS | MAP_SHARED, EOF, 0);
>         sleep(1);
>         read(fd, buf, size);
>         return syscall(__NR_exit, 0);
> }
> int main(int argc, char *argv[])
> {
>         FILE *fp;
>         mkdir("/sys/fs/cgroup/memory/test1", 0755);
>         fp = fopen("/sys/fs/cgroup/memory/test1/memory.limit_in_bytes", "w");
>         fprintf(fp, "%lu\n", size);
>         fclose(fp);
>         fp = fopen("/sys/fs/cgroup/memory/test1/tasks", "w");
>         fprintf(fp, "%u\n", getpid());
>         fclose(fp);
>         clone(thread, malloc(8192) + 4096, CLONE_SIGHAND | CLONE_THREAD | CLONE_VM, NULL);
>         return syscall(__NR_exit, 0);
> }
> ----------------------------------------
>
> Here is a patch to use CSS_TASK_ITER_PROCS.
>
> From 415e52cf55bc4ad931e4f005421b827f0b02693d Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Mon, 17 Jun 2019 00:09:38 +0900
> Subject: [PATCH] mm: memcontrol: Use CSS_TASK_ITER_PROCS at mem_cgroup_scan_tasks().
>
> Since commit c03cd7738a83b137 ("cgroup: Include dying leaders with live
> threads in PROCS iterations") corrected how CSS_TASK_ITER_PROCS works,
> mem_cgroup_scan_tasks() can use CSS_TASK_ITER_PROCS in order to check
> only one thread from each thread group.
>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

Reviewed-by: Shakeel Butt <shakeelb@google.com>

Why not add the reproducer in the commit message?

> ---
>  mm/memcontrol.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index ba9138a..b09ff45 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1163,7 +1163,7 @@ int mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
>                 struct css_task_iter it;
>                 struct task_struct *task;
>
> -               css_task_iter_start(&iter->css, 0, &it);
> +               css_task_iter_start(&iter->css, CSS_TASK_ITER_PROCS, &it);
>                 while (!ret && (task = css_task_iter_next(&it)))
>                         ret = fn(task, arg);
>                 css_task_iter_end(&it);
> --
> 1.8.3.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: general protection fault in oom_unkillable_task
  2019-06-17 13:23                 ` Shakeel Butt
@ 2019-06-18  1:45                   ` Andrew Morton
  2019-06-18  4:21                     ` Shakeel Butt
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2019-06-18  1:45 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Tetsuo Handa, Michal Hocko, syzbot, Eric W. Biederman,
	Roman Gushchin, Johannes Weiner, Jérôme Glisse, LKML,
	Linux MM, syzkaller-bugs, yuzhoujian

On Mon, 17 Jun 2019 06:23:07 -0700 Shakeel Butt <shakeelb@google.com> wrote:

> > Here is a patch to use CSS_TASK_ITER_PROCS.
> >
> > From 415e52cf55bc4ad931e4f005421b827f0b02693d Mon Sep 17 00:00:00 2001
> > From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > Date: Mon, 17 Jun 2019 00:09:38 +0900
> > Subject: [PATCH] mm: memcontrol: Use CSS_TASK_ITER_PROCS at mem_cgroup_scan_tasks().
> >
> > Since commit c03cd7738a83b137 ("cgroup: Include dying leaders with live
> > threads in PROCS iterations") corrected how CSS_TASK_ITER_PROCS works,
> > mem_cgroup_scan_tasks() can use CSS_TASK_ITER_PROCS in order to check
> > only one thread from each thread group.
> >
> > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> 
> Reviewed-by: Shakeel Butt <shakeelb@google.com>
> 
> Why not add the reproducer in the commit message?

That would be nice.

Nicer still would be, as always, a description of the user-visible impact
of the patch.

As I understand it, it's just a bit of a cleanup against current
mainline but without this patch in place, Shakeel's "mm, oom: refactor
dump_tasks for memcg OOMs" will cause kernel crashes.  Correct?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: general protection fault in oom_unkillable_task
  2019-06-18  1:45                   ` Andrew Morton
@ 2019-06-18  4:21                     ` Shakeel Butt
  0 siblings, 0 replies; 19+ messages in thread
From: Shakeel Butt @ 2019-06-18  4:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Tetsuo Handa, Michal Hocko, syzbot, Eric W. Biederman,
	Roman Gushchin, Johannes Weiner, Jérôme Glisse, LKML,
	Linux MM, syzkaller-bugs, yuzhoujian

On Mon, Jun 17, 2019 at 6:45 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Mon, 17 Jun 2019 06:23:07 -0700 Shakeel Butt <shakeelb@google.com> wrote:
>
> > > Here is a patch to use CSS_TASK_ITER_PROCS.
> > >
> > > From 415e52cf55bc4ad931e4f005421b827f0b02693d Mon Sep 17 00:00:00 2001
> > > From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > > Date: Mon, 17 Jun 2019 00:09:38 +0900
> > > Subject: [PATCH] mm: memcontrol: Use CSS_TASK_ITER_PROCS at mem_cgroup_scan_tasks().
> > >
> > > Since commit c03cd7738a83b137 ("cgroup: Include dying leaders with live
> > > threads in PROCS iterations") corrected how CSS_TASK_ITER_PROCS works,
> > > mem_cgroup_scan_tasks() can use CSS_TASK_ITER_PROCS in order to check
> > > only one thread from each thread group.
> > >
> > > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> >
> > Reviewed-by: Shakeel Butt <shakeelb@google.com>
> >
> > Why not add the reproducer in the commit message?
>
> That would be nice.
>
> Nicer still would be, as always, a description of the user-visible impact
> of the patch.
>

This is just a cleanup and optimization: instead of traversing all the
threads in a memcg, we traverse only one thread for each thread group in
the memcg. There is no user-visible impact.

> As I understand it, it's just a bit of a cleanup against current
> mainline but without this patch in place, Shakeel's "mm, oom: refactor
> dump_tasks for memcg OOMs" will cause kernel crashes.  Correct?

No, the patch "mm, oom: refactor dump_tasks for memcg OOMs" makes
dump_tasks() not depend on the memcg check within
oom_unkillable_task().

"mm, oom: fix oom_unkillable_task for memcg OOMs" is the actual fix
which is making oom_unkillable_task() correctly handle the memcg OOMs
code paths.

Shakeel


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2019-06-18  4:21 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-15  1:08 general protection fault in oom_unkillable_task syzbot
2019-06-15  1:10 ` Tetsuo Handa
2019-06-15 15:59   ` Tetsuo Handa
2019-06-15  3:15 ` Shakeel Butt
2019-06-15 13:49   ` Michal Hocko
2019-06-15 16:11     ` Shakeel Butt
2019-06-15 16:48       ` Tetsuo Handa
2019-06-15 18:50         ` Shakeel Butt
2019-06-15 21:33           ` Tetsuo Handa
2019-06-16  7:37             ` Tetsuo Handa
2019-06-16 15:13               ` Tetsuo Handa
2019-06-17  6:31                 ` Michal Hocko
2019-06-17 13:23                 ` Shakeel Butt
2019-06-18  1:45                   ` Andrew Morton
2019-06-18  4:21                     ` Shakeel Butt
2019-06-17  6:33       ` Michal Hocko
2019-06-17  9:56         ` Tetsuo Handa
2019-06-17 10:13           ` Michal Hocko
2019-06-16  5:48     ` Hillf Danton
