* general protection fault in put_pid
From: syzbot @ 2018-12-11 20:23 UTC
To: akpm, dhowells, ebiederm, ktsanaktsidis, linux-kernel, mhocko, rppt, sfr, syzkaller-bugs, willy

Hello,

syzbot found the following crash on:

HEAD commit:    f5d582777bcb Merge branch 'for-linus' of git://git.kernel...
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=135bc547400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=c8970c89a0efbb23
dashboard link: https://syzkaller.appspot.com/bug?extid=1145ec2e23165570c3ac
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16803afb400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+1145ec2e23165570c3ac@syzkaller.appspotmail.com

kmem_cache               221KB        225KB
Out of memory: Kill process 6139 (syz-execprog) score 1 or sacrifice child
Killed process 6164 (syz-executor0) total-vm:37444kB, anon-rss:60kB, file-rss:0kB, shmem-rss:0kB
oom_reaper: reaped process 6164 (syz-executor0), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 6159 Comm: syz-executor3 Not tainted 4.20.0-rc6+ #151
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:put_pid.part.3+0xb6/0x240 kernel/pid.c:108
Code: d2 0f 85 89 01 00 00 44 8b 63 04 49 8d 44 24 03 48 c1 e0 04 48 8d 7c 03 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 66 01 00 00 49 83 c4 03 be 04 00 00 00 48 89 df
RSP: 0018:ffff8881ae116e78 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffffffff816149a0 RCX: ffffffff833e420e
RDX: 00000001933eab7c RSI: ffffffff8152bf8e RDI: 0000000c99f55be0
RBP: ffff8881ae116f08 R08: ffff8881cdbf2300 R09: fffff52001507600
R10: fffff52001507600 R11: ffffc9000a83b003 R12: 00000000d1894120
R13: 1ffff11035c22dd0 R14: ffff8881ae116ee0 R15: dffffc0000000000
FS:  000000000166d940(0000) GS:ffff8881dae00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000075c458 CR3: 00000001d2bd1000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 put_pid+0x1f/0x30 kernel/pid.c:105
 ipc_update_pid ipc/util.h:159 [inline]
 freeary+0x10c8/0x1a40 ipc/sem.c:1167
 free_ipcs+0x9f/0x1c0 ipc/namespace.c:112
 sem_exit_ns+0x20/0x40 ipc/sem.c:237
 free_ipc_ns ipc/namespace.c:120 [inline]
 put_ipc_ns+0x66/0x180 ipc/namespace.c:152
 free_nsproxy+0xcf/0x220 kernel/nsproxy.c:180
 switch_task_namespaces+0xb3/0xd0 kernel/nsproxy.c:229
 exit_task_namespaces+0x17/0x20 kernel/nsproxy.c:234
 do_exit+0x1ad1/0x26d0 kernel/exit.c:866
 do_group_exit+0x177/0x440 kernel/exit.c:970
 get_signal+0x8b0/0x1980 kernel/signal.c:2517
 do_signal+0x9c/0x21c0 arch/x86/kernel/signal.c:816
 exit_to_usermode_loop+0x2e5/0x380 arch/x86/entry/common.c:162
 prepare_exit_to_usermode+0x342/0x3b0 arch/x86/entry/common.c:197
 retint_user+0x8/0x18
RIP: 0033:0x45a4d0
Code: 10 44 00 00 00 e8 c0 cc ff ff 0f b6 44 24 18 eb c2 e8 44 ad ff ff e9 6f ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc <64> 48 8b 0c 25 f8 ff ff ff 48 3b 61 10 76 68 48 83 ec 28 48 89 6c
RSP: 002b:00007fff9b9d3578 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000000000a4edb RCX: 0000000000483170
RDX: 0000000000000000 RSI: 00007fff9b9d3580 RDI: 0000000000000001
RBP: 00000000000002ef R08: 0000000000000001 R09: 000000000166d940
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
R13: 00000000000a4e70 R14: 00000000000001f1 R15: 0000000000000003
Modules linked in:
---[ end trace 9933854824df8420 ]---
RIP: 0010:put_pid.part.3+0xb6/0x240 kernel/pid.c:108
Code: d2 0f 85 89 01 00 00 44 8b 63 04 49 8d 44 24 03 48 c1 e0 04 48 8d 7c 03 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 66 01 00 00 49 83 c4 03 be 04 00 00 00 48 89 df
RSP: 0018:ffff8881ae116e78 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffffffff816149a0 RCX: ffffffff833e420e
RDX: 00000001933eab7c RSI: ffffffff8152bf8e RDI: 0000000c99f55be0
RBP: ffff8881ae116f08 R08: ffff8881cdbf2300 R09: fffff52001507600
R10: fffff52001507600 R11: ffffc9000a83b003 R12: 00000000d1894120
R13: 1ffff11035c22dd0 R14: ffff8881ae116ee0 R15: dffffc0000000000
FS:  000000000166d940(0000) GS:ffff8881dae00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000075c458 CR3: 00000001d2bd1000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches
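Note on the register dump in the report above: the faulting instruction is the KASAN shadow check, not the load from the pid itself. A minimal sketch of the arithmetic follows. It assumes the usual x86-64 KASAN mapping (shadow = (addr >> 3) + 0xdffffc0000000000, the constant loaded by the movabs in the Code: bytes) and a field layout read off the disassembly rather than taken from the kernel headers. The helper names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Constant loaded by "movabs $0xdffffc0000000000,%rax" in the Code: bytes. */
#define KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL
#define KASAN_SHADOW_SHIFT  3

/* Shadow byte address that KASAN reads before touching 'addr'. */
static uint64_t kasan_shadow_addr(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* Canonical with 48-bit virtual addresses: bits 63..47 are all equal. */
static bool is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the upper 17 bits */

	return top == 0 || top == 0x1ffff;
}

/*
 * Address the compiled code forms for pid->numbers[level].ns:
 *   lea 0x3(%r12),%rax ; shl $0x4,%rax ; lea 0x10(%rbx,%rax,1),%rdi
 * with %rbx = pid and %r12d = pid->level (loaded from offset 4).
 */
static uint64_t upid_ns_addr(uint64_t pid, uint32_t level)
{
	return pid + (((uint64_t)level + 3) << 4) + 0x10;
}
```

Plugging in RBX = ffffffff816149a0 and R12 = d1894120 reproduces RDI = 0000000c99f55be0, and its shadow address dffffc01933eab7c (RDX + RAX in the dump) is non-canonical, which is what raises the general protection fault. RBX lies in the range where kernel text is normally mapped and the "level" value is implausibly large, which suggests put_pid() was handed a stale or corrupted pointer rather than a live struct pid.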
* Re: general protection fault in put_pid
From: Dmitry Vyukov @ 2018-12-12 10:55 UTC
To: syzbot+1145ec2e23165570c3ac, manfred
Cc: Andrew Morton, David Howells, Eric W. Biederman, ktsanaktsidis, LKML, Michal Hocko, Mike Rapoport, Stephen Rothwell, syzkaller-bugs, Matthew Wilcox

On Tue, Dec 11, 2018 at 9:23 PM syzbot
<syzbot+1145ec2e23165570c3ac@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:    f5d582777bcb Merge branch 'for-linus' of git://git.kernel...
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=135bc547400000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=c8970c89a0efbb23
> dashboard link: https://syzkaller.appspot.com/bug?extid=1145ec2e23165570c3ac
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16803afb400000

+Manfred, this looks similar to the other few crashes related to
semget$private(0x0, 0x4000, 0x3f) that you looked at.
* Re: general protection fault in put_pid
From: Manfred Spraul @ 2018-12-19 9:04 UTC
To: Dmitry Vyukov, syzbot+1145ec2e23165570c3ac
Cc: Andrew Morton, David Howells, Eric W. Biederman, ktsanaktsidis, LKML, Michal Hocko, Mike Rapoport, Stephen Rothwell, syzkaller-bugs, Matthew Wilcox, Davidlohr Bueso

[-- Attachment #1: Type: text/plain, Size: 1069 bytes --]

Hello Dmitry,

On 12/12/18 11:55 AM, Dmitry Vyukov wrote:
> On Tue, Dec 11, 2018 at 9:23 PM syzbot
> <syzbot+1145ec2e23165570c3ac@syzkaller.appspotmail.com> wrote:
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:    f5d582777bcb Merge branch 'for-linus' of git://git.kernel...
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=135bc547400000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=c8970c89a0efbb23
>> dashboard link: https://syzkaller.appspot.com/bug?extid=1145ec2e23165570c3ac
>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16803afb400000
> +Manfred, this looks similar to the other few crashes related to
> semget$private(0x0, 0x4000, 0x3f) that you looked at.

I found one unexpected (incorrect?) locking, see the attached patch.

But I doubt that this is the root cause of the crashes.

Any remarks on the patch?

I would continue to search, and then send a series with all findings.

--

    Manfred

[-- Attachment #2: 0001-ipc-sem.c-ensure-proper-locking-during-namespace-tea.patch --]
[-- Type: text/x-patch, Size: 3137 bytes --]

From 733e888993b71fb3c139f71de61534bc603a2bcb Mon Sep 17 00:00:00 2001
From: Manfred Spraul <manfred@colorfullife.com>
Date: Wed, 19 Dec 2018 09:26:48 +0100
Subject: [PATCH] ipc/sem.c: ensure proper locking during namespace teardown

free_ipcs() only calls ipc_lock_object() before calling the free callback.

This means:
- There is no exclusion against parallel simple semop() calls.
- sma->use_global_lock may underflow (i.e. jump to UINT_MAX) when
  freeary() calls sem_unlock(,,-1).

The patch fixes that, by adding complexmode_enter() before calling
freeary().

There are multiple syzbot crashes in this code area, but I don't see yet
how a missing complexmode_enter() may cause a crash:
- 1) simple semop() calls are not used by these syzbot tests,
  and 2) we are in namespace teardown, no one may run in parallel.

- 1) freeary() is the last call (except parallel operations, which
  are impossible due to namespace teardown)
  and 2) the underflow of use_global_lock merely delays switching to
  parallel simple semop handling for the next UINT_MAX semop() calls.

Thus I think the patch is "only" a cleanup, and does not fix
the observed crashes.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reported-by: syzbot+1145ec2e23165570c3ac@syzkaller.appspotmail.com
Reported-by: syzbot+c92d3646e35bc5d1a909@syzkaller.appspotmail.com
Reported-by: syzbot+9d8b6fa6ee7636f350c1@syzkaller.appspotmail.com
Cc: dvyukov@google.com
Cc: dbueso@suse.de
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 ipc/sem.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/ipc/sem.c b/ipc/sem.c
index 745dc6187e84..8ccacd11fb15 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -184,6 +184,9 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
  */
 #define USE_GLOBAL_LOCK_HYSTERESIS	10
 
+static void complexmode_enter(struct sem_array *sma);
+static void complexmode_tryleave(struct sem_array *sma);
+
 /*
  * Locking:
  * a) global sem_lock() for read/write
@@ -232,9 +235,24 @@ void sem_init_ns(struct ipc_namespace *ns)
 }
 
 #ifdef CONFIG_IPC_NS
+
+static void freeary_lock(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
+{
+	struct sem_array *sma = container_of(ipcp, struct sem_array, sem_perm);
+
+	/*
+	 * free_ipcs() isn't aware of sem_lock(), it calls ipc_lock_object()
+	 * directly. In order to stay compatible with sem_lock(), we must
+	 * upgrade from "simple" ipc_lock_object() to sem_lock(,,-1).
+	 */
+	complexmode_enter(sma);
+
+	freeary(ns, ipcp);
+}
+
 void sem_exit_ns(struct ipc_namespace *ns)
 {
-	free_ipcs(ns, &sem_ids(ns), freeary);
+	free_ipcs(ns, &sem_ids(ns), freeary_lock);
 	idr_destroy(&ns->ids[IPC_SEM_IDS].ipcs_idr);
 	rhashtable_destroy(&ns->ids[IPC_SEM_IDS].key_ht);
 }
@@ -374,7 +392,9 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
 	/* Complex operation - acquire a full lock */
 	ipc_lock_object(&sma->sem_perm);
 
-	/* Prevent parallel simple ops */
+	/* Prevent parallel simple ops.
+	 * This must be identical to freeary_lock().
+	 */
 	complexmode_enter(sma);
 	return SEM_GLOBAL_LOCK;
 }
-- 
2.17.2
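Note on the underflow described in the commit message above: the following is a toy user-space model of the hysteresis counter, not the kernel code. The struct and the `toy_` functions are hypothetical stand-ins that keep only the counter logic as I understand it from ipc/sem.c; the real functions also deal with per-semaphore spinlocks and memory ordering.

```c
#include <limits.h>

#define USE_GLOBAL_LOCK_HYSTERESIS 10

/* Toy stand-in for the two struct sem_array fields that matter here. */
struct toy_sem_array {
	unsigned int use_global_lock;	/* 0 means "simple ops may run in parallel" */
	int complex_count;		/* pending complex operations */
};

/* Entering complex mode (re)arms the hysteresis counter. */
static void toy_complexmode_enter(struct toy_sem_array *sma)
{
	sma->use_global_lock = USE_GLOBAL_LOCK_HYSTERESIS;
}

/* Leaving complex mode counts down; only the 1 -> 0 step is special. */
static void toy_complexmode_tryleave(struct toy_sem_array *sma)
{
	if (sma->complex_count)
		return;
	if (sma->use_global_lock == 1)
		sma->use_global_lock = 0;
	else
		sma->use_global_lock--;	/* wraps to UINT_MAX if it was already 0 */
}
```

In this model a tryleave that is not preceded by an enter, while the counter is already 0, wraps the unsigned counter to UINT_MAX. That matches the commit message's reading: the array then stays in global-lock mode for a very long time, which is slow but not obviously a memory-safety problem.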
* Re: general protection fault in put_pid
From: Dmitry Vyukov @ 2018-12-20 15:36 UTC
To: Manfred Spraul
Cc: syzbot+1145ec2e23165570c3ac, Andrew Morton, David Howells, Eric W. Biederman, ktsanaktsidis, LKML, Michal Hocko, Mike Rapoport, Stephen Rothwell, syzkaller-bugs, Matthew Wilcox, Davidlohr Bueso

On Wed, Dec 19, 2018 at 10:04 AM Manfred Spraul
<manfred@colorfullife.com> wrote:
>
> Hello Dmitry,
>
> On 12/12/18 11:55 AM, Dmitry Vyukov wrote:
> > On Tue, Dec 11, 2018 at 9:23 PM syzbot
> > <syzbot+1145ec2e23165570c3ac@syzkaller.appspotmail.com> wrote:
> >> Hello,
> >>
> >> syzbot found the following crash on:
> >>
> >> HEAD commit:    f5d582777bcb Merge branch 'for-linus' of git://git.kernel...
> >> git tree:       upstream
> >> console output: https://syzkaller.appspot.com/x/log.txt?x=135bc547400000
> >> kernel config:  https://syzkaller.appspot.com/x/.config?x=c8970c89a0efbb23
> >> dashboard link: https://syzkaller.appspot.com/bug?extid=1145ec2e23165570c3ac
> >> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> >> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16803afb400000
> > +Manfred, this looks similar to the other few crashes related to
> > semget$private(0x0, 0x4000, 0x3f) that you looked at.
>
> I found one unexpected (incorrect?) locking, see the attached patch.
>
> But I doubt that this is the root cause of the crashes.

But why? These one-off sporadic crashes reported by syzbot look
exactly like a subtle race and your patch touches sem_exit_ns involved
in all reports.
So if you don't spot anything else, I would say close these 3 reports
with this patch (I see you already included Reported-by tags which is
great!) and then wait for syzbot reaction. Since we got 3 of them, if
it's still not fixed I would expect that syzbot will be able to
retrigger this later again.

> Any remarks on the patch?
>
> I would continue to search, and then send a series with all findings.
>
> --
>
>      Manfred
>
* Re: general protection fault in put_pid
From: Manfred Spraul @ 2018-12-22 19:07 UTC
To: Dmitry Vyukov
Cc: syzbot+1145ec2e23165570c3ac, Andrew Morton, David Howells, Eric W. Biederman, ktsanaktsidis, LKML, Michal Hocko, Mike Rapoport, Stephen Rothwell, syzkaller-bugs, Matthew Wilcox, Davidlohr Bueso

Hi Dmitry,

On 12/20/18 4:36 PM, Dmitry Vyukov wrote:
> On Wed, Dec 19, 2018 at 10:04 AM Manfred Spraul
> <manfred@colorfullife.com> wrote:
>> Hello Dmitry,
>>
>> On 12/12/18 11:55 AM, Dmitry Vyukov wrote:
>>> On Tue, Dec 11, 2018 at 9:23 PM syzbot
>>> <syzbot+1145ec2e23165570c3ac@syzkaller.appspotmail.com> wrote:
>>>> Hello,
>>>>
>>>> syzbot found the following crash on:
>>>>
>>>> HEAD commit:    f5d582777bcb Merge branch 'for-linus' of git://git.kernel...
>>>> git tree:       upstream
>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=135bc547400000
>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=c8970c89a0efbb23
>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=1145ec2e23165570c3ac
>>>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16803afb400000
>>> +Manfred, this looks similar to the other few crashes related to
>>> semget$private(0x0, 0x4000, 0x3f) that you looked at.
>> I found one unexpected (incorrect?) locking, see the attached patch.
>>
>> But I doubt that this is the root cause of the crashes.
>
> But why? These one-off sporadic crashes reported by syzbot look
> exactly like a subtle race and your patch touches sem_exit_ns involved
> in all reports.
> So if you don't spot anything else, I would say close these 3 reports
> with this patch (I see you already included Reported-by tags which is
> great!) and then wait for syzbot reaction. Since we got 3 of them, if
> it's still not fixed I would expect that syzbot will be able to
> retrigger this later again.

As I wrote, unless semop() is used, sma->use_global_lock is always 9 and
nothing can happen.

Every single-operation semop() reduces use_global_lock by one, i.e. a
single semop call as done here cannot trigger the bug:

https://syzkaller.appspot.com/text?tag=ReproSyz&x=16803afb400000

But, one more finding:

https://syzkaller.appspot.com/bug?extid=1145ec2e23165570c3ac

https://syzkaller.appspot.com/text?tag=CrashLog&x=109ecf6e400000

The log file contains 1080 lines like these:

> semget$private(..., 0x4003, ...)
>
> semget$private(..., 0x4006, ...)
>
> semget$private(..., 0x4007, ...)

It ends up as kmalloc(128*0x400x), i.e. slightly more than 2 MB, an
allocation in the 4 MB kmalloc buffer:

> [ 1201.210245] kmalloc-4194304      4698112KB    4698112KB
>
i.e.: 1147 4 MB kmalloc blocks --> are we leaking nearly 100% of the
semaphore arrays??

This one looks similar:

https://syzkaller.appspot.com/bug?extid=c92d3646e35bc5d1a909

except that the array sizes are mixed, and thus there are kmalloc-1M and
kmalloc-2M as well.

(and I did not count the number of semget calls)


The test apps use unshare(CLONE_NEWNS) and unshare(CLONE_NEWIPC), correct?

I.e. no CLONE_NEWUSER.

https://github.com/google/syzkaller/blob/master/executor/common_linux.h#L1523

--

    Manfred
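Note on the allocation arithmetic in the message above, as a small sketch. It assumes 128 bytes per semaphore (the factor used in the message), power-of-two kmalloc size classes, and ignores the struct sem_array header that precedes the semaphores. The function names are made up for illustration.

```c
/* Per-semaphore size implied by "kmalloc(128*0x400x)" in the message. */
#define BYTES_PER_SEM 128ULL

/* Smallest power of two >= n (n > 0); models power-of-two size classes. */
static unsigned long long round_up_pow2(unsigned long long n)
{
	unsigned long long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Size class that an array of 'nsems' semaphores would be served from. */
static unsigned long long sem_array_size_class(unsigned long long nsems)
{
	return round_up_pow2(BYTES_PER_SEM * nsems);
}

/* Number of objects of 'object_bytes' that make up 'total_kib' KiB. */
static unsigned long long objects_in_cache(unsigned long long total_kib,
					   unsigned long long object_bytes)
{
	return total_kib * 1024ULL / object_bytes;
}
```

With nsems = 0x4003 the request is 2097536 bytes, just over 2 MiB, so it is served from the 4194304-byte class, and 4698112 KB in that cache corresponds to 1147 objects. That is slightly more than the 1080 semget lines counted in the log, which is what prompts the question about leaked arrays.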
* Re: general protection fault in put_pid
From: Dmitry Vyukov @ 2018-12-23 7:37 UTC
To: manfred
Cc: syzbot+1145ec2e23165570c3ac, Andrew Morton, David Howells, Eric W. Biederman, ktsanaktsidis, LKML, Michal Hocko, Mike Rapoport, Stephen Rothwell, syzkaller-bugs, Matthew Wilcox, Davidlohr Bueso

On Sat, Dec 22, 2018 at 8:07 PM Manfred Spraul <manfred@colorfullife.com> wrote:
>
> Hi Dmitry,
>
> On 12/20/18 4:36 PM, Dmitry Vyukov wrote:
> > On Wed, Dec 19, 2018 at 10:04 AM Manfred Spraul
> > <manfred@colorfullife.com> wrote:
> >> Hello Dmitry,
> >>
> >> On 12/12/18 11:55 AM, Dmitry Vyukov wrote:
> >>> On Tue, Dec 11, 2018 at 9:23 PM syzbot
> >>> <syzbot+1145ec2e23165570c3ac@syzkaller.appspotmail.com> wrote:
> >>>> Hello,
> >>>>
> >>>> syzbot found the following crash on:
> >>>>
> >>>> HEAD commit:    f5d582777bcb Merge branch 'for-linus' of git://git.kernel...
> >>>> git tree:       upstream
> >>>> console output: https://syzkaller.appspot.com/x/log.txt?x=135bc547400000
> >>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=c8970c89a0efbb23
> >>>> dashboard link: https://syzkaller.appspot.com/bug?extid=1145ec2e23165570c3ac
> >>>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> >>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16803afb400000
> >>> +Manfred, this looks similar to the other few crashes related to
> >>> semget$private(0x0, 0x4000, 0x3f) that you looked at.
> >> I found one unexpected (incorrect?) locking, see the attached patch.
> >>
> >> But I doubt that this is the root cause of the crashes.
> >
> > But why? These one-off sporadic crashes reported by syzbot look
> > exactly like a subtle race and your patch touches sem_exit_ns involved
> > in all reports.
> > So if you don't spot anything else, I would say close these 3 reports
> > with this patch (I see you already included Reported-by tags which is
> > great!) and then wait for syzbot reaction. Since we got 3 of them, if
> > it's still not fixed I would expect that syzbot will be able to
> > retrigger this later again.
>
> As I wrote, unless semop() is used, sma->use_global_lock is always 9 and
> nothing can happen.
>
> Every single-operation semop() reduces use_global_lock by one, i.e. a
> single semop call as done here cannot trigger the bug:
>
> https://syzkaller.appspot.com/text?tag=ReproSyz&x=16803afb400000

It contains "repeat":true,"procs":6, which means that it runs 6
processes executing this test in an infinite loop. The last mark about
the number of tests executed was:
2018/12/11 18:38:02 executed programs: 2955

> But, one more finding:
>
> https://syzkaller.appspot.com/bug?extid=1145ec2e23165570c3ac
>
> https://syzkaller.appspot.com/text?tag=CrashLog&x=109ecf6e400000
>
> The log file contains 1080 lines like these:
>
> > semget$private(..., 0x4003, ...)
> >
> > semget$private(..., 0x4006, ...)
> >
> > semget$private(..., 0x4007, ...)
>
> It ends up as kmalloc(128*0x400x), i.e. slightly more than 2 MB, an
> allocation in the 4 MB kmalloc buffer:
>
> > [ 1201.210245] kmalloc-4194304      4698112KB    4698112KB
> >
> i.e.: 1147 4 MB kmalloc blocks --> are we leaking nearly 100% of the
> semaphore arrays??

/\/\/\/\/\/\

Ha, this is definitely not healthy.

> This one looks similar:
>
> https://syzkaller.appspot.com/bug?extid=c92d3646e35bc5d1a909
>
> except that the array sizes are mixed, and thus there are kmalloc-1M and
> kmalloc-2M as well.
>
> (and I did not count the number of semget calls)
>
>
> The test apps use unshare(CLONE_NEWNS) and unshare(CLONE_NEWIPC), correct?
>
> I.e. no CLONE_NEWUSER.
>
> https://github.com/google/syzkaller/blob/master/executor/common_linux.h#L1523

CLONE_NEWUSER is used on some instances as well:
https://github.com/google/syzkaller/blob/master/executor/common_linux.h#L1765
This crash happened on 2 different instances; one of them uses
CLONE_NEWUSER and the other does not.
If it's important because of CAP_ADMIN in IPC namespace, then all
instances should have it (instances that don't use NEWUSER are just
root).
* Re: general protection fault in put_pid
From: Dmitry Vyukov @ 2018-12-23 9:57 UTC
To: manfred
Cc: syzbot+1145ec2e23165570c3ac, Andrew Morton, David Howells, Eric W. Biederman, ktsanaktsidis, LKML, Michal Hocko, Mike Rapoport, Stephen Rothwell, syzkaller-bugs, Matthew Wilcox, Davidlohr Bueso

On Sun, Dec 23, 2018 at 8:37 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Sat, Dec 22, 2018 at 8:07 PM Manfred Spraul <manfred@colorfullife.com> wrote:
> >
> > Hi Dmitry,
> >
> > On 12/20/18 4:36 PM, Dmitry Vyukov wrote:
> > > On Wed, Dec 19, 2018 at 10:04 AM Manfred Spraul
> > > <manfred@colorfullife.com> wrote:
> > >> Hello Dmitry,
> > >>
> > >> On 12/12/18 11:55 AM, Dmitry Vyukov wrote:
> > >>> On Tue, Dec 11, 2018 at 9:23 PM syzbot
> > >>> <syzbot+1145ec2e23165570c3ac@syzkaller.appspotmail.com> wrote:
> > >>>> Hello,
> > >>>>
> > >>>> syzbot found the following crash on:
> > >>>>
> > >>>> HEAD commit:    f5d582777bcb Merge branch 'for-linus' of git://git.kernel...
> > >>>> git tree:       upstream
> > >>>> console output: https://syzkaller.appspot.com/x/log.txt?x=135bc547400000
> > >>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=c8970c89a0efbb23
> > >>>> dashboard link: https://syzkaller.appspot.com/bug?extid=1145ec2e23165570c3ac
> > >>>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> > >>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16803afb400000
> > >>> +Manfred, this looks similar to the other few crashes related to
> > >>> semget$private(0x0, 0x4000, 0x3f) that you looked at.
> > >> I found one unexpected (incorrect?) locking, see the attached patch.
> > >>
> > >> But I doubt that this is the root cause of the crashes.
> > >
> > > But why? These one-off sporadic crashes reported by syzbot look
> > > exactly like a subtle race and your patch touches sem_exit_ns involved
> > > in all reports.
> > > So if you don't spot anything else, I would say close these 3 reports
> > > with this patch (I see you already included Reported-by tags which is
> > > great!) and then wait for syzbot reaction. Since we got 3 of them, if
> > > it's still not fixed I would expect that syzbot will be able to
> > > retrigger this later again.
> >
> > As I wrote, unless semop() is used, sma->use_global_lock is always 9 and
> > nothing can happen.
> >
> > Every single-operation semop() reduces use_global_lock by one, i.e. a
> > single semop call as done here cannot trigger the bug:
> >
> > https://syzkaller.appspot.com/text?tag=ReproSyz&x=16803afb400000
>
> It contains "repeat":true,"procs":6, which means that it runs 6
> processes executing this test in an infinite loop. The last mark about
> the number of tests executed was:
> 2018/12/11 18:38:02 executed programs: 2955
>
> > But, one more finding:
> >
> > https://syzkaller.appspot.com/bug?extid=1145ec2e23165570c3ac
> >
> > https://syzkaller.appspot.com/text?tag=CrashLog&x=109ecf6e400000
> >
> > The log file contains 1080 lines like these:
> >
> > > semget$private(..., 0x4003, ...)
> > >
> > > semget$private(..., 0x4006, ...)
> > >
> > > semget$private(..., 0x4007, ...)
> >
> > It ends up as kmalloc(128*0x400x), i.e. slightly more than 2 MB, an
> > allocation in the 4 MB kmalloc buffer:
> >
> > > [ 1201.210245] kmalloc-4194304      4698112KB    4698112KB
> > >
> > i.e.: 1147 4 MB kmalloc blocks --> are we leaking nearly 100% of the
> > semaphore arrays??
>
> /\/\/\/\/\/\
>
> Ha, this is definitely not healthy.

I can reproduce this infinite memory consumption with the C program:
https://gist.githubusercontent.com/dvyukov/03ec54b3429ade16fa07bf8b2379aff3/raw/ae4f654e279810de2505e8fa41b73dc1d77778e6/gistfile1.txt

But this is working as intended, right? It just creates an infinite
number of large semaphore sets, which reasonably consumes an infinite
amount of memory.
Except that it also violates the memcg bound and a process can have
an effectively unlimited amount of such "drum memory" in semaphores.

> > This one looks similar:
> >
> > https://syzkaller.appspot.com/bug?extid=c92d3646e35bc5d1a909
> >
> > except that the array sizes are mixed, and thus there are kmalloc-1M and
> > kmalloc-2M as well.
> >
> > (and I did not count the number of semget calls)
> >
> >
> > The test apps use unshare(CLONE_NEWNS) and unshare(CLONE_NEWIPC), correct?
> >
> > I.e. no CLONE_NEWUSER.
> >
> > https://github.com/google/syzkaller/blob/master/executor/common_linux.h#L1523
>
> CLONE_NEWUSER is used on some instances as well:
> https://github.com/google/syzkaller/blob/master/executor/common_linux.h#L1765
> This crash happened on 2 different instances; one of them uses
> CLONE_NEWUSER and the other does not.
> If it's important because of CAP_ADMIN in IPC namespace, then all
> instances should have it (instances that don't use NEWUSER are just
> root).
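Note on the message above: the linked gist is not reproduced here. The following is only a minimal sketch of the system calls involved, under the assumption that the reproducer boils down to repeated semget(IPC_PRIVATE, ...) calls. Unlike such a reproducer, this sketch removes every set right after creating it so that it is safe to run; the function name is made up.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/*
 * Create up to 'count' private semaphore sets with 'nsems' semaphores
 * each and remove each one immediately. Returns how many sets were
 * actually created; stops at the first failure (e.g. ENOSPC, ENOMEM).
 */
static int create_and_remove_sem_sets(int count, int nsems)
{
	int created = 0;

	for (int i = 0; i < count; i++) {
		int id = semget(IPC_PRIVATE, nsems, IPC_CREAT | 0600);

		if (id < 0)
			break;
		created++;
		/* Dropping this call is what lets the sets pile up. */
		semctl(id, 0, IPC_RMID);
	}
	return created;
}
```

Without the IPC_RMID step each set stays allocated until it is removed explicitly or its IPC namespace is torn down, which is the path (sem_exit_ns() and freeary()) that appears in the crash's call trace.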
* Re: general protection fault in put_pid
From: Dmitry Vyukov @ 2018-12-23 10:30 UTC
To: manfred
Cc: syzbot+1145ec2e23165570c3ac, Andrew Morton, David Howells, Eric W. Biederman, ktsanaktsidis, LKML, Michal Hocko, Mike Rapoport, Stephen Rothwell, syzkaller-bugs, Matthew Wilcox, Davidlohr Bueso

On Sun, Dec 23, 2018 at 10:57 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Sun, Dec 23, 2018 at 8:37 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > On Sat, Dec 22, 2018 at 8:07 PM Manfred Spraul <manfred@colorfullife.com> wrote:
> > >
> > > Hi Dmitry,
> > >
> > > On 12/20/18 4:36 PM, Dmitry Vyukov wrote:
> > > > On Wed, Dec 19, 2018 at 10:04 AM Manfred Spraul
> > > > <manfred@colorfullife.com> wrote:
> > > >> Hello Dmitry,
> > > >>
> > > >> On 12/12/18 11:55 AM, Dmitry Vyukov wrote:
> > > >>> On Tue, Dec 11, 2018 at 9:23 PM syzbot
> > > >>> <syzbot+1145ec2e23165570c3ac@syzkaller.appspotmail.com> wrote:
> > > >>>> Hello,
> > > >>>>
> > > >>>> syzbot found the following crash on:
> > > >>>>
> > > >>>> HEAD commit:    f5d582777bcb Merge branch 'for-linus' of git://git.kernel...
> > > >>>> git tree:       upstream
> > > >>>> console output: https://syzkaller.appspot.com/x/log.txt?x=135bc547400000
> > > >>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=c8970c89a0efbb23
> > > >>>> dashboard link: https://syzkaller.appspot.com/bug?extid=1145ec2e23165570c3ac
> > > >>>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> > > >>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16803afb400000
> > > >>> +Manfred, this looks similar to the other few crashes related to
> > > >>> semget$private(0x0, 0x4000, 0x3f) that you looked at.
> > > >> I found one unexpected (incorrect?) locking, see the attached patch.
> > > >>
> > > >> But I doubt that this is the root cause of the crashes.
> > > >
> > > > But why? These one-off sporadic crashes reported by syzbot look
> > > > exactly like a subtle race and your patch touches sem_exit_ns involved
> > > > in all reports.
> > > > So if you don't spot anything else, I would say close these 3 reports
> > > > with this patch (I see you already included Reported-by tags which is
> > > > great!) and then wait for syzbot reaction. Since we got 3 of them, if
> > > > it's still not fixed I would expect that syzbot will be able to
> > > > retrigger this later again.
> > >
> > > As I wrote, unless semop() is used, sma->use_global_lock is always 9 and
> > > nothing can happen.
> > >
> > > Every single-operation semop() reduces use_global_lock by one, i.e. a
> > > single semop call as done here cannot trigger the bug:
> > >
> > > https://syzkaller.appspot.com/text?tag=ReproSyz&x=16803afb400000
> >
> > It contains "repeat":true,"procs":6, which means that it runs 6
> > processes executing this test in an infinite loop. The last mark about
> > the number of tests executed was:
> > 2018/12/11 18:38:02 executed programs: 2955
> >
> > > But, one more finding:
> > >
> > > https://syzkaller.appspot.com/bug?extid=1145ec2e23165570c3ac
> > >
> > > https://syzkaller.appspot.com/text?tag=CrashLog&x=109ecf6e400000
> > >
> > > The log file contains 1080 lines like these:
> > >
> > > > semget$private(..., 0x4003, ...)
> > > >
> > > > semget$private(..., 0x4006, ...)
> > > >
> > > > semget$private(..., 0x4007, ...)
> > >
> > > It ends up as kmalloc(128*0x400x), i.e. slightly more than 2 MB, an
> > > allocation in the 4 MB kmalloc buffer:
> > >
> > > > [ 1201.210245] kmalloc-4194304      4698112KB    4698112KB
> > > >
> > > i.e.: 1147 4 MB kmalloc blocks --> are we leaking nearly 100% of the
> > > semaphore arrays??
> >
> > /\/\/\/\/\/\
> >
> > Ha, this is definitely not healthy.
>
> I can reproduce this infinite memory consumption with the C program:
> https://gist.githubusercontent.com/dvyukov/03ec54b3429ade16fa07bf8b2379aff3/raw/ae4f654e279810de2505e8fa41b73dc1d77778e6/gistfile1.txt
>
> But this is working as intended, right? It just creates an infinite
> number of large semaphore sets, which reasonably consumes an infinite
> amount of memory.
> Except that it also violates the memcg bound and a process can have
> an effectively unlimited amount of such "drum memory" in semaphores.
>
> > > This one looks similar:
> > >
> > > https://syzkaller.appspot.com/bug?extid=c92d3646e35bc5d1a909
> > >
> > > except that the array sizes are mixed, and thus there are kmalloc-1M and
> > > kmalloc-2M as well.
> > >
> > > (and I did not count the number of semget calls)
> > >
> > >
> > > The test apps use unshare(CLONE_NEWNS) and unshare(CLONE_NEWIPC), correct?
> > >
> > > I.e. no CLONE_NEWUSER.
> > >
> > > https://github.com/google/syzkaller/blob/master/executor/common_linux.h#L1523
> >
> > CLONE_NEWUSER is used on some instances as well:
> > https://github.com/google/syzkaller/blob/master/executor/common_linux.h#L1765
> > This crash happened on 2 different instances; one of them uses
> > CLONE_NEWUSER and the other does not.
> > If it's important because of CAP_ADMIN in IPC namespace, then all
> > instances should have it (instances that don't use NEWUSER are just
> > root).

My naive attempts to re-reproduce this have failed so far.
But I noticed that _all_ logs for these 3 crashes:
https://syzkaller.appspot.com/bug?extid=c92d3646e35bc5d1a909
https://syzkaller.appspot.com/bug?extid=1145ec2e23165570c3ac
https://syzkaller.appspot.com/bug?extid=9d8b6fa6ee7636f350c1
involve low memory conditions. My gut feeling says this is not a
coincidence. This is also probably the reason why all reproducers
create large sem sets. There must be some bad interaction between
low memory conditions and semaphores/ipc namespaces.
^ permalink raw reply [flat|nested] 23+ messages in thread
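The size arithmetic quoted in the message above can be checked with a short sketch. It assumes, as the thread states, roughly 128 bytes per semaphore and power-of-two kmalloc caches. The helper names `kmalloc_bucket` and `array_bytes` are made up for illustration and are not kernel APIs. The struct header of the array is ignored, which would only make the allocation slightly larger.

```python
# Sketch: reproduce the allocation arithmetic discussed in the thread.
# Assumption (taken from the thread): about 128 bytes per semaphore.
BYTES_PER_SEM = 128


def kmalloc_bucket(size):
    """Hypothetical helper: smallest power-of-two cache that fits `size`."""
    bucket = 1
    while bucket < size:
        bucket *= 2
    return bucket


def array_bytes(nsems):
    """Approximate payload size of a semaphore array with `nsems` entries."""
    return nsems * BYTES_PER_SEM


# The semget$private() sizes seen in the crash log.
for nsems in (0x4003, 0x4006, 0x4007):
    size = array_bytes(nsems)
    print(hex(nsems), size, kmalloc_bucket(size))

# Slab report line from the log: "kmalloc-4194304 4698112KB 4698112KB".
# Dividing the total by the 4 MB (4096 KB) object size gives the block count.
blocks = 4698112 // (4194304 // 1024)
print("4 MB blocks in use:", blocks)
```

Each array is just above 2 MB, so it cannot fit the 2 MB cache and lands in the 4 MB one, which matches the 1147 blocks counted in the message.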
* Re: general protection fault in put_pid 2018-12-23 10:30 ` Dmitry Vyukov @ 2018-12-23 10:42 ` Dmitry Vyukov 2018-12-23 12:32 ` Manfred Spraul 0 siblings, 1 reply; 23+ messages in thread From: Dmitry Vyukov @ 2018-12-23 10:42 UTC (permalink / raw) To: manfred Cc: syzbot+1145ec2e23165570c3ac, Andrew Morton, David Howells, Eric W. Biederman, ktsanaktsidis, LKML, Michal Hocko, Mike Rapoport, Stephen Rothwell, syzkaller-bugs, Matthew Wilcox, Davidlohr Bueso On Sun, Dec 23, 2018 at 11:30 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > On Sun, Dec 23, 2018 at 10:57 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > > > On Sun, Dec 23, 2018 at 8:37 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > > > > > On Sat, Dec 22, 2018 at 8:07 PM Manfred Spraul <manfred@colorfullife.com> wrote: > > > > > > > > Hi Dmitry, > > > > > > > > On 12/20/18 4:36 PM, Dmitry Vyukov wrote: > > > > > On Wed, Dec 19, 2018 at 10:04 AM Manfred Spraul > > > > > <manfred@colorfullife.com> wrote: > > > > >> Hello Dmitry, > > > > >> > > > > >> On 12/12/18 11:55 AM, Dmitry Vyukov wrote: > > > > >>> On Tue, Dec 11, 2018 at 9:23 PM syzbot > > > > >>> <syzbot+1145ec2e23165570c3ac@syzkaller.appspotmail.com> wrote: > > > > >>>> Hello, > > > > >>>> > > > > >>>> syzbot found the following crash on: > > > > >>>> > > > > >>>> HEAD commit: f5d582777bcb Merge branch 'for-linus' of git://git.kernel... > > > > >>>> git tree: upstream > > > > >>>> console output: https://syzkaller.appspot.com/x/log.txt?x=135bc547400000 > > > > >>>> kernel config: https://syzkaller.appspot.com/x/.config?x=c8970c89a0efbb23 > > > > >>>> dashboard link: https://syzkaller.appspot.com/bug?extid=1145ec2e23165570c3ac > > > > >>>> compiler: gcc (GCC) 8.0.1 20180413 (experimental) > > > > >>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16803afb400000 > > > > >>> +Manfred, this looks similar to the other few crashes related to > > > > >>> semget$private(0x0, 0x4000, 0x3f) that you looked at. 
> > > > >> I found one unexpected (incorrect?) locking, see the attached patch. > > > > >> > > > > >> But I doubt that this is the root cause of the crashes. > > > > > > > > > > But why? These one-off sporadic crashes reported by syzbot looks > > > > > exactly like a subtle race and your patch touches sem_exit_ns involved > > > > > in all reports. > > > > > So if you don't spot anything else, I would say close these 3 reports > > > > > with this patch (I see you already included Reported-by tags which is > > > > > great!) and then wait for syzbot reaction. Since we got 3 of them, if > > > > > it's still not fixed I would expect that syzbot will be able to > > > > > retrigger this later again. > > > > > > > > As I wrote, unless semop() is used, sma->use_global_lock is always 9 and > > > > nothing can happen. > > > > > > > > Every single-operation semop() reduces use_global_lock by one, i.e a > > > > single semop call as done here cannot trigger the bug: > > > > > > > > https://syzkaller.appspot.com/text?tag=ReproSyz&x=16803afb400000 > > > > > > It contains "repeat":true,"procs":6, which means that it run 6 > > > processes running this test in infinite loop. The last mark about > > > number of tests executed was: > > > 2018/12/11 18:38:02 executed programs: 2955 > > > > > > > But, one more finding: > > > > > > > > https://syzkaller.appspot.com/bug?extid=1145ec2e23165570c3ac > > > > > > > > https://syzkaller.appspot.com/text?tag=CrashLog&x=109ecf6e400000 > > > > > > > > The log file contain 1080 lines like these: > > > > > > > > > semget$private(..., 0x4003, ...) > > > > > > > > > > semget$private(..., 0x4006, ...) > > > > > > > > > > semget$private(..., 0x4007, ...) > > > > > > > > It ends up as kmalloc(128*0x400x), i.e. 
slightly more than 2 MB, an > > > > allocation in the 4 MB kmalloc buffer: > > > > > > > > > [ 1201.210245] kmalloc-4194304 4698112KB 4698112KB > > > > > > > > > i.e.: 1147 4 MB kmalloc blocks --> are we leaking nearly 100% of the > > > > semaphore arrays?? > > > > > > /\/\/\/\/\/\ > > > > > > Ha, this is definitely not healthy. > > > > I can reproduce this infinite memory consumption with the C program: > > https://gist.githubusercontent.com/dvyukov/03ec54b3429ade16fa07bf8b2379aff3/raw/ae4f654e279810de2505e8fa41b73dc1d77778e6/gistfile1.txt > > > > But this is working as intended, right? It just creates infinite > > number of large semaphore sets, which reasonably consumes infinite > > amount of memory. > > Except that it also violates the memcg bound and a process can have > > effectively unlimited amount of such "drum memory" in semaphores. > > > > > > > > > > > > This one looks similar: > > > > > > > > https://syzkaller.appspot.com/bug?extid=c92d3646e35bc5d1a909 > > > > > > > > except that the array sizes are mixed, and thus there are kmalloc-1M and > > > > kmalloc-2M as well. > > > > > > > > (and I did not count the number of semget calls) > > > > > > > > > > > > The test apps use unshare(CLONE_NEWNS) and unshare(CLONE_NEWIPC), correct? > > > > > > > > I.e. no CLONE_NEWUSER. > > > > > > > > https://github.com/google/syzkaller/blob/master/executor/common_linux.h#L1523 > > > > > > CLONE_NEWUSER is used on some instances as well: > > > https://github.com/google/syzkaller/blob/master/executor/common_linux.h#L1765 > > > This crash happened on 2 different instances and 1 of them uses > > > CLONE_NEWUSER and another does not. > > > If it's important because of CAP_ADMIN in IPC namespace, then all > > > instances should have it (instances that don't use NEWUSER are just > > > root). > > My naive attempts to re-reproduce this failed so far. 
> But I noticed that _all_ logs for these 3 crashes: > https://syzkaller.appspot.com/bug?extid=c92d3646e35bc5d1a909 > https://syzkaller.appspot.com/bug?extid=1145ec2e23165570c3ac > https://syzkaller.appspot.com/bug?extid=9d8b6fa6ee7636f350c1 > involve low memory conditions. My gut feeling says this is not a > coincidence. This is also probably the reason why all reproducers > create large sem sets. There must be some bad interaction between low > memory condition and semaphores/ipc namespaces. Actually was able to reproduce this with a syzkaller program: ./syz-execprog -repeat=0 -procs=10 prog ... kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 8788 Comm: syz-executor8 Not tainted 4.20.0-rc7+ #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 RIP: 0010:__list_del_entry_valid+0x7e/0x150 lib/list_debug.c:51 Code: ad de 4c 8b 26 49 39 c4 74 66 48 b8 00 02 00 00 00 00 ad de 48 89 da 48 39 c3 74 65 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 75 7b 48 8b 13 48 39 f2 75 57 49 8d 7c 24 08 48 b8 00 RSP: 0018:ffff88804faef210 EFLAGS: 00010a02 RAX: dffffc0000000000 RBX: f817edba555e1f00 RCX: ffffffff831bad5f RDX: 1f02fdb74aabc3e0 RSI: ffff88801b8a0720 RDI: ffff88801b8a0728 RBP: ffff88804faef228 R08: fffff52001055401 R09: fffff52001055401 R10: 0000000000000001 R11: fffff52001055400 R12: ffff88802d52cc98 R13: ffff88801b8a0728 R14: ffff88801b8a0720 R15: dffffc0000000000 FS: 0000000000d24940(0000) GS:ffff88802d500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004bb580 CR3: 0000000011177005 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __list_del_entry include/linux/list.h:117 [inline] list_del include/linux/list.h:125 [inline] unlink_queue ipc/sem.c:786 
[inline] freeary+0xddb/0x1c90 ipc/sem.c:1164 free_ipcs+0xf0/0x160 ipc/namespace.c:112 sem_exit_ns+0x20/0x40 ipc/sem.c:237 free_ipc_ns ipc/namespace.c:120 [inline] put_ipc_ns+0x55/0x160 ipc/namespace.c:152 free_nsproxy+0xc0/0x1f0 kernel/nsproxy.c:180 switch_task_namespaces+0xa5/0xc0 kernel/nsproxy.c:229 exit_task_namespaces+0x17/0x20 kernel/nsproxy.c:234 do_exit+0x19e5/0x27d0 kernel/exit.c:866 do_group_exit+0x151/0x410 kernel/exit.c:970 __do_sys_exit_group kernel/exit.c:981 [inline] __se_sys_exit_group kernel/exit.c:979 [inline] __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:979 do_syscall_64+0x192/0x770 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4570e9 Code: 5d af fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b af fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffe35f12018 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00000000004570e9 RDX: 0000000000410540 RSI: 0000000000a34c00 RDI: 0000000000000045 RBP: 00000000004a43a4 R08: 000000000000000c R09: 0000000000000000 R10: 0000000000d24940 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000008 Modules linked in: Dumping ftrace buffer: (ftrace buffer empty) ---[ end trace 17829b0f00569a59 ]--- RIP: 0010:__list_del_entry_valid+0x7e/0x150 lib/list_debug.c:51 Code: ad de 4c 8b 26 49 39 c4 74 66 48 b8 00 02 00 00 00 00 ad de 48 89 da 48 39 c3 74 65 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 75 7b 48 8b 13 48 39 f2 75 57 49 8d 7c 24 08 48 b8 00 RSP: 0018:ffff88804faef210 EFLAGS: 00010a02 RAX: dffffc0000000000 RBX: f817edba555e1f00 RCX: ffffffff831bad5f RDX: 1f02fdb74aabc3e0 RSI: ffff88801b8a0720 RDI: ffff88801b8a0728 RBP: ffff88804faef228 R08: fffff52001055401 R09: fffff52001055401 R10: 0000000000000001 R11: fffff52001055400 R12: ffff88802d52cc98 R13: ffff88801b8a0728 R14: 
ffff88801b8a0720 R15: dffffc0000000000 FS: 0000000000d24940(0000) GS:ffff88802d500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004bb580 CR3: 0000000011177005 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 The prog is: unshare(0x8020000) semget$private(0x0, 0x4007, 0x0) kernel is on 9105b8aa50c182371533fc97db64fc8f26f051b3 and again it involved lots of oom kills, the repro eats all memory, a process getting killed, frees some memory and the process repeats. ^ permalink raw reply [flat|nested] 23+ messages in thread
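For readers decoding the reproducer, the `unshare(0x8020000)` argument can be split into named flags. This is a small sketch, not part of syzkaller; the flag values are the ones defined in the Linux UAPI header `linux/sched.h`, and `decode_unshare_flags` is a hypothetical helper name.

```python
# Namespace-related clone flags as defined in the Linux UAPI <linux/sched.h>.
CLONE_FLAGS = {
    "CLONE_NEWNS": 0x00020000,
    "CLONE_NEWCGROUP": 0x02000000,
    "CLONE_NEWUTS": 0x04000000,
    "CLONE_NEWIPC": 0x08000000,
    "CLONE_NEWUSER": 0x10000000,
    "CLONE_NEWPID": 0x20000000,
    "CLONE_NEWNET": 0x40000000,
}


def decode_unshare_flags(value):
    """Hypothetical helper: return (flag names set in value, unknown bits)."""
    names = [name for name, bit in CLONE_FLAGS.items() if value & bit]
    known_mask = sum(CLONE_FLAGS.values())
    return names, value & ~known_mask


names, unknown = decode_unshare_flags(0x8020000)
print(names, hex(unknown))
```

The value decodes to `CLONE_NEWNS | CLONE_NEWIPC` with no `CLONE_NEWUSER` bit, consistent with the earlier question in the thread about which namespaces the test apps unshare.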
* Re: general protection fault in put_pid 2018-12-23 10:42 ` Dmitry Vyukov @ 2018-12-23 12:32 ` Manfred Spraul 2018-12-25 9:35 ` Dmitry Vyukov 2018-12-25 9:41 ` Dmitry Vyukov 0 siblings, 2 replies; 23+ messages in thread From: Manfred Spraul @ 2018-12-23 12:32 UTC (permalink / raw) To: Dmitry Vyukov Cc: syzbot+1145ec2e23165570c3ac, Andrew Morton, David Howells, Eric W. Biederman, ktsanaktsidis, LKML, Michal Hocko, Mike Rapoport, Stephen Rothwell, syzkaller-bugs, Matthew Wilcox, Davidlohr Bueso Hi Dmitry, let's simplify the mail, otherwise no one can follow: On 12/23/18 11:42 AM, Dmitry Vyukov wrote: > >> My naive attempts to re-reproduce this failed so far. >> But I noticed that _all_ logs for these 3 crashes: >> https://syzkaller.appspot.com/bug?extid=c92d3646e35bc5d1a909 >> https://syzkaller.appspot.com/bug?extid=1145ec2e23165570c3ac >> https://syzkaller.appspot.com/bug?extid=9d8b6fa6ee7636f350c1 >> involve low memory conditions. My gut feeling says this is not a >> coincidence. This is also probably the reason why all reproducers >> create large sem sets. There must be some bad interaction between low >> memory condition and semaphores/ipc namespaces. > > Actually was able to reproduce this with a syzkaller program: > > ./syz-execprog -repeat=0 -procs=10 prog > ...
> kasan: CONFIG_KASAN_INLINE enabled > kasan: GPF could be caused by NULL-ptr deref or user memory access > general protection fault: 0000 [#1] PREEMPT SMP KASAN > CPU: 1 PID: 8788 Comm: syz-executor8 Not tainted 4.20.0-rc7+ #6 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 > RIP: 0010:__list_del_entry_valid+0x7e/0x150 lib/list_debug.c:51 > Code: ad de 4c 8b 26 49 39 c4 74 66 48 b8 00 02 00 00 00 00 ad de 48 > 89 da 48 39 c3 74 65 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c > 02 00 75 7b 48 8b 13 48 39 f2 75 57 49 8d 7c 24 08 48 b8 00 > RSP: 0018:ffff88804faef210 EFLAGS: 00010a02 > RAX: dffffc0000000000 RBX: f817edba555e1f00 RCX: ffffffff831bad5f > RDX: 1f02fdb74aabc3e0 RSI: ffff88801b8a0720 RDI: ffff88801b8a0728 > RBP: ffff88804faef228 R08: fffff52001055401 R09: fffff52001055401 > R10: 0000000000000001 R11: fffff52001055400 R12: ffff88802d52cc98 > R13: ffff88801b8a0728 R14: ffff88801b8a0720 R15: dffffc0000000000 > FS: 0000000000d24940(0000) GS:ffff88802d500000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00000000004bb580 CR3: 0000000011177005 CR4: 00000000003606e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > __list_del_entry include/linux/list.h:117 [inline] > list_del include/linux/list.h:125 [inline] > unlink_queue ipc/sem.c:786 [inline] > freeary+0xddb/0x1c90 ipc/sem.c:1164 > free_ipcs+0xf0/0x160 ipc/namespace.c:112 > sem_exit_ns+0x20/0x40 ipc/sem.c:237 > free_ipc_ns ipc/namespace.c:120 [inline] > put_ipc_ns+0x55/0x160 ipc/namespace.c:152 > free_nsproxy+0xc0/0x1f0 kernel/nsproxy.c:180 > switch_task_namespaces+0xa5/0xc0 kernel/nsproxy.c:229 > exit_task_namespaces+0x17/0x20 kernel/nsproxy.c:234 > do_exit+0x19e5/0x27d0 kernel/exit.c:866 > do_group_exit+0x151/0x410 kernel/exit.c:970 > __do_sys_exit_group kernel/exit.c:981 [inline] > __se_sys_exit_group kernel/exit.c:979 [inline] 
> __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:979 > do_syscall_64+0x192/0x770 arch/x86/entry/common.c:290 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > RIP: 0033:0x4570e9 > Code: 5d af fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 > 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d > 01 f0 ff ff 0f 83 2b af fb ff c3 66 2e 0f 1f 84 00 00 00 00 > RSP: 002b:00007ffe35f12018 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 > RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00000000004570e9 > RDX: 0000000000410540 RSI: 0000000000a34c00 RDI: 0000000000000045 > RBP: 00000000004a43a4 R08: 000000000000000c R09: 0000000000000000 > R10: 0000000000d24940 R11: 0000000000000246 R12: 0000000000000000 > R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000008 > Modules linked in: > Dumping ftrace buffer: > (ftrace buffer empty) > ---[ end trace 17829b0f00569a59 ]--- > RIP: 0010:__list_del_entry_valid+0x7e/0x150 lib/list_debug.c:51 > Code: ad de 4c 8b 26 49 39 c4 74 66 48 b8 00 02 00 00 00 00 ad de 48 > 89 da 48 39 c3 74 65 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c > 02 00 75 7b 48 8b 13 48 39 f2 75 57 49 8d 7c 24 08 48 b8 00 > RSP: 0018:ffff88804faef210 EFLAGS: 00010a02 > RAX: dffffc0000000000 RBX: f817edba555e1f00 RCX: ffffffff831bad5f > RDX: 1f02fdb74aabc3e0 RSI: ffff88801b8a0720 RDI: ffff88801b8a0728 > RBP: ffff88804faef228 R08: fffff52001055401 R09: fffff52001055401 > R10: 0000000000000001 R11: fffff52001055400 R12: ffff88802d52cc98 > R13: ffff88801b8a0728 R14: ffff88801b8a0720 R15: dffffc0000000000 > FS: 0000000000d24940(0000) GS:ffff88802d500000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00000000004bb580 CR3: 0000000011177005 CR4: 00000000003606e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > > The prog is: > unshare(0x8020000) > semget$private(0x0, 0x4007, 0x0) > > kernel is on 
9105b8aa50c182371533fc97db64fc8f26f051b3
>
> and again it involved lots of oom kills, the repro eats all memory, a
> process getting killed, frees some memory and the process repeats.

Ok, thus the above program triggers two bugs:

- a huge memory leak with semaphore arrays
- under OOM pressure, an oops.

1) I can reproduce the memory leak, it happens all the time :-( I must look at what is wrong.

2) Regarding the crash: What differs under OOM pressure?

- kvmalloc can fall back to vmalloc()
- the 2nd or 3rd of multiple allocations can fail, and that triggers a rare codepath/race condition.
- an rcu callback can happen earlier than expected

So far, I didn't notice anything unexpected :-(

--
Manfred

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: general protection fault in put_pid 2018-12-23 12:32 ` Manfred Spraul @ 2018-12-25 9:35 ` Dmitry Vyukov 2018-12-26 9:03 ` Dmitry Vyukov 2018-12-25 9:41 ` Dmitry Vyukov 1 sibling, 1 reply; 23+ messages in thread From: Dmitry Vyukov @ 2018-12-25 9:35 UTC (permalink / raw) To: Manfred Spraul Cc: syzbot+1145ec2e23165570c3ac, Andrew Morton, David Howells, Eric W. Biederman, ktsanaktsidis, LKML, Michal Hocko, Mike Rapoport, Stephen Rothwell, syzkaller-bugs, Matthew Wilcox, Davidlohr Bueso On Sun, Dec 23, 2018 at 7:38 PM Manfred Spraul <manfred@colorfullife.com> wrote: > > Hello Dmitry, > > On 12/23/18 11:42 AM, Dmitry Vyukov wrote: > > Actually was able to reproduce this with a syzkaller program: > > ./syz-execprog -repeat=0 -procs=10 prog > > ... > > kasan: CONFIG_KASAN_INLINE enabled > > kasan: GPF could be caused by NULL-ptr deref or user memory access > > general protection fault: 0000 [#1] PREEMPT SMP KASAN > > CPU: 1 PID: 8788 Comm: syz-executor8 Not tainted 4.20.0-rc7+ #6 > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 > > RIP: 0010:__list_del_entry_valid+0x7e/0x150 lib/list_debug.c:51 > > Code: ad de 4c 8b 26 49 39 c4 74 66 48 b8 00 02 00 00 00 00 ad de 48 > > 89 da 48 39 c3 74 65 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c > > 02 00 75 7b 48 8b 13 48 39 f2 75 57 49 8d 7c 24 08 48 b8 00 > > RSP: 0018:ffff88804faef210 EFLAGS: 00010a02 > > RAX: dffffc0000000000 RBX: f817edba555e1f00 RCX: ffffffff831bad5f > > RDX: 1f02fdb74aabc3e0 RSI: ffff88801b8a0720 RDI: ffff88801b8a0728 > > RBP: ffff88804faef228 R08: fffff52001055401 R09: fffff52001055401 > > R10: 0000000000000001 R11: fffff52001055400 R12: ffff88802d52cc98 > > R13: ffff88801b8a0728 R14: ffff88801b8a0720 R15: dffffc0000000000 > > FS: 0000000000d24940(0000) GS:ffff88802d500000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 00000000004bb580 CR3: 0000000011177005 CR4: 00000000003606e0 > > DR0: 0000000000000000 DR1: 
0000000000000000 DR2: 0000000000000000 > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > Call Trace: > > __list_del_entry include/linux/list.h:117 [inline] > > list_del include/linux/list.h:125 [inline] > > unlink_queue ipc/sem.c:786 [inline] > > freeary+0xddb/0x1c90 ipc/sem.c:1164 > > free_ipcs+0xf0/0x160 ipc/namespace.c:112 > > sem_exit_ns+0x20/0x40 ipc/sem.c:237 > > free_ipc_ns ipc/namespace.c:120 [inline] > > put_ipc_ns+0x55/0x160 ipc/namespace.c:152 > > free_nsproxy+0xc0/0x1f0 kernel/nsproxy.c:180 > > switch_task_namespaces+0xa5/0xc0 kernel/nsproxy.c:229 > > exit_task_namespaces+0x17/0x20 kernel/nsproxy.c:234 > > do_exit+0x19e5/0x27d0 kernel/exit.c:866 > > do_group_exit+0x151/0x410 kernel/exit.c:970 > > __do_sys_exit_group kernel/exit.c:981 [inline] > > __se_sys_exit_group kernel/exit.c:979 [inline] > > __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:979 > > do_syscall_64+0x192/0x770 arch/x86/entry/common.c:290 > > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > RIP: 0033:0x4570e9 > > Code: 5d af fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 > > 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d > > 01 f0 ff ff 0f 83 2b af fb ff c3 66 2e 0f 1f 84 00 00 00 00 > > RSP: 002b:00007ffe35f12018 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 > > RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00000000004570e9 > > RDX: 0000000000410540 RSI: 0000000000a34c00 RDI: 0000000000000045 > > RBP: 00000000004a43a4 R08: 000000000000000c R09: 0000000000000000 > > R10: 0000000000d24940 R11: 0000000000000246 R12: 0000000000000000 > > R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000008 > > Modules linked in: > > Dumping ftrace buffer: > > (ftrace buffer empty) > > ---[ end trace 17829b0f00569a59 ]--- > > RIP: 0010:__list_del_entry_valid+0x7e/0x150 lib/list_debug.c:51 > > Code: ad de 4c 8b 26 49 39 c4 74 66 48 b8 00 02 00 00 00 00 ad de 48 > > 89 da 48 39 c3 74 65 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c > > 02 00 
75 7b 48 8b 13 48 39 f2 75 57 49 8d 7c 24 08 48 b8 00 > > RSP: 0018:ffff88804faef210 EFLAGS: 00010a02 > > RAX: dffffc0000000000 RBX: f817edba555e1f00 RCX: ffffffff831bad5f > > RDX: 1f02fdb74aabc3e0 RSI: ffff88801b8a0720 RDI: ffff88801b8a0728 > > RBP: ffff88804faef228 R08: fffff52001055401 R09: fffff52001055401 > > R10: 0000000000000001 R11: fffff52001055400 R12: ffff88802d52cc98 > > R13: ffff88801b8a0728 R14: ffff88801b8a0720 R15: dffffc0000000000 > > FS: 0000000000d24940(0000) GS:ffff88802d500000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 00000000004bb580 CR3: 0000000011177005 CR4: 00000000003606e0 > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > > > > > The prog is: > > unshare(0x8020000) > > semget$private(0x0, 0x4007, 0x0) > > > > kernel is on 9105b8aa50c182371533fc97db64fc8f26f051b3 > > > > and again it involved lots of oom kills, the repro eats all memory, a > > process getting killed, frees some memory and the process repeats. > > I was too fast: I can't reproduce the memory leak. > > Can you send me the source for prog? Here is the program: https://gist.githubusercontent.com/dvyukov/03ec54b3429ade16fa07bf8b2379aff3/raw/ae4f654e279810de2505e8fa41b73dc1d77778e6/gistfile1.txt But we concluded this is not a leak, right? It just creates large semaphores tied to a persistent ipcns. Once the process is killed, all memory is released. When this program runs, it eats all memory, then one of the subprocesses is oom-killed, part of memory is released, then all memory is consumed again by a new subprocess and this repeats. If all processes are killed, all memory is released back. It seems to be working as intended. However, what you said about kernel.sem sysctl is useful and I think we need to use it for additional sandboxing of syzkaller test processes. 
I am thinking of applying:

kernel.shmmax = 16777216
kernel.shmall = 536870912
kernel.shmmni = 1024
kernel.msgmax = 8192
kernel.msgmni = 1024
kernel.msgmnb = 1024
kernel.sem = 1024 1048576 500 1024

It should be enough to trigger bugs of any complexity (oom's aside), but should prevent uncontrolled memory consumption.

Looking at the code I figured that these sysctls are per-ipc-namespace, right? I.e. if I do sysctl from an ipcns, the limits will be set only for that ns. I won't use this initially, but something to keep in mind if the global limits will fail in some way.

^ permalink raw reply [flat|nested] 23+ messages in thread
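To make the proposed `kernel.sem` value easier to read, here is a sketch that splits it into its four fields (SEMMSL, SEMMNS, SEMOPM, SEMMNI, in that order) and estimates the memory that semaphores could pin under those limits. The 128 bytes per semaphore figure is an assumption carried over from earlier in this thread, and `parse_kernel_sem` is a hypothetical helper, not an existing tool.

```python
from collections import namedtuple

# Field order of the kernel.sem sysctl: SEMMSL SEMMNS SEMOPM SEMMNI.
SemLimits = namedtuple("SemLimits", "semmsl semmns semopm semmni")

# Assumption from this thread: roughly 128 bytes per semaphore.
BYTES_PER_SEM = 128


def parse_kernel_sem(value):
    """Hypothetical helper: parse a 'kernel.sem' value into named limits."""
    fields = [int(field) for field in value.split()]
    if len(fields) != 4:
        raise ValueError("kernel.sem expects exactly four integers")
    return SemLimits(*fields)


limits = parse_kernel_sem("1024 1048576 500 1024")

# SEMMNS caps the total number of semaphores, so it bounds the payload.
bound_bytes = limits.semmns * BYTES_PER_SEM
print(limits)
print("approximate upper bound: %d MiB" % (bound_bytes // (1024 * 1024)))
```

With these numbers SEMMSL times SEMMNI equals SEMMNS, and the estimate comes to about 128 MiB. If the limits are indeed per IPC namespace, as the message suspects, that bound would apply to each namespace separately rather than to the whole system.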
* Re: general protection fault in put_pid 2018-12-25 9:35 ` Dmitry Vyukov @ 2018-12-26 9:03 ` Dmitry Vyukov 2018-12-30 9:31 ` Dmitry Vyukov 0 siblings, 1 reply; 23+ messages in thread From: Dmitry Vyukov @ 2018-12-26 9:03 UTC (permalink / raw) To: Manfred Spraul, Shakeel Butt Cc: syzbot+1145ec2e23165570c3ac, Andrew Morton, David Howells, Eric W. Biederman, ktsanaktsidis, LKML, Michal Hocko, Mike Rapoport, Stephen Rothwell, syzkaller-bugs, Matthew Wilcox, Davidlohr Bueso On Tue, Dec 25, 2018 at 10:35 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > On Sun, Dec 23, 2018 at 7:38 PM Manfred Spraul <manfred@colorfullife.com> wrote: > > > > Hello Dmitry, > > > > On 12/23/18 11:42 AM, Dmitry Vyukov wrote: > > > Actually was able to reproduce this with a syzkaller program: > > > ./syz-execprog -repeat=0 -procs=10 prog > > > ... > > > kasan: CONFIG_KASAN_INLINE enabled > > > kasan: GPF could be caused by NULL-ptr deref or user memory access > > > general protection fault: 0000 [#1] PREEMPT SMP KASAN > > > CPU: 1 PID: 8788 Comm: syz-executor8 Not tainted 4.20.0-rc7+ #6 > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 > > > RIP: 0010:__list_del_entry_valid+0x7e/0x150 lib/list_debug.c:51 > > > Code: ad de 4c 8b 26 49 39 c4 74 66 48 b8 00 02 00 00 00 00 ad de 48 > > > 89 da 48 39 c3 74 65 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c > > > 02 00 75 7b 48 8b 13 48 39 f2 75 57 49 8d 7c 24 08 48 b8 00 > > > RSP: 0018:ffff88804faef210 EFLAGS: 00010a02 > > > RAX: dffffc0000000000 RBX: f817edba555e1f00 RCX: ffffffff831bad5f > > > RDX: 1f02fdb74aabc3e0 RSI: ffff88801b8a0720 RDI: ffff88801b8a0728 > > > RBP: ffff88804faef228 R08: fffff52001055401 R09: fffff52001055401 > > > R10: 0000000000000001 R11: fffff52001055400 R12: ffff88802d52cc98 > > > R13: ffff88801b8a0728 R14: ffff88801b8a0720 R15: dffffc0000000000 > > > FS: 0000000000d24940(0000) GS:ffff88802d500000(0000) knlGS:0000000000000000 > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > 
> > CR2: 00000000004bb580 CR3: 0000000011177005 CR4: 00000000003606e0 > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > > Call Trace: > > > __list_del_entry include/linux/list.h:117 [inline] > > > list_del include/linux/list.h:125 [inline] > > > unlink_queue ipc/sem.c:786 [inline] > > > freeary+0xddb/0x1c90 ipc/sem.c:1164 > > > free_ipcs+0xf0/0x160 ipc/namespace.c:112 > > > sem_exit_ns+0x20/0x40 ipc/sem.c:237 > > > free_ipc_ns ipc/namespace.c:120 [inline] > > > put_ipc_ns+0x55/0x160 ipc/namespace.c:152 > > > free_nsproxy+0xc0/0x1f0 kernel/nsproxy.c:180 > > > switch_task_namespaces+0xa5/0xc0 kernel/nsproxy.c:229 > > > exit_task_namespaces+0x17/0x20 kernel/nsproxy.c:234 > > > do_exit+0x19e5/0x27d0 kernel/exit.c:866 > > > do_group_exit+0x151/0x410 kernel/exit.c:970 > > > __do_sys_exit_group kernel/exit.c:981 [inline] > > > __se_sys_exit_group kernel/exit.c:979 [inline] > > > __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:979 > > > do_syscall_64+0x192/0x770 arch/x86/entry/common.c:290 > > > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > > RIP: 0033:0x4570e9 > > > Code: 5d af fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 > > > 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d > > > 01 f0 ff ff 0f 83 2b af fb ff c3 66 2e 0f 1f 84 00 00 00 00 > > > RSP: 002b:00007ffe35f12018 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 > > > RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00000000004570e9 > > > RDX: 0000000000410540 RSI: 0000000000a34c00 RDI: 0000000000000045 > > > RBP: 00000000004a43a4 R08: 000000000000000c R09: 0000000000000000 > > > R10: 0000000000d24940 R11: 0000000000000246 R12: 0000000000000000 > > > R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000008 > > > Modules linked in: > > > Dumping ftrace buffer: > > > (ftrace buffer empty) > > > ---[ end trace 17829b0f00569a59 ]--- > > > RIP: 0010:__list_del_entry_valid+0x7e/0x150 
lib/list_debug.c:51 > > > Code: ad de 4c 8b 26 49 39 c4 74 66 48 b8 00 02 00 00 00 00 ad de 48 > > > 89 da 48 39 c3 74 65 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c > > > 02 00 75 7b 48 8b 13 48 39 f2 75 57 49 8d 7c 24 08 48 b8 00 > > > RSP: 0018:ffff88804faef210 EFLAGS: 00010a02 > > > RAX: dffffc0000000000 RBX: f817edba555e1f00 RCX: ffffffff831bad5f > > > RDX: 1f02fdb74aabc3e0 RSI: ffff88801b8a0720 RDI: ffff88801b8a0728 > > > RBP: ffff88804faef228 R08: fffff52001055401 R09: fffff52001055401 > > > R10: 0000000000000001 R11: fffff52001055400 R12: ffff88802d52cc98 > > > R13: ffff88801b8a0728 R14: ffff88801b8a0720 R15: dffffc0000000000 > > > FS: 0000000000d24940(0000) GS:ffff88802d500000(0000) knlGS:0000000000000000 > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > CR2: 00000000004bb580 CR3: 0000000011177005 CR4: 00000000003606e0 > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > > > > > > > > The prog is: > > > unshare(0x8020000) > > > semget$private(0x0, 0x4007, 0x0) > > > > > > kernel is on 9105b8aa50c182371533fc97db64fc8f26f051b3 > > > > > > and again it involved lots of oom kills, the repro eats all memory, a > > > process getting killed, frees some memory and the process repeats. > > > > I was too fast: I can't reproduce the memory leak. > > > > Can you send me the source for prog? > > > Here is the program: > https://gist.githubusercontent.com/dvyukov/03ec54b3429ade16fa07bf8b2379aff3/raw/ae4f654e279810de2505e8fa41b73dc1d77778e6/gistfile1.txt > > But we concluded this is not a leak, right? > It just creates large semaphores tied to a persistent ipcns. Once the > process is killed, all memory is released. When this program runs, it > eats all memory, then one of the subprocesses is oom-killed, part of > memory is released, then all memory is consumed again by a new > subprocess and this repeats. 
If all processes are killed, all memory
> is released back. It seems to be working as intended.
>
> However, what you said about kernel.sem sysctl is useful and I think
> we need to use it for additional sandboxing of syzkaller test
> processes. I am thinking of applying:
>
> kernel.shmmax = 16777216
> kernel.shmall = 536870912
> kernel.shmmni = 1024
> kernel.msgmax = 8192
> kernel.msgmni = 1024
> kernel.msgmnb = 1024
> kernel.sem = 1024 1048576 500 1024
>
> It should be enough to trigger bugs of any complexity (OOMs aside),
> but should prevent uncontrolled memory consumption.
> Looking at the code I figured that these sysctls are
> per-ipc-namespace, right? I.e. if I do sysctl from an ipcns, the
> limits will be set only for that ns. I won't use this initially,
> but something to keep in mind if the global limits fail in some
> way.

+Shakeel who was interested in memory isolation problems

Setting these sysctls globally does not help, as they are reset for
new ipc namespaces (?). Setting them for test process namespaces does
not help either, as it's trivial to do unshare(NEWIPC) (which the
repro in fact does). It seems to make things somewhat better for
syzkaller because any namespaces that a test creates are short-lived.
But this seems to be a general resource isolation issue for
containers.
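As a rough illustration of how the limits listed above could be applied from a test harness, here is a minimal C sketch. It assumes the usual sysctl convention that a name such as kernel.sem maps to the file /proc/sys/kernel/sem. The helper names (sysctl_path, apply_limit, apply_proposed_limits) are invented for this example and are not part of syzkaller. Writing these files needs privilege, and if the per-namespace reading above is right, the values only affect the IPC namespace of the writing process, so this does not address the unshare() concern.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One sysctl setting: name in dotted form, value as written to the file. */
struct ipc_limit {
    const char *name;
    const char *value;
};

/* The values proposed in the message above, copied verbatim. */
static const struct ipc_limit proposed_limits[] = {
    { "kernel.shmmax", "16777216" },
    { "kernel.shmall", "536870912" },
    { "kernel.shmmni", "1024" },
    { "kernel.msgmax", "8192" },
    { "kernel.msgmni", "1024" },
    { "kernel.msgmnb", "1024" },
    { "kernel.sem",    "1024 1048576 500 1024" },
};

/* Hypothetical helper: turn "kernel.sem" into "/proc/sys/kernel/sem".
 * Returns 0 on success, -1 if the path does not fit into out. */
int sysctl_path(const char *name, char *out, size_t outlen)
{
    assert(name != NULL && out != NULL);
    int n = snprintf(out, outlen, "/proc/sys/%s", name);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    /* Replace the dots of the sysctl name with path separators. */
    for (char *p = out + strlen("/proc/sys/"); *p; p++)
        if (*p == '.')
            *p = '/';
    return 0;
}

/* Hypothetical helper: write one limit. Returns 0 on success, -1 on failure
 * (for example when the caller lacks the privilege to open the file). */
int apply_limit(const struct ipc_limit *l)
{
    char path[128];
    if (sysctl_path(l->name, path, sizeof(path)) != 0)
        return -1;
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs(l->value, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Try to apply every proposed limit; returns how many could not be written. */
int apply_proposed_limits(void)
{
    int failed = 0;
    size_t count = sizeof(proposed_limits) / sizeof(proposed_limits[0]);
    for (size_t i = 0; i < count; i++)
        if (apply_limit(&proposed_limits[i]) != 0)
            failed++;
    return failed;
}
```

Calling apply_proposed_limits() with sufficient privilege returns the number of limits that could not be written, so 0 means all seven were applied.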
* Re: general protection fault in put_pid 2018-12-26 9:03 ` Dmitry Vyukov @ 2018-12-30 9:31 ` Dmitry Vyukov 2018-12-31 6:35 ` Dmitry Vyukov 0 siblings, 1 reply; 23+ messages in thread From: Dmitry Vyukov @ 2018-12-30 9:31 UTC (permalink / raw) To: Manfred Spraul, Shakeel Butt Cc: syzbot+1145ec2e23165570c3ac, Andrew Morton, David Howells, Eric W. Biederman, ktsanaktsidis, LKML, Michal Hocko, Mike Rapoport, Stephen Rothwell, syzkaller-bugs, Matthew Wilcox, Davidlohr Bueso On Wed, Dec 26, 2018 at 10:03 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > > Hello Dmitry, > > > > > > On 12/23/18 11:42 AM, Dmitry Vyukov wrote: > > > > Actually was able to reproduce this with a syzkaller program: > > > > ./syz-execprog -repeat=0 -procs=10 prog > > > > ... > > > > kasan: CONFIG_KASAN_INLINE enabled > > > > kasan: GPF could be caused by NULL-ptr deref or user memory access > > > > general protection fault: 0000 [#1] PREEMPT SMP KASAN > > > > CPU: 1 PID: 8788 Comm: syz-executor8 Not tainted 4.20.0-rc7+ #6 > > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 > > > > RIP: 0010:__list_del_entry_valid+0x7e/0x150 lib/list_debug.c:51 > > > > Code: ad de 4c 8b 26 49 39 c4 74 66 48 b8 00 02 00 00 00 00 ad de 48 > > > > 89 da 48 39 c3 74 65 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c > > > > 02 00 75 7b 48 8b 13 48 39 f2 75 57 49 8d 7c 24 08 48 b8 00 > > > > RSP: 0018:ffff88804faef210 EFLAGS: 00010a02 > > > > RAX: dffffc0000000000 RBX: f817edba555e1f00 RCX: ffffffff831bad5f > > > > RDX: 1f02fdb74aabc3e0 RSI: ffff88801b8a0720 RDI: ffff88801b8a0728 > > > > RBP: ffff88804faef228 R08: fffff52001055401 R09: fffff52001055401 > > > > R10: 0000000000000001 R11: fffff52001055400 R12: ffff88802d52cc98 > > > > R13: ffff88801b8a0728 R14: ffff88801b8a0720 R15: dffffc0000000000 > > > > FS: 0000000000d24940(0000) GS:ffff88802d500000(0000) knlGS:0000000000000000 > > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > > CR2: 00000000004bb580 CR3: 
0000000011177005 CR4: 00000000003606e0 > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > > > Call Trace: > > > > __list_del_entry include/linux/list.h:117 [inline] > > > > list_del include/linux/list.h:125 [inline] > > > > unlink_queue ipc/sem.c:786 [inline] > > > > freeary+0xddb/0x1c90 ipc/sem.c:1164 > > > > free_ipcs+0xf0/0x160 ipc/namespace.c:112 > > > > sem_exit_ns+0x20/0x40 ipc/sem.c:237 > > > > free_ipc_ns ipc/namespace.c:120 [inline] > > > > put_ipc_ns+0x55/0x160 ipc/namespace.c:152 > > > > free_nsproxy+0xc0/0x1f0 kernel/nsproxy.c:180 > > > > switch_task_namespaces+0xa5/0xc0 kernel/nsproxy.c:229 > > > > exit_task_namespaces+0x17/0x20 kernel/nsproxy.c:234 > > > > do_exit+0x19e5/0x27d0 kernel/exit.c:866 > > > > do_group_exit+0x151/0x410 kernel/exit.c:970 > > > > __do_sys_exit_group kernel/exit.c:981 [inline] > > > > __se_sys_exit_group kernel/exit.c:979 [inline] > > > > __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:979 > > > > do_syscall_64+0x192/0x770 arch/x86/entry/common.c:290 > > > > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > > > RIP: 0033:0x4570e9 > > > > Code: 5d af fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 > > > > 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d > > > > 01 f0 ff ff 0f 83 2b af fb ff c3 66 2e 0f 1f 84 00 00 00 00 > > > > RSP: 002b:00007ffe35f12018 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 > > > > RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00000000004570e9 > > > > RDX: 0000000000410540 RSI: 0000000000a34c00 RDI: 0000000000000045 > > > > RBP: 00000000004a43a4 R08: 000000000000000c R09: 0000000000000000 > > > > R10: 0000000000d24940 R11: 0000000000000246 R12: 0000000000000000 > > > > R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000008 > > > > Modules linked in: > > > > Dumping ftrace buffer: > > > > (ftrace buffer empty) > > > > ---[ end trace 17829b0f00569a59 ]--- > > > > RIP: 
0010:__list_del_entry_valid+0x7e/0x150 lib/list_debug.c:51 > > > > Code: ad de 4c 8b 26 49 39 c4 74 66 48 b8 00 02 00 00 00 00 ad de 48 > > > > 89 da 48 39 c3 74 65 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c > > > > 02 00 75 7b 48 8b 13 48 39 f2 75 57 49 8d 7c 24 08 48 b8 00 > > > > RSP: 0018:ffff88804faef210 EFLAGS: 00010a02 > > > > RAX: dffffc0000000000 RBX: f817edba555e1f00 RCX: ffffffff831bad5f > > > > RDX: 1f02fdb74aabc3e0 RSI: ffff88801b8a0720 RDI: ffff88801b8a0728 > > > > RBP: ffff88804faef228 R08: fffff52001055401 R09: fffff52001055401 > > > > R10: 0000000000000001 R11: fffff52001055400 R12: ffff88802d52cc98 > > > > R13: ffff88801b8a0728 R14: ffff88801b8a0720 R15: dffffc0000000000 > > > > FS: 0000000000d24940(0000) GS:ffff88802d500000(0000) knlGS:0000000000000000 > > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > > CR2: 00000000004bb580 CR3: 0000000011177005 CR4: 00000000003606e0 > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > > > > > > > > > > > The prog is: > > > > unshare(0x8020000) > > > > semget$private(0x0, 0x4007, 0x0) > > > > > > > > kernel is on 9105b8aa50c182371533fc97db64fc8f26f051b3 > > > > > > > > and again it involved lots of oom kills, the repro eats all memory, a > > > > process getting killed, frees some memory and the process repeats. > > > > > > I was too fast: I can't reproduce the memory leak. > > > > > > Can you send me the source for prog? > > > > > > Here is the program: > > https://gist.githubusercontent.com/dvyukov/03ec54b3429ade16fa07bf8b2379aff3/raw/ae4f654e279810de2505e8fa41b73dc1d77778e6/gistfile1.txt > > > > But we concluded this is not a leak, right? > > It just creates large semaphores tied to a persistent ipcns. Once the > > process is killed, all memory is released. 
When this program runs, it > > eats all memory, then one of the subprocesses is oom-killed, part of > > memory is released, then all memory is consumed again by a new > > subprocess and this repeats. If all processes are killed, all memory > > is released back. It seems to be working as intended. > > > > However, what you said about kernel.sem sysctl is useful and I think > > we need to use it for additional sandboxing of syzkaller test > > processes. I am thinking of applying: > > > > kernel.shmmax = 16777216 > > kernel.shmall = 536870912 > > kernel.shmmni = 1024 > > kernel.msgmax = 8192 > > kernel.msgmni = 1024 > > kernel.msgmnb = 1024 > > kernel.sem = 1024 1048576 500 1024 > > > > It should be enough to trigger bugs of any complexity (oom's aside), > > but should prevent uncontrolled memory consumption. > > Looking at the code I figured that these sysctls are > > per-ipc-namespace, right? I.e. if I do sysctl from an ipcns, the > > limits will be set only only for that ns. I won't use this initially, > > but something to keep in mind if the global limits will fail in some > > way. > > +Shakeel who was interested in memory isolation problems > > Setting these sysctl's globally does not help, as they are reset for > new ipc namespaces (?). Setting them for test process namespaces does > not help either, as it's trivial to do unshare(NEWIPC) (which the > repro in fact does). It seems to make things somewhat better for > syzkaller because any namespaces that a test creates are short-lived. > But this seems to be a general resource isolation issue for > containers. The stack overflow was reported 5 months ago with a bunch of repros: https://groups.google.com/forum/#!msg/syzkaller-bugs/C7d0Hm6YcDM/nQeciKgtCgAJ now we are spending time re-debugging other incarnations of the same bug. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: general protection fault in put_pid 2018-12-30 9:31 ` Dmitry Vyukov @ 2018-12-31 6:35 ` Dmitry Vyukov 0 siblings, 0 replies; 23+ messages in thread From: Dmitry Vyukov @ 2018-12-31 6:35 UTC (permalink / raw) To: Manfred Spraul, Shakeel Butt Cc: syzbot+1145ec2e23165570c3ac, Andrew Morton, David Howells, Eric W. Biederman, ktsanaktsidis, LKML, Michal Hocko, Mike Rapoport, Stephen Rothwell, syzkaller-bugs, Matthew Wilcox, Davidlohr Bueso On Sun, Dec 30, 2018 at 10:31 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > On Wed, Dec 26, 2018 at 10:03 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > > > Hello Dmitry, > > > > > > > > On 12/23/18 11:42 AM, Dmitry Vyukov wrote: > > > > > Actually was able to reproduce this with a syzkaller program: > > > > > ./syz-execprog -repeat=0 -procs=10 prog > > > > > ... > > > > > kasan: CONFIG_KASAN_INLINE enabled > > > > > kasan: GPF could be caused by NULL-ptr deref or user memory access > > > > > general protection fault: 0000 [#1] PREEMPT SMP KASAN > > > > > CPU: 1 PID: 8788 Comm: syz-executor8 Not tainted 4.20.0-rc7+ #6 > > > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 > > > > > RIP: 0010:__list_del_entry_valid+0x7e/0x150 lib/list_debug.c:51 > > > > > Code: ad de 4c 8b 26 49 39 c4 74 66 48 b8 00 02 00 00 00 00 ad de 48 > > > > > 89 da 48 39 c3 74 65 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c > > > > > 02 00 75 7b 48 8b 13 48 39 f2 75 57 49 8d 7c 24 08 48 b8 00 > > > > > RSP: 0018:ffff88804faef210 EFLAGS: 00010a02 > > > > > RAX: dffffc0000000000 RBX: f817edba555e1f00 RCX: ffffffff831bad5f > > > > > RDX: 1f02fdb74aabc3e0 RSI: ffff88801b8a0720 RDI: ffff88801b8a0728 > > > > > RBP: ffff88804faef228 R08: fffff52001055401 R09: fffff52001055401 > > > > > R10: 0000000000000001 R11: fffff52001055400 R12: ffff88802d52cc98 > > > > > R13: ffff88801b8a0728 R14: ffff88801b8a0720 R15: dffffc0000000000 > > > > > FS: 0000000000d24940(0000) GS:ffff88802d500000(0000) knlGS:0000000000000000 > > > > > 
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > > > CR2: 00000000004bb580 CR3: 0000000011177005 CR4: 00000000003606e0 > > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > > > > Call Trace: > > > > > __list_del_entry include/linux/list.h:117 [inline] > > > > > list_del include/linux/list.h:125 [inline] > > > > > unlink_queue ipc/sem.c:786 [inline] > > > > > freeary+0xddb/0x1c90 ipc/sem.c:1164 > > > > > free_ipcs+0xf0/0x160 ipc/namespace.c:112 > > > > > sem_exit_ns+0x20/0x40 ipc/sem.c:237 > > > > > free_ipc_ns ipc/namespace.c:120 [inline] > > > > > put_ipc_ns+0x55/0x160 ipc/namespace.c:152 > > > > > free_nsproxy+0xc0/0x1f0 kernel/nsproxy.c:180 > > > > > switch_task_namespaces+0xa5/0xc0 kernel/nsproxy.c:229 > > > > > exit_task_namespaces+0x17/0x20 kernel/nsproxy.c:234 > > > > > do_exit+0x19e5/0x27d0 kernel/exit.c:866 > > > > > do_group_exit+0x151/0x410 kernel/exit.c:970 > > > > > __do_sys_exit_group kernel/exit.c:981 [inline] > > > > > __se_sys_exit_group kernel/exit.c:979 [inline] > > > > > __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:979 > > > > > do_syscall_64+0x192/0x770 arch/x86/entry/common.c:290 > > > > > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > > > > RIP: 0033:0x4570e9 > > > > > Code: 5d af fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 > > > > > 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d > > > > > 01 f0 ff ff 0f 83 2b af fb ff c3 66 2e 0f 1f 84 00 00 00 00 > > > > > RSP: 002b:00007ffe35f12018 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 > > > > > RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00000000004570e9 > > > > > RDX: 0000000000410540 RSI: 0000000000a34c00 RDI: 0000000000000045 > > > > > RBP: 00000000004a43a4 R08: 000000000000000c R09: 0000000000000000 > > > > > R10: 0000000000d24940 R11: 0000000000000246 R12: 0000000000000000 > > > > > R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000008 > 
> > > > Modules linked in: > > > > > Dumping ftrace buffer: > > > > > (ftrace buffer empty) > > > > > ---[ end trace 17829b0f00569a59 ]--- > > > > > RIP: 0010:__list_del_entry_valid+0x7e/0x150 lib/list_debug.c:51 > > > > > Code: ad de 4c 8b 26 49 39 c4 74 66 48 b8 00 02 00 00 00 00 ad de 48 > > > > > 89 da 48 39 c3 74 65 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c > > > > > 02 00 75 7b 48 8b 13 48 39 f2 75 57 49 8d 7c 24 08 48 b8 00 > > > > > RSP: 0018:ffff88804faef210 EFLAGS: 00010a02 > > > > > RAX: dffffc0000000000 RBX: f817edba555e1f00 RCX: ffffffff831bad5f > > > > > RDX: 1f02fdb74aabc3e0 RSI: ffff88801b8a0720 RDI: ffff88801b8a0728 > > > > > RBP: ffff88804faef228 R08: fffff52001055401 R09: fffff52001055401 > > > > > R10: 0000000000000001 R11: fffff52001055400 R12: ffff88802d52cc98 > > > > > R13: ffff88801b8a0728 R14: ffff88801b8a0720 R15: dffffc0000000000 > > > > > FS: 0000000000d24940(0000) GS:ffff88802d500000(0000) knlGS:0000000000000000 > > > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > > > CR2: 00000000004bb580 CR3: 0000000011177005 CR4: 00000000003606e0 > > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > > > > > > > > > > > > > > The prog is: > > > > > unshare(0x8020000) > > > > > semget$private(0x0, 0x4007, 0x0) > > > > > > > > > > kernel is on 9105b8aa50c182371533fc97db64fc8f26f051b3 > > > > > > > > > > and again it involved lots of oom kills, the repro eats all memory, a > > > > > process getting killed, frees some memory and the process repeats. > > > > > > > > I was too fast: I can't reproduce the memory leak. > > > > > > > > Can you send me the source for prog? > > > > > > > > > Here is the program: > > > https://gist.githubusercontent.com/dvyukov/03ec54b3429ade16fa07bf8b2379aff3/raw/ae4f654e279810de2505e8fa41b73dc1d77778e6/gistfile1.txt > > > > > > But we concluded this is not a leak, right? 
> > > It just creates large semaphores tied to a persistent ipcns. Once the > > > process is killed, all memory is released. When this program runs, it > > > eats all memory, then one of the subprocesses is oom-killed, part of > > > memory is released, then all memory is consumed again by a new > > > subprocess and this repeats. If all processes are killed, all memory > > > is released back. It seems to be working as intended. > > > > > > However, what you said about kernel.sem sysctl is useful and I think > > > we need to use it for additional sandboxing of syzkaller test > > > processes. I am thinking of applying: > > > > > > kernel.shmmax = 16777216 > > > kernel.shmall = 536870912 > > > kernel.shmmni = 1024 > > > kernel.msgmax = 8192 > > > kernel.msgmni = 1024 > > > kernel.msgmnb = 1024 > > > kernel.sem = 1024 1048576 500 1024 > > > > > > It should be enough to trigger bugs of any complexity (oom's aside), > > > but should prevent uncontrolled memory consumption. > > > Looking at the code I figured that these sysctls are > > > per-ipc-namespace, right? I.e. if I do sysctl from an ipcns, the > > > limits will be set only only for that ns. I won't use this initially, > > > but something to keep in mind if the global limits will fail in some > > > way. > > > > +Shakeel who was interested in memory isolation problems > > > > Setting these sysctl's globally does not help, as they are reset for > > new ipc namespaces (?). Setting them for test process namespaces does > > not help either, as it's trivial to do unshare(NEWIPC) (which the > > repro in fact does). It seems to make things somewhat better for > > syzkaller because any namespaces that a test creates are short-lived. > > But this seems to be a general resource isolation issue for > > containers. 
> >
> The stack overflow was reported 5 months ago with a bunch of repros:
> https://groups.google.com/forum/#!msg/syzkaller-bugs/C7d0Hm6YcDM/nQeciKgtCgAJ
> now we are spending time re-debugging other incarnations of the same bug.

FTR, the main place to track the stack overflow is now this thread:
https://groups.google.com/forum/#!msg/syzkaller-bugs/nFeC8-UG1gg/_KMuN0ViFQAJ

Manfred, you are proceeding with submission of the race fix, right?

Since it includes 3 Reported-by tags I will not mark these bugs as dup
of "kernel panic: corrupted stack end in wb_workfn", otherwise it will
cause confusion (the patch will appear as fixing the stack overflow,
which it does not). Or, if you remove the tags, we can mark these 3
bugs as a dup.
* Re: general protection fault in put_pid 2018-12-23 12:32 ` Manfred Spraul 2018-12-25 9:35 ` Dmitry Vyukov @ 2018-12-25 9:41 ` Dmitry Vyukov 1 sibling, 0 replies; 23+ messages in thread From: Dmitry Vyukov @ 2018-12-25 9:41 UTC (permalink / raw) To: Manfred Spraul Cc: syzbot+1145ec2e23165570c3ac, Andrew Morton, David Howells, Eric W. Biederman, ktsanaktsidis, LKML, Michal Hocko, Mike Rapoport, Stephen Rothwell, syzkaller-bugs, Matthew Wilcox, Davidlohr Bueso On Sun, Dec 23, 2018 at 1:32 PM Manfred Spraul <manfred@colorfullife.com> wrote: > > Hi Dmitry, > > let's simplify the mail, otherwise noone can follow: > > On 12/23/18 11:42 AM, Dmitry Vyukov wrote: > > > >> My naive attempts to re-reproduce this failed so far. > >> But I noticed that _all_ logs for these 3 crashes: > >> https://syzkaller.appspot.com/bug?extid=c92d3646e35bc5d1a909 > >> https://syzkaller.appspot.com/bug?extid=1145ec2e23165570c3ac > >> https://syzkaller.appspot.com/bug?extid=9d8b6fa6ee7636f350c1 > >> involve low memory conditions. My gut feeling says this is not a > >> coincidence. This is also probably the reason why all reproducers > >> create large sem sets. There must be some bad interaction between low > >> memory condition and semaphores/ipc namespaces. > > > > Actually was able to reproduce this with a syzkaller program: > > > > ./syz-execprog -repeat=0 -procs=10 prog > > ... 
> > kasan: CONFIG_KASAN_INLINE enabled > > kasan: GPF could be caused by NULL-ptr deref or user memory access > > general protection fault: 0000 [#1] PREEMPT SMP KASAN > > CPU: 1 PID: 8788 Comm: syz-executor8 Not tainted 4.20.0-rc7+ #6 > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 > > RIP: 0010:__list_del_entry_valid+0x7e/0x150 lib/list_debug.c:51 > > Code: ad de 4c 8b 26 49 39 c4 74 66 48 b8 00 02 00 00 00 00 ad de 48 > > 89 da 48 39 c3 74 65 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c > > 02 00 75 7b 48 8b 13 48 39 f2 75 57 49 8d 7c 24 08 48 b8 00 > > RSP: 0018:ffff88804faef210 EFLAGS: 00010a02 > > RAX: dffffc0000000000 RBX: f817edba555e1f00 RCX: ffffffff831bad5f > > RDX: 1f02fdb74aabc3e0 RSI: ffff88801b8a0720 RDI: ffff88801b8a0728 > > RBP: ffff88804faef228 R08: fffff52001055401 R09: fffff52001055401 > > R10: 0000000000000001 R11: fffff52001055400 R12: ffff88802d52cc98 > > R13: ffff88801b8a0728 R14: ffff88801b8a0720 R15: dffffc0000000000 > > FS: 0000000000d24940(0000) GS:ffff88802d500000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 00000000004bb580 CR3: 0000000011177005 CR4: 00000000003606e0 > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > Call Trace: > > __list_del_entry include/linux/list.h:117 [inline] > > list_del include/linux/list.h:125 [inline] > > unlink_queue ipc/sem.c:786 [inline] > > freeary+0xddb/0x1c90 ipc/sem.c:1164 > > free_ipcs+0xf0/0x160 ipc/namespace.c:112 > > sem_exit_ns+0x20/0x40 ipc/sem.c:237 > > free_ipc_ns ipc/namespace.c:120 [inline] > > put_ipc_ns+0x55/0x160 ipc/namespace.c:152 > > free_nsproxy+0xc0/0x1f0 kernel/nsproxy.c:180 > > switch_task_namespaces+0xa5/0xc0 kernel/nsproxy.c:229 > > exit_task_namespaces+0x17/0x20 kernel/nsproxy.c:234 > > do_exit+0x19e5/0x27d0 kernel/exit.c:866 > > do_group_exit+0x151/0x410 kernel/exit.c:970 > > __do_sys_exit_group 
kernel/exit.c:981 [inline] > > __se_sys_exit_group kernel/exit.c:979 [inline] > > __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:979 > > do_syscall_64+0x192/0x770 arch/x86/entry/common.c:290 > > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > RIP: 0033:0x4570e9 > > Code: 5d af fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 > > 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d > > 01 f0 ff ff 0f 83 2b af fb ff c3 66 2e 0f 1f 84 00 00 00 00 > > RSP: 002b:00007ffe35f12018 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 > > RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00000000004570e9 > > RDX: 0000000000410540 RSI: 0000000000a34c00 RDI: 0000000000000045 > > RBP: 00000000004a43a4 R08: 000000000000000c R09: 0000000000000000 > > R10: 0000000000d24940 R11: 0000000000000246 R12: 0000000000000000 > > R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000008 > > Modules linked in: > > Dumping ftrace buffer: > > (ftrace buffer empty) > > ---[ end trace 17829b0f00569a59 ]--- > > RIP: 0010:__list_del_entry_valid+0x7e/0x150 lib/list_debug.c:51 > > Code: ad de 4c 8b 26 49 39 c4 74 66 48 b8 00 02 00 00 00 00 ad de 48 > > 89 da 48 39 c3 74 65 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c > > 02 00 75 7b 48 8b 13 48 39 f2 75 57 49 8d 7c 24 08 48 b8 00 > > RSP: 0018:ffff88804faef210 EFLAGS: 00010a02 > > RAX: dffffc0000000000 RBX: f817edba555e1f00 RCX: ffffffff831bad5f > > RDX: 1f02fdb74aabc3e0 RSI: ffff88801b8a0720 RDI: ffff88801b8a0728 > > RBP: ffff88804faef228 R08: fffff52001055401 R09: fffff52001055401 > > R10: 0000000000000001 R11: fffff52001055400 R12: ffff88802d52cc98 > > R13: ffff88801b8a0728 R14: ffff88801b8a0720 R15: dffffc0000000000 > > FS: 0000000000d24940(0000) GS:ffff88802d500000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 00000000004bb580 CR3: 0000000011177005 CR4: 00000000003606e0 > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > DR3: 0000000000000000 DR6: 
00000000fffe0ff0 DR7: 0000000000000400
> >
> >
> > The prog is:
> > unshare(0x8020000)
> > semget$private(0x0, 0x4007, 0x0)
> >
> > kernel is on 9105b8aa50c182371533fc97db64fc8f26f051b3
> >
> > and again it involved lots of oom kills, the repro eats all memory, a
> > process getting killed, frees some memory and the process repeats.
>
> Ok, thus the above program triggers two bugs:
>
> - a huge memory leak with semaphore arrays
>
> - under OOM pressure, an oops.
>
>
> 1) I can reproduce the memory leak, it happens all the time :-(
>
> I must look what is wrong.
>
> 2) regarding the crash:
>
> What differs under oom pressure?
>
> - kvmalloc can fall back to vmalloc()
>
> - the 2nd or 3rd of multiple allocations can fail, and that triggers a
> rare codepath/race condition.
>
> - rcu callback can happen earlier than expected
>
> So far, I didn't notice anything unexpected :-(

I started suspecting a stack overflow. But I was afraid it may be a
KASAN artifact, as KASAN both increases stack usage and disables vmap
stacks. But I was able to reproduce this without KASAN and root-cause
it at the same time.
I am on v4.20, config is (basically just defconfig+kvmconfig): https://gist.githubusercontent.com/dvyukov/f8401c8da367088c789bfb953d42d3b3/raw/eac0e85d3db577ba68ec59acf916899b61741ee1/gistfile1.txt Running the syzkaller program gave me: Out of memory: Kill process 13971 (syz-executor) score 998 or sacrifice child Killed process 13971 (syz-executor) total-vm:37512kB, anon-rss:92kB, file-rss:0kB, shmem-rss:0kB oom_reaper: reaped process 13971 (syz-executor), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB Kernel panic - not syncing: corrupted stack end detected inside scheduler CPU: 3 PID: 2555 Comm: kworker/u12:3 Not tainted 4.20.0-rc7+ #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Workqueue: writeback wb_workfn (flush-8:0) Call Trace: dump_stack+0x1d4/0x2b5 lib/earlycpio.c:120 panic+0x25e/0x49c kernel/cpu.c:617 __schedule+0x1be8/0x21d0 preempt_schedule_common+0x35/0xe0 preempt_schedule+0x23/0x30 ___preempt_schedule+0x16/0x18 _raw_spin_unlock_irq+0x75/0x80 mark_work_canceling kernel/workqueue.c:747 [inline] __flush_work+0x4f5/0x970 kernel/workqueue.c:2996 flush_work+0x17/0x20 kernel/workqueue.c:3059 drain_all_pages+0x418/0x680 mm/page_alloc.c:4570 __alloc_pages_slowpath+0xb76/0x2c10 mm/page_alloc.c:4072 __alloc_pages_nodemask+0xa6c/0xe10 mm/page_alloc.c:5029 cache_grow_begin+0x9d/0x8a0 fallback_alloc+0x204/0x2e0 ____cache_alloc_node+0x1cc/0x1f0 slab_alloc_node mm/slub.c:2710 [inline] slab_alloc mm/slub.c:2752 [inline] kmem_cache_alloc+0x296/0x720 mm/slub.c:2769 mempool_alloc_slab+0x44/0x60 mm/mempool.c:130 mempool_alloc+0x174/0x4e0 mm/mempool.c:433 bvec_alloc+0x150/0x2d0 block/bio.c:485 bio_alloc_bioset+0x44e/0x650 block/bio.c:1455 ext4_bio_write_page+0xc11/0x1780 fs/ext4/resize.c:76 mpage_add_bh_to_extent fs/ext4/inode.c:2300 [inline] mpage_submit_page+0x138/0x230 fs/ext4/inode.c:2335 ext4_da_page_release_reservation fs/ext4/inode.c:1651 [inline] mpage_process_page_bufs+0x429/0x500 fs/ext4/inode.c:3226 
mpage_prepare_extent_to_map+0xb2a/0x1640 fs/ext4/inode.c:154 ext4_inode_journal_mode fs/ext4/ext4_jbd2.h:411 [inline] ext4_should_journal_data fs/ext4/ext4_jbd2.h:427 [inline] ext4_writepages+0x112c/0x3a20 fs/ext4/inode.c:2190 test_and_set_bit arch/x86/include/asm/bitops.h:220 [inline] TestSetPageDirty include/linux/page-flags.h:287 [inline] do_writepages+0xfc/0x170 mm/page-writeback.c:2383 mark_inode_dirty_sync include/linux/fs.h:2124 [inline] __writeback_single_inode+0x1cd/0x12e0 fs/fs-writeback.c:1372 writeback_sb_inodes+0x6c7/0x1040 fs/fs-writeback.c:1795 __writeback_inodes_wb+0x1a3/0x310 fs/fs-writeback.c:1704 wb_writeback+0x92c/0xe10 include/trace/events/writeback.h:572 syz-executor invoked oom-killer: gfp_mask=0x7080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), nodemask=(null), order=3, oom_score_adj=0 syz-executor cpuset=/ mems_allowed=0-1 wb_workfn+0xdf3/0x1600 fs/pnode.c:430 get_unbound_pool kernel/workqueue.c:3437 [inline] process_one_work+0xcf3/0x1be0 kernel/workqueue.c:3612 worker_thread+0x17d/0x12f0 kernel/workqueue.c:2289 __write_once_size include/linux/compiler.h:218 [inline] __list_del include/linux/list.h:106 [inline] __list_del_entry include/linux/list.h:120 [inline] list_del_init include/linux/list.h:159 [inline] kthread+0x354/0x430 kernel/kthread.c:1010 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:358 CPU: 0 PID: 6768 Comm: syz-executor Not tainted 4.20.0-rc7+ #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: dump_stack+0x1d4/0x2b5 lib/earlycpio.c:120 dump_header+0x294/0xfaf oom_killer_enable mm/oom_kill.c:715 [inline] oom_kill_process+0xa3f/0xd20 mm/oom_kill.c:750 out_of_memory+0x88c/0x12a0 mm/fadvise.c:184 compound_order include/linux/mm.h:707 [inline] page_hstate include/linux/hugetlb.h:469 [inline] __alloc_pages_slowpath+0x1cfa/0x2c10 mm/page_alloc.c:7820 __alloc_pages_nodemask+0xa6c/0xe10 mm/page_alloc.c:5029 copy_process+0x94c/0x7b00 variable_test_bit arch/x86/include/asm/bitops.h:332 [inline] 
cpumask_test_cpu include/linux/cpumask.h:344 [inline] trace_sched_process_fork include/trace/events/sched.h:288 [inline] _do_fork+0x191/0xf20 kernel/fork.c:2232 __x64_sys_clone+0xbf/0x150 kernel/fork.c:2340 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline] syscall_return_slowpath arch/x86/entry/common.c:268 [inline] do_syscall_32_irqs_on arch/x86/entry/common.c:341 [inline] do_syscall_64+0x192/0x770 arch/x86/entry/common.c:349 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x45578b Code: db 45 85 f6 0f 85 95 01 00 00 64 4c 8b 04 25 10 00 00 00 31 d2 4d 8d 90 d0 02 00 00 31 f6 bf 11 00 20 01 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 d6 00 00 00 85 c0 41 89 c5 0f 85 dd 00 00 RSP: 002b:00007fff9dc6ca20 EFLAGS: 00000246 ORIG_RAX: 0000000000000038 RAX: ffffffffffffffda RBX: 00007fff9dc6ca20 RCX: 000000000045578b RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011 RBP: 00007fff9dc6ca70 R08: 0000000001d0d940 R09: 0000000000000000 R10: 0000000001d0dc10 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000020 R14: 0000000000000000 R15: 0000000000000000 and second time: [ 281.244340] Kernel panic - not syncing: corrupted stack end detected inside scheduler [ 281.245754] CPU: 2 PID: 6265 Comm: kworker/u12:4 Not tainted 4.20.0-rc7+ #6 [ 281.246887] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [ 281.248240] Workqueue: writeback wb_workfn (flush-8:0) [ 281.248992] Call Trace: [ 281.249364] dump_stack+0x1d4/0x2b5 [ 281.252261] panic+0x25e/0x49c [ 281.255403] __schedule+0x1be8/0x21d0 [ 281.263754] preempt_schedule_common+0x35/0xe0 [ 281.264425] preempt_schedule+0x23/0x30 [ 281.265010] ___preempt_schedule+0x16/0x18 [ 281.265635] _raw_spin_unlock_irqrestore+0xbf/0xe0 [ 281.266357] __remove_mapping+0x77b/0x17e0 [ 281.291388] shrink_page_list+0x5232/0xa6b0 [ 281.414732] shrink_inactive_list+0x997/0x1ab0 [ 281.419009] shrink_node_memcg+0x9de/0x16a0 [ 281.424799] shrink_node+0x3af/0x1530 [ 281.433316] 
do_try_to_free_pages+0x3bc/0x1170 [ 281.435723] try_to_free_pages+0x43c/0x9e0 [ 281.442644] __alloc_pages_slowpath+0xa4c/0x2c10 [ 281.459197] __alloc_pages_nodemask+0xa6c/0xe10 [ 281.466504] alloc_pages_current+0xb6/0x1e0 [ 281.467326] __page_cache_alloc+0x332/0x560 [ 281.471049] pagecache_get_page+0x2af/0xdd0 [ 281.487360] __getblk_gfp+0x36e/0xd50 [ 281.497989] ext4_read_block_bitmap_nowait+0x2ed/0x1e10 [ 281.509111] ext4_read_block_bitmap+0x23/0x80 [ 281.509934] ext4_mb_mark_diskspace_used+0x180/0x10a0 [ 281.512755] ext4_mb_new_blocks+0xeb7/0x4260 [ 281.540189] ext4_ext_map_blocks+0x2776/0x5b00 [ 281.556040] ext4_map_blocks+0xcaa/0x1860 [ 281.559967] ext4_writepages+0x1e4c/0x3a20 [ 281.575738] do_writepages+0xfc/0x170 [ 281.578546] __writeback_single_inode+0x1cd/0x12e0 [ 281.592498] writeback_sb_inodes+0x6c7/0x1040 [ 281.598601] __writeback_inodes_wb+0x1a3/0x310 [ 281.600816] wb_writeback+0x92c/0xe10 [ 281.618064] wb_workfn+0xdf3/0x1600 [ 281.635970] process_one_work+0xcf3/0x1be0 [ 281.662614] worker_thread+0x17d/0x12f0 [ 281.680989] kthread+0x354/0x430 [ 281.682529] ret_from_fork+0x3a/0x50 One time it took about 10 seconds and another time it took 5 minutes. Whom should we route this to? It looks both mm and ext4 related. ^ permalink raw reply [flat|nested] 23+ messages in thread
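For reference, the two-call syzkaller program quoted in this message corresponds roughly to the C below. This is a sketch, not the generated reproducer from the gist: it assumes the raw flag value 0x8020000 decodes to CLONE_NEWNS | CLONE_NEWIPC (checked at compile time) and that semget$private means semget() with IPC_PRIVATE as the key. The names repro_once and REPRO_NSEMS are invented here. The actual runs used many processes in a loop (-repeat=0 -procs=10), so a single call is not expected to crash anything.

```c
#include <assert.h>
#include <linux/sched.h>   /* CLONE_NEWNS, CLONE_NEWIPC */
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/syscall.h>
#include <unistd.h>

/* The raw flag value in the syzkaller program decodes to these two flags. */
static_assert((CLONE_NEWNS | CLONE_NEWIPC) == 0x8020000,
              "unexpected CLONE_* flag values");

/* Semaphores per set, as in the quoted program (16391 decimal). */
#define REPRO_NSEMS 0x4007

/*
 * One iteration of the quoted program.
 * Returns the id of the new semaphore set on success,
 * -1 if unshare() failed (typically EPERM without CAP_SYS_ADMIN),
 * -2 if semget() failed (for example ENOSPC or ENOMEM).
 */
int repro_once(void)
{
    /* Raw syscall, in the style of syzkaller-generated C reproducers. */
    if (syscall(SYS_unshare, CLONE_NEWNS | CLONE_NEWIPC) != 0)
        return -1;

    /* IPC_PRIVATE always creates a new set; it lives in the new ipcns. */
    int id = semget(IPC_PRIVATE, REPRO_NSEMS, 0);
    if (id < 0)
        return -2;
    return id;
}
```

The semaphore set is never removed explicitly, so it stays allocated until the IPC namespace created by unshare() goes away, which is the freeary() path seen in the traces above.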
* Re: general protection fault in put_pid 2018-12-23 9:57 ` Dmitry Vyukov 2018-12-23 10:30 ` Dmitry Vyukov @ 2018-12-23 12:25 ` Manfred Spraul 2019-01-03 22:18 ` Shakeel Butt 1 sibling, 1 reply; 23+ messages in thread
From: Manfred Spraul @ 2018-12-23 12:25 UTC
To: Dmitry Vyukov
Cc: syzbot+1145ec2e23165570c3ac, Andrew Morton, David Howells, Eric W. Biederman, ktsanaktsidis, LKML, Michal Hocko, Mike Rapoport, Stephen Rothwell, syzkaller-bugs, Matthew Wilcox, Davidlohr Bueso

Hello Dmitry,

On 12/23/18 10:57 AM, Dmitry Vyukov wrote:
>
> I can reproduce this infinite memory consumption with the C program:
> https://gist.githubusercontent.com/dvyukov/03ec54b3429ade16fa07bf8b2379aff3/raw/ae4f654e279810de2505e8fa41b73dc1d77778e6/gistfile1.txt
>
> But this is working as intended, right? It just creates infinite
> number of large semaphore sets, which reasonably consumes infinite
> amount of memory.
> Except that it also violates the memcg bound and a process can have
> effectively unlimited amount of such "drum memory" in semaphores.

Yes, this is as intended:

If you call semget(), then you can use memory, up to the limits in
/proc/sys/kernel/sem.

Memcg is not taken into account; an admin must set /proc/sys/kernel/sem.

The defaults are "infinite amount of memory allowed", as this is the
most sane default: We had logic that tried to autotune (i.e. a new
namespace "inherits" a fraction of the parent namespace's memory
limits), but this was more or less always wrong.

--
Manfred
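To make the "up to the limits in /proc/sys/kernel/sem" statement concrete, the following C sketch parses the four fields of that file and computes how many semaphores can exist at once under them. The field order (SEMMSL, SEMMNS, SEMOPM, SEMMNI) follows proc(5). The struct and function names are invented for this example, and the sketch deliberately counts semaphores rather than bytes, since the per-semaphore kernel footprint is not stated in this thread.

```c
#include <assert.h>
#include <stdio.h>

/* The four fields of /proc/sys/kernel/sem, in file order. */
struct sem_limits {
    unsigned long semmsl;  /* max semaphores per set */
    unsigned long semmns;  /* max semaphores in all sets combined */
    unsigned long semopm;  /* max operations per semop() call */
    unsigned long semmni;  /* max number of semaphore sets */
};

/* Parse a line such as "1024 1048576 500 1024".
 * Returns 0 on success, -1 if fewer than four numbers were found. */
int parse_sem_limits(const char *line, struct sem_limits *out)
{
    assert(line != NULL && out != NULL);
    int n = sscanf(line, "%lu %lu %lu %lu",
                   &out->semmsl, &out->semmns, &out->semopm, &out->semmni);
    return n == 4 ? 0 : -1;
}

/* Upper bound on semaphores that can exist at once:
 * min(SEMMNS, SEMMSL * SEMMNI). */
unsigned long long max_semaphores(const struct sem_limits *l)
{
    unsigned long long by_sets = (unsigned long long)l->semmsl * l->semmni;
    return by_sets < l->semmns ? by_sets : l->semmns;
}
```

With the values proposed earlier in the thread, "1024 1048576 500 1024", max_semaphores() yields 1048576, because SEMMSL * SEMMNI equals SEMMNS there. In practice the input line would be read from /proc/sys/kernel/sem inside the IPC namespace of interest.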
* Re: general protection fault in put_pid
  2018-12-23 12:25 ` Manfred Spraul
@ 2019-01-03 22:18 ` Shakeel Butt
  2019-01-07 18:04 ` Manfred Spraul
  0 siblings, 1 reply; 23+ messages in thread
From: Shakeel Butt @ 2019-01-03 22:18 UTC
To: Manfred Spraul
Cc: Dmitry Vyukov, syzbot+1145ec2e23165570c3ac, Andrew Morton, David Howells, Eric W. Biederman, ktsanaktsidis, LKML, Michal Hocko, Mike Rapoport, Stephen Rothwell, syzkaller-bugs, Matthew Wilcox, Davidlohr Bueso

Hi Manfred,

On Sun, Dec 23, 2018 at 4:26 AM Manfred Spraul <manfred@colorfullife.com> wrote:
>
> Hello Dmitry,
>
> On 12/23/18 10:57 AM, Dmitry Vyukov wrote:
> >
> > I can reproduce this infinite memory consumption with the C program:
> > https://gist.githubusercontent.com/dvyukov/03ec54b3429ade16fa07bf8b2379aff3/raw/ae4f654e279810de2505e8fa41b73dc1d77778e6/gistfile1.txt
> >
> > But this is working as intended, right? It just creates infinite
> > number of large semaphore sets, which reasonably consumes infinite
> > amount of memory.
> > Except that it also violates the memcg bound and a process can have
> > effectively unlimited amount of such "drum memory" in semaphores.
>
> Yes, this is as intended:
>
> If you call semget(), then you can use memory, up to the limits in
> /proc/sys/kernel/sem.
>
> Memcg is not taken into account; an admin must set /proc/sys/kernel/sem.
>
> The defaults are "infinite amount of memory allowed", as this is the most
> sane default: We had logic that tried to autotune (i.e., a new
> namespace "inherits" a fraction of the parent namespace's memory limits),
> but this was more or less always wrong.
>
>

What's the disadvantage of setting the limits in /proc/sys/kernel/sem
high and letting the task's memcg limit the number of semaphores a process
can create? Please note that the memory underlying shmget and msgget
is already accounted to memcg.

thanks,
Shakeel
* Re: general protection fault in put_pid
  2019-01-03 22:18 ` Shakeel Butt
@ 2019-01-07 18:04 ` Manfred Spraul
  2019-01-07 18:22 ` Shakeel Butt
  0 siblings, 1 reply; 23+ messages in thread
From: Manfred Spraul @ 2019-01-07 18:04 UTC
To: Shakeel Butt
Cc: Dmitry Vyukov, syzbot+1145ec2e23165570c3ac, Andrew Morton, David Howells, Eric W. Biederman, ktsanaktsidis, LKML, Michal Hocko, Mike Rapoport, Stephen Rothwell, syzkaller-bugs, Matthew Wilcox, Davidlohr Bueso

On 1/3/19 11:18 PM, Shakeel Butt wrote:
> Hi Manfred,
>
> On Sun, Dec 23, 2018 at 4:26 AM Manfred Spraul <manfred@colorfullife.com> wrote:
>> Hello Dmitry,
>>
>> On 12/23/18 10:57 AM, Dmitry Vyukov wrote:
>>> I can reproduce this infinite memory consumption with the C program:
>>> https://gist.githubusercontent.com/dvyukov/03ec54b3429ade16fa07bf8b2379aff3/raw/ae4f654e279810de2505e8fa41b73dc1d77778e6/gistfile1.txt
>>>
>>> But this is working as intended, right? It just creates infinite
>>> number of large semaphore sets, which reasonably consumes infinite
>>> amount of memory.
>>> Except that it also violates the memcg bound and a process can have
>>> effectively unlimited amount of such "drum memory" in semaphores.
>> Yes, this is as intended:
>>
>> If you call semget(), then you can use memory, up to the limits in
>> /proc/sys/kernel/sem.
>>
>> Memcg is not taken into account; an admin must set /proc/sys/kernel/sem.
>>
>> The defaults are "infinite amount of memory allowed", as this is the most
>> sane default: We had logic that tried to autotune (i.e., a new
>> namespace "inherits" a fraction of the parent namespace's memory limits),
>> but this was more or less always wrong.
>>
>>
> What's the disadvantage of setting the limits in /proc/sys/kernel/sem
> high and letting the task's memcg limit the number of semaphores a process
> can create? Please note that the memory underlying shmget and msgget
> is already accounted to memcg.

Nothing, it is just a question of implementing it.

I'll try to look at it.

--
    Manfred
* Re: general protection fault in put_pid
  2019-01-07 18:04 ` Manfred Spraul
@ 2019-01-07 18:22 ` Shakeel Butt
  0 siblings, 0 replies; 23+ messages in thread
From: Shakeel Butt @ 2019-01-07 18:22 UTC
To: Manfred Spraul
Cc: Dmitry Vyukov, syzbot+1145ec2e23165570c3ac, Andrew Morton, David Howells, Eric W. Biederman, ktsanaktsidis, LKML, Michal Hocko, Mike Rapoport, Stephen Rothwell, syzkaller-bugs, Matthew Wilcox, Davidlohr Bueso, Shakeel Butt

On Mon, Jan 7, 2019 at 10:04 AM Manfred Spraul <manfred@colorfullife.com> wrote:
>
> On 1/3/19 11:18 PM, Shakeel Butt wrote:
> > Hi Manfred,
> >
> > On Sun, Dec 23, 2018 at 4:26 AM Manfred Spraul
> > <manfred@colorfullife.com> wrote:
> >> Hello Dmitry,
> >>
> >> On 12/23/18 10:57 AM, Dmitry Vyukov wrote:
> >>> I can reproduce this infinite memory consumption with the C
> >>> program:
> >>> https://gist.githubusercontent.com/dvyukov/03ec54b3429ade16fa07bf8b2379aff3/raw/ae4f654e279810de2505e8fa41b73dc1d77778e6/gistfile1.txt
> >>>
> >>> But this is working as intended, right? It just creates infinite
> >>> number of large semaphore sets, which reasonably consumes infinite
> >>> amount of memory.
> >>> Except that it also violates the memcg bound and a process can
> >>> have
> >>> effectively unlimited amount of such "drum memory" in semaphores.
> >> Yes, this is as intended:
> >>
> >> If you call semget(), then you can use memory, up to the limits in
> >> /proc/sys/kernel/sem.
> >>
> >> Memcg is not taken into account; an admin must set
> >> /proc/sys/kernel/sem.
> >>
> >> The defaults are "infinite amount of memory allowed", as this is the
> >> most
> >> sane default: We had logic that tried to autotune (i.e., a new
> >> namespace "inherits" a fraction of the parent namespace's memory
> >> limits),
> >> but this was more or less always wrong.
> >>
> >>
> > What's the disadvantage of setting the limits in
> > /proc/sys/kernel/sem
> > high and letting the task's memcg limit the number of semaphores a
> > process
> > can create? Please note that the memory underlying shmget and msgget
> > is already accounted to memcg.
>
> Nothing, it is just a question of implementing it.
>

I think it should be something like the following:

---
 ipc/sem.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/ipc/sem.c b/ipc/sem.c
index 745dc6187e84..ad63df2658aa 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -494,7 +494,7 @@ static struct sem_array *sem_alloc(size_t nsems)
 		return NULL;
 
 	size = sizeof(*sma) + nsems * sizeof(sma->sems[0]);
-	sma = kvmalloc(size, GFP_KERNEL);
+	sma = kvmalloc(size, GFP_KERNEL_ACCOUNT);
 	if (unlikely(!sma))
 		return NULL;
 
@@ -1897,7 +1897,8 @@ static struct sem_undo *find_alloc_undo(struct ipc_namespace *ns, int semid)
 	rcu_read_unlock();
 
 	/* step 2: allocate new undo structure */
-	new = kzalloc(sizeof(struct sem_undo) + sizeof(short)*nsems, GFP_KERNEL);
+	new = kzalloc(sizeof(struct sem_undo) + sizeof(short)*nsems,
+		      GFP_KERNEL_ACCOUNT);
 	if (!new) {
 		ipc_rcu_putref(&sma->sem_perm, sem_rcu_free);
 		return ERR_PTR(-ENOMEM);
-- 
2.20.1.97.g81188d93c3-goog
* Re: general protection fault in put_pid
  2018-12-11 20:23 general protection fault in put_pid syzbot
  2018-12-12 10:55 ` Dmitry Vyukov
@ 2019-03-27 20:10 ` syzbot
  2019-03-27 20:27 ` Matthew Wilcox
  2019-11-07 13:42 ` syzbot
  2 siblings, 1 reply; 23+ messages in thread
From: syzbot @ 2019-03-27 20:10 UTC
To: akpm, clm, dan.carpenter, dave, dhowells, dsterba, dvyukov, ebiederm, jbacik, ktkhai, ktsanaktsidis, linux-btrfs, linux-kernel, linux-mm, manfred, mhocko, nborisov, penguin-kernel, penguin-kernel, rppt, sfr, shakeelb, syzkaller-bugs, vdavydov.dev, willy

syzbot has bisected this bug to:

commit b9b8a41adeff5666b402996020b698504c927353
Author: Dan Carpenter <dan.carpenter@oracle.com>
Date:   Mon Aug 20 08:25:33 2018 +0000

    btrfs: use after free in btrfs_quota_enable

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=14155a1f200000
start commit:   f5d58277 Merge branch 'for-linus' of git://git.kernel.org/..
git tree:       upstream
final crash:    https://syzkaller.appspot.com/x/report.txt?x=16155a1f200000
console output: https://syzkaller.appspot.com/x/log.txt?x=12155a1f200000
kernel config:  https://syzkaller.appspot.com/x/.config?x=c8970c89a0efbb23
dashboard link: https://syzkaller.appspot.com/bug?extid=1145ec2e23165570c3ac
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16803afb400000

Reported-by: syzbot+1145ec2e23165570c3ac@syzkaller.appspotmail.com
Fixes: b9b8a41adeff ("btrfs: use after free in btrfs_quota_enable")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection
* Re: general protection fault in put_pid
  2019-03-27 20:10 ` syzbot
@ 2019-03-27 20:27 ` Matthew Wilcox
  2019-03-27 22:51 ` David Sterba
  0 siblings, 1 reply; 23+ messages in thread
From: Matthew Wilcox @ 2019-03-27 20:27 UTC
To: syzbot
Cc: akpm, clm, dan.carpenter, dave, dhowells, dsterba, dvyukov, ebiederm, jbacik, ktkhai, ktsanaktsidis, linux-btrfs, linux-kernel, linux-mm, manfred, mhocko, nborisov, penguin-kernel, rppt, sfr, shakeelb, syzkaller-bugs, vdavydov.dev

On Wed, Mar 27, 2019 at 01:10:01PM -0700, syzbot wrote:
> syzbot has bisected this bug to:
>
> commit b9b8a41adeff5666b402996020b698504c927353
> Author: Dan Carpenter <dan.carpenter@oracle.com>
> Date:   Mon Aug 20 08:25:33 2018 +0000
>
>     btrfs: use after free in btrfs_quota_enable

Not plausible. Try again.
* Re: general protection fault in put_pid
  2019-03-27 20:27 ` Matthew Wilcox
@ 2019-03-27 22:51 ` David Sterba
  0 siblings, 0 replies; 23+ messages in thread
From: David Sterba @ 2019-03-27 22:51 UTC
To: Matthew Wilcox
Cc: syzbot, akpm, clm, dan.carpenter, dave, dhowells, dsterba, dvyukov, ebiederm, jbacik, ktkhai, ktsanaktsidis, linux-btrfs, linux-kernel, linux-mm, manfred, mhocko, nborisov, penguin-kernel, rppt, sfr, shakeelb, syzkaller-bugs, vdavydov.dev

On Wed, Mar 27, 2019 at 01:27:12PM -0700, Matthew Wilcox wrote:
> On Wed, Mar 27, 2019 at 01:10:01PM -0700, syzbot wrote:
> > syzbot has bisected this bug to:
> >
> > commit b9b8a41adeff5666b402996020b698504c927353
> > Author: Dan Carpenter <dan.carpenter@oracle.com>
> > Date:   Mon Aug 20 08:25:33 2018 +0000
> >
> >     btrfs: use after free in btrfs_quota_enable
>
> Not plausible. Try again.

Agreed. Grepping for 'btrfs' in the console log does not show anything,
i.e. no messages, no slab caches, and no functions on the stack.
* Re: general protection fault in put_pid
  2018-12-11 20:23 general protection fault in put_pid syzbot
  2018-12-12 10:55 ` Dmitry Vyukov
  2019-03-27 20:10 ` syzbot
@ 2019-11-07 13:42 ` syzbot
  2 siblings, 0 replies; 23+ messages in thread
From: syzbot @ 2019-11-07 13:42 UTC
To: akpm, aryabinin, bp, cai, clm, dan.carpenter, dave, dhowells, dsterba, dsterba, dvyukov, ebiederm, glider, hpa, jbacik, ktkhai, ktsanaktsidis, linux-btrfs, linux-kernel, linux-mm, manfred, mhocko, mingo, nborisov, penguin-kernel, penguin-kernel, rppt, sfr, shakeelb, syzkaller-bugs, tglx, torvalds, vdavydov.dev, willy

syzbot suspects this bug was fixed by commit:

commit a8e911d13540487942d53137c156bd7707f66e5d
Author: Qian Cai <cai@lca.pw>
Date:   Fri Feb 1 22:20:20 2019 +0000

    x86_64: increase stack size for KASAN_EXTRA

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=10364f3c600000
start commit:   f5d58277 Merge branch 'for-linus' of git://git.kernel.org/..
git tree:       upstream
kernel config:  https://syzkaller.appspot.com/x/.config?x=c8970c89a0efbb23
dashboard link: https://syzkaller.appspot.com/bug?extid=1145ec2e23165570c3ac
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16803afb400000

If the result looks correct, please mark the bug fixed by replying with:

#syz fix: x86_64: increase stack size for KASAN_EXTRA

For information about bisection process see: https://goo.gl/tpsmEJ#bisection
end of thread, other threads:[~2019-11-07 13:42 UTC | newest]

Thread overview: 23+ messages
2018-12-11 20:23 general protection fault in put_pid syzbot
2018-12-12 10:55 ` Dmitry Vyukov
2018-12-19 9:04 ` Manfred Spraul
2018-12-20 15:36 ` Dmitry Vyukov
2018-12-22 19:07 ` Manfred Spraul
2018-12-23 7:37 ` Dmitry Vyukov
2018-12-23 9:57 ` Dmitry Vyukov
2018-12-23 10:30 ` Dmitry Vyukov
2018-12-23 10:42 ` Dmitry Vyukov
2018-12-23 12:32 ` Manfred Spraul
2018-12-25 9:35 ` Dmitry Vyukov
2018-12-26 9:03 ` Dmitry Vyukov
2018-12-30 9:31 ` Dmitry Vyukov
2018-12-31 6:35 ` Dmitry Vyukov
2018-12-25 9:41 ` Dmitry Vyukov
2018-12-23 12:25 ` Manfred Spraul
2019-01-03 22:18 ` Shakeel Butt
2019-01-07 18:04 ` Manfred Spraul
2019-01-07 18:22 ` Shakeel Butt
2019-03-27 20:10 ` syzbot
2019-03-27 20:27 ` Matthew Wilcox
2019-03-27 22:51 ` David Sterba
2019-11-07 13:42 ` syzbot