On Thu, Apr 22, 2021 at 02:05:17PM +0200, Peter Zijlstra wrote:
> From: Chris Hyser
>
> This patch provides support for setting and copying core scheduling
> 'task cookies' between threads (PID), processes (TGID), and process
> groups (PGID).

Hello.

It seems that there is an issue in the scheduler code that can be
triggered via this interface:

# gcc -std=gnu99 -Wextra -Werror prctl-sched-core-oops-repro.c -o prctl-sched-core-oops-repro
# ../src/strace -fvq -eprctl,clone,setsid ./prctl-sched-core-oops-repro
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f271f875890) = 239820
[pid 239820] setsid() = 239820
[pid 239820] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=239820, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
Iteration 0 status: 0
prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, 239816, 0x2 /* PIDTYPE_PGID */, NULL) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f271f875890) = 239821
[pid 239821] setsid() = ?
[pid 239821] +++ killed by SIGKILL +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=239821, si_uid=0, si_status=SIGKILL, si_utime=0, si_stime=0} ---
Iteration 1 status: 0x9
prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, 239816, 0x2 /* PIDTYPE_PGID */, NULL) = 0
+++ exited with 0 +++

kmsg indicates that a NULL pointer dereference has occurred:

[76195.611570] BUG: kernel NULL pointer dereference, address: 0000000000000000
...
[76195.621771] RIP: 0010:do_raw_spin_trylock+0x5/0x40
...
[76195.640144] Call Trace:
[76195.640706]  _raw_spin_lock_nested+0x37/0x80
[76195.641645]  ? raw_spin_rq_lock_nested+0x4b/0x80
[76195.642693]  raw_spin_rq_lock_nested+0x4b/0x80
[76195.643669]  online_fair_sched_group+0x39/0x240
[76195.644663]  sched_autogroup_create_attach+0x9d/0x170
[76195.645765]  ksys_setsid+0xe6/0x110
[76195.646533]  __do_sys_setsid+0xa/0x10
[76195.647358]  do_syscall_64+0x3b/0x90
[76195.648219]  entry_SYSCALL_64_after_hwframe+0x44/0xae

The full kmsg excerpt and the reproducer code are attached.

An additional "BUG: sleeping function called from invalid context at
include/linux/percpu-rwsem.h:49" message is also produced (see the full
log in the attached file "prctl-sched-core-oops-bug-dmesg.log") when the
full test case[1] is run, but so far I have not managed to produce a
minimal reproducer for it.

[1] https://github.com/strace/strace/commit/a90a5a56d2b76ba3ebd417472a02f40d3d6599d8
    Run with `./bootstrap && ./configure CFLAGS='-g -Og' --enable-gcc-Werror && make check TESTS=prctl-sched-core--pidns-translation.gen`
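
For readers without the attachment, the syscall sequence visible in the
strace output above can be approximated with the sketch below. It is
reconstructed from the trace only and is not the attached reproducer.
Assumptions: the PID passed to prctl() is the caller's own, the child
does nothing besides setsid(), and the names run_iteration(),
child_was_sigkilled(), REPRO_PIDTYPE_PGID and the RUN_REPRO guard are
made up for illustration. The fallback constants match the values in
linux/prctl.h and the 0x2 decoded by strace.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fallbacks for older userspace headers. */
#ifndef PR_SCHED_CORE
# define PR_SCHED_CORE 62
#endif
#ifndef PR_SCHED_CORE_CREATE
# define PR_SCHED_CORE_CREATE 1
#endif
/* Hypothetical name; the value is the 0x2 strace decodes as PIDTYPE_PGID. */
#define REPRO_PIDTYPE_PGID 2

/* Returns 1 if a wait status says the child was killed by SIGKILL. */
int child_was_sigkilled(int status)
{
	return WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
}

/*
 * One iteration as seen in the trace: fork a child that calls setsid(),
 * reap it, print its status, then create a core scheduling cookie for
 * the caller's process group.  Returns the wait status or -1 on error.
 */
int run_iteration(int i)
{
	int status;
	pid_t pid = fork();

	if (pid < 0) {
		perror("fork");
		return -1;
	}
	if (pid == 0) {
		setsid();	/* the oops in the report happens in here */
		_exit(0);
	}
	if (waitpid(pid, &status, 0) < 0) {
		perror("waitpid");
		return -1;
	}
	printf("Iteration %d status: %#x\n", i, (unsigned int) status);

	/* Assumption: the trace's third argument is the caller's own PID. */
	if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE,
		  (unsigned long) getpid(),
		  (unsigned long) REPRO_PIDTYPE_PGID, 0UL) < 0)
		perror("prctl(PR_SCHED_CORE)");

	return status;
}

#ifdef RUN_REPRO	/* hypothetical guard: may oops an affected kernel */
int main(void)
{
	int killed = 0;

	for (int i = 0; i < 2; i++) {
		int status = run_iteration(i);

		if (status >= 0 && child_was_sigkilled(status))
			killed = 1;
	}
	return killed;
}
#endif
```

Building with -DRUN_REPRO enables main(); the guard is there so that the
sequence is only run deliberately, on a disposable machine. On an
affected kernel, the second iteration is where the trace shows the child
being killed.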