On Thu, Sep 22, 2016 at 03:44:37PM -0700, Andy Lutomirski wrote: > On Fri, Sep 16, 2016 at 7:00 PM, Jann Horn wrote: > > On Tue, Sep 13, 2016 at 02:29:29PM -0700, Andy Lutomirski wrote: > >> This will prevent a crash if get_wchan() runs after the task stack > >> is freed. > > > > I think I found some more stuff. Have a look at KSTK_EIP() and KSTK_ESP(), I think > > they read from the saved userspace registers area at the top of the kernel stack? > > > > Used on remote processes in: > > vma_is_stack_for_task() (via /proc/$pid/maps) > > This isn't used in /proc/$pid/maps -- it's only used in > /proc/$pid/task/$tid/maps. I wonder if anyone actually cares about it > -- it certainly won't work reliably. > > I could pin the stack in vma_is_stack_for_task, but it seems > potentially better to me to change it to vma_is_stack_for_current() > and remove the offending caller in /proc, replacing it with "return > 0". Thoughts? I just scrolled through the debian codesearch results for "\[stack\]" - there seem to only be 105 across all of debian's packages, many of them duplicates - and I didn't see any that looked like they used the tid map. So I think this might work. ( https://codesearch.debian.net/search?q=%22%5C%5Bstack%5C%5D%22 ) > > do_task_stat() (/proc/$pid/stat) > > Like this: > > mm = get_task_mm(task); > if (mm) { > vsize = task_vsize(mm); > if (permitted) { > eip = KSTK_EIP(task); > esp = KSTK_ESP(task); > } > } > > Can we just delete this outright? It seems somewhere between mostly > and entirely useless, and it also seems dangerous. Until very > recently, on x86_64, this would have been a potential info leak, as > SYSCALL followed closely by a hardware interrupt would cause *kernel* > values to land in task_pt_regs(). I don't even want to think about > what this code does if the task is in vm86 mode. I wouldn't be at all > surprised if non-x86 architectures have all kinds of interesting > thinks happen if you do this to a task that isn't running normal > non-atomic kernel code at the time. > > I would advocate for unconditionally returning zeros in these two stat fields. I'd like that a lot. I guess the two things that might theoretically use it are ptrace users and (very theoretically) sampling profiling stuff or so? In gdb, the only code I can find that reads this is in gdb/linux-nat.c, but it's behind an "#ifdef 0": #if 0 /* Don't know how architecture-dependent the rest is... Anyway the signal bitmap info is available from "status". */ if (fscanf (procfile, "%lu ", <mp) > 0) /* FIXME arch? */ printf_filtered (_("Kernel stack pointer: 0x%lx\n"), ltmp); if (fscanf (procfile, "%lu ", <mp) > 0) /* FIXME arch? */ printf_filtered (_("Kernel instr pointer: 0x%lx\n"), ltmp); [...] strace and ltrace don't seem to be using it.