2005/11/29, Andrew Morton:
> > I found a race condition in procfs on SMP systems. The result is an
> > oops in processes like pidof. Apparently ->proc_read() gets passed a
> > potentially NULL pointer.
>
> Do you know what the race is?

Apparently it's a race between deleting a process and accessing its
/proc/pid entries. It came out in pidof while it was accessing
/proc/pid/stat (fs/proc/array.c:do_task_stat crashed on first
instruction - it was an inline function accessing task->state,
get_task_state IIRC).

oops (with vserver history data - I'm using a patch mentioned below) is
attached.

> How does one reproduce it?

I managed to reproduce it (although not reliably) during high CPU load
and I/O (parallel kernel compiles) on SMP systems with the vserver patch
(http://linux-vserver.org, the exact patch is
http://vserver.13thfloor.at/Experimental/patch-2.6.14.2-vs2.1.0-rc8.diff),
but the vserver maintainer pointed out that it probably is a mainline
issue. We're not using 2.6 systems too much except for the vserver test
beds so I cannot tell if it happens on vanilla kernels.

> > The following micro-patch seems to fix it.
>
> It might be right, or it might be a workaround..

I'm not a kernel guru so it's just my proposal. Can it break anything?
An alternative _might_ be somewhat coarser task_struct locking
(do_task_stat grabs a spinlock but then it's already too late). However,
if no "right" solution appears, I'll keep using my two-liner because it
seems to help, at least in my setup.

Best regards,
 Grzegorz Nosek