linux-kernel.vger.kernel.org archive mirror
* migration thread and active_load_balance
@ 2008-04-20 18:21 Dan Upton
  2008-04-20 21:26 ` Dmitry Adamushko
  2008-04-20 21:35 ` Andi Kleen
  0 siblings, 2 replies; 9+ messages in thread
From: Dan Upton @ 2008-04-20 18:21 UTC (permalink / raw)
  To: linux-kernel

Back again with more questions about the scheduler, as I've spent two
or three days trying to debug on my own and I'm just not getting
anywhere.

Basically, I'm trying to add a new active balancing mechanism.  I made
out a diagram of how migration_thread  calls active_load_balance and
so on, and I use a flag (set by writing to a file in sysfs) to
determine whether to use the standard iterator for the CFS runqueue or
a different iterator I wrote.  The new iterator seems to work fine, as
I've been using it (again, with a flag) to replace the regular
iterator when it's called from schedule by idle_balance.  I basically
tried adding an extra conditional in migration_thread that sets up
some state and then calls active_load_balance, but I was getting
deadlocks.  I'm not really sure why, since all I've really changed is
add a few variables to struct rq and struct cfs_rq.
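
(For reference: the flag ends up as a sysfs attribute registered under an unparented "sched_temp" kobject -- see the diff later in the thread -- so flipping it should look roughly like echo 2 > /sys/sched_temp/sched_thermal, though the exact path depends on how the kobject gets registered.)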

I tried only doing my state setup and restore in that conditional,
without actually calling active_load_balance, which has given me an
even more frustrating result--the kernel does not deadlock, but it
does seem to crash in such a manner as to require a hard reset of the
machine.  (For instance, at one point I got an "invalid page state in
process 'init'" message from the kernel; if I try to reboot from Gnome
though it hangs.)  I don't understand this at all, since as far as I
can tell I'm using thread-local variables and really all I'm doing
right now is assignments to them.  Unless, of course the struct rq
(from rq = cpu_rq(cpu);) could be being manipulated elsewhere, leading
to some sort of race condition...

Anyway, like I said, I've spent several days trying to understand this
error by putting in printk()s galore and doing traces through the
source code to figure out the call chain, but I'm really stuck here.
Can anybody shed some light, or point me to some more thorough
documentation on the scheduler and active load balancing?

Thanks,
-dan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: migration thread and active_load_balance
  2008-04-20 18:21 migration thread and active_load_balance Dan Upton
@ 2008-04-20 21:26 ` Dmitry Adamushko
  2008-04-21  0:44   ` Dan Upton
  2008-04-20 21:35 ` Andi Kleen
  1 sibling, 1 reply; 9+ messages in thread
From: Dmitry Adamushko @ 2008-04-20 21:26 UTC (permalink / raw)
  To: Dan Upton; +Cc: linux-kernel

On 20/04/2008, Dan Upton <upton.dan.linux@gmail.com> wrote:
> Back again with more questions about the scheduler, as I've spent two
>  or three days trying to debug on my own and I'm just not getting
>  anywhere.
>
>  Basically, I'm trying to add a new active balancing mechanism.  I made
>  out a diagram of how migration_thread  calls active_load_balance and
>  so on, and I use a flag (set by writing to a file in sysfs) to
>  determine whether to use the standard iterator for the CFS runqueue or
>  a different iterator I wrote.  The new iterator seems to work fine, as
>  I've been using it (again, with a flag) to replace the regular
>  iterator when it's called from schedule by idle_balance.  I basically
>  tried adding an extra conditional in migration_thread that sets up
>  some state and then calls active_load_balance, but I was getting
>  deadlocks.  I'm not really sure why, since all I've really changed is
>  add a few variables to struct rq and struct cfs_rq.
>
>  I tried only doing my state setup and restore in that conditional,
>  without actually calling active_load_balance, which has given me an
>  even more frustrating result--the kernel does not deadlock, but it
>  does seem to crash in such a manner as to require a hard reset of the
>  machine.  (For instance, at one point I got an "invalid page state in
>  process 'init'" message from the kernel; if I try to reboot from Gnome
>  though it hangs.)  I don't understand this at all, since as far as I
>  can tell I'm using thread-local variables and really all I'm doing
>  right now is assignments to them.  Unless, of course the struct rq
>  (from rq = cpu_rq(cpu);) could be being manipulated elsewhere, leading
>  to some sort of race condition...
>

can you post your modifications? it'd be much easier to see what you
are doing...

thanks in advance.


>
>  Thanks,
>  -dan
>

-- 
Best regards,
Dmitry Adamushko

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: migration thread and active_load_balance
  2008-04-20 18:21 migration thread and active_load_balance Dan Upton
  2008-04-20 21:26 ` Dmitry Adamushko
@ 2008-04-20 21:35 ` Andi Kleen
  2008-04-21  0:46   ` Dan Upton
  1 sibling, 1 reply; 9+ messages in thread
From: Andi Kleen @ 2008-04-20 21:35 UTC (permalink / raw)
  To: Dan Upton; +Cc: linux-kernel

"Dan Upton" <upton.dan.linux@gmail.com> writes:

[not a scheduler expert; just some general comments]

> I'm using thread-local variables and really all I'm doing
> right now is assignments to them.  Unless, of course the struct rq
> (from rq = cpu_rq(cpu);) could be being manipulated elsewhere,

Other CPUs can access it, yes, subject to the lock. You can test that
theory by running with only one CPU.
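(E.g. by booting with maxcpus=1 or nosmp on the kernel command line.)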

> leading
> to some sort of race condition...
>
> Anyway, like I said, I've spent several days trying to understand this
> error by putting in printk()s galore and doing traces through the

That might be obvious, but are you aware that printks inside
the scheduler can lead to deadlocks? printk when the buffer
is full calls wake_up and that calls the scheduler. So for
debugging the scheduler you need some other way to get
the information out.
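
One deadlock-safe alternative (just a sketch, not something from this
thread; the sched_dbg_* names are made up for illustration) is to record
values into a static per-cpu array from scheduler context and only print
them later from process context, e.g. from a /proc read handler:

#define SCHED_DBG_SLOTS 64
static unsigned long sched_dbg_buf[NR_CPUS][SCHED_DBG_SLOTS];
static unsigned int sched_dbg_idx[NR_CPUS];

/* safe with rq->lock held and interrupts off: no allocation, no printk,
   no wakeups -- just a store into a static array */
static inline void sched_dbg_record(int cpu, unsigned long val)
{
        sched_dbg_buf[cpu][sched_dbg_idx[cpu] % SCHED_DBG_SLOTS] = val;
        sched_dbg_idx[cpu]++;
}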

-Andi

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: migration thread and active_load_balance
  2008-04-20 21:26 ` Dmitry Adamushko
@ 2008-04-21  0:44   ` Dan Upton
  2008-04-21 11:03     ` Dmitry Adamushko
  0 siblings, 1 reply; 9+ messages in thread
From: Dan Upton @ 2008-04-21  0:44 UTC (permalink / raw)
  To: Dmitry Adamushko; +Cc: linux-kernel

On Sun, Apr 20, 2008 at 5:26 PM, Dmitry Adamushko
<dmitry.adamushko@gmail.com> wrote:
>
> On 20/04/2008, Dan Upton <upton.dan.linux@gmail.com> wrote:
>  > Back again with more questions about the scheduler, as I've spent two
>  >  or three days trying to debug on my own and I'm just not getting
>  >  anywhere.
>  >
>  >  Basically, I'm trying to add a new active balancing mechanism.  I made
>  >  out a diagram of how migration_thread  calls active_load_balance and
>  >  so on, and I use a flag (set by writing to a file in sysfs) to
>  >  determine whether to use the standard iterator for the CFS runqueue or
>  >  a different iterator I wrote.  The new iterator seems to work fine, as
>  >  I've been using it (again, with a flag) to replace the regular
>  >  iterator when it's called from schedule by idle_balance.  I basically
>  >  tried adding an extra conditional in migration_thread that sets up
>  >  some state and then calls active_load_balance, but I was getting
>  >  deadlocks.  I'm not really sure why, since all I've really changed is
>  >  add a few variables to struct rq and struct cfs_rq.
>  >
>  >  I tried only doing my state setup and restore in that conditional,
>  >  without actually calling active_load_balance, which has given me an
>  >  even more frustrating result--the kernel does not deadlock, but it
>  >  does seem to crash in such a manner as to require a hard reset of the
>  >  machine.  (For instance, at one point I got an "invalid page state in
>  >  process 'init'" message from the kernel; if I try to reboot from Gnome
>  >  though it hangs.)  I don't understand this at all, since as far as I
>  >  can tell I'm using thread-local variables and really all I'm doing
>  >  right now is assignments to them.  Unless, of course the struct rq
>  >  (from rq = cpu_rq(cpu);) could be being manipulated elsewhere, leading
>  >  to some sort of race condition...
>  >
>
>  can you post your modifications? it'd be much easier to see what you
>  are doing...
>
>  thanks in advance.
>

OK, here we go-- I've pretty much copied and pasted it over directly,
which includes all of my comments and ruminations.

The new iterator:

static struct task_struct *__load_balance_therm_iterator(struct cfs_rq *cfs_rq, struct rb_node *curr){
        // local info
        int indexctr = 0, retindex=0;
        struct task_struct *p_tmp, *p_ret;
        struct rb_node *iter=curr;
        int lowest_temp = cfs_rq->last_therm_balance_temp;
        int last_index = cfs_rq->last_therm_balance_index;

        if(!curr)
                return NULL;

        // if last_therm_balance_index is -1, then this is being called from
        // load_balance_start_therm, so we can just look through the whole
        // runqueue to find something cooler without worrying about whether
        // we've already tried it
        if(last_index == -1){
                while(iter){
                        p_tmp = rb_entry(iter, struct task_struct, se.run_node);
                        if(p_tmp->lasttemp < lowest_temp){
                                p_ret = p_tmp;
                                lowest_temp = p_tmp->lasttemp;
                                retindex = indexctr;
                        }
                        iter = rb_next(iter);
                        indexctr++;
                }
        }
        // otherwise, we want to look for
        // - a process of equal temperature further down the queue
        // - the next-lowest temperature
        else{
                int second_lowest_temp=100; // see below

                // so first we look through for the next entry with the same temperature
                // the comments on __load_balance_iterator suggest dequeues can happen despite
                // the lock being held, but i'm assuming queueing can't happen, so we don't have
                // to worry about new, lower-temperatured processes magically appearing.  this
                // assumption simplifies the search for next-coolest tasks.
                while(iter){
                        p_tmp = rb_entry(iter, struct task_struct, se.run_node);
                        if( (p_tmp->lasttemp <= lowest_temp) && indexctr > last_index){
                                // we're just looking for the next one down the line,
                                // and it looks like we've found it, so we update cf_rq stats
                                // and return from here
                                cfs_rq->last_therm_balance_temp = p_tmp->lasttemp;
                                cfs_rq->last_therm_balance_index = indexctr;
                                return p_tmp;
                        }else if(p_tmp->lasttemp > lowest_temp && p_tmp->lasttemp < second_lowest_temp){
                                second_lowest_temp = p_tmp->lasttemp;
                        }
                        indexctr++;
                        iter = rb_next(iter);
                }
                // if we get here, it means we wandered off the end of the runqueue without finding
                // anything else with the same lowest temperature.  however, we know now what the
                // second lowest temperature of the runqueue is (second_lowest_temp as calculated above),
                // so we can just look for the first task with that temp (or, again, lower, in case
                // something would change out from underneath us).

                // this makes use of the above assumption that tasks can only be dequeued but not enqueued
                iter = curr; // reset the iterator
                indexctr=0;
                while(iter){
                        p_tmp = rb_entry(iter, struct task_struct, se.run_node);
                        if(p_tmp->lasttemp == second_lowest_temp){
                                // we found something, so let's update the stats and return it
                                cfs_rq->last_therm_balance_temp = p_tmp->lasttemp;
                                cfs_rq->last_therm_balance_index = indexctr;
                                return p_tmp;
                        }
                        indexctr++;
                        iter = rb_next(iter);
                }
        }

        // update stats in case we come back here
        cfs_rq->last_therm_balance_temp = lowest_temp;
        cfs_rq->last_therm_balance_index = retindex;
        return p_ret;

}

static struct task_struct *load_balance_start_therm(void *arg){
        struct cfs_rq *cfs_rq = arg;
        cfs_rq->last_therm_balance_index = -1;
        cfs_rq->last_therm_balance_temp = 100;
        return __load_balance_therm_iterator(cfs_rq, first_fair(cfs_rq));
}

static struct task_struct *load_balance_next_therm(void *arg){
        struct cfs_rq *cfs_rq = arg;
        return __load_balance_therm_iterator(cfs_rq, first_fair(cfs_rq));
}
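
(These get hooked up through struct rq_iterator, i.e. cfs_rq_iterator.start =
load_balance_start_therm and cfs_rq_iterator.next = load_balance_next_therm,
in load_balance_fair()/move_one_task_fair() -- the exact wiring is in the
diff later in the thread.)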


The migration thread:

static int migration_thread(void *data)
{
        int cpu = (long)data;
        struct rq *rq;
        static int dbgflag=1;
        int save_ab = 0, save_push=0;
        int coolest = !cpu;
        int crash_likely=0;

        rq = cpu_rq(cpu);
        BUG_ON(rq->migration_thread != current);

        set_current_state(TASK_INTERRUPTIBLE);
        while (!kthread_should_stop()) {
                struct migration_req *req;
                struct list_head *head;

                spin_lock_irq(&rq->lock);

                if (cpu_is_offline(cpu)) {
                        spin_unlock_irq(&rq->lock);
                        goto wait_to_die;
                }

                // other stuff here too, like checking the cpu temp
                if(sched_thermal == 3){

                        coolest = find_coolest_cpu(NULL);
                        if(coolest != cpu){ // if there's somewhere cooler to push stuff
                                rq->from_active_balance=1;
                                save_ab = rq->active_balance;
                                save_push = rq->push_cpu;
                                rq->push_cpu = coolest;

                                //active_load_balance(rq, cpu);
                                rq->from_active_balance=0;
                                rq->active_balance = save_ab;
                                rq->push_cpu = save_push;
                                crash_likely=1;
                        }
                }
                // is it possible this could undo any work we just did? or maybe we could
                // cause a bug if this was going to be called because it was the busiest proc,
                // and now it isn't?
                if (rq->active_balance) {
                        active_load_balance(rq, cpu);
                        rq->active_balance = 0;

                }

                head = &rq->migration_queue;


                if (list_empty(head)) {
                        spin_unlock_irq(&rq->lock);
                        schedule();
                        set_current_state(TASK_INTERRUPTIBLE);
                        continue;
                }
                req = list_entry(head->next, struct migration_req, list);
                list_del_init(head->next);


                spin_unlock(&rq->lock);
                __migrate_task(req->task, cpu, req->dest_cpu);
                local_irq_enable();

                complete(&req->done);
        }
        __set_current_state(TASK_RUNNING);
        return 0;

wait_to_die:
        /* Wait for kthread_stop */
        set_current_state(TASK_INTERRUPTIBLE);
        while (!kthread_should_stop()) {
                schedule();
                set_current_state(TASK_INTERRUPTIBLE);
        }
        __set_current_state(TASK_RUNNING);
        return 0;
}

The function find_coolest_cpu(NULL) basically duplicates the
functionality of the coretemp driver to poll all of the cores' thermal
sensors to find which one is the coolest. The other change I made is
in move_one_task_fair, which checks if(sched_thermal==3 &&
busiest->from_active_balance==1) to decide whether to use the new
iterator.  When it crashes, this is what I get:

kernel BUG at kernel/sched.c:2103
invalid opcode: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 3, comm: migration/0 Not tainted 2.6.24.3 #135
RIP: 0010:[<ffffffff80228d77>] double_rq_lock+0x14/0x4d

stack:
migration_thread+0x3db/0x3af
migration_thread+0x0/0x3af
kthread+0x3d/0x63
child_rip+0xa/0x12
kthread+0x0/0x63
child_rip+0x0/0x12

The basic functionality of this is supposed to be that if
sched_thermal==3 (and eventually, if the core is above a certain
temperature), we look for the coolest core and try to push stuff off
onto it.

If you can suggest what's causing my crash that'd be great; if it
looks like I'm approaching the implementation all wrong, I'd be happy
for any pointers along those lines too.

-dan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: migration thread and active_load_balance
  2008-04-20 21:35 ` Andi Kleen
@ 2008-04-21  0:46   ` Dan Upton
  0 siblings, 0 replies; 9+ messages in thread
From: Dan Upton @ 2008-04-21  0:46 UTC (permalink / raw)
  To: linux-kernel

>  >
>  > Anyway, like I said, I've spent several days trying to understand this
>  > error by putting in printk()s galore and doing traces through the
>
>  That might be obvious, but are you aware that printks inside
>  the scheduler can lead to deadlocks? printk when the buffer
>  is full calls wake_up and that calls the scheduler. So for
>  debugging the scheduler you need some other way to get
>  the information out.
>
>  -Andi
>

Yes, I've read that printk can cause deadlocks, although I get the
deadlocks without printks as well.  Also, I found a modification on
kerneltrap (http://kerneltrap.org/mailarchive/linux-kernel/2008/1/23/595569)
that seems to do a decent job of preventing deadlocks due to printk, at
least in my case (I see in the comments on that thread though that
it's not perfect).
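
(For reference, that modification boils down to a one-line guard in
release_console_sem() so klogd is not woken while interrupts are disabled;
the same hunk shows up in the full diff I post later in this thread:)

-	if (wake_klogd)
+	if (!irqs_disabled() && wake_klogd)
 		wake_up_klogd();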

-dan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: migration thread and active_load_balance
  2008-04-21  0:44   ` Dan Upton
@ 2008-04-21 11:03     ` Dmitry Adamushko
  2008-04-21 19:38       ` Dan Upton
  0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Adamushko @ 2008-04-21 11:03 UTC (permalink / raw)
  To: Dan Upton; +Cc: linux-kernel

On 21/04/2008, Dan Upton <upton.dan.linux@gmail.com> wrote:
> [ ... ]
>
>  kernel BUG at kernel/sched.c:2103

and what's this line in your patched sched.c?

is it -- BUG_ON(!irqs_disabled());  ?

anything in your unposted code (e.g. find_coolest_cpu()) that might
re-enable the interrupts before __migrate_task() is called?

If you post your modifications as a patch
(Documentation/applying-patches.txt) that contains _all_ relevant
modifications, it'd be easier to guess what's wrong.


>
>  -dan
>

-- 
Best regards,
Dmitry Adamushko

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: migration thread and active_load_balance
  2008-04-21 11:03     ` Dmitry Adamushko
@ 2008-04-21 19:38       ` Dan Upton
  2008-04-21 20:39         ` Dmitry Adamushko
  0 siblings, 1 reply; 9+ messages in thread
From: Dan Upton @ 2008-04-21 19:38 UTC (permalink / raw)
  To: Dmitry Adamushko; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 888 bytes --]

On Mon, Apr 21, 2008 at 7:03 AM, Dmitry Adamushko
<dmitry.adamushko@gmail.com> wrote:
> On 21/04/2008, Dan Upton <upton.dan.linux@gmail.com> wrote:
>  > [ ... ]
>
> >
>  >  kernel BUG at kernel/sched.c:2103
>
>  and what's this line in your patched sched.c?
>
>  is it -- BUG_ON(!irqs_disabled());  ?
>
>  anything in your unposted code (e.g. find_coolest_cpu()) that might
>  re-enable the interrupts before __migrate_task() is called?
>
>  If you post your modifications as a patch
>  (Documentation/applying-patches.txt) that contains _all_ relevant
>  modifications, it'd be easier to guess what's wrong.

Yes, that's the line.  I don't recall ever reenabling interrupts, but
maybe somebody will see what I'm missing.  I've attached a full diff;
there are a few other places I've made changes for other scheduling
stuff that you'll see in the diff, that have all tested fine.

-dan

[-- Attachment #2: thermdiff --]
[-- Type: application/octet-stream, Size: 34719 bytes --]

Binary files linux-2.6.24.3/Documentation/.applying-patches.txt.swp and linux-2.6.24.3-therm/Documentation/.applying-patches.txt.swp differ
diff -Nur linux-2.6.24.3/fs/proc/array.c linux-2.6.24.3-therm/fs/proc/array.c
--- linux-2.6.24.3/fs/proc/array.c	2008-02-25 19:20:20.000000000 -0500
+++ linux-2.6.24.3-therm/fs/proc/array.c	2008-03-25 15:08:34.000000000 -0400
@@ -496,7 +496,7 @@
 
 	res = sprintf(buffer, "%d (%s) %c %d %d %d %d %d %u %lu \
 %lu %lu %lu %lu %lu %ld %ld %ld %ld %d 0 %llu %lu %ld %lu %lu %lu %lu %lu \
-%lu %lu %lu %lu %lu %lu %lu %lu %d %d %u %u %llu %lu %ld\n",
+%lu %lu %lu %lu %lu %lu %lu %lu %d %d %u %u %llu %lu %ld new %d\n",
 		task_pid_nr_ns(task, ns),
 		tcomm,
 		state,
@@ -543,7 +543,8 @@
 		task->policy,
 		(unsigned long long)delayacct_blkio_ticks(task),
 		cputime_to_clock_t(gtime),
-		cputime_to_clock_t(cgtime));
+		cputime_to_clock_t(cgtime),
+		task->lasttemp);
 	if (mm)
 		mmput(mm);
 	return res;
diff -Nur linux-2.6.24.3/include/linux/sched.h linux-2.6.24.3-therm/include/linux/sched.h
--- linux-2.6.24.3/include/linux/sched.h	2008-02-25 19:20:20.000000000 -0500
+++ linux-2.6.24.3-therm/include/linux/sched.h	2008-02-26 15:51:04.000000000 -0500
@@ -1178,6 +1178,8 @@
 	int make_it_fail;
 #endif
 	struct prop_local_single dirties;
+
+	int lasttemp; /* dsu9w - last temperature of this process */
 };
 
 /*
diff -Nur linux-2.6.24.3/init/main.c linux-2.6.24.3-therm/init/main.c
--- linux-2.6.24.3/init/main.c	2008-02-25 19:20:20.000000000 -0500
+++ linux-2.6.24.3-therm/init/main.c	2008-03-24 13:47:18.000000000 -0400
@@ -77,6 +77,9 @@
 #warning gcc-4.1.0 is known to miscompile the kernel.  A different compiler version is recommended.
 #endif
 
+//extern int sched_thermal;
+extern int init_sysfs_temp(void);
+
 static int kernel_init(void *);
 
 extern void init_IRQ(void);
@@ -449,6 +452,9 @@
 	schedule();
 	preempt_disable();
 
+	//sched_thermal = 1; // turn on thermal scheduling
+	init_sysfs_temp();
+
 	/* Call into cpu_idle with preempt disabled */
 	cpu_idle();
 }
@@ -646,6 +652,7 @@
 
 	/* Do the rest non-__init'ed, we're now alive */
 	rest_init();
+
 }
 
 static int __initdata initcall_debug;
diff -Nur linux-2.6.24.3/kernel/printk.c linux-2.6.24.3-therm/kernel/printk.c
--- linux-2.6.24.3/kernel/printk.c	2008-02-25 19:20:20.000000000 -0500
+++ linux-2.6.24.3-therm/kernel/printk.c	2008-04-07 16:40:21.000000000 -0400
@@ -978,7 +978,7 @@
 	console_locked = 0;
 	up(&console_sem);
 	spin_unlock_irqrestore(&logbuf_lock, flags);
-	if (wake_klogd)
+	if (!irqs_disabled() && wake_klogd)
 		wake_up_klogd();
 }
 EXPORT_SYMBOL(release_console_sem);
diff -Nur linux-2.6.24.3/kernel/sched.c linux-2.6.24.3-therm/kernel/sched.c
--- linux-2.6.24.3/kernel/sched.c	2008-02-25 19:20:20.000000000 -0500
+++ linux-2.6.24.3-therm/kernel/sched.c	2008-04-21 12:01:37.000000000 -0400
@@ -67,6 +67,11 @@
 #include <asm/tlb.h>
 #include <asm/irq_regs.h>
 
+/* thermal scheduling flag */
+int sched_thermal = 0;
+/* where we're coming from... */
+int from_active_balance = 0;
+
 /*
  * Scheduler clock - returns current time in nanosec units.
  * This is default implementation.
@@ -259,6 +264,8 @@
 	struct list_head leaf_cfs_rq_list;
 	struct task_group *tg;	/* group that "owns" this runqueue */
 #endif
+	int last_therm_balance_index;
+	int last_therm_balance_temp;
 };
 
 /* Real-Time classes' related field in a runqueue: */
@@ -360,6 +367,7 @@
 	unsigned int bkl_count;
 #endif
 	struct lock_class_key rq_lock_key;
+	int from_active_balance;
 };
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
@@ -875,6 +883,7 @@
 #include "sched_idletask.c"
 #include "sched_fair.c"
 #include "sched_rt.c"
+#include "sched_temp.c"
 #ifdef CONFIG_SCHED_DEBUG
 # include "sched_debug.c"
 #endif
@@ -1397,6 +1406,7 @@
 	}
 
 	while (sd) {
+	//while(0){ // 'cause i don't actually care about this
 		cpumask_t span;
 		struct sched_group *group;
 		int new_cpu, weight;
@@ -1432,6 +1442,9 @@
 		}
 		/* while loop will break here if sd == NULL */
 	}
+	if(sched_thermal == 1){
+		cpu=find_coolest_cpu(current);
+	}
 
 	return cpu;
 }
@@ -2354,7 +2367,11 @@
 	struct task_struct *p = iterator->start(iterator->arg);
 	int pinned = 0;
 
+	/* debugg! */
+	//if(busiest->from_active_balance) return;
+
 	while (p) {
+
 		if (can_migrate_task(p, busiest, this_cpu, sd, idle, &pinned)) {
 			pull_task(busiest, p, this_rq, this_cpu);
 			/*
@@ -2383,6 +2400,9 @@
 			 struct sched_domain *sd, enum cpu_idle_type idle)
 {
 	const struct sched_class *class;
+	
+	/* debugg! */
+	//if(busiest->from_active_balance) return 0;
 
 	for (class = sched_class_highest; class; class = class->next)
 		if (class->move_one_task(this_rq, this_cpu, busiest, sd, idle))
@@ -3019,6 +3039,9 @@
 	struct sched_domain *sd;
 	struct rq *target_rq;
 
+	/* debugg! */
+	//if(busiest_rq->from_active_balance) return;
+
 	/* Is there any task to move? */
 	if (busiest_rq->nr_running <= 1)
 		return;
@@ -3053,6 +3076,8 @@
 		else
 			schedstat_inc(sd, alb_failed);
 	}
+
+
 	spin_unlock(&target_rq->lock);
 }
 
@@ -3622,6 +3647,8 @@
 	long *switch_count;
 	struct rq *rq;
 	int cpu;
+	int temperature;
+	static int dbgflag=1;
 
 need_resched:
 	preempt_disable();
@@ -3636,6 +3663,13 @@
 
 	schedule_debug(prev);
 
+	/* try reading the temperature here before interrupts
+	   are disabled */
+	if(sched_thermal == 2){
+		temperature = get_temperature(cpu);
+		prev->lasttemp = temperature;
+	}
+
 	/*
 	 * Do the rq-clock update outside the rq lock:
 	 */
@@ -3654,8 +3688,18 @@
 		switch_count = &prev->nvcsw;
 	}
 
-	if (unlikely(!rq->nr_running))
+	if (unlikely(!rq->nr_running)){
+		/*if(sched_thermal == 2){
+			if(dbgflag == 1){
+				printk(KERN_ALERT "temp_balancing\n");
+				dbgflag=0;
+			}
+			temp_balance(cpu, rq, temperature);
+		}else{
+			idle_balance(cpu, rq);
+		}*/
 		idle_balance(cpu, rq);
+	}
 
 	prev->sched_class->put_prev_task(rq, prev);
 	next = pick_next_task(rq, prev);
@@ -5151,6 +5195,9 @@
 {
 	int cpu = (long)data;
 	struct rq *rq;
+	static int dbgflag=1;
+	int save_ab = 0, save_push=0;
+	int coolest = !cpu;
 
 	rq = cpu_rq(cpu);
 	BUG_ON(rq->migration_thread != current);
@@ -5159,7 +5206,7 @@
 	while (!kthread_should_stop()) {
 		struct migration_req *req;
 		struct list_head *head;
-
+	
 		spin_lock_irq(&rq->lock);
 
 		if (cpu_is_offline(cpu)) {
@@ -5167,11 +5214,31 @@
 			goto wait_to_die;
 		}
 
+		// other stuff here too, like checking the cpu temp
+		if(sched_thermal == 3){
+	
+			coolest = find_coolest_cpu(NULL);
+			if(coolest != cpu){ // if there's somewhere cooler to push stuff
+				rq->from_active_balance=1;
+				save_ab = rq->active_balance;
+				save_push = rq->push_cpu;
+				rq->push_cpu = coolest;	
+				//rq->active_balance = 1;
+							
+				//active_load_balance(rq, cpu);
+				rq->from_active_balance=0;
+				rq->active_balance = save_ab;
+				rq->push_cpu = save_push;
+			}
+		}
+		// is it possible this could undo any work we just did? or maybe we could
+		// cause a bug if this was going to be called because it was the busiest proc,
+		// and now it isn't?
 		if (rq->active_balance) {
 			active_load_balance(rq, cpu);
 			rq->active_balance = 0;
 		}
-
+		
 		head = &rq->migration_queue;
 
 		if (list_empty(head)) {
@@ -5180,6 +5247,7 @@
 			set_current_state(TASK_INTERRUPTIBLE);
 			continue;
 		}
+		
 		req = list_entry(head->next, struct migration_req, list);
 		list_del_init(head->next);
 
diff -Nur linux-2.6.24.3/kernel/sched_debug.c linux-2.6.24.3-therm/kernel/sched_debug.c
--- linux-2.6.24.3/kernel/sched_debug.c	2008-02-25 19:20:20.000000000 -0500
+++ linux-2.6.24.3-therm/kernel/sched_debug.c	2008-03-28 12:31:55.000000000 -0400
@@ -103,7 +103,7 @@
 
 void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
 {
-	s64 MIN_vruntime = -1, min_vruntime, max_vruntime = -1,
+	/*s64 MIN_vruntime = -1, min_vruntime, max_vruntime = -1,
 		spread, rq0_min_vruntime, spread0;
 	struct rq *rq = &per_cpu(runqueues, cpu);
 	struct sched_entity *last;
@@ -143,12 +143,33 @@
 #endif
 	SEQ_printf(m, "  .%-30s: %ld\n", "nr_spread_over",
 			cfs_rq->nr_spread_over);
+	*/
+
+	/* get the last-running task for this queue */
+        struct rb_node* lasttask = first_fair(cfs_rq);
+        /* if it's null, this core is idle */
+        if(!lasttask){
+                SEQ_printf(m, "CPU %d: 0 idle\n", cpu);
+                return;
+        }else{
+                struct task_struct* task = rb_entry(lasttask, struct task_struct, se.run_node);
+                SEQ_printf(m, "CPU %d: %d %s", cpu, task->pid, task->comm);
+                /* now see what else was on this runqueue */
+                lasttask = rb_next(lasttask);
+                while(lasttask != NULL){
+                        task = rb_entry(lasttask, struct task_struct, se.run_node);
+                        SEQ_printf(m, " %d %s", task->pid, task->comm);
+                        lasttask = rb_next(lasttask);
+                }
+                SEQ_printf(m, "\n");
+        }
+
 }
 
 static void print_cpu(struct seq_file *m, int cpu)
 {
 	struct rq *rq = &per_cpu(runqueues, cpu);
-
+/*
 #ifdef CONFIG_X86
 	{
 		unsigned int freq = cpu_khz ? : 1;
@@ -188,17 +209,17 @@
 	P(cpu_load[4]);
 #undef P
 #undef PN
-
+*/
 	print_cfs_stats(m, cpu);
 
-	print_rq(m, rq, cpu);
+	//print_rq(m, rq, cpu);
 }
 
 static int sched_debug_show(struct seq_file *m, void *v)
 {
 	u64 now = ktime_to_ns(ktime_get());
 	int cpu;
-
+/*
 	SEQ_printf(m, "Sched Debug Version: v0.07, %s %.*s\n",
 		init_utsname()->release,
 		(int)strcspn(init_utsname()->version, " "),
@@ -218,11 +239,11 @@
 	P(sysctl_sched_features);
 #undef PN
 #undef P
-
+*/
 	for_each_online_cpu(cpu)
 		print_cpu(m, cpu);
 
-	SEQ_printf(m, "\n");
+	//SEQ_printf(m, "\n");
 
 	return 0;
 }
diff -Nur linux-2.6.24.3/kernel/sched_debug.old linux-2.6.24.3-therm/kernel/sched_debug.old
--- linux-2.6.24.3/kernel/sched_debug.old	1969-12-31 19:00:00.000000000 -0500
+++ linux-2.6.24.3-therm/kernel/sched_debug.old	2008-02-26 14:44:53.000000000 -0500
@@ -0,0 +1,396 @@
+/*
+ * kernel/time/sched_debug.c
+ *
+ * Print the CFS rbtree
+ *
+ * Copyright(C) 2007, Red Hat, Inc., Ingo Molnar
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/proc_fs.h>
+#include <linux/sched.h>
+#include <linux/seq_file.h>
+#include <linux/kallsyms.h>
+#include <linux/utsname.h>
+
+/*
+ * This allows printing both to /proc/sched_debug and
+ * to the console
+ */
+#define SEQ_printf(m, x...)			\
+ do {						\
+	if (m)					\
+		seq_printf(m, x);		\
+	else					\
+		printk(x);			\
+ } while (0)
+
+/*
+ * Ease the printing of nsec fields:
+ */
+static long long nsec_high(unsigned long long nsec)
+{
+	if ((long long)nsec < 0) {
+		nsec = -nsec;
+		do_div(nsec, 1000000);
+		return -nsec;
+	}
+	do_div(nsec, 1000000);
+
+	return nsec;
+}
+
+static unsigned long nsec_low(unsigned long long nsec)
+{
+	if ((long long)nsec < 0)
+		nsec = -nsec;
+
+	return do_div(nsec, 1000000);
+}
+
+#define SPLIT_NS(x) nsec_high(x), nsec_low(x)
+
+static void
+print_task(struct seq_file *m, struct rq *rq, struct task_struct *p)
+{
+	if (rq->curr == p)
+		SEQ_printf(m, "R");
+	else
+		SEQ_printf(m, " ");
+
+	SEQ_printf(m, "%15s %5d %9Ld.%06ld %9Ld %5d ",
+		p->comm, p->pid,
+		SPLIT_NS(p->se.vruntime),
+		(long long)(p->nvcsw + p->nivcsw),
+		p->prio);
+#ifdef CONFIG_SCHEDSTATS
+	SEQ_printf(m, "%9Ld.%06ld %9Ld.%06ld %9Ld.%06ld\n",
+		SPLIT_NS(p->se.vruntime),
+		SPLIT_NS(p->se.sum_exec_runtime),
+		SPLIT_NS(p->se.sum_sleep_runtime));
+#else
+	SEQ_printf(m, "%15Ld %15Ld %15Ld.%06ld %15Ld.%06ld %15Ld.%06ld\n",
+		0LL, 0LL, 0LL, 0L, 0LL, 0L, 0LL, 0L);
+#endif
+}
+
+static void print_rq(struct seq_file *m, struct rq *rq, int rq_cpu)
+{
+	struct task_struct *g, *p;
+	unsigned long flags;
+
+	SEQ_printf(m,
+	"\nrunnable tasks:\n"
+	"            task   PID         tree-key  switches  prio"
+	"     exec-runtime         sum-exec        sum-sleep\n"
+	"------------------------------------------------------"
+	"----------------------------------------------------\n");
+
+	read_lock_irqsave(&tasklist_lock, flags);
+
+	do_each_thread(g, p) {
+		if (!p->se.on_rq || task_cpu(p) != rq_cpu)
+			continue;
+
+		print_task(m, rq, p);
+	} while_each_thread(g, p);
+
+	read_unlock_irqrestore(&tasklist_lock, flags);
+}
+
+void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
+{
+	s64 MIN_vruntime = -1, min_vruntime, max_vruntime = -1,
+		spread, rq0_min_vruntime, spread0;
+	struct rq *rq = &per_cpu(runqueues, cpu);
+	struct sched_entity *last;
+	unsigned long flags;
+
+	SEQ_printf(m, "\ncfs_rq\n");
+
+	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "exec_clock",
+			SPLIT_NS(cfs_rq->exec_clock));
+
+	spin_lock_irqsave(&rq->lock, flags);
+	if (cfs_rq->rb_leftmost)
+		MIN_vruntime = (__pick_next_entity(cfs_rq))->vruntime;
+	last = __pick_last_entity(cfs_rq);
+	if (last)
+		max_vruntime = last->vruntime;
+	min_vruntime = rq->cfs.min_vruntime;
+	rq0_min_vruntime = per_cpu(runqueues, 0).cfs.min_vruntime;
+	spin_unlock_irqrestore(&rq->lock, flags);
+	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "MIN_vruntime",
+			SPLIT_NS(MIN_vruntime));
+	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "min_vruntime",
+			SPLIT_NS(min_vruntime));
+	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "max_vruntime",
+			SPLIT_NS(max_vruntime));
+	spread = max_vruntime - MIN_vruntime;
+	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "spread",
+			SPLIT_NS(spread));
+	spread0 = min_vruntime - rq0_min_vruntime;
+	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "spread0",
+			SPLIT_NS(spread0));
+	SEQ_printf(m, "  .%-30s: %ld\n", "nr_running", cfs_rq->nr_running);
+	SEQ_printf(m, "  .%-30s: %ld\n", "load", cfs_rq->load.weight);
+#ifdef CONFIG_SCHEDSTATS
+	SEQ_printf(m, "  .%-30s: %d\n", "bkl_count",
+			rq->bkl_count);
+#endif
+	SEQ_printf(m, "  .%-30s: %ld\n", "nr_spread_over",
+			cfs_rq->nr_spread_over);
+}
+
+static void print_cpu(struct seq_file *m, int cpu)
+{
+	struct rq *rq = &per_cpu(runqueues, cpu);
+
+#ifdef CONFIG_X86
+	{
+		unsigned int freq = cpu_khz ? : 1;
+
+		SEQ_printf(m, "\ncpu#%d, %u.%03u MHz\n",
+			   cpu, freq / 1000, (freq % 1000));
+	}
+#else
+	SEQ_printf(m, "\ncpu#%d\n", cpu);
+#endif
+
+#define P(x) \
+	SEQ_printf(m, "  .%-30s: %Ld\n", #x, (long long)(rq->x))
+#define PN(x) \
+	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", #x, SPLIT_NS(rq->x))
+
+	P(nr_running);
+	SEQ_printf(m, "  .%-30s: %lu\n", "load",
+		   rq->load.weight);
+	P(nr_switches);
+	P(nr_load_updates);
+	P(nr_uninterruptible);
+	SEQ_printf(m, "  .%-30s: %lu\n", "jiffies", jiffies);
+	PN(next_balance);
+	P(curr->pid);
+	PN(clock);
+	PN(idle_clock);
+	PN(prev_clock_raw);
+	P(clock_warps);
+	P(clock_overflows);
+	P(clock_deep_idle_events);
+	PN(clock_max_delta);
+	P(cpu_load[0]);
+	P(cpu_load[1]);
+	P(cpu_load[2]);
+	P(cpu_load[3]);
+	P(cpu_load[4]);
+#undef P
+#undef PN
+
+	print_cfs_stats(m, cpu);
+
+	print_rq(m, rq, cpu);
+}
+
+static int sched_debug_show(struct seq_file *m, void *v)
+{
+	u64 now = ktime_to_ns(ktime_get());
+	int cpu;
+
+	SEQ_printf(m, "Sched Debug Version: v0.07, %s %.*s\n",
+		init_utsname()->release,
+		(int)strcspn(init_utsname()->version, " "),
+		init_utsname()->version);
+
+	SEQ_printf(m, "now at %Lu.%06ld msecs\n", SPLIT_NS(now));
+
+#define P(x) \
+	SEQ_printf(m, "  .%-40s: %Ld\n", #x, (long long)(x))
+#define PN(x) \
+	SEQ_printf(m, "  .%-40s: %Ld.%06ld\n", #x, SPLIT_NS(x))
+	PN(sysctl_sched_latency);
+	PN(sysctl_sched_min_granularity);
+	PN(sysctl_sched_wakeup_granularity);
+	PN(sysctl_sched_batch_wakeup_granularity);
+	PN(sysctl_sched_child_runs_first);
+	P(sysctl_sched_features);
+#undef PN
+#undef P
+
+	for_each_online_cpu(cpu)
+		print_cpu(m, cpu);
+
+	SEQ_printf(m, "\n");
+
+	return 0;
+}
+
+static void sysrq_sched_debug_show(void)
+{
+	sched_debug_show(NULL, NULL);
+}
+
+static int sched_debug_open(struct inode *inode, struct file *filp)
+{
+	return single_open(filp, sched_debug_show, NULL);
+}
+
+static const struct file_operations sched_debug_fops = {
+	.open		= sched_debug_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+static int __init init_sched_debug_procfs(void)
+{
+	struct proc_dir_entry *pe;
+
+	pe = create_proc_entry("sched_debug", 0644, NULL);
+	if (!pe)
+		return -ENOMEM;
+
+	pe->proc_fops = &sched_debug_fops;
+
+	return 0;
+}
+
+__initcall(init_sched_debug_procfs);
+
+void proc_sched_show_task(struct task_struct *p, struct seq_file *m)
+{
+	unsigned long nr_switches;
+	unsigned long flags;
+	int num_threads = 1;
+
+	rcu_read_lock();
+	if (lock_task_sighand(p, &flags)) {
+		num_threads = atomic_read(&p->signal->count);
+		unlock_task_sighand(p, &flags);
+	}
+	rcu_read_unlock();
+
+	SEQ_printf(m, "%s (%d, #threads: %d)\n", p->comm, p->pid, num_threads);
+	SEQ_printf(m,
+		"---------------------------------------------------------\n");
+#define __P(F) \
+	SEQ_printf(m, "%-35s:%21Ld\n", #F, (long long)F)
+#define P(F) \
+	SEQ_printf(m, "%-35s:%21Ld\n", #F, (long long)p->F)
+#define __PN(F) \
+	SEQ_printf(m, "%-35s:%14Ld.%06ld\n", #F, SPLIT_NS((long long)F))
+#define PN(F) \
+	SEQ_printf(m, "%-35s:%14Ld.%06ld\n", #F, SPLIT_NS((long long)p->F))
+
+	PN(se.exec_start);
+	PN(se.vruntime);
+	PN(se.sum_exec_runtime);
+
+	nr_switches = p->nvcsw + p->nivcsw;
+
+#ifdef CONFIG_SCHEDSTATS
+	PN(se.wait_start);
+	PN(se.sleep_start);
+	PN(se.block_start);
+	PN(se.sleep_max);
+	PN(se.block_max);
+	PN(se.exec_max);
+	PN(se.slice_max);
+	PN(se.wait_max);
+	P(sched_info.bkl_count);
+	P(se.nr_migrations);
+	P(se.nr_migrations_cold);
+	P(se.nr_failed_migrations_affine);
+	P(se.nr_failed_migrations_running);
+	P(se.nr_failed_migrations_hot);
+	P(se.nr_forced_migrations);
+	P(se.nr_forced2_migrations);
+	P(se.nr_wakeups);
+	P(se.nr_wakeups_sync);
+	P(se.nr_wakeups_migrate);
+	P(se.nr_wakeups_local);
+	P(se.nr_wakeups_remote);
+	P(se.nr_wakeups_affine);
+	P(se.nr_wakeups_affine_attempts);
+	P(se.nr_wakeups_passive);
+	P(se.nr_wakeups_idle);
+
+	{
+		u64 avg_atom, avg_per_cpu;
+
+		avg_atom = p->se.sum_exec_runtime;
+		if (nr_switches)
+			do_div(avg_atom, nr_switches);
+		else
+			avg_atom = -1LL;
+
+		avg_per_cpu = p->se.sum_exec_runtime;
+		if (p->se.nr_migrations) {
+			avg_per_cpu = div64_64(avg_per_cpu,
+					       p->se.nr_migrations);
+		} else {
+			avg_per_cpu = -1LL;
+		}
+
+		__PN(avg_atom);
+		__PN(avg_per_cpu);
+	}
+#endif
+	__P(nr_switches);
+	SEQ_printf(m, "%-35s:%21Ld\n",
+		   "nr_voluntary_switches", (long long)p->nvcsw);
+	SEQ_printf(m, "%-35s:%21Ld\n",
+		   "nr_involuntary_switches", (long long)p->nivcsw);
+
+	P(se.load.weight);
+	P(policy);
+	P(prio);
+#undef PN
+#undef __PN
+#undef P
+#undef __P
+
+	{
+		u64 t0, t1;
+
+		t0 = sched_clock();
+		t1 = sched_clock();
+		SEQ_printf(m, "%-35s:%21Ld\n",
+			   "clock-delta", (long long)(t1-t0));
+	}
+}
+
+void proc_sched_set_task(struct task_struct *p)
+{
+#ifdef CONFIG_SCHEDSTATS
+	p->se.wait_max				= 0;
+	p->se.sleep_max				= 0;
+	p->se.sum_sleep_runtime			= 0;
+	p->se.block_max				= 0;
+	p->se.exec_max				= 0;
+	p->se.slice_max				= 0;
+	p->se.nr_migrations			= 0;
+	p->se.nr_migrations_cold		= 0;
+	p->se.nr_failed_migrations_affine	= 0;
+	p->se.nr_failed_migrations_running	= 0;
+	p->se.nr_failed_migrations_hot		= 0;
+	p->se.nr_forced_migrations		= 0;
+	p->se.nr_forced2_migrations		= 0;
+	p->se.nr_wakeups			= 0;
+	p->se.nr_wakeups_sync			= 0;
+	p->se.nr_wakeups_migrate		= 0;
+	p->se.nr_wakeups_local			= 0;
+	p->se.nr_wakeups_remote			= 0;
+	p->se.nr_wakeups_affine			= 0;
+	p->se.nr_wakeups_affine_attempts	= 0;
+	p->se.nr_wakeups_passive		= 0;
+	p->se.nr_wakeups_idle			= 0;
+	p->sched_info.bkl_count			= 0;
+#endif
+	p->se.sum_exec_runtime			= 0;
+	p->se.prev_sum_exec_runtime		= 0;
+	p->nvcsw				= 0;
+	p->nivcsw				= 0;
+}
diff -Nur linux-2.6.24.3/kernel/sched_fair.c linux-2.6.24.3-therm/kernel/sched_fair.c
--- linux-2.6.24.3/kernel/sched_fair.c	2008-02-25 19:20:20.000000000 -0500
+++ linux-2.6.24.3-therm/kernel/sched_fair.c	2008-04-21 12:00:11.000000000 -0400
@@ -948,6 +948,101 @@
 	return __load_balance_iterator(cfs_rq, cfs_rq->rb_load_balance_curr);
 }
 
+
+static struct task_struct *__load_balance_therm_iterator(struct cfs_rq *cfs_rq, struct rb_node *curr){
+	// local info
+	int indexctr = 0, retindex=0;
+	struct task_struct *p_tmp, *p_ret;
+	struct rb_node *iter=curr;
+	int lowest_temp = cfs_rq->last_therm_balance_temp;
+	int last_index = cfs_rq->last_therm_balance_index;
+
+	if(!curr)
+		return NULL;
+	
+	// if last_therm_balance_index is -1, then this is being called from
+	// load_balance_start_therm, so we can just look through the whole
+	// runqueue to find something cooler without worrying about whether
+	// we've already tried it
+	if(last_index == -1){
+		while(iter){
+			p_tmp = rb_entry(iter, struct task_struct, se.run_node);
+			if(p_tmp->lasttemp < lowest_temp){
+				p_ret = p_tmp;
+				lowest_temp = p_tmp->lasttemp;
+				retindex = indexctr;
+			}
+			iter = rb_next(iter);
+			indexctr++;
+		}
+	}
+	// otherwise, we want to look for
+	// - a process of equal temperature further down the queue
+	// - the next-lowest temperature
+	else{
+		int second_lowest_temp=100; // see below
+
+		// so first we look through for the next entry with the same temperature
+		// the comments on __load_balance_iterator suggest dequeues can happen despite
+		// the lock being held, but i'm assuming queueing can't happen, so we don't have
+		// to worry about new, lower-temperatured processes magically appearing.  this
+		// assumption simplifies the search for next-coolest tasks.
+		while(iter){
+			p_tmp = rb_entry(iter, struct task_struct, se.run_node);
+			if( (p_tmp->lasttemp <= lowest_temp) && indexctr > last_index){
+				// we're just looking for the next one down the line, 
+				// and it looks like we've found it, so we update cf_rq stats
+				// and return from here
+				cfs_rq->last_therm_balance_temp = p_tmp->lasttemp;
+				cfs_rq->last_therm_balance_index = indexctr;
+				return p_tmp;
+			}else if(p_tmp->lasttemp > lowest_temp && p_tmp->lasttemp < second_lowest_temp){
+				second_lowest_temp = p_tmp->lasttemp;
+			}
+			indexctr++;
+			iter = rb_next(iter);
+		}
+
+		// if we get here, it means we wandered off the end of the runqueue without finding
+		// anything else with the same lowest temperature.  however, we know now what the 
+		// second lowest temperature of the runqueue is (second_lowest_temp as calculated above),
+		// so we can just look for the first task with that temp.
+
+		// this makes use of the above assumption that tasks can only be dequeued but not enqueued
+		iter = curr; // reset the iterator
+		indexctr=0;
+		while(iter){
+			p_tmp = rb_entry(iter, struct task_struct, se.run_node);
+			if(p_tmp->lasttemp == second_lowest_temp){
+				// we found something, so let's update the stats and return it
+				cfs_rq->last_therm_balance_temp = p_tmp->lasttemp;
+				cfs_rq->last_therm_balance_index = indexctr;
+				return p_tmp;
+			}
+			indexctr++;
+			iter = rb_next(iter);
+		}
+	}
+
+	// update stats in case we come back here
+	cfs_rq->last_therm_balance_temp = lowest_temp;
+	cfs_rq->last_therm_balance_index = retindex;
+	return p_ret;
+
+}
+
+static struct task_struct *load_balance_start_therm(void *arg){
+	struct cfs_rq *cfs_rq = arg;
+	cfs_rq->last_therm_balance_index = -1;
+	cfs_rq->last_therm_balance_temp = 100; 
+	return __load_balance_therm_iterator(cfs_rq, first_fair(cfs_rq));
+}
+
+static struct task_struct *load_balance_next_therm(void *arg){
+	struct cfs_rq *cfs_rq = arg;
+	return __load_balance_therm_iterator(cfs_rq, first_fair(cfs_rq));
+}
+
 #ifdef CONFIG_FAIR_GROUP_SCHED
 static int cfs_rq_best_prio(struct cfs_rq *cfs_rq)
 {
@@ -967,6 +1062,7 @@
 }
 #endif
 
+
 static unsigned long
 load_balance_fair(struct rq *this_rq, int this_cpu, struct rq *busiest,
 		  unsigned long max_load_move,
@@ -976,10 +1072,17 @@
 	struct cfs_rq *busy_cfs_rq;
 	long rem_load_move = max_load_move;
 	struct rq_iterator cfs_rq_iterator;
+	
 
 	cfs_rq_iterator.start = load_balance_start_fair;
 	cfs_rq_iterator.next = load_balance_next_fair;
 
+	if(sched_thermal == 2){
+		// use our new iterators
+		cfs_rq_iterator.start = load_balance_start_therm;
+		cfs_rq_iterator.next = load_balance_next_therm;
+	}
+
 	for_each_leaf_cfs_rq(busiest, busy_cfs_rq) {
 #ifdef CONFIG_FAIR_GROUP_SCHED
 		struct cfs_rq *this_cfs_rq;
@@ -1024,9 +1127,24 @@
 {
 	struct cfs_rq *busy_cfs_rq;
 	struct rq_iterator cfs_rq_iterator;
+	static int dbgflag = 1;
 
 	cfs_rq_iterator.start = load_balance_start_fair;
 	cfs_rq_iterator.next = load_balance_next_fair;
+	
+	/* i think you can only get here from migration_thread
+	   which sets from_active balance, and it only does that
+           when sched_thermal is 3, but just in case i'm missing
+           a call site i'll double-check sched_thermal's value
+	   here as well. */
+	if(busiest->from_active_balance == 1 && sched_thermal == 3){
+		if(dbgflag==1){
+			//printk(KERN_ALERT "move_one_task_fair+therm\n");
+			dbgflag=0;
+		}
+		cfs_rq_iterator.start = load_balance_start_therm;
+		cfs_rq_iterator.next = load_balance_next_therm;
+	}
 
 	for_each_leaf_cfs_rq(busiest, busy_cfs_rq) {
 		/*
@@ -1136,7 +1254,7 @@
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	print_cfs_rq(m, cpu, &cpu_rq(cpu)->cfs);
 #endif
-	for_each_leaf_cfs_rq(cpu_rq(cpu), cfs_rq)
-		print_cfs_rq(m, cpu, cfs_rq);
+	/*for_each_leaf_cfs_rq(cpu_rq(cpu), cfs_rq)
+		print_cfs_rq(m, cpu, cfs_rq);*/
 }
 #endif
diff -Nur linux-2.6.24.3/kernel/sched_temp.c linux-2.6.24.3-therm/kernel/sched_temp.c
--- linux-2.6.24.3/kernel/sched_temp.c	1969-12-31 19:00:00.000000000 -0500
+++ linux-2.6.24.3-therm/kernel/sched_temp.c	2008-04-21 11:57:53.000000000 -0400
@@ -0,0 +1,136 @@
+/* 
+ * sched_temp.c
+ */
+
+#include <linux/kobject.h>
+
+#define TEMPBASE 85000
+
+
+/* prototypes and such that we need here that are defined but 
+   not prototyped in sched.c or sched_fair.c */
+
+#define for_each_leaf_cfs_rq(rq, cfs) \
+	list_for_each_entry(cfs_rq, &rq->leaf_cfs_rq_list, leaf_cfs_rq_list)
+
+static void pull_task(struct rq *src_rq, struct task_struct *p,
+	struct rq *this_rq, int this_cpu);
+
+static int can_migrate_task(struct task_struct *p, struct rq *rq, 
+	int this_cpu, struct sched_domain *sd, enum cpu_idle_type idle,
+	int *all_pinned);
+
+/* this stuff at the top of the file is necessary for adding
+   the file under /sys/ that lets us turn on/off thermal
+   scheduling methods.  Specifically, it sets the value of
+   int sched_thermal, which is defined and used variously in
+   sched.c. */
+
+struct kobject tempsched;
+
+static ssize_t sched_temp_show(struct sys_device *dev, char *page){
+	return sprintf(page, "%u\n", sched_thermal);
+}
+
+/* update this any time a new valid value for sched_thermal is added */
+static ssize_t sched_temp_store(struct sys_device *dev, const char *buf, 
+	size_t count){
+	switch(buf[0]){
+		case '0':
+			sched_thermal=0;
+			break;
+		case '1':
+			sched_thermal=1;
+			break;
+		case '2':
+			sched_thermal=2;
+			break;
+		case '3':
+			sched_thermal=3;
+			break;
+		default:
+			return -EINVAL;
+	};
+	return count;
+}
+
+/* SYSDEV_ATTR appears to be a macro which, among other things, creates
+   attr_sched_thermal */
+static SYSDEV_ATTR(sched_thermal, 0644, sched_temp_show, sched_temp_store);
+
+void init_sysfs_temp(void){
+	kobject_set_name(&tempsched, "sched_temp");
+	kobject_register(&tempsched);
+	sysfs_create_file(&tempsched, &attr_sched_thermal.attr);
+}
+
+/* this method is called in various places to get the temperature
+   of a given core (or, practically, for an application).
+
+   this is not general purpose and is pretty much just lifted from
+   the coretemp driver.
+*/
+int get_temperature(int cpu){
+
+	u32 eax, edx;
+	int temperature;
+	/* the Core2 MSR_THERM_STATUS is reported as a delta
+	   from a base temperature; in our case, 85000 mC */
+	
+	rdmsr_on_cpu(cpu, MSR_IA32_THERM_STATUS, &eax, &edx);
+
+	if(eax & 0x80000000){
+		temperature = TEMPBASE - (((eax >> 16) & 0x7f) * 1000);
+		temperature = temperature / 1000;
+		return temperature;
+	}
+
+	// error code
+	return -1;
+}
+
+/* find_coolest_cpu looks for the current coolest cpu in the system
+ * this is usually called in the context of a given process, so we
+ * need to only poll 'legal' cores.  if we want to know generally
+ * instead of in the context of a given process, set p=NULL.
+ *
+ * again, not terribly general purpose code (building our own cpumask)
+ */
+int find_coolest_cpu(struct task_struct *p){
+	int numcpus = 2; // hard-coded :x
+	cpumask_t tmp;
+	int coolest_temp = 100;
+	int coolest_cpu = 0;
+	int i, temp;
+	static int next_cpu = 0;
+	// build our own cpumask, because just using setall
+	// gives us way too much
+	for(i=0; i<numcpus; i++){
+		cpu_set(i, tmp);
+	}
+
+	/* now create a mask that is the legal processors
+	   for this task */
+	if(p){
+		cpus_and(tmp, tmp, p->cpus_allowed);
+	}
+
+	/* look for the coolest cpu of those allowed */
+	for_each_cpu_mask(i, tmp){
+		temp = get_temperature(i);
+		if(temp < coolest_temp){
+			coolest_temp = temp;
+			coolest_cpu = i;
+		}
+	}
+	
+	/* if coolest_temp is -1, we got invalid data
+	   somewhere, so we're just going to default
+	   to round-robin placement. */
+	if(coolest_temp == -1){
+		coolest_cpu = next_cpu;
+		next_cpu++;
+		next_cpu = next_cpu % numcpus;
+	}
+	return coolest_cpu;
+}
diff -Nur linux-2.6.24.3/kernel/sched_temp.class linux-2.6.24.3-therm/kernel/sched_temp.class
--- linux-2.6.24.3/kernel/sched_temp.class	1969-12-31 19:00:00.000000000 -0500
+++ linux-2.6.24.3-therm/kernel/sched_temp.class	2008-04-09 13:45:45.000000000 -0400
@@ -0,0 +1,180 @@
+/* 
+ * sched_temp.c
+ * 
+ * temperature-based scheduling implementation
+ * (trying to rework it based on a CFS scheduling class)
+ *
+ * the use of the RB tree seems to be specific to sched_fair
+ * so I may end up having to come up with a complete method for
+ * managing tasks... or maybe i really should just monkey around
+ * with stuff inside sched_fair.c.
+ *
+ *
+ * Values of sched_thermal determine when and what sort of 
+ * scheduling decisions we might make.  The current list:
+ *    1 - Put a new job (sched_fork() and sched_exec()) on
+ *        the coolest core.
+ *    2 - When a core becomes idle, look for the coolest 
+ *        (or hottest) task we can find and steal it
+ */
+
+#include <linux/kobject.h>
+
+#define TEMPBASE 85000
+
+/* prototypes and such that we need here that are defined but 
+   not prototyped in sched.c or sched_fair.c */
+
+#define for_each_leaf_cfs_rq(rq, cfs) \
+	list_for_each_entry(cfs_rq, &rq->leaf_cfs_rq_list, leaf_cfs_rq_list)
+
+static void pull_task(struct rq *src_rq, struct task_struct *p,
+	struct rq *this_rq, int this_cpu);
+
+static int can_migrate_task(struct task_struct *p, struct rq *rq, 
+	int this_cpu, struct sched_domain *sd, enum cpu_idle_type idle,
+	int *all_pinned);
+
+/* this stuff at the top of the file is necessary for adding
+   the file under /sys/ that lets us turn on/off thermal
+   scheduling methods.  Specifically, it sets the value of
+   int sched_thermal, which is defined and used variously in
+   sched.c. */
+
+struct kobject tempsched;
+
+static ssize_t sched_temp_show(struct sys_device *dev, char *page){
+	return sprintf(page, "%u\n", sched_thermal);
+}
+
+static ssize_t sched_temp_store(struct sys_device *dev, const char *buf, 
+	size_t count){
+	switch(buf[0]){
+		case '0':
+			sched_thermal=0;
+			break;
+		case '1':
+			sched_thermal=1;
+			break;
+		case '2':
+			sched_thermal=2;
+			break;
+		default:
+			return -EINVAL;
+	};
+	return count;
+}
+
+/* SYSDEV_ATTR appears to be a macro which, among other things, creates
+   attr_sched_thermal */
+static SYSDEV_ATTR(sched_thermal, 0644, sched_temp_show, sched_temp_store);
+
+void init_sysfs_temp(void){
+	kobject_set_name(&tempsched, "sched_temp");
+	kobject_register(&tempsched);
+	sysfs_create_file(&tempsched, &attr_sched_thermal.attr);
+}
+
+/* this method is called in various places to get the temperature
+   of a given core (or, practically, for an application).
+ 
+   Note 1: This is currently not at all general-purpose, and is in
+           fact pretty much just what you need to do read the
+           temperature on Core2-series processors.  As such, the
+           code is largely lifted from the coretemp driver.
+
+   Note 2: This calls rdmsr_on_cpu, which eventually calls
+           smp_call_function_single, which can deadlock if
+           called with interrupts disabled.  Thus, this 
+           function, too, should only be called with interrupts
+           enabled.  (I've added a WARN_ON here for that case.) */
+int get_temperature(int cpu){
+
+	u32 eax, edx;
+	int temperature;
+	/* the Core2 MSR_THERM_STATUS is reported as a delta
+	   from a base temperature; in our case, 85000 mC */
+	
+	rdmsr_on_cpu(cpu, MSR_IA32_THERM_STATUS, &eax, &edx);
+
+	if(eax & 0x80000000){
+		temperature = TEMPBASE - (((eax >> 16) & 0x7f) * 1000);
+		temperature = temperature / 1000;
+		return temperature;
+	}
+
+	// error code
+	return -1;
+}
+
+/* find_coolest_cpu is used if sched_thermal = 1 
+ *
+ * we have to pass in p to make sure we don't put the
+ * process on an 'illegal' core
+ */
+int find_coolest_cpu(struct task_struct *p){
+	int numcpus = 2; // hard-coded :x
+	cpumask_t tmp;
+	int coolest_temp = 100;
+	int coolest_cpu = 0;
+	int i, temp;
+	static int next_cpu = 0;
+	
+	// build our own cpumask, because just using setall
+	// gives us way too much
+	for(i=0; i<numcpus; i++){
+		cpu_set(i, tmp);
+	}
+
+	/* now create a mask that is the legal processors
+	   for this task */
+	cpus_and(tmp, tmp, p->cpus_allowed);
+
+	/* look for the coolest cpu of those allowed */
+	for_each_cpu_mask(i, tmp){
+		temp = get_temperature(i);
+		if(temp < coolest_temp){
+			coolest_temp = temp;
+			coolest_cpu = i;
+		}
+	}
+	
+	/* if coolest_temp is -1, we got invalid data
+	   somewhere, so we're just going to default
+	   to round-robin placement. */
+	if(coolest_temp == -1){
+		coolest_cpu = next_cpu;
+		next_cpu++;
+		next_cpu = next_cpu % numcpus;
+	}
+	
+	return coolest_cpu;
+}
+
+/* 
+ * struct representing a scheduling class
+ *
+ * I kind of wish there was better documentation
+ * on this.
+ */
+
+static const struct sched_class thermal_sched_class = {
+	/* not sure what the .next ptr is for, maybe
+           the next class to try, so i'll point it to
+	   fair_sched_class */
+	.next = &fair_sched_class,
+	.enqueue_task = ,
+	.dequeue_task = ,
+	.yield_task = ,
+	.check_preempt_curr = ,
+	.pick_next_task = ,
+	.put_prev_task = ,
+#ifdef CONFIG_SMP
+	.load_balance = ,
+	.move_one_task = ,
+#endif
+	.set_curr_task = ,
+	.task_tick = ,
+	.task_new = ,
+};
+


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: migration thread and active_load_balance
  2008-04-21 19:38       ` Dan Upton
@ 2008-04-21 20:39         ` Dmitry Adamushko
  2008-04-22 19:19           ` Dan Upton
  0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Adamushko @ 2008-04-21 20:39 UTC (permalink / raw)
  To: Dan Upton; +Cc: linux-kernel

On 21/04/2008, Dan Upton <upton.dan.linux@gmail.com> wrote:
> On Mon, Apr 21, 2008 at 7:03 AM, Dmitry Adamushko
>
> <dmitry.adamushko@gmail.com> wrote:
>
> > On 21/04/2008, Dan Upton <upton.dan.linux@gmail.com> wrote:
>  >  > [ ... ]
>  >
>  > >
>  >  >  kernel BUG at kernel/sched.c:2103
>  >
>  >  and what's this line in your patched sched.c?
>  >
>  >  is it -- BUG_ON(!irqs_disabled());  ?
>  >
>  >  anything in your unposted code (e.g. find_coolest_cpu()) that might
>  >  re-enable the interrupts before __migrate_task() is called?
>  >
>  >  If you post your modifications as a patch
>  >  (Documentation/applying-patches.txt) that contains _all_ relevant
>  >  modifications, it'd be easier to guess what's wrong.
>
>
> Yes, that's the line.  I don't recall ever reenabling interrupts,

migration_thread() -> find_coolest_cpu() -> get_temperature() ->
rdmsr_on_cpu() -> [ if your configuration is SMP ] ->
smp_call_function_single() ->

(arch/x86/kernel/smpcommon.c)
...
        if (cpu == me) {
                local_irq_disable();
                func(info);
                local_irq_enable();   <----------- REENABLES the interrupts
                put_cpu();
                return 0;
        }
...

as a result, __migrate_task() -> double_rq_lock() -> BUG_ON(!irqs_disabled())
gives you an "oops".
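
(For context, the check that fires at kernel/sched.c:2103 is the first thing
double_rq_lock() does; roughly, in 2.6.24 -- quoted from memory, so treat it
as a sketch rather than the exact source:)

static void double_rq_lock(struct rq *rq1, struct rq *rq2)
        __acquires(rq1->lock)
        __acquires(rq2->lock)
{
        BUG_ON(!irqs_disabled());
        if (rq1 == rq2) {
                spin_lock(&rq1->lock);
                __acquire(rq2->lock);   /* Fake it out ;) */
        } else {
                if (rq1 < rq2) {
                        spin_lock(&rq1->lock);
                        spin_lock(&rq2->lock);
                } else {
                        spin_lock(&rq2->lock);
                        spin_lock(&rq1->lock);
                }
        }
        update_rq_clock(rq1);
        update_rq_clock(rq2);
}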


>
>  -dan
>

-- 
Best regards,
Dmitry Adamushko

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: migration thread and active_load_balance
  2008-04-21 20:39         ` Dmitry Adamushko
@ 2008-04-22 19:19           ` Dan Upton
  0 siblings, 0 replies; 9+ messages in thread
From: Dan Upton @ 2008-04-22 19:19 UTC (permalink / raw)
  To: Dmitry Adamushko; +Cc: linux-kernel

On Mon, Apr 21, 2008 at 4:39 PM, Dmitry Adamushko
<dmitry.adamushko@gmail.com> wrote:
> On 21/04/2008, Dan Upton <upton.dan.linux@gmail.com> wrote:
>  > On Mon, Apr 21, 2008 at 7:03 AM, Dmitry Adamushko
>  >
>  > <dmitry.adamushko@gmail.com> wrote:
>  >
>  > > On 21/04/2008, Dan Upton <upton.dan.linux@gmail.com> wrote:
>  >  >  > [ ... ]
>  >  >
>  >  > >
>  >  >  >  kernel BUG at kernel/sched.c:2103
>  >  >
>  >  >  and what's this line in your patched sched.c?
>  >  >
>  >  >  is it -- BUG_ON(!irqs_disabled());  ?
>  >  >
>  >  >  anything in your unposted code (e.g. find_coolest_cpu()) that might
>  >  >  re-enable the interrupts before __migrate_task() is called?
>  >  >
>  >  >  If you post your modifications as a patch
>  >  >  (Documentation/applying-patches.txt) that contains _all_ relevant
>  >  >  modifications, it'd be easier to guess what's wrong.
>  >
>  >
>  > Yes, that's the line.  I don't recall ever reenabling interrupts,
>
>  migration_thread() -> find_coolest_cpu() -> get_temperature() ->
>  rdmsr_on_cpu() -> [ if your configuration is SMP ] ->
>  smp_call_function_single() ->
>
>  (arch/x86/kernel/smpcommon.c)
>  ...
>         if (cpu == me) {
>                 local_irq_disable();
>                 func(info);
>                 local_irq_enable();   <----------- REENABLES the interrupts
>                 put_cpu();
>                 return 0;
>         }
>  ...
>
>  as a result, __migrate_task() -> double_rq_lock() -> BUG_ON(!irqs_disabled())
>  gives you an "oops".
>

Ah, how about that.  Thanks, I at least fixed the oops by caching
return values from get_temperature() and then using those instead of
calling rdmsr_on_cpu when calling from migration_thread().  Everything
works up to the point of uncommenting the new call to
active_load_balance, which again yields a deadlock.  (Man, I love
working in the scheduler...) Anyway, I'll keep trying to debug that on
my own again, but did anybody notice anything I'm doing that might
lead to deadlock?
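
(A sketch of the caching idea, with made-up names -- not the exact code from
my patch: sample the temperatures while interrupts are still enabled, before
taking rq->lock, and have the balancing path read only the cached values:)

static int cached_temp[NR_CPUS];

/* call with interrupts enabled (e.g. at the top of the migration_thread
   loop, before spin_lock_irq(&rq->lock)); get_temperature() may IPI */
static void update_cached_temps(void)
{
        int i;

        for_each_online_cpu(i)
                cached_temp[i] = get_temperature(i);
}

/* safe under rq->lock with interrupts off: no rdmsr_on_cpu()/IPI here */
static int find_coolest_cpu_cached(void)
{
        int i, coolest_cpu = 0, coolest_temp = INT_MAX;

        for_each_online_cpu(i) {
                if (cached_temp[i] >= 0 && cached_temp[i] < coolest_temp) {
                        coolest_temp = cached_temp[i];
                        coolest_cpu = i;
                }
        }
        return coolest_cpu;
}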

-dan

^ permalink raw reply	[flat|nested] 9+ messages in thread

Thread overview: 9+ messages
2008-04-20 18:21 migration thread and active_load_balance Dan Upton
2008-04-20 21:26 ` Dmitry Adamushko
2008-04-21  0:44   ` Dan Upton
2008-04-21 11:03     ` Dmitry Adamushko
2008-04-21 19:38       ` Dan Upton
2008-04-21 20:39         ` Dmitry Adamushko
2008-04-22 19:19           ` Dan Upton
2008-04-20 21:35 ` Andi Kleen
2008-04-21  0:46   ` Dan Upton
