George Dunlap wrote:
> wrote:
>> On Tue, Feb 8, 2011 at 4:33 PM, Andre Przywara wrote:
>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 24
>>> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 24
>>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 24
>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 25
>>> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 25
>>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 25
>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 26
>>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 26
>>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 26
>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 27
>>> (XEN) cpu_disable_scheduler: Migrating d0v24 from cpu 27
>>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 27
>>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 27
>>> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 28
>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 28
>>> (XEN) cpu_disable_scheduler: Migrating d0v25 from cpu 28
>>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 28
>>> (XEN) cpu_disable_scheduler: Migrating d0v39 from cpu 28
>>> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 29
>> Interesting -- what seems to happen here is that as cpus are disabled,
>> vcpus are "shovelled" in an accumulative fashion from one cpu to the
>> next:
>> * v18,34,42 start on cpu 24.
>> * When 24 is brought down, they're all migrated to 25; then when 25 is
>> brought down, to 26, then to 27
>> * v24 is running on cpu 27, so when 27 is brought down, v24 is added to the mix
>> * v3 is running on cpu 28, so all of them plus v3 are shoveled onto cpu 29.
>>
>> While that behavior may not be ideal, it should certainly be bug-free.
>>
>> Another interesting thing to note is that the bug happened on pcpu 32,
>> but there were no advertised migrations from that cpu.
>>
>> Andre, can you fold the attached patch into your testing?
Sorry, but that bug (and its output) didn't trigger on two tries. Instead I now saw two occurrences of the "migration failed, must retry later" message. Interestingly enough, it does not seem to be fatal. The first time it triggers, the numa-split even completes; then, after I roll it back and repeat it, the message shows again, but this time it crashes later on that old BUG_ON(). See the attached log for more details.
Thanks for the try, anyway.

Regards,
Andre.

>>
>> Thanks for all your work on this.

I am glad for all your help. I am only starting to really understand the scheduler, so your support is much appreciated.

>>
>> -George
>>

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany