On Thu, Mar 22, 2012 at 06:58:29AM -0700, Dan Smith wrote:
> AA> The only reasonable explanation I can imagine for the weird stuff
> AA> going on with "numa01_inverse" is that maybe it was compiled without
> AA> -DHARD_BIND? I forgot to specify that -DINVERSE_BIND is a noop unless
> AA> -DHARD_BIND is specified too at the same time. -DINVERSE_BIND alone
> AA> results in the default build without -D parameters.
>
> Ah, yeah, that's probably it. Later I'll try re-running some of the
> cases to verify.

Ok! If you re-run on autonuma, please also check out the autonuma branch
again (or autonuma-dev), because I had a bug where autonuma would
optimize the hard binds too :). Now they're fully obeyed (I was already
obeying vma_policy(vma), but I forgot to also check that
current->mempolicy is null).

BTW, in the meantime I've run some virt benchmarks.. attached is a vnc
screenshot: the first run is with autonuma off, the second and third
runs are with autonuma on. full_scan is incremented every 10 sec with
autonuma on, so the scanning overhead is being measured. With autonuma
off, the wrong node is picked roughly 50% of the time, and you can see
the difference in elapsed time when that happens. AutoNUMA gets it right
100% of the time thanks to autonuma_balance (always "16 sec" instead of
"16 sec or 26 sec" is a great improvement).

I also tried to measure a kernel build in a VM that fits in one node (in
both CPUs and RAM), but I get badly bitten by HT effects: I should
basically teach autonuma_balance that it's better to spread the load to
remote nodes if the remote nodes have a full core idle while the local
node has only an HT sibling idle. What a mess. Anyway, the current code
performs optimally if all nodes are busy and there are no idle cores (or
only idle siblings). I guess I'll leave the HT optimizations for later;
I should probably measure this again with HT off.

Running the kernel build in a VM that spans the whole system won't be
meaningful until I run autonuma in the guest too, but for that I need a
virtual topology and I haven't yet looked into how to tell qemu to
provide one. Without autonuma in the guest, the guest scheduler will
freely move guest gcc tasks from vCPU0 to vCPU1, and those two vCPUs may
be on threads that live in different nodes on the host, so the guest
scheduler, unaware of the host topology, can trigger spurious memory
migrations.
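
To illustrate the -DHARD_BIND/-DINVERSE_BIND interaction quoted at the
top, a minimal sketch in the spirit of the benchmark (this is not the
actual numa01 source; the use of numa_run_on_node() from libnuma and the
node parameters are assumptions for illustration only):

#include <numa.h>	/* libnuma, assumed here for the hard bind */

/*
 * Hedged sketch: INVERSE_BIND only matters inside the HARD_BIND branch,
 * so defining -DINVERSE_BIND alone falls through to the default,
 * unbound build.
 */
static void setup_binding(int local_node, int inverse_node)
{
	(void)local_node;
	(void)inverse_node;
#ifdef HARD_BIND
# ifdef INVERSE_BIND
	numa_run_on_node(inverse_node);	/* hard bind to the "wrong" node */
# else
	numa_run_on_node(local_node);	/* hard bind to the local node */
# endif
#else
	/* without -DHARD_BIND neither branch above is compiled in, so
	 * -DINVERSE_BIND alone gives the default build */
#endif
}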
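
The missing hard-bind check could look roughly like this; a hedged
sketch, not the actual autonuma patch (knuma_scan_skip_vma() is a
hypothetical name, only vma_policy() and current->mempolicy come from
the description above):

#include <linux/sched.h>
#include <linux/mm_types.h>
#include <linux/mempolicy.h>

/*
 * Hedged sketch of the hard-bind check described above: the per-VMA
 * mbind() policy was already being respected, the missing piece was the
 * task-wide policy set with set_mempolicy().
 */
static bool knuma_scan_skip_vma(struct task_struct *p,
				struct vm_area_struct *vma)
{
	if (vma_policy(vma))	/* mbind() on this range: leave it alone */
		return true;
	if (p->mempolicy)	/* task-wide policy: leave it alone too */
		return true;
	return false;
}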
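
The HT heuristic described above, sketched with hypothetical helpers
(node_has_idle_core() and node_has_idle_sibling() don't exist in the
tree; only the ordering of the checks matters):

/*
 * Hedged sketch of the HT-aware placement decision, not code from the
 * autonuma branch: a fully idle core on a remote node should win over a
 * merely idle HT sibling on the local node.
 */
static bool prefer_remote_node(int local_nid, int remote_nid)
{
	if (node_has_idle_core(local_nid))
		return false;	/* a whole core is idle locally: stay local */
	if (node_has_idle_sibling(local_nid) &&
	    node_has_idle_core(remote_nid))
		return true;	/* only an HT sibling idle here, full core idle there */
	return false;		/* otherwise keep the current NUMA affinity */
}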