On Mon, 2015-03-09 at 11:45 +0000, George Dunlap wrote:
> On 03/09/2015 07:11 AM, Justin Weaver wrote:
> > I don't think there's any way the mask can be empty after the
> > cpumask_and with the cpu_hard_affinity and the VCPU2ONLINE. In
> > schedule.c:vcpu_set_hard_affinity, if the intersection of the new hard
> > mask and VCPU2ONLINE is empty, the function returns -EINVAL. I took
> > into account all the discussion here and modified the function for v3.
> 
> What about this:
> 
>  * Create a cpu pool with cpus 0 and 1.  online_cpus is now [0,1].
>  * Set a hard affinity of [1].  This succeeds.
> 
So, I decided to try this:

# xl cpupool-list -c
Name               CPU list
Pool-0             0,1

# xl list -c
Name                                        ID   Mem VCPUs      State   Time(s) Cpupool
Domain-0                                     0   507     4     r-----      19.9 Pool0
test-pv                                      1  2000     8     -b----      19.2 Pool0

# xl vcpu-list test-pv
Name                                ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
test-pv                              1     0    1   -b-       5.5  1 / 1
test-pv                              1     1    1   -b-       2.4  1 / 1
test-pv                              1     2    1   -b-       2.9  1 / 1
test-pv                              1     3    1   -b-       1.9  1 / 1
test-pv                              1     4    1   -b-       1.0  1 / 1
test-pv                              1     5    1   -b-       2.0  1 / 1
test-pv                              1     6    1   -b-       1.2  1 / 1
test-pv                              1     7    1   -b-       2.4  1 / 1

# xl cpupool-cpu-remove Pool0 1
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion '!cpumask_empty(dom_cpumask)' failed at domain.c:460
(XEN) ****************************************

i.e., surprise surprise, there's already an assert guarding this... and
it triggers!! :-(

It seems to have been added in v4 of the per-vcpu soft affinity work:
http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=commit;h=4d4e999637f38e0bbd4318ad8e0c92fd1e580430

So, we must have had this discussion before! :-)

I've done a bit of archeology: the ASSERT() in
domain_update_node_affinity() was introduced by me (upon request) while
working on implementing per-vcpu soft affinity. Then, cs 93be8285 is
what causes the assert to trigger. The timing doesn't seem to match,
but that's because soft affinity was put on hold when it was decided
not to include it in 4.4, and I probably forgot to retest with a use
case similar to the above when resubmitting it! :-(

A patch to fix things is attached to this email, for convenience (I'll
submit it properly, with changelog and everything, right away).

After seeing this, I'm even more convinced that
!empty(online && hard_affinity) is really something we want and, as we
ASSERT() it in domain.c, the same should be done in sched_credit2.c.
OTOH, if we don't want to ASSERT() in the specific scheduler code, then
I'd remove the one in domain.c too (and, perhaps, just use online as a
fallback), or things would look inconsistent.

Of course, this can also be seen as proof of George's point, that this
is something not obvious and not easy to catch. Still, I think that if
we say we support hard vcpu affinity (a.k.a. pinning), we should not
allow vcpus to run outside their hard affinity, and the code must
reflect this.

> * Move cpu 1 to a different cpupool.
> 
> After a quick look I don't see anything that updates the hard affinity
> when cpus are removed from pools.
> 
cpupool_unassign_cpu() calls cpupool_unassign_cpu_helper(), which calls
cpu_disable_scheduler(); the latter, if it finds that
empty(online && hard_affinity) == true, resets hard_affinity to "all".
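For reference, here's a rough sketch of how that fallback in
cpu_disable_scheduler() looks (written from memory, so names and
details are approximate, not the exact schedule.c code):

    cpumask_t online_affinity;
    struct vcpu *v;

    for_each_vcpu ( d, v )
    {
        /* Which cpus, still online in the vcpu's pool, can it use? */
        cpumask_and(&online_affinity, v->cpu_hard_affinity, &online);
        if ( cpumask_empty(&online_affinity) &&
             cpumask_test_cpu(cpu, v->cpu_hard_affinity) )
        {
            /* None left: give up on the pinning and log it. */
            printk("Breaking affinity for d%dv%d\n",
                   v->domain->domain_id, v->vcpu_id);
            cpumask_setall(v->cpu_hard_affinity);
        }
    }

(where `d' is the domain being walked, `online' is the mask of cpus
remaining in its cpupool, and `cpu' is the one being removed.)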
Note that this is reported in the log, as confirmation that it really
is something exceptional:

(XEN) Breaking affinity for d1v0
(XEN) Breaking affinity for d1v1
(XEN) Breaking affinity for d1v2
(XEN) Breaking affinity for d1v3
(XEN) Breaking affinity for d1v4
(XEN) Breaking affinity for d1v5
(XEN) Breaking affinity for d1v6
(XEN) Breaking affinity for d1v7

And also note that, as a consequence of fiddling with cpupools, the
affinity has been altered by Xen (i.e., vcpus still always run within
their hard affinity masks):

# xl vcpu-pin 1 all 1 1
# xl cpupool-cpu-remove Pool-0 1
# xl cpupool-list -c
Name               CPU list
Pool-0             0,2,3,4,5,6,7,8,9,10,11,12,13,14,15

# xl vcpu-list test-pv
Name                                ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
test-pv                              1     0    2   -b-       6.1  all / 1
test-pv                              1     1    6   -b-       1.6  all / 1
test-pv                              1     2    4   -b-       1.6  all / 1
test-pv                              1     3    5   -b-       3.1  all / 1
test-pv                              1     4    4   -b-       0.8  all / 1
test-pv                              1     5    7   -b-       3.0  all / 1
test-pv                              1     6    3   -b-       1.1  all / 1
test-pv                              1     7    3   -b-       0.7  all / 1

That's what drove all the reasoning when changing
domain_update_node_affinity(), and what led to that ASSERT(), during
the soft affinity work.

The reason why the ASSERT() triggers, as can be seen in the patch, is
that, because of the cited changeset, domain_update_node_affinity() is
called too early.

> And, even if it does *now*, it's possible that something might be
> changed in the future that would forget it; this ASSERT() isn't exactly
> next to that code.
> 
But it would help us catch the bug at some point... As proved above! :-D

BTW, I've already written and submitted an OSSTest test involving
cpupools (we don't cover them at all at the moment). I'll see about
adding this kind of thing to it.

> So it seems to me like handling that case makes the software
> more robust for little cost.
> 
Yep, I also agree that it at least won't cost much, in terms of runtime
overhead.

Regards,
Dario