On Mon, 2013-12-09 at 22:08 -1000, Justin Weaver wrote:
> Hello,
>
Hey... here I am! It took a while, eh? Well, sorry for that. :-(

First of all, a bit of context for the people in Cc (I added Marcus)
and the list. Basically, Justin is looking into implementing soft
affinity for credit2. As we know, credit2 lacks hard affinity too;
we'll see along the way whether it would be easy enough to add that
as well.

> On Sat, Nov 30, 2013 at 10:18 PM, Dario Faggioli wrote:
>     I'll have to re-look at the details of credit2 about load
>     balancing and migration between CPUs/runqueues, but it looks
>     like we need something allowing us to honour pinning/affinity
>     _within_ the same runqueue anyway, don't we? I mean, even if
>     you implement per-L2 runqueues, each of those would still span
>     more than one CPU, and the user may well want to pin a vCPU to
>     only one (or, in general, a subset) of them.
>
> Yes, I agree. Just looking for some feedback before I attempt a
> patch. Some of the functions I think need updating for hard/soft
> affinity...
>
Ok, first of all, about one global runqueue vs. one runqueue per
socket/L2: I'm also seeing what you are reporting, i.e., all the
cpus being assigned to the same runqueue, despite the box having
two sockets:

cpu_topology           :
cpu:    core    socket    node
  0:       0        0       0
  1:       1        0       0
  2:       2        0       0
  3:       3        0       0
  4:       0        1       1
  5:       1        1       1
  6:       2        1       1
  7:       3        1       1

and despite this piece of code in sched_credit2.c:init_pcpu():

    /* Figure out which runqueue to put it in */
    /* NB: cpu 0 doesn't get a STARTING callback, so we hard-code
     * it to runqueue 0. */
    if ( cpu == 0 )
        rqi = 0;
    else
        rqi = cpu_to_socket(cpu);

    if ( rqi < 0 )
    {
        printk("%s: cpu_to_socket(%d) returned %d!\n",
               __func__, cpu, rqi);
        BUG();
    }

    rqd = prv->rqd + rqi;

    printk("Adding cpu %d to runqueue %d\n", cpu, rqi);
    if ( ! cpumask_test_cpu(rqi, &prv->active_queues) )
    {
        printk(" First cpu on runqueue, activating\n");
        activate_runqueue(prv, rqi);
    }

which, AFAICT, ought to be creating two runqueues. Weird... Let's
raise that in a separate e-mail/thread, ok?

> runq_candidate needs to be updated. It decides which vcpu from the
> run queue to run next on a given pcpu. Currently it only takes
> credit into account. Considering hard affinity should be simple
> enough.
>
Most likely. I guess you're thinking of something like this: instead
of just picking the next vcpu in the queue, scan it until we find one
whose hard affinity allows the cpu we're dealing with. Is that so? If
yes, I guess it would be fine, although at some point we'd want to
figure out the performance implications of such a scan.

Also, I'm not too familiar with credit2, but it's entirely possible
that, even after having modified this, there are other places where
hard affinity needs to be considered (saying this just as a 'warning'
against claiming victory too soon ;-P ).

> For soft, what if it first looked through the run queue in credit
> order at only vcpus that prefer to run on the given processor and
> had a certain amount of credit, and if none were found it then
> considered the whole run queue, taking only hard affinity and
> credit into account?
>
What do you mean by "had a certain amount of credit"? Apart from
that, it seems a reasonable line of thinking... It looks similar to
the two-phase approach I took for credit1. Again, some performance
investigation will be necessary at some point. Of course, we can
first come up with an implementation, and then start benchmarking
and optimizing (actually, optimizing too early usually leads to very
bad situations!).
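Just to fix the ideas, here is a minimal sketch of what such a
two-phase runq_candidate() could look like. Purely illustrative, of
course: the signature mirrors the current credit2 one, but the
cpu_hard_affinity / cpu_soft_affinity masks follow the naming from
my credit1 soft affinity series (so treat them as assumptions here),
and it leaves out scurr handling, credit resets, and your "certain
amount of credit" threshold:

    static struct csched_vcpu *
    runq_candidate(struct csched_runqueue_data *rqd,
                   struct csched_vcpu *scurr, int cpu, s_time_t now)
    {
        struct list_head *iter;
        struct csched_vcpu *fallback = NULL;

        list_for_each ( iter, &rqd->runq )
        {
            struct csched_vcpu *svc = __runq_elem(iter);

            /* Hard affinity is a strict requirement: skip any vcpu
             * that is not allowed to run on this cpu at all. */
            if ( !cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity) )
                continue;

            /* The runqueue is kept in credit order, so the first
             * vcpu that also has this cpu in its soft affinity is
             * the best possible pick: take it and stop scanning. */
            if ( cpumask_test_cpu(cpu, svc->vcpu->cpu_soft_affinity) )
                return svc;

            /* Remember the highest-credit vcpu that can at least
             * legally run here, for the "phase 2" fallback. */
            if ( fallback == NULL )
                fallback = svc;
        }

        return fallback;
    }

Note that this folds your two passes into a single scan, which works
exactly because the queue is credit-ordered: the first soft match is
also the highest-credit one, and the recorded fallback is what phase
2 would have found anyway.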
I wonder whether we could do something clever, knowing that we have
one runqueue per socket. I mean, of course both hard and soft
affinities can span sockets, nodes (or whatever), they can be
sparse, etc. However, it would be fairly common for some sort of
relationship to exist between them and the topology (put in place
either by the user explicitly or by the toolstack automatically), to
the point that it may be worth optimizing a bit for this case.

> runq_assign assumes that the run queue associated with
> vcpu->processor is OK for vcpu to run on. If considering affinity,
> I'm not sure if that can be assumed.
>
Yes, because affinity may have changed in the meantime, right? Which
means, in the case of hard affinity, you'd be violating an explicit
user request. In the case of soft affinity, it wouldn't be equally
bad, but it would be nice to check whether we can transition to a
better system state (i.e., one respecting the soft affinity too).

> I probably need to dig further into schedule.c to see where
> vcpu->processor is being assigned initially. Anyway, with only one
> run queue this doesn't matter for now.
>
Well, sure, but I would actually recommend looking more at
sched_credit.c than at schedule.c. What I mean is, when implementing
soft affinity for credit1, I didn't need to touch anything in
schedule.c (well, certainly nothing related to vcpu->processor).
Similarly, talking about hard affinity, the fact that it is possible
to have a scheduler that implements hard affinity (credit1) and one
that does not (credit2) makes me think that the code in schedule.c
is general enough to support both, and that the whole game is played
inside the actual scheduler source files (i.e., either sched_credit.c
or sched_credit2.c).

> choose_cpu / migrate will need to be updated, but currently migrate
> never gets called because there's only one run queue.
>
choose_cpu, yes, definitely. It's basically the core logic
implementing csched_cpu_pick, which is key in choosing a cpu where
to run a vcpu, and both hard and soft affinity need to be taken into
account when doing that.

About migrate, it surely needs tweaking too. That's another place
where, I think, we can make good use of the information we have
about the link between the runqueues and the topology, perhaps after
having checked whether the affinity (whichever one) is in such a
form that it plays well enough with that.

Thanks for your interest in this work! :-)
Dario

-- 
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
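P.S. About choose_cpu: just to make the idea of exploiting the
runqueue/topology link a bit more concrete, here's a rough sketch of
an affinity-aware runqueue selection. Again, this is only an
illustration: pick_runqueue() is a hypothetical helper (not
something in the tree), prv->active_queues and rqd->active are the
fields from sched_credit2.c, and the affinity masks are, as above,
the ones from the credit1 series. Picking the actual cpu within the
chosen runqueue would then be a second, analogous step:

    static int
    pick_runqueue(struct csched_private *prv, struct vcpu *vc)
    {
        int rqi;

        /* Phase 1: prefer a runqueue whose cpus overlap the soft
         * affinity (and, necessarily, the hard one too). */
        for_each_cpu ( rqi, &prv->active_queues )
        {
            struct csched_runqueue_data *rqd = prv->rqd + rqi;

            if ( cpumask_intersects(&rqd->active,
                                    vc->cpu_hard_affinity) &&
                 cpumask_intersects(&rqd->active,
                                    vc->cpu_soft_affinity) )
                return rqi;
        }

        /* Phase 2: settle for any runqueue that the hard affinity
         * allows. */
        for_each_cpu ( rqi, &prv->active_queues )
            if ( cpumask_intersects(&prv->rqd[rqi].active,
                                    vc->cpu_hard_affinity) )
                return rqi;

        return -1; /* no runqueue can legally host this vcpu */
    }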