On Mon, Jul 18, 2016 at 6:24 PM, Dario Faggioli wrote:
> On Mon, 2016-07-18 at 17:48 +0100, George Dunlap wrote:
>> On 15/07/16 15:50, Dario Faggioli wrote:
>> >
>> > +/*
>> > + * If all the siblings of cpu (including cpu itself) are in idlers,
>> > + * set all their bits in mask.
>> > + *
>> > + * In order to properly take into account tickling, idlers needs to be
>> > + * set qeual to something like:
>> *equal (I can fix this on check-in)
>>
> Oops!
>
>> > + *
>> > + * rqd->idle & (~rqd->tickled)
>> > + *
>> > + * This is because cpus that have been tickled will very likely pick up some
>> > + * work as soon as the manage to schedule, and hence we should really consider
>> > + * them as busy.
>> OK this is something that slightly confused me when I was reviewing the
>> patch the first time: that rqd->idle is *all* pcpus which are currently
>> idle (and thus we need to & (~tickled) when using it), but rqd->smt_idle
>> is meant to be maintained as *non-tickled* idle pcpus.
>>
> Short answer is, "yes, this recap of yours is correct".
>
> In fact, the difference between idle and smt_idle is that the former is
> valid instantaneously, while the latter is tracking a state.
>
> IOW, if, at any given time, I want to know what pcpus are idle, I check
> rqd->idle. If I want to know what are idle and also are not (or are
> unlikely) just about to pick up work, I can check
> rqd->idle&(~rqd->tickled)
>
> Let's now consider smt_idle and assume that, at time t siblings pcpus 2
> and 3 are idle (as in, their bit is 1 in rqd->idle). If I'd be basing
> smt_idle just on that, I could at this point set the bit of the core in
> smt_idle. This in turn means that work will likely be sent to either 2
> or 3 (depending on all the other factors that influence this). Let's
> assume we select 2. But if either of them --although being idle-- was
> has actually been tickled already, we may have taken a suboptimal
> decision.
> In fact, if 3 was tickled, both 2 and 3 will pick up work,
> and if there is another core (say, made up of siblings pcpus 6 and 7)
> which is truly fully idle, we would better have chosen a pcpu from
> there. If 2 was the one that was tickled, that's even worse, because I
> most likely have 2 work items, and am tickling only 1 pcpu!
>
> So, again, yes, basically this means that I need smt_idle to be
> representative of the set of non-tickled idle pcpus.
>
>> Are you planning at some point to have a follow-up patch which changes
>> rqd->idle to be non-tickled idle pcpus as well? Unless I missed
>> something it looks like at the moment the only times rqd->idle is acted
>> upon is after &~-ing out rqd->tickled anyway.
>>
> I am indeed, but I was planning to do that after this round of changes
> (this series, plus soft-affinity, plus caps, which I have in my queue).
>
> It's, after all, an optimization, and hence I think it is fine to leave
> it to when things will be proven to be working. :-)
>
> If you're saying that this discrepancy between rqd->idle's and
> rqd->smt_idle's semantic is, at minimum, unideal, I do agree... but I
> think, for now at least, it's worth living with it.

I hadn't actually said anything, but you know me well enough to guess
what I'm thinking. :-) I am somewhat torn between feeling uneasy about
the inconsistency and, as you say, the fact that this is a distinct
improvement, so that it would seem a bit petty to insist that you either
wait or produce a patch to change idle at the same time.

But I do think that the difference needs to be called out a bit
better. What about folding in something like the attached patch?

 -George
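
For reference, the idle / tickled / smt_idle semantics discussed above can
be sketched as a small standalone toy. This is only an illustration under
stated assumptions, not the Xen code: it uses a plain 64-bit word instead
of cpumask_t, assumes exactly two SMT siblings per core (pcpus 2n and
2n+1), and every toy_* name is made up for the example.

```c
#include <stdint.h>

/* Toy model only: one bit per pcpu, at most 64 pcpus. */
typedef uint64_t toy_cpumask_t;

/* Siblings of cpu, including cpu itself (assumed pairing: 2n and 2n+1). */
static toy_cpumask_t toy_sibling_mask(unsigned int cpu)
{
    return (toy_cpumask_t)3 << (cpu & ~1u);
}

/* Idle pcpus that have not been tickled, i.e. idle & (~tickled). */
static toy_cpumask_t toy_idle_untickled(toy_cpumask_t idle,
                                        toy_cpumask_t tickled)
{
    return idle & ~tickled;
}

/* If all the siblings of cpu are in idlers, set all their bits in *mask. */
static void toy_smt_idle_mask_set(unsigned int cpu, toy_cpumask_t idlers,
                                  toy_cpumask_t *mask)
{
    toy_cpumask_t sibs = toy_sibling_mask(cpu);

    if ((idlers & sibs) == sibs)
        *mask |= sibs;
}

/* Build the set of fully idle, non-tickled cores from scratch. */
static toy_cpumask_t toy_smt_idle(toy_cpumask_t idle, toy_cpumask_t tickled,
                                  unsigned int nr_cpus)
{
    toy_cpumask_t idlers = toy_idle_untickled(idle, tickled);
    toy_cpumask_t mask = 0;

    /* Visit one pcpu per core; the helper handles both siblings. */
    for (unsigned int cpu = 0; cpu < nr_cpus; cpu += 2)
        toy_smt_idle_mask_set(cpu, idlers, &mask);

    return mask;
}
```

With pcpus 2, 3, 6 and 7 idle (idle = 0xCC) and pcpu 3 tickled
(tickled = 0x08), toy_smt_idle() returns 0xC0: only the core made of
pcpus 6 and 7 counts as fully idle, which is the outcome argued for in
the example above. Passing the raw idle mask instead (tickled = 0) would
also mark the 2/3 core, which is the suboptimal case.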