On Tue, 1 Aug 2017, Dario Faggioli wrote:
> On Mon, 2017-07-31 at 16:58 -0700, Stefano Stabellini wrote:
> > On Tue, 1 Aug 2017, Dario Faggioli wrote:
> > > On Mon, 2017-07-31 at 14:20 -0700, Stefano Stabellini wrote:
> > > > On Thu, 27 Jul 2017, Dario Faggioli wrote:
> > > > > diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
> > > > > index f0fdc87..4586f2a 100644
> > > > > --- a/xen/common/rcupdate.c
> > > > > +++ b/xen/common/rcupdate.c
> > > > > @@ -84,8 +84,14 @@ struct rcu_data {
> > > > >      int cpu;
> > > > >      struct rcu_head barrier;
> > > > >      long            last_rs_qlen;     /* qlen during the last resched */
> > > > > +
> > > > > +    /* 3) idle CPUs handling */
> > > > > +    struct timer idle_timer;
> > > > > +    bool idle_timer_active;
> > > > >  };
> > > > >  
> > > > > +#define RCU_IDLE_TIMER_PERIOD MILLISECS(10)
> > > >
> > > > Isn't this a bit too short? How is it chosen?
> > >
> > > What makes you think it's short?
> >
> > In terms of power saving and CPU sleep states, 10ms is not much to
> > sleep for. I wonder if there are any power saving benefits in
> > sleeping for only 10ms (especially on x86, where entering and
> > exiting CPU sleep states takes longer, to be confirmed).
>
> I *think* we should be fine with, say, 100ms. But that's, again, a
> guess/rule of thumb, nothing more than that. And, TBH, I'm not even
> sure what a good experiment/benchmark would be, to assess whether a
> particular value is good or bad. :-/

Given the description below, it's possible that the new timer has to
fire several times before the callback can finally be invoked, right?
I would make some measurements to check how many times the timer has
to fire (i.e., how long we actually need to wait before calling the
callback) in various scenarios, then choose a value that minimizes the
number of unnecessary wake-ups (see the counter sketch at the end of
this mail).

> > We might as well do the thing we need to do immediately? I guess we
> > cannot do that?
>
> You're guessing correctly, we can't. It's indeed a bit tricky, and it
> took me a little while to figure all of it out properly too.
>
> Basically, let's say that, at time t1, on CPU1, someone calls
> call_rcu(). The situation on the other CPUs is: CPU0 busy; CPU2 idle
> (no timer pending); CPU3 busy.
>
> So, a new grace period starts, and its exact end will be when CPU0,
> CPU1 and CPU3 have each quiesced once (in Xen, quiescence means
> "going through do_softirq()").
>
> But RCU is a passive mechanism, i.e., we rely on each CPU coming to
> the RCU core logic and telling it that it has quiesced. Let's say
> that CPU0 quiesces at time t2 > t1. CPU1 quiesces at time t3 > t2,
> and goes fully idle (no timer pending). CPU3 quiesces at time
> t4 > t3. Now, at time t4, CPU1 can actually invoke the callback that
> was queued at time t1 via call_rcu().
>
> This patch series solves two problems of our current RCU
> implementation:
>
> 1) right now, we don't only wait for CPU0, CPU1 and CPU3, we also
>    wait for CPU2. But since CPU2 is fully idle, it just won't bother
>    telling RCU that it has quiesced (well, on x86, that would
>    actually happen at some point, while on ARM it is entirely
>    possible that this never happens!). This is solved in patch 3, by
>    introducing the cpumask;
>
> 2) now (after patch 3) we know that we can just avoid waiting for
>    CPU2. Good. But at time t4, when CPU3 quiesces, marking the end of
>    the grace period, how would CPU1 --which is fully idle-- know that
>    it can now safely invoke the callback? Again, RCU is passive, so
>    it relies on CPU1 to figure that out on its own, the next time it
>    wakes up, e.g., because of the periodic tick timer. But we don't
>    have a periodic tick timer! Well, this again means that, on x86,
>    CPU1 will actually figure that out at some (unpredictable) point
>    in the future. On ARM, not so much. The purpose of the timer in
>    this patch is to make sure it always will.
>
> In fact, with patches 4 and 5 applied, at time t3 we let CPU1 go
> idle, but we program the timer. It will fire at t3+T (with T=10ms,
> right now). When this happens, if t3+T > t4, CPU1 invokes the
> callback, and we're done. If not (and CPU1 is still idle), we retry
> in another T.
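Just to check that I'm reading this right, the per-CPU idle path would
look more or less like the sketch below. This is only my understanding
of the idea, not the actual code from patches 4 and 5:
rcu_grace_period_over() and rcu_invoke_pending_callbacks() are made-up
placeholders for whatever the real checks/entry points are, and I'm
assuming the usual Xen timer API (set_timer()/NOW()) plus the
idle_timer/idle_timer_active fields and RCU_IDLE_TIMER_PERIOD from the
hunk quoted above.

    /*
     * Sketch only: placeholder helpers, not the real implementation.
     * Assumes rdp->idle_timer has been init_timer()'d with
     * rcu_idle_timer_handler() and rdp as its data.
     */
    static void rcu_idle_timer_handler(void *data)
    {
        struct rcu_data *rdp = data;

        if ( rcu_grace_period_over(rdp) )
        {
            /* t3 + n*T > t4: the grace period is over, run the callback(s) */
            rdp->idle_timer_active = false;
            rcu_invoke_pending_callbacks(rdp);
        }
        else
            /* Grace period still in progress: retry in another T */
            set_timer(&rdp->idle_timer, NOW() + RCU_IDLE_TIMER_PERIOD);
    }

    /* Called when a CPU that still has queued callbacks goes idle (t3) */
    static void rcu_idle_timer_start(struct rcu_data *rdp)
    {
        set_timer(&rdp->idle_timer, NOW() + RCU_IDLE_TIMER_PERIOD);
        rdp->idle_timer_active = true;
    }

If that is roughly the shape of it, then the only knob really is T,
which is what the measurements above would help us pick.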
> So, when you say "immediately", *when* do you actually mean?
>
> We can't invoke the callback at t3 (i.e., immediately after CPU1
> quiesces), because we need to wait for CPU3 to do the same.
>
> We can't invoke the callback when CPU3 quiesces, at t4 (i.e.,
> immediately after the grace period ends), either, because the
> callback is on CPU1, not on CPU3.
>
> Linux's solution is not to stop the tick for CPU1 at time t3. It is
> stopped only after the first firing of the tick itself at a time
> t > t4 (if CPU1 is still idle, of course). This timer thing is, IMO,
> pretty similar. The only difference is that we don't have a tick to
> not stop... so I had to make up a very special one. :-D
>
> TL;DR, yes, I also would have loved the world (or even just this
> problem) to be simple. It's not! ;-P
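FWIW, for the measurements I was suggesting above, even something as
crude as the snippet below would do. This is hypothetical
instrumentation, not part of the series: idle_timer_fires is a counter
I just made up, and the two helpers would be called from the
idle-timer handler (when the callback cannot run yet) and from
wherever the callbacks are finally invoked, respectively.

    /* Hypothetical instrumentation, only for choosing a sane T. */
    static DEFINE_PER_CPU(unsigned long, idle_timer_fires);

    /* Call from the idle-timer handler when the grace period isn't over yet. */
    static void account_idle_timer_refire(void)
    {
        this_cpu(idle_timer_fires)++;
    }

    /* Call right before the callback(s) are finally invoked from idle. */
    static void report_idle_timer_refires(void)
    {
        printk("CPU%u: idle timer fired %lu time(s) before the callback could run\n",
               smp_processor_id(), this_cpu(idle_timer_fires));
        this_cpu(idle_timer_fires) = 0;
    }

A distribution of those counts under a few different workloads would
make it much easier to argue for a specific value of T than a rule of
thumb.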