On Tue, 1 Aug 2017, Dario Faggioli wrote:
> On Mon, 2017-07-31 at 16:58 -0700, Stefano Stabellini wrote:
> > On Tue, 1 Aug 2017, Dario Faggioli wrote:
> > > On Mon, 2017-07-31 at 14:20 -0700, Stefano Stabellini wrote:
> > > > On Thu, 27 Jul 2017, Dario Faggioli wrote:
> > > > > diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
> > > > > index f0fdc87..4586f2a 100644
> > > > > --- a/xen/common/rcupdate.c
> > > > > +++ b/xen/common/rcupdate.c
> > > > > @@ -84,8 +84,14 @@ struct rcu_data {
> > > > >      int cpu;
> > > > >      struct rcu_head barrier;
> > > > >      long            last_rs_qlen;     /* qlen during the last resched */
> > > > > +
> > > > > +    /* 3) idle CPUs handling */
> > > > > +    struct timer idle_timer;
> > > > > +    bool idle_timer_active;
> > > > >  };
> > > > >  
> > > > > +#define RCU_IDLE_TIMER_PERIOD MILLISECS(10)
> > > >
> > > > Isn't this a bit too short? How is it chosen?
> > >
> > > What makes you think it's short?
> >
> > In terms of power saving and CPU sleep states, 10ms is not much to
> > sleep for. I wonder if there are any power saving benefits in
> > sleeping for only 10ms (especially on x86, where entering and
> > exiting CPU sleep states takes longer, to be confirmed).
>
> I *think* we should be fine with, say, 100ms. But that's, again, a
> guess/rule of thumb, nothing more than that. And, TBH, I'm not even
> sure what a good experiment/benchmark would be, to assess whether a
> particular value is good or bad. :-/

Given the description below, it's possible that the new timer has to
fire several times before the callback can finally be invoked, right?
I would make some measurements to check how many times the timer has
to fire (i.e., how long we actually need to wait before calling the
callback) in various scenarios, then choose a value that minimizes the
number of unnecessary wake-ups (see the counter sketch at the end of
this mail).

> > We might as well do the thing we need to do immediately? I guess we
> > cannot do that?
>
> You're guessing correctly, we can't. It's indeed a bit tricky, and it
> took me a little while to figure all of it out properly too.
>
> Basically, let's say that, at time t1, on CPU1, someone calls
> call_rcu(). The situation on the other CPUs is: CPU0 busy; CPU2 idle
> (no timer pending); CPU3 busy.
>
> So, a new grace period starts, and its exact end will be when CPU0,
> CPU1 and CPU3 have each quiesced once (in Xen, quiescence means
> "going through do_softirq()").
>
> But RCU is a passive mechanism, i.e., we rely on each CPU coming to
> the RCU core logic and telling it that it has quiesced. Let's say
> that CPU0 quiesces at time t2 > t1. CPU1 quiesces at time t3 > t2,
> and goes fully idle (no timer pending). CPU3 quiesces at time
> t4 > t3. Now, at time t4, CPU1 can actually invoke the callback that
> was queued at time t1 via call_rcu().
>
> This patch series solves two problems of our current RCU
> implementation:
>
> 1) right now, we don't only wait for CPU0, CPU1 and CPU3, we also
>    wait for CPU2. But since CPU2 is fully idle, it just won't bother
>    telling RCU that it has quiesced (well, on x86, that would
>    actually happen at some point, while on ARM it is entirely
>    possible that this never happens!). This is solved in patch 3, by
>    introducing the cpumask;
>
> 2) now (after patch 3) we know that we can just avoid waiting for
>    CPU2. Good. But at time t4, when CPU3 quiesces, marking the end of
>    the grace period, how would CPU1 --which is fully idle-- know that
>    it can now safely invoke the callback? Again, RCU is passive, so
>    it relies on CPU1 to figure that out on its own, the next time it
>    wakes up, e.g., because of the periodic tick timer. But we don't
>    have a periodic tick timer! Well, this again means that, on x86,
>    CPU1 will actually figure that out at some (unpredictable) point
>    in the future. On ARM, not so much. The purpose of the timer in
>    this patch is to make sure it always will.
>
> In fact, with patches 4 and 5 applied, at time t3 we let CPU1 go
> idle, but we program the timer. It will fire at t3+T (with T=10ms,
> right now). When this happens, if t3+T > t4, CPU1 invokes the
> callback, and we're done. If not (and CPU1 is still idle), we retry
> in another T.
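Just to check that I'm reading this right, the per-CPU idle path would
look more or less like the sketch below. This is only my understanding
of the idea, not the actual code from patches 4 and 5:
rcu_grace_period_over() and rcu_invoke_pending_callbacks() are made-up
placeholders for whatever the real checks/entry points are, and I'm
assuming the usual Xen timer API (set_timer()/NOW()) plus the
idle_timer/idle_timer_active fields and RCU_IDLE_TIMER_PERIOD from the
hunk quoted above.

    /*
     * Sketch only: placeholder helpers, not the real implementation.
     * Assumes rdp->idle_timer has been init_timer()'d with
     * rcu_idle_timer_handler() and rdp as its data.
     */
    static void rcu_idle_timer_handler(void *data)
    {
        struct rcu_data *rdp = data;

        if ( rcu_grace_period_over(rdp) )
        {
            /* t3 + n*T > t4: the grace period is over, run the callback(s) */
            rdp->idle_timer_active = false;
            rcu_invoke_pending_callbacks(rdp);
        }
        else
            /* Grace period still in progress: retry in another T */
            set_timer(&rdp->idle_timer, NOW() + RCU_IDLE_TIMER_PERIOD);
    }

    /* Called when a CPU that still has queued callbacks goes idle (t3) */
    static void rcu_idle_timer_start(struct rcu_data *rdp)
    {
        set_timer(&rdp->idle_timer, NOW() + RCU_IDLE_TIMER_PERIOD);
        rdp->idle_timer_active = true;
    }

If that is roughly the shape of it, then the only knob really is T,
which is what the measurements above would help us pick.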
> So, when you say "immediately", *when* do you actually mean?
>
> We can't invoke the callback at t3 (i.e., immediately after CPU1
> quiesces), because we need to wait for CPU3 to do the same.
>
> We can't invoke the callback when CPU3 quiesces, at t4 (i.e.,
> immediately after the grace period ends), either, because the
> callback is on CPU1, not on CPU3.
>
> Linux's solution is not to stop the tick for CPU1 at time t3. It is
> stopped only after the first firing of the tick itself at a time
> t > t4 (if CPU1 is still idle, of course). This timer thing is, IMO,
> pretty similar. The only difference is that we don't have a tick to
> not stop... so I had to make up a very special one. :-D
>
> TL;DR, yes, I also would have loved the world (or even just this
> problem) to be simple. It's not! ;-P
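FWIW, for the measurements I was suggesting above, even something as
crude as the snippet below would do. This is hypothetical
instrumentation, not part of the series: idle_timer_fires is a counter
I just made up, and the two helpers would be called from the
idle-timer handler (when the callback cannot run yet) and from
wherever the callbacks are finally invoked, respectively.

    /* Hypothetical instrumentation, only for choosing a sane T. */
    static DEFINE_PER_CPU(unsigned long, idle_timer_fires);

    /* Call from the idle-timer handler when the grace period isn't over yet. */
    static void account_idle_timer_refire(void)
    {
        this_cpu(idle_timer_fires)++;
    }

    /* Call right before the callback(s) are finally invoked from idle. */
    static void report_idle_timer_refires(void)
    {
        printk("CPU%u: idle timer fired %lu time(s) before the callback could run\n",
               smp_processor_id(), this_cpu(idle_timer_fires));
        this_cpu(idle_timer_fires) = 0;
    }

A distribution of those counts under a few different workloads would
make it much easier to argue for a specific value of T than a rule of
thumb.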