* [PATCH v3 0/6] xen: RCU: x86/ARM: Add support of rcu_idle_{enter, exit}
@ 2017-08-18 18:04 Dario Faggioli
  2017-08-18 18:04 ` [PATCH v3 1/6] xen: in do_softirq() sample smp_processor_id() once and for all Dario Faggioli
                   ` (6 more replies)
  0 siblings, 7 replies; 21+ messages in thread
From: Dario Faggioli @ 2017-08-18 18:04 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich

Hey,

I know, v2 of this series just went out. But since I am leaving for 2 weeks, I
wanted the version with as many comments as possible addressed to be on the
list, and since it was rather quick to do that (and test it) for Tim's latest
comment, here I am.

So, basically, this is v2, as here:

 https://lists.xen.org/archives/html/xen-devel/2017-08/msg01881.html

But with, as Tim suggested, the idle_cpumask initialized to "all clear". The
various CPUs, therefore, are considered busy when they come up, and clear their
own bit in the mask, as soon as they enter the idle loop for the first time
(which is pretty soon). Doing like this, we close another window for a
potential (although, rather unlikely) race/unnecessary extension of the grace
period, and we simplify the code (a.k.a. 'win-win' :-D).

Regards,
Dario
---
Dario Faggioli (6):
      xen: in do_softirq() sample smp_processor_id() once and for all.
      xen: ARM: suspend the tick (if in use) when going idle.
      xen: RCU/x86/ARM: discount CPUs that were idle when grace period started.
      xen: RCU: don't let a CPU with a callback go idle.
      xen: RCU: avoid busy waiting until the end of grace period.
      xen: try to prevent idle timer from firing too often.

 xen/arch/arm/domain.c         |   29 +++++++---
 xen/arch/x86/cpu/mwait-idle.c |    3 -
 xen/common/rcupdate.c         |  123 ++++++++++++++++++++++++++++++++++++++++-
 xen/common/schedule.c         |    4 +
 xen/common/softirq.c          |    8 +--
 xen/include/xen/perfc_defn.h  |    2 +
 xen/include/xen/rcupdate.h    |    6 ++
 xen/include/xen/sched.h       |    6 +-
 8 files changed, 159 insertions(+), 22 deletions(-)
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

* [PATCH v3 1/6] xen: in do_softirq() sample smp_processor_id() once and for all.
  2017-08-18 18:04 [PATCH v3 0/6] xen: RCU: x86/ARM: Add support of rcu_idle_{enter, exit} Dario Faggioli
@ 2017-08-18 18:04 ` Dario Faggioli
  2017-08-18 18:04 ` [PATCH v3 2/6] xen: ARM: suspend the tick (if in use) when going idle Dario Faggioli
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Dario Faggioli @ 2017-08-18 18:04 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, George Dunlap, Andrew Cooper, Tim Deegan,
	Julien Grall, Jan Beulich

In fact, right now, we read it at every iteration of the loop.
The reason it's done like this is how context switch was handled
on IA64 (see commit ae9bfcdc, "[XEN] Various softirq cleanups" [1]).

However:
1) we don't have IA64 any longer, and all the architectures that
   we do support are ok with sampling once and for all;
2) sampling at every iteration (slightly) affects performance;
3) sampling at every iteration is misleading, as it makes people
   believe that it is currently possible that SCHEDULE_SOFTIRQ
   moves the execution flow to another CPU (and the comment,
   by reinforcing this belief, makes things even worse!).

Therefore, let's:
- do the sampling only once, and remove the comment;
- leave an ASSERT() around, so that, if context switching
  logic changes (in current or new arches), we will notice.

[1] Some more (historical) information here:
    http://old-list-archives.xenproject.org/archives/html/xen-devel/2006-06/msg01262.html

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Tim Deegan <tim@xen.org>
---
This has been submitted already, as a part of another series. Discussion is here:
 https://lists.xen.org/archives/html/xen-devel/2017-06/msg00102.html

For the super lazy, Jan's last words in that thread were these:
 "I've voiced my opinion, but I don't mean to block the patch. After
  all there's no active issue the change introduces."
 (https://lists.xen.org/archives/html/xen-devel/2017-06/msg00797.html)

Since then:
- replaced "once and for all" with "only once", as requested by George (and
  applied his Reviewed-by, as he said I could).
---
 xen/common/softirq.c |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/xen/common/softirq.c b/xen/common/softirq.c
index ac12cf8..67c84ba 100644
--- a/xen/common/softirq.c
+++ b/xen/common/softirq.c
@@ -27,16 +27,12 @@ static DEFINE_PER_CPU(unsigned int, batching);
 
 static void __do_softirq(unsigned long ignore_mask)
 {
-    unsigned int i, cpu;
+    unsigned int i, cpu = smp_processor_id();
     unsigned long pending;
 
     for ( ; ; )
     {
-        /*
-         * Initialise @cpu on every iteration: SCHEDULE_SOFTIRQ may move
-         * us to another processor.
-         */
-        cpu = smp_processor_id();
+        ASSERT(cpu == smp_processor_id());
 
         if ( rcu_pending(cpu) )
             rcu_check_callbacks(cpu);


* [PATCH v3 2/6] xen: ARM: suspend the tick (if in use) when going idle.
  2017-08-18 18:04 [PATCH v3 0/6] xen: RCU: x86/ARM: Add support of rcu_idle_{enter, exit} Dario Faggioli
  2017-08-18 18:04 ` [PATCH v3 1/6] xen: in do_softirq() sample smp_processor_id() once and for all Dario Faggioli
@ 2017-08-18 18:04 ` Dario Faggioli
  2017-08-18 18:04 ` [PATCH v3 3/6] xen: RCU/x86/ARM: discount CPUs that were idle when grace period started Dario Faggioli
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Dario Faggioli @ 2017-08-18 18:04 UTC (permalink / raw)
  To: xen-devel; +Cc: Julien Grall, Stefano Stabellini

Since commit 964fae8ac ("cpuidle: suspend/resume scheduler
tick timer during cpu idle state entry/exit"), if a scheduler
has a periodic tick timer, we stop it when going idle.

This, however, is only true for x86. Make it true for ARM as
well.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
---
Cc: Julien Grall <julien.grall@arm.com>
---
 xen/arch/arm/domain.c |   29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index eeebbdb..2160d2b 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -39,6 +39,25 @@
 
 DEFINE_PER_CPU(struct vcpu *, curr_vcpu);
 
+static void do_idle(void)
+{
+    unsigned int cpu = smp_processor_id();
+
+    sched_tick_suspend();
+    /* sched_tick_suspend() can raise TIMER_SOFTIRQ. Process it now. */
+    process_pending_softirqs();
+
+    local_irq_disable();
+    if ( cpu_is_haltable(cpu) )
+    {
+        dsb(sy);
+        wfi();
+    }
+    local_irq_enable();
+
+    sched_tick_resume();
+}
+
 void idle_loop(void)
 {
     unsigned int cpu = smp_processor_id();
@@ -52,15 +71,7 @@ void idle_loop(void)
         if ( unlikely(tasklet_work_to_do(cpu)) )
             do_tasklet();
         else
-        {
-            local_irq_disable();
-            if ( cpu_is_haltable(cpu) )
-            {
-                dsb(sy);
-                wfi();
-            }
-            local_irq_enable();
-        }
+            do_idle();
 
         do_softirq();
         /*


* [PATCH v3 3/6] xen: RCU/x86/ARM: discount CPUs that were idle when grace period started.
  2017-08-18 18:04 [PATCH v3 0/6] xen: RCU: x86/ARM: Add support of rcu_idle_{enter, exit} Dario Faggioli
  2017-08-18 18:04 ` [PATCH v3 1/6] xen: in do_softirq() sample smp_processor_id() once and for all Dario Faggioli
  2017-08-18 18:04 ` [PATCH v3 2/6] xen: ARM: suspend the tick (if in use) when going idle Dario Faggioli
@ 2017-08-18 18:04 ` Dario Faggioli
  2017-08-22 12:59   ` Jan Beulich
  2017-08-29 14:58   ` George Dunlap
  2017-08-18 18:04 ` [PATCH v3 4/6] xen: RCU: don't let a CPU with a callback go idle Dario Faggioli
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 21+ messages in thread
From: Dario Faggioli @ 2017-08-18 18:04 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich

Xen is a tickless (micro-)kernel, i.e., when a CPU becomes
idle there is no timer tick that will periodically wake the
CPU up.
OTOH, when we imported RCU from Linux, Linux was (on x86) a
ticking kernel, i.e., there was a periodic timer tick always
running, even on idle CPUs. This was bad for power consumption,
but, for instance, made it easy to monitor the quiescent states
of all the CPUs, and hence tell when RCU grace periods ended.

In Xen, that is impossible, and that's particularly problematic
when the system is very lightly loaded, as some CPUs may never
have the chance to tell the RCU core logic about their quiescence,
and grace periods could extend indefinitely!

This has led, on x86, to long (and unpredictable) delays between
RCU callbacks queueing and their actual invocation. On ARM, we've
even seen infinite grace periods (e.g., complete_domain_destroy()
never being actually invoked!). See here:

 https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg02454.html

The first step for fixing this situation is for RCU to record,
at the beginning of a grace period, which CPUs are already idle.
In fact, being idle, they can't be in the middle of any read-side
critical section, and we don't have to wait for their quiescence.

This is tracked in a cpumask, in a similar way to how it was also
done in Linux (on s390, which was tickless already). It is also
basically the same approach used for making Linux x86 tickless,
from 2.6.21 onward (see commit 79bf2bb3 "tick-management: dyntick /
highres functionality").

For correctness, we also add barriers. One is also present in
Linux (see commit c3f59023, "Fix RCU race in access of nohz_cpu_mask",
although we change the code comment to something that makes better
sense for us). The other (which is its pair) is put in the newly
introduced function rcu_idle_enter(), right after updating the
cpumask. They prevent races between CPUs going idle during the
beginning of a grace period.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
---
Changes from v2:
* initialize idle_cpumask to "all clear", i.e., all the CPUs are busy;
  they'll clear their bit out themselves as soon as they run the idle
  loop (pretty soon anyway).

Changes from v1:
* call rcu_idle_{enter,exit}() from tick suspension/restarting logic.  This
  widens the window during which a CPU has its bit set in the idle cpumask.
  During review, it was suggested to do the opposite (narrow it), and that's
  what I did first. But then, I changed my mind, as doing things as they look
  now (wide window) cures another pre-existing (and independent) race which
  Tim discovered, still during v1 review;
* add a barrier in rcu_idle_enter() too, to properly deal with the race Tim
  pointed out during review;
* mark CPU where RCU initialization happens, at boot, as non-idle.
---
 xen/common/rcupdate.c      |   41 +++++++++++++++++++++++++++++++++++++++--
 xen/common/schedule.c      |    2 ++
 xen/include/xen/rcupdate.h |    3 +++
 3 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index 8cc5a82..12ae7da 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -52,7 +52,8 @@ static struct rcu_ctrlblk {
     int  next_pending;  /* Is the next batch already waiting?         */
 
     spinlock_t  lock __cacheline_aligned;
-    cpumask_t   cpumask; /* CPUs that need to switch in order    */
+    cpumask_t   cpumask; /* CPUs that need to switch in order ... */
+    cpumask_t   idle_cpumask; /* ... unless they are already idle */
     /* for current batch to proceed.        */
 } __cacheline_aligned rcu_ctrlblk = {
     .cur = -300,
@@ -248,7 +249,16 @@ static void rcu_start_batch(struct rcu_ctrlblk *rcp)
         smp_wmb();
         rcp->cur++;
 
-        cpumask_copy(&rcp->cpumask, &cpu_online_map);
+        /*
+         * Make sure the increment of rcp->cur is visible so that, even if a
+         * CPU that is about to go idle is captured inside rcp->cpumask,
+         * rcu_pending() will return false, which then means cpu_quiet()
+         * will be invoked before the CPU actually enters idle.
+         *
+         * This barrier is paired with the one in rcu_idle_enter().
+         */
+        smp_mb();
+        cpumask_andnot(&rcp->cpumask, &cpu_online_map, &rcp->idle_cpumask);
     }
 }
 
@@ -474,7 +484,34 @@ static struct notifier_block cpu_nfb = {
 void __init rcu_init(void)
 {
     void *cpu = (void *)(long)smp_processor_id();
+
+    cpumask_clear(&rcu_ctrlblk.idle_cpumask);
     cpu_callback(&cpu_nfb, CPU_UP_PREPARE, cpu);
     register_cpu_notifier(&cpu_nfb);
     open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
 }
+
+/*
+ * The CPU is becoming idle, so no more read side critical
+ * sections, and one more step toward grace period.
+ */
+void rcu_idle_enter(unsigned int cpu)
+{
+    ASSERT(!cpumask_test_cpu(cpu, &rcu_ctrlblk.idle_cpumask));
+    cpumask_set_cpu(cpu, &rcu_ctrlblk.idle_cpumask);
+    /*
+     * If some other CPU is starting a new grace period, we'll notice that
+     * by seeing a new value in rcp->cur (different than our quiescbatch).
+     * That will force us all the way until cpu_quiet(), clearing our bit
+     * in rcp->cpumask, even in case we managed to get in there.
+     *
+     * See the comment before cpumask_andnot() in rcu_start_batch().
+     */
+    smp_mb();
+}
+
+void rcu_idle_exit(unsigned int cpu)
+{
+    ASSERT(cpumask_test_cpu(cpu, &rcu_ctrlblk.idle_cpumask));
+    cpumask_clear_cpu(cpu, &rcu_ctrlblk.idle_cpumask);
+}
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index e83f4c7..c6f4817 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1903,6 +1903,7 @@ void sched_tick_suspend(void)
 
     sched = per_cpu(scheduler, cpu);
     SCHED_OP(sched, tick_suspend, cpu);
+    rcu_idle_enter(cpu);
 }
 
 void sched_tick_resume(void)
@@ -1910,6 +1911,7 @@ void sched_tick_resume(void)
     struct scheduler *sched;
     unsigned int cpu = smp_processor_id();
 
+    rcu_idle_exit(cpu);
     sched = per_cpu(scheduler, cpu);
     SCHED_OP(sched, tick_resume, cpu);
 }
diff --git a/xen/include/xen/rcupdate.h b/xen/include/xen/rcupdate.h
index 557a7b1..561ac43 100644
--- a/xen/include/xen/rcupdate.h
+++ b/xen/include/xen/rcupdate.h
@@ -146,4 +146,7 @@ void call_rcu(struct rcu_head *head,
 
 int rcu_barrier(void);
 
+void rcu_idle_enter(unsigned int cpu);
+void rcu_idle_exit(unsigned int cpu);
+
 #endif /* __XEN_RCUPDATE_H */


* [PATCH v3 4/6] xen: RCU: don't let a CPU with a callback go idle.
  2017-08-18 18:04 [PATCH v3 0/6] xen: RCU: x86/ARM: Add support of rcu_idle_{enter, exit} Dario Faggioli
                   ` (2 preceding siblings ...)
  2017-08-18 18:04 ` [PATCH v3 3/6] xen: RCU/x86/ARM: discount CPUs that were idle when grace period started Dario Faggioli
@ 2017-08-18 18:04 ` Dario Faggioli
  2017-08-29 15:01   ` George Dunlap
  2017-08-18 18:04 ` [PATCH v3 5/6] xen: RCU: avoid busy waiting until the end of grace period Dario Faggioli
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Dario Faggioli @ 2017-08-18 18:04 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich

If a CPU has a callback queued, it must be ready to invoke
it, as soon as all the other CPUs involved in the grace period
have gone through a quiescent state.

But if we let such CPU go idle, we can't really tell when (if!)
it will realize that it is actually time to invoke the callback.
To solve this problem, a CPU that has a callback queued (and has
already gone through a quiescent state itself) will stay online,
until the grace period ends, and the callback can be invoked.

This is similar to what Linux does, and is the second and last
step for fixing the overly long (or infinite!) grace periods.
The problem, though, is that, within Linux, we have the tick,
so, all that is necessary is to not stop the tick for the CPU
(even if it has gone idle). In Xen, there's no tick, so we must
prevent the CPU from going idle entirely, and let it spin on
rcu_pending(), consuming power and causing overhead.

In this commit, we implement the above, using rcu_needs_cpu(),
in a way similar to how it is used in Linux. This is correct,
useful and not wasteful for CPUs that participate in the grace
period, but do not have a callback queued. For the ones that
have callbacks, an optimization that avoids having to spin is
introduced in a subsequent change.
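
For reference (nothing in this patch changes it), rcu_needs_cpu()
currently boils down to:

    int rcu_needs_cpu(int cpu)
    {
        struct rcu_data *rdp = &per_cpu(rcu_data, cpu);

        /* Keep the CPU out of idle if it has a queued callback (curlist),
         * or any other pending RCU work (rcu_pending()). */
        return (!!rdp->curlist || rcu_pending(cpu));
    }

so, with this change, cpu_is_haltable() starts returning false (and the
CPU is hence kept out of idle) whenever either of those conditions holds.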

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 xen/include/xen/sched.h |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 5828a01..c116604 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -847,7 +847,8 @@ uint64_t get_cpu_idle_time(unsigned int cpu);
 
 /*
  * Used by idle loop to decide whether there is work to do:
- *  (1) Run softirqs; or (2) Play dead; or (3) Run tasklets.
+ *  (1) Deal with RCU; (2) or run softirqs; or (3) Play dead;
+ *  or (4) Run tasklets.
  *
  * About (3), if a tasklet is enqueued, it will be scheduled
  * really really soon, and hence it's pointless to try to
@@ -855,7 +856,8 @@ uint64_t get_cpu_idle_time(unsigned int cpu);
  * the tasklet_work_to_do() helper).
  */
 #define cpu_is_haltable(cpu)                    \
-    (!softirq_pending(cpu) &&                   \
+    (!rcu_needs_cpu(cpu) &&                     \
+     !softirq_pending(cpu) &&                   \
      cpu_online(cpu) &&                         \
      !per_cpu(tasklet_work_to_do, cpu))
 


* [PATCH v3 5/6] xen: RCU: avoid busy waiting until the end of grace period.
  2017-08-18 18:04 [PATCH v3 0/6] xen: RCU: x86/ARM: Add support of rcu_idle_{enter, exit} Dario Faggioli
                   ` (3 preceding siblings ...)
  2017-08-18 18:04 ` [PATCH v3 4/6] xen: RCU: don't let a CPU with a callback go idle Dario Faggioli
@ 2017-08-18 18:04 ` Dario Faggioli
  2017-08-22 13:04   ` Jan Beulich
  2017-08-29 16:03   ` George Dunlap
  2017-08-18 18:04 ` [PATCH v3 6/6] xen: try to prevent idle timer from firing too often Dario Faggioli
  2017-08-19  9:45 ` [PATCH v3 0/6] xen: RCU: x86/ARM: Add support of rcu_idle_{enter, exit} Tim Deegan
  6 siblings, 2 replies; 21+ messages in thread
From: Dario Faggioli @ 2017-08-18 18:04 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich

On the CPU where a callback is queued, cpu_is_haltable()
returns false (due to rcu_needs_cpu() being itself false).
That means the CPU would spin inside idle_loop(), continuously
calling do_softirq(), and, in there, continuously checking
rcu_pending(), in a tight loop.

Let's instead allow the CPU to really go idle, but make sure,
by arming a timer, that we periodically check whether the
grace period has come to an end. As the period of the
timer, we pick a value that makes things look like what
happens in Linux, with the periodic tick (as this code
comes from there).

Note that the timer will *only* be armed on CPUs that are
going idle while having queued RCU callbacks. On CPUs that
don't, there won't be any timer, and their sleep won't be
interrupted (and even for CPUs with callbacks, we only
expect a handful of wakeups at most, but that depends on
the system load, as well as on other things).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
---
Changes from v1:
* clarified changelog;
* fix style/indentation issues;
* deal with RCU idle timer in tick suspension logic;
* as a consequence of the point above, the timer now fires, so kill
  the ASSERT_UNREACHABLE, and put a perfcounter there (to count the
  times it triggers);
* add a comment about the value chosen for programming the idle timer;
* avoid pointless/bogus '!!' and void* casts;
* rearrange the rcu_needs_cpu() return condition;
* add a comment to clarify why we don't want to check rcu_pending()
  in rcu_idle_timer_start().
---
 xen/arch/x86/cpu/mwait-idle.c |    3 +-
 xen/common/rcupdate.c         |   72 ++++++++++++++++++++++++++++++++++++++++-
 xen/common/schedule.c         |    2 +
 xen/include/xen/perfc_defn.h  |    2 +
 xen/include/xen/rcupdate.h    |    3 ++
 5 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/cpu/mwait-idle.c b/xen/arch/x86/cpu/mwait-idle.c
index 762dff1..b6770ea 100644
--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -741,9 +741,8 @@ static void mwait_idle(void)
 	}
 
 	cpufreq_dbs_timer_suspend();
-
 	sched_tick_suspend();
-	/* sched_tick_suspend() can raise TIMER_SOFTIRQ. Process it now. */
+	/* Timer related operations can raise TIMER_SOFTIRQ. Process it now. */
 	process_pending_softirqs();
 
 	/* Interrupts must be disabled for C2 and higher transitions. */
diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index 12ae7da..871936f 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -84,8 +84,37 @@ struct rcu_data {
     int cpu;
     struct rcu_head barrier;
     long            last_rs_qlen;     /* qlen during the last resched */
+
+    /* 3) idle CPUs handling */
+    struct timer idle_timer;
+    bool idle_timer_active;
 };
 
+/*
+ * If a CPU with RCU callbacks queued goes idle, when the grace period is
+ * not finished yet, how can we make sure that the callbacks will eventually
+ * be executed? In Linux (2.6.21, the first "tickless idle" Linux kernel),
+ * the periodic timer tick would not be stopped for such CPU. Here in Xen,
+ * we may not even have a periodic timer tick, so we need to use a
+ * special purpose timer.
+ *
+ * Such timer:
+ * 1) is armed only when a CPU with an RCU callback(s) queued goes idle
+ *    before the end of the current grace period (_not_ for any CPUs that
+ *    go idle!);
+ * 2) when it fires, it is only re-armed if the grace period is still
+ *    running;
+ * 3) it is stopped immediately, if the CPU wakes up from idle and
+ *    resumes 'normal' execution.
+ *
+ * About how far in the future the timer should be programmed each time,
+ * it's hard to tell (guess!!). Since this mimics Linux's periodic timer
+ * tick, take values used there as an indication. In Linux 2.6.21, tick
+ * period can be 10ms, 4ms, 3.33ms or 1ms. Let's use 10ms, to enable
+ * at least some power saving on the CPU that is going idle.
+ */
+#define RCU_IDLE_TIMER_PERIOD MILLISECS(10)
+
 static DEFINE_PER_CPU(struct rcu_data, rcu_data);
 
 static int blimit = 10;
@@ -404,7 +433,45 @@ int rcu_needs_cpu(int cpu)
 {
     struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
 
-    return (!!rdp->curlist || rcu_pending(cpu));
+    return (rdp->curlist && !rdp->idle_timer_active) || rcu_pending(cpu);
+}
+
+/*
+ * Timer for making sure that the CPU where a callback is queued
+ * periodically pokes rcu_pending(), so that it will invoke the callback
+ * not too late after the end of the grace period.
+ */
+void rcu_idle_timer_start()
+{
+    struct rcu_data *rdp = &this_cpu(rcu_data);
+
+    /*
+     * Note that we don't check rcu_pending() here. In fact, we don't want
+     * the timer armed on CPUs that are in the process of quiescing while
+     * going idle, unless they really are the ones with a queued callback.
+     */
+    if (likely(!rdp->curlist))
+        return;
+
+    set_timer(&rdp->idle_timer, NOW() + RCU_IDLE_TIMER_PERIOD);
+    rdp->idle_timer_active = true;
+}
+
+void rcu_idle_timer_stop()
+{
+    struct rcu_data *rdp = &this_cpu(rcu_data);
+
+    if (likely(!rdp->idle_timer_active))
+        return;
+
+    rdp->idle_timer_active = false;
+    stop_timer(&rdp->idle_timer);
+}
+
+static void rcu_idle_timer_handler(void* data)
+{
+    /* Nothing, really... Just count the number of times we fire */
+    perfc_incr(rcu_idle_timer);
 }
 
 void rcu_check_callbacks(int cpu)
@@ -425,6 +492,8 @@ static void rcu_move_batch(struct rcu_data *this_rdp, struct rcu_head *list,
 static void rcu_offline_cpu(struct rcu_data *this_rdp,
                             struct rcu_ctrlblk *rcp, struct rcu_data *rdp)
 {
+    kill_timer(&rdp->idle_timer);
+
     /* If the cpu going offline owns the grace period we can block
      * indefinitely waiting for it, so flush it here.
      */
@@ -453,6 +522,7 @@ static void rcu_init_percpu_data(int cpu, struct rcu_ctrlblk *rcp,
     rdp->qs_pending = 0;
     rdp->cpu = cpu;
     rdp->blimit = blimit;
+    init_timer(&rdp->idle_timer, rcu_idle_timer_handler, rdp, cpu);
 }
 
 static int cpu_callback(
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index c6f4817..8827921 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1904,6 +1904,7 @@ void sched_tick_suspend(void)
     sched = per_cpu(scheduler, cpu);
     SCHED_OP(sched, tick_suspend, cpu);
     rcu_idle_enter(cpu);
+    rcu_idle_timer_start();
 }
 
 void sched_tick_resume(void)
@@ -1911,6 +1912,7 @@ void sched_tick_resume(void)
     struct scheduler *sched;
     unsigned int cpu = smp_processor_id();
 
+    rcu_idle_timer_stop();
     rcu_idle_exit(cpu);
     sched = per_cpu(scheduler, cpu);
     SCHED_OP(sched, tick_resume, cpu);
diff --git a/xen/include/xen/perfc_defn.h b/xen/include/xen/perfc_defn.h
index 53849af..ca446e5 100644
--- a/xen/include/xen/perfc_defn.h
+++ b/xen/include/xen/perfc_defn.h
@@ -12,6 +12,8 @@ PERFCOUNTER(calls_from_multicall,       "calls from multicall")
 PERFCOUNTER(irqs,                   "#interrupts")
 PERFCOUNTER(ipis,                   "#IPIs")
 
+PERFCOUNTER(rcu_idle_timer,         "RCU: idle_timer")
+
 /* Generic scheduler counters (applicable to all schedulers) */
 PERFCOUNTER(sched_irq,              "sched: timer")
 PERFCOUNTER(sched_run,              "sched: runs through scheduler")
diff --git a/xen/include/xen/rcupdate.h b/xen/include/xen/rcupdate.h
index 561ac43..3402eb5 100644
--- a/xen/include/xen/rcupdate.h
+++ b/xen/include/xen/rcupdate.h
@@ -149,4 +149,7 @@ int rcu_barrier(void);
 void rcu_idle_enter(unsigned int cpu);
 void rcu_idle_exit(unsigned int cpu);
 
+void rcu_idle_timer_start(void);
+void rcu_idle_timer_stop(void);
+
 #endif /* __XEN_RCUPDATE_H */


* [PATCH v3 6/6] xen: try to prevent idle timer from firing too often.
  2017-08-18 18:04 [PATCH v3 0/6] xen: RCU: x86/ARM: Add support of rcu_idle_{enter, exit} Dario Faggioli
                   ` (4 preceding siblings ...)
  2017-08-18 18:04 ` [PATCH v3 5/6] xen: RCU: avoid busy waiting until the end of grace period Dario Faggioli
@ 2017-08-18 18:04 ` Dario Faggioli
  2017-08-29 16:30   ` George Dunlap
  2017-08-19  9:45 ` [PATCH v3 0/6] xen: RCU: x86/ARM: Add support of rcu_idle_{enter, exit} Tim Deegan
  6 siblings, 1 reply; 21+ messages in thread
From: Dario Faggioli @ 2017-08-18 18:04 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich

Idea is: the more CPUs are still active in a grace period,
the more we can wait to check whether it's time to invoke
the callbacks (on those CPUs that have already quiesced,
are idle, and have callbacks queued).

What we're trying to avoid is one of those idle CPUs waking
up, only to discover that the grace period is still
running, and that it hence could have slept longer
(saving more power).

This patch implements a heuristic aimed at achieving
that, at the price of having to call cpumask_weight() on
the 'entering idle' path, on CPUs with queued callbacks.

Of course, at the same time, we don't want to delay
recognising that we can invoke the callbacks by too
much, so we also set a maximum.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
---
 xen/common/rcupdate.c |   18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index 871936f..5a3cb1d 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -110,10 +110,17 @@ struct rcu_data {
  * About how far in the future the timer should be programmed each time,
  * it's hard to tell (guess!!). Since this mimics Linux's periodic timer
  * tick, take values used there as an indication. In Linux 2.6.21, tick
- * period can be 10ms, 4ms, 3.33ms or 1ms. Let's use 10ms, to enable
- * at least some power saving on the CPU that is going idle.
+ * period can be 10ms, 4ms, 3.33ms or 1ms.
+ *
+ * That being said, we can assume that, the more CPUs are still active in
+ * the current grace period, the longer it will take for it to come to its
+ * end. We wait 10ms for each active CPU, as minimizing the wakeups enables
+ * more effective power saving on the CPU that has gone idle. But we also
+ * never wait more than 100ms, to avoid delaying recognising the end of a
+ * grace period (and the invocation of the callbacks) by too much.
  */
-#define RCU_IDLE_TIMER_PERIOD MILLISECS(10)
+#define RCU_IDLE_TIMER_CPU_DELAY  MILLISECS(10)
+#define RCU_IDLE_TIMER_PERIOD_MAX MILLISECS(100)
 
 static DEFINE_PER_CPU(struct rcu_data, rcu_data);
 
@@ -444,6 +451,7 @@ int rcu_needs_cpu(int cpu)
 void rcu_idle_timer_start()
 {
     struct rcu_data *rdp = &this_cpu(rcu_data);
+    s_time_t next;
 
     /*
      * Note that we don't check rcu_pending() here. In fact, we don't want
@@ -453,7 +461,9 @@ void rcu_idle_timer_start()
     if (likely(!rdp->curlist))
         return;
 
-    set_timer(&rdp->idle_timer, NOW() + RCU_IDLE_TIMER_PERIOD);
+    next = min_t(s_time_t, RCU_IDLE_TIMER_PERIOD_MAX,
+                 cpumask_weight(&rcu_ctrlblk.cpumask) * RCU_IDLE_TIMER_CPU_DELAY);
+    set_timer(&rdp->idle_timer, NOW() + next);
     rdp->idle_timer_active = true;
 }
 


* Re: [PATCH v3 0/6] xen: RCU: x86/ARM: Add support of rcu_idle_{enter, exit}
  2017-08-18 18:04 [PATCH v3 0/6] xen: RCU: x86/ARM: Add support of rcu_idle_{enter, exit} Dario Faggioli
                   ` (5 preceding siblings ...)
  2017-08-18 18:04 ` [PATCH v3 6/6] xen: try to prevent idle timer from firing too often Dario Faggioli
@ 2017-08-19  9:45 ` Tim Deegan
  6 siblings, 0 replies; 21+ messages in thread
From: Tim Deegan @ 2017-08-19  9:45 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Julien Grall, Jan Beulich, xen-devel

At 20:04 +0200 on 18 Aug (1503086640), Dario Faggioli wrote:
> Hey,
> 
> I know, v2 of this series just went out. But since I am leaving for 2 weeks, I
> wanted the version with as many comments as possible addressed to be on the
> list, and since it was rather quick to do that (and test it) for Tim's latest
> comment, here I am.
> 
> So, basically, this is v2, as here:
> 
>  https://lists.xen.org/archives/html/xen-devel/2017-08/msg01881.html
> 
> But with, as Tim suggested, the idle_cpumask initialized to "all clear". The
> various CPUs, therefore, are considered busy when they come up, and clear their
> own bit in the mask, as soon as they enter the idle loop for the first time
> (which is pretty soon). Doing like this, we close another window for a
> potential (although, rather unlikely) race/unnecessary extension of the grace
> period, and we simplify the code (a.k.a. 'win-win' :-D).

Thanks!

Reviewed-by: Tim Deegan <tim@xen.org>

* Re: [PATCH v3 3/6] xen: RCU/x86/ARM: discount CPUs that were idle when grace period started.
  2017-08-18 18:04 ` [PATCH v3 3/6] xen: RCU/x86/ARM: discount CPUs that were idle when grace period started Dario Faggioli
@ 2017-08-22 12:59   ` Jan Beulich
  2017-08-29 14:58   ` George Dunlap
  1 sibling, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2017-08-22 12:59 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, xen-devel

>>> On 18.08.17 at 20:04, <dario.faggioli@citrix.com> wrote:
> Changes from v2:
> * initialize idle_cpumask to "all clear", i.e., all the CPUs are busy;
>   they'll clear their bit out themselves as soon as the run the idle
>   loop (pretty soon anyway).

Just to make sure I correctly understand this - you really mean
"they'll set their bit themselves ..."?

Jan


* Re: [PATCH v3 5/6] xen: RCU: avoid busy waiting until the end of grace period.
  2017-08-18 18:04 ` [PATCH v3 5/6] xen: RCU: avoid busy waiting until the end of grace period Dario Faggioli
@ 2017-08-22 13:04   ` Jan Beulich
  2017-08-29 16:06     ` George Dunlap
  2017-08-29 16:03   ` George Dunlap
  1 sibling, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2017-08-22 13:04 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, xen-devel

>>> On 18.08.17 at 20:04, <dario.faggioli@citrix.com> wrote:
> --- a/xen/arch/x86/cpu/mwait-idle.c
> +++ b/xen/arch/x86/cpu/mwait-idle.c
> @@ -741,9 +741,8 @@ static void mwait_idle(void)
>  	}
>  
>  	cpufreq_dbs_timer_suspend();
> -
>  	sched_tick_suspend();
> -	/* sched_tick_suspend() can raise TIMER_SOFTIRQ. Process it now. */
> +	/* Timer related operations can raise TIMER_SOFTIRQ. Process it now. */
>  	process_pending_softirqs();

Is this a leftover from v1? Otherwise, why do you do the adjustment
here but not in acpi_processor_idle()?

> --- a/xen/common/rcupdate.c
> +++ b/xen/common/rcupdate.c
> @@ -84,8 +84,37 @@ struct rcu_data {
>      int cpu;
>      struct rcu_head barrier;
>      long            last_rs_qlen;     /* qlen during the last resched */
> +
> +    /* 3) idle CPUs handling */
> +    struct timer idle_timer;
> +    bool idle_timer_active;
>  };
>  
> +/*
> + * If a CPU with RCU callbacks queued goes idle, when the grace period is
> + * not finished yet, how can we make sure that the callbacks will eventually
> + * be executed? In Linux (2.6.21, the first "tickless idle" Linux kernel),
> + * the periodic timer tick would not be stopped for such CPU. Here in Xen,
> + * we (may) don't even have a periodic timer tick, so we need to use a
> + * special purpose timer.
> + *
> + * Such timer:
> + * 1) is armed only when a CPU with an RCU callback(s) queued goes idle
> + *    before the end of the current grace period (_not_ for any CPUs that
> + *    go idle!);
> + * 2) when it fires, it is only re-armed if the grace period is still
> + *    running;
> + * 3) it is stopped immediately, if the CPU wakes up from idle and
> + *    resumes 'normal' execution.
> + *
> + * About how far in the future the timer should be programmed each time,
> + * it's hard to tell (guess!!). Since this mimics Linux's periodic timer
> + * tick, take values used there as an indication. In Linux 2.6.21, tick
> + * period can be 10ms, 4ms, 3.33ms or 1ms. Let's use 10ms, to enable
> + * at least some power saving on the CPU that is going idle.
> + */
> +#define RCU_IDLE_TIMER_PERIOD MILLISECS(10)

With you even mentioning that the original Linux code has ways
to use different values, wouldn't it be worth allowing this to be
command line controllable?
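
(Just as a sketch of what I have in mind -- option and variable names
made up, untested:

    /* "rcu-idle-timer-period-ms": period of the RCU idle timer, in ms. */
    static unsigned int rcu_idle_timer_period_ms = 10;
    integer_param("rcu-idle-timer-period-ms", rcu_idle_timer_period_ms);

with rcu_idle_timer_start() then arming the timer at
NOW() + MILLISECS(rcu_idle_timer_period_ms) instead of the constant.)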

Jan


* Re: [PATCH v3 3/6] xen: RCU/x86/ARM: discount CPUs that were idle when grace period started.
  2017-08-18 18:04 ` [PATCH v3 3/6] xen: RCU/x86/ARM: discount CPUs that were idle when grace period started Dario Faggioli
  2017-08-22 12:59   ` Jan Beulich
@ 2017-08-29 14:58   ` George Dunlap
  1 sibling, 0 replies; 21+ messages in thread
From: George Dunlap @ 2017-08-29 14:58 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich

On 08/18/2017 07:04 PM, Dario Faggioli wrote:
> Xen is a tickless (micro-)kernel, i.e., when a CPU becomes
> idle there is no timer tick that will periodically wake the
> CPU up.
> OTOH, when we imported RCU from Linux, Linux was (on x86) a
> ticking kernel, i.e., there was a periodic timer tick always
> running, even on idle CPUs. This was bad for power consumption,
> but, for instance, made it easy to monitor the quiescent states
> of all the CPUs, and hence tell when RCU grace periods ended.
> 
> In Xen, that is impossible, and that's particularly problematic
> when the system is very lightly loaded, as some CPUs may never
> have the chance to tell the RCU core logic about their quiescence,
> and grace periods could extend indefinitely!
> 
> This has led, on x86, to long (and unpredictable) delays between
> RCU callbacks queueing and their actual invocation. On ARM, we've
> even seen infinite grace periods (e.g., complete_domain_destroy()
> never being actually invoked!). See here:
> 
>  https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg02454.html
> 
> The first step for fixing this situation is for RCU to record,
> at the beginning of a grace period, which CPUs are already idle.
> In fact, being idle, they can't be in the middle of any read-side
> critical section, and we don't have to wait for their quiescence.
> 
> This is tracked in a cpumask, in a similar way to how it was also
> done in Linux (on s390, which was tickless already). It is also
> basically the same approach used for making Linux x86 tickless,
> from 2.6.21 onward (see commit 79bf2bb3 "tick-management: dyntick /
> highres functionality").
> 
> For correctness, we also add barriers. One is also present in
> Linux (see commit c3f59023, "Fix RCU race in access of nohz_cpu_mask",
> although we change the code comment to something that makes better
> sense for us). The other (which is its pair) is put in the newly
> introduced function rcu_idle_enter(), right after updating the
> cpumask. They prevent races between CPUs going idle during the
> beginning of a grace period.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

* Re: [PATCH v3 4/6] xen: RCU: don't let a CPU with a callback go idle.
  2017-08-18 18:04 ` [PATCH v3 4/6] xen: RCU: don't let a CPU with a callback go idle Dario Faggioli
@ 2017-08-29 15:01   ` George Dunlap
  0 siblings, 0 replies; 21+ messages in thread
From: George Dunlap @ 2017-08-29 15:01 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich

On 08/18/2017 07:04 PM, Dario Faggioli wrote:
> If a CPU has a callback queued, it must be ready to invoke
> it, as soon as all the other CPUs involved in the grace period
> have gone through a quiescent state.
> 
> But if we let such CPU go idle, we can't really tell when (if!)
> it will realize that it is actually time to invoke the callback.
> To solve this problem, a CPU that has a callback queued (and has
> already gone through a quiescent state itself) will stay online,
> until the grace period ends, and the callback can be invoked.
> 
> This is similar to what Linux does, and is the second and last
> step for fixing the overly long (or infinite!) grace periods.
> The problem, though, is that, within Linux, we have the tick,
> so, all that is necessary is to not stop the tick for the CPU
> (even if it has gone idle). In Xen, there's no tick, so we must
> prevent the CPU from going idle entirely, and let it spin on
> rcu_pending(), consuming power and causing overhead.
> 
> In this commit, we implement the above, using rcu_needs_cpu(),
> in a way similar to how it is used in Linux. This is correct,
> useful and not wasteful for CPUs that participate in the grace
> period, but do not have a callback queued. For the ones that
> have callbacks, an optimization that avoids having to spin is
> introduced in a subsequent change.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Acked-by: George Dunlap <george.dunlap@citrix.com>

* Re: [PATCH v3 5/6] xen: RCU: avoid busy waiting until the end of grace period.
  2017-08-18 18:04 ` [PATCH v3 5/6] xen: RCU: avoid busy waiting until the end of grace period Dario Faggioli
  2017-08-22 13:04   ` Jan Beulich
@ 2017-08-29 16:03   ` George Dunlap
  1 sibling, 0 replies; 21+ messages in thread
From: George Dunlap @ 2017-08-29 16:03 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich

On 08/18/2017 07:04 PM, Dario Faggioli wrote:
> On the CPU where a callback is queued, cpu_is_haltable()
> returns false (due to rcu_needs_cpu() being itself false).
> That means the CPU would spin inside idle_loop(), continuously
> calling do_softirq(), and, in there, continuously checking
> rcu_pending(), in a tight loop.
> 
> Let's instead allow the CPU to really go idle, but make sure,
> by arming a timer, that we periodically check whether the
> grace period has come to an end. As the period of the
> timer, we pick a value that makes things look like what
> happens in Linux, with the periodic tick (as this code
> comes from there).
> 
> Note that the timer will *only* be armed on CPUs that are
> going idle while having queued RCU callbacks. On CPUs that
> don't, there won't be any timer, and their sleep won't be
> interrupted (and even for CPUs with callbacks, we only
> expect a handful of wakeups at most, but that depends on
> the system load, as well as on other things).
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

* Re: [PATCH v3 5/6] xen: RCU: avoid busy waiting until the end of grace period.
  2017-08-22 13:04   ` Jan Beulich
@ 2017-08-29 16:06     ` George Dunlap
  2017-08-30  7:18       ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: George Dunlap @ 2017-08-29 16:06 UTC (permalink / raw)
  To: Jan Beulich, Dario Faggioli
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, xen-devel

On 08/22/2017 02:04 PM, Jan Beulich wrote:
>>>> On 18.08.17 at 20:04, <dario.faggioli@citrix.com> wrote:
>> --- a/xen/arch/x86/cpu/mwait-idle.c
>> +++ b/xen/arch/x86/cpu/mwait-idle.c
>> @@ -741,9 +741,8 @@ static void mwait_idle(void)
>>  	}
>>  
>>  	cpufreq_dbs_timer_suspend();
>> -
>>  	sched_tick_suspend();
>> -	/* sched_tick_suspend() can raise TIMER_SOFTIRQ. Process it now. */
>> +	/* Timer related operations can raise TIMER_SOFTIRQ. Process it now. */
>>  	process_pending_softirqs();
> 
> Is this a leftover from v1? Otherwise, why do you do the adjustment
> here but not in acpi_processor_idle()?
> 
>> --- a/xen/common/rcupdate.c
>> +++ b/xen/common/rcupdate.c
>> @@ -84,8 +84,37 @@ struct rcu_data {
>>      int cpu;
>>      struct rcu_head barrier;
>>      long            last_rs_qlen;     /* qlen during the last resched */
>> +
>> +    /* 3) idle CPUs handling */
>> +    struct timer idle_timer;
>> +    bool idle_timer_active;
>>  };
>>  
>> +/*
>> + * If a CPU with RCU callbacks queued goes idle, when the grace period is
>> + * not finished yet, how can we make sure that the callbacks will eventually
>> + * be executed? In Linux (2.6.21, the first "tickless idle" Linux kernel),
>> + * the periodic timer tick would not be stopped for such CPU. Here in Xen,
>> + * we (may) don't even have a periodic timer tick, so we need to use a
>> + * special purpose timer.
>> + *
>> + * Such timer:
>> + * 1) is armed only when a CPU with an RCU callback(s) queued goes idle
>> + *    before the end of the current grace period (_not_ for any CPUs that
>> + *    go idle!);
>> + * 2) when it fires, it is only re-armed if the grace period is still
>> + *    running;
>> + * 3) it is stopped immediately, if the CPU wakes up from idle and
>> + *    resumes 'normal' execution.
>> + *
>> + * About how far in the future the timer should be programmed each time,
>> + * it's hard to tell (guess!!). Since this mimics Linux's periodic timer
>> + * tick, take values used there as an indication. In Linux 2.6.21, tick
>> + * period can be 10ms, 4ms, 3.33ms or 1ms. Let's use 10ms, to enable
>> + * at least some power saving on the CPU that is going idle.
>> + */
>> +#define RCU_IDLE_TIMER_PERIOD MILLISECS(10)
> 
> With you even mentioning that the original Linux code has ways
> to use different values, wouldn't it be worth allowing this to be
> command line controllable?

Apart from this, are you OK with the patch?

Dario is on holiday, and I think it would be good to get this
functionality in sooner rather than later to shake out as many bugs as
possible.  Would you be willing to let the idle timer period be set with
a follow-up patch?

I'm happy to propagate the comment change to acpi_processor_idle() if
necessary.

 -George

* Re: [PATCH v3 6/6] xen: try to prevent idle timer from firing too often.
  2017-08-18 18:04 ` [PATCH v3 6/6] xen: try to prevent idle timer from firing too often Dario Faggioli
@ 2017-08-29 16:30   ` George Dunlap
  2017-09-06 16:51     ` Dario Faggioli
  0 siblings, 1 reply; 21+ messages in thread
From: George Dunlap @ 2017-08-29 16:30 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich

On 08/18/2017 07:04 PM, Dario Faggioli wrote:
> Idea is: the more CPUs are still active in a grace period,
> the more we can wait to check whether it's time to invoke
> the callbacks (on those CPUs that have already quiesced,
> are idle, and have callbacks queued).
> 
> What we're trying to avoid is one of those idle CPUs waking
> up, only to discover that the grace period is still
> running, and that it hence could have slept longer
> (saving more power).

But this will only happen:
1. If the cpu in question had just been running
2. Until the grace period is actually finished, at which point it will
stop entirely again (since idle vcpus are filtered out now).

So I think we're only talking about one or two extra wakeups per cpu
maximum -- if this even happens at all.

Wouldn't it be better to first add a performance counter, and check to
see if and how often this situation happens?
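
Something like this, maybe (a rough, untested sketch; and the "grace
period still running" check may well want to be rcu_pending() based
instead):

    /* In xen/include/xen/perfc_defn.h */
    PERFCOUNTER(rcu_idle_timer_miss,    "RCU: idle_timer fired too early")

    /* In xen/common/rcupdate.c */
    static void rcu_idle_timer_handler(void *data)
    {
        /* Count the wakeups that happen while the grace period is still
         * in progress, i.e., the cases where we could have slept longer. */
        if ( !cpumask_empty(&rcu_ctrlblk.cpumask) )
            perfc_incr(rcu_idle_timer_miss);

        perfc_incr(rcu_idle_timer);
    }

That would tell us whether (and how often) these extra wakeups actually
happen in practice.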

> This patch implements a heuristic aimed at achieving
> that, at the price of having to call cpumask_weight() on
> the 'entering idle' path, on CPUs with queued callbacks.

The heuristic seems a bit strange to me too: why would each cpu increase
the grace period in a linear fashion?  I haven't looked at the grace
period code, but I would have expected each one to be independent; and
so there'd be a "diminishing returns" effect when adding more cpus.

If we have to have something like this (which I'm not at all convinced
we do), I would think having a single number which self-adjusted so that
the number of 'misses' was around 1% would be a good balance.  What
about a heuristic like this:

1. If the timer goes off and the grace period isn't over, add 10ms to
the timer period
2. If the timer goes off and the grace period *is* over, subtract 100us
from the timer period

The above should self-adjust so that around 1% of timers miss.
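
Roughly something like this (untested, and all the names are made up):

    /* Per-CPU, self-adjusting period for the RCU idle timer. */
    static DEFINE_PER_CPU(s_time_t, rcu_idle_timer_period);

    static void rcu_idle_timer_handler(void *data)
    {
        s_time_t *period = &this_cpu(rcu_idle_timer_period);

        if ( !cpumask_empty(&rcu_ctrlblk.cpumask) )
            *period += MILLISECS(10);   /* fired too early: back off */
        else
            *period -= MICROSECS(100);  /* grace period over: shrink */

        /* Keep the period within sane boundaries. */
        *period = max(min(*period, MILLISECS(100)), MILLISECS(1));

        perfc_incr(rcu_idle_timer);
    }

with rcu_idle_timer_start() arming the timer at
NOW() + this_cpu(rcu_idle_timer_period) (and the per-CPU value initialised
to, say, 10ms in rcu_init_percpu_data()).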

Thoughts?

 -George

* Re: [PATCH v3 5/6] xen: RCU: avoid busy waiting until the end of grace period.
  2017-08-29 16:06     ` George Dunlap
@ 2017-08-30  7:18       ` Jan Beulich
  2017-08-30 11:16         ` George Dunlap
  2017-09-05 17:13         ` Dario Faggioli
  0 siblings, 2 replies; 21+ messages in thread
From: Jan Beulich @ 2017-08-30  7:18 UTC (permalink / raw)
  To: George Dunlap
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Dario Faggioli, Ian Jackson, Tim Deegan, Julien Grall, xen-devel

>>> On 29.08.17 at 18:06, <george.dunlap@citrix.com> wrote:
> On 08/22/2017 02:04 PM, Jan Beulich wrote:
>>>>> On 18.08.17 at 20:04, <dario.faggioli@citrix.com> wrote:
>>> --- a/xen/arch/x86/cpu/mwait-idle.c
>>> +++ b/xen/arch/x86/cpu/mwait-idle.c
>>> @@ -741,9 +741,8 @@ static void mwait_idle(void)
>>>  	}
>>>  
>>>  	cpufreq_dbs_timer_suspend();
>>> -
>>>  	sched_tick_suspend();
>>> -	/* sched_tick_suspend() can raise TIMER_SOFTIRQ. Process it now. */
>>> +	/* Timer related operations can raise TIMER_SOFTIRQ. Process it now. */
>>>  	process_pending_softirqs();
>> 
>> Is this a leftover from v1? Otherwise, why do you do the adjustment
>> here but not in acpi_processor_idle()?
>> 
>>> --- a/xen/common/rcupdate.c
>>> +++ b/xen/common/rcupdate.c
>>> @@ -84,8 +84,37 @@ struct rcu_data {
>>>      int cpu;
>>>      struct rcu_head barrier;
>>>      long            last_rs_qlen;     /* qlen during the last resched */
>>> +
>>> +    /* 3) idle CPUs handling */
>>> +    struct timer idle_timer;
>>> +    bool idle_timer_active;
>>>  };
>>>  
>>> +/*
>>> + * If a CPU with RCU callbacks queued goes idle, when the grace period is
>>> + * not finished yet, how can we make sure that the callbacks will 
> eventually
>>> + * be executed? In Linux (2.6.21, the first "tickless idle" Linux kernel),
>>> + * the periodic timer tick would not be stopped for such CPU. Here in Xen,
>>> + * we (may) don't even have a periodic timer tick, so we need to use a
>>> + * special purpose timer.
>>> + *
>>> + * Such timer:
>>> + * 1) is armed only when a CPU with an RCU callback(s) queued goes idle
>>> + *    before the end of the current grace period (_not_ for any CPUs that
>>> + *    go idle!);
>>> + * 2) when it fires, it is only re-armed if the grace period is still
>>> + *    running;
>>> + * 3) it is stopped immediately, if the CPU wakes up from idle and
>>> + *    resumes 'normal' execution.
>>> + *
>>> + * About how far in the future the timer should be programmed each time,
>>> + * it's hard to tell (guess!!). Since this mimics Linux's periodic timer
>>> + * tick, take values used there as an indication. In Linux 2.6.21, tick
>>> + * period can be 10ms, 4ms, 3.33ms or 1ms. Let's use 10ms, to enable
>>> + * at least some power saving on the CPU that is going idle.
>>> + */
>>> +#define RCU_IDLE_TIMER_PERIOD MILLISECS(10)
>> 
>> With you even mentioning that the original Linux code has ways
>> to use different values, wouldn't it be worth allowing this to be
>> command line controllable?
> 
> Apart from this, are you OK with the patch?

Yes.

> Dario is on holiday, and I think it would be good to get this
> functionality in sooner rather than later to shake out as many bugs as
> possible.  Would you be willing to let the idle timer period be set with
> a follow-up patch?

Yes.

> I'm happy to propagate the comment change to acpi_processor_idle() if
> necessary.

Propagate? So far I was of the opinion that the change had been
intentionally dropped there, but was mistakenly left in place in
mwait_idle(). Or if the comment was intended to be changed (in
both places) I can't see how that would belong into this patch, as
I think sched_tick_suspend() could have fiddled with timers even
before. But in the end I don't care all that much if the comment
adjustment gets done here or in a separate patch - my main wish
really is for the two places to stay in sync.

Jan


* Re: [PATCH v3 5/6] xen: RCU: avoid busy waiting until the end of grace period.
  2017-08-30  7:18       ` Jan Beulich
@ 2017-08-30 11:16         ` George Dunlap
  2017-09-05 17:13         ` Dario Faggioli
  1 sibling, 0 replies; 21+ messages in thread
From: George Dunlap @ 2017-08-30 11:16 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Dario Faggioli,
	Ian Jackson, Tim Deegan, Julien Grall, xen-devel

On Wed, Aug 30, 2017 at 8:18 AM, Jan Beulich <JBeulich@suse.com> wrote:
>> Apart from this, are you OK with the patch?
>
> Yes.
>
>> Dario is on holiday, and I think it would be good to get this
>> functionality in sooner rather than later to shake out as many bugs as
>> possible.  Would you be willing to let the idle timer period be set with
>> a follow-up patch?
>
> Yes.
>
>> I'm happy to propagate the comment change to acpi_processor_idle() if
>> necessary.
>
> Propagate? So far I was of the opinion that the change had been
> intentionally dropped there, but was mistakenly left in place in
> mwait_idle(). Or if the comment was intended to be changed (in
> both places) I can't see how that would belong into this patch, as
> I think sched_tick_suspend() could have fiddled with timers even
> before. But in the end I don't care all that much if the comment
> adjustment gets done here or in a separate patch - my main wish
> really is for the two places to stay in sync.

OK -- I've checked in through patch 5 with that comment hunk removed.

Thanks,
 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 5/6] xen: RCU: avoid busy waiting until the end of grace period.
  2017-08-30  7:18       ` Jan Beulich
  2017-08-30 11:16         ` George Dunlap
@ 2017-09-05 17:13         ` Dario Faggioli
  2017-09-06  9:50           ` Jan Beulich
  1 sibling, 1 reply; 21+ messages in thread
From: Dario Faggioli @ 2017-09-05 17:13 UTC (permalink / raw)
  To: Jan Beulich, George Dunlap
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 4168 bytes --]

On Wed, 2017-08-30 at 01:18 -0600, Jan Beulich wrote:
> > > > On 29.08.17 at 18:06, <george.dunlap@citrix.com> wrote:
> > 
> > On 08/22/2017 02:04 PM, Jan Beulich wrote:
> > > > > > On 18.08.17 at 20:04, <dario.faggioli@citrix.com> wrote:
> > > > 
> > > > --- a/xen/arch/x86/cpu/mwait-idle.c
> > > > +++ b/xen/arch/x86/cpu/mwait-idle.c
> > > > @@ -741,9 +741,8 @@ static void mwait_idle(void)
> > > >  	}
> > > >  
> > > >  	cpufreq_dbs_timer_suspend();
> > > > -
> > > >  	sched_tick_suspend();
> > > > -	/* sched_tick_suspend() can raise TIMER_SOFTIRQ.
> > > > Process it now. */
> > > > +	/* Timer related operations can raise TIMER_SOFTIRQ.
> > > > Process it now. */
> > > >  	process_pending_softirqs();
> > > 
> > > Is this a leftover from v1? Otherwise, why do you do the
> > > adjustment
> > > here but not in acpi_processor_idle()?
> > > 
> > > > --- a/xen/common/rcupdate.c
> > > > +++ b/xen/common/rcupdate.c
> > > > @@ -84,8 +84,37 @@ struct rcu_data {
> > > >      int cpu;
> > > >      struct rcu_head barrier;
> > > >      long            last_rs_qlen;     /* qlen during the last
> > > > resched */
> > > > +
> > > > +    /* 3) idle CPUs handling */
> > > > +    struct timer idle_timer;
> > > > +    bool idle_timer_active;
> > > >  };
> > > >  
> > > > +/*
> > > > + * If a CPU with RCU callbacks queued goes idle, when the
> > > > grace period is
> > > > + * not finished yet, how can we make sure that the callbacks
> > > > will 
> > 
> > eventually
> > > > + * be executed? In Linux (2.6.21, the first "tickless idle"
> > > > Linux kernel),
> > > > + * the periodic timer tick would not be stopped for such CPU.
> > > > Here in Xen,
> > > > + * we (may) don't even have a periodic timer tick, so we need
> > > > to use a
> > > > + * special purpose timer.
> > > > + *
> > > > + * Such timer:
> > > > + * 1) is armed only when a CPU with an RCU callback(s) queued
> > > > goes idle
> > > > + *    before the end of the current grace period (_not_ for
> > > > any CPUs that
> > > > + *    go idle!);
> > > > + * 2) when it fires, it is only re-armed if the grace period
> > > > is still
> > > > + *    running;
> > > > + * 3) it is stopped immediately, if the CPU wakes up from idle
> > > > and
> > > > + *    resumes 'normal' execution.
> > > > + *
> > > > + * About how far in the future the timer should be programmed
> > > > each time,
> > > > + * it's hard to tell (guess!!). Since this mimics Linux's
> > > > periodic timer
> > > > + * tick, take values used there as an indication. In Linux
> > > > 2.6.21, tick
> > > > + * period can be 10ms, 4ms, 3.33ms or 1ms. Let's use 10ms, to
> > > > enable
> > > > + * at least some power saving on the CPU that is going idle.
> > > > + */
> > > > +#define RCU_IDLE_TIMER_PERIOD MILLISECS(10)
> > > 
> > > With you even mentioning that the original Linux code has ways
> > > to use different values, wouldn't it be worth allowing this to be
> > > command line controllable?
> > 
> > Dario is on holiday, and I think it would be good to get this
> > functionality in sooner rather than later to shake out as many bugs
> > as
> > possible.  Would you be willing to let the idle timer period be set
> > with
> > a follow-up patch?
> 
So, I'm back, and can do such a patch.

Do we want to enforce a maximum value, to try to at least avoid severe
injuries, even for users who shoot themselves in the foot? Or do we
just accept anything below STIME_MAX?

I personally would only accept values smaller than 100ms (in fact, I
was keeping it below that level in patch 6, where the heuristic was
implemented) or, if we really want, 1s.

Going above these values is basically equivalent to saying that the bug
this series has fixed was not really a bug! :-/

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 5/6] xen: RCU: avoid busy waiting until the end of grace period.
  2017-09-05 17:13         ` Dario Faggioli
@ 2017-09-06  9:50           ` Jan Beulich
  0 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2017-09-06  9:50 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, George Dunlap, Tim Deegan, Julien Grall, xen-devel

>>> On 05.09.17 at 19:13, <dario.faggioli@citrix.com> wrote:
> On Wed, 2017-08-30 at 01:18 -0600, Jan Beulich wrote:
>> > > > On 29.08.17 at 18:06, <george.dunlap@citrix.com> wrote:
>> > Dario is on holiday, and I think it would be good to get this
>> > functionality in sooner rather than later to shake out as many bugs
>> > as
>> > possible.  Would you be willing to let the idle timer period be set
>> > with
>> > a follow-up patch?
>> 
> So, I'm back, and can do such a patch.
> 
> Do we want to enforce a maximum value, to try to at least avoid severe
> injuries, even for users who shoot themselves in the foot? Or do we
> just accept anything below STIME_MAX?
> 
> I personally would only accept values smaller than 100ms (in fact, I
> was keeping it below that level in patch 6, where the heuristic was
> implemented) or, if we really want, 1s.
> 
> Going above these values is basically equivalent to saying that the bug
> this series has fixed was not really a bug! :-/

I think an upper limit would be good to have.
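Something as simple as the sketch below would do, I suppose (parameter
name and the exact bounds are made up here, just to illustrate the
clamping; whether 100ms or 1s is the right ceiling is the question you
are asking):

    /* Tunable in milliseconds, clamped where it is consumed, so that an
     * absurd value cannot effectively re-introduce the overly long grace
     * periods this series is fixing. */
    static unsigned int __read_mostly idle_timer_period_ms = 10;
    integer_param("rcu-idle-timer-period-ms", idle_timer_period_ms);

    static s_time_t rcu_idle_timer_period(void)
    {
        unsigned int ms = idle_timer_period_ms;

        if ( ms < 1 )
            ms = 1;
        else if ( ms > 100 )
            ms = 100;

        return MILLISECS(ms);
    }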

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 6/6] xen: try to prevent idle timer from firing too often.
  2017-08-29 16:30   ` George Dunlap
@ 2017-09-06 16:51     ` Dario Faggioli
  2017-09-07  9:24       ` George Dunlap
  0 siblings, 1 reply; 21+ messages in thread
From: Dario Faggioli @ 2017-09-06 16:51 UTC (permalink / raw)
  To: George Dunlap, xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 3057 bytes --]

On Tue, 2017-08-29 at 17:30 +0100, George Dunlap wrote:
> On 08/18/2017 07:04 PM, Dario Faggioli wrote:
> > 
> > What we're trying to avoid is for one of those idle CPUs to
> > wake up, only to discover that the grace period is still
> > running, and that it hence could have slept longer
> > (saving more power).
> 
> So I think we're only talking about one or two extra wakeups per cpu
> maximum -- if this even happens at all.
> 
Yep, indeed.

> Wouldn't it be better to first add a performance counter, and check
> to
> see if and how often this situation happens?
> 
The counter is there already. It's rcu_idle_timer ("RCU: idle_timer").
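I.e. (sketching from memory, so the exact lines may differ slightly),
it is the usual PERFCOUNTER() / perfc_incr() pair:

    /* xen/include/xen/perfc_defn.h */
    PERFCOUNTER(rcu_idle_timer,     "RCU: idle_timer")

    /* in the idle timer handler, xen/common/rcupdate.c */
    perfc_incr(rcu_idle_timer);

so a build with performance counters enabled should show, via xenperf,
how many times the timer actually fired.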

> > This patch implements a heuristic aimed at achieving
> > that, at the price of having to call cpumask_weight() on
> > the 'entering idle' path, on CPUs with queued callbacks.
> 
> The heuristic seems a bit strange to me too: why would each cpu
> increase
> the grace period in a linear fashion?  I haven't looked at the grace
> period code, but I would have expected each one to be independent;
> and
> so there'd be a "diminishing returns" effect when adding more cpus.
> 
I like the idea, in general. Let me just double check whether I'm
understanding what you're suggesting properly.

First of all, what do you mean by "adding more cpus"? Adding to what?
The timer is useful when a CPU, which is participating in the grace
period, goes idle, while the grace period is not finished. From that
point on, the number of CPUs participating in _that_ grace period will
monotonically decrease, until it reaches zero. So what does 'adding'
mean in this context?

> If we have to have something like this (which I'm not at all
> convinced
> we do), I would think having a single number which self-adjusted so
> that
> the number of 'misses' was around 1% would be a good balance.  What
> about a heuristic like this:
> 
> 1. If the timer goes off and the grace period isn't over, add 10ms to
> the timer period
> 2. If the timer goes off and the grace period *is* over, subtract
> 100us
> from the timer period
> 
So, let's say we start with a period of 1ms. First time RCU is used,
the timer fires twice: the first time it finds the grace period is
still ongoing --and hence the period becomes 11ms-- while the second
finds it over --so the period is now 10.9ms.

Next time RCU is used, if the timer is necessary, we use 10.9ms.

Am I getting your proposal right?

If yes, do we allow the period to become smaller than the initial value
(1ms, in the example above)? I'd say we'd better not (or that we at
least set a lower bound), or else, given enough occurrences in a row of
the timer firing and finding the grace period already over, the period
can become 0!

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 6/6] xen: try to prevent idle timer from firing too often.
  2017-09-06 16:51     ` Dario Faggioli
@ 2017-09-07  9:24       ` George Dunlap
  0 siblings, 0 replies; 21+ messages in thread
From: George Dunlap @ 2017-09-07  9:24 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich

On 09/06/2017 05:51 PM, Dario Faggioli wrote:
> On Tue, 2017-08-29 at 17:30 +0100, George Dunlap wrote:
>> On 08/18/2017 07:04 PM, Dario Faggioli wrote:
>>>
>>> What we're trying to avoid is for one of those idle CPUs to
>>> wake up, only to discover that the grace period is still
>>> running, and that it hence could have slept longer
>>> (saving more power).
>>
>> So I think we're only talking about one or two extra wakeups per cpu
>> maximum -- if this even happens at all.
>>
> Yep, indeed.
> 
>> Wouldn't it be better to first add a performance counter, and check
>> to
>> see if and how often this situation happens?
>>
> The counter is there already. It's rcu_idle_timer ("RCU: idle_timer").

And do you (or maybe Julien or Stefano) have an idea how often this
happens with a fixed timer like patch 5?

>>> This patch implements a heuristic aimed at achieving
>>> that, at the price of having to call cpumask_weight() on
>>> the 'entering idle' path, on CPUs with queued callbacks.
>>
>> The heuristic seems a bit strange to me too: why would each cpu
>> increase
>> the grace period in a linear fashion?  I haven't looked at the grace
>> period code, but I would have expected each one to be independent;
>> and
>> so there'd be a "diminishing returns" effect when adding more cpus.
>>
> I like the idea, in general. Let me just double check whether I'm
> understanding what you're suggesting properly.
> 
> First of all, what do you mean by "adding more cpus"? Adding to what?
> The timer is useful when a CPU, which is participating in the grace
> period, goes idle, while the grace period is not finished. From that
> point on, the number of CPUs participating in _that_ grace period will
> monotonically decrease, until it reaches zero. So what does 'adding'
> mean in this context?

'Adding' means 'adding more pcpus to the rcu operation'.

The formula in this patch for the wait time was K * N, where N is the
number of cpus involved in the operation.  Thus, the calculated wait
time will scale linearly with the number of cpus.  An RCU operation with
2 pcpus will be 2K, while an operation with 4 cpus will be 4K.  This
implies that the *time you think it will take to complete an RCU
transaction* will scale linearly with the number of cpus -- that on
average, an operation with 4 cpus will take twice as long for all cpus
to complete their grace periods as an operation with 2 cpus, and an
operation with 6 cpus will take three times as long as an operation with
2 cpus.

This seems unlikely to me: it seems to me that, statistically, the
amount of time taken will converge to some limit as the number of pcpus
involved in the operation grows.

>> If we have to have something like this (which I'm not at all
>> convinced
>> we do), I would think having a single number which self-adjusted so
>> that
>> the number of 'misses' was around 1% would be a good balance.  What
>> about a heuristic like this:
>>
>> 1. If the timer goes off and the grace period isn't over, add 10ms to
>> the timer period
>> 2. If the timer goes off and the grace period *is* over, subtract
>> 100us
>> from the timer period
>>
> So, let's say we start with a period of 1ms. First time RCU is used,
> the timer fires twice: the first time it finds the grace period is
> still ongoing --and hence the period becomes 11ms-- while the second
> finds it over --so the period is now 10.9ms.
> 
> Next time RCU is used, if the timer is necessary, we use 10.9ms.
> 
> Am I getting your proposal right?
> 
> If yes, do we allow the period to become smaller than the initial value
> (1ms, in the example above)? I'd say we'd better not (or that we at
> least set a lower bound), or else, given enough occurrences in a row of
> the timer firing and finding the grace period already over, the period
> can become 0!

Yes, just about -- I was thinking of just starting at 10ms rather than
at 1ms, but everything else is pretty accurate.  And yes of course we
probably want an upper and lower bound on the timeout.
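
In code, I'm thinking of something along these lines (step sizes, bounds
and names below are placeholders to illustrate, not a concrete proposal;
serialisation of concurrent updates is also glossed over):

    #define IDLE_TIMER_PERIOD_MAX   MILLISECS(100)
    #define IDLE_TIMER_PERIOD_MIN   MICROSECS(100)
    #define IDLE_TIMER_PERIOD_INCR  MILLISECS(10)
    #define IDLE_TIMER_PERIOD_DECR  MICROSECS(100)

    static s_time_t __read_mostly idle_timer_period = MILLISECS(10);

    /* Called from the timer handler: grow the period quickly when the
     * timer fired while the grace period was still running (a 'miss'),
     * shrink it slowly when it fired after the grace period had already
     * ended (a 'hit'), keeping it within [MIN, MAX] either way. */
    static void rcu_idle_timer_adjust(bool grace_period_over)
    {
        if ( !grace_period_over )
            idle_timer_period = min(idle_timer_period + IDLE_TIMER_PERIOD_INCR,
                                    IDLE_TIMER_PERIOD_MAX);
        else
            idle_timer_period = max(idle_timer_period - IDLE_TIMER_PERIOD_DECR,
                                    IDLE_TIMER_PERIOD_MIN);
    }

With a floor like that in place, a long run of 'hits' can never drive
the period down to zero, which addresses your concern above.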

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2017-09-07  9:24 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-18 18:04 [PATCH v3 0/6] xen: RCU: x86/ARM: Add support of rcu_idle_{enter, exit} Dario Faggioli
2017-08-18 18:04 ` [PATCH v3 1/6] xen: in do_softirq() sample smp_processor_id() once and for all Dario Faggioli
2017-08-18 18:04 ` [PATCH v3 2/6] xen: ARM: suspend the tick (if in use) when going idle Dario Faggioli
2017-08-18 18:04 ` [PATCH v3 3/6] xen: RCU/x86/ARM: discount CPUs that were idle when grace period started Dario Faggioli
2017-08-22 12:59   ` Jan Beulich
2017-08-29 14:58   ` George Dunlap
2017-08-18 18:04 ` [PATCH v3 4/6] xen: RCU: don't let a CPU with a callback go idle Dario Faggioli
2017-08-29 15:01   ` George Dunlap
2017-08-18 18:04 ` [PATCH v3 5/6] xen: RCU: avoid busy waiting until the end of grace period Dario Faggioli
2017-08-22 13:04   ` Jan Beulich
2017-08-29 16:06     ` George Dunlap
2017-08-30  7:18       ` Jan Beulich
2017-08-30 11:16         ` George Dunlap
2017-09-05 17:13         ` Dario Faggioli
2017-09-06  9:50           ` Jan Beulich
2017-08-29 16:03   ` George Dunlap
2017-08-18 18:04 ` [PATCH v3 6/6] xen: try to prevent idle timer from firing too often Dario Faggioli
2017-08-29 16:30   ` George Dunlap
2017-09-06 16:51     ` Dario Faggioli
2017-09-07  9:24       ` George Dunlap
2017-08-19  9:45 ` [PATCH v3 0/6] xen: RCU: x86/ARM: Add support of rcu_idle_{enter, exit} Tim Deegan
