All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] nohz/cpuset: Make a CPU stick with do_timer() duty in the presence of nohz cpusets
@ 2012-06-17  6:35 Hakan Akkan
  2012-11-19 14:46 ` Frederic Weisbecker
  0 siblings, 1 reply; 9+ messages in thread
From: Hakan Akkan @ 2012-06-17  6:35 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: LKML

An adaptive nohz (AHZ) CPU may not do do_timer() for a while
despite being non-idle. When all other CPUs are idle, AHZ
CPUs might be using stale jiffies values. To prevent this
always keep a CPU with ticks if there is one or more AHZ
CPUs.

The patch changes can_stop_{idle,adaptive}_tick functions
and prevents either the last CPU who did the do_timer() duty
or the AHZ CPU itself from stopping its sched timer if there
is one or more AHZ CPUs in the system. This means AHZ CPUs
might keep the ticks running for short periods until a
non-AHZ CPU takes the charge away in
tick_do_timer_check_handler() function. When a non-AHZ CPU
takes the charge, it never gives it away so that AHZ CPUs
can run tickless.

Signed-off-by: Hakan Akkan <hakanakkan@gmail.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/cpuset.h   |    3 ++-
 kernel/cpuset.c          |    5 +++++
 kernel/time/tick-sched.c |   31 ++++++++++++++++++++++++++++++-
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index ccbc2fd..19aa448 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -266,11 +266,12 @@ static inline bool cpuset_adaptive_nohz(void)
 
 extern void cpuset_exit_nohz_interrupt(void *unused);
 extern void cpuset_nohz_flush_cputimes(void);
+extern bool nohz_cpu_exist(void);
 #else
 static inline bool cpuset_cpu_adaptive_nohz(int cpu) { return false; }
 static inline bool cpuset_adaptive_nohz(void) { return false; }
 static inline void cpuset_nohz_flush_cputimes(void) { }
-
+static inline bool nohz_cpu_exist(void) { return false; }
 #endif /* CONFIG_CPUSETS_NO_HZ */
 
 #endif /* _LINUX_CPUSET_H */
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 858217b..ccbaac9 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1231,6 +1231,11 @@ DEFINE_PER_CPU(int, cpu_adaptive_nohz_ref);
 
 static cpumask_t nohz_cpuset_mask;
 
+inline bool nohz_cpu_exist(void)
+{
+	return !cpumask_empty(&nohz_cpuset_mask);
+}
+
 static void flush_cputime_interrupt(void *unused)
 {
 	trace_printk("IPI: flush cputime\n");
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index bdc8aeb..e60d541 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -409,6 +409,25 @@ out:
 	return ret;
 }
 
+static inline bool must_take_timer_duty(int cpu)
+{
+	int handler = tick_do_timer_cpu;
+	bool ret = false;
+	bool tick_needed = nohz_cpu_exist();
+
+	/*
+	 * A CPU will have to take the timer duty if there is an adaptive
+	 * nohz CPU in the system. The last handler == cpu check ensures
+	 * that the last cpu that did the do_timer() sticks with the duty.
+	 * A normal (non nohz) cpu will take the charge from a nohz cpu in
+	 * tick_do_timer_check_handler anyway.
+	 */
+	if (tick_needed && (handler == TICK_DO_TIMER_NONE || handler == cpu))
+		ret = true;
+
+	return ret;
+}
+
 static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 {
 	/*
@@ -421,6 +440,9 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 	if (unlikely(!cpu_online(cpu))) {
 		if (cpu == tick_do_timer_cpu)
 			tick_do_timer_cpu = TICK_DO_TIMER_NONE;
+	} else if (must_take_timer_duty(cpu)) {
+		tick_do_timer_cpu = cpu;
+		return false;
 	}
 
 	if (unlikely(ts->nohz_mode == NOHZ_MODE_INACTIVE))
@@ -512,6 +534,13 @@ void tick_nohz_idle_enter(void)
 #ifdef CONFIG_CPUSETS_NO_HZ
 static bool can_stop_adaptive_tick(void)
 {
+	int cpu = smp_processor_id();
+
+	if (must_take_timer_duty(cpu)) {
+		tick_do_timer_cpu = cpu;
+		return false;
+	}
+
 	if (!sched_can_stop_tick())
 		return false;
 
@@ -519,7 +548,7 @@ static bool can_stop_adaptive_tick(void)
 		return false;
 
 	/* Is there a grace period to complete ? */
-	if (rcu_pending(smp_processor_id()))
+	if (rcu_pending(cpu))
 		return false;
 
 	return true;

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] nohz/cpuset: Make a CPU stick with do_timer() duty in the presence of nohz cpusets
  2012-06-17  6:35 [PATCH] nohz/cpuset: Make a CPU stick with do_timer() duty in the presence of nohz cpusets Hakan Akkan
@ 2012-11-19 14:46 ` Frederic Weisbecker
  2012-11-20  0:27   ` Hakan Akkan
  0 siblings, 1 reply; 9+ messages in thread
From: Frederic Weisbecker @ 2012-11-19 14:46 UTC (permalink / raw)
  To: Hakan Akkan
  Cc: LKML, Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Ingo Molnar

Hi Hakan,

As I start to focus on timekeeing for full dynticks, I'm looking at
your patch. Sorry I haven't yet replied with a serious review until
now. But here is it, finally:

2012/6/17 Hakan Akkan <hakanakkan@gmail.com>:
> An adaptive nohz (AHZ) CPU may not do do_timer() for a while
> despite being non-idle. When all other CPUs are idle, AHZ
> CPUs might be using stale jiffies values. To prevent this
> always keep a CPU with ticks if there is one or more AHZ
> CPUs.
>
> The patch changes can_stop_{idle,adaptive}_tick functions
> and prevents either the last CPU who did the do_timer() duty
> or the AHZ CPU itself from stopping its sched timer if there
> is one or more AHZ CPUs in the system. This means AHZ CPUs
> might keep the ticks running for short periods until a
> non-AHZ CPU takes the charge away in
> tick_do_timer_check_handler() function. When a non-AHZ CPU
> takes the charge, it never gives it away so that AHZ CPUs
> can run tickless.
>
> Signed-off-by: Hakan Akkan <hakanakkan@gmail.com>
> CC: Frederic Weisbecker <fweisbec@gmail.com>
> ---
>  include/linux/cpuset.h   |    3 ++-
>  kernel/cpuset.c          |    5 +++++
>  kernel/time/tick-sched.c |   31 ++++++++++++++++++++++++++++++-
>  3 files changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
> index ccbc2fd..19aa448 100644
> --- a/include/linux/cpuset.h
> +++ b/include/linux/cpuset.h
> @@ -266,11 +266,12 @@ static inline bool cpuset_adaptive_nohz(void)
>
>  extern void cpuset_exit_nohz_interrupt(void *unused);
>  extern void cpuset_nohz_flush_cputimes(void);
> +extern bool nohz_cpu_exist(void);
>  #else
>  static inline bool cpuset_cpu_adaptive_nohz(int cpu) { return false; }
>  static inline bool cpuset_adaptive_nohz(void) { return false; }
>  static inline void cpuset_nohz_flush_cputimes(void) { }
> -
> +static inline bool nohz_cpu_exist(void) { return false; }
>  #endif /* CONFIG_CPUSETS_NO_HZ */
>
>  #endif /* _LINUX_CPUSET_H */
> diff --git a/kernel/cpuset.c b/kernel/cpuset.c
> index 858217b..ccbaac9 100644
> --- a/kernel/cpuset.c
> +++ b/kernel/cpuset.c
> @@ -1231,6 +1231,11 @@ DEFINE_PER_CPU(int, cpu_adaptive_nohz_ref);
>
>  static cpumask_t nohz_cpuset_mask;
>
> +inline bool nohz_cpu_exist(void)
> +{
> +       return !cpumask_empty(&nohz_cpuset_mask);
> +}
> +
>  static void flush_cputime_interrupt(void *unused)
>  {
>         trace_printk("IPI: flush cputime\n");
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index bdc8aeb..e60d541 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -409,6 +409,25 @@ out:
>         return ret;
>  }
>
> +static inline bool must_take_timer_duty(int cpu)
> +{
> +       int handler = tick_do_timer_cpu;
> +       bool ret = false;
> +       bool tick_needed = nohz_cpu_exist();

Note this is racy because this fetches the value of the
nohz_cpuset_mask without locking.
We may "see" there is no nohz cpusets whereas we just set some of them
as nohz and they
could even have shut down their tick already.

> +
> +       /*
> +        * A CPU will have to take the timer duty if there is an adaptive
> +        * nohz CPU in the system. The last handler == cpu check ensures
> +        * that the last cpu that did the do_timer() sticks with the duty.
> +        * A normal (non nohz) cpu will take the charge from a nohz cpu in
> +        * tick_do_timer_check_handler anyway.
> +        */
> +       if (tick_needed && (handler == TICK_DO_TIMER_NONE || handler == cpu))
> +               ret = true;

This check is also racy due to the lack of locking. The previous
handler may have set TICK_DO_TIMER_NONE and gone to sleep. We have no
guarantee that the CPU can see that new value. It could believe there
is still a handler. This needs at least cmpxchg() to make the test and
set atomic.

> +
> +       return ret;
> +}
> +
>  static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
>  {
>         /*
> @@ -421,6 +440,9 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
>         if (unlikely(!cpu_online(cpu))) {
>                 if (cpu == tick_do_timer_cpu)
>                         tick_do_timer_cpu = TICK_DO_TIMER_NONE;
> +       } else if (must_take_timer_duty(cpu)) {
> +               tick_do_timer_cpu = cpu;
> +               return false;
>         }
>
>         if (unlikely(ts->nohz_mode == NOHZ_MODE_INACTIVE))
> @@ -512,6 +534,13 @@ void tick_nohz_idle_enter(void)
>  #ifdef CONFIG_CPUSETS_NO_HZ
>  static bool can_stop_adaptive_tick(void)
>  {
> +       int cpu = smp_processor_id();
> +
> +       if (must_take_timer_duty(cpu)) {
> +               tick_do_timer_cpu = cpu;
> +               return false;

One problem I see here is that you randomize the handler. It could be
an adaptive nohz CPU or an idle CPU. It's a problem if the user wants
CPU isolation.

I suggest to rather define a tunable timekeeping duty CPU affinity in
a cpumask file at /sys/devices/system/cpu/timekeeping and a toggle at
/sys/devices/system/cpu/cpuX/timekeeping (like the online file). This
way the user can decide whether adaptive nohz CPU can handle
timekeeping or this must be forced to other CPUs in order to enforce
isolation.

Concerning implementation details, a way to avoid locking while
ensuring there is always a CPU handling timekeeping is to define some
state machine through tick_do_timer_cpu values like the below
scenario:

* tick_do_timer_cpu = some CPU. In this case the timekeeping has a
running handler, everything is fine

* tick_do_timer_cpu = TICK_DO_TIMER_IDLE. The timekeeping handler
wants to go idle and stop its tick. He's going to keep its tick and
the timekeeping duty until some other CPU sees this value and sets
itself as the new handler. Then the previous handler can finally stop
its tick.

* tick_do_timer_cpu = TICK_DO_TIMER_NONE. There is no handler
currently. The next handler is the next CPU that will restart its
tick. This value is only possible if cpu_online_mask =
cpu_timekeeping_mask and there is no adaptive nohz running.

Does that look ok?

If so, tell me if you want to rework this patch with these
suggestions. Or I can do it, either way.

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nohz/cpuset: Make a CPU stick with do_timer() duty in the presence of nohz cpusets
  2012-11-19 14:46 ` Frederic Weisbecker
@ 2012-11-20  0:27   ` Hakan Akkan
  2012-11-20  1:17     ` Steven Rostedt
  0 siblings, 1 reply; 9+ messages in thread
From: Hakan Akkan @ 2012-11-20  0:27 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Ingo Molnar

On Mon, Nov 19, 2012 at 7:46 AM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
>
> Hi Hakan,
>
> As I start to focus on timekeeing for full dynticks, I'm looking at
> your patch. Sorry I haven't yet replied with a serious review until
> now. But here is it, finally:
>
> 2012/6/17 Hakan Akkan <hakanakkan@gmail.com>:
> > An adaptive nohz (AHZ) CPU may not do do_timer() for a while
> > despite being non-idle. When all other CPUs are idle, AHZ
> > CPUs might be using stale jiffies values. To prevent this
> > always keep a CPU with ticks if there is one or more AHZ
> > CPUs.
> >
> > The patch changes can_stop_{idle,adaptive}_tick functions
> > and prevents either the last CPU who did the do_timer() duty
> > or the AHZ CPU itself from stopping its sched timer if there
> > is one or more AHZ CPUs in the system. This means AHZ CPUs
> > might keep the ticks running for short periods until a
> > non-AHZ CPU takes the charge away in
> > tick_do_timer_check_handler() function. When a non-AHZ CPU
> > takes the charge, it never gives it away so that AHZ CPUs
> > can run tickless.
> >
> > Signed-off-by: Hakan Akkan <hakanakkan@gmail.com>
> > CC: Frederic Weisbecker <fweisbec@gmail.com>
> > ---
> >  include/linux/cpuset.h   |    3 ++-
> >  kernel/cpuset.c          |    5 +++++
> >  kernel/time/tick-sched.c |   31 ++++++++++++++++++++++++++++++-
> >  3 files changed, 37 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
> > index ccbc2fd..19aa448 100644
> > --- a/include/linux/cpuset.h
> > +++ b/include/linux/cpuset.h
> > @@ -266,11 +266,12 @@ static inline bool cpuset_adaptive_nohz(void)
> >
> >  extern void cpuset_exit_nohz_interrupt(void *unused);
> >  extern void cpuset_nohz_flush_cputimes(void);
> > +extern bool nohz_cpu_exist(void);
> >  #else
> >  static inline bool cpuset_cpu_adaptive_nohz(int cpu) { return false; }
> >  static inline bool cpuset_adaptive_nohz(void) { return false; }
> >  static inline void cpuset_nohz_flush_cputimes(void) { }
> > -
> > +static inline bool nohz_cpu_exist(void) { return false; }
> >  #endif /* CONFIG_CPUSETS_NO_HZ */
> >
> >  #endif /* _LINUX_CPUSET_H */
> > diff --git a/kernel/cpuset.c b/kernel/cpuset.c
> > index 858217b..ccbaac9 100644
> > --- a/kernel/cpuset.c
> > +++ b/kernel/cpuset.c
> > @@ -1231,6 +1231,11 @@ DEFINE_PER_CPU(int, cpu_adaptive_nohz_ref);
> >
> >  static cpumask_t nohz_cpuset_mask;
> >
> > +inline bool nohz_cpu_exist(void)
> > +{
> > +       return !cpumask_empty(&nohz_cpuset_mask);
> > +}
> > +
> >  static void flush_cputime_interrupt(void *unused)
> >  {
> >         trace_printk("IPI: flush cputime\n");
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index bdc8aeb..e60d541 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -409,6 +409,25 @@ out:
> >         return ret;
> >  }
> >
> > +static inline bool must_take_timer_duty(int cpu)
> > +{
> > +       int handler = tick_do_timer_cpu;
> > +       bool ret = false;
> > +       bool tick_needed = nohz_cpu_exist();
>
> Note this is racy because this fetches the value of the
> nohz_cpuset_mask without locking.
> We may "see" there is no nohz cpusets whereas we just set some of them
> as nohz and they
> could even have shut down their tick already.

Yes, this is racy indeed. Thanks for pointing out.

>
>
> > +
> > +       /*
> > +        * A CPU will have to take the timer duty if there is an adaptive
> > +        * nohz CPU in the system. The last handler == cpu check ensures
> > +        * that the last cpu that did the do_timer() sticks with the duty.
> > +        * A normal (non nohz) cpu will take the charge from a nohz cpu in
> > +        * tick_do_timer_check_handler anyway.
> > +        */
> > +       if (tick_needed && (handler == TICK_DO_TIMER_NONE || handler == cpu))
> > +               ret = true;
>
> This check is also racy due to the lack of locking. The previous
> handler may have set TICK_DO_TIMER_NONE and gone to sleep. We have no
> guarantee that the CPU can see that new value. It could believe there
> is still a handler. This needs at least cmpxchg() to make the test and
> set atomic.
>
> > +
> > +       return ret;
> > +}
> > +
> >  static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
> >  {
> >         /*
> > @@ -421,6 +440,9 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
> >         if (unlikely(!cpu_online(cpu))) {
> >                 if (cpu == tick_do_timer_cpu)
> >                         tick_do_timer_cpu = TICK_DO_TIMER_NONE;
> > +       } else if (must_take_timer_duty(cpu)) {
> > +               tick_do_timer_cpu = cpu;
> > +               return false;
> >         }
> >
> >         if (unlikely(ts->nohz_mode == NOHZ_MODE_INACTIVE))
> > @@ -512,6 +534,13 @@ void tick_nohz_idle_enter(void)
> >  #ifdef CONFIG_CPUSETS_NO_HZ
> >  static bool can_stop_adaptive_tick(void)
> >  {
> > +       int cpu = smp_processor_id();
> > +
> > +       if (must_take_timer_duty(cpu)) {
> > +               tick_do_timer_cpu = cpu;
> > +               return false;
>
> One problem I see here is that you randomize the handler. It could be
> an adaptive nohz CPU or an idle CPU. It's a problem if the user wants
> CPU isolation.

Yes, an adaptive nohz CPU might be forced to take the duty if nobody else
is doing it. However, as soon as a regular CPU notices this it takes the duty
and never gives it up as long as there are nohz cpusets in the system.

>
> I suggest to rather define a tunable timekeeping duty CPU affinity in
> a cpumask file at /sys/devices/system/cpu/timekeeping and a toggle at
> /sys/devices/system/cpu/cpuX/timekeeping (like the online file). This
> way the user can decide whether adaptive nohz CPU can handle
> timekeeping or this must be forced to other CPUs in order to enforce
> isolation.

Well, users want tickless CPUs because they don't want timekeeping
(or any other kernel activity for that matter) to run in there. So, I believe
having that "timekeeping affinity" stay in the regular CPUs is good enough.
Please let me know how users could utilize these control files to do anything
other than keeping the timekeeping out of adaptive nohz CPUs.

>
> Concerning implementation details, a way to avoid locking while
> ensuring there is always a CPU handling timekeeping is to define some
> state machine through tick_do_timer_cpu values like the below
> scenario:
>
> * tick_do_timer_cpu = some CPU. In this case the timekeeping has a
> running handler, everything is fine
>
> * tick_do_timer_cpu = TICK_DO_TIMER_IDLE. The timekeeping handler
> wants to go idle and stop its tick. He's going to keep its tick and
> the timekeeping duty until some other CPU sees this value and sets
> itself as the new handler. Then the previous handler can finally stop
> its tick.
>
> * tick_do_timer_cpu = TICK_DO_TIMER_NONE. There is no handler
> currently. The next handler is the next CPU that will restart its
> tick. This value is only possible if cpu_online_mask =
> cpu_timekeeping_mask and there is no adaptive nohz running.
>
> Does that look ok?

Something like this together with a mechanism to prevent regular CPUs
stopping the tick for being idle when adaptive nohz CPUs are doing their
things is necessary. I'll rework the patch and resend soon.

>
> If so, tell me if you want to rework this patch with these
> suggestions. Or I can do it, either way.
>
> Thanks.

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nohz/cpuset: Make a CPU stick with do_timer() duty in the presence of nohz cpusets
  2012-11-20  0:27   ` Hakan Akkan
@ 2012-11-20  1:17     ` Steven Rostedt
  2012-11-20  1:52       ` Frederic Weisbecker
  0 siblings, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2012-11-20  1:17 UTC (permalink / raw)
  To: Hakan Akkan
  Cc: Frederic Weisbecker, LKML, Thomas Gleixner, Peter Zijlstra, Ingo Molnar

On Mon, 2012-11-19 at 17:27 -0700, Hakan Akkan wrote:

> >
> > I suggest to rather define a tunable timekeeping duty CPU affinity in
> > a cpumask file at /sys/devices/system/cpu/timekeeping and a toggle at
> > /sys/devices/system/cpu/cpuX/timekeeping (like the online file). This
> > way the user can decide whether adaptive nohz CPU can handle
> > timekeeping or this must be forced to other CPUs in order to enforce
> > isolation.
> 
> Well, users want tickless CPUs because they don't want timekeeping
> (or any other kernel activity for that matter) to run in there. So, I believe
> having that "timekeeping affinity" stay in the regular CPUs is good enough.
> Please let me know how users could utilize these control files to do anything
> other than keeping the timekeeping out of adaptive nohz CPUs.

I agree. If we already have some /sys cpumask that denotes which CPUs
will be adaptive NO_HZ (or simply isolated) then just keep the tick from
ever going on those CPUs. If all but one CPU is set for nohz, and that
one CPU goes idle, it should still be the one doing the tick.

-- Steve



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nohz/cpuset: Make a CPU stick with do_timer() duty in the presence of nohz cpusets
  2012-11-20  1:17     ` Steven Rostedt
@ 2012-11-20  1:52       ` Frederic Weisbecker
  2012-11-28  8:29         ` Hakan Akkan
  0 siblings, 1 reply; 9+ messages in thread
From: Frederic Weisbecker @ 2012-11-20  1:52 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Hakan Akkan, LKML, Thomas Gleixner, Peter Zijlstra, Ingo Molnar

2012/11/20 Steven Rostedt <rostedt@goodmis.org>:
> On Mon, 2012-11-19 at 17:27 -0700, Hakan Akkan wrote:
>
>> >
>> > I suggest to rather define a tunable timekeeping duty CPU affinity in
>> > a cpumask file at /sys/devices/system/cpu/timekeeping and a toggle at
>> > /sys/devices/system/cpu/cpuX/timekeeping (like the online file). This
>> > way the user can decide whether adaptive nohz CPU can handle
>> > timekeeping or this must be forced to other CPUs in order to enforce
>> > isolation.
>>
>> Well, users want tickless CPUs because they don't want timekeeping
>> (or any other kernel activity for that matter) to run in there. So, I believe
>> having that "timekeeping affinity" stay in the regular CPUs is good enough.
>> Please let me know how users could utilize these control files to do anything
>> other than keeping the timekeeping out of adaptive nohz CPUs.
>
> I agree. If we already have some /sys cpumask that denotes which CPUs
> will be adaptive NO_HZ (or simply isolated) then just keep the tick from
> ever going on those CPUs. If all but one CPU is set for nohz, and that
> one CPU goes idle, it should still be the one doing the tick.

If you want isolation on your full dynticks CPU it's right. Now you
could have lower requirements, a different policy that rather enforce
energy saving.

But I realize we can integrate such granularity later if users request
it and take the behaviour you both describe as the default for now. So
let's take that direction.

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] nohz/cpuset: Make a CPU stick with do_timer() duty in the presence of nohz cpusets
  2012-11-20  1:52       ` Frederic Weisbecker
@ 2012-11-28  8:29         ` Hakan Akkan
  2012-11-28 18:03           ` Hakan Akkan
  0 siblings, 1 reply; 9+ messages in thread
From: Hakan Akkan @ 2012-11-28  8:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Hakan Akkan, Frederic Weisbecker, Thomas Gleixner,
	Steven Rostedt, Peter Zijlstra, Ingo Molnar

An adaptive nohz (AHZ) CPU may not do do_timer() for a while
despite being non-idle. When all other CPUs are idle, AHZ
CPUs might be using stale jiffies values. To prevent this
always keep a CPU with ticks if there is one or more AHZ
CPUs.

A new function, check_drop_timer_duty, handles the updates
to tick_do_timer_cpu value and makes sure that the jiffies
update is done when there are non-idle adaptive-nohz CPUs.
Also added is a new field in struct tick_sched to indicate
if CPU is ready to run something other than the idle task
without ticks once it drops the do_timer() duty. This also
facilitates the system-wide tick shut down when all CPUs,
including AHZ CPUs, are idle.

Signed-off-by: Hakan Akkan <hakanakkan@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
---
 include/linux/tick.h     |    2 +
 kernel/time/tick-sched.c |  235 ++++++++++++++++++++++++++++++++++++----------
 2 files changed, 188 insertions(+), 49 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 93add37..0a65dfb 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -54,6 +54,7 @@ enum tick_saved_jiffies {
  * @iowait_sleeptime:		Sum of the time slept in idle with sched tick stopped, with IO outstanding
  * @sleep_length:		Duration of the current idle sleep
  * @do_timer_lst:		CPU was the last one doing do_timer before going idle
+ * @user_nohz:			CPU wants to switch to adaptive nohz mode
  */
 struct tick_sched {
 	struct hrtimer			sched_timer;
@@ -77,6 +78,7 @@ struct tick_sched {
 	unsigned long			next_jiffies;
 	ktime_t				idle_expires;
 	int				do_timer_last;
+	int				user_nohz;
 };
 
 extern void __init tick_init(void);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index bdc8aeb..3ff9dc5 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -172,6 +172,130 @@ update_ts_time_stats(int cpu, struct tick_sched *ts, ktime_t now, u64 *last_upda
 
 }
 
+#ifdef CONFIG_CPUSETS_NO_HZ
+/*
+ * This defines the number of CPUs currently in (or wanting to
+ * be in) adaptive nohz mode. Greater than 0 means at least
+ * one CPU is ready to shut down its tick for non-idle purposes.
+ */
+static atomic_t __read_mostly nr_cpus_user_nohz = ATOMIC_INIT(0);
+
+static inline int update_do_timer_cpu(int current_handler,
+			int new_handler)
+{
+	return cmpxchg(&tick_do_timer_cpu, current_handler, new_handler);
+}
+#else
+static inline int update_do_timer_cpu(int current_handler,
+			int new_handler)
+{
+	int tmp = ACCESS_ONCE(tick_do_timer_cpu);
+	tick_do_timer_cpu = new_handler;
+	return tmp;
+}
+#endif
+
+/*
+ * check_drop_timer_duty: Check if this cpu can shut down
+ * ticks without worrying about who is going to handle
+ * timekeeping. The duty is dropped here as well if possible.
+ * When there are adaptive nohz cpus in the system ready to
+ * run user tasks without ticks, this function makes sure
+ * that timekeeping is handled by a cpu. A non-adaptive-nohz
+ * cpu, if any, will claim the duty as soon as it discovers
+ * that some adaptive-nohz cpu is stuck with it.
+ *
+ * Returns
+ *  0 if cpu has to keep ticks on for timekeeping,
+ *  1 if cpu drops the duty inside this function and can shut
+ *    down its ticks,
+ *  2 if cpu did not have the duty anyway.
+ */
+static int check_drop_timer_duty(int cpu)
+{
+	int curr_handler, prev_handler, new_handler;
+	int nrepeat = -1;
+	bool drop_recheck;
+
+repeat:
+	WARN_ON_ONCE(++nrepeat > 1);
+	drop_recheck = false;
+	curr_handler = cpu;
+	new_handler = TICK_DO_TIMER_NONE;
+
+#ifdef CONFIG_CPUSETS_NO_HZ
+	if (atomic_read(&nr_cpus_user_nohz) > 0) {
+		curr_handler = ACCESS_ONCE(tick_do_timer_cpu);
+		/*
+		 * Keep the duty until someone takes it away.
+		 * FIXME: Make nr_cpus_user_nohz an atomic cpumask
+		 * to find an idle CPU to dump the duty at.
+		 */
+		if (curr_handler == cpu)
+			return 0;
+		/*
+		 * This cpu will try to take the duty if 1) there is
+		 * no handler or 2) current handler seems to be an
+		 * adaptive-nohz cpu. We take the duty from others
+		 * only if the we are idle or not part of an
+		 * adaptive-nohz cpuset.
+		 * Once we take the duty, the check above ensures that
+		 * we stick with it.
+		 */
+		if (unlikely(curr_handler == TICK_DO_TIMER_NONE)
+			|| (per_cpu(tick_cpu_sched, curr_handler).user_nohz
+				&& (is_idle_task(current)
+					|| !cpuset_cpu_adaptive_nohz(cpu))))
+			new_handler = cpu;
+		else
+			/*
+			 * A regular CPU is updating the jiffies and we don't
+			 * have to take it away from her.
+			 */
+			new_handler = curr_handler;
+	} else {
+		/*
+		 * We might miss nr_cpus_user_nohz update and drop the duty
+		 * whereas other CPUs think that we keep handling the
+		 * timekeeping. To prevent this, we recheck its value after
+		 * we update the timer_do_timer_cpu and start over if
+		 * necessary.
+		 */
+		drop_recheck = true;
+	}
+#endif
+
+	prev_handler = update_do_timer_cpu(curr_handler, new_handler);
+
+	if (drop_recheck && atomic_read(&nr_cpus_user_nohz) > 0)
+		goto repeat;
+
+	if (likely(new_handler != TICK_DO_TIMER_NONE)) {
+		if (prev_handler == curr_handler) {
+			if (new_handler == cpu) {
+				/* We claimed the duty. */
+				return 0;
+			} else {
+				/*
+				 * We know that the previous handler
+				 * still has the duty. We can sleep
+				 * as long as we want.
+				 */
+				return 2;
+			}
+		}
+		/*
+		 * Handler was probably changed under us. Whoever has
+		 * the duty might just drop it and we wouldn't know.
+		 * So, let's try again...
+		 */
+		goto repeat;
+	} else {
+		/* We either just dropped the duty or didn't have it. */
+		return prev_handler == curr_handler ? 1 : 2;
+	}
+}
+
 static void tick_nohz_stop_idle(int cpu, ktime_t now)
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
@@ -187,6 +311,14 @@ static ktime_t tick_nohz_start_idle(int cpu, struct tick_sched *ts)
 	ktime_t now = ktime_get();
 
 	ts->idle_entrytime = now;
+
+#ifdef CONFIG_CPUSETS_NO_HZ
+	if (ts->user_nohz) {
+		ts->user_nohz = 0;
+		WARN_ON_ONCE(atomic_add_negative(-1, &nr_cpus_user_nohz));
+	}
+#endif
+
 	ts->idle_active = 1;
 	sched_clock_idle_sleep_event();
 	return now;
@@ -280,6 +412,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 	ktime_t last_update, expires, ret = { .tv64 = 0 };
 	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
 	u64 time_delta;
+	int can_drop_do_timer;
 
 
 	/* Read jiffies and the time when jiffies were updated last */
@@ -308,28 +441,26 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 
 	/* Schedule the tick, if we are at least one jiffie off */
 	if ((long)delta_jiffies >= 1) {
+		/*
+		 * Check if adaptive nohz needs this CPU to take care
+		 * of the jiffies update. We also drop the duty in this
+		 * function if we can.
+		 */
+		can_drop_do_timer = check_drop_timer_duty(cpu);
+		if (!can_drop_do_timer)
+			goto out;
 
 		/*
-		 * If this cpu is the one which updates jiffies, then
-		 * give up the assignment and let it be taken by the
-		 * cpu which runs the tick timer next, which might be
-		 * this cpu as well. If we don't drop this here the
-		 * jiffies might be stale and do_timer() never
-		 * invoked. Keep track of the fact that it was the one
-		 * which had the do_timer() duty last. If this cpu is
-		 * the one which had the do_timer() duty last, we
-		 * limit the sleep time to the timekeeping
-		 * max_deferement value which we retrieved
+		 * If this cpu is the one which had the do_timer()
+		 * duty last, we limit the sleep time to the
+		 * timekeeping max_deferement value which we retrieved
 		 * above. Otherwise we can sleep as long as we want.
 		 */
-		if (cpu == tick_do_timer_cpu) {
-			tick_do_timer_cpu = TICK_DO_TIMER_NONE;
+		if (can_drop_do_timer == 1) {
 			ts->do_timer_last = 1;
-		} else if (tick_do_timer_cpu != TICK_DO_TIMER_NONE) {
+		} else {
 			time_delta = KTIME_MAX;
 			ts->do_timer_last = 0;
-		} else if (!ts->do_timer_last) {
-			time_delta = KTIME_MAX;
 		}
 
 		/*
@@ -419,6 +550,10 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 	 * invoked.
 	 */
 	if (unlikely(!cpu_online(cpu))) {
+		/*
+		 * FIXME: Might need some sort of protection
+		 * against CPU hotunplug for adaptive nohz.
+		 */
 		if (cpu == tick_do_timer_cpu)
 			tick_do_timer_cpu = TICK_DO_TIMER_NONE;
 	}
@@ -510,19 +645,24 @@ void tick_nohz_idle_enter(void)
 }
 
 #ifdef CONFIG_CPUSETS_NO_HZ
-static bool can_stop_adaptive_tick(void)
+static bool can_stop_adaptive_tick(struct tick_sched *ts)
 {
-	if (!sched_can_stop_tick())
-		return false;
-
-	if (posix_cpu_timers_running(current))
-		return false;
-
-	/* Is there a grace period to complete ? */
-	if (rcu_pending(smp_processor_id()))
-		return false;
+	int ret = true;
+
+	if (!sched_can_stop_tick()
+			|| posix_cpu_timers_running(current)
+			|| rcu_pending(smp_processor_id()))
+		ret = false;
+
+	if (ret && !ts->user_nohz) {
+		ts->user_nohz = 1;
+		atomic_inc(&nr_cpus_user_nohz);
+	} else if (!ret && ts->user_nohz) {
+		ts->user_nohz = 0;
+		WARN_ON_ONCE(atomic_add_negative(-1, &nr_cpus_user_nohz));
+	}
 
-	return true;
+	return ret;
 }
 
 static void tick_nohz_cpuset_stop_tick(struct tick_sched *ts)
@@ -541,7 +681,7 @@ static void tick_nohz_cpuset_stop_tick(struct tick_sched *ts)
 	if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
 		return;
 
-	if (!can_stop_adaptive_tick())
+	if (!can_stop_adaptive_tick(ts))
 		return;
 
 	/*
@@ -990,27 +1130,14 @@ void tick_nohz_exit_exception(struct pt_regs *regs)
 		tick_nohz_exit_kernel();
 }
 
-/*
- * Take the timer duty if nobody is taking care of it.
- * If a CPU already does and and it's in a nohz cpuset,
- * then take the charge so that it can switch to nohz mode.
- */
-static void tick_do_timer_check_handler(int cpu)
+static void tick_nohz_restart_adaptive(struct tick_sched *ts)
 {
-	int handler = tick_do_timer_cpu;
+	tick_nohz_flush_current_times(true);
 
-	if (unlikely(handler == TICK_DO_TIMER_NONE)) {
-		tick_do_timer_cpu = cpu;
-	} else {
-		if (!cpuset_adaptive_nohz() &&
-		    cpuset_cpu_adaptive_nohz(handler))
-			tick_do_timer_cpu = cpu;
+	if (ts->user_nohz) {
+		ts->user_nohz = 0;
+		WARN_ON_ONCE(atomic_add_negative(-1, &nr_cpus_user_nohz));
 	}
-}
-
-static void tick_nohz_restart_adaptive(void)
-{
-	tick_nohz_flush_current_times(true);
 	tick_nohz_restart_sched_tick();
 	clear_thread_flag(TIF_NOHZ);
 	trace_printk("clear TIF_NOHZ\n");
@@ -1022,8 +1149,8 @@ void tick_nohz_check_adaptive(void)
 	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
 
 	if (ts->tick_stopped && !is_idle_task(current)) {
-		if (!can_stop_adaptive_tick())
-			tick_nohz_restart_adaptive();
+		if (!can_stop_adaptive_tick(ts))
+			tick_nohz_restart_adaptive(ts);
 	}
 }
 
@@ -1033,7 +1160,7 @@ void cpuset_exit_nohz_interrupt(void *unused)
 
 	trace_printk("IPI: Nohz exit\n");
 	if (ts->tick_stopped && !is_idle_task(current))
-		tick_nohz_restart_adaptive();
+		tick_nohz_restart_adaptive(ts);
 }
 
 /*
@@ -1122,7 +1249,17 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
 	ktime_t now = ktime_get();
 	int cpu = smp_processor_id();
 
-	tick_do_timer_check_handler(cpu);
+#ifdef CONFIG_NO_HZ
+	/*
+	 * Check if the do_timer duty was dropped. We don't care about
+	 * concurrency: This happens only when the cpu in charge went
+	 * into a long sleep. If two cpus happen to assign themself to
+	 * this duty, then the jiffies update is still serialized by
+	 * xtime_lock.
+	 */
+	if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE))
+		tick_do_timer_cpu = cpu;
+#endif
 
 	/* Check, if the jiffies need an update */
 	if (tick_do_timer_cpu == cpu)
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH] nohz/cpuset: Make a CPU stick with do_timer() duty in the presence of nohz cpusets
  2012-11-28  8:29         ` Hakan Akkan
@ 2012-11-28 18:03           ` Hakan Akkan
  2012-11-30 17:38             ` Frederic Weisbecker
  0 siblings, 1 reply; 9+ messages in thread
From: Hakan Akkan @ 2012-11-28 18:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Hakan Akkan, Frederic Weisbecker, Thomas Gleixner,
	Steven Rostedt, Peter Zijlstra, Ingo Molnar

(Please ignore the previous patch as it never really prevented
the last idler from going longer than timekeeping_max_deferement
sleeps.)

An adaptive nohz (AHZ) CPU may not do do_timer() for a while
despite being non-idle. When all other CPUs are idle, AHZ
CPUs might be using stale jiffies values. To prevent this
always keep a CPU with ticks if there is one or more AHZ
CPUs.

A new function, check_drop_timer_duty, handles the updates
to tick_do_timer_cpu value and makes sure that the jiffies
update is done when there are non-idle adaptive-nohz CPUs.
Also added is a new field in struct tick_sched to indicate
if CPU is ready to run something other than the idle task
without ticks once it drops the do_timer() duty. This also
facilitates the system-wide tick shut down when all CPUs,
including AHZ CPUs, are idle.

Signed-off-by: Hakan Akkan <hakanakkan@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
---
 include/linux/tick.h     |    2 +
 kernel/time/tick-sched.c |  219 ++++++++++++++++++++++++++++++++++++----------
 2 files changed, 174 insertions(+), 47 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 93add37..0a65dfb 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -54,6 +54,7 @@ enum tick_saved_jiffies {
  * @iowait_sleeptime:		Sum of the time slept in idle with sched tick stopped, with IO outstanding
  * @sleep_length:		Duration of the current idle sleep
  * @do_timer_lst:		CPU was the last one doing do_timer before going idle
+ * @user_nohz:			CPU wants to switch to adaptive nohz mode
  */
 struct tick_sched {
 	struct hrtimer			sched_timer;
@@ -77,6 +78,7 @@ struct tick_sched {
 	unsigned long			next_jiffies;
 	ktime_t				idle_expires;
 	int				do_timer_last;
+	int				user_nohz;
 };
 
 extern void __init tick_init(void);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index bdc8aeb..f9a85e0 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -172,6 +172,115 @@ update_ts_time_stats(int cpu, struct tick_sched *ts, ktime_t now, u64 *last_upda
 
 }
 
+#ifdef CONFIG_CPUSETS_NO_HZ
+/*
+ * This defines the number of CPUs currently in (or wanting to
+ * be in) adaptive nohz mode. Greater than 0 means at least
+ * one CPU is ready to shut down its tick for non-idle purposes.
+ */
+static atomic_t __read_mostly nr_cpus_user_nohz = ATOMIC_INIT(0);
+
+static inline int update_do_timer_cpu(int current_handler,
+			int new_handler)
+{
+	return cmpxchg(&tick_do_timer_cpu, current_handler, new_handler);
+}
+#else
+static inline int update_do_timer_cpu(int current_handler,
+			int new_handler)
+{
+	int tmp = ACCESS_ONCE(tick_do_timer_cpu);
+	tick_do_timer_cpu = new_handler;
+	return tmp;
+}
+#endif
+
+/*
+ * check_drop_timer_duty: Check if this cpu can shut down
+ * ticks without worrying about who is going to handle
+ * timekeeping. The duty is dropped here as well if possible.
+ * When there are adaptive nohz cpus in the system ready to
+ * run user tasks without ticks, this function makes sure
+ * that timekeeping is handled by a cpu. A non-adaptive-nohz
+ * cpu, if any, will claim the duty as soon as it discovers
+ * that some adaptive-nohz cpu is stuck with it.
+ *
+ * Returns the new value of tick_do_timer_cpu.
+ */
+static int check_drop_timer_duty(int cpu)
+{
+	int curr_handler, prev_handler, new_handler;
+	int nrepeat = -1;
+	bool drop_recheck;
+
+repeat:
+	WARN_ON_ONCE(++nrepeat > 1);
+	drop_recheck = false;
+	curr_handler = cpu;
+	new_handler = TICK_DO_TIMER_NONE;
+
+#ifdef CONFIG_CPUSETS_NO_HZ
+	if (atomic_read(&nr_cpus_user_nohz) > 0) {
+		curr_handler = ACCESS_ONCE(tick_do_timer_cpu);
+		/*
+		 * Keep the duty until someone takes it away.
+		 * FIXME: Make nr_cpus_user_nohz an atomic cpumask
+		 * to find an idle CPU to dump the duty at.
+		 */
+		if (curr_handler == cpu)
+			return cpu;
+		/*
+		 * This cpu will try to take the duty if 1) there is
+		 * no handler or 2) current handler seems to be an
+		 * adaptive-nohz cpu. We take the duty from others
+		 * only if the we are idle or not part of an
+		 * adaptive-nohz cpuset.
+		 * Once we take the duty, the check above ensures that
+		 * we stick with it.
+		 */
+		if (unlikely(curr_handler == TICK_DO_TIMER_NONE)
+			|| (per_cpu(tick_cpu_sched, curr_handler).user_nohz
+				&& (is_idle_task(current)
+					|| !cpuset_cpu_adaptive_nohz(cpu))))
+			new_handler = cpu;
+		else
+			/*
+			 * A regular CPU is updating the jiffies and we don't
+			 * have to take it away from her.
+			 */
+			new_handler = curr_handler;
+	} else {
+		/*
+		 * We might miss nr_cpus_user_nohz update and drop the duty
+		 * whereas other CPUs think that we keep handling the
+		 * timekeeping. To prevent this, we recheck its value after
+		 * we update the timer_do_timer_cpu and start over if
+		 * necessary.
+		 */
+		drop_recheck = true;
+	}
+#endif
+
+	prev_handler = update_do_timer_cpu(curr_handler, new_handler);
+
+	if (drop_recheck && atomic_read(&nr_cpus_user_nohz) > 0)
+		goto repeat;
+
+	if (likely(new_handler != TICK_DO_TIMER_NONE)) {
+		if (prev_handler == curr_handler)
+			return new_handler;
+		/*
+		 * Handler was probably changed under us. Whoever has
+		 * the duty might just drop it and we wouldn't know.
+		 * So, let's try again...
+		 */
+		goto repeat;
+	} else {
+		/* We either just dropped the duty or didn't have it. */
+		return prev_handler == cpu ? TICK_DO_TIMER_NONE : prev_handler;
+	}
+}
+
 static void tick_nohz_stop_idle(int cpu, ktime_t now)
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
@@ -187,6 +296,14 @@ static ktime_t tick_nohz_start_idle(int cpu, struct tick_sched *ts)
 	ktime_t now = ktime_get();
 
 	ts->idle_entrytime = now;
+
+#ifdef CONFIG_CPUSETS_NO_HZ
+	if (ts->user_nohz) {
+		ts->user_nohz = 0;
+		WARN_ON_ONCE(atomic_add_negative(-1, &nr_cpus_user_nohz));
+	}
+#endif
+
 	ts->idle_active = 1;
 	sched_clock_idle_sleep_event();
 	return now;
@@ -280,6 +397,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 	ktime_t last_update, expires, ret = { .tv64 = 0 };
 	struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
 	u64 time_delta;
+	int new_handler, prev_handler;
 
 
 	/* Read jiffies and the time when jiffies were updated last */
@@ -308,24 +426,25 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 
 	/* Schedule the tick, if we are at least one jiffie off */
 	if ((long)delta_jiffies >= 1) {
+		/*
+		 * Check if adaptive nohz needs this CPU to take care
+		 * of the jiffies update. We also drop the duty in this
+		 * function if we can.
+		 */
+		prev_handler = ACCESS_ONCE(tick_do_timer_cpu);
+		new_handler = check_drop_timer_duty(cpu);
+		if (new_handler == cpu)
+			goto out;
 
 		/*
-		 * If this cpu is the one which updates jiffies, then
-		 * give up the assignment and let it be taken by the
-		 * cpu which runs the tick timer next, which might be
-		 * this cpu as well. If we don't drop this here the
-		 * jiffies might be stale and do_timer() never
-		 * invoked. Keep track of the fact that it was the one
-		 * which had the do_timer() duty last. If this cpu is
-		 * the one which had the do_timer() duty last, we
-		 * limit the sleep time to the timekeeping
-		 * max_deferement value which we retrieved
+		 * If this cpu is the one which had the do_timer()
+		 * duty last, we limit the sleep time to the
+		 * timekeeping max_deferement value which we retrieved
 		 * above. Otherwise we can sleep as long as we want.
 		 */
-		if (cpu == tick_do_timer_cpu) {
-			tick_do_timer_cpu = TICK_DO_TIMER_NONE;
+		if (prev_handler == cpu) {
 			ts->do_timer_last = 1;
-		} else if (tick_do_timer_cpu != TICK_DO_TIMER_NONE) {
+		} else if (new_handler != TICK_DO_TIMER_NONE) {
 			time_delta = KTIME_MAX;
 			ts->do_timer_last = 0;
 		} else if (!ts->do_timer_last) {
@@ -419,6 +538,10 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 	 * invoked.
 	 */
 	if (unlikely(!cpu_online(cpu))) {
+		/*
+		 * FIXME: Might need some sort of protection
+		 * against CPU hotunplug for adaptive nohz.
+		 */
 		if (cpu == tick_do_timer_cpu)
 			tick_do_timer_cpu = TICK_DO_TIMER_NONE;
 	}
@@ -510,19 +633,24 @@ void tick_nohz_idle_enter(void)
 }
 
 #ifdef CONFIG_CPUSETS_NO_HZ
-static bool can_stop_adaptive_tick(void)
+static bool can_stop_adaptive_tick(struct tick_sched *ts)
 {
-	if (!sched_can_stop_tick())
-		return false;
-
-	if (posix_cpu_timers_running(current))
-		return false;
-
-	/* Is there a grace period to complete ? */
-	if (rcu_pending(smp_processor_id()))
-		return false;
+	int ret = true;
+
+	if (!sched_can_stop_tick()
+		|| posix_cpu_timers_running(current)
+		|| rcu_pending(smp_processor_id()))
+		ret = false;
+
+	if (ret && !ts->user_nohz) {
+		ts->user_nohz = 1;
+		atomic_inc(&nr_cpus_user_nohz);
+	} else if (!ret && ts->user_nohz) {
+		ts->user_nohz = 0;
+		WARN_ON_ONCE(atomic_add_negative(-1, &nr_cpus_user_nohz));
+	}
 
-	return true;
+	return ret;
 }
 
 static void tick_nohz_cpuset_stop_tick(struct tick_sched *ts)
@@ -541,7 +669,7 @@ static void tick_nohz_cpuset_stop_tick(struct tick_sched *ts)
 	if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
 		return;
 
-	if (!can_stop_adaptive_tick())
+	if (!can_stop_adaptive_tick(ts))
 		return;
 
 	/*
@@ -990,27 +1118,14 @@ void tick_nohz_exit_exception(struct pt_regs *regs)
 		tick_nohz_exit_kernel();
 }
 
-/*
- * Take the timer duty if nobody is taking care of it.
- * If a CPU already does and and it's in a nohz cpuset,
- * then take the charge so that it can switch to nohz mode.
- */
-static void tick_do_timer_check_handler(int cpu)
+static void tick_nohz_restart_adaptive(struct tick_sched *ts)
 {
-	int handler = tick_do_timer_cpu;
+	tick_nohz_flush_current_times(true);
 
-	if (unlikely(handler == TICK_DO_TIMER_NONE)) {
-		tick_do_timer_cpu = cpu;
-	} else {
-		if (!cpuset_adaptive_nohz() &&
-		    cpuset_cpu_adaptive_nohz(handler))
-			tick_do_timer_cpu = cpu;
+	if (ts->user_nohz) {
+		ts->user_nohz = 0;
+		WARN_ON_ONCE(atomic_add_negative(-1, &nr_cpus_user_nohz));
 	}
-}
-
-static void tick_nohz_restart_adaptive(void)
-{
-	tick_nohz_flush_current_times(true);
 	tick_nohz_restart_sched_tick();
 	clear_thread_flag(TIF_NOHZ);
 	trace_printk("clear TIF_NOHZ\n");
@@ -1022,8 +1137,8 @@ void tick_nohz_check_adaptive(void)
 	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
 
 	if (ts->tick_stopped && !is_idle_task(current)) {
-		if (!can_stop_adaptive_tick())
-			tick_nohz_restart_adaptive();
+		if (!can_stop_adaptive_tick(ts))
+			tick_nohz_restart_adaptive(ts);
 	}
 }
 
@@ -1033,7 +1148,7 @@ void cpuset_exit_nohz_interrupt(void *unused)
 
 	trace_printk("IPI: Nohz exit\n");
 	if (ts->tick_stopped && !is_idle_task(current))
-		tick_nohz_restart_adaptive();
+		tick_nohz_restart_adaptive(ts);
 }
 
 /*
@@ -1122,7 +1237,17 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
 	ktime_t now = ktime_get();
 	int cpu = smp_processor_id();
 
-	tick_do_timer_check_handler(cpu);
+#ifdef CONFIG_NO_HZ
+	/*
+	 * Check if the do_timer duty was dropped. We don't care about
+	 * concurrency: This happens only when the cpu in charge went
+	 * into a long sleep. If two cpus happen to assign themself to
+	 * this duty, then the jiffies update is still serialized by
+	 * xtime_lock.
+	 */
+	if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE))
+		tick_do_timer_cpu = cpu;
+#endif
 
 	/* Check, if the jiffies need an update */
 	if (tick_do_timer_cpu == cpu)
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] nohz/cpuset: Make a CPU stick with do_timer() duty in the presence of nohz cpusets
  2012-11-28 18:03           ` Hakan Akkan
@ 2012-11-30 17:38             ` Frederic Weisbecker
  2012-11-30 19:41               ` Hakan Akkan
  0 siblings, 1 reply; 9+ messages in thread
From: Frederic Weisbecker @ 2012-11-30 17:38 UTC (permalink / raw)
  To: Hakan Akkan
  Cc: linux-kernel, Thomas Gleixner, Steven Rostedt, Peter Zijlstra,
	Ingo Molnar

2012/11/28 Hakan Akkan <hakanakkan@gmail.com>:
> +static int check_drop_timer_duty(int cpu)
> +{
> +       int curr_handler, prev_handler, new_handler;
> +       int nrepeat = -1;
> +       bool drop_recheck;
> +
> +repeat:
> +       WARN_ON_ONCE(++nrepeat > 1);
> +       drop_recheck = false;
> +       curr_handler = cpu;
> +       new_handler = TICK_DO_TIMER_NONE;
> +
> +#ifdef CONFIG_CPUSETS_NO_HZ
> +       if (atomic_read(&nr_cpus_user_nohz) > 0) {

Note atomic_read() is not SMP ordered. If another CPU does
atomic_add() or atomic_add_return(), you may not see the new value as
expected with atomic_read(). Doing atomic_add_return(0, &atomic_thing)
returns the fully ordered result.

You also need to do that to ensure full ordering against
tick_cpu_sched.user_nohz.

On the current layout we have:

(Write side)                                      (Read side)

ts->user_nohz = 1;
atomic_inc(&nr_cpus_user_nohz)

                                                      if
(atomic_read(&nr_cpus_user_nohz))
                                                            if
(per_cpu(tick_cpu_sched, curr_handler).user_nohz)
                                                                ....

If you want to make sure that you see the expected value on user_nohz
from the read side, you need to correctly order the write and read
against nr_cpus_user_nohz.

For this you can use atomic_add_return() which implies the full barrier:


(Write side)                                      (Read side)

ts->user_nohz = 1;
atomic_inc_return(&nr_cpus_user_nohz)

                                                      if
(atomic_add_return(0, &nr_cpus_user_nohz))
                                                            if
(per_cpu(tick_cpu_sched, curr_handler).user_nohz)
                                                                ....

I have much more comments on this patch, I will come back on this soon.

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nohz/cpuset: Make a CPU stick with do_timer() duty in the presence of nohz cpusets
  2012-11-30 17:38             ` Frederic Weisbecker
@ 2012-11-30 19:41               ` Hakan Akkan
  0 siblings, 0 replies; 9+ messages in thread
From: Hakan Akkan @ 2012-11-30 19:41 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Ingo Molnar

On Fri, Nov 30, 2012 at 10:38 AM, Frederic Weisbecker
<fweisbec@gmail.com> wrote:
> 2012/11/28 Hakan Akkan <hakanakkan@gmail.com>:
>> +static int check_drop_timer_duty(int cpu)
>> +{
>> +       int curr_handler, prev_handler, new_handler;
>> +       int nrepeat = -1;
>> +       bool drop_recheck;
>> +
>> +repeat:
>> +       WARN_ON_ONCE(++nrepeat > 1);
>> +       drop_recheck = false;
>> +       curr_handler = cpu;
>> +       new_handler = TICK_DO_TIMER_NONE;
>> +
>> +#ifdef CONFIG_CPUSETS_NO_HZ
>> +       if (atomic_read(&nr_cpus_user_nohz) > 0) {
>
> Note atomic_read() is not SMP ordered. If another CPU does
> atomic_add() or atomic_add_return(), you may not see the new value as
> expected with atomic_read(). Doing atomic_add_return(0, &atomic_thing)
> returns the fully ordered result.

That code doesn't necessarily try to achieve a prefect ordering. I wasn't very
sure if we want the overhead of atomic_add_return(0, ...) in such a hot path
so this part of the function is racy. Nevertheless, it makes sure that someone
gets the duty.

#ifdef CONFIG_CPUSETS_NO_HZ
       if (atomic_read(&nr_cpus_user_nohz) > 0) {

The CPU who updated nr_cpus_user_nohz will certainly see the new
value and claim the duty if someone else hasn't yet. Another regular
CPU will eventually see the new value and assign itself to the duty.
So we can tolerate some race here for the sake of avoiding heavier locks.

>
> You also need to do that to ensure full ordering against
> tick_cpu_sched.user_nohz.
>
> On the current layout we have:
>
> (Write side)                                      (Read side)
>
> ts->user_nohz = 1;
> atomic_inc(&nr_cpus_user_nohz)
>
>                                                       if
> (atomic_read(&nr_cpus_user_nohz))
>                                                             if
> (per_cpu(tick_cpu_sched, curr_handler).user_nohz)
>                                                                 ....
>
> If you want to make sure that you see the expected value on user_nohz
> from the read side, you need to correctly order the write and read
> against nr_cpus_user_nohz.

Again, perfect ordering is not critical here. In the worst case we miss that
CPUs ts->user_nohz = 1 update and that adaptive nohz CPU will keep
the duty for a short while until someone else takes it. Trying to avoid this
would require more locking instructions like you suggest below.

If we really want to fully isolate nohz CPUs and avoid "being stuck with
jiffies update until someone else takes it away" situations, we can have a
cpumask denoting idle and/or non-adaptive-nohz CPUs and dump the duty
once we discover that we're "stuck".

>
> For this you can use atomic_add_return() which implies the full barrier:
>
>
> (Write side)                                      (Read side)
>
> ts->user_nohz = 1;
> atomic_inc_return(&nr_cpus_user_nohz)
>
>                                                       if
> (atomic_add_return(0, &nr_cpus_user_nohz))
>                                                             if
> (per_cpu(tick_cpu_sched, curr_handler).user_nohz)
>                                                                 ....
>
> I have much more comments on this patch, I will come back on this soon.
>
> Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2012-11-30 19:42 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-17  6:35 [PATCH] nohz/cpuset: Make a CPU stick with do_timer() duty in the presence of nohz cpusets Hakan Akkan
2012-11-19 14:46 ` Frederic Weisbecker
2012-11-20  0:27   ` Hakan Akkan
2012-11-20  1:17     ` Steven Rostedt
2012-11-20  1:52       ` Frederic Weisbecker
2012-11-28  8:29         ` Hakan Akkan
2012-11-28 18:03           ` Hakan Akkan
2012-11-30 17:38             ` Frederic Weisbecker
2012-11-30 19:41               ` Hakan Akkan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.