linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [patch 0/7] tick: Optimize broadcast handling and prevent expiry ping pong
@ 2013-03-06 11:18 Thomas Gleixner
  2013-03-06 11:18 ` [patch 1/7] tick: Call tick_init late Thomas Gleixner
                   ` (7 more replies)
  0 siblings, 8 replies; 12+ messages in thread
From: Thomas Gleixner @ 2013-03-06 11:18 UTC (permalink / raw)
  To: linux-arm-kernel

Jason decoded a problem related to the broadcast timer mode. The
reprogramming of the cpu local timer causes a huge number of
retries. Also there is a situation where the CPU which does not handle
the broadcast timer interrupt exits and reenters broadcast mode before
the broadcast interrupt got handled by another CPU. This can lead to
an interesting ping pong of the broadcast and the cpu local timer
code.

This series addresses these problems. The first two patches convert
the broadcast code to proper cpumask_var_t instead of adding more
bitmaps later.

The rest of the series is adopted from the quick patches which I
posted earlier while discussing the issue with Jason et. al.

Please give it a proper testing on your affected hardware.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [patch 1/7] tick: Call tick_init late
  2013-03-06 11:18 [patch 0/7] tick: Optimize broadcast handling and prevent expiry ping pong Thomas Gleixner
@ 2013-03-06 11:18 ` Thomas Gleixner
  2013-03-06 11:18 ` [patch 2/7] tick: Convert broadcast cpu bitmaps to cpumask_var_t Thomas Gleixner
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2013-03-06 11:18 UTC (permalink / raw)
  To: linux-arm-kernel

An embedded and charset-unspecified text was scrubbed...
Name: tick-call-init-late.patch
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20130306/e6a9768c/attachment.ksh>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [patch 2/7] tick: Convert broadcast cpu bitmaps to cpumask_var_t
  2013-03-06 11:18 [patch 0/7] tick: Optimize broadcast handling and prevent expiry ping pong Thomas Gleixner
  2013-03-06 11:18 ` [patch 1/7] tick: Call tick_init late Thomas Gleixner
@ 2013-03-06 11:18 ` Thomas Gleixner
  2013-03-07  5:51   ` Rusty Russell
  2013-03-06 11:18 ` [patch 3/7] tick: Avoid programming the local cpu timer if broadcast pending Thomas Gleixner
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2013-03-06 11:18 UTC (permalink / raw)
  To: linux-arm-kernel

An embedded and charset-unspecified text was scrubbed...
Name: tick-broadcast-convert-to-cpumask-t.patch
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20130306/b41f91c1/attachment.ksh>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [patch 3/7] tick: Avoid programming the local cpu timer if broadcast pending
  2013-03-06 11:18 [patch 0/7] tick: Optimize broadcast handling and prevent expiry ping pong Thomas Gleixner
  2013-03-06 11:18 ` [patch 1/7] tick: Call tick_init late Thomas Gleixner
  2013-03-06 11:18 ` [patch 2/7] tick: Convert broadcast cpu bitmaps to cpumask_var_t Thomas Gleixner
@ 2013-03-06 11:18 ` Thomas Gleixner
  2013-03-06 11:18 ` [patch 4/7] tick: Handle broadcast wakeup of multiple cpus Thomas Gleixner
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2013-03-06 11:18 UTC (permalink / raw)
  To: linux-arm-kernel

An embedded and charset-unspecified text was scrubbed...
Name: tick-avoid-programming-local-cpu-timer-if-broadcast-pending.patch
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20130306/52b75b8a/attachment.ksh>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [patch 4/7] tick: Handle broadcast wakeup of multiple cpus
  2013-03-06 11:18 [patch 0/7] tick: Optimize broadcast handling and prevent expiry ping pong Thomas Gleixner
                   ` (2 preceding siblings ...)
  2013-03-06 11:18 ` [patch 3/7] tick: Avoid programming the local cpu timer if broadcast pending Thomas Gleixner
@ 2013-03-06 11:18 ` Thomas Gleixner
  2013-03-13 11:36   ` Lorenzo Pieralisi
  2013-03-06 11:18 ` [patch 5/7] tick: Provide a check for a forced broadcast pending Thomas Gleixner
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2013-03-06 11:18 UTC (permalink / raw)
  To: linux-arm-kernel

An embedded and charset-unspecified text was scrubbed...
Name: tick-broadcast-handle-mass-wakeup.patch
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20130306/7e87bc1c/attachment.ksh>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [patch 5/7] tick: Provide a check for a forced broadcast pending
  2013-03-06 11:18 [patch 0/7] tick: Optimize broadcast handling and prevent expiry ping pong Thomas Gleixner
                   ` (3 preceding siblings ...)
  2013-03-06 11:18 ` [patch 4/7] tick: Handle broadcast wakeup of multiple cpus Thomas Gleixner
@ 2013-03-06 11:18 ` Thomas Gleixner
  2013-03-06 11:18 ` [patch 6/7] arm: Use tick broadcast expired check Thomas Gleixner
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2013-03-06 11:18 UTC (permalink / raw)
  To: linux-arm-kernel

An embedded and charset-unspecified text was scrubbed...
Name: tick-provide-check-for-forced-broadcast-pending.patch
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20130306/8859fa0e/attachment.ksh>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [patch 6/7] arm: Use tick broadcast expired check
  2013-03-06 11:18 [patch 0/7] tick: Optimize broadcast handling and prevent expiry ping pong Thomas Gleixner
                   ` (4 preceding siblings ...)
  2013-03-06 11:18 ` [patch 5/7] tick: Provide a check for a forced broadcast pending Thomas Gleixner
@ 2013-03-06 11:18 ` Thomas Gleixner
  2013-03-06 11:18 ` [patch 7/7] x86: " Thomas Gleixner
  2013-03-13  9:35 ` [patch 0/7] tick: Optimize broadcast handling and prevent expiry ping pong Santosh Shilimkar
  7 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2013-03-06 11:18 UTC (permalink / raw)
  To: linux-arm-kernel

An embedded and charset-unspecified text was scrubbed...
Name: arm-use-broadcast-expired-check.patch
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20130306/53dd5770/attachment.ksh>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [patch 7/7] x86: Use tick broadcast expired check
  2013-03-06 11:18 [patch 0/7] tick: Optimize broadcast handling and prevent expiry ping pong Thomas Gleixner
                   ` (5 preceding siblings ...)
  2013-03-06 11:18 ` [patch 6/7] arm: Use tick broadcast expired check Thomas Gleixner
@ 2013-03-06 11:18 ` Thomas Gleixner
  2013-03-13  9:35 ` [patch 0/7] tick: Optimize broadcast handling and prevent expiry ping pong Santosh Shilimkar
  7 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2013-03-06 11:18 UTC (permalink / raw)
  To: linux-arm-kernel

An embedded and charset-unspecified text was scrubbed...
Name: x86-use-broadcast-expired-check.patch
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20130306/9e773a0f/attachment.ksh>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [patch 2/7] tick: Convert broadcast cpu bitmaps to cpumask_var_t
  2013-03-06 11:18 ` [patch 2/7] tick: Convert broadcast cpu bitmaps to cpumask_var_t Thomas Gleixner
@ 2013-03-07  5:51   ` Rusty Russell
  0 siblings, 0 replies; 12+ messages in thread
From: Rusty Russell @ 2013-03-07  5:51 UTC (permalink / raw)
  To: linux-arm-kernel

Thomas Gleixner <tglx@linutronix.de> writes:
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> ---
>  kernel/time/tick-broadcast.c |   86 +++++++++++++++++++++----------------------
>  kernel/time/tick-common.c    |    1 
>  kernel/time/tick-internal.h  |    3 +
>  3 files changed, 46 insertions(+), 44 deletions(-)
>
> Index: tip/kernel/time/tick-broadcast.c
> ===================================================================
> --- tip.orig/kernel/time/tick-broadcast.c
> +++ tip/kernel/time/tick-broadcast.c
> @@ -28,9 +28,8 @@
>   */
>  
>  static struct tick_device tick_broadcast_device;
> -/* FIXME: Use cpumask_var_t. */
> -static DECLARE_BITMAP(tick_broadcast_mask, NR_CPUS);
> -static DECLARE_BITMAP(tmpmask, NR_CPUS);
> +static cpumask_var_t tick_broadcast_mask;
> +static cpumask_var_t tmpmask;
>  static DEFINE_RAW_SPINLOCK(tick_broadcast_lock);
>  static int tick_broadcast_force;

Thanks, this is a nice cleanup.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [patch 0/7] tick: Optimize broadcast handling and prevent expiry ping pong
  2013-03-06 11:18 [patch 0/7] tick: Optimize broadcast handling and prevent expiry ping pong Thomas Gleixner
                   ` (6 preceding siblings ...)
  2013-03-06 11:18 ` [patch 7/7] x86: " Thomas Gleixner
@ 2013-03-13  9:35 ` Santosh Shilimkar
  7 siblings, 0 replies; 12+ messages in thread
From: Santosh Shilimkar @ 2013-03-13  9:35 UTC (permalink / raw)
  To: linux-arm-kernel

Thomas,

On Wednesday 06 March 2013 04:48 PM, Thomas Gleixner wrote:
> Jason decoded a problem related to the broadcast timer mode. The
> reprogramming of the cpu local timer causes a huge number of
> retries. Also there is a situation where the CPU which does not handle
> the broadcast timer interrupt exits and reenters broadcast mode before
> the broadcast interrupt got handled by another CPU. This can lead to
> an interesting ping pong of the broadcast and the cpu local timer
> code.
> 
> This series addresses these problems. The first two patches convert
> the broadcast code to proper cpumask_var_t instead of adding more
> bitmaps later.
> 
> The rest of the series is adopted from the quick patches which I
> posted earlier while discussing the issue with Jason et. al.
> 
> Please give it a proper testing on your affected hardware.
> 
I have tested this revised patches on OMAP4 and OMAP5 platforms
with CPUidle enabled against 3.9-rc2. As expected they seems to
work without any issue and also fixes the reported retry issue.

Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [patch 4/7] tick: Handle broadcast wakeup of multiple cpus
  2013-03-06 11:18 ` [patch 4/7] tick: Handle broadcast wakeup of multiple cpus Thomas Gleixner
@ 2013-03-13 11:36   ` Lorenzo Pieralisi
  2013-03-13 21:42     ` Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: Lorenzo Pieralisi @ 2013-03-13 11:36 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas,

On Wed, Mar 06, 2013 at 11:18:35AM +0000, Thomas Gleixner wrote:
> Some brilliant hardware implementations wake multiple cores when the
> broadcast timer fires. This leads to the following interesting
> problem:
> 
> CPU0				CPU1
> wakeup from idle		wakeup from idle
> 
> leave broadcast mode		leave broadcast mode
>  restart per cpu timer		 restart per cpu timer
>  	     	 		go back to idle
> handle broadcast
>  (empty mask)			
> 				enter broadcast mode
> 				programm broadcast device
> enter broadcast mode
> programm broadcast device
> 
> So what happens is that due to the forced reprogramming of the cpu
> local timer, we need to set a event in the future. Now if we manage to
> go back to idle before the timer fires, we switch off the timer and
> arm the broadcast device with an already expired time (covered by
> forced mode). So in the worst case we repeat the above ping pong
> forever.
> 					
> Unfortunately we have no information about what caused the wakeup, but
> we can check current time against the expiry time of the local cpu. If
> the local event is already in the past, we know that the broadcast
> timer is about to fire and send an IPI. So we mark ourself as an IPI
> target even if we left broadcast mode and avoid the reprogramming of
> the local cpu timer.
> 
> This still leaves the possibility that a CPU which is not handling the
> broadcast interrupt is going to reach idle again before the IPI
> arrives. This can't be solved in the core code and will be handled in
> follow up patches.
> 
> Reported-by: Jason Liu <liu.h.jason@gmail.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  kernel/time/tick-broadcast.c |   59 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 58 insertions(+), 1 deletion(-)
> 
> Index: tip/kernel/time/tick-broadcast.c
> ===================================================================
> --- tip.orig/kernel/time/tick-broadcast.c
> +++ tip/kernel/time/tick-broadcast.c
> @@ -393,6 +393,7 @@ int tick_resume_broadcast(void)
>  
>  static cpumask_var_t tick_broadcast_oneshot_mask;
>  static cpumask_var_t tick_broadcast_pending_mask;
> +static cpumask_var_t tick_broadcast_force_mask;
>  
>  /*
>   * Exposed for debugging: see timer_list.c
> @@ -462,6 +463,10 @@ again:
>  		}
>  	}
>  
> +	/* Take care of enforced broadcast requests */
> +	cpumask_or(tmpmask, tmpmask, tick_broadcast_force_mask);
> +	cpumask_clear(tick_broadcast_force_mask);

I tested the set and it works fine on a dual cluster big.LITTLE testchip
using broadcast timer to manage deep idle cluster states.

Just asking a question: the force mask is cleared before sending the
timer IPI. Would not be better to clear it after the IPI is sent in

tick_do_broadcast(...) ?

Can you spot a regression if we do this ? The idle thread checks that
mask with irqs disabled, so it is possible that we clear the mask before
the CPU has a chance to get the IPI. If we clear the mask after sending
the IPI, we are increasing the chances for the idle thread to get it.

It is just a further optimization, just asking, thanks.

> +
>  	/*
>  	 * Wakeup the cpus which have an expired event.
>  	 */
> @@ -497,6 +502,7 @@ void tick_broadcast_oneshot_control(unsi
>  	struct clock_event_device *bc, *dev;
>  	struct tick_device *td;
>  	unsigned long flags;
> +	ktime_t now;
>  	int cpu;
>  
>  	/*
> @@ -524,7 +530,16 @@ void tick_broadcast_oneshot_control(unsi
>  		WARN_ON_ONCE(cpumask_test_cpu(cpu, tick_broadcast_pending_mask));
>  		if (!cpumask_test_and_set_cpu(cpu, tick_broadcast_oneshot_mask)) {
>  			clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
> -			if (dev->next_event.tv64 < bc->next_event.tv64)
> +			/*
> +			 * We only reprogram the broadcast timer if we
> +			 * did not mark ourself in the force mask and
> +			 * if the cpu local event is earlier than the
> +			 * broadcast event. If the current CPU is in
> +			 * the force mask, then we are going to be
> +			 * woken by the IPI right away.
> +			 */
> +			if (!cpumask_test_cpu(cpu, tick_broadcast_force_mask) &&
Is the test against tick_broadcast_force_mask necessary if we add the check
in the idle thread before entering idle ? It does not hurt, agreed, and we'd
better leave it there, it is just for my own understanding, thanks a lot.

Having said that, on the series:

Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

> +			    dev->next_event.tv64 < bc->next_event.tv64)
>  				tick_broadcast_set_event(dev->next_event, 1);
>  		}
>  	} else {
> @@ -545,6 +560,47 @@ void tick_broadcast_oneshot_control(unsi
>  				       tick_broadcast_pending_mask))
>  				goto out;
>  
> +			/*
> +			 * If the pending bit is not set, then we are
> +			 * either the CPU handling the broadcast
> +			 * interrupt or we got woken by something else.
> +			 *
> +			 * We are not longer in the broadcast mask, so
> +			 * if the cpu local expiry time is already
> +			 * reached, we would reprogram the cpu local
> +			 * timer with an already expired event.
> +			 *
> +			 * This can lead to a ping-pong when we return
> +			 * to idle and therefor rearm the broadcast
> +			 * timer before the cpu local timer was able
> +			 * to fire. This happens because the forced
> +			 * reprogramming makes sure that the event
> +			 * will happen in the future and depending on
> +			 * the min_delta setting this might be far
> +			 * enough out that the ping-pong starts.
> +			 *
> +			 * If the cpu local next_event has expired
> +			 * then we know that the broadcast timer
> +			 * next_event has expired as well and
> +			 * broadcast is about to be handled. So we
> +			 * avoid reprogramming and enforce that the
> +			 * broadcast handler, which did not run yet,
> +			 * will invoke the cpu local handler.
> +			 *
> +			 * We cannot call the handler directly from
> +			 * here, because we might be in a NOHZ phase
> +			 * and we did not go through the irq_enter()
> +			 * nohz fixups.
> +			 */
> +			now = ktime_get();
> +			if (dev->next_event.tv64 <= now.tv64) {
> +				cpumask_set_cpu(cpu, tick_broadcast_force_mask);
> +				goto out;
> +			}
> +			/*
> +			 * We got woken by something else. Reprogram
> +			 * the cpu local timer device.
> +			 */
>  			tick_program_event(dev->next_event, 1);
>  		}
>  	}
> @@ -686,5 +742,6 @@ void tick_broadcast_init(void)
>  #ifdef CONFIG_TICK_ONESHOT
>  	alloc_cpumask_var(&tick_broadcast_oneshot_mask, GFP_NOWAIT);
>  	alloc_cpumask_var(&tick_broadcast_pending_mask, GFP_NOWAIT);
> +	alloc_cpumask_var(&tick_broadcast_force_mask, GFP_NOWAIT);
>  #endif
>  }
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [patch 4/7] tick: Handle broadcast wakeup of multiple cpus
  2013-03-13 11:36   ` Lorenzo Pieralisi
@ 2013-03-13 21:42     ` Thomas Gleixner
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2013-03-13 21:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 13 Mar 2013, Lorenzo Pieralisi wrote:
> > +	/* Take care of enforced broadcast requests */
> > +	cpumask_or(tmpmask, tmpmask, tick_broadcast_force_mask);
> > +	cpumask_clear(tick_broadcast_force_mask);
> 
> I tested the set and it works fine on a dual cluster big.LITTLE testchip
> using broadcast timer to manage deep idle cluster states.
> 
> Just asking a question: the force mask is cleared before sending the
> timer IPI. Would not be better to clear it after the IPI is sent in
> 
> tick_do_broadcast(...) ?
> 
> Can you spot a regression if we do this ? The idle thread checks that
> mask with irqs disabled, so it is possible that we clear the mask before
> the CPU has a chance to get the IPI. If we clear the mask after sending
> the IPI, we are increasing the chances for the idle thread to get it.
> 
> It is just a further optimization, just asking, thanks.

Need to think about that.
 
> > @@ -524,7 +530,16 @@ void tick_broadcast_oneshot_control(unsi
> >  		WARN_ON_ONCE(cpumask_test_cpu(cpu, tick_broadcast_pending_mask));
> >  		if (!cpumask_test_and_set_cpu(cpu, tick_broadcast_oneshot_mask)) {
> >  			clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
> > -			if (dev->next_event.tv64 < bc->next_event.tv64)
> > +			/*
> > +			 * We only reprogram the broadcast timer if we
> > +			 * did not mark ourself in the force mask and
> > +			 * if the cpu local event is earlier than the
> > +			 * broadcast event. If the current CPU is in
> > +			 * the force mask, then we are going to be
> > +			 * woken by the IPI right away.
> > +			 */
> > +			if (!cpumask_test_cpu(cpu, tick_broadcast_force_mask) &&
> Is the test against tick_broadcast_force_mask necessary if we add the check
> in the idle thread before entering idle ? It does not hurt, agreed, and we'd
> better leave it there, it is just for my own understanding, thanks a lot.

Well, it's necessary for all archs which do not have the check (yet).
 
> Having said that, on the series:
> 
> Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2013-03-13 21:42 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-06 11:18 [patch 0/7] tick: Optimize broadcast handling and prevent expiry ping pong Thomas Gleixner
2013-03-06 11:18 ` [patch 1/7] tick: Call tick_init late Thomas Gleixner
2013-03-06 11:18 ` [patch 2/7] tick: Convert broadcast cpu bitmaps to cpumask_var_t Thomas Gleixner
2013-03-07  5:51   ` Rusty Russell
2013-03-06 11:18 ` [patch 3/7] tick: Avoid programming the local cpu timer if broadcast pending Thomas Gleixner
2013-03-06 11:18 ` [patch 4/7] tick: Handle broadcast wakeup of multiple cpus Thomas Gleixner
2013-03-13 11:36   ` Lorenzo Pieralisi
2013-03-13 21:42     ` Thomas Gleixner
2013-03-06 11:18 ` [patch 5/7] tick: Provide a check for a forced broadcast pending Thomas Gleixner
2013-03-06 11:18 ` [patch 6/7] arm: Use tick broadcast expired check Thomas Gleixner
2013-03-06 11:18 ` [patch 7/7] x86: " Thomas Gleixner
2013-03-13  9:35 ` [patch 0/7] tick: Optimize broadcast handling and prevent expiry ping pong Santosh Shilimkar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).