* [patch 1/7] tick: Call tick_init late
From: Thomas Gleixner @ 2013-03-06 11:18 UTC (permalink / raw)
  To: LKML
  Cc: LAK, John Stultz, Arjan van de Veen, Lorenzo Pieralisi,
	Santosh Shilimkar, Jason Liu, Rusty Russell

[-- Attachment #1: tick-call-init-late.patch --]
[-- Type: text/plain, Size: 977 bytes --]

To convert the clockevents code to cpumask_var_t, we need to move the
init call after the allocator setup.

Clockevents are registered from time_init() at the earliest, as they
need interrupts to be set up, so this is safe.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
---
 init/main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: tip/init/main.c
===================================================================
--- tip.orig/init/main.c
+++ tip/init/main.c
@@ -494,7 +494,6 @@ asmlinkage void __init start_kernel(void
  * Interrupts are still disabled. Do necessary setups, then
  * enable them
  */
-	tick_init();
 	boot_cpu_init();
 	page_address_init();
 	printk(KERN_NOTICE "%s", linux_banner);
@@ -551,6 +550,7 @@ asmlinkage void __init start_kernel(void
 	/* init some links before init_ISA_irqs() */
 	early_irq_init();
 	init_IRQ();
+	tick_init();
 	init_timers();
 	hrtimers_init();
 	softirq_init();
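
The reason the call has to move, in code form: with CONFIG_CPUMASK_OFFSTACK=y
a cpumask_var_t is a heap-allocated pointer, so the allocation done from
tick_init() via the upcoming tick_broadcast_init() can only succeed once the
slab allocator is up. A minimal sketch, assuming only the documented
alloc_cpumask_var() semantics (the function and mask names here are made up
for illustration):

#include <linux/cpumask.h>
#include <linux/init.h>
#include <linux/printk.h>
#include <linux/slab.h>

static cpumask_var_t example_mask;

static void __init example_init(void)
{
	/*
	 * With CONFIG_CPUMASK_OFFSTACK=y this kmallocs the mask, so it
	 * can only succeed after the slab allocator is initialized and
	 * returns false before that. With CONFIG_CPUMASK_OFFSTACK=n the
	 * storage is embedded in the variable and this always succeeds.
	 */
	if (!alloc_cpumask_var(&example_mask, GFP_NOWAIT))
		pr_warn("cpumask allocation failed\n");
}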




* [patch 0/7] tick: Optimize broadcast handling and prevent expiry ping pong
From: Thomas Gleixner @ 2013-03-06 11:18 UTC (permalink / raw)
  To: LKML
  Cc: LAK, John Stultz, Arjan van de Veen, Lorenzo Pieralisi,
	Santosh Shilimkar, Jason Liu

Jason decoded a problem related to the broadcast timer mode. The
reprogramming of the cpu local timer causes a huge number of
retries. There is also a situation where a CPU which does not handle
the broadcast timer interrupt exits and reenters broadcast mode before
the broadcast interrupt has been handled by another CPU. This can lead
to an interesting ping-pong between the broadcast and the cpu local
timer code.

This series addresses these problems. The first two patches convert
the broadcast code to proper cpumask_var_t handling instead of adding
more bitmaps later.

The rest of the series is adapted from the quick patches which I
posted earlier while discussing the issue with Jason et al.

Please give it proper testing on your affected hardware.

Thanks,

	tglx




* [patch 2/7] tick: Convert broadcast cpu bitmaps to cpumask_var_t
From: Thomas Gleixner @ 2013-03-06 11:18 UTC (permalink / raw)
  To: LKML
  Cc: LAK, John Stultz, Arjan van de Veen, Lorenzo Pieralisi,
	Santosh Shilimkar, Jason Liu, Rusty Russell

[-- Attachment #1: tick-broadcast-convert-to-cpumask-t.patch --]
[-- Type: text/plain, Size: 10508 bytes --]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
---
 kernel/time/tick-broadcast.c |   86 +++++++++++++++++++++----------------------
 kernel/time/tick-common.c    |    1 
 kernel/time/tick-internal.h  |    3 +
 3 files changed, 46 insertions(+), 44 deletions(-)

Index: tip/kernel/time/tick-broadcast.c
===================================================================
--- tip.orig/kernel/time/tick-broadcast.c
+++ tip/kernel/time/tick-broadcast.c
@@ -28,9 +28,8 @@
  */
 
 static struct tick_device tick_broadcast_device;
-/* FIXME: Use cpumask_var_t. */
-static DECLARE_BITMAP(tick_broadcast_mask, NR_CPUS);
-static DECLARE_BITMAP(tmpmask, NR_CPUS);
+static cpumask_var_t tick_broadcast_mask;
+static cpumask_var_t tmpmask;
 static DEFINE_RAW_SPINLOCK(tick_broadcast_lock);
 static int tick_broadcast_force;
 
@@ -50,7 +49,7 @@ struct tick_device *tick_get_broadcast_d
 
 struct cpumask *tick_get_broadcast_mask(void)
 {
-	return to_cpumask(tick_broadcast_mask);
+	return tick_broadcast_mask;
 }
 
 /*
@@ -74,7 +73,7 @@ int tick_check_broadcast_device(struct c
 
 	clockevents_exchange_device(tick_broadcast_device.evtdev, dev);
 	tick_broadcast_device.evtdev = dev;
-	if (!cpumask_empty(tick_get_broadcast_mask()))
+	if (!cpumask_empty(tick_broadcast_mask))
 		tick_broadcast_start_periodic(dev);
 	return 1;
 }
@@ -123,7 +122,7 @@ int tick_device_uses_broadcast(struct cl
 	if (!tick_device_is_functional(dev)) {
 		dev->event_handler = tick_handle_periodic;
 		tick_device_setup_broadcast_func(dev);
-		cpumask_set_cpu(cpu, tick_get_broadcast_mask());
+		cpumask_set_cpu(cpu, tick_broadcast_mask);
 		tick_broadcast_start_periodic(tick_broadcast_device.evtdev);
 		ret = 1;
 	} else {
@@ -134,7 +133,7 @@ int tick_device_uses_broadcast(struct cl
 		 */
 		if (!(dev->features & CLOCK_EVT_FEAT_C3STOP)) {
 			int cpu = smp_processor_id();
-			cpumask_clear_cpu(cpu, tick_get_broadcast_mask());
+			cpumask_clear_cpu(cpu, tick_broadcast_mask);
 			tick_broadcast_clear_oneshot(cpu);
 		} else {
 			tick_device_setup_broadcast_func(dev);
@@ -198,9 +197,8 @@ static void tick_do_periodic_broadcast(v
 {
 	raw_spin_lock(&tick_broadcast_lock);
 
-	cpumask_and(to_cpumask(tmpmask),
-		    cpu_online_mask, tick_get_broadcast_mask());
-	tick_do_broadcast(to_cpumask(tmpmask));
+	cpumask_and(tmpmask, cpu_online_mask, tick_broadcast_mask);
+	tick_do_broadcast(tmpmask);
 
 	raw_spin_unlock(&tick_broadcast_lock);
 }
@@ -263,13 +261,12 @@ static void tick_do_broadcast_on_off(uns
 	if (!tick_device_is_functional(dev))
 		goto out;
 
-	bc_stopped = cpumask_empty(tick_get_broadcast_mask());
+	bc_stopped = cpumask_empty(tick_broadcast_mask);
 
 	switch (*reason) {
 	case CLOCK_EVT_NOTIFY_BROADCAST_ON:
 	case CLOCK_EVT_NOTIFY_BROADCAST_FORCE:
-		if (!cpumask_test_cpu(cpu, tick_get_broadcast_mask())) {
-			cpumask_set_cpu(cpu, tick_get_broadcast_mask());
+		if (!cpumask_test_and_set_cpu(cpu, tick_broadcast_mask)) {
 			if (tick_broadcast_device.mode ==
 			    TICKDEV_MODE_PERIODIC)
 				clockevents_shutdown(dev);
@@ -279,8 +276,7 @@ static void tick_do_broadcast_on_off(uns
 		break;
 	case CLOCK_EVT_NOTIFY_BROADCAST_OFF:
 		if (!tick_broadcast_force &&
-		    cpumask_test_cpu(cpu, tick_get_broadcast_mask())) {
-			cpumask_clear_cpu(cpu, tick_get_broadcast_mask());
+		    cpumask_test_and_clear_cpu(cpu, tick_broadcast_mask)) {
 			if (tick_broadcast_device.mode ==
 			    TICKDEV_MODE_PERIODIC)
 				tick_setup_periodic(dev, 0);
@@ -288,7 +284,7 @@ static void tick_do_broadcast_on_off(uns
 		break;
 	}
 
-	if (cpumask_empty(tick_get_broadcast_mask())) {
+	if (cpumask_empty(tick_broadcast_mask)) {
 		if (!bc_stopped)
 			clockevents_shutdown(bc);
 	} else if (bc_stopped) {
@@ -337,10 +333,10 @@ void tick_shutdown_broadcast(unsigned in
 	raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
 
 	bc = tick_broadcast_device.evtdev;
-	cpumask_clear_cpu(cpu, tick_get_broadcast_mask());
+	cpumask_clear_cpu(cpu, tick_broadcast_mask);
 
 	if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC) {
-		if (bc && cpumask_empty(tick_get_broadcast_mask()))
+		if (bc && cpumask_empty(tick_broadcast_mask))
 			clockevents_shutdown(bc);
 	}
 
@@ -376,13 +372,13 @@ int tick_resume_broadcast(void)
 
 		switch (tick_broadcast_device.mode) {
 		case TICKDEV_MODE_PERIODIC:
-			if (!cpumask_empty(tick_get_broadcast_mask()))
+			if (!cpumask_empty(tick_broadcast_mask))
 				tick_broadcast_start_periodic(bc);
 			broadcast = cpumask_test_cpu(smp_processor_id(),
-						     tick_get_broadcast_mask());
+						     tick_broadcast_mask);
 			break;
 		case TICKDEV_MODE_ONESHOT:
-			if (!cpumask_empty(tick_get_broadcast_mask()))
+			if (!cpumask_empty(tick_broadcast_mask))
 				broadcast = tick_resume_broadcast_oneshot(bc);
 			break;
 		}
@@ -395,15 +391,14 @@ int tick_resume_broadcast(void)
 
 #ifdef CONFIG_TICK_ONESHOT
 
-/* FIXME: use cpumask_var_t. */
-static DECLARE_BITMAP(tick_broadcast_oneshot_mask, NR_CPUS);
+static cpumask_var_t tick_broadcast_oneshot_mask;
 
 /*
  * Exposed for debugging: see timer_list.c
  */
 struct cpumask *tick_get_broadcast_oneshot_mask(void)
 {
-	return to_cpumask(tick_broadcast_oneshot_mask);
+	return tick_broadcast_oneshot_mask;
 }
 
 static int tick_broadcast_set_event(ktime_t expires, int force)
@@ -428,7 +423,7 @@ int tick_resume_broadcast_oneshot(struct
  */
 void tick_check_oneshot_broadcast(int cpu)
 {
-	if (cpumask_test_cpu(cpu, to_cpumask(tick_broadcast_oneshot_mask))) {
+	if (cpumask_test_cpu(cpu, tick_broadcast_oneshot_mask)) {
 		struct tick_device *td = &per_cpu(tick_cpu_device, cpu);
 
 		clockevents_set_mode(td->evtdev, CLOCK_EVT_MODE_ONESHOT);
@@ -448,13 +443,13 @@ static void tick_handle_oneshot_broadcas
 again:
 	dev->next_event.tv64 = KTIME_MAX;
 	next_event.tv64 = KTIME_MAX;
-	cpumask_clear(to_cpumask(tmpmask));
+	cpumask_clear(tmpmask);
 	now = ktime_get();
 	/* Find all expired events */
-	for_each_cpu(cpu, tick_get_broadcast_oneshot_mask()) {
+	for_each_cpu(cpu, tick_broadcast_oneshot_mask) {
 		td = &per_cpu(tick_cpu_device, cpu);
 		if (td->evtdev->next_event.tv64 <= now.tv64)
-			cpumask_set_cpu(cpu, to_cpumask(tmpmask));
+			cpumask_set_cpu(cpu, tmpmask);
 		else if (td->evtdev->next_event.tv64 < next_event.tv64)
 			next_event.tv64 = td->evtdev->next_event.tv64;
 	}
@@ -462,7 +457,7 @@ again:
 	/*
 	 * Wakeup the cpus which have an expired event.
 	 */
-	tick_do_broadcast(to_cpumask(tmpmask));
+	tick_do_broadcast(tmpmask);
 
 	/*
 	 * Two reasons for reprogram:
@@ -518,16 +513,13 @@ void tick_broadcast_oneshot_control(unsi
 
 	raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
 	if (reason == CLOCK_EVT_NOTIFY_BROADCAST_ENTER) {
-		if (!cpumask_test_cpu(cpu, tick_get_broadcast_oneshot_mask())) {
-			cpumask_set_cpu(cpu, tick_get_broadcast_oneshot_mask());
+		if (!cpumask_test_and_set_cpu(cpu, tick_broadcast_oneshot_mask)) {
 			clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
 			if (dev->next_event.tv64 < bc->next_event.tv64)
 				tick_broadcast_set_event(dev->next_event, 1);
 		}
 	} else {
-		if (cpumask_test_cpu(cpu, tick_get_broadcast_oneshot_mask())) {
-			cpumask_clear_cpu(cpu,
-					  tick_get_broadcast_oneshot_mask());
+		if (cpumask_test_and_clear_cpu(cpu, tick_broadcast_oneshot_mask)) {
 			clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT);
 			if (dev->next_event.tv64 != KTIME_MAX)
 				tick_program_event(dev->next_event, 1);
@@ -543,7 +535,7 @@ void tick_broadcast_oneshot_control(unsi
  */
 static void tick_broadcast_clear_oneshot(int cpu)
 {
-	cpumask_clear_cpu(cpu, tick_get_broadcast_oneshot_mask());
+	cpumask_clear_cpu(cpu, tick_broadcast_oneshot_mask);
 }
 
 static void tick_broadcast_init_next_event(struct cpumask *mask,
@@ -581,15 +573,14 @@ void tick_broadcast_setup_oneshot(struct
 		 * oneshot_mask bits for those and program the
 		 * broadcast device to fire.
 		 */
-		cpumask_copy(to_cpumask(tmpmask), tick_get_broadcast_mask());
-		cpumask_clear_cpu(cpu, to_cpumask(tmpmask));
-		cpumask_or(tick_get_broadcast_oneshot_mask(),
-			   tick_get_broadcast_oneshot_mask(),
-			   to_cpumask(tmpmask));
+		cpumask_copy(tmpmask, tick_broadcast_mask);
+		cpumask_clear_cpu(cpu, tmpmask);
+		cpumask_or(tick_broadcast_oneshot_mask,
+			   tick_broadcast_oneshot_mask, tmpmask);
 
-		if (was_periodic && !cpumask_empty(to_cpumask(tmpmask))) {
+		if (was_periodic && !cpumask_empty(tmpmask)) {
 			clockevents_set_mode(bc, CLOCK_EVT_MODE_ONESHOT);
-			tick_broadcast_init_next_event(to_cpumask(tmpmask),
+			tick_broadcast_init_next_event(tmpmask,
 						       tick_next_period);
 			tick_broadcast_set_event(tick_next_period, 1);
 		} else
@@ -639,7 +630,7 @@ void tick_shutdown_broadcast_oneshot(uns
 	 * Clear the broadcast mask flag for the dead cpu, but do not
 	 * stop the broadcast device!
 	 */
-	cpumask_clear_cpu(cpu, tick_get_broadcast_oneshot_mask());
+	cpumask_clear_cpu(cpu, tick_broadcast_oneshot_mask);
 
 	raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
 }
@@ -663,3 +654,12 @@ bool tick_broadcast_oneshot_available(vo
 }
 
 #endif
+
+void __init tick_broadcast_init(void)
+{
+	alloc_cpumask_var(&tick_broadcast_mask, GFP_NOWAIT);
+	alloc_cpumask_var(&tmpmask, GFP_NOWAIT);
+#ifdef CONFIG_TICK_ONESHOT
+	alloc_cpumask_var(&tick_broadcast_oneshot_mask, GFP_NOWAIT);
+#endif
+}
Index: tip/kernel/time/tick-common.c
===================================================================
--- tip.orig/kernel/time/tick-common.c
+++ tip/kernel/time/tick-common.c
@@ -416,4 +416,5 @@ static struct notifier_block tick_notifi
 void __init tick_init(void)
 {
 	clockevents_register_notifier(&tick_notifier);
+	tick_broadcast_init();
 }
Index: tip/kernel/time/tick-internal.h
===================================================================
--- tip.orig/kernel/time/tick-internal.h
+++ tip/kernel/time/tick-internal.h
@@ -94,7 +94,7 @@ extern void tick_broadcast_on_off(unsign
 extern void tick_shutdown_broadcast(unsigned int *cpup);
 extern void tick_suspend_broadcast(void);
 extern int tick_resume_broadcast(void);
-
+extern void tick_broadcast_init(void);
 extern void
 tick_set_periodic_handler(struct clock_event_device *dev, int broadcast);
 
@@ -119,6 +119,7 @@ static inline void tick_broadcast_on_off
 static inline void tick_shutdown_broadcast(unsigned int *cpup) { }
 static inline void tick_suspend_broadcast(void) { }
 static inline int tick_resume_broadcast(void) { return 0; }
+static inline void tick_broadcast_init(void) { }
 
 /*
  * Set the periodic handler in non broadcast mode
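
The shape of the conversion, distilled from the hunks above into a minimal
sketch (the helpers are the real kernel API; old_mask/new_mask and the
functions around them are made up for illustration):

/* Before: a fixed-size bitmap and a to_cpumask() cast on every use. */
static DECLARE_BITMAP(old_mask, NR_CPUS);

static void old_style(int cpu)
{
	cpumask_set_cpu(cpu, to_cpumask(old_mask));
}

/* After: a cpumask_var_t, allocated once at init time and then used
 * directly as a struct cpumask pointer. */
static cpumask_var_t new_mask;

static void __init new_style_init(void)
{
	alloc_cpumask_var(&new_mask, GFP_NOWAIT);
}

static void new_style(int cpu)
{
	/* The patch also folds test+set pairs into one atomic helper: */
	if (!cpumask_test_and_set_cpu(cpu, new_mask)) {
		/* cpu was not in the mask before */
	}
}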




* [patch 4/7] tick: Handle broadcast wakeup of multiple cpus
From: Thomas Gleixner @ 2013-03-06 11:18 UTC (permalink / raw)
  To: LKML
  Cc: LAK, John Stultz, Arjan van de Veen, Lorenzo Pieralisi,
	Santosh Shilimkar, Jason Liu

[-- Attachment #1: tick-broadcast-handle-mass-wakeup.patch --]
[-- Type: text/plain, Size: 5236 bytes --]

Some brilliant hardware implementations wake multiple cores when the
broadcast timer fires. This leads to the following interesting
problem:

CPU0				CPU1
wakeup from idle		wakeup from idle

leave broadcast mode		leave broadcast mode
 restart per cpu timer		 restart per cpu timer
				go back to idle
handle broadcast
 (empty mask)
				enter broadcast mode
				program broadcast device
enter broadcast mode
program broadcast device

So what happens is that due to the forced reprogramming of the cpu
local timer, we need to set an event in the future. Now if we manage
to go back to idle before the timer fires, we switch off the timer and
arm the broadcast device with an already expired time (covered by
forced mode). So in the worst case we repeat the above ping-pong
forever.

Unfortunately we have no information about what caused the wakeup, but
we can check the current time against the expiry time of the local
cpu. If the local event is already in the past, we know that the
broadcast timer is about to fire and send an IPI. So we mark ourselves
as an IPI target even if we left broadcast mode and avoid the
reprogramming of the local cpu timer.

This still leaves the possibility that a CPU which is not handling the
broadcast interrupt is going to reach idle again before the IPI
arrives. This can't be solved in the core code and will be handled in
follow-up patches.

Reported-by: Jason Liu <liu.h.jason@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/tick-broadcast.c |   59 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 58 insertions(+), 1 deletion(-)

Index: tip/kernel/time/tick-broadcast.c
===================================================================
--- tip.orig/kernel/time/tick-broadcast.c
+++ tip/kernel/time/tick-broadcast.c
@@ -393,6 +393,7 @@ int tick_resume_broadcast(void)
 
 static cpumask_var_t tick_broadcast_oneshot_mask;
 static cpumask_var_t tick_broadcast_pending_mask;
+static cpumask_var_t tick_broadcast_force_mask;
 
 /*
  * Exposed for debugging: see timer_list.c
@@ -462,6 +463,10 @@ again:
 		}
 	}
 
+	/* Take care of enforced broadcast requests */
+	cpumask_or(tmpmask, tmpmask, tick_broadcast_force_mask);
+	cpumask_clear(tick_broadcast_force_mask);
+
 	/*
 	 * Wakeup the cpus which have an expired event.
 	 */
@@ -497,6 +502,7 @@ void tick_broadcast_oneshot_control(unsi
 	struct clock_event_device *bc, *dev;
 	struct tick_device *td;
 	unsigned long flags;
+	ktime_t now;
 	int cpu;
 
 	/*
@@ -524,7 +530,16 @@ void tick_broadcast_oneshot_control(unsi
 		WARN_ON_ONCE(cpumask_test_cpu(cpu, tick_broadcast_pending_mask));
 		if (!cpumask_test_and_set_cpu(cpu, tick_broadcast_oneshot_mask)) {
 			clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
-			if (dev->next_event.tv64 < bc->next_event.tv64)
+			/*
+			 * We only reprogram the broadcast timer if we
+			 * did not mark ourselves in the force mask and
+			 * if the cpu local event is earlier than the
+			 * broadcast event. If the current CPU is in
+			 * the force mask, then we are going to be
+			 * woken by the IPI right away.
+			 */
+			if (!cpumask_test_cpu(cpu, tick_broadcast_force_mask) &&
+			    dev->next_event.tv64 < bc->next_event.tv64)
 				tick_broadcast_set_event(dev->next_event, 1);
 		}
 	} else {
@@ -545,6 +560,47 @@ void tick_broadcast_oneshot_control(unsi
 				       tick_broadcast_pending_mask))
 				goto out;
 
+			/*
+			 * If the pending bit is not set, then we are
+			 * either the CPU handling the broadcast
+			 * interrupt or we got woken by something else.
+			 *
+			 * We are no longer in the broadcast mask, so
+			 * if the cpu local expiry time is already
+			 * reached, we would reprogram the cpu local
+			 * timer with an already expired event.
+			 *
+			 * This can lead to a ping-pong when we return
+			 * to idle and therefore rearm the broadcast
+			 * timer before the cpu local timer was able
+			 * to fire. This happens because the forced
+			 * reprogramming makes sure that the event
+			 * will happen in the future and depending on
+			 * the min_delta setting this might be far
+			 * enough out that the ping-pong starts.
+			 *
+			 * If the cpu local next_event has expired
+			 * then we know that the broadcast timer
+			 * next_event has expired as well and
+			 * broadcast is about to be handled. So we
+			 * avoid reprogramming and enforce that the
+			 * broadcast handler, which did not run yet,
+			 * will invoke the cpu local handler.
+			 *
+			 * We cannot call the handler directly from
+			 * here, because we might be in a NOHZ phase
+			 * and we did not go through the irq_enter()
+			 * nohz fixups.
+			 */
+			now = ktime_get();
+			if (dev->next_event.tv64 <= now.tv64) {
+				cpumask_set_cpu(cpu, tick_broadcast_force_mask);
+				goto out;
+			}
+			/*
+			 * We got woken by something else. Reprogram
+			 * the cpu local timer device.
+			 */
 			tick_program_event(dev->next_event, 1);
 		}
 	}
@@ -686,5 +742,6 @@ void tick_broadcast_init(void)
 #ifdef CONFIG_TICK_ONESHOT
 	alloc_cpumask_var(&tick_broadcast_oneshot_mask, GFP_NOWAIT);
 	alloc_cpumask_var(&tick_broadcast_pending_mask, GFP_NOWAIT);
+	alloc_cpumask_var(&tick_broadcast_force_mask, GFP_NOWAIT);
 #endif
 }
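
Condensed, the fix has two cooperating pieces (a sketch distilled from the
hunks above; the variables are as in the patch, with the locking and the
surrounding branches stripped out):

/* Broadcast exit path: if the cpu local event is already in the past,
 * the broadcast IPI is imminent, so mark ourselves instead of
 * reprogramming the local timer with an expired event. */
now = ktime_get();
if (dev->next_event.tv64 <= now.tv64) {
	cpumask_set_cpu(cpu, tick_broadcast_force_mask);
	goto out;
}
tick_program_event(dev->next_event, 1);	/* genuine early wakeup */

/* Broadcast handler: forced CPUs get the IPI unconditionally. */
cpumask_or(tmpmask, tmpmask, tick_broadcast_force_mask);
cpumask_clear(tick_broadcast_force_mask);
tick_do_broadcast(tmpmask);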




* [patch 3/7] tick: Avoid programming the local cpu timer if broadcast pending
From: Thomas Gleixner @ 2013-03-06 11:18 UTC (permalink / raw)
  To: LKML
  Cc: LAK, John Stultz, Arjan van de Veen, Lorenzo Pieralisi,
	Santosh Shilimkar, Jason Liu

[-- Attachment #1: tick-avoid-programming-local-cpu-timer-if-broadcast-pending.patch --]
[-- Type: text/plain, Size: 3693 bytes --]

If the local cpu timer stops in deep idle, we arm the broadcast device
and get woken by an IPI. Now when we return from deep idle we reenable
the local cpu timer unconditionally before handling the IPI. But
that's a pointless exercise: the timer is already expired and the IPI
is on the way. And it's an expensive exercise, as we use the forced
reprogramming mode so that we do not lose a timer event. This forced
reprogramming will loop at least once through the retry path.

To avoid this reprogramming, we mark the cpu in a pending bit mask
before we send the IPI. Now when the IPI target cpu wakes up, it will
see the pending bit set and skip the reprogramming. The reprogramming
of the cpu local timer will happen in the IPI handler which runs the
cpu local timer interrupt function.

Reported-by: Jason Liu <liu.h.jason@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/tick-broadcast.c |   33 +++++++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

Index: tip/kernel/time/tick-broadcast.c
===================================================================
--- tip.orig/kernel/time/tick-broadcast.c
+++ tip/kernel/time/tick-broadcast.c
@@ -392,6 +392,7 @@ int tick_resume_broadcast(void)
 #ifdef CONFIG_TICK_ONESHOT
 
 static cpumask_var_t tick_broadcast_oneshot_mask;
+static cpumask_var_t tick_broadcast_pending_mask;
 
 /*
  * Exposed for debugging: see timer_list.c
@@ -448,10 +449,17 @@ again:
 	/* Find all expired events */
 	for_each_cpu(cpu, tick_broadcast_oneshot_mask) {
 		td = &per_cpu(tick_cpu_device, cpu);
-		if (td->evtdev->next_event.tv64 <= now.tv64)
+		if (td->evtdev->next_event.tv64 <= now.tv64) {
 			cpumask_set_cpu(cpu, tmpmask);
-		else if (td->evtdev->next_event.tv64 < next_event.tv64)
+			/*
+			 * Mark the remote cpu in the pending mask, so
+			 * it can avoid reprogramming the cpu local
+			 * timer in tick_broadcast_oneshot_control().
+			 */
+			cpumask_set_cpu(cpu, tick_broadcast_pending_mask);
+		} else if (td->evtdev->next_event.tv64 < next_event.tv64) {
 			next_event.tv64 = td->evtdev->next_event.tv64;
+		}
 	}
 
 	/*
@@ -513,6 +521,7 @@ void tick_broadcast_oneshot_control(unsi
 
 	raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
 	if (reason == CLOCK_EVT_NOTIFY_BROADCAST_ENTER) {
+		WARN_ON_ONCE(cpumask_test_cpu(cpu, tick_broadcast_pending_mask));
 		if (!cpumask_test_and_set_cpu(cpu, tick_broadcast_oneshot_mask)) {
 			clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
 			if (dev->next_event.tv64 < bc->next_event.tv64)
@@ -521,10 +530,25 @@ void tick_broadcast_oneshot_control(unsi
 	} else {
 		if (cpumask_test_and_clear_cpu(cpu, tick_broadcast_oneshot_mask)) {
 			clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT);
-			if (dev->next_event.tv64 != KTIME_MAX)
-				tick_program_event(dev->next_event, 1);
+			if (dev->next_event.tv64 == KTIME_MAX)
+				goto out;
+			/*
+			 * The cpu which was handling the broadcast
+			 * timer marked this cpu in the broadcast
+			 * pending mask and fired the broadcast
+			 * IPI. So we are going to handle the expired
+			 * event anyway via the broadcast IPI
+			 * handler. No need to reprogram the timer
+			 * with an already expired event.
+			 */
+			if (cpumask_test_and_clear_cpu(cpu,
+				       tick_broadcast_pending_mask))
+				goto out;
+
+			tick_program_event(dev->next_event, 1);
 		}
 	}
+out:
 	raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
 }
 
@@ -661,5 +685,6 @@ void tick_broadcast_init(void)
 	alloc_cpumask_var(&tmpmask, GFP_NOWAIT);
 #ifdef CONFIG_TICK_ONESHOT
 	alloc_cpumask_var(&tick_broadcast_oneshot_mask, GFP_NOWAIT);
+	alloc_cpumask_var(&tick_broadcast_pending_mask, GFP_NOWAIT);
 #endif
 }
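
The two halves of the handshake, pulled out of the hunks above into one
place (a condensed sketch; td, dev, cpu and now are as in the patch, and
both sides run under tick_broadcast_lock):

/* Sender: the CPU handling the broadcast interrupt. */
if (td->evtdev->next_event.tv64 <= now.tv64) {
	cpumask_set_cpu(cpu, tmpmask);			   /* IPI target */
	cpumask_set_cpu(cpu, tick_broadcast_pending_mask); /* why it is woken */
}

/* Receiver: the woken CPU leaving broadcast mode. */
if (cpumask_test_and_clear_cpu(cpu, tick_broadcast_pending_mask))
	goto out;	/* the IPI handler will run the expired event */
tick_program_event(dev->next_event, 1);	/* woken for another reason */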




* [patch 5/7] tick: Provide a check for a forced broadcast pending
From: Thomas Gleixner @ 2013-03-06 11:18 UTC (permalink / raw)
  To: LKML
  Cc: LAK, John Stultz, Arjan van de Veen, Lorenzo Pieralisi,
	Santosh Shilimkar, Jason Liu

[-- Attachment #1: tick-provide-check-for-forced-broadcast-pending.patch --]
[-- Type: text/plain, Size: 2496 bytes --]

On the CPU which gets woken along with the target CPU of the
broadcast, the following happens:

  deep_idle()
			<-- spurious wakeup
  broadcast_exit()
    set forced bit

  enable interrupts

			<-- Nothing happens

  disable interrupts

  broadcast_enter()
			<-- Here we observe the forced bit is set
  deep_idle()

Now after that the target CPU of the broadcast runs the broadcast
handler, finds the other CPU in both the broadcast and the forced
mask, sends the IPI and things get back to normal.

So it's not actually harmful, just more evidence for the theory that
hardware designers have access to very special drug supplies.

Now there is no point in going back to deep idle just to wake up again
right away via an IPI. Provide a check which allows the idle code to
avoid the deep idle transition.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 include/linux/clockchips.h   |    6 ++++++
 kernel/time/tick-broadcast.c |   12 ++++++++++++
 2 files changed, 18 insertions(+)

Index: tip/include/linux/clockchips.h
===================================================================
--- tip.orig/include/linux/clockchips.h
+++ tip/include/linux/clockchips.h
@@ -170,6 +170,12 @@ extern void tick_broadcast(const struct 
 extern int tick_receive_broadcast(void);
 #endif
 
+#if defined(CONFIG_GENERIC_CLOCKEVENTS_BROADCAST) && defined(CONFIG_TICK_ONESHOT)
+extern int tick_check_broadcast_expired(void);
+#else
+static inline int tick_check_broadcast_expired(void) { return 0; }
+#endif
+
 #ifdef CONFIG_GENERIC_CLOCKEVENTS
 extern void clockevents_notify(unsigned long reason, void *arg);
 #else
Index: tip/kernel/time/tick-broadcast.c
===================================================================
--- tip.orig/kernel/time/tick-broadcast.c
+++ tip/kernel/time/tick-broadcast.c
@@ -413,6 +413,18 @@ static int tick_broadcast_set_event(ktim
 	return clockevents_program_event(bc, expires, force);
 }
 
+/*
+ * Called before going idle with interrupts disabled. Checks whether a
+ * broadcast event from the other core is about to happen. We detected
+ * that in tick_broadcast_oneshot_control(). The callsite can use this
+ * to avoid a deep idle transition as we are about to get the
+ * broadcast IPI right away.
+ */
+int tick_check_broadcast_expired(void)
+{
+	return cpumask_test_cpu(smp_processor_id(), tick_broadcast_force_mask);
+}
+
 int tick_resume_broadcast_oneshot(struct clock_event_device *bc)
 {
 	clockevents_set_mode(bc, CLOCK_EVT_MODE_ONESHOT);
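
The intended call site looks roughly like this (patches 6/7 and 7/7 below
wire it up for arm and x86; the loop here is schematic and
arch_enter_deep_idle() is a placeholder, not a real API):

static void idle_loop_sketch(void)
{
	while (1) {
		local_irq_disable();
		if (tick_check_broadcast_expired()) {
			/*
			 * The broadcast IPI is already on its way; a
			 * deep idle transition would be undone right
			 * away. Spin with interrupts enabled instead.
			 */
			local_irq_enable();
			cpu_relax();
			continue;
		}
		arch_enter_deep_idle();
	}
}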




* [patch 6/7] arm: Use tick broadcast expired check
From: Thomas Gleixner @ 2013-03-06 11:18 UTC (permalink / raw)
  To: LKML
  Cc: LAK, John Stultz, Arjan van de Veen, Lorenzo Pieralisi,
	Santosh Shilimkar, Jason Liu

[-- Attachment #1: arm-use-broadcast-expired-check.patch --]
[-- Type: text/plain, Size: 951 bytes --]

Avoid going back into deep idle if the tick broadcast IPI is about to
fire.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/arm/kernel/process.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Index: tip/arch/arm/kernel/process.c
===================================================================
--- tip.orig/arch/arm/kernel/process.c
+++ tip/arch/arm/kernel/process.c
@@ -199,7 +199,16 @@ void cpu_idle(void)
 #ifdef CONFIG_PL310_ERRATA_769419
 			wmb();
 #endif
-			if (hlt_counter) {
+			/*
+			 * In poll mode we reenable interrupts and spin.
+			 *
+			 * Also if we detected in the wakeup from idle
+			 * path that the tick broadcast device expired
+			 * for us, we don't want to go deep idle as we
+			 * know that the IPI is going to arrive right
+			 * away.
+			 */
+			if (hlt_counter || tick_check_broadcast_expired()) {
 				local_irq_enable();
 				cpu_relax();
 			} else if (!need_resched()) {




* [patch 7/7] x86: Use tick broadcast expired check
From: Thomas Gleixner @ 2013-03-06 11:18 UTC (permalink / raw)
  To: LKML
  Cc: LAK, John Stultz, Arjan van de Veen, Lorenzo Pieralisi,
	Santosh Shilimkar, Jason Liu, x86

[-- Attachment #1: x86-use-broadcast-expired-check.patch --]
[-- Type: text/plain, Size: 894 bytes --]

Avoid going back into deep idle if the tick broadcast IPI is about to
fire.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
---
 arch/x86/kernel/process.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

Index: tip/arch/x86/kernel/process.c
===================================================================
--- tip.orig/arch/x86/kernel/process.c
+++ tip/arch/x86/kernel/process.c
@@ -336,6 +336,18 @@ void cpu_idle(void)
 			local_touch_nmi();
 			local_irq_disable();
 
+			/*
+			 * We detected in the wakeup path that the
+			 * tick broadcast device expired for us, but
+			 * we raced with the other CPU and came back
+			 * here before it was able to fire the IPI.
+			 * No point in going idle.
+			 */
+			if (tick_check_broadcast_expired()) {
+				local_irq_enable();
+				continue;
+			}
+
 			enter_idle();
 
 			/* Don't trace irqs off for idle */




* Re: [patch 2/7] tick: Convert broadcast cpu bitmaps to cpumask_var_t
From: Rusty Russell @ 2013-03-07  5:51 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: LAK, John Stultz, Arjan van de Veen, Lorenzo Pieralisi,
	Santosh Shilimkar, Jason Liu

Thomas Gleixner <tglx@linutronix.de> writes:
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> ---
>  kernel/time/tick-broadcast.c |   86 +++++++++++++++++++++----------------------
>  kernel/time/tick-common.c    |    1 
>  kernel/time/tick-internal.h  |    3 +
>  3 files changed, 46 insertions(+), 44 deletions(-)
>
> Index: tip/kernel/time/tick-broadcast.c
> ===================================================================
> --- tip.orig/kernel/time/tick-broadcast.c
> +++ tip/kernel/time/tick-broadcast.c
> @@ -28,9 +28,8 @@
>   */
>  
>  static struct tick_device tick_broadcast_device;
> -/* FIXME: Use cpumask_var_t. */
> -static DECLARE_BITMAP(tick_broadcast_mask, NR_CPUS);
> -static DECLARE_BITMAP(tmpmask, NR_CPUS);
> +static cpumask_var_t tick_broadcast_mask;
> +static cpumask_var_t tmpmask;
>  static DEFINE_RAW_SPINLOCK(tick_broadcast_lock);
>  static int tick_broadcast_force;

Thanks, this is a nice cleanup.

Cheers,
Rusty.


* [tip:timers/core] tick: Call tick_init late
From: tip-bot for Thomas Gleixner @ 2013-03-07 16:29 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, rusty, tglx

Commit-ID:  ad2b13536ace08dfcca4cf86b75a5d06efe06373
Gitweb:     http://git.kernel.org/tip/ad2b13536ace08dfcca4cf86b75a5d06efe06373
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Tue, 5 Mar 2013 15:14:05 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 7 Mar 2013 16:13:25 +0100

tick: Call tick_init late

To convert the clockevents code to cpumask_var_t, we need to move the
init call after the allocator setup.

Clockevents are registered from time_init() at the earliest, as they
need interrupts to be set up, so this is safe.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130306111537.304379448@linutronix.de
Cc: Rusty Russell <rusty@rustcorp.com.au>
---
 init/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/init/main.c b/init/main.c
index 63534a1..b3e0614 100644
--- a/init/main.c
+++ b/init/main.c
@@ -494,7 +494,6 @@ asmlinkage void __init start_kernel(void)
  * Interrupts are still disabled. Do necessary setups, then
  * enable them
  */
-	tick_init();
 	boot_cpu_init();
 	page_address_init();
 	printk(KERN_NOTICE "%s", linux_banner);
@@ -551,6 +550,7 @@ asmlinkage void __init start_kernel(void)
 	/* init some links before init_ISA_irqs() */
 	early_irq_init();
 	init_IRQ();
+	tick_init();
 	init_timers();
 	hrtimers_init();
 	softirq_init();


* [tip:timers/core] tick: Convert broadcast cpu bitmaps to cpumask_var_t
From: tip-bot for Thomas Gleixner @ 2013-03-07 16:30 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, rusty, tglx

Commit-ID:  b352bc1cbc29134a356b5c16ee2281807a7b984e
Gitweb:     http://git.kernel.org/tip/b352bc1cbc29134a356b5c16ee2281807a7b984e
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Tue, 5 Mar 2013 14:25:32 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 7 Mar 2013 16:13:26 +0100

tick: Convert broadcast cpu bitmaps to cpumask_var_t

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130306111537.366394000@linutronix.de
Cc: Rusty Russell <rusty@rustcorp.com.au>
---
 kernel/time/tick-broadcast.c | 86 ++++++++++++++++++++++----------------------
 kernel/time/tick-common.c    |  1 +
 kernel/time/tick-internal.h  |  3 +-
 3 files changed, 46 insertions(+), 44 deletions(-)

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 2fb8cb8..35b8875 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -28,9 +28,8 @@
  */
 
 static struct tick_device tick_broadcast_device;
-/* FIXME: Use cpumask_var_t. */
-static DECLARE_BITMAP(tick_broadcast_mask, NR_CPUS);
-static DECLARE_BITMAP(tmpmask, NR_CPUS);
+static cpumask_var_t tick_broadcast_mask;
+static cpumask_var_t tmpmask;
 static DEFINE_RAW_SPINLOCK(tick_broadcast_lock);
 static int tick_broadcast_force;
 
@@ -50,7 +49,7 @@ struct tick_device *tick_get_broadcast_device(void)
 
 struct cpumask *tick_get_broadcast_mask(void)
 {
-	return to_cpumask(tick_broadcast_mask);
+	return tick_broadcast_mask;
 }
 
 /*
@@ -74,7 +73,7 @@ int tick_check_broadcast_device(struct clock_event_device *dev)
 
 	clockevents_exchange_device(tick_broadcast_device.evtdev, dev);
 	tick_broadcast_device.evtdev = dev;
-	if (!cpumask_empty(tick_get_broadcast_mask()))
+	if (!cpumask_empty(tick_broadcast_mask))
 		tick_broadcast_start_periodic(dev);
 	return 1;
 }
@@ -123,7 +122,7 @@ int tick_device_uses_broadcast(struct clock_event_device *dev, int cpu)
 	if (!tick_device_is_functional(dev)) {
 		dev->event_handler = tick_handle_periodic;
 		tick_device_setup_broadcast_func(dev);
-		cpumask_set_cpu(cpu, tick_get_broadcast_mask());
+		cpumask_set_cpu(cpu, tick_broadcast_mask);
 		tick_broadcast_start_periodic(tick_broadcast_device.evtdev);
 		ret = 1;
 	} else {
@@ -134,7 +133,7 @@ int tick_device_uses_broadcast(struct clock_event_device *dev, int cpu)
 		 */
 		if (!(dev->features & CLOCK_EVT_FEAT_C3STOP)) {
 			int cpu = smp_processor_id();
-			cpumask_clear_cpu(cpu, tick_get_broadcast_mask());
+			cpumask_clear_cpu(cpu, tick_broadcast_mask);
 			tick_broadcast_clear_oneshot(cpu);
 		} else {
 			tick_device_setup_broadcast_func(dev);
@@ -198,9 +197,8 @@ static void tick_do_periodic_broadcast(void)
 {
 	raw_spin_lock(&tick_broadcast_lock);
 
-	cpumask_and(to_cpumask(tmpmask),
-		    cpu_online_mask, tick_get_broadcast_mask());
-	tick_do_broadcast(to_cpumask(tmpmask));
+	cpumask_and(tmpmask, cpu_online_mask, tick_broadcast_mask);
+	tick_do_broadcast(tmpmask);
 
 	raw_spin_unlock(&tick_broadcast_lock);
 }
@@ -263,13 +261,12 @@ static void tick_do_broadcast_on_off(unsigned long *reason)
 	if (!tick_device_is_functional(dev))
 		goto out;
 
-	bc_stopped = cpumask_empty(tick_get_broadcast_mask());
+	bc_stopped = cpumask_empty(tick_broadcast_mask);
 
 	switch (*reason) {
 	case CLOCK_EVT_NOTIFY_BROADCAST_ON:
 	case CLOCK_EVT_NOTIFY_BROADCAST_FORCE:
-		if (!cpumask_test_cpu(cpu, tick_get_broadcast_mask())) {
-			cpumask_set_cpu(cpu, tick_get_broadcast_mask());
+		if (!cpumask_test_and_set_cpu(cpu, tick_broadcast_mask)) {
 			if (tick_broadcast_device.mode ==
 			    TICKDEV_MODE_PERIODIC)
 				clockevents_shutdown(dev);
@@ -279,8 +276,7 @@ static void tick_do_broadcast_on_off(unsigned long *reason)
 		break;
 	case CLOCK_EVT_NOTIFY_BROADCAST_OFF:
 		if (!tick_broadcast_force &&
-		    cpumask_test_cpu(cpu, tick_get_broadcast_mask())) {
-			cpumask_clear_cpu(cpu, tick_get_broadcast_mask());
+		    cpumask_test_and_clear_cpu(cpu, tick_broadcast_mask)) {
 			if (tick_broadcast_device.mode ==
 			    TICKDEV_MODE_PERIODIC)
 				tick_setup_periodic(dev, 0);
@@ -288,7 +284,7 @@ static void tick_do_broadcast_on_off(unsigned long *reason)
 		break;
 	}
 
-	if (cpumask_empty(tick_get_broadcast_mask())) {
+	if (cpumask_empty(tick_broadcast_mask)) {
 		if (!bc_stopped)
 			clockevents_shutdown(bc);
 	} else if (bc_stopped) {
@@ -337,10 +333,10 @@ void tick_shutdown_broadcast(unsigned int *cpup)
 	raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
 
 	bc = tick_broadcast_device.evtdev;
-	cpumask_clear_cpu(cpu, tick_get_broadcast_mask());
+	cpumask_clear_cpu(cpu, tick_broadcast_mask);
 
 	if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC) {
-		if (bc && cpumask_empty(tick_get_broadcast_mask()))
+		if (bc && cpumask_empty(tick_broadcast_mask))
 			clockevents_shutdown(bc);
 	}
 
@@ -376,13 +372,13 @@ int tick_resume_broadcast(void)
 
 		switch (tick_broadcast_device.mode) {
 		case TICKDEV_MODE_PERIODIC:
-			if (!cpumask_empty(tick_get_broadcast_mask()))
+			if (!cpumask_empty(tick_broadcast_mask))
 				tick_broadcast_start_periodic(bc);
 			broadcast = cpumask_test_cpu(smp_processor_id(),
-						     tick_get_broadcast_mask());
+						     tick_broadcast_mask);
 			break;
 		case TICKDEV_MODE_ONESHOT:
-			if (!cpumask_empty(tick_get_broadcast_mask()))
+			if (!cpumask_empty(tick_broadcast_mask))
 				broadcast = tick_resume_broadcast_oneshot(bc);
 			break;
 		}
@@ -395,15 +391,14 @@ int tick_resume_broadcast(void)
 
 #ifdef CONFIG_TICK_ONESHOT
 
-/* FIXME: use cpumask_var_t. */
-static DECLARE_BITMAP(tick_broadcast_oneshot_mask, NR_CPUS);
+static cpumask_var_t tick_broadcast_oneshot_mask;
 
 /*
  * Exposed for debugging: see timer_list.c
  */
 struct cpumask *tick_get_broadcast_oneshot_mask(void)
 {
-	return to_cpumask(tick_broadcast_oneshot_mask);
+	return tick_broadcast_oneshot_mask;
 }
 
 static int tick_broadcast_set_event(ktime_t expires, int force)
@@ -428,7 +423,7 @@ int tick_resume_broadcast_oneshot(struct clock_event_device *bc)
  */
 void tick_check_oneshot_broadcast(int cpu)
 {
-	if (cpumask_test_cpu(cpu, to_cpumask(tick_broadcast_oneshot_mask))) {
+	if (cpumask_test_cpu(cpu, tick_broadcast_oneshot_mask)) {
 		struct tick_device *td = &per_cpu(tick_cpu_device, cpu);
 
 		clockevents_set_mode(td->evtdev, CLOCK_EVT_MODE_ONESHOT);
@@ -448,13 +443,13 @@ static void tick_handle_oneshot_broadcast(struct clock_event_device *dev)
 again:
 	dev->next_event.tv64 = KTIME_MAX;
 	next_event.tv64 = KTIME_MAX;
-	cpumask_clear(to_cpumask(tmpmask));
+	cpumask_clear(tmpmask);
 	now = ktime_get();
 	/* Find all expired events */
-	for_each_cpu(cpu, tick_get_broadcast_oneshot_mask()) {
+	for_each_cpu(cpu, tick_broadcast_oneshot_mask) {
 		td = &per_cpu(tick_cpu_device, cpu);
 		if (td->evtdev->next_event.tv64 <= now.tv64)
-			cpumask_set_cpu(cpu, to_cpumask(tmpmask));
+			cpumask_set_cpu(cpu, tmpmask);
 		else if (td->evtdev->next_event.tv64 < next_event.tv64)
 			next_event.tv64 = td->evtdev->next_event.tv64;
 	}
@@ -462,7 +457,7 @@ again:
 	/*
 	 * Wakeup the cpus which have an expired event.
 	 */
-	tick_do_broadcast(to_cpumask(tmpmask));
+	tick_do_broadcast(tmpmask);
 
 	/*
 	 * Two reasons for reprogram:
@@ -518,16 +513,13 @@ void tick_broadcast_oneshot_control(unsigned long reason)
 
 	raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
 	if (reason == CLOCK_EVT_NOTIFY_BROADCAST_ENTER) {
-		if (!cpumask_test_cpu(cpu, tick_get_broadcast_oneshot_mask())) {
-			cpumask_set_cpu(cpu, tick_get_broadcast_oneshot_mask());
+		if (!cpumask_test_and_set_cpu(cpu, tick_broadcast_oneshot_mask)) {
 			clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
 			if (dev->next_event.tv64 < bc->next_event.tv64)
 				tick_broadcast_set_event(dev->next_event, 1);
 		}
 	} else {
-		if (cpumask_test_cpu(cpu, tick_get_broadcast_oneshot_mask())) {
-			cpumask_clear_cpu(cpu,
-					  tick_get_broadcast_oneshot_mask());
+		if (cpumask_test_and_clear_cpu(cpu, tick_broadcast_oneshot_mask)) {
 			clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT);
 			if (dev->next_event.tv64 != KTIME_MAX)
 				tick_program_event(dev->next_event, 1);
@@ -543,7 +535,7 @@ void tick_broadcast_oneshot_control(unsigned long reason)
  */
 static void tick_broadcast_clear_oneshot(int cpu)
 {
-	cpumask_clear_cpu(cpu, tick_get_broadcast_oneshot_mask());
+	cpumask_clear_cpu(cpu, tick_broadcast_oneshot_mask);
 }
 
 static void tick_broadcast_init_next_event(struct cpumask *mask,
@@ -581,15 +573,14 @@ void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
 		 * oneshot_mask bits for those and program the
 		 * broadcast device to fire.
 		 */
-		cpumask_copy(to_cpumask(tmpmask), tick_get_broadcast_mask());
-		cpumask_clear_cpu(cpu, to_cpumask(tmpmask));
-		cpumask_or(tick_get_broadcast_oneshot_mask(),
-			   tick_get_broadcast_oneshot_mask(),
-			   to_cpumask(tmpmask));
+		cpumask_copy(tmpmask, tick_broadcast_mask);
+		cpumask_clear_cpu(cpu, tmpmask);
+		cpumask_or(tick_broadcast_oneshot_mask,
+			   tick_broadcast_oneshot_mask, tmpmask);
 
-		if (was_periodic && !cpumask_empty(to_cpumask(tmpmask))) {
+		if (was_periodic && !cpumask_empty(tmpmask)) {
 			clockevents_set_mode(bc, CLOCK_EVT_MODE_ONESHOT);
-			tick_broadcast_init_next_event(to_cpumask(tmpmask),
+			tick_broadcast_init_next_event(tmpmask,
 						       tick_next_period);
 			tick_broadcast_set_event(tick_next_period, 1);
 		} else
@@ -639,7 +630,7 @@ void tick_shutdown_broadcast_oneshot(unsigned int *cpup)
 	 * Clear the broadcast mask flag for the dead cpu, but do not
 	 * stop the broadcast device!
 	 */
-	cpumask_clear_cpu(cpu, tick_get_broadcast_oneshot_mask());
+	cpumask_clear_cpu(cpu, tick_broadcast_oneshot_mask);
 
 	raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
 }
@@ -663,3 +654,12 @@ bool tick_broadcast_oneshot_available(void)
 }
 
 #endif
+
+void __init tick_broadcast_init(void)
+{
+	alloc_cpumask_var(&tick_broadcast_mask, GFP_NOWAIT);
+	alloc_cpumask_var(&tmpmask, GFP_NOWAIT);
+#ifdef CONFIG_TICK_ONESHOT
+	alloc_cpumask_var(&tick_broadcast_oneshot_mask, GFP_NOWAIT);
+#endif
+}
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index b1600a6..74413e3 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -416,4 +416,5 @@ static struct notifier_block tick_notifier = {
 void __init tick_init(void)
 {
 	clockevents_register_notifier(&tick_notifier);
+	tick_broadcast_init();
 }
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index cf3e59e..46d9bd0 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -94,7 +94,7 @@ extern void tick_broadcast_on_off(unsigned long reason, int *oncpu);
 extern void tick_shutdown_broadcast(unsigned int *cpup);
 extern void tick_suspend_broadcast(void);
 extern int tick_resume_broadcast(void);
-
+extern void tick_broadcast_init(void);
 extern void
 tick_set_periodic_handler(struct clock_event_device *dev, int broadcast);
 
@@ -119,6 +119,7 @@ static inline void tick_broadcast_on_off(unsigned long reason, int *oncpu) { }
 static inline void tick_shutdown_broadcast(unsigned int *cpup) { }
 static inline void tick_suspend_broadcast(void) { }
 static inline int tick_resume_broadcast(void) { return 0; }
+static inline void tick_broadcast_init(void) { }
 
 /*
  * Set the periodic handler in non broadcast mode


* Re: [patch 0/7] tick: Optimize broadcast handling and prevent expiry ping pong
From: Santosh Shilimkar @ 2013-03-13  9:35 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, LAK, John Stultz, Arjan van de Veen, Lorenzo Pieralisi, Jason Liu

Thomas,

On Wednesday 06 March 2013 04:48 PM, Thomas Gleixner wrote:
> Jason decoded a problem related to the broadcast timer mode. The
> reprogramming of the cpu local timer causes a huge number of
> retries. There is also a situation where a CPU which does not handle
> the broadcast timer interrupt exits and reenters broadcast mode before
> the broadcast interrupt has been handled by another CPU. This can lead
> to an interesting ping-pong between the broadcast and the cpu local
> timer code.
> 
> This series addresses these problems. The first two patches convert
> the broadcast code to proper cpumask_var_t handling instead of adding
> more bitmaps later.
> 
> The rest of the series is adapted from the quick patches which I
> posted earlier while discussing the issue with Jason et al.
> 
> Please give it proper testing on your affected hardware.
> 
I have tested these revised patches on OMAP4 and OMAP5 platforms
with CPUidle enabled against 3.9-rc2. As expected they seem to
work without any issue and also fix the reported retry issue.

Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>



* Re: [patch 4/7] tick: Handle broadcast wakeup of multiple cpus
From: Lorenzo Pieralisi @ 2013-03-13 11:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, LAK, John Stultz, Arjan van de Veen, Santosh Shilimkar, Jason Liu

Hi Thomas,

On Wed, Mar 06, 2013 at 11:18:35AM +0000, Thomas Gleixner wrote:
> Some brilliant hardware implementations wake multiple cores when the
> broadcast timer fires. This leads to the following interesting
> problem:
> 
> CPU0				CPU1
> wakeup from idle		wakeup from idle
> 
> leave broadcast mode		leave broadcast mode
>  restart per cpu timer		 restart per cpu timer
> 				go back to idle
> handle broadcast
>  (empty mask)
> 				enter broadcast mode
> 				program broadcast device
> enter broadcast mode
> program broadcast device
> 
> So what happens is that due to the forced reprogramming of the cpu
> local timer, we need to set an event in the future. Now if we manage
> to go back to idle before the timer fires, we switch off the timer and
> arm the broadcast device with an already expired time (covered by
> forced mode). So in the worst case we repeat the above ping-pong
> forever.
> 
> Unfortunately we have no information about what caused the wakeup, but
> we can check the current time against the expiry time of the local
> cpu. If the local event is already in the past, we know that the
> broadcast timer is about to fire and send an IPI. So we mark ourselves
> as an IPI target even if we left broadcast mode and avoid the
> reprogramming of the local cpu timer.
> 
> This still leaves the possibility that a CPU which is not handling the
> broadcast interrupt is going to reach idle again before the IPI
> arrives. This can't be solved in the core code and will be handled in
> follow-up patches.
> 
> Reported-by: Jason Liu <liu.h.jason@gmail.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  kernel/time/tick-broadcast.c |   59 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 58 insertions(+), 1 deletion(-)
> 
> Index: tip/kernel/time/tick-broadcast.c
> ===================================================================
> --- tip.orig/kernel/time/tick-broadcast.c
> +++ tip/kernel/time/tick-broadcast.c
> @@ -393,6 +393,7 @@ int tick_resume_broadcast(void)
>  
>  static cpumask_var_t tick_broadcast_oneshot_mask;
>  static cpumask_var_t tick_broadcast_pending_mask;
> +static cpumask_var_t tick_broadcast_force_mask;
>  
>  /*
>   * Exposed for debugging: see timer_list.c
> @@ -462,6 +463,10 @@ again:
>  		}
>  	}
>  
> +	/* Take care of enforced broadcast requests */
> +	cpumask_or(tmpmask, tmpmask, tick_broadcast_force_mask);
> +	cpumask_clear(tick_broadcast_force_mask);

I tested the set and it works fine on a dual-cluster big.LITTLE test
chip using the broadcast timer to manage deep idle cluster states.

Just one question: the force mask is cleared before sending the timer
IPI. Would it not be better to clear it after the IPI is sent in
tick_do_broadcast(...)?

Can you spot a regression if we do this? The idle thread checks that
mask with irqs disabled, so it is possible that we clear the mask before
the CPU has a chance to get the IPI. If we clear the mask after sending
the IPI, we increase the chances of the idle thread seeing it.

It is just a further optimization, just asking, thanks.
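
For concreteness, the ordering in question (an editorial sketch of the hunk
quoted above, with the window spelled out):

/* tick_handle_oneshot_broadcast(): */
cpumask_or(tmpmask, tmpmask, tick_broadcast_force_mask);
cpumask_clear(tick_broadcast_force_mask);	/* bit cleared here ... */
tick_do_broadcast(tmpmask);			/* ... IPI sent here */

/*
 * A CPU polling tick_check_broadcast_expired() with irqs disabled can
 * observe the cleared bit in that gap and start the deep idle
 * transition anyway; the IPI still wakes it, so the check is a lost
 * optimization there, not a correctness problem. Clearing after
 * tick_do_broadcast() would narrow the window, at the cost of the
 * force bits staying set while the IPIs are sent.
 */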

> +
>  	/*
>  	 * Wakeup the cpus which have an expired event.
>  	 */
> @@ -497,6 +502,7 @@ void tick_broadcast_oneshot_control(unsi
>  	struct clock_event_device *bc, *dev;
>  	struct tick_device *td;
>  	unsigned long flags;
> +	ktime_t now;
>  	int cpu;
>  
>  	/*
> @@ -524,7 +530,16 @@ void tick_broadcast_oneshot_control(unsi
>  		WARN_ON_ONCE(cpumask_test_cpu(cpu, tick_broadcast_pending_mask));
>  		if (!cpumask_test_and_set_cpu(cpu, tick_broadcast_oneshot_mask)) {
>  			clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
> -			if (dev->next_event.tv64 < bc->next_event.tv64)
> +			/*
> +			 * We only reprogram the broadcast timer if we
> +			 * did not mark ourselves in the force mask and
> +			 * if the cpu local event is earlier than the
> +			 * broadcast event. If the current CPU is in
> +			 * the force mask, then we are going to be
> +			 * woken by the IPI right away.
> +			 */
> +			if (!cpumask_test_cpu(cpu, tick_broadcast_force_mask) &&
Is the test against tick_broadcast_force_mask necessary if we add the check
in the idle thread before entering idle? It does not hurt, agreed, and we
had better leave it there; this is just for my own understanding, thanks a lot.

Having said that, on the series:

Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

> +			    dev->next_event.tv64 < bc->next_event.tv64)
>  				tick_broadcast_set_event(dev->next_event, 1);
>  		}
>  	} else {
> @@ -545,6 +560,47 @@ void tick_broadcast_oneshot_control(unsi
>  				       tick_broadcast_pending_mask))
>  				goto out;
>  
> +			/*
> +			 * If the pending bit is not set, then we are
> +			 * either the CPU handling the broadcast
> +			 * interrupt or we got woken by something else.
> +			 *
> +			 * We are no longer in the broadcast mask, so
> +			 * if the cpu local expiry time is already
> +			 * reached, we would reprogram the cpu local
> +			 * timer with an already expired event.
> +			 *
> +			 * This can lead to a ping-pong when we return
> +			 * to idle and therefore rearm the broadcast
> +			 * timer before the cpu local timer was able
> +			 * to fire. This happens because the forced
> +			 * reprogramming makes sure that the event
> +			 * will happen in the future and depending on
> +			 * the min_delta setting this might be far
> +			 * enough out that the ping-pong starts.
> +			 *
> +			 * If the cpu local next_event has expired
> +			 * then we know that the broadcast timer
> +			 * next_event has expired as well and
> +			 * broadcast is about to be handled. So we
> +			 * avoid reprogramming and enforce that the
> +			 * broadcast handler, which did not run yet,
> +			 * will invoke the cpu local handler.
> +			 *
> +			 * We cannot call the handler directly from
> +			 * here, because we might be in a NOHZ phase
> +			 * and we did not go through the irq_enter()
> +			 * nohz fixups.
> +			 */
> +			now = ktime_get();
> +			if (dev->next_event.tv64 <= now.tv64) {
> +				cpumask_set_cpu(cpu, tick_broadcast_force_mask);
> +				goto out;
> +			}
> +			/*
> +			 * We got woken by something else. Reprogram
> +			 * the cpu local timer device.
> +			 */
>  			tick_program_event(dev->next_event, 1);
>  		}
>  	}
> @@ -686,5 +742,6 @@ void tick_broadcast_init(void)
>  #ifdef CONFIG_TICK_ONESHOT
>  	alloc_cpumask_var(&tick_broadcast_oneshot_mask, GFP_NOWAIT);
>  	alloc_cpumask_var(&tick_broadcast_pending_mask, GFP_NOWAIT);
> +	alloc_cpumask_var(&tick_broadcast_force_mask, GFP_NOWAIT);
>  #endif
>  }
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch 4/7] tick: Handle broadcast wakeup of multiple cpus
  2013-03-13 11:36     ` Lorenzo Pieralisi
@ 2013-03-13 21:42       ` Thomas Gleixner
  -1 siblings, 0 replies; 31+ messages in thread
From: Thomas Gleixner @ 2013-03-13 21:42 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: LKML, LAK, John Stultz, Arjan van de Veen, Santosh Shilimkar, Jason Liu

On Wed, 13 Mar 2013, Lorenzo Pieralisi wrote:
> > +	/* Take care of enforced broadcast requests */
> > +	cpumask_or(tmpmask, tmpmask, tick_broadcast_force_mask);
> > +	cpumask_clear(tick_broadcast_force_mask);
> 
> I tested the set and it works fine on a dual cluster big.LITTLE testchip
> using broadcast timer to manage deep idle cluster states.
> 
> Just asking a question: the force mask is cleared before sending the
> timer IPI. Would not be better to clear it after the IPI is sent in
> 
> tick_do_broadcast(...) ?
> 
> Can you spot a regression if we do this ? The idle thread checks that
> mask with irqs disabled, so it is possible that we clear the mask before
> the CPU has a chance to get the IPI. If we clear the mask after sending
> the IPI, we are increasing the chances for the idle thread to get it.
> 
> It is just a further optimization, just asking, thanks.

Need to think about that.
 
> > @@ -524,7 +530,16 @@ void tick_broadcast_oneshot_control(unsi
> >  		WARN_ON_ONCE(cpumask_test_cpu(cpu, tick_broadcast_pending_mask));
> >  		if (!cpumask_test_and_set_cpu(cpu, tick_broadcast_oneshot_mask)) {
> >  			clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
> > -			if (dev->next_event.tv64 < bc->next_event.tv64)
> > +			/*
> > +			 * We only reprogram the broadcast timer if we
> > +			 * did not mark ourselves in the force mask and
> > +			 * if the cpu local event is earlier than the
> > +			 * broadcast event. If the current CPU is in
> > +			 * the force mask, then we are going to be
> > +			 * woken by the IPI right away.
> > +			 */
> > +			if (!cpumask_test_cpu(cpu, tick_broadcast_force_mask) &&
> Is the test against tick_broadcast_force_mask necessary if we add the check
> in the idle thread before entering idle ? It does not hurt, agreed, and we'd
> better leave it there, it is just for my own understanding, thanks a lot.

Well, it's necessary for all archs which do not have the check (yet).
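
Roughly this in the arch idle loop, with interrupts disabled (untested
sketch; deep_idle() is a placeholder for the arch specific idle hook,
patches 6/7 and 7/7 wire up the real thing for arm and x86):

	if (tick_check_broadcast_expired()) {
		/* The broadcast IPI is imminent. No point in deep idle */
		local_irq_enable();
		cpu_relax();
	} else {
		deep_idle();
	}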
 
> Having said that, on the series:
> 
> Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [tip:timers/core] tick: Avoid programming the local cpu timer if broadcast pending
  2013-03-06 11:18   ` Thomas Gleixner
  (?)
@ 2013-03-19 11:35   ` tip-bot for Thomas Gleixner
  -1 siblings, 0 replies; 31+ messages in thread
From: tip-bot for Thomas Gleixner @ 2013-03-19 11:35 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, linux-arm-kernel, santosh.shilimkar,
	john.stultz, arjan, liu.h.jason, tglx, lorenzo.pieralisi

Commit-ID:  26517f3e99248668315aee9460dcea21628cdd7f
Gitweb:     http://git.kernel.org/tip/26517f3e99248668315aee9460dcea21628cdd7f
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Wed, 6 Mar 2013 11:18:35 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 13 Mar 2013 11:39:39 +0100

tick: Avoid programming the local cpu timer if broadcast pending

If the local cpu timer stops in deep idle, we arm the broadcast device
and get woken by an IPI. Now when we return from deep idle we reenable
the local cpu timer unconditionally before handling the IPI. But
that's a pointless exercise: the timer has already expired and the IPI
is on the way. And it's an expensive exercise as we use the forced
reprogramming mode so that we do not lose a timer event. This forced
reprogramming will loop at least once through the retry path.

To avoid this reprogramming, we mark the cpu in a pending bit mask
before we send the IPI. Now when the IPI target cpu wakes up, it will
see the pending bit set and skip the reprogramming. The reprogramming
of the cpu local timer will happen in the IPI handler which runs the
cpu local timer interrupt function.

Reported-by: Jason Liu <liu.h.jason@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: LAK <linux-arm-kernel@lists.infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Arjan van de Veen <arjan@infradead.org>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Link: http://lkml.kernel.org/r/20130306111537.431082074@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/tick-broadcast.c | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 380910d..005c0ca 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -392,6 +392,7 @@ int tick_resume_broadcast(void)
 #ifdef CONFIG_TICK_ONESHOT
 
 static cpumask_var_t tick_broadcast_oneshot_mask;
+static cpumask_var_t tick_broadcast_pending_mask;
 
 /*
  * Exposed for debugging: see timer_list.c
@@ -470,6 +471,12 @@ again:
 		td = &per_cpu(tick_cpu_device, cpu);
 		if (td->evtdev->next_event.tv64 <= now.tv64) {
 			cpumask_set_cpu(cpu, tmpmask);
+			/*
+			 * Mark the remote cpu in the pending mask, so
+			 * it can avoid reprogramming the cpu local
+			 * timer in tick_broadcast_oneshot_control().
+			 */
+			cpumask_set_cpu(cpu, tick_broadcast_pending_mask);
 		} else if (td->evtdev->next_event.tv64 < next_event.tv64) {
 			next_event.tv64 = td->evtdev->next_event.tv64;
 			next_cpu = cpu;
@@ -535,6 +542,7 @@ void tick_broadcast_oneshot_control(unsigned long reason)
 
 	raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
 	if (reason == CLOCK_EVT_NOTIFY_BROADCAST_ENTER) {
+		WARN_ON_ONCE(cpumask_test_cpu(cpu, tick_broadcast_pending_mask));
 		if (!cpumask_test_and_set_cpu(cpu, tick_broadcast_oneshot_mask)) {
 			clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
 			if (dev->next_event.tv64 < bc->next_event.tv64)
@@ -543,10 +551,25 @@ void tick_broadcast_oneshot_control(unsigned long reason)
 	} else {
 		if (cpumask_test_and_clear_cpu(cpu, tick_broadcast_oneshot_mask)) {
 			clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT);
-			if (dev->next_event.tv64 != KTIME_MAX)
-				tick_program_event(dev->next_event, 1);
+			if (dev->next_event.tv64 == KTIME_MAX)
+				goto out;
+			/*
+			 * The cpu which was handling the broadcast
+			 * timer marked this cpu in the broadcast
+			 * pending mask and fired the broadcast
+			 * IPI. So we are going to handle the expired
+			 * event anyway via the broadcast IPI
+			 * handler. No need to reprogram the timer
+			 * with an already expired event.
+			 */
+			if (cpumask_test_and_clear_cpu(cpu,
+				       tick_broadcast_pending_mask))
+				goto out;
+
+			tick_program_event(dev->next_event, 1);
 		}
 	}
+out:
 	raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
 }
 
@@ -683,5 +706,6 @@ void __init tick_broadcast_init(void)
 	alloc_cpumask_var(&tmpmask, GFP_NOWAIT);
 #ifdef CONFIG_TICK_ONESHOT
 	alloc_cpumask_var(&tick_broadcast_oneshot_mask, GFP_NOWAIT);
+	alloc_cpumask_var(&tick_broadcast_pending_mask, GFP_NOWAIT);
 #endif
 }

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [tip:timers/core] tick: Handle broadcast wakeup of multiple cpus
  2013-03-06 11:18   ` Thomas Gleixner
  (?)
  (?)
@ 2013-03-19 11:37   ` tip-bot for Thomas Gleixner
  -1 siblings, 0 replies; 31+ messages in thread
From: tip-bot for Thomas Gleixner @ 2013-03-19 11:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, linux-arm-kernel, santosh.shilimkar,
	john.stultz, arjan, liu.h.jason, tglx, lorenzo.pieralisi

Commit-ID:  989dcb645ca715129c5a2b39102c8334a20d9615
Gitweb:     http://git.kernel.org/tip/989dcb645ca715129c5a2b39102c8334a20d9615
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Wed, 6 Mar 2013 11:18:35 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 13 Mar 2013 11:39:39 +0100

tick: Handle broadcast wakeup of multiple cpus

Some brilliant hardware implementations wake multiple cores when the
broadcast timer fires. This leads to the following interesting
problem:

CPU0				CPU1
wakeup from idle		wakeup from idle

leave broadcast mode		leave broadcast mode
 restart per cpu timer		 restart per cpu timer
 	     	 		go back to idle
handle broadcast
 (empty mask)			
				enter broadcast mode
				program broadcast device
enter broadcast mode
program broadcast device

So what happens is that due to the forced reprogramming of the cpu
local timer, we need to set an event in the future. Now if we manage to
go back to idle before the timer fires, we switch off the timer and
arm the broadcast device with an already expired time (covered by
forced mode). So in the worst case we repeat the above ping-pong
forever.

Unfortunately we have no information about what caused the wakeup, but
we can check the current time against the expiry time of the local cpu. If
the local event is already in the past, we know that the broadcast
timer is about to fire and send an IPI. So we mark ourselves as an IPI
target even if we left broadcast mode and avoid the reprogramming of
the local cpu timer.

This still leaves the possibility that a CPU which is not handling the
broadcast interrupt is going to reach idle again before the IPI
arrives. This can't be solved in the core code and will be handled in
follow up patches.

Reported-by: Jason Liu <liu.h.jason@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: LAK <linux-arm-kernel@lists.infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Arjan van de Veen <arjan@infradead.org>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Link: http://lkml.kernel.org/r/20130306111537.492045206@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/tick-broadcast.c | 59 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 58 insertions(+), 1 deletion(-)

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 005c0ca..2100aad 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -393,6 +393,7 @@ int tick_resume_broadcast(void)
 
 static cpumask_var_t tick_broadcast_oneshot_mask;
 static cpumask_var_t tick_broadcast_pending_mask;
+static cpumask_var_t tick_broadcast_force_mask;
 
 /*
  * Exposed for debugging: see timer_list.c
@@ -483,6 +484,10 @@ again:
 		}
 	}
 
+	/* Take care of enforced broadcast requests */
+	cpumask_or(tmpmask, tmpmask, tick_broadcast_force_mask);
+	cpumask_clear(tick_broadcast_force_mask);
+
 	/*
 	 * Wakeup the cpus which have an expired event.
 	 */
@@ -518,6 +523,7 @@ void tick_broadcast_oneshot_control(unsigned long reason)
 	struct clock_event_device *bc, *dev;
 	struct tick_device *td;
 	unsigned long flags;
+	ktime_t now;
 	int cpu;
 
 	/*
@@ -545,7 +551,16 @@ void tick_broadcast_oneshot_control(unsigned long reason)
 		WARN_ON_ONCE(cpumask_test_cpu(cpu, tick_broadcast_pending_mask));
 		if (!cpumask_test_and_set_cpu(cpu, tick_broadcast_oneshot_mask)) {
 			clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
-			if (dev->next_event.tv64 < bc->next_event.tv64)
+			/*
+			 * We only reprogram the broadcast timer if we
+			 * did not mark ourselves in the force mask and
+			 * if the cpu local event is earlier than the
+			 * broadcast event. If the current CPU is in
+			 * the force mask, then we are going to be
+			 * woken by the IPI right away.
+			 */
+			if (!cpumask_test_cpu(cpu, tick_broadcast_force_mask) &&
+			    dev->next_event.tv64 < bc->next_event.tv64)
 				tick_broadcast_set_event(bc, cpu, dev->next_event, 1);
 		}
 	} else {
@@ -566,6 +581,47 @@ void tick_broadcast_oneshot_control(unsigned long reason)
 				       tick_broadcast_pending_mask))
 				goto out;
 
+			/*
+			 * If the pending bit is not set, then we are
+			 * either the CPU handling the broadcast
+			 * interrupt or we got woken by something else.
+			 *
+			 * We are no longer in the broadcast mask, so
+			 * if the cpu local expiry time is already
+			 * reached, we would reprogram the cpu local
+			 * timer with an already expired event.
+			 *
+			 * This can lead to a ping-pong when we return
+			 * to idle and therefore rearm the broadcast
+			 * timer before the cpu local timer was able
+			 * to fire. This happens because the forced
+			 * reprogramming makes sure that the event
+			 * will happen in the future and depending on
+			 * the min_delta setting this might be far
+			 * enough out that the ping-pong starts.
+			 *
+			 * If the cpu local next_event has expired
+			 * then we know that the broadcast timer
+			 * next_event has expired as well and
+			 * broadcast is about to be handled. So we
+			 * avoid reprogramming and enforce that the
+			 * broadcast handler, which did not run yet,
+			 * will invoke the cpu local handler.
+			 *
+			 * We cannot call the handler directly from
+			 * here, because we might be in a NOHZ phase
+			 * and we did not go through the irq_enter()
+			 * nohz fixups.
+			 */
+			now = ktime_get();
+			if (dev->next_event.tv64 <= now.tv64) {
+				cpumask_set_cpu(cpu, tick_broadcast_force_mask);
+				goto out;
+			}
+			/*
+			 * We got woken by something else. Reprogram
+			 * the cpu local timer device.
+			 */
 			tick_program_event(dev->next_event, 1);
 		}
 	}
@@ -707,5 +763,6 @@ void __init tick_broadcast_init(void)
 #ifdef CONFIG_TICK_ONESHOT
 	alloc_cpumask_var(&tick_broadcast_oneshot_mask, GFP_NOWAIT);
 	alloc_cpumask_var(&tick_broadcast_pending_mask, GFP_NOWAIT);
+	alloc_cpumask_var(&tick_broadcast_force_mask, GFP_NOWAIT);
 #endif
 }

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [tip:timers/core] tick: Provide a check for a forced broadcast pending
  2013-03-06 11:18   ` Thomas Gleixner
  (?)
@ 2013-03-19 11:38   ` tip-bot for Thomas Gleixner
  -1 siblings, 0 replies; 31+ messages in thread
From: tip-bot for Thomas Gleixner @ 2013-03-19 11:38 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, linux-arm-kernel, santosh.shilimkar,
	john.stultz, arjan, liu.h.jason, tglx, lorenzo.pieralisi

Commit-ID:  eaa907c546f76222227dfc41784b22588af1e3d7
Gitweb:     http://git.kernel.org/tip/eaa907c546f76222227dfc41784b22588af1e3d7
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Wed, 6 Mar 2013 11:18:36 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 13 Mar 2013 11:39:39 +0100

tick: Provide a check for a forced broadcast pending

On the CPU which gets woken along with the target CPU of the broadcast
the following happens:

  deep_idle()
			<-- spurious wakeup
  broadcast_exit()
    set forced bit
  
  enable interrupts
    
			<-- Nothing happens

  disable interrupts

  broadcast_enter()
			<-- Here we observe the forced bit is set
  deep_idle()

Now after that the target CPU of the broadcast runs the broadcast
handler and finds the other CPU in both the broadcast and the forced
mask, sends the IPI and stuff gets back to normal.

So it's not actually harmful, just more evidence for the theory that
hardware designers have access to very special drug supplies.

Now there is no point in going back to deep idle just to wake up again
right away via an IPI. Provide a check which allows the idle code to
avoid the deep idle transition.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: LAK <linux-arm-kernel@lists.infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Arjan van de Veen <arjan@infradead.org>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Jason Liu <liu.h.jason@gmail.com>
Link: http://lkml.kernel.org/r/20130306111537.565418308@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 include/linux/clockchips.h   |  6 ++++++
 kernel/time/tick-broadcast.c | 12 ++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 494d33e..646aac1 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -175,6 +175,12 @@ extern void tick_broadcast(const struct cpumask *mask);
 extern int tick_receive_broadcast(void);
 #endif
 
+#if defined(CONFIG_GENERIC_CLOCKEVENTS_BROADCAST) && defined(CONFIG_TICK_ONESHOT)
+extern int tick_check_broadcast_expired(void);
+#else
+static inline int tick_check_broadcast_expired(void) { return 0; }
+#endif
+
 #ifdef CONFIG_GENERIC_CLOCKEVENTS
 extern void clockevents_notify(unsigned long reason, void *arg);
 #else
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 2100aad..d76d816 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -404,6 +404,18 @@ struct cpumask *tick_get_broadcast_oneshot_mask(void)
 }
 
 /*
+ * Called before going idle with interrupts disabled. Checks whether a
+ * broadcast event from the other core is about to happen. We detected
+ * that in tick_broadcast_oneshot_control(). The callsite can use this
+ * to avoid a deep idle transition as we are about to get the
+ * broadcast IPI right away.
+ */
+int tick_check_broadcast_expired(void)
+{
+	return cpumask_test_cpu(smp_processor_id(), tick_broadcast_force_mask);
+}
+
+/*
  * Set broadcast interrupt affinity
  */
 static void tick_broadcast_set_affinity(struct clock_event_device *bc,

^ permalink raw reply related	[flat|nested] 31+ messages in thread
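
For illustration, a hypothetical caller outside the two arch patches
which follow, e.g. a platform idle driver, could consume the new check
like this (sketch only; platform_enter_deep_idle() is a made-up
placeholder, and the clockevents_notify() calls follow the existing
broadcast enter/exit convention):

	int cpu = smp_processor_id();

	/*
	 * Interrupts are off. If the broadcast IPI for this CPU is
	 * already on its way, a deep idle transition is pointless.
	 */
	if (tick_check_broadcast_expired()) {
		local_irq_enable();
		cpu_relax();
		return;
	}

	clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
	platform_enter_deep_idle();
	clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);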

* [tip:timers/core] arm: Use tick broadcast expired check
  2013-03-06 11:18   ` Thomas Gleixner
  (?)
@ 2013-03-19 11:39   ` tip-bot for Thomas Gleixner
  -1 siblings, 0 replies; 31+ messages in thread
From: tip-bot for Thomas Gleixner @ 2013-03-19 11:39 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, linux-arm-kernel, santosh.shilimkar,
	john.stultz, arjan, liu.h.jason, tglx, lorenzo.pieralisi

Commit-ID:  80bbe9f273a38f83ecfc50fe384a57f8428887bd
Gitweb:     http://git.kernel.org/tip/80bbe9f273a38f83ecfc50fe384a57f8428887bd
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Wed, 6 Mar 2013 11:18:37 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 13 Mar 2013 11:39:40 +0100

arm: Use tick broadcast expired check

Avoid going back into deep idle if the tick broadcast IPI is about to
fire.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: LAK <linux-arm-kernel@lists.infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Arjan van de Veen <arjan@infradead.org>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Jason Liu <liu.h.jason@gmail.com>
Link: http://lkml.kernel.org/r/20130306111537.640722922@linutronix.de

---
 arch/arm/kernel/process.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 047d3e4..db4ffd0 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -199,7 +199,16 @@ void cpu_idle(void)
 #ifdef CONFIG_PL310_ERRATA_769419
 			wmb();
 #endif
-			if (hlt_counter) {
+			/*
+			 * In poll mode we reenable interrupts and spin.
+			 *
+			 * Also if we detected in the wakeup from idle
+			 * path that the tick broadcast device expired
+			 * for us, we don't want to go deep idle as we
+			 * know that the IPI is going to arrive right
+			 * away.
+			 */
+			if (hlt_counter || tick_check_broadcast_expired()) {
 				local_irq_enable();
 				cpu_relax();
 			} else if (!need_resched()) {

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [tip:timers/core] x86: Use tick broadcast expired check
  2013-03-06 11:18   ` Thomas Gleixner
  (?)
@ 2013-03-19 11:40   ` tip-bot for Thomas Gleixner
  -1 siblings, 0 replies; 31+ messages in thread
From: tip-bot for Thomas Gleixner @ 2013-03-19 11:40 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, arjan, john.stultz, hpa, mingo, tglx

Commit-ID:  35b61edb41ffee58711850e76215b852386ddb10
Gitweb:     http://git.kernel.org/tip/35b61edb41ffee58711850e76215b852386ddb10
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Wed, 6 Mar 2013 11:18:37 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 13 Mar 2013 11:39:40 +0100

x86: Use tick broadcast expired check

Avoid going back into deep idle if the tick broadcast IPI is about to
fire.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Arjan van de Veen <arjan@infradead.org>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20130306111537.702278273@linutronix.de
---
 arch/x86/kernel/process.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 14ae100..aa524da 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -336,6 +336,18 @@ void cpu_idle(void)
 			local_touch_nmi();
 			local_irq_disable();
 
+			/*
+			 * We detected in the wakeup path that the
+			 * tick broadcast device expired for us, but
+			 * we raced with the other CPU and came back
+			 * here before it was able to fire the IPI.
+			 * No point in going idle.
+			 */
+			if (tick_check_broadcast_expired()) {
+				local_irq_enable();
+				continue;
+			}
+
 			enter_idle();
 
 			/* Don't trace irqs off for idle */

^ permalink raw reply related	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2013-03-19 11:41 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-06 11:18 [patch 0/7] tick: Optimize broadcast handling and prevent expiry ping pong Thomas Gleixner
2013-03-06 11:18 ` Thomas Gleixner
2013-03-06 11:18 ` [patch 1/7] tick: Call tick_init late Thomas Gleixner
2013-03-06 11:18   ` Thomas Gleixner
2013-03-07 16:29   ` [tip:timers/core] " tip-bot for Thomas Gleixner
2013-03-06 11:18 ` [patch 2/7] tick: Convert broadcast cpu bitmaps to cpumask_var_t Thomas Gleixner
2013-03-06 11:18   ` Thomas Gleixner
2013-03-07  5:51   ` Rusty Russell
2013-03-07  5:51     ` Rusty Russell
2013-03-07 16:30   ` [tip:timers/core] " tip-bot for Thomas Gleixner
2013-03-06 11:18 ` [patch 3/7] tick: Avoid programming the local cpu timer if broadcast pending Thomas Gleixner
2013-03-06 11:18   ` Thomas Gleixner
2013-03-19 11:35   ` [tip:timers/core] " tip-bot for Thomas Gleixner
2013-03-06 11:18 ` [patch 4/7] tick: Handle broadcast wakeup of multiple cpus Thomas Gleixner
2013-03-06 11:18   ` Thomas Gleixner
2013-03-13 11:36   ` Lorenzo Pieralisi
2013-03-13 11:36     ` Lorenzo Pieralisi
2013-03-13 21:42     ` Thomas Gleixner
2013-03-13 21:42       ` Thomas Gleixner
2013-03-19 11:37   ` [tip:timers/core] " tip-bot for Thomas Gleixner
2013-03-06 11:18 ` [patch 5/7] tick: Provide a check for a forced broadcast pending Thomas Gleixner
2013-03-06 11:18   ` Thomas Gleixner
2013-03-19 11:38   ` [tip:timers/core] " tip-bot for Thomas Gleixner
2013-03-06 11:18 ` [patch 6/7] arm: Use tick broadcast expired check Thomas Gleixner
2013-03-06 11:18   ` Thomas Gleixner
2013-03-19 11:39   ` [tip:timers/core] " tip-bot for Thomas Gleixner
2013-03-06 11:18 ` [patch 7/7] x86: " Thomas Gleixner
2013-03-06 11:18   ` Thomas Gleixner
2013-03-19 11:40   ` [tip:timers/core] " tip-bot for Thomas Gleixner
2013-03-13  9:35 ` [patch 0/7] tick: Optimize broadcast handling and prevent expiry ping pong Santosh Shilimkar
2013-03-13  9:35   ` Santosh Shilimkar
