All of lore.kernel.org
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Jason Liu <liu.h.jason@gmail.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: too many timer retries happen when do local timer swtich with broadcast timer
Date: Thu, 21 Feb 2013 14:48:20 +0100 (CET)	[thread overview]
Message-ID: <alpine.LFD.2.02.1302211436300.22263@ionos> (raw)
In-Reply-To: <CAB4PhKf+Z-TK4KGDZm1kYOWibg=e2Ehnhjxt_oTbFq9eqouAjA@mail.gmail.com>

Jason,

On Thu, 21 Feb 2013, Jason Liu wrote:
> 2013/2/21 Thomas Gleixner <tglx@linutronix.de>:
> > Now your explanation makes sense.
> >
> > I have no fast solution for this, but I think that I have an idea how
> > to fix it. Stay tuned.
> 
> Thanks Thomas, wait for your fix. :)

find below a completely untested patch, which should address that issue.

Sigh. Stopping the cpu local timer in deep idle is known to be an
idiotic design decision for 10+ years. I'll never understand why ARM
vendors insist on implementing the same stupidity over and over.

Thanks,

	tglx

Index: linux-2.6/kernel/time/tick-broadcast.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-broadcast.c
+++ linux-2.6/kernel/time/tick-broadcast.c
@@ -397,6 +397,8 @@ int tick_resume_broadcast(void)
 
 /* FIXME: use cpumask_var_t. */
 static DECLARE_BITMAP(tick_broadcast_oneshot_mask, NR_CPUS);
+static DECLARE_BITMAP(tick_broadcast_pending, NR_CPUS);
+static DECLARE_BITMAP(tick_force_broadcast_mask, NR_CPUS);
 
 /*
  * Exposed for debugging: see timer_list.c
@@ -453,12 +455,24 @@ again:
 	/* Find all expired events */
 	for_each_cpu(cpu, tick_get_broadcast_oneshot_mask()) {
 		td = &per_cpu(tick_cpu_device, cpu);
-		if (td->evtdev->next_event.tv64 <= now.tv64)
+		if (td->evtdev->next_event.tv64 <= now.tv64) {
 			cpumask_set_cpu(cpu, to_cpumask(tmpmask));
-		else if (td->evtdev->next_event.tv64 < next_event.tv64)
+			/*
+			 * Mark the remote cpu in the pending mask, so
+			 * it can avoid reprogramming the cpu local
+			 * timer in tick_broadcast_oneshot_control().
+			 */
+			set_bit(cpu, tick_broadcast_pending);
+		} else if (td->evtdev->next_event.tv64 < next_event.tv64)
 			next_event.tv64 = td->evtdev->next_event.tv64;
 	}
 
+	/* Take care of enforced broadcast requests */
+	for_each_cpu(cpu, to_cpumask(tick_force_broadcast_mask)) {
+		set_bit(cpu, tmpmask);
+		clear_bit(cpu, tick_force_broadcast_mask);
+	}
+
 	/*
 	 * Wakeup the cpus which have an expired event.
 	 */
@@ -494,6 +508,7 @@ void tick_broadcast_oneshot_control(unsi
 	struct clock_event_device *bc, *dev;
 	struct tick_device *td;
 	unsigned long flags;
+	ktime_t now;
 	int cpu;
 
 	/*
@@ -518,6 +533,8 @@ void tick_broadcast_oneshot_control(unsi
 
 	raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
 	if (reason == CLOCK_EVT_NOTIFY_BROADCAST_ENTER) {
+		WARN_ON_ONCE(test_bit(cpu, tick_broadcast_pending));
+		WARN_ON_ONCE(test_bit(cpu, tick_force_broadcast_mask));
 		if (!cpumask_test_cpu(cpu, tick_get_broadcast_oneshot_mask())) {
 			cpumask_set_cpu(cpu, tick_get_broadcast_oneshot_mask());
 			clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
@@ -529,10 +546,63 @@ void tick_broadcast_oneshot_control(unsi
 			cpumask_clear_cpu(cpu,
 					  tick_get_broadcast_oneshot_mask());
 			clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT);
-			if (dev->next_event.tv64 != KTIME_MAX)
-				tick_program_event(dev->next_event, 1);
+			if (dev->next_event.tv64 == KTIME_MAX)
+				goto out;
+			/*
+			 * The cpu handling the broadcast timer marked
+			 * this cpu in the broadcast pending mask and
+			 * fired the broadcast IPI. So we are going to
+			 * handle the expired event anyway via the
+			 * broadcast IPI handler. No need to reprogram
+			 * the timer with an already expired event.
+			 */
+			if (test_and_clear_bit(cpu, tick_broadcast_pending))
+				goto out;
+			/*
+			 * If the pending bit is not set, then we are
+			 * either the CPU handling the broadcast
+			 * interrupt or we got woken by something else.
+			 *
+			 * We are not longer in the broadcast mask, so
+			 * if the cpu local expiry time is already
+			 * reached, we would reprogram the cpu local
+			 * timer with an already expired event.
+			 *
+			 * This can lead to a ping-pong when we return
+			 * to idle and therefor rearm the broadcast
+			 * timer before the cpu local timer was able
+			 * to fire. This happens because the forced
+			 * reprogramming makes sure that the event
+			 * will happen in the future and depending on
+			 * the min_delta setting this might be far
+			 * enough out that the ping-pong starts.
+			 *
+			 * If the cpu local next_event has expired
+			 * then we know that the broadcast timer
+			 * next_event has expired as well and
+			 * broadcast is about to be handled. So we
+			 * avoid reprogramming and enforce that the
+			 * broadcast handler, which did not run yet,
+			 * will invoke the cpu local handler.
+			 *
+			 * We cannot call the handler directly from
+			 * here, because we might be in a NOHZ phase
+			 * and we did not go through the irq_enter()
+			 * nohz fixups.
+			 */
+			now = ktime_get();
+			if (dev->next_event.tv64 <= now.tv64) {
+				set_bit(cpu, tick_force_broadcast_mask);
+				goto out;
+			}
+			/*
+			 * We got woken by something else. Reprogram
+			 * the cpu local timer device.
+			 */
+			tick_program_event(dev->next_event, 1);
 		}
 	}
+out:
 	raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
 }
 

WARNING: multiple messages have this Message-ID (diff)
From: tglx@linutronix.de (Thomas Gleixner)
To: linux-arm-kernel@lists.infradead.org
Subject: too many timer retries happen when do local timer swtich with broadcast timer
Date: Thu, 21 Feb 2013 14:48:20 +0100 (CET)	[thread overview]
Message-ID: <alpine.LFD.2.02.1302211436300.22263@ionos> (raw)
In-Reply-To: <CAB4PhKf+Z-TK4KGDZm1kYOWibg=e2Ehnhjxt_oTbFq9eqouAjA@mail.gmail.com>

Jason,

On Thu, 21 Feb 2013, Jason Liu wrote:
> 2013/2/21 Thomas Gleixner <tglx@linutronix.de>:
> > Now your explanation makes sense.
> >
> > I have no fast solution for this, but I think that I have an idea how
> > to fix it. Stay tuned.
> 
> Thanks Thomas, wait for your fix. :)

find below a completely untested patch, which should address that issue.

Sigh. Stopping the cpu local timer in deep idle is known to be an
idiotic design decision for 10+ years. I'll never understand why ARM
vendors insist on implementing the same stupidity over and over.

Thanks,

	tglx

Index: linux-2.6/kernel/time/tick-broadcast.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-broadcast.c
+++ linux-2.6/kernel/time/tick-broadcast.c
@@ -397,6 +397,8 @@ int tick_resume_broadcast(void)
 
 /* FIXME: use cpumask_var_t. */
 static DECLARE_BITMAP(tick_broadcast_oneshot_mask, NR_CPUS);
+static DECLARE_BITMAP(tick_broadcast_pending, NR_CPUS);
+static DECLARE_BITMAP(tick_force_broadcast_mask, NR_CPUS);
 
 /*
  * Exposed for debugging: see timer_list.c
@@ -453,12 +455,24 @@ again:
 	/* Find all expired events */
 	for_each_cpu(cpu, tick_get_broadcast_oneshot_mask()) {
 		td = &per_cpu(tick_cpu_device, cpu);
-		if (td->evtdev->next_event.tv64 <= now.tv64)
+		if (td->evtdev->next_event.tv64 <= now.tv64) {
 			cpumask_set_cpu(cpu, to_cpumask(tmpmask));
-		else if (td->evtdev->next_event.tv64 < next_event.tv64)
+			/*
+			 * Mark the remote cpu in the pending mask, so
+			 * it can avoid reprogramming the cpu local
+			 * timer in tick_broadcast_oneshot_control().
+			 */
+			set_bit(cpu, tick_broadcast_pending);
+		} else if (td->evtdev->next_event.tv64 < next_event.tv64)
 			next_event.tv64 = td->evtdev->next_event.tv64;
 	}
 
+	/* Take care of enforced broadcast requests */
+	for_each_cpu(cpu, to_cpumask(tick_force_broadcast_mask)) {
+		set_bit(cpu, tmpmask);
+		clear_bit(cpu, tick_force_broadcast_mask);
+	}
+
 	/*
 	 * Wakeup the cpus which have an expired event.
 	 */
@@ -494,6 +508,7 @@ void tick_broadcast_oneshot_control(unsi
 	struct clock_event_device *bc, *dev;
 	struct tick_device *td;
 	unsigned long flags;
+	ktime_t now;
 	int cpu;
 
 	/*
@@ -518,6 +533,8 @@ void tick_broadcast_oneshot_control(unsi
 
 	raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
 	if (reason == CLOCK_EVT_NOTIFY_BROADCAST_ENTER) {
+		WARN_ON_ONCE(test_bit(cpu, tick_broadcast_pending));
+		WARN_ON_ONCE(test_bit(cpu, tick_force_broadcast_mask));
 		if (!cpumask_test_cpu(cpu, tick_get_broadcast_oneshot_mask())) {
 			cpumask_set_cpu(cpu, tick_get_broadcast_oneshot_mask());
 			clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
@@ -529,10 +546,63 @@ void tick_broadcast_oneshot_control(unsi
 			cpumask_clear_cpu(cpu,
 					  tick_get_broadcast_oneshot_mask());
 			clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT);
-			if (dev->next_event.tv64 != KTIME_MAX)
-				tick_program_event(dev->next_event, 1);
+			if (dev->next_event.tv64 == KTIME_MAX)
+				goto out;
+			/*
+			 * The cpu handling the broadcast timer marked
+			 * this cpu in the broadcast pending mask and
+			 * fired the broadcast IPI. So we are going to
+			 * handle the expired event anyway via the
+			 * broadcast IPI handler. No need to reprogram
+			 * the timer with an already expired event.
+			 */
+			if (test_and_clear_bit(cpu, tick_broadcast_pending))
+				goto out;
+			/*
+			 * If the pending bit is not set, then we are
+			 * either the CPU handling the broadcast
+			 * interrupt or we got woken by something else.
+			 *
+			 * We are not longer in the broadcast mask, so
+			 * if the cpu local expiry time is already
+			 * reached, we would reprogram the cpu local
+			 * timer with an already expired event.
+			 *
+			 * This can lead to a ping-pong when we return
+			 * to idle and therefor rearm the broadcast
+			 * timer before the cpu local timer was able
+			 * to fire. This happens because the forced
+			 * reprogramming makes sure that the event
+			 * will happen in the future and depending on
+			 * the min_delta setting this might be far
+			 * enough out that the ping-pong starts.
+			 *
+			 * If the cpu local next_event has expired
+			 * then we know that the broadcast timer
+			 * next_event has expired as well and
+			 * broadcast is about to be handled. So we
+			 * avoid reprogramming and enforce that the
+			 * broadcast handler, which did not run yet,
+			 * will invoke the cpu local handler.
+			 *
+			 * We cannot call the handler directly from
+			 * here, because we might be in a NOHZ phase
+			 * and we did not go through the irq_enter()
+			 * nohz fixups.
+			 */
+			now = ktime_get();
+			if (dev->next_event.tv64 <= now.tv64) {
+				set_bit(cpu, tick_force_broadcast_mask);
+				goto out;
+			}
+			/*
+			 * We got woken by something else. Reprogram
+			 * the cpu local timer device.
+			 */
+			tick_program_event(dev->next_event, 1);
 		}
 	}
+out:
 	raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
 }
 

  reply	other threads:[~2013-02-21 13:48 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-02-20 11:16 too many timer retries happen when do local timer swtich with broadcast timer Jason Liu
2013-02-20 11:16 ` Jason Liu
2013-02-20 13:33 ` Thomas Gleixner
2013-02-20 13:33   ` Thomas Gleixner
2013-02-21  6:16   ` Jason Liu
2013-02-21  6:16     ` Jason Liu
2013-02-21  9:36     ` Thomas Gleixner
2013-02-21  9:36       ` Thomas Gleixner
2013-02-21 10:50       ` Jason Liu
2013-02-21 10:50         ` Jason Liu
2013-02-21 13:48         ` Thomas Gleixner [this message]
2013-02-21 13:48           ` Thomas Gleixner
2013-02-21 15:12           ` Santosh Shilimkar
2013-02-21 15:12             ` Santosh Shilimkar
2013-02-21 22:19             ` Thomas Gleixner
2013-02-21 22:19               ` Thomas Gleixner
2013-02-22 10:07               ` Santosh Shilimkar
2013-02-22 10:07                 ` Santosh Shilimkar
2013-02-22 10:24                 ` Thomas Gleixner
2013-02-22 10:24                   ` Thomas Gleixner
2013-02-22 10:30                   ` Santosh Shilimkar
2013-02-22 10:30                     ` Santosh Shilimkar
2013-02-22 10:31                   ` Lorenzo Pieralisi
2013-02-22 10:31                     ` Lorenzo Pieralisi
2013-02-22 11:02                     ` Santosh Shilimkar
2013-02-22 11:02                       ` Santosh Shilimkar
2013-02-22 12:07                       ` Thomas Gleixner
2013-02-22 12:07                         ` Thomas Gleixner
2013-02-22 14:48                         ` Lorenzo Pieralisi
2013-02-22 14:48                           ` Lorenzo Pieralisi
2013-02-22 15:03                           ` Thomas Gleixner
2013-02-22 15:03                             ` Thomas Gleixner
2013-02-22 15:26                             ` Lorenzo Pieralisi
2013-02-22 15:26                               ` Lorenzo Pieralisi
2013-02-22 18:52                               ` Thomas Gleixner
2013-02-22 18:52                                 ` Thomas Gleixner
2013-02-25  6:12                                 ` Santosh Shilimkar
2013-02-25  6:12                                   ` Santosh Shilimkar
2013-02-25  6:38                                 ` Jason Liu
2013-02-25  6:38                                   ` Jason Liu
2013-02-25 13:34                                 ` Lorenzo Pieralisi
2013-02-25 13:34                                   ` Lorenzo Pieralisi
2013-02-22 10:28               ` Lorenzo Pieralisi
2013-02-22 10:28                 ` Lorenzo Pieralisi
2013-02-22 10:26           ` Jason Liu
2013-02-22 10:26             ` Jason Liu
2013-02-21 10:35     ` Lorenzo Pieralisi
2013-02-21 10:35       ` Lorenzo Pieralisi
2013-02-21 10:49       ` Jason Liu
2013-02-21 10:49         ` Jason Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LFD.2.02.1302211436300.22263@ionos \
    --to=tglx@linutronix.de \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=liu.h.jason@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.