All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
To: Jason Liu <liu.h.jason@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: too many timer retries happen when do local timer swtich with broadcast timer
Date: Thu, 21 Feb 2013 10:35:08 +0000	[thread overview]
Message-ID: <20130221103508.GA24591@e102568-lin.cambridge.arm.com> (raw)
In-Reply-To: <CAB4PhKfxPcyQfpM2Ra1OxsyYhS5fnkjtPy=uYgxFHim1rOdoug@mail.gmail.com>

Hi Jason,

On Thu, Feb 21, 2013 at 06:16:51AM +0000, Jason Liu wrote:
> 2013/2/20 Thomas Gleixner <tglx@linutronix.de>:
> > On Wed, 20 Feb 2013, Jason Liu wrote:
> >> void arch_idle(void)
> >> {
> >> ....
> >> clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
> >>
> >> enter_the_wait_mode();
> >>
> >> clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
> >> }
> >>
> >> when the broadcast timer interrupt arrives(this interrupt just wakeup
> >> the ARM, and ARM has no chance
> >> to handle it since local irq is disabled. In fact it's disabled in
> >> cpu_idle() of arch/arm/kernel/process.c)
> >>
> >> the broadcast timer interrupt will wake up the CPU and run:
> >>
> >> clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);    ->
> >> tick_broadcast_oneshot_control(...);
> >> ->
> >> tick_program_event(dev->next_event, 1);
> >> ->
> >> tick_dev_program_event(dev, expires, force);
> >> ->
> >> for (i = 0;;) {
> >>                 int ret = clockevents_program_event(dev, expires, now);
> >>                 if (!ret || !force)
> >>                         return ret;
> >>
> >>                 dev->retries++;
> >>                 ....
> >>                 now = ktime_get();
> >>                 expires = ktime_add_ns(now, dev->min_delta_ns);
> >> }
> >> clockevents_program_event(dev, expires, now);
> >>
> >>         delta = ktime_to_ns(ktime_sub(expires, now));
> >>
> >>         if (delta <= 0)
> >>                 return -ETIME;
> >>
> >> when the bc timer interrupt arrives,  which means the last local timer
> >> expires too. so,
> >> clockevents_program_event will return -ETIME, which will cause the
> >> dev->retries++
> >> when retry to program the expired timer.
> >>
> >> Even under the worst case, after the re-program the expired timer,
> >> then CPU enter idle
> >> quickly before the re-progam timer expired, it will make system
> >> ping-pang forever,
> >
> > That's nonsense.
> 
> I don't think so.
> 
> >
> > The timer IPI brings the core out of the deep idle state.
> >
> > So after returning from enter_wait_mode() and after calling
> > clockevents_notify() it returns from arch_idle() to cpu_idle().
> >
> > In cpu_idle() interrupts are reenabled, so the timer IPI handler is
> > invoked. That calls the event_handler of the per cpu local clockevent
> > device (the one which stops in C3). That ends up in the generic timer
> > code which expires timers and reprograms the local clock event device
> > with the next pending timer.
> >
> > So you cannot go idle again, before the expired timers of this event
> > are handled and their callbacks invoked.
> 
> That's true for the CPUs which not response to the global timer interrupt.
> Take our platform as example: we have 4CPUs(CPU0, CPU1,CPU2,CPU3)
> The global timer device will keep running even in the deep idle mode, so, it
> can be used as the broadcast timer device, and the interrupt of this device
> just raised to CPU0 when the timer expired, then, CPU0 will broadcast the
> IPI timer to other CPUs which is in deep idle mode.
> 
> So for CPU1, CPU2, CPU3, you are right, the IPI timer will bring it out of idle
> state, after running clockevents_notify() it returns from arch_idle()
> to cpu_idle(),
> then local_irq_enable(), the IPI handler will be invoked and handle
> the expires times
> and re-program the next pending timer.
> 
> But, that's not true for the CPU0. The flow for CPU0 is:
> the global timer interrupt wakes up CPU0 and then call:
> clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
> 
> which will cpumask_clear_cpu(cpu, tick_get_broadcast_oneshot_mask());
> in the function tick_broadcast_oneshot_control(),

For my own understanding: at this point in time CPU0 local timer is
also reprogrammed, with min_delta (ie 1us) if I got your description
right.

> 
> After return from clockevents_notify(), it will return to cpu_idle
> from arch_idle,
> then local_irq_enable(), the CPU0 will response to the global timer
> interrupt, and
> call the interrupt handler: tick_handle_oneshot_broadcast()
> 
> static void tick_handle_oneshot_broadcast(struct clock_event_device *dev)
> {
>         struct tick_device *td;
>         ktime_t now, next_event;
>         int cpu;
> 
>         raw_spin_lock(&tick_broadcast_lock);
> again:
>         dev->next_event.tv64 = KTIME_MAX;
>         next_event.tv64 = KTIME_MAX;
>         cpumask_clear(to_cpumask(tmpmask));
>         now = ktime_get();
>         /* Find all expired events */
>         for_each_cpu(cpu, tick_get_broadcast_oneshot_mask()) {
>                 td = &per_cpu(tick_cpu_device, cpu);
>                 if (td->evtdev->next_event.tv64 <= now.tv64)
>                         cpumask_set_cpu(cpu, to_cpumask(tmpmask));
>                 else if (td->evtdev->next_event.tv64 < next_event.tv64)
>                         next_event.tv64 = td->evtdev->next_event.tv64;
>         }
> 
>         /*
>          * Wakeup the cpus which have an expired event.
>          */
>         tick_do_broadcast(to_cpumask(tmpmask));
>         ...
> }
> 
> since cpu0 has been removed from the tick_get_broadcast_oneshot_mask(), and if
> all the other cpu1/2/3 state in idle, and no expired timers, then the
> tmpmask will be 0,
> when call tick_do_broadcast().
> 
> static void tick_do_broadcast(struct cpumask *mask)
> {
>         int cpu = smp_processor_id();
>         struct tick_device *td;
> 
>         /*
>          * Check, if the current cpu is in the mask
>          */
>         if (cpumask_test_cpu(cpu, mask)) {
>                 cpumask_clear_cpu(cpu, mask);
>                 td = &per_cpu(tick_cpu_device, cpu);
>                 td->evtdev->event_handler(td->evtdev);
>         }
> 
>         if (!cpumask_empty(mask)) {
>                 /*
>                  * It might be necessary to actually check whether the devices
>                  * have different broadcast functions. For now, just use the
>                  * one of the first device. This works as long as we have this
>                  * misfeature only on x86 (lapic)
>                  */
>                 td = &per_cpu(tick_cpu_device, cpumask_first(mask));
>                 td->evtdev->broadcast(mask);
>         }
> }
> 
> If the mask is empty, then tick_do_broadcast will do nothing and return, which
> will make cpu0 enter idle quickly, and then system will ping-pang there.

This means that the local timer reprogrammed above (to actually emulate the
expired local timer on CPU0, likely to be set to min_delta == 1us) does not
have time to expire before the idle thread disables IRQs and goes idle again.

Is this a correct description of what's happening ?

Thanks a lot,
Lorenzo


WARNING: multiple messages have this Message-ID (diff)
From: lorenzo.pieralisi@arm.com (Lorenzo Pieralisi)
To: linux-arm-kernel@lists.infradead.org
Subject: too many timer retries happen when do local timer swtich with broadcast timer
Date: Thu, 21 Feb 2013 10:35:08 +0000	[thread overview]
Message-ID: <20130221103508.GA24591@e102568-lin.cambridge.arm.com> (raw)
In-Reply-To: <CAB4PhKfxPcyQfpM2Ra1OxsyYhS5fnkjtPy=uYgxFHim1rOdoug@mail.gmail.com>

Hi Jason,

On Thu, Feb 21, 2013 at 06:16:51AM +0000, Jason Liu wrote:
> 2013/2/20 Thomas Gleixner <tglx@linutronix.de>:
> > On Wed, 20 Feb 2013, Jason Liu wrote:
> >> void arch_idle(void)
> >> {
> >> ....
> >> clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
> >>
> >> enter_the_wait_mode();
> >>
> >> clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
> >> }
> >>
> >> when the broadcast timer interrupt arrives(this interrupt just wakeup
> >> the ARM, and ARM has no chance
> >> to handle it since local irq is disabled. In fact it's disabled in
> >> cpu_idle() of arch/arm/kernel/process.c)
> >>
> >> the broadcast timer interrupt will wake up the CPU and run:
> >>
> >> clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);    ->
> >> tick_broadcast_oneshot_control(...);
> >> ->
> >> tick_program_event(dev->next_event, 1);
> >> ->
> >> tick_dev_program_event(dev, expires, force);
> >> ->
> >> for (i = 0;;) {
> >>                 int ret = clockevents_program_event(dev, expires, now);
> >>                 if (!ret || !force)
> >>                         return ret;
> >>
> >>                 dev->retries++;
> >>                 ....
> >>                 now = ktime_get();
> >>                 expires = ktime_add_ns(now, dev->min_delta_ns);
> >> }
> >> clockevents_program_event(dev, expires, now);
> >>
> >>         delta = ktime_to_ns(ktime_sub(expires, now));
> >>
> >>         if (delta <= 0)
> >>                 return -ETIME;
> >>
> >> when the bc timer interrupt arrives,  which means the last local timer
> >> expires too. so,
> >> clockevents_program_event will return -ETIME, which will cause the
> >> dev->retries++
> >> when retry to program the expired timer.
> >>
> >> Even under the worst case, after the re-program the expired timer,
> >> then CPU enter idle
> >> quickly before the re-progam timer expired, it will make system
> >> ping-pang forever,
> >
> > That's nonsense.
> 
> I don't think so.
> 
> >
> > The timer IPI brings the core out of the deep idle state.
> >
> > So after returning from enter_wait_mode() and after calling
> > clockevents_notify() it returns from arch_idle() to cpu_idle().
> >
> > In cpu_idle() interrupts are reenabled, so the timer IPI handler is
> > invoked. That calls the event_handler of the per cpu local clockevent
> > device (the one which stops in C3). That ends up in the generic timer
> > code which expires timers and reprograms the local clock event device
> > with the next pending timer.
> >
> > So you cannot go idle again, before the expired timers of this event
> > are handled and their callbacks invoked.
> 
> That's true for the CPUs which not response to the global timer interrupt.
> Take our platform as example: we have 4CPUs(CPU0, CPU1,CPU2,CPU3)
> The global timer device will keep running even in the deep idle mode, so, it
> can be used as the broadcast timer device, and the interrupt of this device
> just raised to CPU0 when the timer expired, then, CPU0 will broadcast the
> IPI timer to other CPUs which is in deep idle mode.
> 
> So for CPU1, CPU2, CPU3, you are right, the IPI timer will bring it out of idle
> state, after running clockevents_notify() it returns from arch_idle()
> to cpu_idle(),
> then local_irq_enable(), the IPI handler will be invoked and handle
> the expires times
> and re-program the next pending timer.
> 
> But, that's not true for the CPU0. The flow for CPU0 is:
> the global timer interrupt wakes up CPU0 and then call:
> clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
> 
> which will cpumask_clear_cpu(cpu, tick_get_broadcast_oneshot_mask());
> in the function tick_broadcast_oneshot_control(),

For my own understanding: at this point in time CPU0 local timer is
also reprogrammed, with min_delta (ie 1us) if I got your description
right.

> 
> After return from clockevents_notify(), it will return to cpu_idle
> from arch_idle,
> then local_irq_enable(), the CPU0 will response to the global timer
> interrupt, and
> call the interrupt handler: tick_handle_oneshot_broadcast()
> 
> static void tick_handle_oneshot_broadcast(struct clock_event_device *dev)
> {
>         struct tick_device *td;
>         ktime_t now, next_event;
>         int cpu;
> 
>         raw_spin_lock(&tick_broadcast_lock);
> again:
>         dev->next_event.tv64 = KTIME_MAX;
>         next_event.tv64 = KTIME_MAX;
>         cpumask_clear(to_cpumask(tmpmask));
>         now = ktime_get();
>         /* Find all expired events */
>         for_each_cpu(cpu, tick_get_broadcast_oneshot_mask()) {
>                 td = &per_cpu(tick_cpu_device, cpu);
>                 if (td->evtdev->next_event.tv64 <= now.tv64)
>                         cpumask_set_cpu(cpu, to_cpumask(tmpmask));
>                 else if (td->evtdev->next_event.tv64 < next_event.tv64)
>                         next_event.tv64 = td->evtdev->next_event.tv64;
>         }
> 
>         /*
>          * Wakeup the cpus which have an expired event.
>          */
>         tick_do_broadcast(to_cpumask(tmpmask));
>         ...
> }
> 
> since cpu0 has been removed from the tick_get_broadcast_oneshot_mask(), and if
> all the other cpu1/2/3 state in idle, and no expired timers, then the
> tmpmask will be 0,
> when call tick_do_broadcast().
> 
> static void tick_do_broadcast(struct cpumask *mask)
> {
>         int cpu = smp_processor_id();
>         struct tick_device *td;
> 
>         /*
>          * Check, if the current cpu is in the mask
>          */
>         if (cpumask_test_cpu(cpu, mask)) {
>                 cpumask_clear_cpu(cpu, mask);
>                 td = &per_cpu(tick_cpu_device, cpu);
>                 td->evtdev->event_handler(td->evtdev);
>         }
> 
>         if (!cpumask_empty(mask)) {
>                 /*
>                  * It might be necessary to actually check whether the devices
>                  * have different broadcast functions. For now, just use the
>                  * one of the first device. This works as long as we have this
>                  * misfeature only on x86 (lapic)
>                  */
>                 td = &per_cpu(tick_cpu_device, cpumask_first(mask));
>                 td->evtdev->broadcast(mask);
>         }
> }
> 
> If the mask is empty, then tick_do_broadcast will do nothing and return, which
> will make cpu0 enter idle quickly, and then system will ping-pang there.

This means that the local timer reprogrammed above (to actually emulate the
expired local timer on CPU0, likely to be set to min_delta == 1us) does not
have time to expire before the idle thread disables IRQs and goes idle again.

Is this a correct description of what's happening ?

Thanks a lot,
Lorenzo

  parent reply	other threads:[~2013-02-21 10:35 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-02-20 11:16 too many timer retries happen when do local timer swtich with broadcast timer Jason Liu
2013-02-20 11:16 ` Jason Liu
2013-02-20 13:33 ` Thomas Gleixner
2013-02-20 13:33   ` Thomas Gleixner
2013-02-21  6:16   ` Jason Liu
2013-02-21  6:16     ` Jason Liu
2013-02-21  9:36     ` Thomas Gleixner
2013-02-21  9:36       ` Thomas Gleixner
2013-02-21 10:50       ` Jason Liu
2013-02-21 10:50         ` Jason Liu
2013-02-21 13:48         ` Thomas Gleixner
2013-02-21 13:48           ` Thomas Gleixner
2013-02-21 15:12           ` Santosh Shilimkar
2013-02-21 15:12             ` Santosh Shilimkar
2013-02-21 22:19             ` Thomas Gleixner
2013-02-21 22:19               ` Thomas Gleixner
2013-02-22 10:07               ` Santosh Shilimkar
2013-02-22 10:07                 ` Santosh Shilimkar
2013-02-22 10:24                 ` Thomas Gleixner
2013-02-22 10:24                   ` Thomas Gleixner
2013-02-22 10:30                   ` Santosh Shilimkar
2013-02-22 10:30                     ` Santosh Shilimkar
2013-02-22 10:31                   ` Lorenzo Pieralisi
2013-02-22 10:31                     ` Lorenzo Pieralisi
2013-02-22 11:02                     ` Santosh Shilimkar
2013-02-22 11:02                       ` Santosh Shilimkar
2013-02-22 12:07                       ` Thomas Gleixner
2013-02-22 12:07                         ` Thomas Gleixner
2013-02-22 14:48                         ` Lorenzo Pieralisi
2013-02-22 14:48                           ` Lorenzo Pieralisi
2013-02-22 15:03                           ` Thomas Gleixner
2013-02-22 15:03                             ` Thomas Gleixner
2013-02-22 15:26                             ` Lorenzo Pieralisi
2013-02-22 15:26                               ` Lorenzo Pieralisi
2013-02-22 18:52                               ` Thomas Gleixner
2013-02-22 18:52                                 ` Thomas Gleixner
2013-02-25  6:12                                 ` Santosh Shilimkar
2013-02-25  6:12                                   ` Santosh Shilimkar
2013-02-25  6:38                                 ` Jason Liu
2013-02-25  6:38                                   ` Jason Liu
2013-02-25 13:34                                 ` Lorenzo Pieralisi
2013-02-25 13:34                                   ` Lorenzo Pieralisi
2013-02-22 10:28               ` Lorenzo Pieralisi
2013-02-22 10:28                 ` Lorenzo Pieralisi
2013-02-22 10:26           ` Jason Liu
2013-02-22 10:26             ` Jason Liu
2013-02-21 10:35     ` Lorenzo Pieralisi [this message]
2013-02-21 10:35       ` Lorenzo Pieralisi
2013-02-21 10:49       ` Jason Liu
2013-02-21 10:49         ` Jason Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130221103508.GA24591@e102568-lin.cambridge.arm.com \
    --to=lorenzo.pieralisi@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=liu.h.jason@gmail.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.