All of lore.kernel.org
 help / color / mirror / Atom feed
* CPU hotplug issue w/ 0647065 clocksource: Add generic dummy timer driver
@ 2013-07-08 17:36 ` Stephen Warren
  0 siblings, 0 replies; 17+ messages in thread
From: Stephen Warren @ 2013-07-08 17:36 UTC (permalink / raw)
  To: Stephen Boyd, Mark Rutland, Marc Zyngier
  Cc: linux-tegra-u79uwXL29TY76Z2rM5mHXA, ARM kernel mailing list,
	Thomas Gleixner, John Stultz, Joseph Lo

CPU hotplug (replug) on Tegra HW seems to be occasionally broken due to
commit 0647065 "clocksource: Add generic dummy timer driver" in
linux-next. Reverting that commit solves the issue.

The symptom is that ~10% of the time, when re-plugging CPU1 (in a 2-core
system, after unplugging it about 1 second before), I'll see the
following WARN trigger in clockevents_program_event():

> int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
> 			      bool force)
> {
> 	unsigned long long clc;
> 	int64_t delta;
> 	int rc;
> 
> 	if (unlikely(expires.tv64 < 0)) {
> 		WARN_ON_ONCE(1);
> 		return -ETIME;
> 	}

This appears to be because in tick_handle_periodic_broadcast(),
dev->next_event == KTIME_MAX. The system then hangs; I think that loop
just keeps adding tick_period onto next_event, which doesn't manage to
get to an acceptable value for a long time, if ever!

Do you have any idea why this could happen? I assume that during
switching between the dummy timer added by that patch, and the real
Tegra timer (drivers/clocksource/tegra20_timer.c) the Tegra timer's
dev->next_event is temporarily set to KTIME_MAX, but somehow the timer
IRQ handling goes off while the device is in this temporary state? The
timer core seems to take steps to prevent this though, i.e. callilng
spin_lock_irqsave() in places.

If I modify tick_handle_periodic_broadcast() to check for a negative
dev->next_event and simply return in that case, the system seems to work
fine, and I do see tick_handle_periodic_broadcast() being called at a
later time, so obviously something is coming along later and programming
the HW to generate additional events. On this HW, I believe struct
clock_event_device.set_next_event is being used to emulate the periodic
broadcast using a one-shot timer, rather than using the HW's native
periodic capability, probably due to CONFIG_NO_HZ.

Any hints greatly appreciated!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* CPU hotplug issue w/ 0647065 clocksource: Add generic dummy timer driver
@ 2013-07-08 17:36 ` Stephen Warren
  0 siblings, 0 replies; 17+ messages in thread
From: Stephen Warren @ 2013-07-08 17:36 UTC (permalink / raw)
  To: linux-arm-kernel

CPU hotplug (replug) on Tegra HW seems to be occasionally broken due to
commit 0647065 "clocksource: Add generic dummy timer driver" in
linux-next. Reverting that commit solves the issue.

The symptom is that ~10% of the time, when re-plugging CPU1 (in a 2-core
system, after unplugging it about 1 second before), I'll see the
following WARN trigger in clockevents_program_event():

> int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
> 			      bool force)
> {
> 	unsigned long long clc;
> 	int64_t delta;
> 	int rc;
> 
> 	if (unlikely(expires.tv64 < 0)) {
> 		WARN_ON_ONCE(1);
> 		return -ETIME;
> 	}

This appears to be because in tick_handle_periodic_broadcast(),
dev->next_event == KTIME_MAX. The system then hangs; I think that loop
just keeps adding tick_period onto next_event, which doesn't manage to
get to an acceptable value for a long time, if ever!

Do you have any idea why this could happen? I assume that during
switching between the dummy timer added by that patch, and the real
Tegra timer (drivers/clocksource/tegra20_timer.c) the Tegra timer's
dev->next_event is temporarily set to KTIME_MAX, but somehow the timer
IRQ handling goes off while the device is in this temporary state? The
timer core seems to take steps to prevent this though, i.e. callilng
spin_lock_irqsave() in places.

If I modify tick_handle_periodic_broadcast() to check for a negative
dev->next_event and simply return in that case, the system seems to work
fine, and I do see tick_handle_periodic_broadcast() being called at a
later time, so obviously something is coming along later and programming
the HW to generate additional events. On this HW, I believe struct
clock_event_device.set_next_event is being used to emulate the periodic
broadcast using a one-shot timer, rather than using the HW's native
periodic capability, probably due to CONFIG_NO_HZ.

Any hints greatly appreciated!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: CPU hotplug issue w/ 0647065 clocksource: Add generic dummy timer driver
  2013-07-08 17:36 ` Stephen Warren
@ 2013-07-09  0:58     ` Stephen Boyd
  -1 siblings, 0 replies; 17+ messages in thread
From: Stephen Boyd @ 2013-07-09  0:58 UTC (permalink / raw)
  To: Stephen Warren
  Cc: Mark Rutland, Marc Zyngier, linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	ARM kernel mailing list, Thomas Gleixner, John Stultz, Joseph Lo

On 07/08, Stephen Warren wrote:
> CPU hotplug (replug) on Tegra HW seems to be occasionally broken due to
> commit 0647065 "clocksource: Add generic dummy timer driver" in
> linux-next. Reverting that commit solves the issue.

We found some breakage during boot that has been fixed by two
commits in linus' tree already. Do you know if you have these two
patches

1f73a9806bdd07a5106409bbcab3884078bd34fe
07bd1172902e782f288e4d44b1fde7dec0f08b6f

?

> 
> The symptom is that ~10% of the time, when re-plugging CPU1 (in a 2-core
> system, after unplugging it about 1 second before), I'll see the
> following WARN trigger in clockevents_program_event():
> 
> > int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
> > 			      bool force)
> > {
> > 	unsigned long long clc;
> > 	int64_t delta;
> > 	int rc;
> > 
> > 	if (unlikely(expires.tv64 < 0)) {
> > 		WARN_ON_ONCE(1);
> > 		return -ETIME;
> > 	}
> 
> This appears to be because in tick_handle_periodic_broadcast(),
> dev->next_event == KTIME_MAX. The system then hangs; I think that loop
> just keeps adding tick_period onto next_event, which doesn't manage to
> get to an acceptable value for a long time, if ever!
> 
> Do you have any idea why this could happen? I assume that during
> switching between the dummy timer added by that patch, and the real
> Tegra timer (drivers/clocksource/tegra20_timer.c) the Tegra timer's
> dev->next_event is temporarily set to KTIME_MAX, but somehow the timer
> IRQ handling goes off while the device is in this temporary state? The
> timer core seems to take steps to prevent this though, i.e. callilng
> spin_lock_irqsave() in places.

If you have the TWD then the dummy should only be used when you
notify clockevents core about hitting "C3". Are you seeing this
during idle or only during hotplug?

> 
> If I modify tick_handle_periodic_broadcast() to check for a negative
> dev->next_event and simply return in that case, the system seems to work
> fine, and I do see tick_handle_periodic_broadcast() being called at a
> later time, so obviously something is coming along later and programming
> the HW to generate additional events. On this HW, I believe struct
> clock_event_device.set_next_event is being used to emulate the periodic
> broadcast using a one-shot timer, rather than using the HW's native
> periodic capability, probably due to CONFIG_NO_HZ.

This sounds very much like the bug that was fixed. I don't see
why your broadcast timer would be emulating periodic mode instead
of just using oneshot mode unless it was started before the
system ever hit C3.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 17+ messages in thread

* CPU hotplug issue w/ 0647065 clocksource: Add generic dummy timer driver
@ 2013-07-09  0:58     ` Stephen Boyd
  0 siblings, 0 replies; 17+ messages in thread
From: Stephen Boyd @ 2013-07-09  0:58 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/08, Stephen Warren wrote:
> CPU hotplug (replug) on Tegra HW seems to be occasionally broken due to
> commit 0647065 "clocksource: Add generic dummy timer driver" in
> linux-next. Reverting that commit solves the issue.

We found some breakage during boot that has been fixed by two
commits in linus' tree already. Do you know if you have these two
patches

1f73a9806bdd07a5106409bbcab3884078bd34fe
07bd1172902e782f288e4d44b1fde7dec0f08b6f

?

> 
> The symptom is that ~10% of the time, when re-plugging CPU1 (in a 2-core
> system, after unplugging it about 1 second before), I'll see the
> following WARN trigger in clockevents_program_event():
> 
> > int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
> > 			      bool force)
> > {
> > 	unsigned long long clc;
> > 	int64_t delta;
> > 	int rc;
> > 
> > 	if (unlikely(expires.tv64 < 0)) {
> > 		WARN_ON_ONCE(1);
> > 		return -ETIME;
> > 	}
> 
> This appears to be because in tick_handle_periodic_broadcast(),
> dev->next_event == KTIME_MAX. The system then hangs; I think that loop
> just keeps adding tick_period onto next_event, which doesn't manage to
> get to an acceptable value for a long time, if ever!
> 
> Do you have any idea why this could happen? I assume that during
> switching between the dummy timer added by that patch, and the real
> Tegra timer (drivers/clocksource/tegra20_timer.c) the Tegra timer's
> dev->next_event is temporarily set to KTIME_MAX, but somehow the timer
> IRQ handling goes off while the device is in this temporary state? The
> timer core seems to take steps to prevent this though, i.e. callilng
> spin_lock_irqsave() in places.

If you have the TWD then the dummy should only be used when you
notify clockevents core about hitting "C3". Are you seeing this
during idle or only during hotplug?

> 
> If I modify tick_handle_periodic_broadcast() to check for a negative
> dev->next_event and simply return in that case, the system seems to work
> fine, and I do see tick_handle_periodic_broadcast() being called at a
> later time, so obviously something is coming along later and programming
> the HW to generate additional events. On this HW, I believe struct
> clock_event_device.set_next_event is being used to emulate the periodic
> broadcast using a one-shot timer, rather than using the HW's native
> periodic capability, probably due to CONFIG_NO_HZ.

This sounds very much like the bug that was fixed. I don't see
why your broadcast timer would be emulating periodic mode instead
of just using oneshot mode unless it was started before the
system ever hit C3.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: CPU hotplug issue w/ 0647065 clocksource: Add generic dummy timer driver
  2013-07-09  0:58     ` Stephen Boyd
@ 2013-07-09 16:05         ` Stephen Warren
  -1 siblings, 0 replies; 17+ messages in thread
From: Stephen Warren @ 2013-07-09 16:05 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Mark Rutland, Marc Zyngier, linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	ARM kernel mailing list, Thomas Gleixner, John Stultz, Joseph Lo

On 07/08/2013 06:58 PM, Stephen Boyd wrote:
> On 07/08, Stephen Warren wrote:
>> CPU hotplug (replug) on Tegra HW seems to be occasionally broken due to
>> commit 0647065 "clocksource: Add generic dummy timer driver" in
>> linux-next. Reverting that commit solves the issue.
> 
> We found some breakage during boot that has been fixed by two
> commits in linus' tree already. Do you know if you have these two
> patches
> 
> 1f73a9806bdd07a5106409bbcab3884078bd34fe
> 07bd1172902e782f288e4d44b1fde7dec0f08b6f

I didn't before since I was using next-20130705, but I just tried
next-20130709 which does have those two commits, and I still see the issue.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* CPU hotplug issue w/ 0647065 clocksource: Add generic dummy timer driver
@ 2013-07-09 16:05         ` Stephen Warren
  0 siblings, 0 replies; 17+ messages in thread
From: Stephen Warren @ 2013-07-09 16:05 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/08/2013 06:58 PM, Stephen Boyd wrote:
> On 07/08, Stephen Warren wrote:
>> CPU hotplug (replug) on Tegra HW seems to be occasionally broken due to
>> commit 0647065 "clocksource: Add generic dummy timer driver" in
>> linux-next. Reverting that commit solves the issue.
> 
> We found some breakage during boot that has been fixed by two
> commits in linus' tree already. Do you know if you have these two
> patches
> 
> 1f73a9806bdd07a5106409bbcab3884078bd34fe
> 07bd1172902e782f288e4d44b1fde7dec0f08b6f

I didn't before since I was using next-20130705, but I just tried
next-20130709 which does have those two commits, and I still see the issue.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: CPU hotplug issue w/ 0647065 clocksource: Add generic dummy timer driver
  2013-07-09 16:05         ` Stephen Warren
@ 2013-07-09 16:35             ` Stephen Boyd
  -1 siblings, 0 replies; 17+ messages in thread
From: Stephen Boyd @ 2013-07-09 16:35 UTC (permalink / raw)
  To: Stephen Warren
  Cc: Mark Rutland, Marc Zyngier, linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	ARM kernel mailing list, Thomas Gleixner, John Stultz, Joseph Lo

On 07/09, Stephen Warren wrote:
> On 07/08/2013 06:58 PM, Stephen Boyd wrote:
> > On 07/08, Stephen Warren wrote:
> >> CPU hotplug (replug) on Tegra HW seems to be occasionally broken due to
> >> commit 0647065 "clocksource: Add generic dummy timer driver" in
> >> linux-next. Reverting that commit solves the issue.
> > 
> > We found some breakage during boot that has been fixed by two
> > commits in linus' tree already. Do you know if you have these two
> > patches
> > 
> > 1f73a9806bdd07a5106409bbcab3884078bd34fe
> > 07bd1172902e782f288e4d44b1fde7dec0f08b6f
> 
> I didn't before since I was using next-20130705, but I just tried
> next-20130709 which does have those two commits, and I still see the issue.
> 

Ok can you get the output of /proc/timer_list and send it back
please? I assume you have TWD and those should all be in the
per-cpu slots and the broadcast timer should be your tegra specific
timer.

The odd thing is that you're seeing periodic mode for the
broadcast which doesn't make any sense. I would expect oneshot
mode and tick_handle_oneshot_broadcast to be the handler.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 17+ messages in thread

* CPU hotplug issue w/ 0647065 clocksource: Add generic dummy timer driver
@ 2013-07-09 16:35             ` Stephen Boyd
  0 siblings, 0 replies; 17+ messages in thread
From: Stephen Boyd @ 2013-07-09 16:35 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/09, Stephen Warren wrote:
> On 07/08/2013 06:58 PM, Stephen Boyd wrote:
> > On 07/08, Stephen Warren wrote:
> >> CPU hotplug (replug) on Tegra HW seems to be occasionally broken due to
> >> commit 0647065 "clocksource: Add generic dummy timer driver" in
> >> linux-next. Reverting that commit solves the issue.
> > 
> > We found some breakage during boot that has been fixed by two
> > commits in linus' tree already. Do you know if you have these two
> > patches
> > 
> > 1f73a9806bdd07a5106409bbcab3884078bd34fe
> > 07bd1172902e782f288e4d44b1fde7dec0f08b6f
> 
> I didn't before since I was using next-20130705, but I just tried
> next-20130709 which does have those two commits, and I still see the issue.
> 

Ok can you get the output of /proc/timer_list and send it back
please? I assume you have TWD and those should all be in the
per-cpu slots and the broadcast timer should be your tegra specific
timer.

The odd thing is that you're seeing periodic mode for the
broadcast which doesn't make any sense. I would expect oneshot
mode and tick_handle_oneshot_broadcast to be the handler.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: CPU hotplug issue w/ 0647065 clocksource: Add generic dummy timer driver
  2013-07-09 16:35             ` Stephen Boyd
@ 2013-07-09 16:52                 ` Stephen Warren
  -1 siblings, 0 replies; 17+ messages in thread
From: Stephen Warren @ 2013-07-09 16:52 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Mark Rutland, Marc Zyngier, linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	ARM kernel mailing list, Thomas Gleixner, John Stultz, Joseph Lo

On 07/09/2013 10:35 AM, Stephen Boyd wrote:
> On 07/09, Stephen Warren wrote:
>> On 07/08/2013 06:58 PM, Stephen Boyd wrote:
>>> On 07/08, Stephen Warren wrote:
>>>> CPU hotplug (replug) on Tegra HW seems to be occasionally broken due to
>>>> commit 0647065 "clocksource: Add generic dummy timer driver" in
>>>> linux-next. Reverting that commit solves the issue.
>>>
>>> We found some breakage during boot that has been fixed by two
>>> commits in linus' tree already. Do you know if you have these two
>>> patches
>>>
>>> 1f73a9806bdd07a5106409bbcab3884078bd34fe
>>> 07bd1172902e782f288e4d44b1fde7dec0f08b6f
>>
>> I didn't before since I was using next-20130705, but I just tried
>> next-20130709 which does have those two commits, and I still see the issue.
>>
> 
> Ok can you get the output of /proc/timer_list and send it back

Sure, it's below. I took another snapshot after unplugging then
re-plugging CPU1, and there was no difference except in the actual time
values and list of queued events (at ~line 12). The set of timers,
modes, etc. were all the same.

> Timer List Version: v0.7
> HRTIMER_MAX_CLOCK_BASES: 4
> now at 75889089000 nsecs
> 
> cpu: 0
>  clock 0:
>   .base:       c0c376a0
>   .index:      0
>   .resolution: 1 nsecs
>   .get_time:   ktime_get
>   .offset:     0 nsecs
> active timers:
>  #0: <c0c384c0>, menu_hrtimer_notify, S:01, hrtimer_start, swapper/0/0
>  # expires at 75889183000-75889183000 nsecs [in 94000 to 94000 nsecs]
>  #1: <c0c379a8>, tick_sched_timer, S:01, hrtimer_start_range_ns, swapper/0/0
>  # expires at 75890000000-75890000000 nsecs [in 911000 to 911000 nsecs]
>  #2: <def3a550>, ehci_hrtimer_func, S:01, hrtimer_start_range_ns, swapper/0/0
>  # expires at 75921594000-75922594000 nsecs [in 32505000 to 33505000 nsecs]
>  #3: <def3ad50>, ehci_hrtimer_func, S:01, hrtimer_start_range_ns, setfont/436
>  # expires at 75921599000-75922599000 nsecs [in 32510000 to 33510000 nsecs]
>  #4: <defd3990>, it_real_fn, S:01, hrtimer_start, syslogd/590
>  # expires at 102888764000-102888764000 nsecs [in 26999675000 to 26999675000 nsecs]
>  #5: <defd3990>, it_real_fn, S:01, hrtimer_start, syslogd/590
>  # expires at 102888764000-102888764000 nsecs [in 26999675000 to 26999675000 nsecs]
>  #6: <df7c1b30>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, avahi-daemon/352
>  # expires at 117911912000-117961554999 nsecs [in 42022823000 to 42072465999 nsecs]
>  #7: <dd857a90>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, dhclient/690
>  # expires at 267143504000-267243504000 nsecs [in 191254415000 to 191354415000 nsecs]
>  #8: <de8eff28>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, atd/557
>  # expires at 3611972607000-3611972657000 nsecs [in 3536083518000 to 3536083568000 nsecs]
>  clock 1:
>   .base:       c0c376d8
>   .index:      1
>   .resolution: 1 nsecs
>   .get_time:   ktime_get_real
>   .offset:     24375428000 nsecs
> active timers:
>  clock 2:
>   .base:       c0c37710
>   .index:      2
>   .resolution: 1 nsecs
>   .get_time:   ktime_get_boottime
>   .offset:     0 nsecs
> active timers:
>  clock 3:
>   .base:       c0c37748
>   .index:      3
>   .resolution: 1 nsecs
>   .get_time:   ktime_get_clocktai
>   .offset:     24375428000 nsecs
> active timers:
>   .expires_next   : 75889329000 nsecs
>   .hres_active    : 1
>   .nr_events      : 4712
>   .nr_retries     : 2
>   .nr_hangs       : 0
>   .max_hang_time  : 0 nsecs
>   .nohz_mode      : 2
>   .last_tick      : 75740000000 nsecs
>   .tick_stopped   : 0
>   .idle_jiffies   : 4294944869
>   .idle_calls     : 13615
>   .idle_sleeps    : 5374
>   .idle_entrytime : 75889188000 nsecs
>   .idle_waketime  : 75881026000 nsecs
>   .idle_exittime  : 75881243000 nsecs
>   .idle_sleeptime : 61747543000 nsecs
>   .iowait_sleeptime: 7327300000 nsecs
>   .last_jiffies   : 4294944884
>   .next_jiffies   : 4294944885
>   .idle_expires   : 75930000000 nsecs
> jiffies: 4294944884
> 
> cpu: 1
>  clock 0:
>   .base:       c0c3f6a0
>   .index:      0
>   .resolution: 1 nsecs
>   .get_time:   ktime_get
>   .offset:     0 nsecs
> active timers:
>  #0: <c0c3f9a8>, tick_sched_timer, S:01, hrtimer_start_range_ns, swapper/1/0
>  # expires at 75900000000-75900000000 nsecs [in 10911000 to 10911000 nsecs]
>  #1: <de835b30>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, console-kit-dae/734
>  # expires at 79000444000-79026433999 nsecs [in 3111355000 to 3137344999 nsecs]
>  #2: <de833b30>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, rpcbind/487
>  # expires at 101235760000-101265759998 nsecs [in 25346671000 to 25376670998 nsecs]
>  #3: <debdff28>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, rwhod/668
>  # expires at 193712567000-193712617000 nsecs [in 117823478000 to 117823528000 nsecs]
>  #4: <de9abb30>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, NetworkManager/543
>  # expires at 313000895000-313100895000 nsecs [in 237111806000 to 237211806000 nsecs]
>  clock 1:
>   .base:       c0c3f6d8
>   .index:      1
>   .resolution: 1 nsecs
>   .get_time:   ktime_get_real
>   .offset:     24375428000 nsecs
> active timers:
>  clock 2:
>   .base:       c0c3f710
>   .index:      2
>   .resolution: 1 nsecs
>   .get_time:   ktime_get_boottime
>   .offset:     0 nsecs
> active timers:
>  clock 3:
>   .base:       c0c3f748
>   .index:      3
>   .resolution: 1 nsecs
>   .get_time:   ktime_get_clocktai
>   .offset:     24375428000 nsecs
> active timers:
>   .expires_next   : 75900000000 nsecs
>   .hres_active    : 1
>   .nr_events      : 4578
>   .nr_retries     : 1
>   .nr_hangs       : 0
>   .max_hang_time  : 0 nsecs
>   .nohz_mode      : 2
>   .last_tick      : 75890000000 nsecs
>   .tick_stopped   : 0
>   .idle_jiffies   : 4294944884
>   .idle_calls     : 6538
>   .idle_sleeps    : 1104
>   .idle_entrytime : 75891066000 nsecs
>   .idle_waketime  : 75821610000 nsecs
>   .idle_exittime  : 75887089000 nsecs
>   .idle_sleeptime : 67621307000 nsecs
>   .iowait_sleeptime: 4203885000 nsecs
>   .last_jiffies   : 4294944885
>   .next_jiffies   : 4294944886
>   .idle_expires   : 120710000000 nsecs
> jiffies: 4294944885
> 
> Tick Device: mode:     1
> Broadcast device
> Clock Event Device: timer0
>  max_delta_ns:   536870948000
>  min_delta_ns:   1000
>  mult:           4294967
>  shift:          32
>  mode:           3
>  next_event:     75922594000 nsecs
>  set_next_event: tegra_timer_set_next_event
>  set_mode:       tegra_timer_set_mode
>  event_handler:  tick_handle_oneshot_broadcast
>  retries:        0
> 
> tick_broadcast_mask: 00000000
> tick_broadcast_oneshot_mask: 00000000
> 
> Tick Device: mode:     1
> Per CPU device: 0
> Clock Event Device: local_timer
>  max_delta_ns:   17179869180
>  min_delta_ns:   1000
>  mult:           536870912
>  shift:          31
>  mode:           3
>  next_event:     75900000000 nsecs
>  set_next_event: twd_set_next_event
>  set_mode:       twd_set_mode
>  event_handler:  hrtimer_interrupt
>  retries:        0
> 
> Tick Device: mode:     1
> Per CPU device: 1
> Clock Event Device: local_timer
>  max_delta_ns:   17179869180
>  min_delta_ns:   1000
>  mult:           536870912
>  shift:          31
>  mode:           3
>  next_event:     75900000000 nsecs
>  set_next_event: twd_set_next_event
>  set_mode:       twd_set_mode
>  event_handler:  hrtimer_interrupt
>  retries:        0

^ permalink raw reply	[flat|nested] 17+ messages in thread

* CPU hotplug issue w/ 0647065 clocksource: Add generic dummy timer driver
@ 2013-07-09 16:52                 ` Stephen Warren
  0 siblings, 0 replies; 17+ messages in thread
From: Stephen Warren @ 2013-07-09 16:52 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/09/2013 10:35 AM, Stephen Boyd wrote:
> On 07/09, Stephen Warren wrote:
>> On 07/08/2013 06:58 PM, Stephen Boyd wrote:
>>> On 07/08, Stephen Warren wrote:
>>>> CPU hotplug (replug) on Tegra HW seems to be occasionally broken due to
>>>> commit 0647065 "clocksource: Add generic dummy timer driver" in
>>>> linux-next. Reverting that commit solves the issue.
>>>
>>> We found some breakage during boot that has been fixed by two
>>> commits in linus' tree already. Do you know if you have these two
>>> patches
>>>
>>> 1f73a9806bdd07a5106409bbcab3884078bd34fe
>>> 07bd1172902e782f288e4d44b1fde7dec0f08b6f
>>
>> I didn't before since I was using next-20130705, but I just tried
>> next-20130709 which does have those two commits, and I still see the issue.
>>
> 
> Ok can you get the output of /proc/timer_list and send it back

Sure, it's below. I took another snapshot after unplugging then
re-plugging CPU1, and there was no difference except in the actual time
values and list of queued events (at ~line 12). The set of timers,
modes, etc. were all the same.

> Timer List Version: v0.7
> HRTIMER_MAX_CLOCK_BASES: 4
> now at 75889089000 nsecs
> 
> cpu: 0
>  clock 0:
>   .base:       c0c376a0
>   .index:      0
>   .resolution: 1 nsecs
>   .get_time:   ktime_get
>   .offset:     0 nsecs
> active timers:
>  #0: <c0c384c0>, menu_hrtimer_notify, S:01, hrtimer_start, swapper/0/0
>  # expires at 75889183000-75889183000 nsecs [in 94000 to 94000 nsecs]
>  #1: <c0c379a8>, tick_sched_timer, S:01, hrtimer_start_range_ns, swapper/0/0
>  # expires at 75890000000-75890000000 nsecs [in 911000 to 911000 nsecs]
>  #2: <def3a550>, ehci_hrtimer_func, S:01, hrtimer_start_range_ns, swapper/0/0
>  # expires at 75921594000-75922594000 nsecs [in 32505000 to 33505000 nsecs]
>  #3: <def3ad50>, ehci_hrtimer_func, S:01, hrtimer_start_range_ns, setfont/436
>  # expires at 75921599000-75922599000 nsecs [in 32510000 to 33510000 nsecs]
>  #4: <defd3990>, it_real_fn, S:01, hrtimer_start, syslogd/590
>  # expires at 102888764000-102888764000 nsecs [in 26999675000 to 26999675000 nsecs]
>  #5: <defd3990>, it_real_fn, S:01, hrtimer_start, syslogd/590
>  # expires at 102888764000-102888764000 nsecs [in 26999675000 to 26999675000 nsecs]
>  #6: <df7c1b30>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, avahi-daemon/352
>  # expires at 117911912000-117961554999 nsecs [in 42022823000 to 42072465999 nsecs]
>  #7: <dd857a90>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, dhclient/690
>  # expires at 267143504000-267243504000 nsecs [in 191254415000 to 191354415000 nsecs]
>  #8: <de8eff28>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, atd/557
>  # expires at 3611972607000-3611972657000 nsecs [in 3536083518000 to 3536083568000 nsecs]
>  clock 1:
>   .base:       c0c376d8
>   .index:      1
>   .resolution: 1 nsecs
>   .get_time:   ktime_get_real
>   .offset:     24375428000 nsecs
> active timers:
>  clock 2:
>   .base:       c0c37710
>   .index:      2
>   .resolution: 1 nsecs
>   .get_time:   ktime_get_boottime
>   .offset:     0 nsecs
> active timers:
>  clock 3:
>   .base:       c0c37748
>   .index:      3
>   .resolution: 1 nsecs
>   .get_time:   ktime_get_clocktai
>   .offset:     24375428000 nsecs
> active timers:
>   .expires_next   : 75889329000 nsecs
>   .hres_active    : 1
>   .nr_events      : 4712
>   .nr_retries     : 2
>   .nr_hangs       : 0
>   .max_hang_time  : 0 nsecs
>   .nohz_mode      : 2
>   .last_tick      : 75740000000 nsecs
>   .tick_stopped   : 0
>   .idle_jiffies   : 4294944869
>   .idle_calls     : 13615
>   .idle_sleeps    : 5374
>   .idle_entrytime : 75889188000 nsecs
>   .idle_waketime  : 75881026000 nsecs
>   .idle_exittime  : 75881243000 nsecs
>   .idle_sleeptime : 61747543000 nsecs
>   .iowait_sleeptime: 7327300000 nsecs
>   .last_jiffies   : 4294944884
>   .next_jiffies   : 4294944885
>   .idle_expires   : 75930000000 nsecs
> jiffies: 4294944884
> 
> cpu: 1
>  clock 0:
>   .base:       c0c3f6a0
>   .index:      0
>   .resolution: 1 nsecs
>   .get_time:   ktime_get
>   .offset:     0 nsecs
> active timers:
>  #0: <c0c3f9a8>, tick_sched_timer, S:01, hrtimer_start_range_ns, swapper/1/0
>  # expires at 75900000000-75900000000 nsecs [in 10911000 to 10911000 nsecs]
>  #1: <de835b30>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, console-kit-dae/734
>  # expires at 79000444000-79026433999 nsecs [in 3111355000 to 3137344999 nsecs]
>  #2: <de833b30>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, rpcbind/487
>  # expires at 101235760000-101265759998 nsecs [in 25346671000 to 25376670998 nsecs]
>  #3: <debdff28>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, rwhod/668
>  # expires at 193712567000-193712617000 nsecs [in 117823478000 to 117823528000 nsecs]
>  #4: <de9abb30>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, NetworkManager/543
>  # expires at 313000895000-313100895000 nsecs [in 237111806000 to 237211806000 nsecs]
>  clock 1:
>   .base:       c0c3f6d8
>   .index:      1
>   .resolution: 1 nsecs
>   .get_time:   ktime_get_real
>   .offset:     24375428000 nsecs
> active timers:
>  clock 2:
>   .base:       c0c3f710
>   .index:      2
>   .resolution: 1 nsecs
>   .get_time:   ktime_get_boottime
>   .offset:     0 nsecs
> active timers:
>  clock 3:
>   .base:       c0c3f748
>   .index:      3
>   .resolution: 1 nsecs
>   .get_time:   ktime_get_clocktai
>   .offset:     24375428000 nsecs
> active timers:
>   .expires_next   : 75900000000 nsecs
>   .hres_active    : 1
>   .nr_events      : 4578
>   .nr_retries     : 1
>   .nr_hangs       : 0
>   .max_hang_time  : 0 nsecs
>   .nohz_mode      : 2
>   .last_tick      : 75890000000 nsecs
>   .tick_stopped   : 0
>   .idle_jiffies   : 4294944884
>   .idle_calls     : 6538
>   .idle_sleeps    : 1104
>   .idle_entrytime : 75891066000 nsecs
>   .idle_waketime  : 75821610000 nsecs
>   .idle_exittime  : 75887089000 nsecs
>   .idle_sleeptime : 67621307000 nsecs
>   .iowait_sleeptime: 4203885000 nsecs
>   .last_jiffies   : 4294944885
>   .next_jiffies   : 4294944886
>   .idle_expires   : 120710000000 nsecs
> jiffies: 4294944885
> 
> Tick Device: mode:     1
> Broadcast device
> Clock Event Device: timer0
>  max_delta_ns:   536870948000
>  min_delta_ns:   1000
>  mult:           4294967
>  shift:          32
>  mode:           3
>  next_event:     75922594000 nsecs
>  set_next_event: tegra_timer_set_next_event
>  set_mode:       tegra_timer_set_mode
>  event_handler:  tick_handle_oneshot_broadcast
>  retries:        0
> 
> tick_broadcast_mask: 00000000
> tick_broadcast_oneshot_mask: 00000000
> 
> Tick Device: mode:     1
> Per CPU device: 0
> Clock Event Device: local_timer
>  max_delta_ns:   17179869180
>  min_delta_ns:   1000
>  mult:           536870912
>  shift:          31
>  mode:           3
>  next_event:     75900000000 nsecs
>  set_next_event: twd_set_next_event
>  set_mode:       twd_set_mode
>  event_handler:  hrtimer_interrupt
>  retries:        0
> 
> Tick Device: mode:     1
> Per CPU device: 1
> Clock Event Device: local_timer
>  max_delta_ns:   17179869180
>  min_delta_ns:   1000
>  mult:           536870912
>  shift:          31
>  mode:           3
>  next_event:     75900000000 nsecs
>  set_next_event: twd_set_next_event
>  set_mode:       twd_set_mode
>  event_handler:  hrtimer_interrupt
>  retries:        0

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: CPU hotplug issue w/ 0647065 clocksource: Add generic dummy timer driver
  2013-07-09 16:52                 ` Stephen Warren
@ 2013-07-09 23:05                     ` Stephen Boyd
  -1 siblings, 0 replies; 17+ messages in thread
From: Stephen Boyd @ 2013-07-09 23:05 UTC (permalink / raw)
  To: Stephen Warren
  Cc: Mark Rutland, Marc Zyngier, linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	ARM kernel mailing list, Thomas Gleixner, John Stultz, Joseph Lo

On 07/09, Stephen Warren wrote:
> On 07/09/2013 10:35 AM, Stephen Boyd wrote:
> > On 07/09, Stephen Warren wrote:
> >> On 07/08/2013 06:58 PM, Stephen Boyd wrote:
> >>> On 07/08, Stephen Warren wrote:
> >>>> CPU hotplug (replug) on Tegra HW seems to be occasionally broken due to
> >>>> commit 0647065 "clocksource: Add generic dummy timer driver" in
> >>>> linux-next. Reverting that commit solves the issue.
> >>>
> >>> We found some breakage during boot that has been fixed by two
> >>> commits in linus' tree already. Do you know if you have these two
> >>> patches
> >>>
> >>> 1f73a9806bdd07a5106409bbcab3884078bd34fe
> >>> 07bd1172902e782f288e4d44b1fde7dec0f08b6f
> >>
> >> I didn't before since I was using next-20130705, but I just tried
> >> next-20130709 which does have those two commits, and I still see the issue.
> >>
> > 
> > Ok can you get the output of /proc/timer_list and send it back
> 
> Sure, it's below. I took another snapshot after unplugging then
> re-plugging CPU1, and there was no difference except in the actual time
> values and list of queued events (at ~line 12). The set of timers,
> modes, etc. were all the same.

Ok I think I understand the problem now. The dummy timer is
registered with the cpu hotplug notifier first. Then the TWD is
registered, so what we get is a clockevent_register(dummy)
followed by a clockevent_register(twd) on every hotplug (probably
something we should optimize later). Regardless, when we hotplug
out the CPU, the broadcast device is in oneshot mode. When we
hotplug back in the CPU, the dummy event gets added first and
fills in the per-cpu tickdevice slot for CPU1.

When this happens we have to start the broadcast timer and the
code there just forces the broadcast device to use the periodic
broadcast handler (see tick_device_uses_broadcast()) without
checking to see if the mode is periodic. Once that event handler
is assigned to periodic broadcast we're in danger of getting an
interrupt and trying to emulate oneshot mode when we shouldn't
be.

Can you try this patch?


diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 6d3f916..218bcb5 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -157,7 +157,10 @@ int tick_device_uses_broadcast(struct clock_event_device *dev, int cpu)
 		dev->event_handler = tick_handle_periodic;
 		tick_device_setup_broadcast_func(dev);
 		cpumask_set_cpu(cpu, tick_broadcast_mask);
-		tick_broadcast_start_periodic(bc);
+		if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
+			tick_broadcast_start_periodic(bc);
+		else
+			tick_broadcast_setup_oneshot(bc);
 		ret = 1;
 	} else {
 		/*

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 17+ messages in thread

* CPU hotplug issue w/ 0647065 clocksource: Add generic dummy timer driver
@ 2013-07-09 23:05                     ` Stephen Boyd
  0 siblings, 0 replies; 17+ messages in thread
From: Stephen Boyd @ 2013-07-09 23:05 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/09, Stephen Warren wrote:
> On 07/09/2013 10:35 AM, Stephen Boyd wrote:
> > On 07/09, Stephen Warren wrote:
> >> On 07/08/2013 06:58 PM, Stephen Boyd wrote:
> >>> On 07/08, Stephen Warren wrote:
> >>>> CPU hotplug (replug) on Tegra HW seems to be occasionally broken due to
> >>>> commit 0647065 "clocksource: Add generic dummy timer driver" in
> >>>> linux-next. Reverting that commit solves the issue.
> >>>
> >>> We found some breakage during boot that has been fixed by two
> >>> commits in linus' tree already. Do you know if you have these two
> >>> patches
> >>>
> >>> 1f73a9806bdd07a5106409bbcab3884078bd34fe
> >>> 07bd1172902e782f288e4d44b1fde7dec0f08b6f
> >>
> >> I didn't before since I was using next-20130705, but I just tried
> >> next-20130709 which does have those two commits, and I still see the issue.
> >>
> > 
> > Ok can you get the output of /proc/timer_list and send it back
> 
> Sure, it's below. I took another snapshot after unplugging then
> re-plugging CPU1, and there was no difference except in the actual time
> values and list of queued events (at ~line 12). The set of timers,
> modes, etc. were all the same.

Ok I think I understand the problem now. The dummy timer is
registered with the cpu hotplug notifier first. Then the TWD is
registered, so what we get is a clockevent_register(dummy)
followed by a clockevent_register(twd) on every hotplug (probably
something we should optimize later). Regardless, when we hotplug
out the CPU, the broadcast device is in oneshot mode. When we
hotplug back in the CPU, the dummy event gets added first and
fills in the per-cpu tickdevice slot for CPU1.

When this happens we have to start the broadcast timer and the
code there just forces the broadcast device to use the periodic
broadcast handler (see tick_device_uses_broadcast()) without
checking to see if the mode is periodic. Once that event handler
is assigned to periodic broadcast we're in danger of getting an
interrupt and trying to emulate oneshot mode when we shouldn't
be.

Can you try this patch?


diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 6d3f916..218bcb5 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -157,7 +157,10 @@ int tick_device_uses_broadcast(struct clock_event_device *dev, int cpu)
 		dev->event_handler = tick_handle_periodic;
 		tick_device_setup_broadcast_func(dev);
 		cpumask_set_cpu(cpu, tick_broadcast_mask);
-		tick_broadcast_start_periodic(bc);
+		if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
+			tick_broadcast_start_periodic(bc);
+		else
+			tick_broadcast_setup_oneshot(bc);
 		ret = 1;
 	} else {
 		/*

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: CPU hotplug issue w/ 0647065 clocksource: Add generic dummy timer driver
  2013-07-09 23:05                     ` Stephen Boyd
@ 2013-07-10 16:09                         ` Stephen Warren
  -1 siblings, 0 replies; 17+ messages in thread
From: Stephen Warren @ 2013-07-10 16:09 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Mark Rutland, Marc Zyngier, linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	ARM kernel mailing list, Thomas Gleixner, John Stultz, Joseph Lo

On 07/09/2013 05:05 PM, Stephen Boyd wrote:
> On 07/09, Stephen Warren wrote:
>> On 07/09/2013 10:35 AM, Stephen Boyd wrote:
>>> On 07/09, Stephen Warren wrote:
>>>> On 07/08/2013 06:58 PM, Stephen Boyd wrote:
>>>>> On 07/08, Stephen Warren wrote:
>>>>>> CPU hotplug (replug) on Tegra HW seems to be occasionally broken due to
>>>>>> commit 0647065 "clocksource: Add generic dummy timer driver" in
>>>>>> linux-next. Reverting that commit solves the issue.
...
> Can you try this patch?

That seems to work great, thanks! I successfully unpugged/replugged CPU1
about 200 times, whereas without the patch I'd usually see the problem
about 10% of replug attempts.

Tested-by: Stephen Warren <swarren-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>

> diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
> index 6d3f916..218bcb5 100644
> --- a/kernel/time/tick-broadcast.c
> +++ b/kernel/time/tick-broadcast.c
> @@ -157,7 +157,10 @@ int tick_device_uses_broadcast(struct clock_event_device *dev, int cpu)
>  		dev->event_handler = tick_handle_periodic;
>  		tick_device_setup_broadcast_func(dev);
>  		cpumask_set_cpu(cpu, tick_broadcast_mask);
> -		tick_broadcast_start_periodic(bc);
> +		if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
> +			tick_broadcast_start_periodic(bc);
> +		else
> +			tick_broadcast_setup_oneshot(bc);
>  		ret = 1;
>  	} else {
>  		/*
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* CPU hotplug issue w/ 0647065 clocksource: Add generic dummy timer driver
@ 2013-07-10 16:09                         ` Stephen Warren
  0 siblings, 0 replies; 17+ messages in thread
From: Stephen Warren @ 2013-07-10 16:09 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/09/2013 05:05 PM, Stephen Boyd wrote:
> On 07/09, Stephen Warren wrote:
>> On 07/09/2013 10:35 AM, Stephen Boyd wrote:
>>> On 07/09, Stephen Warren wrote:
>>>> On 07/08/2013 06:58 PM, Stephen Boyd wrote:
>>>>> On 07/08, Stephen Warren wrote:
>>>>>> CPU hotplug (replug) on Tegra HW seems to be occasionally broken due to
>>>>>> commit 0647065 "clocksource: Add generic dummy timer driver" in
>>>>>> linux-next. Reverting that commit solves the issue.
...
> Can you try this patch?

That seems to work great, thanks! I successfully unpugged/replugged CPU1
about 200 times, whereas without the patch I'd usually see the problem
about 10% of replug attempts.

Tested-by: Stephen Warren <swarren@nvidia.com>

> diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
> index 6d3f916..218bcb5 100644
> --- a/kernel/time/tick-broadcast.c
> +++ b/kernel/time/tick-broadcast.c
> @@ -157,7 +157,10 @@ int tick_device_uses_broadcast(struct clock_event_device *dev, int cpu)
>  		dev->event_handler = tick_handle_periodic;
>  		tick_device_setup_broadcast_func(dev);
>  		cpumask_set_cpu(cpu, tick_broadcast_mask);
> -		tick_broadcast_start_periodic(bc);
> +		if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
> +			tick_broadcast_start_periodic(bc);
> +		else
> +			tick_broadcast_setup_oneshot(bc);
>  		ret = 1;
>  	} else {
>  		/*
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: CPU hotplug issue w/ 0647065 clocksource: Add generic dummy timer driver
  2013-07-10 16:09                         ` Stephen Warren
@ 2013-07-11 14:00                             ` Stephen Boyd
  -1 siblings, 0 replies; 17+ messages in thread
From: Stephen Boyd @ 2013-07-11 14:00 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mark Rutland, Marc Zyngier, linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	ARM kernel mailing list, John Stultz, Joseph Lo, Stephen Warren

On 07/10, Stephen Warren wrote:
> On 07/09/2013 05:05 PM, Stephen Boyd wrote:
> > On 07/09, Stephen Warren wrote:
> >> On 07/09/2013 10:35 AM, Stephen Boyd wrote:
> >>> On 07/09, Stephen Warren wrote:
> >>>> On 07/08/2013 06:58 PM, Stephen Boyd wrote:
> >>>>> On 07/08, Stephen Warren wrote:
> >>>>>> CPU hotplug (replug) on Tegra HW seems to be occasionally broken due to
> >>>>>> commit 0647065 "clocksource: Add generic dummy timer driver" in
> >>>>>> linux-next. Reverting that commit solves the issue.
> ...
> > Can you try this patch?
> 
> That seems to work great, thanks! I successfully unpugged/replugged CPU1
> about 200 times, whereas without the patch I'd usually see the problem
> about 10% of replug attempts.
> 
> Tested-by: Stephen Warren <swarren-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> 

Thomas, here's the full patch:

---8<----
Subject: [PATCH] tick: Fix hotplug confusing the broadcast mode

On ARM systems the dummy clockevent is registered with the cpu
hotplug notifier chain before any other per-cpu clockevent. This
has the side-effect of causing the dummy clockevent to be
registered first in every hotplug sequence. Because the dummy is
first, we'll try to turn the broadcast source on but the code in
tick_device_uses_broadcast() assumes the broadcast source is in
periodic mode and calls tick_broadcast_start_periodic()
unconditionally.

On boot this isn't a problem because we typically haven't
switched into oneshot mode yet (if at all). During hotplug, if
the broadcast source isn't in periodic mode we'll replace the
broadcast oneshot handler with the broadcast periodic handler and
start emulating oneshot mode when we shouldn't. Due to the way
the broadcast oneshot handler programs the next_event it's
possible for it to contain KTIME_MAX and cause us to hang the
system when the periodic handler tries to program the next tick.
Fix this by using the appropriate function to start the broadcast
source.

Reported-by: Stephen Warren <swarren-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
Tested-by: Stephen Warren <swarren-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Stephen Boyd <sboyd-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
---
 kernel/time/tick-broadcast.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 6d3f916..218bcb5 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -157,7 +157,10 @@ int tick_device_uses_broadcast(struct clock_event_device *dev, int cpu)
 		dev->event_handler = tick_handle_periodic;
 		tick_device_setup_broadcast_func(dev);
 		cpumask_set_cpu(cpu, tick_broadcast_mask);
-		tick_broadcast_start_periodic(bc);
+		if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
+			tick_broadcast_start_periodic(bc);
+		else
+			tick_broadcast_setup_oneshot(bc);
 		ret = 1;
 	} else {
 		/*
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation



-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 17+ messages in thread

* CPU hotplug issue w/ 0647065 clocksource: Add generic dummy timer driver
@ 2013-07-11 14:00                             ` Stephen Boyd
  0 siblings, 0 replies; 17+ messages in thread
From: Stephen Boyd @ 2013-07-11 14:00 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/10, Stephen Warren wrote:
> On 07/09/2013 05:05 PM, Stephen Boyd wrote:
> > On 07/09, Stephen Warren wrote:
> >> On 07/09/2013 10:35 AM, Stephen Boyd wrote:
> >>> On 07/09, Stephen Warren wrote:
> >>>> On 07/08/2013 06:58 PM, Stephen Boyd wrote:
> >>>>> On 07/08, Stephen Warren wrote:
> >>>>>> CPU hotplug (replug) on Tegra HW seems to be occasionally broken due to
> >>>>>> commit 0647065 "clocksource: Add generic dummy timer driver" in
> >>>>>> linux-next. Reverting that commit solves the issue.
> ...
> > Can you try this patch?
> 
> That seems to work great, thanks! I successfully unpugged/replugged CPU1
> about 200 times, whereas without the patch I'd usually see the problem
> about 10% of replug attempts.
> 
> Tested-by: Stephen Warren <swarren@nvidia.com>
> 

Thomas, here's the full patch:

---8<----
Subject: [PATCH] tick: Fix hotplug confusing the broadcast mode

On ARM systems the dummy clockevent is registered with the cpu
hotplug notifier chain before any other per-cpu clockevent. This
has the side-effect of causing the dummy clockevent to be
registered first in every hotplug sequence. Because the dummy is
first, we'll try to turn the broadcast source on but the code in
tick_device_uses_broadcast() assumes the broadcast source is in
periodic mode and calls tick_broadcast_start_periodic()
unconditionally.

On boot this isn't a problem because we typically haven't
switched into oneshot mode yet (if at all). During hotplug, if
the broadcast source isn't in periodic mode we'll replace the
broadcast oneshot handler with the broadcast periodic handler and
start emulating oneshot mode when we shouldn't. Due to the way
the broadcast oneshot handler programs the next_event it's
possible for it to contain KTIME_MAX and cause us to hang the
system when the periodic handler tries to program the next tick.
Fix this by using the appropriate function to start the broadcast
source.

Reported-by: Stephen Warren <swarren@nvidia.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---
 kernel/time/tick-broadcast.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 6d3f916..218bcb5 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -157,7 +157,10 @@ int tick_device_uses_broadcast(struct clock_event_device *dev, int cpu)
 		dev->event_handler = tick_handle_periodic;
 		tick_device_setup_broadcast_func(dev);
 		cpumask_set_cpu(cpu, tick_broadcast_mask);
-		tick_broadcast_start_periodic(bc);
+		if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
+			tick_broadcast_start_periodic(bc);
+		else
+			tick_broadcast_setup_oneshot(bc);
 		ret = 1;
 	} else {
 		/*
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation



-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [tip:timers/urgent] tick: broadcast: Check broadcast mode on CPU hotplug
  2013-07-11 14:00                             ` Stephen Boyd
  (?)
@ 2013-07-12 10:39                             ` tip-bot for Stephen Boyd
  -1 siblings, 0 replies; 17+ messages in thread
From: tip-bot for Stephen Boyd @ 2013-07-12 10:39 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, linux-arm-kernel, marc.zyngier,
	Mark.Rutland, swarren, sboyd, john.stultz, josephl, tglx

Commit-ID:  a272dcca1802a7e265a56e60b0d0a6715b0a8ac2
Gitweb:     http://git.kernel.org/tip/a272dcca1802a7e265a56e60b0d0a6715b0a8ac2
Author:     Stephen Boyd <sboyd@codeaurora.org>
AuthorDate: Thu, 11 Jul 2013 07:00:59 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 12 Jul 2013 12:35:40 +0200

tick: broadcast: Check broadcast mode on CPU hotplug

On ARM systems the dummy clockevent is registered with the cpu
hotplug notifier chain before any other per-cpu clockevent. This
has the side-effect of causing the dummy clockevent to be
registered first in every hotplug sequence. Because the dummy is
first, we'll try to turn the broadcast source on but the code in
tick_device_uses_broadcast() assumes the broadcast source is in
periodic mode and calls tick_broadcast_start_periodic()
unconditionally.

On boot this isn't a problem because we typically haven't
switched into oneshot mode yet (if at all). During hotplug, if
the broadcast source isn't in periodic mode we'll replace the
broadcast oneshot handler with the broadcast periodic handler and
start emulating oneshot mode when we shouldn't. Due to the way
the broadcast oneshot handler programs the next_event it's
possible for it to contain KTIME_MAX and cause us to hang the
system when the periodic handler tries to program the next tick.
Fix this by using the appropriate function to start the broadcast
source.

Reported-by: Stephen Warren <swarren@nvidia.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: ARM kernel mailing list <linux-arm-kernel@lists.infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joseph Lo <josephl@nvidia.com>
Link: http://lkml.kernel.org/r/20130711140059.GA27430@codeaurora.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/tick-broadcast.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 6d3f916..218bcb5 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -157,7 +157,10 @@ int tick_device_uses_broadcast(struct clock_event_device *dev, int cpu)
 		dev->event_handler = tick_handle_periodic;
 		tick_device_setup_broadcast_func(dev);
 		cpumask_set_cpu(cpu, tick_broadcast_mask);
-		tick_broadcast_start_periodic(bc);
+		if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
+			tick_broadcast_start_periodic(bc);
+		else
+			tick_broadcast_setup_oneshot(bc);
 		ret = 1;
 	} else {
 		/*

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2013-07-12 10:40 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-08 17:36 CPU hotplug issue w/ 0647065 clocksource: Add generic dummy timer driver Stephen Warren
2013-07-08 17:36 ` Stephen Warren
     [not found] ` <51DAF895.1020700-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
2013-07-09  0:58   ` Stephen Boyd
2013-07-09  0:58     ` Stephen Boyd
     [not found]     ` <20130709005837.GC830-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2013-07-09 16:05       ` Stephen Warren
2013-07-09 16:05         ` Stephen Warren
     [not found]         ` <51DC34AD.30009-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
2013-07-09 16:35           ` Stephen Boyd
2013-07-09 16:35             ` Stephen Boyd
     [not found]             ` <20130709163518.GD830-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2013-07-09 16:52               ` Stephen Warren
2013-07-09 16:52                 ` Stephen Warren
     [not found]                 ` <51DC3FD2.9070308-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
2013-07-09 23:05                   ` Stephen Boyd
2013-07-09 23:05                     ` Stephen Boyd
     [not found]                     ` <20130709230528.GE830-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2013-07-10 16:09                       ` Stephen Warren
2013-07-10 16:09                         ` Stephen Warren
     [not found]                         ` <51DD872D.7020101-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
2013-07-11 14:00                           ` Stephen Boyd
2013-07-11 14:00                             ` Stephen Boyd
2013-07-12 10:39                             ` [tip:timers/urgent] tick: broadcast: Check broadcast mode on CPU hotplug tip-bot for Stephen Boyd

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.