linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] tick: prefer a lower rating device only if it's CPU local device
@ 2018-05-09 16:02 Sudeep Holla
  2018-05-13 13:09 ` [tip:timers/core] tick: Prefer " tip-bot for Sudeep Holla
  2018-07-02 23:44 ` [PATCH] tick: prefer " Kevin Hilman
  0 siblings, 2 replies; 10+ messages in thread
From: Sudeep Holla @ 2018-05-09 16:02 UTC (permalink / raw)
  To: linux-kernel, Thomas Gleixner; +Cc: Sudeep Holla, Frederic Weisbecker

Checking the equality of cpumask for both new and old tick device doesn't
ensure that it's CPU local device. This will cause issue if a low rating
clockevent tick device is registered first followed by the registration
of higher rating clockevent tick device.

In such case, clockevents_released list will never get emptied as both
the devices get selected as preferred one and we will loop forever in
clockevents_notify_released.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
---
 kernel/time/tick-common.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Hi Thomas,

I am seeing this issue on my Juno devboard, where system wide timers
with rating 300 and 400 are registered in same order and we get stuck in
a loop in clockevents_notify_released. Let me know if this looks sane or
you have any suggestions that I can try out.

Regards,
Sudeep

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 49edc1c4f3e6..78e598334007 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -277,7 +277,8 @@ static bool tick_check_preferred(struct clock_event_device *curdev,
 	 */
 	return !curdev ||
 		newdev->rating > curdev->rating ||
-	       !cpumask_equal(curdev->cpumask, newdev->cpumask);
+	       (!cpumask_equal(curdev->cpumask, newdev->cpumask) &&
+	        !tick_check_percpu(curdev, newdev, smp_processor_id()));
 }

 /*
--
2.7.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [tip:timers/core] tick: Prefer a lower rating device only if it's CPU local device
  2018-05-09 16:02 [PATCH] tick: prefer a lower rating device only if it's CPU local device Sudeep Holla
@ 2018-05-13 13:09 ` tip-bot for Sudeep Holla
  2018-07-02 23:44 ` [PATCH] tick: prefer " Kevin Hilman
  1 sibling, 0 replies; 10+ messages in thread
From: tip-bot for Sudeep Holla @ 2018-05-13 13:09 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: mingo, fweisbec, tglx, linux-kernel, sudeep.holla, hpa

Commit-ID:  1332a90558013ae4242e3dd7934bdcdeafb06c0d
Gitweb:     https://git.kernel.org/tip/1332a90558013ae4242e3dd7934bdcdeafb06c0d
Author:     Sudeep Holla <sudeep.holla@arm.com>
AuthorDate: Wed, 9 May 2018 17:02:08 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sun, 13 May 2018 15:07:41 +0200

tick: Prefer a lower rating device only if it's CPU local device

Checking the equality of cpumask for both new and old tick device doesn't
ensure that it's CPU local device. This will cause issue if a low rating
clockevent tick device is registered first followed by the registration
of higher rating clockevent tick device.

In such case, clockevents_released list will never get emptied as both
the devices get selected as preferred one and we will loop forever in
clockevents_notify_released.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: https://lkml.kernel.org/r/1525881728-4858-1-git-send-email-sudeep.holla@arm.com

---
 kernel/time/tick-common.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 49edc1c4f3e6..78e598334007 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -277,7 +277,8 @@ static bool tick_check_preferred(struct clock_event_device *curdev,
 	 */
 	return !curdev ||
 		newdev->rating > curdev->rating ||
-	       !cpumask_equal(curdev->cpumask, newdev->cpumask);
+	       (!cpumask_equal(curdev->cpumask, newdev->cpumask) &&
+	        !tick_check_percpu(curdev, newdev, smp_processor_id()));
 }
 
 /*

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] tick: prefer a lower rating device only if it's CPU local device
  2018-05-09 16:02 [PATCH] tick: prefer a lower rating device only if it's CPU local device Sudeep Holla
  2018-05-13 13:09 ` [tip:timers/core] tick: Prefer " tip-bot for Sudeep Holla
@ 2018-07-02 23:44 ` Kevin Hilman
  2018-07-03 10:53   ` Sudeep Holla
  1 sibling, 1 reply; 10+ messages in thread
From: Kevin Hilman @ 2018-07-02 23:44 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: lkml, Thomas Gleixner, fweisbec, Arnd Bergmann, Martin Blumenstingl

Hi Sudeep,

On Wed, May 9, 2018 at 9:02 AM Sudeep Holla <sudeep.holla@arm.com> wrote:
>
> Checking the equality of cpumask for both new and old tick device doesn't
> ensure that it's CPU local device. This will cause issue if a low rating
> clockevent tick device is registered first followed by the registration
> of higher rating clockevent tick device.
>
> In such case, clockevents_released list will never get emptied as both
> the devices get selected as preferred one and we will loop forever in
> clockevents_notify_released.
>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

I've got a arm32 board (meson8b-odroidc1) that's been failing in
kernelCI.org since the merge window (boot log[1]), and I finally got
around to bisecting it[2].  Unfortunately, the bisect pointed at a
merge commit, but with some trial and error (and a suggestion by Arnd)
I was able to test that revering $SUBJECT commit[3], my problem goes
away.

Another interesting data point is that disabling SMP (either by
"nosmp" on the command-line or CONFIG_SMP=n) also makes the problem go
away, without needing to revert this patch.

AFAICT, this platform, is using a single timer as a clocksource
("amlogic,meson6-timer") which is not a per-CPU timer.

I ran out of time to keep digging on this issue, and I'm still not
sure exactly what's going on, but I wanted to report it in case anyone
else has any ideas, and so we can hopefully get it fixed during the
-rc cycle.

Kevin

[1] https://storage.kernelci.org/mainline/master/v4.18-rc2-357-gd3bc0e67f852/arm/multi_v7_defconfig/lab-baylibre-seattle/boot-meson8b-odroidc1.html
[2] http://termbin.com/mk07
[3] in mainline as: 1332a9055801 tick: Prefer a lower rating device
only if it's CPU local device

> ---
>  kernel/time/tick-common.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> Hi Thomas,
>
> I am seeing this issue on my Juno devboard, where system wide timers
> with rating 300 and 400 are registered in same order and we get stuck in
> a loop in clockevents_notify_released. Let me know if this looks sane or
> you have any suggestions that I can try out.
>
> Regards,
> Sudeep
>
> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
> index 49edc1c4f3e6..78e598334007 100644
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -277,7 +277,8 @@ static bool tick_check_preferred(struct clock_event_device *curdev,
>          */
>         return !curdev ||
>                 newdev->rating > curdev->rating ||
> -              !cpumask_equal(curdev->cpumask, newdev->cpumask);
> +              (!cpumask_equal(curdev->cpumask, newdev->cpumask) &&
> +               !tick_check_percpu(curdev, newdev, smp_processor_id()));
>  }
>
>  /*
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] tick: prefer a lower rating device only if it's CPU local device
  2018-07-02 23:44 ` [PATCH] tick: prefer " Kevin Hilman
@ 2018-07-03 10:53   ` Sudeep Holla
  2018-07-03 15:04     ` Kevin Hilman
  0 siblings, 1 reply; 10+ messages in thread
From: Sudeep Holla @ 2018-07-03 10:53 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: lkml, Thomas Gleixner, fweisbec, Arnd Bergmann,
	Martin Blumenstingl, Sudeep Holla

On Mon, Jul 02, 2018 at 04:44:33PM -0700, Kevin Hilman wrote:
> Hi Sudeep,
> 
> On Wed, May 9, 2018 at 9:02 AM Sudeep Holla <sudeep.holla@arm.com> wrote:
> >
> > Checking the equality of cpumask for both new and old tick device doesn't
> > ensure that it's CPU local device. This will cause issue if a low rating
> > clockevent tick device is registered first followed by the registration
> > of higher rating clockevent tick device.
> >
> > In such case, clockevents_released list will never get emptied as both
> > the devices get selected as preferred one and we will loop forever in
> > clockevents_notify_released.
> >
> > Cc: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> 
> I've got a arm32 board (meson8b-odroidc1) that's been failing in
> kernelCI.org since the merge window (boot log[1]), and I finally got
> around to bisecting it[2].  Unfortunately, the bisect pointed at a
> merge commit, but with some trial and error (and a suggestion by Arnd)
> I was able to test that revering $SUBJECT commit[3], my problem goes
> away.
>

Interesting. Sorry for causing the regression.

> Another interesting data point is that disabling SMP (either by
> "nosmp" on the command-line or CONFIG_SMP=n) also makes the problem go
> away, without needing to revert this patch.
>

I am not sure of nosmp, but with CONFIG_SMP=n, TICK_BROADCAST also gets
disabled. dummy_timer won't be registered I assume.

I am not sure if dummy_timer is selected as it's per_cpu but the rating
is low anyways.

> AFAICT, this platform, is using a single timer as a clocksource
> ("amlogic,meson6-timer") which is not a per-CPU timer.
>

Yes that's what I could gather from DT. But this is A5 right ? It may
have per CPU TWD(watchdof timer) but DT doesn't specify it, so should be
fine.

> I ran out of time to keep digging on this issue, and I'm still not
> sure exactly what's going on, but I wanted to report it in case anyone
> else has any ideas, and so we can hopefully get it fixed during the
> -rc cycle.
>

From the log, it looks like the platform has booted to userspace. Any chance
we can have a look at:
$ grep "" /sys/devices/system/clock*/{broadcast,clock*}/{available,current}_*

--
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] tick: prefer a lower rating device only if it's CPU local device
  2018-07-03 10:53   ` Sudeep Holla
@ 2018-07-03 15:04     ` Kevin Hilman
  2018-07-03 15:44       ` Sudeep Holla
  0 siblings, 1 reply; 10+ messages in thread
From: Kevin Hilman @ 2018-07-03 15:04 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: lkml, Thomas Gleixner, fweisbec, Arnd Bergmann, Martin Blumenstingl

On Tue, Jul 3, 2018 at 3:54 AM Sudeep Holla <sudeep.holla@arm.com> wrote:
>
> On Mon, Jul 02, 2018 at 04:44:33PM -0700, Kevin Hilman wrote:
> > Hi Sudeep,
> >
> > On Wed, May 9, 2018 at 9:02 AM Sudeep Holla <sudeep.holla@arm.com> wrote:
> > >
> > > Checking the equality of cpumask for both new and old tick device doesn't
> > > ensure that it's CPU local device. This will cause issue if a low rating
> > > clockevent tick device is registered first followed by the registration
> > > of higher rating clockevent tick device.
> > >
> > > In such case, clockevents_released list will never get emptied as both
> > > the devices get selected as preferred one and we will loop forever in
> > > clockevents_notify_released.
> > >
> > > Cc: Frederic Weisbecker <fweisbec@gmail.com>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> >
> > I've got a arm32 board (meson8b-odroidc1) that's been failing in
> > kernelCI.org since the merge window (boot log[1]), and I finally got
> > around to bisecting it[2].  Unfortunately, the bisect pointed at a
> > merge commit, but with some trial and error (and a suggestion by Arnd)
> > I was able to test that revering $SUBJECT commit[3], my problem goes
> > away.
> >
>
> Interesting. Sorry for causing the regression.
>
> > Another interesting data point is that disabling SMP (either by
> > "nosmp" on the command-line or CONFIG_SMP=n) also makes the problem go
> > away, without needing to revert this patch.
> >
>
> I am not sure of nosmp, but with CONFIG_SMP=n, TICK_BROADCAST also gets
> disabled. dummy_timer won't be registered I assume.
>
> I am not sure if dummy_timer is selected as it's per_cpu but the rating
> is low anyways.

> > AFAICT, this platform, is using a single timer as a clocksource
> > ("amlogic,meson6-timer") which is not a per-CPU timer.
> >
>
> Yes that's what I could gather from DT. But this is A5 right ? It may
> have per CPU TWD(watchdof timer) but DT doesn't specify it, so should be
> fine.
>
> > I ran out of time to keep digging on this issue, and I'm still not
> > sure exactly what's going on, but I wanted to report it in case anyone
> > else has any ideas, and so we can hopefully get it fixed during the
> > -rc cycle.
> >
>
> From the log, it looks like the platform has booted to userspace. Any chance
> we can have a look at:
> $ grep "" /sys/devices/system/clock*/{broadcast,clock*}/{available,current}_*

In the failing case, it doesn't boot to a shell, so I can't do that,
but after I revert the patch, I have this:

/ # ls -l /sys/devices/system/clocksource
total 0
drwxr-xr-x    3 root     root             0 Jan  1 00:00 clocksource0
drwxr-xr-x    2 root     root             0 Jan  1 00:00 power
-rw-r--r--    1 root     root          4096 Jan  1 00:00 uevent
/ # cat /sys/devices/system/clocksource/clocksource0/available_clocksource
timer jiffies
/ # cat /sys/devices/system/clocksource/clocksource0/current_clocksource
timer

/ # cat /sys/devices/system/clockevents/broadcast/current_device
meson6_tick
/ # cat /sys/devices/system/clockevents/clockevent0/current_device
dummy_timer
/ # cat /sys/devices/system/clockevents/clockevent1/current_device
dummy_timer
/ # cat /sys/devices/system/clockevents/clockevent2/current_device
dummy_timer

Kevin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] tick: prefer a lower rating device only if it's CPU local device
  2018-07-03 15:04     ` Kevin Hilman
@ 2018-07-03 15:44       ` Sudeep Holla
  2018-07-03 16:08         ` Thomas Gleixner
  0 siblings, 1 reply; 10+ messages in thread
From: Sudeep Holla @ 2018-07-03 15:44 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: lkml, Thomas Gleixner, fweisbec, Arnd Bergmann,
	Martin Blumenstingl, Sudeep Holla

On Tue, Jul 03, 2018 at 08:04:37AM -0700, Kevin Hilman wrote:
> On Tue, Jul 3, 2018 at 3:54 AM Sudeep Holla <sudeep.holla@arm.com> wrote:
> >
> > On Mon, Jul 02, 2018 at 04:44:33PM -0700, Kevin Hilman wrote:
> > > Hi Sudeep,
> > >
> > > On Wed, May 9, 2018 at 9:02 AM Sudeep Holla <sudeep.holla@arm.com> wrote:
> > > >
> > > > Checking the equality of cpumask for both new and old tick device doesn't
> > > > ensure that it's CPU local device. This will cause issue if a low rating
> > > > clockevent tick device is registered first followed by the registration
> > > > of higher rating clockevent tick device.
> > > >
> > > > In such case, clockevents_released list will never get emptied as both
> > > > the devices get selected as preferred one and we will loop forever in
> > > > clockevents_notify_released.
> > > >
> > > > Cc: Frederic Weisbecker <fweisbec@gmail.com>
> > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> > >
> > > I've got a arm32 board (meson8b-odroidc1) that's been failing in
> > > kernelCI.org since the merge window (boot log[1]), and I finally got
> > > around to bisecting it[2].  Unfortunately, the bisect pointed at a
> > > merge commit, but with some trial and error (and a suggestion by Arnd)
> > > I was able to test that revering $SUBJECT commit[3], my problem goes
> > > away.
> > >
> >
> > Interesting. Sorry for causing the regression.
> >
> > > Another interesting data point is that disabling SMP (either by
> > > "nosmp" on the command-line or CONFIG_SMP=n) also makes the problem go
> > > away, without needing to revert this patch.
> > >
> >
> > I am not sure of nosmp, but with CONFIG_SMP=n, TICK_BROADCAST also gets
> > disabled. dummy_timer won't be registered I assume.
> >
> > I am not sure if dummy_timer is selected as it's per_cpu but the rating
> > is low anyways.
> 
> > > AFAICT, this platform, is using a single timer as a clocksource
> > > ("amlogic,meson6-timer") which is not a per-CPU timer.
> > >
> >
> > Yes that's what I could gather from DT. But this is A5 right ? It may
> > have per CPU TWD(watchdof timer) but DT doesn't specify it, so should be
> > fine.
> >
> > > I ran out of time to keep digging on this issue, and I'm still not
> > > sure exactly what's going on, but I wanted to report it in case anyone
> > > else has any ideas, and so we can hopefully get it fixed during the
> > > -rc cycle.
> > >
> >
> > From the log, it looks like the platform has booted to userspace. Any chance
> > we can have a look at:
> > $ grep "" /sys/devices/system/clock*/{broadcast,clock*}/{available,current}_*
> 
> In the failing case, it doesn't boot to a shell, so I can't do that,
> but after I revert the patch, I have this:
>

Ah ok, does it hang when it registers clockevents ?

> / # ls -l /sys/devices/system/clocksource
> total 0
> drwxr-xr-x    3 root     root             0 Jan  1 00:00 clocksource0
> drwxr-xr-x    2 root     root             0 Jan  1 00:00 power
> -rw-r--r--    1 root     root          4096 Jan  1 00:00 uevent
> / # cat /sys/devices/system/clocksource/clocksource0/available_clocksource
> timer jiffies

Looks good.

> / # cat /sys/devices/system/clocksource/clocksource0/current_clocksource
> timer
>

OK, meson6 clocksource is active

> / # cat /sys/devices/system/clockevents/broadcast/current_device
> meson6_tick

OK, it can support broadcast

> / # cat /sys/devices/system/clockevents/clockevent0/current_device
> dummy_timer
> / # cat /sys/devices/system/clockevents/clockevent1/current_device
> dummy_timer
> / # cat /sys/devices/system/clockevents/clockevent2/current_device
> dummy_timer

But I can't understand why is dummy_timer the active event source and
not meson6_tick. And you say this is working case ? Looks suspicious.

If dummy_timer was getting used, I think meson6_tick was never utilised
before as I see this platform doesn't have cpuidle(at-least from DT)

--
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] tick: prefer a lower rating device only if it's CPU local device
  2018-07-03 15:44       ` Sudeep Holla
@ 2018-07-03 16:08         ` Thomas Gleixner
  2018-07-03 16:48           ` Sudeep Holla
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2018-07-03 16:08 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: Kevin Hilman, lkml, fweisbec, Arnd Bergmann, Martin Blumenstingl

On Tue, 3 Jul 2018, Sudeep Holla wrote:
> On Tue, Jul 03, 2018 at 08:04:37AM -0700, Kevin Hilman wrote:
> > / # ls -l /sys/devices/system/clocksource
> > total 0
> > drwxr-xr-x    3 root     root             0 Jan  1 00:00 clocksource0
> > drwxr-xr-x    2 root     root             0 Jan  1 00:00 power
> > -rw-r--r--    1 root     root          4096 Jan  1 00:00 uevent
> > / # cat /sys/devices/system/clocksource/clocksource0/available_clocksource
> > timer jiffies
> 
> Looks good.
> 
> > / # cat /sys/devices/system/clocksource/clocksource0/current_clocksource
> > timer
> >
> 
> OK, meson6 clocksource is active
> 
> > / # cat /sys/devices/system/clockevents/broadcast/current_device
> > meson6_tick
> 
> OK, it can support broadcast
> 
> > / # cat /sys/devices/system/clockevents/clockevent0/current_device
> > dummy_timer
> > / # cat /sys/devices/system/clockevents/clockevent1/current_device
> > dummy_timer
> > / # cat /sys/devices/system/clockevents/clockevent2/current_device
> > dummy_timer
> 
> But I can't understand why is dummy_timer the active event source and
> not meson6_tick. And you say this is working case ? Looks suspicious.

Because if it switches to broadcast mode then the meson timer cannot longer
be used as per cpu timer. It's broadcasting to all CPUs via the dummy timer.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] tick: prefer a lower rating device only if it's CPU local device
  2018-07-03 16:08         ` Thomas Gleixner
@ 2018-07-03 16:48           ` Sudeep Holla
  2018-07-08 20:59             ` Martin Blumenstingl
  0 siblings, 1 reply; 10+ messages in thread
From: Sudeep Holla @ 2018-07-03 16:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kevin Hilman, lkml, fweisbec, Arnd Bergmann, Martin Blumenstingl,
	Sudeep Holla

Hi Thomas,

On Tue, Jul 03, 2018 at 06:08:19PM +0200, Thomas Gleixner wrote:

[...]

> > > / # cat /sys/devices/system/clockevents/broadcast/current_device
> > > meson6_tick
> >
> > OK, it can support broadcast
> >
> > > / # cat /sys/devices/system/clockevents/clockevent0/current_device
> > > dummy_timer
> > > / # cat /sys/devices/system/clockevents/clockevent1/current_device
> > > dummy_timer
> > > / # cat /sys/devices/system/clockevents/clockevent2/current_device
> > > dummy_timer
> >
> > But I can't understand why is dummy_timer the active event source and
> > not meson6_tick. And you say this is working case ? Looks suspicious.
>
> Because if it switches to broadcast mode then the meson timer cannot longer
> be used as per cpu timer. It's broadcasting to all CPUs via the dummy timer.

Thanks for the explanation. I completely misread the sysfs entry and
assume clockevent_register failed for meson6 and hence regarded as
suspicious which is complete non-sense, my bad. Sorry for that.
I think I now understand the issue.

1. Juno usecase for which $subject was added as fix:

Two system wide timers(cpumask=possible cpus) with rating 300 and 400.
When second one with 400 is added, timer with rating 300 is added to
released list and again added back to main one. In this case both were
chosen as preferred and that resulted in deadlock.

2. Meson6 usecase:

When meson6_tick is added, it's set as preferred and dummy_timer is released.
When it's being added back from the released list, it will be chosen as
preferred as it's per_cpu resulting in deadlock.

I am not sure how to fix this. Should the fix to my original problem have
checks for both old and new for per-cpu  to prevent the issue reported on
Meson6

--
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] tick: prefer a lower rating device only if it's CPU local device
  2018-07-03 16:48           ` Sudeep Holla
@ 2018-07-08 20:59             ` Martin Blumenstingl
  2018-07-09 15:12               ` Sudeep Holla
  0 siblings, 1 reply; 10+ messages in thread
From: Martin Blumenstingl @ 2018-07-08 20:59 UTC (permalink / raw)
  To: sudeep.holla, tglx; +Cc: khilman, linux-kernel, fweisbec, arnd

Hi Thomas,

On Tue, Jul 3, 2018 at 6:48 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
>
> Hi Thomas,
>
> On Tue, Jul 03, 2018 at 06:08:19PM +0200, Thomas Gleixner wrote:
>
> [...]
>
> > > > / # cat /sys/devices/system/clockevents/broadcast/current_device
> > > > meson6_tick
> > >
> > > OK, it can support broadcast
> > >
> > > > / # cat /sys/devices/system/clockevents/clockevent0/current_device
> > > > dummy_timer
> > > > / # cat /sys/devices/system/clockevents/clockevent1/current_device
> > > > dummy_timer
> > > > / # cat /sys/devices/system/clockevents/clockevent2/current_device
> > > > dummy_timer
> > >
> > > But I can't understand why is dummy_timer the active event source and
> > > not meson6_tick. And you say this is working case ? Looks suspicious.
> >
> > Because if it switches to broadcast mode then the meson timer cannot longer
> > be used as per cpu timer. It's broadcasting to all CPUs via the dummy timer.
>
> Thanks for the explanation. I completely misread the sysfs entry and
> assume clockevent_register failed for meson6 and hence regarded as
> suspicious which is complete non-sense, my bad. Sorry for that.
> I think I now understand the issue.
>
> 1. Juno usecase for which $subject was added as fix:
>
> Two system wide timers(cpumask=possible cpus) with rating 300 and 400.
> When second one with 400 is added, timer with rating 300 is added to
> released list and again added back to main one. In this case both were
> chosen as preferred and that resulted in deadlock.
>
> 2. Meson6 usecase:
>
> When meson6_tick is added, it's set as preferred and dummy_timer is released.
> When it's being added back from the released list, it will be chosen as
> preferred as it's per_cpu resulting in deadlock.
>
> I am not sure how to fix this. Should the fix to my original problem have
> checks for both old and new for per-cpu  to prevent the issue reported on
> Meson6
could you please answer Sudeep's question?

in the meantime I checked Sudpeed's suggestion regarding the TWD
timer: it seems that the Meson8 (Cortex-A9) and Meson8b (Cortex-A5)
SoCs have the ARM TWD (timer watchdog) built-in - Carlo sent a patch
for this a long time ago: [0].
however, I figured out that this doesn't work out-of-the-box anymore:
the TWD seems to be supplied (according to my own tests) by CPU_CLK
div 16 and the interrupt should be IRQ_TYPE_EDGE_RISING
unfortunately we cannot simply add the TWD timer to Meson8 or Meson8b
because this would first require changes to the clock controller
driver (the are currently registered as platform driver, which is too
late for the TWD timer driver)

in other words: we cannot use the TWD timer on the Meson platform in
the v4.18 cycle, so I would prefer a fix of the timer/tick code


Regards
Martin


[0] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-December/391928.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] tick: prefer a lower rating device only if it's CPU local device
  2018-07-08 20:59             ` Martin Blumenstingl
@ 2018-07-09 15:12               ` Sudeep Holla
  0 siblings, 0 replies; 10+ messages in thread
From: Sudeep Holla @ 2018-07-09 15:12 UTC (permalink / raw)
  To: Martin Blumenstingl, tglx
  Cc: Sudeep Holla, khilman, linux-kernel, fweisbec, arnd



On 08/07/18 21:59, Martin Blumenstingl wrote:
> Hi Thomas,
> 
> On Tue, Jul 3, 2018 at 6:48 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
>>
>> Hi Thomas,
>>
>> On Tue, Jul 03, 2018 at 06:08:19PM +0200, Thomas Gleixner wrote:
>>
>> [...]
>>
>>>>> / # cat /sys/devices/system/clockevents/broadcast/current_device
>>>>> meson6_tick
>>>>
>>>> OK, it can support broadcast
>>>>
>>>>> / # cat /sys/devices/system/clockevents/clockevent0/current_device
>>>>> dummy_timer
>>>>> / # cat /sys/devices/system/clockevents/clockevent1/current_device
>>>>> dummy_timer
>>>>> / # cat /sys/devices/system/clockevents/clockevent2/current_device
>>>>> dummy_timer
>>>>
>>>> But I can't understand why is dummy_timer the active event source and
>>>> not meson6_tick. And you say this is working case ? Looks suspicious.
>>>
>>> Because if it switches to broadcast mode then the meson timer cannot longer
>>> be used as per cpu timer. It's broadcasting to all CPUs via the dummy timer.
>>
>> Thanks for the explanation. I completely misread the sysfs entry and
>> assume clockevent_register failed for meson6 and hence regarded as
>> suspicious which is complete non-sense, my bad. Sorry for that.
>> I think I now understand the issue.
>>
>> 1. Juno usecase for which $subject was added as fix:
>>
>> Two system wide timers(cpumask=possible cpus) with rating 300 and 400.
>> When second one with 400 is added, timer with rating 300 is added to
>> released list and again added back to main one. In this case both were
>> chosen as preferred and that resulted in deadlock.
>>
>> 2. Meson6 usecase:
>>
>> When meson6_tick is added, it's set as preferred and dummy_timer is released.
>> When it's being added back from the released list, it will be chosen as
>> preferred as it's per_cpu resulting in deadlock.
>>
>> I am not sure how to fix this. Should the fix to my original problem have
>> checks for both old and new for per-cpu  to prevent the issue reported on
>> Meson6
> could you please answer Sudeep's question?
> 

OK, I think I have misunderstood my original problem because of the
cpumask_equal for nr_bits <= BITS_PER_LONG. It uses nr_cpumask_bits
which is NR_CPUS when CPUMASK_OFFSTACK is not enabled.

Enabling it fixes my original problem(reverting $subject patch). So it
should be reverted. And also pointed out the issue with ARM mem timer.

I am sorry for the mess, I will post the revert and along with the fix
to my issue. Once again sorry for not understanding the root cause
correctly.
-- 
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-07-09 15:12 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-09 16:02 [PATCH] tick: prefer a lower rating device only if it's CPU local device Sudeep Holla
2018-05-13 13:09 ` [tip:timers/core] tick: Prefer " tip-bot for Sudeep Holla
2018-07-02 23:44 ` [PATCH] tick: prefer " Kevin Hilman
2018-07-03 10:53   ` Sudeep Holla
2018-07-03 15:04     ` Kevin Hilman
2018-07-03 15:44       ` Sudeep Holla
2018-07-03 16:08         ` Thomas Gleixner
2018-07-03 16:48           ` Sudeep Holla
2018-07-08 20:59             ` Martin Blumenstingl
2018-07-09 15:12               ` Sudeep Holla

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).