All of lore.kernel.org
 help / color / mirror / Atom feed
* PROBLEM: BUG  appearing when trying to allocate interrupt on Exynos MCT after CPU hotplug
@ 2014-10-23 13:51 ` Marcin Jabrzyk
  0 siblings, 0 replies; 18+ messages in thread
From: Marcin Jabrzyk @ 2014-10-23 13:51 UTC (permalink / raw)
  To: Daniel Lezcano, Kukjin Kim, Thomas Gleixner
  Cc: linux-kernel, Bartlomiej Zolnierkiewicz, kyungmin.park,
	linux-arm-kernel, linux-samsung-soc

[1.] One line summary of the problem: "BUG: sleeping function called 
from invalid context at mm/slub.c:1250" after CPU hotplug
[2.] Full description of the problem/report:

This was tested on Exynos 3250 board with 
https://lkml.org/lkml/2014/9/24/441 applied. Board is booting to 
/bin/sh. After executing:

mount -t sysfs sys /sys && echo 0 > /sys/devices/system/cpu/cpu1/online 
&& echo 1 > /sys/devices/system/cpu/cpu1/online

I'm getting:

[    7.226405] IRQ258 no longer affine to CPU1
[    7.226629] CPU1: shutdown
[    7.230037] CPU1: Software reset
[    7.231822] CPU1: Booted secondary processor
[    7.231843] BUG: sleeping function called from invalid context at 
mm/slub.c:1250
[    7.231850] in_atomic(): 1, irqs_disabled(): 128, pid: 0, name: swapper/1
[    7.231861] Preemption disabled at:[<  (null)>]   (null)
[    7.231864]
[    7.231876] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.17.0-dirty #45
[    7.231914] [<c0013c04>] (unwind_backtrace) from [<c0010eac>] 
(show_stack+0x10/0x14)
[    7.231931] [<c0010eac>] (show_stack) from [<c03ffd0c>] 
(dump_stack+0x70/0xbc)
[    7.231950] [<c03ffd0c>] (dump_stack) from [<c00b9a20>] 
(kmem_cache_alloc+0xe8/0x184)
[    7.231968] [<c00b9a20>] (kmem_cache_alloc) from [<c0059710>] 
(request_threaded_irq+0x64/0x128)
[    7.231985] [<c0059710>] (request_threaded_irq) from [<c030ecc8>] 
(exynos4_local_timer_setup+0xc0/0x13c)
[    7.232000] [<c030ecc8>] (exynos4_local_timer_setup) from 
[<c030ede4>] (exynos4_mct_cpu_notify+0x30/0xa8)
[    7.232016] [<c030ede4>] (exynos4_mct_cpu_notify) from [<c0038540>] 
(notifier_call_chain+0x44/0x84)
[    7.232034] [<c0038540>] (notifier_call_chain) from [<c0021144>] 
(__cpu_notify+0x28/0x44)
[    7.232049] [<c0021144>] (__cpu_notify) from [<c0012af0>] 
(secondary_start_kernel+0xe8/0x138)
[    7.232062] [<c0012af0>] (secondary_start_kernel) from [<400086a4>] 
(0x400086a4)

The problem is that request_irq is calling allocation with GFP_KERNEL 
flag in atomic block.
This bug should be easy observable on any board with 
"samsung,exynos4210-mct" compatible MCT block.

[4.1.] Kernel version (from /proc/version):
3.17.0
[4.2.] Kernel .config file:
exynos_defconfig + DEBUG_ATOMIC_SLEEP and DEBUG_PREEMPT

[7.] A small shell script or example program which triggers the
      problem (if possible)
mount -t sysfs sys /sys && echo 0 > /sys/devices/system/cpu/cpu1/online 
&& echo 1 > /sys/devices/system/cpu/cpu1/online
[8.] Environment
/bin/sh

When SoC have MCT_INT_SPI interrupt it is being allocated after 
hotplugging of the CPU, secondary_start_kernel() is sending CPU boot 
notifications which are send when preemption and interrupts are 
disabled. Exynos_mct notification handler tries to set up and allocate 
IRQ for SPI type interrupt for started CPU and then BUG appears.
There might be similar problem on qcom-timer I think just after looking 
on the code.

Best regards,
--
Marcin Jabrzyk
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply	[flat|nested] 18+ messages in thread

* PROBLEM: BUG  appearing when trying to allocate interrupt on Exynos MCT after CPU hotplug
@ 2014-10-23 13:51 ` Marcin Jabrzyk
  0 siblings, 0 replies; 18+ messages in thread
From: Marcin Jabrzyk @ 2014-10-23 13:51 UTC (permalink / raw)
  To: linux-arm-kernel

[1.] One line summary of the problem: "BUG: sleeping function called 
from invalid context at mm/slub.c:1250" after CPU hotplug
[2.] Full description of the problem/report:

This was tested on Exynos 3250 board with 
https://lkml.org/lkml/2014/9/24/441 applied. Board is booting to 
/bin/sh. After executing:

mount -t sysfs sys /sys && echo 0 > /sys/devices/system/cpu/cpu1/online 
&& echo 1 > /sys/devices/system/cpu/cpu1/online

I'm getting:

[    7.226405] IRQ258 no longer affine to CPU1
[    7.226629] CPU1: shutdown
[    7.230037] CPU1: Software reset
[    7.231822] CPU1: Booted secondary processor
[    7.231843] BUG: sleeping function called from invalid context at 
mm/slub.c:1250
[    7.231850] in_atomic(): 1, irqs_disabled(): 128, pid: 0, name: swapper/1
[    7.231861] Preemption disabled at:[<  (null)>]   (null)
[    7.231864]
[    7.231876] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.17.0-dirty #45
[    7.231914] [<c0013c04>] (unwind_backtrace) from [<c0010eac>] 
(show_stack+0x10/0x14)
[    7.231931] [<c0010eac>] (show_stack) from [<c03ffd0c>] 
(dump_stack+0x70/0xbc)
[    7.231950] [<c03ffd0c>] (dump_stack) from [<c00b9a20>] 
(kmem_cache_alloc+0xe8/0x184)
[    7.231968] [<c00b9a20>] (kmem_cache_alloc) from [<c0059710>] 
(request_threaded_irq+0x64/0x128)
[    7.231985] [<c0059710>] (request_threaded_irq) from [<c030ecc8>] 
(exynos4_local_timer_setup+0xc0/0x13c)
[    7.232000] [<c030ecc8>] (exynos4_local_timer_setup) from 
[<c030ede4>] (exynos4_mct_cpu_notify+0x30/0xa8)
[    7.232016] [<c030ede4>] (exynos4_mct_cpu_notify) from [<c0038540>] 
(notifier_call_chain+0x44/0x84)
[    7.232034] [<c0038540>] (notifier_call_chain) from [<c0021144>] 
(__cpu_notify+0x28/0x44)
[    7.232049] [<c0021144>] (__cpu_notify) from [<c0012af0>] 
(secondary_start_kernel+0xe8/0x138)
[    7.232062] [<c0012af0>] (secondary_start_kernel) from [<400086a4>] 
(0x400086a4)

The problem is that request_irq is calling allocation with GFP_KERNEL 
flag in atomic block.
This bug should be easy observable on any board with 
"samsung,exynos4210-mct" compatible MCT block.

[4.1.] Kernel version (from /proc/version):
3.17.0
[4.2.] Kernel .config file:
exynos_defconfig + DEBUG_ATOMIC_SLEEP and DEBUG_PREEMPT

[7.] A small shell script or example program which triggers the
      problem (if possible)
mount -t sysfs sys /sys && echo 0 > /sys/devices/system/cpu/cpu1/online 
&& echo 1 > /sys/devices/system/cpu/cpu1/online
[8.] Environment
/bin/sh

When SoC have MCT_INT_SPI interrupt it is being allocated after 
hotplugging of the CPU, secondary_start_kernel() is sending CPU boot 
notifications which are send when preemption and interrupts are 
disabled. Exynos_mct notification handler tries to set up and allocate 
IRQ for SPI type interrupt for started CPU and then BUG appears.
There might be similar problem on qcom-timer I think just after looking 
on the code.

Best regards,
--
Marcin Jabrzyk
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PROBLEM: BUG  appearing when trying to allocate interrupt on Exynos MCT after CPU hotplug
  2014-10-23 13:51 ` Marcin Jabrzyk
@ 2014-10-23 14:06   ` Russell King - ARM Linux
  -1 siblings, 0 replies; 18+ messages in thread
From: Russell King - ARM Linux @ 2014-10-23 14:06 UTC (permalink / raw)
  To: Marcin Jabrzyk
  Cc: Daniel Lezcano, Kukjin Kim, Thomas Gleixner, kyungmin.park,
	linux-samsung-soc, linux-kernel, linux-arm-kernel,
	Bartlomiej Zolnierkiewicz

On Thu, Oct 23, 2014 at 03:51:16PM +0200, Marcin Jabrzyk wrote:
> [1.] One line summary of the problem: "BUG: sleeping function called from
> invalid context at mm/slub.c:1250" after CPU hotplug

I'm really not surprised.

> When SoC have MCT_INT_SPI interrupt it is being allocated after hotplugging
> of the CPU, secondary_start_kernel() is sending CPU boot notifications which
> are send when preemption and interrupts are disabled. Exynos_mct
> notification handler tries to set up and allocate IRQ for SPI type interrupt
> for started CPU and then BUG appears.
> There might be similar problem on qcom-timer I think just after looking on
> the code.

The CPU notifier is called via notify_cpu_starting(), which is called
with interrupts disabled, and a reason code of CPU_STARTING.  Interrupts
at this point /must/ remain disabled.

The Exynos code then goes on to call exynos4_local_timer_setup() which
tries to reverse the free_irq() in exynos4_local_timer_stop() by calling
request_irq().  Calling request_irq() with interrupts off has never been
permissible.

So, this code is wrong today, and it was also wrong when it was written.
It /couldn't/ have been tested.  It looks like this commit added this
buggy code:

commit ee98d27df6827b5ba4bd99cb7d5cb1239b6a1a31
Author: Stephen Boyd <sboyd@codeaurora.org>
Date:   Fri Feb 15 16:40:51 2013 -0800

    ARM: EXYNOS4: Divorce mct from local timer API

    Separate the mct local timers from the local timer API. This will
    allow us to remove ARM local timer support in the near future and
    gets us closer to moving this driver to drivers/clocksource.

    Acked-by: Kukjin Kim <kgene.kim@samsung.com>
    Acked-by: Marc Zyngier <marc.zyngier@arm.com>
    Cc: Thomas Abraham <thomas.abraham@linaro.org>
    Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>

A good question would be: why doesn't this happen at boot time when CPU1
is first brought up?  The conditions here are no different from hotplugging
CPU1 back in.  Do you see a similar warning on boot too?

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* PROBLEM: BUG  appearing when trying to allocate interrupt on Exynos MCT after CPU hotplug
@ 2014-10-23 14:06   ` Russell King - ARM Linux
  0 siblings, 0 replies; 18+ messages in thread
From: Russell King - ARM Linux @ 2014-10-23 14:06 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Oct 23, 2014 at 03:51:16PM +0200, Marcin Jabrzyk wrote:
> [1.] One line summary of the problem: "BUG: sleeping function called from
> invalid context at mm/slub.c:1250" after CPU hotplug

I'm really not surprised.

> When SoC have MCT_INT_SPI interrupt it is being allocated after hotplugging
> of the CPU, secondary_start_kernel() is sending CPU boot notifications which
> are send when preemption and interrupts are disabled. Exynos_mct
> notification handler tries to set up and allocate IRQ for SPI type interrupt
> for started CPU and then BUG appears.
> There might be similar problem on qcom-timer I think just after looking on
> the code.

The CPU notifier is called via notify_cpu_starting(), which is called
with interrupts disabled, and a reason code of CPU_STARTING.  Interrupts
at this point /must/ remain disabled.

The Exynos code then goes on to call exynos4_local_timer_setup() which
tries to reverse the free_irq() in exynos4_local_timer_stop() by calling
request_irq().  Calling request_irq() with interrupts off has never been
permissible.

So, this code is wrong today, and it was also wrong when it was written.
It /couldn't/ have been tested.  It looks like this commit added this
buggy code:

commit ee98d27df6827b5ba4bd99cb7d5cb1239b6a1a31
Author: Stephen Boyd <sboyd@codeaurora.org>
Date:   Fri Feb 15 16:40:51 2013 -0800

    ARM: EXYNOS4: Divorce mct from local timer API

    Separate the mct local timers from the local timer API. This will
    allow us to remove ARM local timer support in the near future and
    gets us closer to moving this driver to drivers/clocksource.

    Acked-by: Kukjin Kim <kgene.kim@samsung.com>
    Acked-by: Marc Zyngier <marc.zyngier@arm.com>
    Cc: Thomas Abraham <thomas.abraham@linaro.org>
    Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>

A good question would be: why doesn't this happen at boot time when CPU1
is first brought up?  The conditions here are no different from hotplugging
CPU1 back in.  Do you see a similar warning on boot too?

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PROBLEM: BUG  appearing when trying to allocate interrupt on Exynos MCT after CPU hotplug
  2014-10-23 14:06   ` Russell King - ARM Linux
@ 2014-10-23 18:41     ` Stephen Boyd
  -1 siblings, 0 replies; 18+ messages in thread
From: Stephen Boyd @ 2014-10-23 18:41 UTC (permalink / raw)
  To: Russell King - ARM Linux, Marcin Jabrzyk
  Cc: Kukjin Kim, Bartlomiej Zolnierkiewicz, Daniel Lezcano,
	linux-kernel, kyungmin.park, linux-samsung-soc, Thomas Gleixner,
	linux-arm-kernel, Mark Rutland, Chander Kashyap

On 10/23/2014 07:06 AM, Russell King - ARM Linux wrote:
> On Thu, Oct 23, 2014 at 03:51:16PM +0200, Marcin Jabrzyk wrote:
>> [1.] One line summary of the problem: "BUG: sleeping function called from
>> invalid context at mm/slub.c:1250" after CPU hotplug
> I'm really not surprised.
>
>> When SoC have MCT_INT_SPI interrupt it is being allocated after hotplugging
>> of the CPU, secondary_start_kernel() is sending CPU boot notifications which
>> are send when preemption and interrupts are disabled. Exynos_mct
>> notification handler tries to set up and allocate IRQ for SPI type interrupt
>> for started CPU and then BUG appears.
>> There might be similar problem on qcom-timer I think just after looking on
>> the code.

There's no problem for qcom-timer because there are only PPIs on SMP
platforms.

> The CPU notifier is called via notify_cpu_starting(), which is called
> with interrupts disabled, and a reason code of CPU_STARTING.  Interrupts
> at this point /must/ remain disabled.
>
> The Exynos code then goes on to call exynos4_local_timer_setup() which
> tries to reverse the free_irq() in exynos4_local_timer_stop() by calling
> request_irq().  Calling request_irq() with interrupts off has never been
> permissible.
>
> So, this code is wrong today, and it was also wrong when it was written.
> It /couldn't/ have been tested.  It looks like this commit added this
> buggy code:
>
> commit ee98d27df6827b5ba4bd99cb7d5cb1239b6a1a31
> Author: Stephen Boyd <sboyd@codeaurora.org>
> Date:   Fri Feb 15 16:40:51 2013 -0800
>
>     ARM: EXYNOS4: Divorce mct from local timer API
>
>     Separate the mct local timers from the local timer API. This will
>     allow us to remove ARM local timer support in the near future and
>     gets us closer to moving this driver to drivers/clocksource.
>
>     Acked-by: Kukjin Kim <kgene.kim@samsung.com>
>     Acked-by: Marc Zyngier <marc.zyngier@arm.com>
>     Cc: Thomas Abraham <thomas.abraham@linaro.org>
>     Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>

I'm not so sure. It looks like in that patch I didn't change anything
with respect to when things are called. In fact, it looks like we were
calling setup_irq() there, but another patch around the same time
changed that to request_irq()

commit 7114cd749a12ff9fd64a2f6f04919760f45ab183
Author: Chander Kashyap <chander.kashyap@linaro.org>
Date:   Wed Jun 19 00:29:35 2013 +0900

    clocksource: exynos_mct: use (request/free)_irq calls for local timer registration
    
    Replace the (setup/remove)_irq calls for local timer registration with
    (request/free)_irq calls. This generalizes the local timer registration API.
    Suggested by Mark Rutland.
    
    Signed-off-by: Chander Kashyap <chander.kashyap@linaro.org>
    Acked-by: Mark Rutland <mark.rutland@arm.com>
    Reviewed-by: Tomasz Figa <t.figa@samsung.com>
    Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>

I don't believe setup_irq() allocates anything so we should probably go
back to using that over request_irq() or explore requesting the irqs
once and then enabling/disabling instead.

> A good question would be: why doesn't this happen at boot time when CPU1
> is first brought up?  The conditions here are no different from hotplugging
> CPU1 back in.  Do you see a similar warning on boot too?
>

Probably because such checks are completely avoided until the system
state is switched to SYSTEM_RUNNING (see the first if statement in
__might_sleep()). It would be nice if we could remove that.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply	[flat|nested] 18+ messages in thread

* PROBLEM: BUG  appearing when trying to allocate interrupt on Exynos MCT after CPU hotplug
@ 2014-10-23 18:41     ` Stephen Boyd
  0 siblings, 0 replies; 18+ messages in thread
From: Stephen Boyd @ 2014-10-23 18:41 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/23/2014 07:06 AM, Russell King - ARM Linux wrote:
> On Thu, Oct 23, 2014 at 03:51:16PM +0200, Marcin Jabrzyk wrote:
>> [1.] One line summary of the problem: "BUG: sleeping function called from
>> invalid context at mm/slub.c:1250" after CPU hotplug
> I'm really not surprised.
>
>> When SoC have MCT_INT_SPI interrupt it is being allocated after hotplugging
>> of the CPU, secondary_start_kernel() is sending CPU boot notifications which
>> are send when preemption and interrupts are disabled. Exynos_mct
>> notification handler tries to set up and allocate IRQ for SPI type interrupt
>> for started CPU and then BUG appears.
>> There might be similar problem on qcom-timer I think just after looking on
>> the code.

There's no problem for qcom-timer because there are only PPIs on SMP
platforms.

> The CPU notifier is called via notify_cpu_starting(), which is called
> with interrupts disabled, and a reason code of CPU_STARTING.  Interrupts
> at this point /must/ remain disabled.
>
> The Exynos code then goes on to call exynos4_local_timer_setup() which
> tries to reverse the free_irq() in exynos4_local_timer_stop() by calling
> request_irq().  Calling request_irq() with interrupts off has never been
> permissible.
>
> So, this code is wrong today, and it was also wrong when it was written.
> It /couldn't/ have been tested.  It looks like this commit added this
> buggy code:
>
> commit ee98d27df6827b5ba4bd99cb7d5cb1239b6a1a31
> Author: Stephen Boyd <sboyd@codeaurora.org>
> Date:   Fri Feb 15 16:40:51 2013 -0800
>
>     ARM: EXYNOS4: Divorce mct from local timer API
>
>     Separate the mct local timers from the local timer API. This will
>     allow us to remove ARM local timer support in the near future and
>     gets us closer to moving this driver to drivers/clocksource.
>
>     Acked-by: Kukjin Kim <kgene.kim@samsung.com>
>     Acked-by: Marc Zyngier <marc.zyngier@arm.com>
>     Cc: Thomas Abraham <thomas.abraham@linaro.org>
>     Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>

I'm not so sure. It looks like in that patch I didn't change anything
with respect to when things are called. In fact, it looks like we were
calling setup_irq() there, but another patch around the same time
changed that to request_irq()

commit 7114cd749a12ff9fd64a2f6f04919760f45ab183
Author: Chander Kashyap <chander.kashyap@linaro.org>
Date:   Wed Jun 19 00:29:35 2013 +0900

    clocksource: exynos_mct: use (request/free)_irq calls for local timer registration
    
    Replace the (setup/remove)_irq calls for local timer registration with
    (request/free)_irq calls. This generalizes the local timer registration API.
    Suggested by Mark Rutland.
    
    Signed-off-by: Chander Kashyap <chander.kashyap@linaro.org>
    Acked-by: Mark Rutland <mark.rutland@arm.com>
    Reviewed-by: Tomasz Figa <t.figa@samsung.com>
    Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>

I don't believe setup_irq() allocates anything so we should probably go
back to using that over request_irq() or explore requesting the irqs
once and then enabling/disabling instead.

> A good question would be: why doesn't this happen at boot time when CPU1
> is first brought up?  The conditions here are no different from hotplugging
> CPU1 back in.  Do you see a similar warning on boot too?
>

Probably because such checks are completely avoided until the system
state is switched to SYSTEM_RUNNING (see the first if statement in
__might_sleep()). It would be nice if we could remove that.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PROBLEM: BUG  appearing when trying to allocate interrupt on Exynos MCT after CPU hotplug
  2014-10-23 18:41     ` Stephen Boyd
@ 2014-10-24 13:22       ` Marcin Jabrzyk
  -1 siblings, 0 replies; 18+ messages in thread
From: Marcin Jabrzyk @ 2014-10-24 13:22 UTC (permalink / raw)
  To: Stephen Boyd, Russell King - ARM Linux
  Cc: Kukjin Kim, Bartlomiej Zolnierkiewicz, Daniel Lezcano,
	linux-kernel, kyungmin.park, linux-samsung-soc, Thomas Gleixner,
	linux-arm-kernel, Mark Rutland, Chander Kashyap



On 23/10/14 20:41, Stephen Boyd wrote:
> On 10/23/2014 07:06 AM, Russell King - ARM Linux wrote:
>> On Thu, Oct 23, 2014 at 03:51:16PM +0200, Marcin Jabrzyk wrote:
>>> [1.] One line summary of the problem: "BUG: sleeping function called from
>>> invalid context at mm/slub.c:1250" after CPU hotplug
>> I'm really not surprised.
>>
>>> When SoC have MCT_INT_SPI interrupt it is being allocated after hotplugging
>>> of the CPU, secondary_start_kernel() is sending CPU boot notifications which
>>> are send when preemption and interrupts are disabled. Exynos_mct
>>> notification handler tries to set up and allocate IRQ for SPI type interrupt
>>> for started CPU and then BUG appears.
>>> There might be similar problem on qcom-timer I think just after looking on
>>> the code.
>
> There's no problem for qcom-timer because there are only PPIs on SMP
> platforms.
>

Ok, so it's only a problem on Exynos platform for now.
>> The CPU notifier is called via notify_cpu_starting(), which is called
>> with interrupts disabled, and a reason code of CPU_STARTING.  Interrupts
>> at this point /must/ remain disabled.
>>
>> The Exynos code then goes on to call exynos4_local_timer_setup() which
>> tries to reverse the free_irq() in exynos4_local_timer_stop() by calling
>> request_irq().  Calling request_irq() with interrupts off has never been
>> permissible.
>>
>> So, this code is wrong today, and it was also wrong when it was written.
>> It /couldn't/ have been tested.  It looks like this commit added this
>> buggy code:
>>
>> commit ee98d27df6827b5ba4bd99cb7d5cb1239b6a1a31
>> Author: Stephen Boyd <sboyd@codeaurora.org>
>> Date:   Fri Feb 15 16:40:51 2013 -0800
>>
>>      ARM: EXYNOS4: Divorce mct from local timer API
>>
>>      Separate the mct local timers from the local timer API. This will
>>      allow us to remove ARM local timer support in the near future and
>>      gets us closer to moving this driver to drivers/clocksource.
>>
>>      Acked-by: Kukjin Kim <kgene.kim@samsung.com>
>>      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
>>      Cc: Thomas Abraham <thomas.abraham@linaro.org>
>>      Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
>
> I'm not so sure. It looks like in that patch I didn't change anything
> with respect to when things are called. In fact, it looks like we were
> calling setup_irq() there, but another patch around the same time
> changed that to request_irq()
>
> commit 7114cd749a12ff9fd64a2f6f04919760f45ab183
> Author: Chander Kashyap <chander.kashyap@linaro.org>
> Date:   Wed Jun 19 00:29:35 2013 +0900
>
>      clocksource: exynos_mct: use (request/free)_irq calls for local timer registration
>
>      Replace the (setup/remove)_irq calls for local timer registration with
>      (request/free)_irq calls. This generalizes the local timer registration API.
>      Suggested by Mark Rutland.
>
>      Signed-off-by: Chander Kashyap <chander.kashyap@linaro.org>
>      Acked-by: Mark Rutland <mark.rutland@arm.com>
>      Reviewed-by: Tomasz Figa <t.figa@samsung.com>
>      Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
>
> I don't believe setup_irq() allocates anything so we should probably go
> back to using that over request_irq() or explore requesting the irqs
> once and then enabling/disabling instead.
>

So what would be a better way to handle this? Going back to setup_irq or 
trying to enable/disable irqs on CPU hotplug? As this touched low level 
things and it's rare case for setting/enabling irqs just after CPU is 
coming back to life again.

>> A good question would be: why doesn't this happen at boot time when CPU1
>> is first brought up?  The conditions here are no different from hotplugging
>> CPU1 back in.  Do you see a similar warning on boot too?
>>

No the boot looks clean and there is not any sign of that problem.
>
> Probably because such checks are completely avoided until the system
> state is switched to SYSTEM_RUNNING (see the first if statement in
> __might_sleep()). It would be nice if we could remove that.
>

That's most probably the reason of no warnings on boot process.

Best regards,
--
Marcin Jabrzyk
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply	[flat|nested] 18+ messages in thread

* PROBLEM: BUG  appearing when trying to allocate interrupt on Exynos MCT after CPU hotplug
@ 2014-10-24 13:22       ` Marcin Jabrzyk
  0 siblings, 0 replies; 18+ messages in thread
From: Marcin Jabrzyk @ 2014-10-24 13:22 UTC (permalink / raw)
  To: linux-arm-kernel



On 23/10/14 20:41, Stephen Boyd wrote:
> On 10/23/2014 07:06 AM, Russell King - ARM Linux wrote:
>> On Thu, Oct 23, 2014 at 03:51:16PM +0200, Marcin Jabrzyk wrote:
>>> [1.] One line summary of the problem: "BUG: sleeping function called from
>>> invalid context at mm/slub.c:1250" after CPU hotplug
>> I'm really not surprised.
>>
>>> When SoC have MCT_INT_SPI interrupt it is being allocated after hotplugging
>>> of the CPU, secondary_start_kernel() is sending CPU boot notifications which
>>> are send when preemption and interrupts are disabled. Exynos_mct
>>> notification handler tries to set up and allocate IRQ for SPI type interrupt
>>> for started CPU and then BUG appears.
>>> There might be similar problem on qcom-timer I think just after looking on
>>> the code.
>
> There's no problem for qcom-timer because there are only PPIs on SMP
> platforms.
>

Ok, so it's only a problem on Exynos platform for now.
>> The CPU notifier is called via notify_cpu_starting(), which is called
>> with interrupts disabled, and a reason code of CPU_STARTING.  Interrupts
>> at this point /must/ remain disabled.
>>
>> The Exynos code then goes on to call exynos4_local_timer_setup() which
>> tries to reverse the free_irq() in exynos4_local_timer_stop() by calling
>> request_irq().  Calling request_irq() with interrupts off has never been
>> permissible.
>>
>> So, this code is wrong today, and it was also wrong when it was written.
>> It /couldn't/ have been tested.  It looks like this commit added this
>> buggy code:
>>
>> commit ee98d27df6827b5ba4bd99cb7d5cb1239b6a1a31
>> Author: Stephen Boyd <sboyd@codeaurora.org>
>> Date:   Fri Feb 15 16:40:51 2013 -0800
>>
>>      ARM: EXYNOS4: Divorce mct from local timer API
>>
>>      Separate the mct local timers from the local timer API. This will
>>      allow us to remove ARM local timer support in the near future and
>>      gets us closer to moving this driver to drivers/clocksource.
>>
>>      Acked-by: Kukjin Kim <kgene.kim@samsung.com>
>>      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
>>      Cc: Thomas Abraham <thomas.abraham@linaro.org>
>>      Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
>
> I'm not so sure. It looks like in that patch I didn't change anything
> with respect to when things are called. In fact, it looks like we were
> calling setup_irq() there, but another patch around the same time
> changed that to request_irq()
>
> commit 7114cd749a12ff9fd64a2f6f04919760f45ab183
> Author: Chander Kashyap <chander.kashyap@linaro.org>
> Date:   Wed Jun 19 00:29:35 2013 +0900
>
>      clocksource: exynos_mct: use (request/free)_irq calls for local timer registration
>
>      Replace the (setup/remove)_irq calls for local timer registration with
>      (request/free)_irq calls. This generalizes the local timer registration API.
>      Suggested by Mark Rutland.
>
>      Signed-off-by: Chander Kashyap <chander.kashyap@linaro.org>
>      Acked-by: Mark Rutland <mark.rutland@arm.com>
>      Reviewed-by: Tomasz Figa <t.figa@samsung.com>
>      Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
>
> I don't believe setup_irq() allocates anything so we should probably go
> back to using that over request_irq() or explore requesting the irqs
> once and then enabling/disabling instead.
>

So what would be a better way to handle this? Going back to setup_irq or 
trying to enable/disable irqs on CPU hotplug? As this touched low level 
things and it's rare case for setting/enabling irqs just after CPU is 
coming back to life again.

>> A good question would be: why doesn't this happen at boot time when CPU1
>> is first brought up?  The conditions here are no different from hotplugging
>> CPU1 back in.  Do you see a similar warning on boot too?
>>

No the boot looks clean and there is not any sign of that problem.
>
> Probably because such checks are completely avoided until the system
> state is switched to SYSTEM_RUNNING (see the first if statement in
> __might_sleep()). It would be nice if we could remove that.
>

That's most probably the reason of no warnings on boot process.

Best regards,
--
Marcin Jabrzyk
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PROBLEM: BUG  appearing when trying to allocate interrupt on Exynos MCT after CPU hotplug
  2014-10-24 13:22       ` Marcin Jabrzyk
@ 2014-10-27 20:16         ` Stephen Boyd
  -1 siblings, 0 replies; 18+ messages in thread
From: Stephen Boyd @ 2014-10-27 20:16 UTC (permalink / raw)
  To: Marcin Jabrzyk, Russell King - ARM Linux
  Cc: Kukjin Kim, Bartlomiej Zolnierkiewicz, Daniel Lezcano,
	linux-kernel, kyungmin.park, linux-samsung-soc, Thomas Gleixner,
	linux-arm-kernel, Mark Rutland, Chander Kashyap

On 10/24/2014 06:22 AM, Marcin Jabrzyk wrote:
>
>
> On 23/10/14 20:41, Stephen Boyd wrote:
>> On 10/23/2014 07:06 AM, Russell King - ARM Linux wrote:
>>> The CPU notifier is called via notify_cpu_starting(), which is called
>>> with interrupts disabled, and a reason code of CPU_STARTING. 
>>> Interrupts
>>> at this point /must/ remain disabled.
>>>
>>> The Exynos code then goes on to call exynos4_local_timer_setup() which
>>> tries to reverse the free_irq() in exynos4_local_timer_stop() by
>>> calling
>>> request_irq().  Calling request_irq() with interrupts off has never
>>> been
>>> permissible.
>>>
>>> So, this code is wrong today, and it was also wrong when it was
>>> written.
>>> It /couldn't/ have been tested.  It looks like this commit added this
>>> buggy code:
>>>
>>> commit ee98d27df6827b5ba4bd99cb7d5cb1239b6a1a31
>>> Author: Stephen Boyd <sboyd@codeaurora.org>
>>> Date:   Fri Feb 15 16:40:51 2013 -0800
>>>
>>>      ARM: EXYNOS4: Divorce mct from local timer API
>>>
>>>      Separate the mct local timers from the local timer API. This will
>>>      allow us to remove ARM local timer support in the near future and
>>>      gets us closer to moving this driver to drivers/clocksource.
>>>
>>>      Acked-by: Kukjin Kim <kgene.kim@samsung.com>
>>>      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
>>>      Cc: Thomas Abraham <thomas.abraham@linaro.org>
>>>      Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
>>
>> I'm not so sure. It looks like in that patch I didn't change anything
>> with respect to when things are called. In fact, it looks like we were
>> calling setup_irq() there, but another patch around the same time
>> changed that to request_irq()
>>
>> commit 7114cd749a12ff9fd64a2f6f04919760f45ab183
>> Author: Chander Kashyap <chander.kashyap@linaro.org>
>> Date:   Wed Jun 19 00:29:35 2013 +0900
>>
>>      clocksource: exynos_mct: use (request/free)_irq calls for local
>> timer registration
>>
>>      Replace the (setup/remove)_irq calls for local timer
>> registration with
>>      (request/free)_irq calls. This generalizes the local timer
>> registration API.
>>      Suggested by Mark Rutland.
>>
>>      Signed-off-by: Chander Kashyap <chander.kashyap@linaro.org>
>>      Acked-by: Mark Rutland <mark.rutland@arm.com>
>>      Reviewed-by: Tomasz Figa <t.figa@samsung.com>
>>      Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
>>
>> I don't believe setup_irq() allocates anything so we should probably go
>> back to using that over request_irq() or explore requesting the irqs
>> once and then enabling/disabling instead.
>>
>
> So what would be a better way to handle this? Going back to setup_irq
> or trying to enable/disable irqs on CPU hotplug? As this touched low
> level things and it's rare case for setting/enabling irqs just after
> CPU is coming back to life again.
>

The safest thing is setup_irq(), but do you care to try this patch?
Doing the enable/disable is not as robust because request_irq() returns
with the irq enabled and then we have to disable the irq to make things
symmetric. This whole driver doesn't look like it's prepared for such a
situation where the interrupt triggers before the clockevent is
registered so this doesn't look like a problem in practice. Doing the
disable right after request is typically bad though, and may not pass
review.

----8<-----

From: Stephen Boyd <sboyd@codeaurora.org>
Subject: [PATCH] clocksource: exynos_mct: Avoid scheduling while atomic

If we call request_irq() during the CPU_STARTING notifier we'll
try to allocate an irq descriptor with GFP_KERNEL while we're
running with irqs disabled. Just request the irqs at boot time
and enable/disable them when a CPU comes up or goes down to avoid
such problems.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---
 drivers/clocksource/exynos_mct.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
index 9403061a2acc..1800053b4644 100644
--- a/drivers/clocksource/exynos_mct.c
+++ b/drivers/clocksource/exynos_mct.c
@@ -467,13 +467,7 @@ static int exynos4_local_timer_setup(struct clock_event_device *evt)
 
 	if (mct_int_type == MCT_INT_SPI) {
 		evt->irq = mct_irqs[MCT_L0_IRQ + cpu];
-		if (request_irq(evt->irq, exynos4_mct_tick_isr,
-				IRQF_TIMER | IRQF_NOBALANCING,
-				evt->name, mevt)) {
-			pr_err("exynos-mct: cannot register IRQ %d\n",
-				evt->irq);
-			return -EIO;
-		}
+		enable_irq(evt->irq);
 		irq_force_affinity(mct_irqs[MCT_L0_IRQ + cpu], cpumask_of(cpu));
 	} else {
 		enable_percpu_irq(mct_irqs[MCT_L0_IRQ], 0);
@@ -488,7 +482,7 @@ static void exynos4_local_timer_stop(struct clock_event_device *evt)
 {
 	evt->set_mode(CLOCK_EVT_MODE_UNUSED, evt);
 	if (mct_int_type == MCT_INT_SPI)
-		free_irq(evt->irq, this_cpu_ptr(&percpu_mct_tick));
+		disable_irq(evt->irq);
 	else
 		disable_percpu_irq(mct_irqs[MCT_L0_IRQ]);
 }
@@ -522,8 +516,9 @@ static struct notifier_block exynos4_mct_cpu_nb = {
 
 static void __init exynos4_timer_resources(struct device_node *np, void __iomem *base)
 {
-	int err;
+	int err, cpu;
 	struct mct_clock_event_device *mevt = this_cpu_ptr(&percpu_mct_tick);
+	struct mct_clock_event_device *evt;
 	struct clk *mct_clk, *tick_clk;
 
 	tick_clk = np ? of_clk_get_by_name(np, "fin_pll") :
@@ -549,7 +544,15 @@ static void __init exynos4_timer_resources(struct device_node *np, void __iomem
 		WARN(err, "MCT: can't request IRQ %d (%d)\n",
 		     mct_irqs[MCT_L0_IRQ], err);
 	} else {
-		irq_set_affinity(mct_irqs[MCT_L0_IRQ], cpumask_of(0));
+		for_each_present_cpu(cpu) {
+			evt = per_cpu_ptr(&percpu_mct_tick, cpu);
+			if (request_irq(mct_irqs[MCT_L0_IRQ + cpu],
+					exynos4_mct_tick_isr,
+					IRQF_TIMER | IRQF_NOBALANCING,
+					"MCT", evt))
+				pr_err("exynos-mct: cannot register IRQ\n");
+			disable_irq(mct_irqs[MCT_L0_IRQ + cpu]);
+		}
 	}
 
 	err = register_cpu_notifier(&exynos4_mct_cpu_nb);

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* PROBLEM: BUG  appearing when trying to allocate interrupt on Exynos MCT after CPU hotplug
@ 2014-10-27 20:16         ` Stephen Boyd
  0 siblings, 0 replies; 18+ messages in thread
From: Stephen Boyd @ 2014-10-27 20:16 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/24/2014 06:22 AM, Marcin Jabrzyk wrote:
>
>
> On 23/10/14 20:41, Stephen Boyd wrote:
>> On 10/23/2014 07:06 AM, Russell King - ARM Linux wrote:
>>> The CPU notifier is called via notify_cpu_starting(), which is called
>>> with interrupts disabled, and a reason code of CPU_STARTING. 
>>> Interrupts
>>> at this point /must/ remain disabled.
>>>
>>> The Exynos code then goes on to call exynos4_local_timer_setup() which
>>> tries to reverse the free_irq() in exynos4_local_timer_stop() by
>>> calling
>>> request_irq().  Calling request_irq() with interrupts off has never
>>> been
>>> permissible.
>>>
>>> So, this code is wrong today, and it was also wrong when it was
>>> written.
>>> It /couldn't/ have been tested.  It looks like this commit added this
>>> buggy code:
>>>
>>> commit ee98d27df6827b5ba4bd99cb7d5cb1239b6a1a31
>>> Author: Stephen Boyd <sboyd@codeaurora.org>
>>> Date:   Fri Feb 15 16:40:51 2013 -0800
>>>
>>>      ARM: EXYNOS4: Divorce mct from local timer API
>>>
>>>      Separate the mct local timers from the local timer API. This will
>>>      allow us to remove ARM local timer support in the near future and
>>>      gets us closer to moving this driver to drivers/clocksource.
>>>
>>>      Acked-by: Kukjin Kim <kgene.kim@samsung.com>
>>>      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
>>>      Cc: Thomas Abraham <thomas.abraham@linaro.org>
>>>      Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
>>
>> I'm not so sure. It looks like in that patch I didn't change anything
>> with respect to when things are called. In fact, it looks like we were
>> calling setup_irq() there, but another patch around the same time
>> changed that to request_irq()
>>
>> commit 7114cd749a12ff9fd64a2f6f04919760f45ab183
>> Author: Chander Kashyap <chander.kashyap@linaro.org>
>> Date:   Wed Jun 19 00:29:35 2013 +0900
>>
>>      clocksource: exynos_mct: use (request/free)_irq calls for local
>> timer registration
>>
>>      Replace the (setup/remove)_irq calls for local timer
>> registration with
>>      (request/free)_irq calls. This generalizes the local timer
>> registration API.
>>      Suggested by Mark Rutland.
>>
>>      Signed-off-by: Chander Kashyap <chander.kashyap@linaro.org>
>>      Acked-by: Mark Rutland <mark.rutland@arm.com>
>>      Reviewed-by: Tomasz Figa <t.figa@samsung.com>
>>      Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
>>
>> I don't believe setup_irq() allocates anything so we should probably go
>> back to using that over request_irq() or explore requesting the irqs
>> once and then enabling/disabling instead.
>>
>
> So what would be a better way to handle this? Going back to setup_irq
> or trying to enable/disable irqs on CPU hotplug? As this touched low
> level things and it's rare case for setting/enabling irqs just after
> CPU is coming back to life again.
>

The safest thing is setup_irq(), but do you care to try this patch?
Doing the enable/disable is not as robust because request_irq() returns
with the irq enabled and then we have to disable the irq to make things
symmetric. This whole driver doesn't look like it's prepared for such a
situation where the interrupt triggers before the clockevent is
registered so this doesn't look like a problem in practice. Doing the
disable right after request is typically bad though, and may not pass
review.

----8<-----

From: Stephen Boyd <sboyd@codeaurora.org>
Subject: [PATCH] clocksource: exynos_mct: Avoid scheduling while atomic

If we call request_irq() during the CPU_STARTING notifier we'll
try to allocate an irq descriptor with GFP_KERNEL while we're
running with irqs disabled. Just request the irqs at boot time
and enable/disable them when a CPU comes up or goes down to avoid
such problems.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---
 drivers/clocksource/exynos_mct.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
index 9403061a2acc..1800053b4644 100644
--- a/drivers/clocksource/exynos_mct.c
+++ b/drivers/clocksource/exynos_mct.c
@@ -467,13 +467,7 @@ static int exynos4_local_timer_setup(struct clock_event_device *evt)
 
 	if (mct_int_type == MCT_INT_SPI) {
 		evt->irq = mct_irqs[MCT_L0_IRQ + cpu];
-		if (request_irq(evt->irq, exynos4_mct_tick_isr,
-				IRQF_TIMER | IRQF_NOBALANCING,
-				evt->name, mevt)) {
-			pr_err("exynos-mct: cannot register IRQ %d\n",
-				evt->irq);
-			return -EIO;
-		}
+		enable_irq(evt->irq);
 		irq_force_affinity(mct_irqs[MCT_L0_IRQ + cpu], cpumask_of(cpu));
 	} else {
 		enable_percpu_irq(mct_irqs[MCT_L0_IRQ], 0);
@@ -488,7 +482,7 @@ static void exynos4_local_timer_stop(struct clock_event_device *evt)
 {
 	evt->set_mode(CLOCK_EVT_MODE_UNUSED, evt);
 	if (mct_int_type == MCT_INT_SPI)
-		free_irq(evt->irq, this_cpu_ptr(&percpu_mct_tick));
+		disable_irq(evt->irq);
 	else
 		disable_percpu_irq(mct_irqs[MCT_L0_IRQ]);
 }
@@ -522,8 +516,9 @@ static struct notifier_block exynos4_mct_cpu_nb = {
 
 static void __init exynos4_timer_resources(struct device_node *np, void __iomem *base)
 {
-	int err;
+	int err, cpu;
 	struct mct_clock_event_device *mevt = this_cpu_ptr(&percpu_mct_tick);
+	struct mct_clock_event_device *evt;
 	struct clk *mct_clk, *tick_clk;
 
 	tick_clk = np ? of_clk_get_by_name(np, "fin_pll") :
@@ -549,7 +544,15 @@ static void __init exynos4_timer_resources(struct device_node *np, void __iomem
 		WARN(err, "MCT: can't request IRQ %d (%d)\n",
 		     mct_irqs[MCT_L0_IRQ], err);
 	} else {
-		irq_set_affinity(mct_irqs[MCT_L0_IRQ], cpumask_of(0));
+		for_each_present_cpu(cpu) {
+			evt = per_cpu_ptr(&percpu_mct_tick, cpu);
+			if (request_irq(mct_irqs[MCT_L0_IRQ + cpu],
+					exynos4_mct_tick_isr,
+					IRQF_TIMER | IRQF_NOBALANCING,
+					"MCT", evt))
+				pr_err("exynos-mct: cannot register IRQ\n");
+			disable_irq(mct_irqs[MCT_L0_IRQ + cpu]);
+		}
 	}
 
 	err = register_cpu_notifier(&exynos4_mct_cpu_nb);

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: PROBLEM: BUG  appearing when trying to allocate interrupt on Exynos MCT after CPU hotplug
  2014-10-27 20:16         ` Stephen Boyd
@ 2014-10-29 10:38           ` Marcin Jabrzyk
  -1 siblings, 0 replies; 18+ messages in thread
From: Marcin Jabrzyk @ 2014-10-29 10:38 UTC (permalink / raw)
  To: Stephen Boyd, Russell King - ARM Linux
  Cc: Kukjin Kim, Bartlomiej Zolnierkiewicz, Daniel Lezcano,
	linux-kernel, kyungmin.park, linux-samsung-soc, Thomas Gleixner,
	linux-arm-kernel, Mark Rutland, Chander Kashyap

So I've tried this patch, it resolves one problem but introduces also 
new ones. As expected the BUG warning is not showing after applying this 
patch but there are some interesting side effects.
I was looking on /proc/interrupts output. IRQ for CPU0 have "MCT" name 
and IRQ for CPU1 has unexpectedly no name at all.
After making hotplug cycle of CPU1 I've observed that IRQs attached 
originally for that CPU are generating on really low count and not in 
order with IRQ for CPU0.
What's more the interrupt for CPU1 is showing to me as being counted for 
both CPUs, so it's probably not being attached to CPU1.

Best regards,
Marcin Jabrzyk

On 27/10/14 21:16, Stephen Boyd wrote:
> On 10/24/2014 06:22 AM, Marcin Jabrzyk wrote:
>>
>>
>> On 23/10/14 20:41, Stephen Boyd wrote:
>>> On 10/23/2014 07:06 AM, Russell King - ARM Linux wrote:
>>>> The CPU notifier is called via notify_cpu_starting(), which is called
>>>> with interrupts disabled, and a reason code of CPU_STARTING.
>>>> Interrupts
>>>> at this point /must/ remain disabled.
>>>>
>>>> The Exynos code then goes on to call exynos4_local_timer_setup() which
>>>> tries to reverse the free_irq() in exynos4_local_timer_stop() by
>>>> calling
>>>> request_irq().  Calling request_irq() with interrupts off has never
>>>> been
>>>> permissible.
>>>>
>>>> So, this code is wrong today, and it was also wrong when it was
>>>> written.
>>>> It /couldn't/ have been tested.  It looks like this commit added this
>>>> buggy code:
>>>>
>>>> commit ee98d27df6827b5ba4bd99cb7d5cb1239b6a1a31
>>>> Author: Stephen Boyd <sboyd@codeaurora.org>
>>>> Date:   Fri Feb 15 16:40:51 2013 -0800
>>>>
>>>>       ARM: EXYNOS4: Divorce mct from local timer API
>>>>
>>>>       Separate the mct local timers from the local timer API. This will
>>>>       allow us to remove ARM local timer support in the near future and
>>>>       gets us closer to moving this driver to drivers/clocksource.
>>>>
>>>>       Acked-by: Kukjin Kim <kgene.kim@samsung.com>
>>>>       Acked-by: Marc Zyngier <marc.zyngier@arm.com>
>>>>       Cc: Thomas Abraham <thomas.abraham@linaro.org>
>>>>       Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
>>>
>>> I'm not so sure. It looks like in that patch I didn't change anything
>>> with respect to when things are called. In fact, it looks like we were
>>> calling setup_irq() there, but another patch around the same time
>>> changed that to request_irq()
>>>
>>> commit 7114cd749a12ff9fd64a2f6f04919760f45ab183
>>> Author: Chander Kashyap <chander.kashyap@linaro.org>
>>> Date:   Wed Jun 19 00:29:35 2013 +0900
>>>
>>>       clocksource: exynos_mct: use (request/free)_irq calls for local
>>> timer registration
>>>
>>>       Replace the (setup/remove)_irq calls for local timer
>>> registration with
>>>       (request/free)_irq calls. This generalizes the local timer
>>> registration API.
>>>       Suggested by Mark Rutland.
>>>
>>>       Signed-off-by: Chander Kashyap <chander.kashyap@linaro.org>
>>>       Acked-by: Mark Rutland <mark.rutland@arm.com>
>>>       Reviewed-by: Tomasz Figa <t.figa@samsung.com>
>>>       Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
>>>
>>> I don't believe setup_irq() allocates anything so we should probably go
>>> back to using that over request_irq() or explore requesting the irqs
>>> once and then enabling/disabling instead.
>>>
>>
>> So what would be a better way to handle this? Going back to setup_irq
>> or trying to enable/disable irqs on CPU hotplug? As this touched low
>> level things and it's rare case for setting/enabling irqs just after
>> CPU is coming back to life again.
>>
>
> The safest thing is setup_irq(), but do you care to try this patch?
> Doing the enable/disable is not as robust because request_irq() returns
> with the irq enabled and then we have to disable the irq to make things
> symmetric. This whole driver doesn't look like it's prepared for such a
> situation where the interrupt triggers before the clockevent is
> registered so this doesn't look like a problem in practice. Doing the
> disable right after request is typically bad though, and may not pass
> review.
>
> ----8<-----
>
> From: Stephen Boyd <sboyd@codeaurora.org>
> Subject: [PATCH] clocksource: exynos_mct: Avoid scheduling while atomic
>
> If we call request_irq() during the CPU_STARTING notifier we'll
> try to allocate an irq descriptor with GFP_KERNEL while we're
> running with irqs disabled. Just request the irqs at boot time
> and enable/disable them when a CPU comes up or goes down to avoid
> such problems.
>
> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
> ---
>   drivers/clocksource/exynos_mct.c | 23 +++++++++++++----------
>   1 file changed, 13 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
> index 9403061a2acc..1800053b4644 100644
> --- a/drivers/clocksource/exynos_mct.c
> +++ b/drivers/clocksource/exynos_mct.c
> @@ -467,13 +467,7 @@ static int exynos4_local_timer_setup(struct clock_event_device *evt)
>
>   	if (mct_int_type == MCT_INT_SPI) {
>   		evt->irq = mct_irqs[MCT_L0_IRQ + cpu];
> -		if (request_irq(evt->irq, exynos4_mct_tick_isr,
> -				IRQF_TIMER | IRQF_NOBALANCING,
> -				evt->name, mevt)) {
> -			pr_err("exynos-mct: cannot register IRQ %d\n",
> -				evt->irq);
> -			return -EIO;
> -		}
> +		enable_irq(evt->irq);
>   		irq_force_affinity(mct_irqs[MCT_L0_IRQ + cpu], cpumask_of(cpu));
>   	} else {
>   		enable_percpu_irq(mct_irqs[MCT_L0_IRQ], 0);
> @@ -488,7 +482,7 @@ static void exynos4_local_timer_stop(struct clock_event_device *evt)
>   {
>   	evt->set_mode(CLOCK_EVT_MODE_UNUSED, evt);
>   	if (mct_int_type == MCT_INT_SPI)
> -		free_irq(evt->irq, this_cpu_ptr(&percpu_mct_tick));
> +		disable_irq(evt->irq);
>   	else
>   		disable_percpu_irq(mct_irqs[MCT_L0_IRQ]);
>   }
> @@ -522,8 +516,9 @@ static struct notifier_block exynos4_mct_cpu_nb = {
>
>   static void __init exynos4_timer_resources(struct device_node *np, void __iomem *base)
>   {
> -	int err;
> +	int err, cpu;
>   	struct mct_clock_event_device *mevt = this_cpu_ptr(&percpu_mct_tick);
> +	struct mct_clock_event_device *evt;
>   	struct clk *mct_clk, *tick_clk;
>
>   	tick_clk = np ? of_clk_get_by_name(np, "fin_pll") :
> @@ -549,7 +544,15 @@ static void __init exynos4_timer_resources(struct device_node *np, void __iomem
>   		WARN(err, "MCT: can't request IRQ %d (%d)\n",
>   		     mct_irqs[MCT_L0_IRQ], err);
>   	} else {
> -		irq_set_affinity(mct_irqs[MCT_L0_IRQ], cpumask_of(0));
> +		for_each_present_cpu(cpu) {
> +			evt = per_cpu_ptr(&percpu_mct_tick, cpu);
> +			if (request_irq(mct_irqs[MCT_L0_IRQ + cpu],
> +					exynos4_mct_tick_isr,
> +					IRQF_TIMER | IRQF_NOBALANCING,
> +					"MCT", evt))
> +				pr_err("exynos-mct: cannot register IRQ\n");
> +			disable_irq(mct_irqs[MCT_L0_IRQ + cpu]);
> +		}
>   	}
>
>   	err = register_cpu_notifier(&exynos4_mct_cpu_nb);
>
--
Marcin Jabrzyk
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply	[flat|nested] 18+ messages in thread

* PROBLEM: BUG  appearing when trying to allocate interrupt on Exynos MCT after CPU hotplug
@ 2014-10-29 10:38           ` Marcin Jabrzyk
  0 siblings, 0 replies; 18+ messages in thread
From: Marcin Jabrzyk @ 2014-10-29 10:38 UTC (permalink / raw)
  To: linux-arm-kernel

So I've tried this patch, it resolves one problem but introduces also 
new ones. As expected the BUG warning is not showing after applying this 
patch but there are some interesting side effects.
I was looking on /proc/interrupts output. IRQ for CPU0 have "MCT" name 
and IRQ for CPU1 has unexpectedly no name at all.
After making hotplug cycle of CPU1 I've observed that IRQs attached 
originally for that CPU are generating on really low count and not in 
order with IRQ for CPU0.
What's more the interrupt for CPU1 is showing to me as being counted for 
both CPUs, so it's probably not being attached to CPU1.

Best regards,
Marcin Jabrzyk

On 27/10/14 21:16, Stephen Boyd wrote:
> On 10/24/2014 06:22 AM, Marcin Jabrzyk wrote:
>>
>>
>> On 23/10/14 20:41, Stephen Boyd wrote:
>>> On 10/23/2014 07:06 AM, Russell King - ARM Linux wrote:
>>>> The CPU notifier is called via notify_cpu_starting(), which is called
>>>> with interrupts disabled, and a reason code of CPU_STARTING.
>>>> Interrupts
>>>> at this point /must/ remain disabled.
>>>>
>>>> The Exynos code then goes on to call exynos4_local_timer_setup() which
>>>> tries to reverse the free_irq() in exynos4_local_timer_stop() by
>>>> calling
>>>> request_irq().  Calling request_irq() with interrupts off has never
>>>> been
>>>> permissible.
>>>>
>>>> So, this code is wrong today, and it was also wrong when it was
>>>> written.
>>>> It /couldn't/ have been tested.  It looks like this commit added this
>>>> buggy code:
>>>>
>>>> commit ee98d27df6827b5ba4bd99cb7d5cb1239b6a1a31
>>>> Author: Stephen Boyd <sboyd@codeaurora.org>
>>>> Date:   Fri Feb 15 16:40:51 2013 -0800
>>>>
>>>>       ARM: EXYNOS4: Divorce mct from local timer API
>>>>
>>>>       Separate the mct local timers from the local timer API. This will
>>>>       allow us to remove ARM local timer support in the near future and
>>>>       gets us closer to moving this driver to drivers/clocksource.
>>>>
>>>>       Acked-by: Kukjin Kim <kgene.kim@samsung.com>
>>>>       Acked-by: Marc Zyngier <marc.zyngier@arm.com>
>>>>       Cc: Thomas Abraham <thomas.abraham@linaro.org>
>>>>       Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
>>>
>>> I'm not so sure. It looks like in that patch I didn't change anything
>>> with respect to when things are called. In fact, it looks like we were
>>> calling setup_irq() there, but another patch around the same time
>>> changed that to request_irq()
>>>
>>> commit 7114cd749a12ff9fd64a2f6f04919760f45ab183
>>> Author: Chander Kashyap <chander.kashyap@linaro.org>
>>> Date:   Wed Jun 19 00:29:35 2013 +0900
>>>
>>>       clocksource: exynos_mct: use (request/free)_irq calls for local
>>> timer registration
>>>
>>>       Replace the (setup/remove)_irq calls for local timer
>>> registration with
>>>       (request/free)_irq calls. This generalizes the local timer
>>> registration API.
>>>       Suggested by Mark Rutland.
>>>
>>>       Signed-off-by: Chander Kashyap <chander.kashyap@linaro.org>
>>>       Acked-by: Mark Rutland <mark.rutland@arm.com>
>>>       Reviewed-by: Tomasz Figa <t.figa@samsung.com>
>>>       Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
>>>
>>> I don't believe setup_irq() allocates anything so we should probably go
>>> back to using that over request_irq() or explore requesting the irqs
>>> once and then enabling/disabling instead.
>>>
>>
>> So what would be a better way to handle this? Going back to setup_irq
>> or trying to enable/disable irqs on CPU hotplug? As this touched low
>> level things and it's rare case for setting/enabling irqs just after
>> CPU is coming back to life again.
>>
>
> The safest thing is setup_irq(), but do you care to try this patch?
> Doing the enable/disable is not as robust because request_irq() returns
> with the irq enabled and then we have to disable the irq to make things
> symmetric. This whole driver doesn't look like it's prepared for such a
> situation where the interrupt triggers before the clockevent is
> registered so this doesn't look like a problem in practice. Doing the
> disable right after request is typically bad though, and may not pass
> review.
>
> ----8<-----
>
> From: Stephen Boyd <sboyd@codeaurora.org>
> Subject: [PATCH] clocksource: exynos_mct: Avoid scheduling while atomic
>
> If we call request_irq() during the CPU_STARTING notifier we'll
> try to allocate an irq descriptor with GFP_KERNEL while we're
> running with irqs disabled. Just request the irqs at boot time
> and enable/disable them when a CPU comes up or goes down to avoid
> such problems.
>
> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
> ---
>   drivers/clocksource/exynos_mct.c | 23 +++++++++++++----------
>   1 file changed, 13 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
> index 9403061a2acc..1800053b4644 100644
> --- a/drivers/clocksource/exynos_mct.c
> +++ b/drivers/clocksource/exynos_mct.c
> @@ -467,13 +467,7 @@ static int exynos4_local_timer_setup(struct clock_event_device *evt)
>
>   	if (mct_int_type == MCT_INT_SPI) {
>   		evt->irq = mct_irqs[MCT_L0_IRQ + cpu];
> -		if (request_irq(evt->irq, exynos4_mct_tick_isr,
> -				IRQF_TIMER | IRQF_NOBALANCING,
> -				evt->name, mevt)) {
> -			pr_err("exynos-mct: cannot register IRQ %d\n",
> -				evt->irq);
> -			return -EIO;
> -		}
> +		enable_irq(evt->irq);
>   		irq_force_affinity(mct_irqs[MCT_L0_IRQ + cpu], cpumask_of(cpu));
>   	} else {
>   		enable_percpu_irq(mct_irqs[MCT_L0_IRQ], 0);
> @@ -488,7 +482,7 @@ static void exynos4_local_timer_stop(struct clock_event_device *evt)
>   {
>   	evt->set_mode(CLOCK_EVT_MODE_UNUSED, evt);
>   	if (mct_int_type == MCT_INT_SPI)
> -		free_irq(evt->irq, this_cpu_ptr(&percpu_mct_tick));
> +		disable_irq(evt->irq);
>   	else
>   		disable_percpu_irq(mct_irqs[MCT_L0_IRQ]);
>   }
> @@ -522,8 +516,9 @@ static struct notifier_block exynos4_mct_cpu_nb = {
>
>   static void __init exynos4_timer_resources(struct device_node *np, void __iomem *base)
>   {
> -	int err;
> +	int err, cpu;
>   	struct mct_clock_event_device *mevt = this_cpu_ptr(&percpu_mct_tick);
> +	struct mct_clock_event_device *evt;
>   	struct clk *mct_clk, *tick_clk;
>
>   	tick_clk = np ? of_clk_get_by_name(np, "fin_pll") :
> @@ -549,7 +544,15 @@ static void __init exynos4_timer_resources(struct device_node *np, void __iomem
>   		WARN(err, "MCT: can't request IRQ %d (%d)\n",
>   		     mct_irqs[MCT_L0_IRQ], err);
>   	} else {
> -		irq_set_affinity(mct_irqs[MCT_L0_IRQ], cpumask_of(0));
> +		for_each_present_cpu(cpu) {
> +			evt = per_cpu_ptr(&percpu_mct_tick, cpu);
> +			if (request_irq(mct_irqs[MCT_L0_IRQ + cpu],
> +					exynos4_mct_tick_isr,
> +					IRQF_TIMER | IRQF_NOBALANCING,
> +					"MCT", evt))
> +				pr_err("exynos-mct: cannot register IRQ\n");
> +			disable_irq(mct_irqs[MCT_L0_IRQ + cpu]);
> +		}
>   	}
>
>   	err = register_cpu_notifier(&exynos4_mct_cpu_nb);
>
--
Marcin Jabrzyk
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PROBLEM: BUG  appearing when trying to allocate interrupt on Exynos MCT after CPU hotplug
  2014-10-29 10:38           ` Marcin Jabrzyk
@ 2015-01-31  1:08             ` Stephen Boyd
  -1 siblings, 0 replies; 18+ messages in thread
From: Stephen Boyd @ 2015-01-31  1:08 UTC (permalink / raw)
  To: Marcin Jabrzyk, Russell King - ARM Linux
  Cc: Kukjin Kim, Bartlomiej Zolnierkiewicz, Daniel Lezcano,
	linux-kernel, kyungmin.park, linux-samsung-soc, Thomas Gleixner,
	linux-arm-kernel, Mark Rutland, Chander Kashyap

Kept meaning to get back to this thread. Have you resolved it?

On 10/29/14 03:38, Marcin Jabrzyk wrote:
> So I've tried this patch, it resolves one problem but introduces also
> new ones. As expected the BUG warning is not showing after applying
> this patch but there are some interesting side effects.

Well that's half good news.

> I was looking on /proc/interrupts output. IRQ for CPU0 have "MCT" name
> and IRQ for CPU1 has unexpectedly no name at all.

This is pretty confusing. I don't see how the patch could cause this to
happen.

> After making hotplug cycle of CPU1 I've observed that IRQs attached
> originally for that CPU are generating on really low count and not in
> order with IRQ for CPU0.
> What's more the interrupt for CPU1 is showing to me as being counted
> for both CPUs, so it's probably not being attached to CPU1.
>

yeah. Can you give the output of /proc/timer_list in addition to
/proc/interrupts? It may give some hints on what's going on. It may also
be interesting to see if irq_force_affinity() is failing. Please check
the return value and print an error


diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
index 1800053b4644..3c4538e26731 100644
--- a/drivers/clocksource/exynos_mct.c
+++ b/drivers/clocksource/exynos_mct.c
@@ -450,6 +450,7 @@ static int exynos4_local_timer_setup(struct clock_event_device *evt)
 {
 	struct mct_clock_event_device *mevt;
 	unsigned int cpu = smp_processor_id();
+	int ret;
 
 	mevt = container_of(evt, struct mct_clock_event_device, evt);
 
@@ -468,7 +469,9 @@ static int exynos4_local_timer_setup(struct clock_event_device *evt)
 	if (mct_int_type == MCT_INT_SPI) {
 		evt->irq = mct_irqs[MCT_L0_IRQ + cpu];
 		enable_irq(evt->irq);
-		irq_force_affinity(mct_irqs[MCT_L0_IRQ + cpu], cpumask_of(cpu));
+		ret = irq_force_affinity(mct_irqs[MCT_L0_IRQ + cpu], cpumask_of(cpu));
+		if (ret)
+			pr_err("force failed %d\n", ret);
 	} else {
 		enable_percpu_irq(mct_irqs[MCT_L0_IRQ], 0);
 	}

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* PROBLEM: BUG  appearing when trying to allocate interrupt on Exynos MCT after CPU hotplug
@ 2015-01-31  1:08             ` Stephen Boyd
  0 siblings, 0 replies; 18+ messages in thread
From: Stephen Boyd @ 2015-01-31  1:08 UTC (permalink / raw)
  To: linux-arm-kernel

Kept meaning to get back to this thread. Have you resolved it?

On 10/29/14 03:38, Marcin Jabrzyk wrote:
> So I've tried this patch, it resolves one problem but introduces also
> new ones. As expected the BUG warning is not showing after applying
> this patch but there are some interesting side effects.

Well that's half good news.

> I was looking on /proc/interrupts output. IRQ for CPU0 have "MCT" name
> and IRQ for CPU1 has unexpectedly no name at all.

This is pretty confusing. I don't see how the patch could cause this to
happen.

> After making hotplug cycle of CPU1 I've observed that IRQs attached
> originally for that CPU are generating on really low count and not in
> order with IRQ for CPU0.
> What's more the interrupt for CPU1 is showing to me as being counted
> for both CPUs, so it's probably not being attached to CPU1.
>

yeah. Can you give the output of /proc/timer_list in addition to
/proc/interrupts? It may give some hints on what's going on. It may also
be interesting to see if irq_force_affinity() is failing. Please check
the return value and print an error


diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
index 1800053b4644..3c4538e26731 100644
--- a/drivers/clocksource/exynos_mct.c
+++ b/drivers/clocksource/exynos_mct.c
@@ -450,6 +450,7 @@ static int exynos4_local_timer_setup(struct clock_event_device *evt)
 {
 	struct mct_clock_event_device *mevt;
 	unsigned int cpu = smp_processor_id();
+	int ret;
 
 	mevt = container_of(evt, struct mct_clock_event_device, evt);
 
@@ -468,7 +469,9 @@ static int exynos4_local_timer_setup(struct clock_event_device *evt)
 	if (mct_int_type == MCT_INT_SPI) {
 		evt->irq = mct_irqs[MCT_L0_IRQ + cpu];
 		enable_irq(evt->irq);
-		irq_force_affinity(mct_irqs[MCT_L0_IRQ + cpu], cpumask_of(cpu));
+		ret = irq_force_affinity(mct_irqs[MCT_L0_IRQ + cpu], cpumask_of(cpu));
+		if (ret)
+			pr_err("force failed %d\n", ret);
 	} else {
 		enable_percpu_irq(mct_irqs[MCT_L0_IRQ], 0);
 	}

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: PROBLEM: BUG  appearing when trying to allocate interrupt on Exynos MCT after CPU hotplug
  2015-01-31  1:08             ` Stephen Boyd
@ 2015-01-31  9:21               ` Daniel Lezcano
  -1 siblings, 0 replies; 18+ messages in thread
From: Daniel Lezcano @ 2015-01-31  9:21 UTC (permalink / raw)
  To: Stephen Boyd, Marcin Jabrzyk, Russell King - ARM Linux
  Cc: Kukjin Kim, Bartlomiej Zolnierkiewicz, linux-kernel,
	kyungmin.park, linux-samsung-soc, Thomas Gleixner,
	linux-arm-kernel, Mark Rutland, Chander Kashyap

On 01/31/2015 02:08 AM, Stephen Boyd wrote:
> Kept meaning to get back to this thread. Have you resolved it?
>
> On 10/29/14 03:38, Marcin Jabrzyk wrote:
>> So I've tried this patch, it resolves one problem but introduces also
>> new ones. As expected the BUG warning is not showing after applying
>> this patch but there are some interesting side effects.
>
> Well that's half good news.
>
>> I was looking on /proc/interrupts output. IRQ for CPU0 have "MCT" name
>> and IRQ for CPU1 has unexpectedly no name at all.
>
> This is pretty confusing. I don't see how the patch could cause this to
> happen.
>
>> After making hotplug cycle of CPU1 I've observed that IRQs attached
>> originally for that CPU are generating on really low count and not in
>> order with IRQ for CPU0.
>> What's more the interrupt for CPU1 is showing to me as being counted
>> for both CPUs, so it's probably not being attached to CPU1.
>>
>
> yeah. Can you give the output of /proc/timer_list in addition to
> /proc/interrupts? It may give some hints on what's going on. It may also
> be interesting to see if irq_force_affinity() is failing. Please check
> the return value and print an error

Hi Stephen, Marcin,

can you have a look if the patch [1] fixes this issue ?

  -- Daniel

[1] https://lkml.org/lkml/2015/1/30/423


> diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
> index 1800053b4644..3c4538e26731 100644
> --- a/drivers/clocksource/exynos_mct.c
> +++ b/drivers/clocksource/exynos_mct.c
> @@ -450,6 +450,7 @@ static int exynos4_local_timer_setup(struct clock_event_device *evt)
>   {
>   	struct mct_clock_event_device *mevt;
>   	unsigned int cpu = smp_processor_id();
> +	int ret;
>
>   	mevt = container_of(evt, struct mct_clock_event_device, evt);
>
> @@ -468,7 +469,9 @@ static int exynos4_local_timer_setup(struct clock_event_device *evt)
>   	if (mct_int_type == MCT_INT_SPI) {
>   		evt->irq = mct_irqs[MCT_L0_IRQ + cpu];
>   		enable_irq(evt->irq);
> -		irq_force_affinity(mct_irqs[MCT_L0_IRQ + cpu], cpumask_of(cpu));
> +		ret = irq_force_affinity(mct_irqs[MCT_L0_IRQ + cpu], cpumask_of(cpu));
> +		if (ret)
> +			pr_err("force failed %d\n", ret);
>   	} else {
>   		enable_percpu_irq(mct_irqs[MCT_L0_IRQ], 0);
>   	}
>


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 18+ messages in thread

* PROBLEM: BUG  appearing when trying to allocate interrupt on Exynos MCT after CPU hotplug
@ 2015-01-31  9:21               ` Daniel Lezcano
  0 siblings, 0 replies; 18+ messages in thread
From: Daniel Lezcano @ 2015-01-31  9:21 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/31/2015 02:08 AM, Stephen Boyd wrote:
> Kept meaning to get back to this thread. Have you resolved it?
>
> On 10/29/14 03:38, Marcin Jabrzyk wrote:
>> So I've tried this patch, it resolves one problem but introduces also
>> new ones. As expected the BUG warning is not showing after applying
>> this patch but there are some interesting side effects.
>
> Well that's half good news.
>
>> I was looking on /proc/interrupts output. IRQ for CPU0 have "MCT" name
>> and IRQ for CPU1 has unexpectedly no name at all.
>
> This is pretty confusing. I don't see how the patch could cause this to
> happen.
>
>> After making hotplug cycle of CPU1 I've observed that IRQs attached
>> originally for that CPU are generating on really low count and not in
>> order with IRQ for CPU0.
>> What's more the interrupt for CPU1 is showing to me as being counted
>> for both CPUs, so it's probably not being attached to CPU1.
>>
>
> yeah. Can you give the output of /proc/timer_list in addition to
> /proc/interrupts? It may give some hints on what's going on. It may also
> be interesting to see if irq_force_affinity() is failing. Please check
> the return value and print an error

Hi Stephen, Marcin,

can you have a look if the patch [1] fixes this issue ?

  -- Daniel

[1] https://lkml.org/lkml/2015/1/30/423


> diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
> index 1800053b4644..3c4538e26731 100644
> --- a/drivers/clocksource/exynos_mct.c
> +++ b/drivers/clocksource/exynos_mct.c
> @@ -450,6 +450,7 @@ static int exynos4_local_timer_setup(struct clock_event_device *evt)
>   {
>   	struct mct_clock_event_device *mevt;
>   	unsigned int cpu = smp_processor_id();
> +	int ret;
>
>   	mevt = container_of(evt, struct mct_clock_event_device, evt);
>
> @@ -468,7 +469,9 @@ static int exynos4_local_timer_setup(struct clock_event_device *evt)
>   	if (mct_int_type == MCT_INT_SPI) {
>   		evt->irq = mct_irqs[MCT_L0_IRQ + cpu];
>   		enable_irq(evt->irq);
> -		irq_force_affinity(mct_irqs[MCT_L0_IRQ + cpu], cpumask_of(cpu));
> +		ret = irq_force_affinity(mct_irqs[MCT_L0_IRQ + cpu], cpumask_of(cpu));
> +		if (ret)
> +			pr_err("force failed %d\n", ret);
>   	} else {
>   		enable_percpu_irq(mct_irqs[MCT_L0_IRQ], 0);
>   	}
>


-- 
  <http://www.linaro.org/> Linaro.org ? Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PROBLEM: BUG  appearing when trying to allocate interrupt on Exynos MCT after CPU hotplug
  2015-01-31  9:21               ` Daniel Lezcano
@ 2015-02-02  8:47                 ` Marcin Jabrzyk
  -1 siblings, 0 replies; 18+ messages in thread
From: Marcin Jabrzyk @ 2015-02-02  8:47 UTC (permalink / raw)
  To: Daniel Lezcano, Stephen Boyd, Russell King - ARM Linux
  Cc: Kukjin Kim, Bartlomiej Zolnierkiewicz, linux-kernel,
	kyungmin.park, linux-samsung-soc, Thomas Gleixner,
	linux-arm-kernel, Mark Rutland, Chander Kashyap



On 31/01/15 10:21, Daniel Lezcano wrote:
> On 01/31/2015 02:08 AM, Stephen Boyd wrote:
>> Kept meaning to get back to this thread. Have you resolved it?
>>
>> On 10/29/14 03:38, Marcin Jabrzyk wrote:
>>> So I've tried this patch, it resolves one problem but introduces also
>>> new ones. As expected the BUG warning is not showing after applying
>>> this patch but there are some interesting side effects.
>>
>> Well that's half good news.
>>
>>> I was looking on /proc/interrupts output. IRQ for CPU0 have "MCT" name
>>> and IRQ for CPU1 has unexpectedly no name at all.
>>
>> This is pretty confusing. I don't see how the patch could cause this to
>> happen.
>>
>>> After making hotplug cycle of CPU1 I've observed that IRQs attached
>>> originally for that CPU are generating on really low count and not in
>>> order with IRQ for CPU0.
>>> What's more the interrupt for CPU1 is showing to me as being counted
>>> for both CPUs, so it's probably not being attached to CPU1.
>>>
>>
>> yeah. Can you give the output of /proc/timer_list in addition to
>> /proc/interrupts? It may give some hints on what's going on. It may also
>> be interesting to see if irq_force_affinity() is failing. Please check
>> the return value and print an error
>
> Hi Stephen, Marcin,
>
> can you have a look if the patch [1] fixes this issue ?
>
>   -- Daniel
>
> [1] https://lkml.org/lkml/2015/1/30/423
>
>
Hi Daniel,

I've checked this patch on the board that have problems and it fixes 
this issue completely. Everything looks fine after power cycle of the CPU.

Best regards,
Marcin Jabrzyk

>> diff --git a/drivers/clocksource/exynos_mct.c
>> b/drivers/clocksource/exynos_mct.c
>> index 1800053b4644..3c4538e26731 100644
>> --- a/drivers/clocksource/exynos_mct.c
>> +++ b/drivers/clocksource/exynos_mct.c
>> @@ -450,6 +450,7 @@ static int exynos4_local_timer_setup(struct
>> clock_event_device *evt)
>>   {
>>       struct mct_clock_event_device *mevt;
>>       unsigned int cpu = smp_processor_id();
>> +    int ret;
>>
>>       mevt = container_of(evt, struct mct_clock_event_device, evt);
>>
>> @@ -468,7 +469,9 @@ static int exynos4_local_timer_setup(struct
>> clock_event_device *evt)
>>       if (mct_int_type == MCT_INT_SPI) {
>>           evt->irq = mct_irqs[MCT_L0_IRQ + cpu];
>>           enable_irq(evt->irq);
>> -        irq_force_affinity(mct_irqs[MCT_L0_IRQ + cpu], cpumask_of(cpu));
>> +        ret = irq_force_affinity(mct_irqs[MCT_L0_IRQ + cpu],
>> cpumask_of(cpu));
>> +        if (ret)
>> +            pr_err("force failed %d\n", ret);
>>       } else {
>>           enable_percpu_irq(mct_irqs[MCT_L0_IRQ], 0);
>>       }
>>
>
>

-- 
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply	[flat|nested] 18+ messages in thread

* PROBLEM: BUG  appearing when trying to allocate interrupt on Exynos MCT after CPU hotplug
@ 2015-02-02  8:47                 ` Marcin Jabrzyk
  0 siblings, 0 replies; 18+ messages in thread
From: Marcin Jabrzyk @ 2015-02-02  8:47 UTC (permalink / raw)
  To: linux-arm-kernel



On 31/01/15 10:21, Daniel Lezcano wrote:
> On 01/31/2015 02:08 AM, Stephen Boyd wrote:
>> Kept meaning to get back to this thread. Have you resolved it?
>>
>> On 10/29/14 03:38, Marcin Jabrzyk wrote:
>>> So I've tried this patch, it resolves one problem but introduces also
>>> new ones. As expected the BUG warning is not showing after applying
>>> this patch but there are some interesting side effects.
>>
>> Well that's half good news.
>>
>>> I was looking on /proc/interrupts output. IRQ for CPU0 have "MCT" name
>>> and IRQ for CPU1 has unexpectedly no name at all.
>>
>> This is pretty confusing. I don't see how the patch could cause this to
>> happen.
>>
>>> After making hotplug cycle of CPU1 I've observed that IRQs attached
>>> originally for that CPU are generating on really low count and not in
>>> order with IRQ for CPU0.
>>> What's more the interrupt for CPU1 is showing to me as being counted
>>> for both CPUs, so it's probably not being attached to CPU1.
>>>
>>
>> yeah. Can you give the output of /proc/timer_list in addition to
>> /proc/interrupts? It may give some hints on what's going on. It may also
>> be interesting to see if irq_force_affinity() is failing. Please check
>> the return value and print an error
>
> Hi Stephen, Marcin,
>
> can you have a look if the patch [1] fixes this issue ?
>
>   -- Daniel
>
> [1] https://lkml.org/lkml/2015/1/30/423
>
>
Hi Daniel,

I've checked this patch on the board that have problems and it fixes 
this issue completely. Everything looks fine after power cycle of the CPU.

Best regards,
Marcin Jabrzyk

>> diff --git a/drivers/clocksource/exynos_mct.c
>> b/drivers/clocksource/exynos_mct.c
>> index 1800053b4644..3c4538e26731 100644
>> --- a/drivers/clocksource/exynos_mct.c
>> +++ b/drivers/clocksource/exynos_mct.c
>> @@ -450,6 +450,7 @@ static int exynos4_local_timer_setup(struct
>> clock_event_device *evt)
>>   {
>>       struct mct_clock_event_device *mevt;
>>       unsigned int cpu = smp_processor_id();
>> +    int ret;
>>
>>       mevt = container_of(evt, struct mct_clock_event_device, evt);
>>
>> @@ -468,7 +469,9 @@ static int exynos4_local_timer_setup(struct
>> clock_event_device *evt)
>>       if (mct_int_type == MCT_INT_SPI) {
>>           evt->irq = mct_irqs[MCT_L0_IRQ + cpu];
>>           enable_irq(evt->irq);
>> -        irq_force_affinity(mct_irqs[MCT_L0_IRQ + cpu], cpumask_of(cpu));
>> +        ret = irq_force_affinity(mct_irqs[MCT_L0_IRQ + cpu],
>> cpumask_of(cpu));
>> +        if (ret)
>> +            pr_err("force failed %d\n", ret);
>>       } else {
>>           enable_percpu_irq(mct_irqs[MCT_L0_IRQ], 0);
>>       }
>>
>
>

-- 
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2015-02-02  8:47 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-23 13:51 PROBLEM: BUG appearing when trying to allocate interrupt on Exynos MCT after CPU hotplug Marcin Jabrzyk
2014-10-23 13:51 ` Marcin Jabrzyk
2014-10-23 14:06 ` Russell King - ARM Linux
2014-10-23 14:06   ` Russell King - ARM Linux
2014-10-23 18:41   ` Stephen Boyd
2014-10-23 18:41     ` Stephen Boyd
2014-10-24 13:22     ` Marcin Jabrzyk
2014-10-24 13:22       ` Marcin Jabrzyk
2014-10-27 20:16       ` Stephen Boyd
2014-10-27 20:16         ` Stephen Boyd
2014-10-29 10:38         ` Marcin Jabrzyk
2014-10-29 10:38           ` Marcin Jabrzyk
2015-01-31  1:08           ` Stephen Boyd
2015-01-31  1:08             ` Stephen Boyd
2015-01-31  9:21             ` Daniel Lezcano
2015-01-31  9:21               ` Daniel Lezcano
2015-02-02  8:47               ` Marcin Jabrzyk
2015-02-02  8:47                 ` Marcin Jabrzyk

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.