All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH ] ipmi: kcs_bmc: fix IRQ exception if the channel is not open
@ 2018-06-22  2:56 Haiyue Wang
  2018-06-22 12:43 ` Corey Minyard
  0 siblings, 1 reply; 4+ messages in thread
From: Haiyue Wang @ 2018-06-22  2:56 UTC (permalink / raw)
  To: minyard, arnd, gregkh, openipmi-developer, linux-kernel
  Cc: Haiyue Wang, luis.a.silva, avi.fishman, openbmc

The original core KCS IRQ handling function 'kcs_bmc_handle_event' had
a wrong logical desgin, it should force abort the KCS transaction only
if the IBF flag is set, because multiple KCS channels share the same
handle function; and it should return 0, which indicates that event of
current channel is handled. An negative value -ENODATA will cause the
IRQ handler of BMC KCS controller return IRQ_NONE, then IRQ moudle will
treat this IRQ 'nobody cared', it will trigger IRQ exception.

   irq 30: nobody cared (try booting with the "irqpoll" option)
   CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.15-npcm750 #1
   Hardware name: NPCMX50 Chip family
   [<c010b264>] (unwind_backtrace) from [<c0106930>] (show_stack+0x20/0x24)
   [<c0106930>] (show_stack) from [<c03dad38>] (dump_stack+0x8c/0xa0)
   [<c03dad38>] (dump_stack) from [<c0168810>] (__report_bad_irq+0x3c/0xdc)
   [<c0168810>] (__report_bad_irq) from [<c0168c34>] (note_interrupt+0x29c/0x2ec)
   [<c0168c34>] (note_interrupt) from [<c0165c80>] (handle_irq_event_percpu+0x5c/0x68)
   [<c0165c80>] (handle_irq_event_percpu) from [<c0165cd4>] (handle_irq_event+0x48/0x6c)
   [<c0165cd4>] (handle_irq_event) from [<c0169664>] (handle_fasteoi_irq+0xc8/0x198)
   [<c0169664>] (handle_fasteoi_irq) from [<c016529c>] (__handle_domain_irq+0x90/0xe8)
   [<c016529c>] (__handle_domain_irq) from [<c01014bc>] (gic_handle_irq+0x58/0x9c)
   [<c01014bc>] (gic_handle_irq) from [<c010752c>] (__irq_svc+0x6c/0x90)
   Exception stack(0xc0a01de8 to 0xc0a01e30)
   1de0:                   00002080 c0a6fbc0 00000000 00000000 00000000 c096d294
   1e00: 00000000 00000001 dc406400 f03ff100 00000082 c0a01e94 c0a6fbc0 c0a01e38
   1e20: 00200102 c01015bc 60000113 ffffffff
   [<c010752c>] (__irq_svc) from [<c01015bc>] (__do_softirq+0xbc/0x358)
   [<c01015bc>] (__do_softirq) from [<c011c798>] (irq_exit+0xb8/0xec)
   [<c011c798>] (irq_exit) from [<c01652a0>] (__handle_domain_irq+0x94/0xe8)
   [<c01652a0>] (__handle_domain_irq) from [<c01014bc>] (gic_handle_irq+0x58/0x9c)
   [<c01014bc>] (gic_handle_irq) from [<c010752c>] (__irq_svc+0x6c/0x90)
   Exception stack(0xc0a01ef8 to 0xc0a01f40)
   1ee0:                                                       00000000 000003ae
   1f00: dcc0f338 c0111060 c0a00000 c0a0cc44 c0a0cbe4 c0a1c22b c07bc218 00000001
   1f20: dcffca40 c0a01f54 c0a01f58 c0a01f48 c0103524 c0103528 60000013 ffffffff
   [<c010752c>] (__irq_svc) from [<c0103528>] (arch_cpu_idle+0x48/0x4c)
   [<c0103528>] (arch_cpu_idle) from [<c0681390>] (default_idle_call+0x30/0x3c)
   [<c0681390>] (default_idle_call) from [<c0156f24>] (do_idle+0xc8/0x134)
   [<c0156f24>] (do_idle) from [<c015722c>] (cpu_startup_entry+0x28/0x2c)
   [<c015722c>] (cpu_startup_entry) from [<c067ad74>] (rest_init+0x84/0x88)
   [<c067ad74>] (rest_init) from [<c0900d44>] (start_kernel+0x388/0x394)
   [<c0900d44>] (start_kernel) from [<0000807c>] (0x807c)
   handlers:
   [<c041c5dc>] npcm7xx_kcs_irq
   Disabling IRQ #30

Signed-off-by: Haiyue Wang <haiyue.wang@linux.intel.com>
---
Hi Corey,

This patch looks introducing BIG modification, it should be easily
understandable, and makes code clean & fix an error design, which
is introduced by misunderstanding the IRQ return value.

BR,
Haiyue 
---
 drivers/char/ipmi/kcs_bmc.c | 31 ++++++++++---------------------
 1 file changed, 10 insertions(+), 21 deletions(-)

diff --git a/drivers/char/ipmi/kcs_bmc.c b/drivers/char/ipmi/kcs_bmc.c
index fbfc05e..bb882ab1 100644
--- a/drivers/char/ipmi/kcs_bmc.c
+++ b/drivers/char/ipmi/kcs_bmc.c
@@ -210,34 +210,23 @@ static void kcs_bmc_handle_cmd(struct kcs_bmc *kcs_bmc)
 int kcs_bmc_handle_event(struct kcs_bmc *kcs_bmc)
 {
 	unsigned long flags;
-	int ret = 0;
+	int ret = -ENODATA;
 	u8 status;
 
 	spin_lock_irqsave(&kcs_bmc->lock, flags);
 
-	if (!kcs_bmc->running) {
-		kcs_force_abort(kcs_bmc);
-		ret = -ENODEV;
-		goto out_unlock;
-	}
-
-	status = read_status(kcs_bmc) & (KCS_STATUS_IBF | KCS_STATUS_CMD_DAT);
-
-	switch (status) {
-	case KCS_STATUS_IBF | KCS_STATUS_CMD_DAT:
-		kcs_bmc_handle_cmd(kcs_bmc);
-		break;
-
-	case KCS_STATUS_IBF:
-		kcs_bmc_handle_data(kcs_bmc);
-		break;
+	status = read_status(kcs_bmc);
+	if (status & KCS_STATUS_IBF) {
+		if (!kcs_bmc->running)
+			kcs_force_abort(kcs_bmc);
+		else if (status & KCS_STATUS_CMD_DAT)
+			kcs_bmc_handle_cmd(kcs_bmc);
+		else
+			kcs_bmc_handle_data(kcs_bmc);
 
-	default:
-		ret = -ENODATA;
-		break;
+		ret = 0;
 	}
 
-out_unlock:
 	spin_unlock_irqrestore(&kcs_bmc->lock, flags);
 
 	return ret;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH ] ipmi: kcs_bmc: fix IRQ exception if the channel is not open
  2018-06-22  2:56 [PATCH ] ipmi: kcs_bmc: fix IRQ exception if the channel is not open Haiyue Wang
@ 2018-06-22 12:43 ` Corey Minyard
  2018-06-22 17:23   ` Wang, Haiyue
  0 siblings, 1 reply; 4+ messages in thread
From: Corey Minyard @ 2018-06-22 12:43 UTC (permalink / raw)
  To: Haiyue Wang, arnd, gregkh, openipmi-developer, linux-kernel
  Cc: luis.a.silva, avi.fishman, openbmc

On 06/21/2018 09:56 PM, Haiyue Wang wrote:
> The original core KCS IRQ handling function 'kcs_bmc_handle_event' had
> a wrong logical desgin, it should force abort the KCS transaction only
> if the IBF flag is set, because multiple KCS channels share the same
> handle function; and it should return 0, which indicates that event of
> current channel is handled. An negative value -ENODATA will cause the
> IRQ handler of BMC KCS controller return IRQ_NONE, then IRQ moudle will
> treat this IRQ 'nobody cared', it will trigger IRQ exception.
>
>     irq 30: nobody cared (try booting with the "irqpoll" option)
>     CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.15-npcm750 #1
>     Hardware name: NPCMX50 Chip family
>     [<c010b264>] (unwind_backtrace) from [<c0106930>] (show_stack+0x20/0x24)
>     [<c0106930>] (show_stack) from [<c03dad38>] (dump_stack+0x8c/0xa0)
>     [<c03dad38>] (dump_stack) from [<c0168810>] (__report_bad_irq+0x3c/0xdc)
>     [<c0168810>] (__report_bad_irq) from [<c0168c34>] (note_interrupt+0x29c/0x2ec)
>     [<c0168c34>] (note_interrupt) from [<c0165c80>] (handle_irq_event_percpu+0x5c/0x68)
>     [<c0165c80>] (handle_irq_event_percpu) from [<c0165cd4>] (handle_irq_event+0x48/0x6c)
>     [<c0165cd4>] (handle_irq_event) from [<c0169664>] (handle_fasteoi_irq+0xc8/0x198)
>     [<c0169664>] (handle_fasteoi_irq) from [<c016529c>] (__handle_domain_irq+0x90/0xe8)
>     [<c016529c>] (__handle_domain_irq) from [<c01014bc>] (gic_handle_irq+0x58/0x9c)
>     [<c01014bc>] (gic_handle_irq) from [<c010752c>] (__irq_svc+0x6c/0x90)
>     Exception stack(0xc0a01de8 to 0xc0a01e30)
>     1de0:                   00002080 c0a6fbc0 00000000 00000000 00000000 c096d294
>     1e00: 00000000 00000001 dc406400 f03ff100 00000082 c0a01e94 c0a6fbc0 c0a01e38
>     1e20: 00200102 c01015bc 60000113 ffffffff
>     [<c010752c>] (__irq_svc) from [<c01015bc>] (__do_softirq+0xbc/0x358)
>     [<c01015bc>] (__do_softirq) from [<c011c798>] (irq_exit+0xb8/0xec)
>     [<c011c798>] (irq_exit) from [<c01652a0>] (__handle_domain_irq+0x94/0xe8)
>     [<c01652a0>] (__handle_domain_irq) from [<c01014bc>] (gic_handle_irq+0x58/0x9c)
>     [<c01014bc>] (gic_handle_irq) from [<c010752c>] (__irq_svc+0x6c/0x90)
>     Exception stack(0xc0a01ef8 to 0xc0a01f40)
>     1ee0:                                                       00000000 000003ae
>     1f00: dcc0f338 c0111060 c0a00000 c0a0cc44 c0a0cbe4 c0a1c22b c07bc218 00000001
>     1f20: dcffca40 c0a01f54 c0a01f58 c0a01f48 c0103524 c0103528 60000013 ffffffff
>     [<c010752c>] (__irq_svc) from [<c0103528>] (arch_cpu_idle+0x48/0x4c)
>     [<c0103528>] (arch_cpu_idle) from [<c0681390>] (default_idle_call+0x30/0x3c)
>     [<c0681390>] (default_idle_call) from [<c0156f24>] (do_idle+0xc8/0x134)
>     [<c0156f24>] (do_idle) from [<c015722c>] (cpu_startup_entry+0x28/0x2c)
>     [<c015722c>] (cpu_startup_entry) from [<c067ad74>] (rest_init+0x84/0x88)
>     [<c067ad74>] (rest_init) from [<c0900d44>] (start_kernel+0x388/0x394)
>     [<c0900d44>] (start_kernel) from [<0000807c>] (0x807c)
>     handlers:
>     [<c041c5dc>] npcm7xx_kcs_irq
>     Disabling IRQ #30
>
> Signed-off-by: Haiyue Wang <haiyue.wang@linux.intel.com>
> ---
> Hi Corey,
>
> This patch looks introducing BIG modification, it should be easily
> understandable, and makes code clean & fix an error design, which
> is introduced by misunderstanding the IRQ return value.

I'm having a little trouble understanding your text above, so let me try 
to repeat
back to you what I'm thinking you are saying...

You have two (or more) devices using the same interrupt, and at least 
one is an
IPMI KCS device.  And interrupt comes in to the other device when the 
IPMI KCS
device is not running.  The original code issues an abort when that happens,
which is obviously incorrect.  It then returns -ENODATA, .

When the interrupt comes in for the abort handling, the driver will then 
issue
another abort, and again returns -ENODATA.  This time neither driver handles
the interrupt, resulting in the logs.

In general, I think the change you have here is good.  You don't want to
issue an abort in this case.  But...

If I am right, this fix doesn't completely solve the problem.  It does 
solve this
immediate problem, but what if there is an OS on the other end of the
KCS interface that sets the IBF flag?  The same situation will occur.  
In fact
it will occur even if there is only one handler for the interrupt.

Maybe it's best to have the interrupt disabled unless the device is open.
You have to handle the interrupt disable race on a close, but with the
sync functions that shouldn't be too hard.

-corey

>
> BR,
> Haiyue
> ---
>   drivers/char/ipmi/kcs_bmc.c | 31 ++++++++++---------------------
>   1 file changed, 10 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/char/ipmi/kcs_bmc.c b/drivers/char/ipmi/kcs_bmc.c
> index fbfc05e..bb882ab1 100644
> --- a/drivers/char/ipmi/kcs_bmc.c
> +++ b/drivers/char/ipmi/kcs_bmc.c
> @@ -210,34 +210,23 @@ static void kcs_bmc_handle_cmd(struct kcs_bmc *kcs_bmc)
>   int kcs_bmc_handle_event(struct kcs_bmc *kcs_bmc)
>   {
>   	unsigned long flags;
> -	int ret = 0;
> +	int ret = -ENODATA;
>   	u8 status;
>   
>   	spin_lock_irqsave(&kcs_bmc->lock, flags);
>   
> -	if (!kcs_bmc->running) {
> -		kcs_force_abort(kcs_bmc);
> -		ret = -ENODEV;
> -		goto out_unlock;
> -	}
> -
> -	status = read_status(kcs_bmc) & (KCS_STATUS_IBF | KCS_STATUS_CMD_DAT);
> -
> -	switch (status) {
> -	case KCS_STATUS_IBF | KCS_STATUS_CMD_DAT:
> -		kcs_bmc_handle_cmd(kcs_bmc);
> -		break;
> -
> -	case KCS_STATUS_IBF:
> -		kcs_bmc_handle_data(kcs_bmc);
> -		break;
> +	status = read_status(kcs_bmc);
> +	if (status & KCS_STATUS_IBF) {
> +		if (!kcs_bmc->running)
> +			kcs_force_abort(kcs_bmc);
> +		else if (status & KCS_STATUS_CMD_DAT)
> +			kcs_bmc_handle_cmd(kcs_bmc);
> +		else
> +			kcs_bmc_handle_data(kcs_bmc);
>   
> -	default:
> -		ret = -ENODATA;
> -		break;
> +		ret = 0;
>   	}
>   
> -out_unlock:
>   	spin_unlock_irqrestore(&kcs_bmc->lock, flags);
>   
>   	return ret;



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH ] ipmi: kcs_bmc: fix IRQ exception if the channel is not open
  2018-06-22 12:43 ` Corey Minyard
@ 2018-06-22 17:23   ` Wang, Haiyue
  2018-06-22 17:43     ` Wang, Haiyue
  0 siblings, 1 reply; 4+ messages in thread
From: Wang, Haiyue @ 2018-06-22 17:23 UTC (permalink / raw)
  To: minyard, arnd, gregkh, openipmi-developer, linux-kernel
  Cc: luis.a.silva, avi.fishman, openbmc



On 2018-06-22 20:43, Corey Minyard wrote:
>> Signed-off-by: Haiyue Wang <haiyue.wang@linux.intel.com>
>> ---
>> Hi Corey,
>>
>> This patch looks introducing BIG modification, it should be easily
>> understandable, and makes code clean & fix an error design, which
>> is introduced by misunderstanding the IRQ return value.
>
> I'm having a little trouble understanding your text above, so let me 
> try to repeat
> back to you what I'm thinking you are saying...
>
Sorry for my bad writing.... :(
> You have two (or more) devices using the same interrupt, and at least 
> one is an
> IPMI KCS device.  And interrupt comes in to the other device when the 
> IPMI KCS
> device is not running.  The original code issues an abort when that 
> happens,
> which is obviously incorrect.  It then returns -ENODATA, .
>
> When the interrupt comes in for the abort handling, the driver will 
> then issue
> another abort, and again returns -ENODATA.  This time neither driver 
> handles
> the interrupt, resulting in the logs.
>
> In general, I think the change you have here is good.  You don't want to
> issue an abort in this case.  But...
>

> If I am right, this fix doesn't completely solve the problem.  It does 
> solve this
> immediate problem, but what if there is an OS on the other end of the
> KCS interface that sets the IBF flag?  The same situation will occur.  
> In fact
> it will occur even if there is only one handler for the interrupt.
>
> Maybe it's best to have the interrupt disabled unless the device is open.
> You have to handle the interrupt disable race on a close, but with the
> sync functions that shouldn't be too hard.
>
In fact, in BMC chip design, the LPC controller has many devices, such as
Port 80 snoop, BT, KCS etc, they shares the same interrupt. :)

Take AST2500 BMC as an example, please search the keyword 'interrupts = 
<8>' in
this file:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/aspeed-g5.dtsi?h=v4.18-rc1

And may have 4 KCS channels:
https://patchwork.kernel.org/patch/10318877/
This patch just enables kcs3 & kcs4, which use the same interrupt number 8.

So the interrupt should be enabled always.

And this IRQ issue root cause it that the IRQ handler just return IRQ_NONE
if it is not opened. And for abort actions, I just put it under its 
related channel
IBF is set. Because only IBF is set, aborting makes sense.

> -corey


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH ] ipmi: kcs_bmc: fix IRQ exception if the channel is not open
  2018-06-22 17:23   ` Wang, Haiyue
@ 2018-06-22 17:43     ` Wang, Haiyue
  0 siblings, 0 replies; 4+ messages in thread
From: Wang, Haiyue @ 2018-06-22 17:43 UTC (permalink / raw)
  To: minyard, arnd, gregkh, openipmi-developer, linux-kernel
  Cc: luis.a.silva, avi.fishman, openbmc



On 2018-06-23 01:23, Wang, Haiyue wrote:
>> Maybe it's best to have the interrupt disabled unless the device is 
>> open.
>> You have to handle the interrupt disable race on a close, but with the
>> sync functions that shouldn't be too hard.
>>
> In fact, in BMC chip design, the LPC controller has many devices, such as
> Port 80 snoop, BT, KCS etc, they shares the same interrupt. :) 
BTW, for AST2500, if the BMC and PCH run under eSPI (like LPC) mode, and 
if we
disable the KCS devices, then reboot the BMC, it will cause the system 
hang, and
the KCS channel can't be used anymore. So we need always enable these 
used KCS
devices (keep the KCS enable register setting) for working well under 
eSPI mode.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-06-22 17:43 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-22  2:56 [PATCH ] ipmi: kcs_bmc: fix IRQ exception if the channel is not open Haiyue Wang
2018-06-22 12:43 ` Corey Minyard
2018-06-22 17:23   ` Wang, Haiyue
2018-06-22 17:43     ` Wang, Haiyue

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.