All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] generic: Add the exception case checking routine for ppi interrupt
@ 2016-08-30  4:17 ` MaJun
  0 siblings, 0 replies; 23+ messages in thread
From: MaJun @ 2016-08-30  4:17 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, marc.zyngier, tglx, dingtianhong,
	guohanjun, majun258

From: Ma Jun <majun258@huawei.com>

During system booting, if the interrupt which has no action registered
is triggered, it would cause system panic when try to access the
action member.

Signed-off-by: Ma Jun <majun258@huawei.com>
---
 kernel/irq/chip.c |   20 ++++++++++++++++----
 1 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 8114d06..9a0e872 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -766,11 +766,23 @@ handle_percpu_irq(unsigned int irq, struct irq_desc *desc)
  */
 void handle_percpu_devid_irq(unsigned int irq, struct irq_desc *desc)
 {
-	struct irq_chip *chip = irq_desc_get_chip(desc);
-	struct irqaction *action = desc->action;
-	void *dev_id = raw_cpu_ptr(action->percpu_dev_id);
+	struct irq_chip *chip = NULL;
+	struct irqaction *action;
+	void *dev_id;
 	irqreturn_t res;
 
+	action = desc->action;
+
+	/* Unexpected interrupt in some execption case
+	 * we just send eoi to end this interrupt
+	 */
+	if (unlikely(!action)) {
+		mask_irq(desc);
+		goto out;
+	}
+	dev_id = raw_cpu_ptr(action->percpu_dev_id);
+
+	chip = irq_desc_get_chip(desc);
 	kstat_incr_irqs_this_cpu(irq, desc);
 
 	if (chip->irq_ack)
@@ -779,7 +791,7 @@ void handle_percpu_devid_irq(unsigned int irq, struct irq_desc *desc)
 	trace_irq_handler_entry(irq, action);
 	res = action->handler(irq, dev_id);
 	trace_irq_handler_exit(irq, action, res);
-
+out:
 	if (chip->irq_eoi)
 		chip->irq_eoi(&desc->irq_data);
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH] generic: Add the exception case checking routine for ppi interrupt
@ 2016-08-30  4:17 ` MaJun
  0 siblings, 0 replies; 23+ messages in thread
From: MaJun @ 2016-08-30  4:17 UTC (permalink / raw)
  To: linux-arm-kernel

From: Ma Jun <majun258@huawei.com>

During system booting, if the interrupt which has no action registered
is triggered, it would cause system panic when try to access the
action member.

Signed-off-by: Ma Jun <majun258@huawei.com>
---
 kernel/irq/chip.c |   20 ++++++++++++++++----
 1 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 8114d06..9a0e872 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -766,11 +766,23 @@ handle_percpu_irq(unsigned int irq, struct irq_desc *desc)
  */
 void handle_percpu_devid_irq(unsigned int irq, struct irq_desc *desc)
 {
-	struct irq_chip *chip = irq_desc_get_chip(desc);
-	struct irqaction *action = desc->action;
-	void *dev_id = raw_cpu_ptr(action->percpu_dev_id);
+	struct irq_chip *chip = NULL;
+	struct irqaction *action;
+	void *dev_id;
 	irqreturn_t res;
 
+	action = desc->action;
+
+	/* Unexpected interrupt in some execption case
+	 * we just send eoi to end this interrupt
+	 */
+	if (unlikely(!action)) {
+		mask_irq(desc);
+		goto out;
+	}
+	dev_id = raw_cpu_ptr(action->percpu_dev_id);
+
+	chip = irq_desc_get_chip(desc);
 	kstat_incr_irqs_this_cpu(irq, desc);
 
 	if (chip->irq_ack)
@@ -779,7 +791,7 @@ void handle_percpu_devid_irq(unsigned int irq, struct irq_desc *desc)
 	trace_irq_handler_entry(irq, action);
 	res = action->handler(irq, dev_id);
 	trace_irq_handler_exit(irq, action, res);
-
+out:
 	if (chip->irq_eoi)
 		chip->irq_eoi(&desc->irq_data);
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] generic: Add the exception case checking routine for ppi interrupt
  2016-08-30  4:17 ` MaJun
@ 2016-08-30  8:50   ` Marc Zyngier
  -1 siblings, 0 replies; 23+ messages in thread
From: Marc Zyngier @ 2016-08-30  8:50 UTC (permalink / raw)
  To: MaJun, linux-kernel, linux-arm-kernel, tglx, dingtianhong, guohanjun

On 30/08/16 05:17, MaJun wrote:
> From: Ma Jun <majun258@huawei.com>
> 
> During system booting, if the interrupt which has no action registered
> is triggered, it would cause system panic when try to access the
> action member.

And why would that interrupt be enabled? If you enable a PPI before
registering a handler, you're doing something wrong.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] generic: Add the exception case checking routine for ppi interrupt
@ 2016-08-30  8:50   ` Marc Zyngier
  0 siblings, 0 replies; 23+ messages in thread
From: Marc Zyngier @ 2016-08-30  8:50 UTC (permalink / raw)
  To: linux-arm-kernel

On 30/08/16 05:17, MaJun wrote:
> From: Ma Jun <majun258@huawei.com>
> 
> During system booting, if the interrupt which has no action registered
> is triggered, it would cause system panic when try to access the
> action member.

And why would that interrupt be enabled? If you enable a PPI before
registering a handler, you're doing something wrong.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] generic: Add the exception case checking routine for ppi interrupt
  2016-08-30  8:50   ` Marc Zyngier
@ 2016-08-30 10:35     ` majun (F)
  -1 siblings, 0 replies; 23+ messages in thread
From: majun (F) @ 2016-08-30 10:35 UTC (permalink / raw)
  To: Marc Zyngier, linux-kernel, linux-arm-kernel, tglx, dingtianhong,
	guohanjun
  Cc: majun258



在 2016/8/30 16:50, Marc Zyngier 写道:
> On 30/08/16 05:17, MaJun wrote:
>> From: Ma Jun <majun258@huawei.com>
>>
>> During system booting, if the interrupt which has no action registered
>> is triggered, it would cause system panic when try to access the
>> action member.
> 
> And why would that interrupt be enabled? If you enable a PPI before
> registering a handler, you're doing something wrong.
> 

Actually,the problem described above happened during the capture kernel booting.

In my system, sometimes there is a pending physical timer interrupt(30)
when the first kernel panic and the status is kept until the capture kernel booting.

So, this interrupt will be handled during capture kernel booting.

Becasue we use virt timer interrupt but not physical timer interrupt in capture kernel,
the interrupt 30 has no action handler.

Besides, I think we need to do exception check in this function just
like "handle_fasteoi_irq" does.

Thanks
MaJun

> Thanks,
> 
> 	M.
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] generic: Add the exception case checking routine for ppi interrupt
@ 2016-08-30 10:35     ` majun (F)
  0 siblings, 0 replies; 23+ messages in thread
From: majun (F) @ 2016-08-30 10:35 UTC (permalink / raw)
  To: linux-arm-kernel



? 2016/8/30 16:50, Marc Zyngier ??:
> On 30/08/16 05:17, MaJun wrote:
>> From: Ma Jun <majun258@huawei.com>
>>
>> During system booting, if the interrupt which has no action registered
>> is triggered, it would cause system panic when try to access the
>> action member.
> 
> And why would that interrupt be enabled? If you enable a PPI before
> registering a handler, you're doing something wrong.
> 

Actually,the problem described above happened during the capture kernel booting.

In my system, sometimes there is a pending physical timer interrupt(30)
when the first kernel panic and the status is kept until the capture kernel booting.

So, this interrupt will be handled during capture kernel booting.

Becasue we use virt timer interrupt but not physical timer interrupt in capture kernel,
the interrupt 30 has no action handler.

Besides, I think we need to do exception check in this function just
like "handle_fasteoi_irq" does.

Thanks
MaJun

> Thanks,
> 
> 	M.
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] generic: Add the exception case checking routine for ppi interrupt
  2016-08-30 10:35     ` majun (F)
@ 2016-08-30 11:07       ` Marc Zyngier
  -1 siblings, 0 replies; 23+ messages in thread
From: Marc Zyngier @ 2016-08-30 11:07 UTC (permalink / raw)
  To: majun (F), linux-kernel, linux-arm-kernel, tglx, dingtianhong, guohanjun
  Cc: Mark Rutland

+Mark

On 30/08/16 11:35, majun (F) wrote:
> 
> 
> 在 2016/8/30 16:50, Marc Zyngier 写道:
>> On 30/08/16 05:17, MaJun wrote:
>>> From: Ma Jun <majun258@huawei.com>
>>>
>>> During system booting, if the interrupt which has no action registered
>>> is triggered, it would cause system panic when try to access the
>>> action member.
>>
>> And why would that interrupt be enabled? If you enable a PPI before
>> registering a handler, you're doing something wrong.
>>
> 
> Actually,the problem described above happened during the capture
> kernel booting.
> 
> In my system, sometimes there is a pending physical timer
> interrupt(30) when the first kernel panic and the status is kept
> until the capture kernel booting.

And that's perfectly fine. The interrupt can be pending forever, as it
shouldn't get enabled.

> So, this interrupt will be handled during capture kernel booting.

Why? Who enables it?

> 
> Becasue we use virt timer interrupt but not physical timer interrupt
> in capture kernel, the interrupt 30 has no action handler.

Again: who enables this interrupt? Whichever driver enables it should be
fixed.

> Besides, I think we need to do exception check in this function just
> like "handle_fasteoi_irq" does.

I respectfully disagree. This will only hide a whole class of silly
bugs, and I'd rather squash them instead of papering over them.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] generic: Add the exception case checking routine for ppi interrupt
@ 2016-08-30 11:07       ` Marc Zyngier
  0 siblings, 0 replies; 23+ messages in thread
From: Marc Zyngier @ 2016-08-30 11:07 UTC (permalink / raw)
  To: linux-arm-kernel

+Mark

On 30/08/16 11:35, majun (F) wrote:
> 
> 
> ? 2016/8/30 16:50, Marc Zyngier ??:
>> On 30/08/16 05:17, MaJun wrote:
>>> From: Ma Jun <majun258@huawei.com>
>>>
>>> During system booting, if the interrupt which has no action registered
>>> is triggered, it would cause system panic when try to access the
>>> action member.
>>
>> And why would that interrupt be enabled? If you enable a PPI before
>> registering a handler, you're doing something wrong.
>>
> 
> Actually,the problem described above happened during the capture
> kernel booting.
> 
> In my system, sometimes there is a pending physical timer
> interrupt(30) when the first kernel panic and the status is kept
> until the capture kernel booting.

And that's perfectly fine. The interrupt can be pending forever, as it
shouldn't get enabled.

> So, this interrupt will be handled during capture kernel booting.

Why? Who enables it?

> 
> Becasue we use virt timer interrupt but not physical timer interrupt
> in capture kernel, the interrupt 30 has no action handler.

Again: who enables this interrupt? Whichever driver enables it should be
fixed.

> Besides, I think we need to do exception check in this function just
> like "handle_fasteoi_irq" does.

I respectfully disagree. This will only hide a whole class of silly
bugs, and I'd rather squash them instead of papering over them.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] generic: Add the exception case checking routine for ppi interrupt
  2016-08-30 11:07       ` Marc Zyngier
@ 2016-08-30 11:21         ` Mark Rutland
  -1 siblings, 0 replies; 23+ messages in thread
From: Mark Rutland @ 2016-08-30 11:21 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: majun (F), linux-kernel, linux-arm-kernel, tglx, dingtianhong, guohanjun

On Tue, Aug 30, 2016 at 12:07:36PM +0100, Marc Zyngier wrote:
> +Mark
> On 30/08/16 11:35, majun (F) wrote:
> > 在 2016/8/30 16:50, Marc Zyngier 写道:
> >> On 30/08/16 05:17, MaJun wrote:
> >>> From: Ma Jun <majun258@huawei.com>
> >>>
> >>> During system booting, if the interrupt which has no action registered
> >>> is triggered, it would cause system panic when try to access the
> >>> action member.
> >>
> >> And why would that interrupt be enabled? If you enable a PPI before
> >> registering a handler, you're doing something wrong.
> > 
> > Actually,the problem described above happened during the capture
> > kernel booting.
> > 
> > In my system, sometimes there is a pending physical timer
> > interrupt(30) when the first kernel panic and the status is kept
> > until the capture kernel booting.
> 
> And that's perfectly fine. The interrupt can be pending forever, as it
> shouldn't get enabled.
> 
> > So, this interrupt will be handled during capture kernel booting.
> 
> Why? Who enables it?
> 
> > Becasue we use virt timer interrupt but not physical timer interrupt
> > in capture kernel, the interrupt 30 has no action handler.
> 
> Again: who enables this interrupt? Whichever driver enables it should be
> fixed.

I'm also at a loss.

In this case, arch_timer_uses_ppi must be VIRT_PPI. So in
arch_timer_register(), we'll only request_percpu_irq the virt PPI.
arch_timer_has_nonsecure_ppi() will be false, given arch_timer_uses_ppi
is VIRT_PPI, so in arch_timer_starting_cpu() we'll only
enable_percpu_irq() the virt PPI.

We don't fiddle with arch_timer_uses_ppi after calling
arch_timer_register(). So I can't see how we could enable another IRQ in
this case.

Looking at the driver in virt/kvm/arm/arch_timer.c, we only enable what
we've succesfully requested, so it doesnt' seem like there's an issue
there.

>From a quick look at teh GIC driver, it looks like we reset PPIs
correctly, so it doesn't look like we have a "latent enable".

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] generic: Add the exception case checking routine for ppi interrupt
@ 2016-08-30 11:21         ` Mark Rutland
  0 siblings, 0 replies; 23+ messages in thread
From: Mark Rutland @ 2016-08-30 11:21 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Aug 30, 2016 at 12:07:36PM +0100, Marc Zyngier wrote:
> +Mark
> On 30/08/16 11:35, majun (F) wrote:
> > ? 2016/8/30 16:50, Marc Zyngier ??:
> >> On 30/08/16 05:17, MaJun wrote:
> >>> From: Ma Jun <majun258@huawei.com>
> >>>
> >>> During system booting, if the interrupt which has no action registered
> >>> is triggered, it would cause system panic when try to access the
> >>> action member.
> >>
> >> And why would that interrupt be enabled? If you enable a PPI before
> >> registering a handler, you're doing something wrong.
> > 
> > Actually,the problem described above happened during the capture
> > kernel booting.
> > 
> > In my system, sometimes there is a pending physical timer
> > interrupt(30) when the first kernel panic and the status is kept
> > until the capture kernel booting.
> 
> And that's perfectly fine. The interrupt can be pending forever, as it
> shouldn't get enabled.
> 
> > So, this interrupt will be handled during capture kernel booting.
> 
> Why? Who enables it?
> 
> > Becasue we use virt timer interrupt but not physical timer interrupt
> > in capture kernel, the interrupt 30 has no action handler.
> 
> Again: who enables this interrupt? Whichever driver enables it should be
> fixed.

I'm also at a loss.

In this case, arch_timer_uses_ppi must be VIRT_PPI. So in
arch_timer_register(), we'll only request_percpu_irq the virt PPI.
arch_timer_has_nonsecure_ppi() will be false, given arch_timer_uses_ppi
is VIRT_PPI, so in arch_timer_starting_cpu() we'll only
enable_percpu_irq() the virt PPI.

We don't fiddle with arch_timer_uses_ppi after calling
arch_timer_register(). So I can't see how we could enable another IRQ in
this case.

Looking at the driver in virt/kvm/arm/arch_timer.c, we only enable what
we've succesfully requested, so it doesnt' seem like there's an issue
there.

>From a quick look at teh GIC driver, it looks like we reset PPIs
correctly, so it doesn't look like we have a "latent enable".

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] generic: Add the exception case checking routine for ppi interrupt
  2016-08-30 11:21         ` Mark Rutland
@ 2016-08-31  6:35           ` majun (F)
  -1 siblings, 0 replies; 23+ messages in thread
From: majun (F) @ 2016-08-31  6:35 UTC (permalink / raw)
  To: Mark Rutland, Marc Zyngier
  Cc: majun258, linux-kernel, linux-arm-kernel, tglx, dingtianhong, guohanjun

Hi Marc & Mark:

在 2016/8/30 19:21, Mark Rutland 写道:
> On Tue, Aug 30, 2016 at 12:07:36PM +0100, Marc Zyngier wrote:
>> +Mark
>> On 30/08/16 11:35, majun (F) wrote:
>>> 在 2016/8/30 16:50, Marc Zyngier 写道:
>>>> On 30/08/16 05:17, MaJun wrote:
>>>>> From: Ma Jun <majun258@huawei.com>
>>>>>
>>>>> During system booting, if the interrupt which has no action registered
>>>>> is triggered, it would cause system panic when try to access the
>>>>> action member.
>>>>
>>>> And why would that interrupt be enabled? If you enable a PPI before
>>>> registering a handler, you're doing something wrong.
>>>
>>> Actually,the problem described above happened during the capture
>>> kernel booting.
>>>
>>> In my system, sometimes there is a pending physical timer
>>> interrupt(30) when the first kernel panic and the status is kept
>>> until the capture kernel booting.
>>
>> And that's perfectly fine. The interrupt can be pending forever, as it
>> shouldn't get enabled.
>>
>>> So, this interrupt will be handled during capture kernel booting.
>>
>> Why? Who enables it?
>>
>>> Becasue we use virt timer interrupt but not physical timer interrupt
>>> in capture kernel, the interrupt 30 has no action handler.
>>
>> Again: who enables this interrupt? Whichever driver enables it should be
>> fixed.
> 
> I'm also at a loss.
> 
> In this case, arch_timer_uses_ppi must be VIRT_PPI. So in
> arch_timer_register(), we'll only request_percpu_irq the virt PPI.
> arch_timer_has_nonsecure_ppi() will be false, given arch_timer_uses_ppi
> is VIRT_PPI, so in arch_timer_starting_cpu() we'll only
> enable_percpu_irq() the virt PPI.
> 
> We don't fiddle with arch_timer_uses_ppi after calling
> arch_timer_register(). So I can't see how we could enable another IRQ in
> this case.
> 
> Looking at the driver in virt/kvm/arm/arch_timer.c, we only enable what
> we've succesfully requested, so it doesnt' seem like there's an issue
> there.
> 
>>From a quick look at teh GIC driver, it looks like we reset PPIs
> correctly, so it doesn't look like we have a "latent enable".
> 

I just checked the status of irq 30 during capture kernel booting.

The irq 30 status is: mask, pending after arch_timer_starting_cpu() called.
Because irq 30 triggered only 1 time during capture kernel booting,
I think this problem maybe happened in the case like:
1:irq 30 triggered, but not acked by cpu yet.
2:local_irq_disable() called
3:system reboot -->capture kernel booting
4:local_irq_enable()
5:irq 30 acked by CPU.

Is this case possible?

Thanks
MaJun

> Thanks,
> Mark.
> 
> .
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] generic: Add the exception case checking routine for ppi interrupt
@ 2016-08-31  6:35           ` majun (F)
  0 siblings, 0 replies; 23+ messages in thread
From: majun (F) @ 2016-08-31  6:35 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Marc & Mark:

? 2016/8/30 19:21, Mark Rutland ??:
> On Tue, Aug 30, 2016 at 12:07:36PM +0100, Marc Zyngier wrote:
>> +Mark
>> On 30/08/16 11:35, majun (F) wrote:
>>> ? 2016/8/30 16:50, Marc Zyngier ??:
>>>> On 30/08/16 05:17, MaJun wrote:
>>>>> From: Ma Jun <majun258@huawei.com>
>>>>>
>>>>> During system booting, if the interrupt which has no action registered
>>>>> is triggered, it would cause system panic when try to access the
>>>>> action member.
>>>>
>>>> And why would that interrupt be enabled? If you enable a PPI before
>>>> registering a handler, you're doing something wrong.
>>>
>>> Actually,the problem described above happened during the capture
>>> kernel booting.
>>>
>>> In my system, sometimes there is a pending physical timer
>>> interrupt(30) when the first kernel panic and the status is kept
>>> until the capture kernel booting.
>>
>> And that's perfectly fine. The interrupt can be pending forever, as it
>> shouldn't get enabled.
>>
>>> So, this interrupt will be handled during capture kernel booting.
>>
>> Why? Who enables it?
>>
>>> Becasue we use virt timer interrupt but not physical timer interrupt
>>> in capture kernel, the interrupt 30 has no action handler.
>>
>> Again: who enables this interrupt? Whichever driver enables it should be
>> fixed.
> 
> I'm also at a loss.
> 
> In this case, arch_timer_uses_ppi must be VIRT_PPI. So in
> arch_timer_register(), we'll only request_percpu_irq the virt PPI.
> arch_timer_has_nonsecure_ppi() will be false, given arch_timer_uses_ppi
> is VIRT_PPI, so in arch_timer_starting_cpu() we'll only
> enable_percpu_irq() the virt PPI.
> 
> We don't fiddle with arch_timer_uses_ppi after calling
> arch_timer_register(). So I can't see how we could enable another IRQ in
> this case.
> 
> Looking at the driver in virt/kvm/arm/arch_timer.c, we only enable what
> we've succesfully requested, so it doesnt' seem like there's an issue
> there.
> 
>>From a quick look at teh GIC driver, it looks like we reset PPIs
> correctly, so it doesn't look like we have a "latent enable".
> 

I just checked the status of irq 30 during capture kernel booting.

The irq 30 status is: mask, pending after arch_timer_starting_cpu() called.
Because irq 30 triggered only 1 time during capture kernel booting,
I think this problem maybe happened in the case like:
1:irq 30 triggered, but not acked by cpu yet.
2:local_irq_disable() called
3:system reboot -->capture kernel booting
4:local_irq_enable()
5:irq 30 acked by CPU.

Is this case possible?

Thanks
MaJun

> Thanks,
> Mark.
> 
> .
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] generic: Add the exception case checking routine for ppi interrupt
  2016-08-31  6:35           ` majun (F)
@ 2016-08-31  8:35             ` Marc Zyngier
  -1 siblings, 0 replies; 23+ messages in thread
From: Marc Zyngier @ 2016-08-31  8:35 UTC (permalink / raw)
  To: majun (F), Mark Rutland
  Cc: linux-kernel, linux-arm-kernel, tglx, dingtianhong, guohanjun

On 31/08/16 07:35, majun (F) wrote:
> Hi Marc & Mark:
> 
> 在 2016/8/30 19:21, Mark Rutland 写道:
>> On Tue, Aug 30, 2016 at 12:07:36PM +0100, Marc Zyngier wrote:
>>> +Mark
>>> On 30/08/16 11:35, majun (F) wrote:
>>>> 在 2016/8/30 16:50, Marc Zyngier 写道:
>>>>> On 30/08/16 05:17, MaJun wrote:
>>>>>> From: Ma Jun <majun258@huawei.com>
>>>>>>
>>>>>> During system booting, if the interrupt which has no action registered
>>>>>> is triggered, it would cause system panic when try to access the
>>>>>> action member.
>>>>>
>>>>> And why would that interrupt be enabled? If you enable a PPI before
>>>>> registering a handler, you're doing something wrong.
>>>>
>>>> Actually,the problem described above happened during the capture
>>>> kernel booting.
>>>>
>>>> In my system, sometimes there is a pending physical timer
>>>> interrupt(30) when the first kernel panic and the status is kept
>>>> until the capture kernel booting.
>>>
>>> And that's perfectly fine. The interrupt can be pending forever, as it
>>> shouldn't get enabled.
>>>
>>>> So, this interrupt will be handled during capture kernel booting.
>>>
>>> Why? Who enables it?
>>>
>>>> Becasue we use virt timer interrupt but not physical timer interrupt
>>>> in capture kernel, the interrupt 30 has no action handler.
>>>
>>> Again: who enables this interrupt? Whichever driver enables it should be
>>> fixed.
>>
>> I'm also at a loss.
>>
>> In this case, arch_timer_uses_ppi must be VIRT_PPI. So in
>> arch_timer_register(), we'll only request_percpu_irq the virt PPI.
>> arch_timer_has_nonsecure_ppi() will be false, given arch_timer_uses_ppi
>> is VIRT_PPI, so in arch_timer_starting_cpu() we'll only
>> enable_percpu_irq() the virt PPI.
>>
>> We don't fiddle with arch_timer_uses_ppi after calling
>> arch_timer_register(). So I can't see how we could enable another IRQ in
>> this case.
>>
>> Looking at the driver in virt/kvm/arm/arch_timer.c, we only enable what
>> we've succesfully requested, so it doesnt' seem like there's an issue
>> there.
>>
>> >From a quick look at teh GIC driver, it looks like we reset PPIs
>> correctly, so it doesn't look like we have a "latent enable".
>>
> 
> I just checked the status of irq 30 during capture kernel booting.
> 
> The irq 30 status is: mask, pending after arch_timer_starting_cpu() called.
> Because irq 30 triggered only 1 time during capture kernel booting,
> I think this problem maybe happened in the case like:
> 1:irq 30 triggered, but not acked by cpu yet.
> 2:local_irq_disable() called
> 3:system reboot -->capture kernel booting
> 4:local_irq_enable()
> 5:irq 30 acked by CPU.
> 
> Is this case possible?

I can't see how, because you've missed:

3b: All PPIs are disabled as each CPU comes up

So for (5) to occur, I can only see two possibilities:
(a) either something else is enabling the timer PPI
(b) your GIC doesn't correctly retire a pending PPI that is being disabled

I'm discounting (b) because I can't see how the system would work
otherwise, so (a) must be happening somehow.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] generic: Add the exception case checking routine for ppi interrupt
@ 2016-08-31  8:35             ` Marc Zyngier
  0 siblings, 0 replies; 23+ messages in thread
From: Marc Zyngier @ 2016-08-31  8:35 UTC (permalink / raw)
  To: linux-arm-kernel

On 31/08/16 07:35, majun (F) wrote:
> Hi Marc & Mark:
> 
> ? 2016/8/30 19:21, Mark Rutland ??:
>> On Tue, Aug 30, 2016 at 12:07:36PM +0100, Marc Zyngier wrote:
>>> +Mark
>>> On 30/08/16 11:35, majun (F) wrote:
>>>> ? 2016/8/30 16:50, Marc Zyngier ??:
>>>>> On 30/08/16 05:17, MaJun wrote:
>>>>>> From: Ma Jun <majun258@huawei.com>
>>>>>>
>>>>>> During system booting, if the interrupt which has no action registered
>>>>>> is triggered, it would cause system panic when try to access the
>>>>>> action member.
>>>>>
>>>>> And why would that interrupt be enabled? If you enable a PPI before
>>>>> registering a handler, you're doing something wrong.
>>>>
>>>> Actually,the problem described above happened during the capture
>>>> kernel booting.
>>>>
>>>> In my system, sometimes there is a pending physical timer
>>>> interrupt(30) when the first kernel panic and the status is kept
>>>> until the capture kernel booting.
>>>
>>> And that's perfectly fine. The interrupt can be pending forever, as it
>>> shouldn't get enabled.
>>>
>>>> So, this interrupt will be handled during capture kernel booting.
>>>
>>> Why? Who enables it?
>>>
>>>> Becasue we use virt timer interrupt but not physical timer interrupt
>>>> in capture kernel, the interrupt 30 has no action handler.
>>>
>>> Again: who enables this interrupt? Whichever driver enables it should be
>>> fixed.
>>
>> I'm also at a loss.
>>
>> In this case, arch_timer_uses_ppi must be VIRT_PPI. So in
>> arch_timer_register(), we'll only request_percpu_irq the virt PPI.
>> arch_timer_has_nonsecure_ppi() will be false, given arch_timer_uses_ppi
>> is VIRT_PPI, so in arch_timer_starting_cpu() we'll only
>> enable_percpu_irq() the virt PPI.
>>
>> We don't fiddle with arch_timer_uses_ppi after calling
>> arch_timer_register(). So I can't see how we could enable another IRQ in
>> this case.
>>
>> Looking at the driver in virt/kvm/arm/arch_timer.c, we only enable what
>> we've succesfully requested, so it doesnt' seem like there's an issue
>> there.
>>
>> >From a quick look at teh GIC driver, it looks like we reset PPIs
>> correctly, so it doesn't look like we have a "latent enable".
>>
> 
> I just checked the status of irq 30 during capture kernel booting.
> 
> The irq 30 status is: mask, pending after arch_timer_starting_cpu() called.
> Because irq 30 triggered only 1 time during capture kernel booting,
> I think this problem maybe happened in the case like:
> 1:irq 30 triggered, but not acked by cpu yet.
> 2:local_irq_disable() called
> 3:system reboot -->capture kernel booting
> 4:local_irq_enable()
> 5:irq 30 acked by CPU.
> 
> Is this case possible?

I can't see how, because you've missed:

3b: All PPIs are disabled as each CPU comes up

So for (5) to occur, I can only see two possibilities:
(a) either something else is enabling the timer PPI
(b) your GIC doesn't correctly retire a pending PPI that is being disabled

I'm discounting (b) because I can't see how the system would work
otherwise, so (a) must be happening somehow.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] generic: Add the exception case checking routine for ppi interrupt
  2016-08-31  8:35             ` Marc Zyngier
@ 2016-09-01  8:15               ` majun (F)
  -1 siblings, 0 replies; 23+ messages in thread
From: majun (F) @ 2016-09-01  8:15 UTC (permalink / raw)
  To: Marc Zyngier, Mark Rutland
  Cc: majun258, linux-kernel, linux-arm-kernel, tglx, dingtianhong, guohanjun



在 2016/8/31 16:35, Marc Zyngier 写道:
> On 31/08/16 07:35, majun (F) wrote:
[...]
>>>
>>
>> I just checked the status of irq 30 during capture kernel booting.
>>
>> The irq 30 status is: mask, pending after arch_timer_starting_cpu() called.
>> Because irq 30 triggered only 1 time during capture kernel booting,
>> I think this problem maybe happened in the case like:
>> 1:irq 30 triggered, but not acked by cpu yet.
>> 2:local_irq_disable() called
>> 3:system reboot -->capture kernel booting
>> 4:local_irq_enable()
>> 5:irq 30 acked by CPU.
>>
>> Is this case possible?
> 
> I can't see how, because you've missed:
> 
> 3b: All PPIs are disabled as each CPU comes up
> 
> So for (5) to occur, I can only see two possibilities:
> (a) either something else is enabling the timer PPI

I checked the whole process, the irq 30 alway keeping disabled.

> (b) your GIC doesn't correctly retire a pending PPI that is being disabled

According to our hardware guy said, GIC in our system has problem in this case.
Usually, when we mask irq 30, the interrupt which in pending status but not acked by cpu
should be released/cleared by hardware, but actually, we did't do like this in our system.

So, this conclusion just same as you assumption.

Do you have any suggestion or workaround for this problem?

Thanks!
Majun

> 
> I'm discounting (b) because I can't see how the system would work
> otherwise, so (a) must be happening somehow.
> 


> Thanks,
> 
> 	M.
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] generic: Add the exception case checking routine for ppi interrupt
@ 2016-09-01  8:15               ` majun (F)
  0 siblings, 0 replies; 23+ messages in thread
From: majun (F) @ 2016-09-01  8:15 UTC (permalink / raw)
  To: linux-arm-kernel



? 2016/8/31 16:35, Marc Zyngier ??:
> On 31/08/16 07:35, majun (F) wrote:
[...]
>>>
>>
>> I just checked the status of irq 30 during capture kernel booting.
>>
>> The irq 30 status is: mask, pending after arch_timer_starting_cpu() called.
>> Because irq 30 triggered only 1 time during capture kernel booting,
>> I think this problem maybe happened in the case like:
>> 1:irq 30 triggered, but not acked by cpu yet.
>> 2:local_irq_disable() called
>> 3:system reboot -->capture kernel booting
>> 4:local_irq_enable()
>> 5:irq 30 acked by CPU.
>>
>> Is this case possible?
> 
> I can't see how, because you've missed:
> 
> 3b: All PPIs are disabled as each CPU comes up
> 
> So for (5) to occur, I can only see two possibilities:
> (a) either something else is enabling the timer PPI

I checked the whole process, the irq 30 alway keeping disabled.

> (b) your GIC doesn't correctly retire a pending PPI that is being disabled

According to our hardware guy said, GIC in our system has problem in this case.
Usually, when we mask irq 30, the interrupt which in pending status but not acked by cpu
should be released/cleared by hardware, but actually, we did't do like this in our system.

So, this conclusion just same as you assumption.

Do you have any suggestion or workaround for this problem?

Thanks!
Majun

> 
> I'm discounting (b) because I can't see how the system would work
> otherwise, so (a) must be happening somehow.
> 


> Thanks,
> 
> 	M.
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] generic: Add the exception case checking routine for ppi interrupt
  2016-09-01  8:15               ` majun (F)
@ 2016-09-01  9:03                 ` Marc Zyngier
  -1 siblings, 0 replies; 23+ messages in thread
From: Marc Zyngier @ 2016-09-01  9:03 UTC (permalink / raw)
  To: majun (F), Mark Rutland
  Cc: linux-kernel, linux-arm-kernel, tglx, dingtianhong, guohanjun

On 01/09/16 09:15, majun (F) wrote:
> 
> 
> 在 2016/8/31 16:35, Marc Zyngier 写道:
>> On 31/08/16 07:35, majun (F) wrote:
> [...]
>>>>
>>>
>>> I just checked the status of irq 30 during capture kernel booting.
>>>
>>> The irq 30 status is: mask, pending after arch_timer_starting_cpu() called.
>>> Because irq 30 triggered only 1 time during capture kernel booting,
>>> I think this problem maybe happened in the case like:
>>> 1:irq 30 triggered, but not acked by cpu yet.
>>> 2:local_irq_disable() called
>>> 3:system reboot -->capture kernel booting
>>> 4:local_irq_enable()
>>> 5:irq 30 acked by CPU.
>>>
>>> Is this case possible?
>>
>> I can't see how, because you've missed:
>>
>> 3b: All PPIs are disabled as each CPU comes up
>>
>> So for (5) to occur, I can only see two possibilities:
>> (a) either something else is enabling the timer PPI
> 
> I checked the whole process, the irq 30 alway keeping disabled.
> 
>> (b) your GIC doesn't correctly retire a pending PPI that is being disabled
> 
> According to our hardware guy said, GIC in our system has problem in this case.
> Usually, when we mask irq 30, the interrupt which in pending status but not acked by cpu
> should be released/cleared by hardware, but actually, we did't do like this in our system.

That's crazy. This means that you cannot reliably mask interrupts. :-(
Does this only affect PPIs? Or does it affect all interrupt types?

> So, this conclusion just same as you assumption.
> 
> Do you have any suggestion or workaround for this problem?

Well, this issue goes way beyond the hack you wanted to add to the
generic code, and it should probably be addressed in the GIC code
itself, as an implementation specific workaround. Without knowing the
details of the erratum, it is difficult to think of that would be
required. I can come up with something like this:

	irqnr = gic_read_iar();
	if (unlikely(!is_enabled(irqnr))) {
		gic_write_eoir(irqnr);
		if (static_key_true(&supports_deactivate))
			gic_write_dir(irqnr);
		set_pending(irqnr);
		continue;
	}

Performance will suffer (an extra MMIO access on the fast path). If LPIs
are also affected, then the ITS code also needs to be involved, and
that's not going to be pretty either. This code will have to be enabled
at runtime, and handled like other erratum we have in this code.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] generic: Add the exception case checking routine for ppi interrupt
@ 2016-09-01  9:03                 ` Marc Zyngier
  0 siblings, 0 replies; 23+ messages in thread
From: Marc Zyngier @ 2016-09-01  9:03 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/09/16 09:15, majun (F) wrote:
> 
> 
> ? 2016/8/31 16:35, Marc Zyngier ??:
>> On 31/08/16 07:35, majun (F) wrote:
> [...]
>>>>
>>>
>>> I just checked the status of irq 30 during capture kernel booting.
>>>
>>> The irq 30 status is: mask, pending after arch_timer_starting_cpu() called.
>>> Because irq 30 triggered only 1 time during capture kernel booting,
>>> I think this problem maybe happened in the case like:
>>> 1:irq 30 triggered, but not acked by cpu yet.
>>> 2:local_irq_disable() called
>>> 3:system reboot -->capture kernel booting
>>> 4:local_irq_enable()
>>> 5:irq 30 acked by CPU.
>>>
>>> Is this case possible?
>>
>> I can't see how, because you've missed:
>>
>> 3b: All PPIs are disabled as each CPU comes up
>>
>> So for (5) to occur, I can only see two possibilities:
>> (a) either something else is enabling the timer PPI
> 
> I checked the whole process, the irq 30 alway keeping disabled.
> 
>> (b) your GIC doesn't correctly retire a pending PPI that is being disabled
> 
> According to our hardware guy said, GIC in our system has problem in this case.
> Usually, when we mask irq 30, the interrupt which in pending status but not acked by cpu
> should be released/cleared by hardware, but actually, we did't do like this in our system.

That's crazy. This means that you cannot reliably mask interrupts. :-(
Does this only affect PPIs? Or does it affect all interrupt types?

> So, this conclusion just same as you assumption.
> 
> Do you have any suggestion or workaround for this problem?

Well, this issue goes way beyond the hack you wanted to add to the
generic code, and it should probably be addressed in the GIC code
itself, as an implementation specific workaround. Without knowing the
details of the erratum, it is difficult to think of that would be
required. I can come up with something like this:

	irqnr = gic_read_iar();
	if (unlikely(!is_enabled(irqnr))) {
		gic_write_eoir(irqnr);
		if (static_key_true(&supports_deactivate))
			gic_write_dir(irqnr);
		set_pending(irqnr);
		continue;
	}

Performance will suffer (an extra MMIO access on the fast path). If LPIs
are also affected, then the ITS code also needs to be involved, and
that's not going to be pretty either. This code will have to be enabled
at runtime, and handled like other erratum we have in this code.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] generic: Add the exception case checking routine for ppi interrupt
  2016-09-01  9:03                 ` Marc Zyngier
@ 2016-09-02 13:08                   ` Thomas Gleixner
  -1 siblings, 0 replies; 23+ messages in thread
From: Thomas Gleixner @ 2016-09-02 13:08 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: majun (F),
	Mark Rutland, linux-kernel, linux-arm-kernel, dingtianhong,
	guohanjun

On Thu, 1 Sep 2016, Marc Zyngier wrote:
> On 01/09/16 09:15, majun (F) wrote:
> Well, this issue goes way beyond the hack you wanted to add to the
> generic code, and it should probably be addressed in the GIC code
> itself, as an implementation specific workaround. Without knowing the
> details of the erratum, it is difficult to think of that would be
> required. I can come up with something like this:
> 
> 	irqnr = gic_read_iar();
> 	if (unlikely(!is_enabled(irqnr))) {
> 		gic_write_eoir(irqnr);
> 		if (static_key_true(&supports_deactivate))
> 			gic_write_dir(irqnr);
> 		set_pending(irqnr);
> 		continue;
> 	}
> 
> Performance will suffer (an extra MMIO access on the fast path). If LPIs
> are also affected, then the ITS code also needs to be involved, and
> that's not going to be pretty either. This code will have to be enabled
> at runtime, and handled like other erratum we have in this code.

So that's certainly a required workaround at the gic level. Though I really
think that we should make handle_percpu_devid_irq robust against a spurious
interrupt.

>  void handle_percpu_devid_irq(unsigned int irq, struct irq_desc *desc)
>  {
> -	struct irq_chip *chip = irq_desc_get_chip(desc);
> -	struct irqaction *action = desc->action;
> -	void *dev_id = raw_cpu_ptr(action->percpu_dev_id);
> +	struct irq_chip *chip = NULL;
> +	struct irqaction *action;
> +	void *dev_id;
>  	irqreturn_t res;
>  
> +	action = desc->action;
> +
> +	/* Unexpected interrupt in some execption case
> +	 * we just send eoi to end this interrupt
> +	 */
> +	if (unlikely(!action)) {
> +		mask_irq(desc);

This is wrong. mask_irq() does not work for percpu interrupts. Aside of that
this completely lacks any debug information which tells us that there is
something wrong in the system. I'm going to apply the patch below for
robustness sake.

Thanks,

	tglx

8<----------------------
Subject: genirq: Robustify handle_percpu_devid_irq()
From: Thomas Gleixner <tglx@linutronix.de>
Date: Fri, 02 Sep 2016 14:45:19 +0200

The percpu_devid handler is not robust against spurious interrupts. If a
spurious interrupt happens and no action is installed then the handler crashes
with a NULL pointer dereference.

Add a sanity check for this and log the wreckage once in dmesg.

Reported-by: Majun <majun258@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/irq/chip.c |   18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -756,7 +756,6 @@ void handle_percpu_devid_irq(struct irq_
 {
 	struct irq_chip *chip = irq_desc_get_chip(desc);
 	struct irqaction *action = desc->action;
-	void *dev_id = raw_cpu_ptr(action->percpu_dev_id);
 	unsigned int irq = irq_desc_get_irq(desc);
 	irqreturn_t res;
 
@@ -765,9 +764,20 @@ void handle_percpu_devid_irq(struct irq_
 	if (chip->irq_ack)
 		chip->irq_ack(&desc->irq_data);
 
-	trace_irq_handler_entry(irq, action);
-	res = action->handler(irq, dev_id);
-	trace_irq_handler_exit(irq, action, res);
+	if (likely(action)) {
+		trace_irq_handler_entry(irq, action);
+		res = action->handler(irq, raw_cpu_ptr(action->percpu_dev_id));
+		trace_irq_handler_exit(irq, action, res);
+	} else {
+		unsigned int cpu = smp_processor_id();
+		bool enabled = cpumask_test_cpu(cpu, desc->percpu_enabled);
+
+		if (enabled)
+			irq_percpu_disable(desc, cpu);
+
+		pr_err_once("Spurious%s percpu IRQ%u on CPU%u\n",
+			    enabled ? " and unmasked" : "", irq, cpu);
+	}
 
 	if (chip->irq_eoi)
 		chip->irq_eoi(&desc->irq_data);

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] generic: Add the exception case checking routine for ppi interrupt
@ 2016-09-02 13:08                   ` Thomas Gleixner
  0 siblings, 0 replies; 23+ messages in thread
From: Thomas Gleixner @ 2016-09-02 13:08 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 1 Sep 2016, Marc Zyngier wrote:
> On 01/09/16 09:15, majun (F) wrote:
> Well, this issue goes way beyond the hack you wanted to add to the
> generic code, and it should probably be addressed in the GIC code
> itself, as an implementation specific workaround. Without knowing the
> details of the erratum, it is difficult to think of that would be
> required. I can come up with something like this:
> 
> 	irqnr = gic_read_iar();
> 	if (unlikely(!is_enabled(irqnr))) {
> 		gic_write_eoir(irqnr);
> 		if (static_key_true(&supports_deactivate))
> 			gic_write_dir(irqnr);
> 		set_pending(irqnr);
> 		continue;
> 	}
> 
> Performance will suffer (an extra MMIO access on the fast path). If LPIs
> are also affected, then the ITS code also needs to be involved, and
> that's not going to be pretty either. This code will have to be enabled
> at runtime, and handled like other erratum we have in this code.

So that's certainly a required workaround at the gic level. Though I really
think that we should make handle_percpu_devid_irq robust against a spurious
interrupt.

>  void handle_percpu_devid_irq(unsigned int irq, struct irq_desc *desc)
>  {
> -	struct irq_chip *chip = irq_desc_get_chip(desc);
> -	struct irqaction *action = desc->action;
> -	void *dev_id = raw_cpu_ptr(action->percpu_dev_id);
> +	struct irq_chip *chip = NULL;
> +	struct irqaction *action;
> +	void *dev_id;
>  	irqreturn_t res;
>  
> +	action = desc->action;
> +
> +	/* Unexpected interrupt in some execption case
> +	 * we just send eoi to end this interrupt
> +	 */
> +	if (unlikely(!action)) {
> +		mask_irq(desc);

This is wrong. mask_irq() does not work for percpu interrupts. Aside of that
this completely lacks any debug information which tells us that there is
something wrong in the system. I'm going to apply the patch below for
robustness sake.

Thanks,

	tglx

8<----------------------
Subject: genirq: Robustify handle_percpu_devid_irq()
From: Thomas Gleixner <tglx@linutronix.de>
Date: Fri, 02 Sep 2016 14:45:19 +0200

The percpu_devid handler is not robust against spurious interrupts. If a
spurious interrupt happens and no action is installed then the handler crashes
with a NULL pointer dereference.

Add a sanity check for this and log the wreckage once in dmesg.

Reported-by: Majun <majun258@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/irq/chip.c |   18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -756,7 +756,6 @@ void handle_percpu_devid_irq(struct irq_
 {
 	struct irq_chip *chip = irq_desc_get_chip(desc);
 	struct irqaction *action = desc->action;
-	void *dev_id = raw_cpu_ptr(action->percpu_dev_id);
 	unsigned int irq = irq_desc_get_irq(desc);
 	irqreturn_t res;
 
@@ -765,9 +764,20 @@ void handle_percpu_devid_irq(struct irq_
 	if (chip->irq_ack)
 		chip->irq_ack(&desc->irq_data);
 
-	trace_irq_handler_entry(irq, action);
-	res = action->handler(irq, dev_id);
-	trace_irq_handler_exit(irq, action, res);
+	if (likely(action)) {
+		trace_irq_handler_entry(irq, action);
+		res = action->handler(irq, raw_cpu_ptr(action->percpu_dev_id));
+		trace_irq_handler_exit(irq, action, res);
+	} else {
+		unsigned int cpu = smp_processor_id();
+		bool enabled = cpumask_test_cpu(cpu, desc->percpu_enabled);
+
+		if (enabled)
+			irq_percpu_disable(desc, cpu);
+
+		pr_err_once("Spurious%s percpu IRQ%u on CPU%u\n",
+			    enabled ? " and unmasked" : "", irq, cpu);
+	}
 
 	if (chip->irq_eoi)
 		chip->irq_eoi(&desc->irq_data);

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] generic: Add the exception case checking routine for ppi interrupt
  2016-09-02 13:08                   ` Thomas Gleixner
@ 2016-09-02 15:49                     ` Marc Zyngier
  -1 siblings, 0 replies; 23+ messages in thread
From: Marc Zyngier @ 2016-09-02 15:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: majun (F),
	Mark Rutland, linux-kernel, linux-arm-kernel, dingtianhong,
	guohanjun

On 02/09/16 14:08, Thomas Gleixner wrote:
> On Thu, 1 Sep 2016, Marc Zyngier wrote:
>> On 01/09/16 09:15, majun (F) wrote:
>> Well, this issue goes way beyond the hack you wanted to add to the
>> generic code, and it should probably be addressed in the GIC code
>> itself, as an implementation specific workaround. Without knowing the
>> details of the erratum, it is difficult to think of that would be
>> required. I can come up with something like this:
>>
>> 	irqnr = gic_read_iar();
>> 	if (unlikely(!is_enabled(irqnr))) {
>> 		gic_write_eoir(irqnr);
>> 		if (static_key_true(&supports_deactivate))
>> 			gic_write_dir(irqnr);
>> 		set_pending(irqnr);
>> 		continue;
>> 	}
>>
>> Performance will suffer (an extra MMIO access on the fast path). If LPIs
>> are also affected, then the ITS code also needs to be involved, and
>> that's not going to be pretty either. This code will have to be enabled
>> at runtime, and handled like other erratum we have in this code.
> 
> So that's certainly a required workaround at the gic level. Though I really
> think that we should make handle_percpu_devid_irq robust against a spurious
> interrupt.
> 
>>  void handle_percpu_devid_irq(unsigned int irq, struct irq_desc *desc)
>>  {
>> -	struct irq_chip *chip = irq_desc_get_chip(desc);
>> -	struct irqaction *action = desc->action;
>> -	void *dev_id = raw_cpu_ptr(action->percpu_dev_id);
>> +	struct irq_chip *chip = NULL;
>> +	struct irqaction *action;
>> +	void *dev_id;
>>  	irqreturn_t res;
>>  
>> +	action = desc->action;
>> +
>> +	/* Unexpected interrupt in some execption case
>> +	 * we just send eoi to end this interrupt
>> +	 */
>> +	if (unlikely(!action)) {
>> +		mask_irq(desc);
> 
> This is wrong. mask_irq() does not work for percpu interrupts. Aside of that
> this completely lacks any debug information which tells us that there is
> something wrong in the system. I'm going to apply the patch below for
> robustness sake.
> 
> Thanks,
> 
> 	tglx
> 
> 8<----------------------
> Subject: genirq: Robustify handle_percpu_devid_irq()
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Fri, 02 Sep 2016 14:45:19 +0200
> 
> The percpu_devid handler is not robust against spurious interrupts. If a
> spurious interrupt happens and no action is installed then the handler crashes
> with a NULL pointer dereference.
> 
> Add a sanity check for this and log the wreckage once in dmesg.
> 
> Reported-by: Majun <majun258@huawei.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Looks fine to me.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] generic: Add the exception case checking routine for ppi interrupt
@ 2016-09-02 15:49                     ` Marc Zyngier
  0 siblings, 0 replies; 23+ messages in thread
From: Marc Zyngier @ 2016-09-02 15:49 UTC (permalink / raw)
  To: linux-arm-kernel

On 02/09/16 14:08, Thomas Gleixner wrote:
> On Thu, 1 Sep 2016, Marc Zyngier wrote:
>> On 01/09/16 09:15, majun (F) wrote:
>> Well, this issue goes way beyond the hack you wanted to add to the
>> generic code, and it should probably be addressed in the GIC code
>> itself, as an implementation specific workaround. Without knowing the
>> details of the erratum, it is difficult to think of that would be
>> required. I can come up with something like this:
>>
>> 	irqnr = gic_read_iar();
>> 	if (unlikely(!is_enabled(irqnr))) {
>> 		gic_write_eoir(irqnr);
>> 		if (static_key_true(&supports_deactivate))
>> 			gic_write_dir(irqnr);
>> 		set_pending(irqnr);
>> 		continue;
>> 	}
>>
>> Performance will suffer (an extra MMIO access on the fast path). If LPIs
>> are also affected, then the ITS code also needs to be involved, and
>> that's not going to be pretty either. This code will have to be enabled
>> at runtime, and handled like other erratum we have in this code.
> 
> So that's certainly a required workaround at the gic level. Though I really
> think that we should make handle_percpu_devid_irq robust against a spurious
> interrupt.
> 
>>  void handle_percpu_devid_irq(unsigned int irq, struct irq_desc *desc)
>>  {
>> -	struct irq_chip *chip = irq_desc_get_chip(desc);
>> -	struct irqaction *action = desc->action;
>> -	void *dev_id = raw_cpu_ptr(action->percpu_dev_id);
>> +	struct irq_chip *chip = NULL;
>> +	struct irqaction *action;
>> +	void *dev_id;
>>  	irqreturn_t res;
>>  
>> +	action = desc->action;
>> +
>> +	/* Unexpected interrupt in some execption case
>> +	 * we just send eoi to end this interrupt
>> +	 */
>> +	if (unlikely(!action)) {
>> +		mask_irq(desc);
> 
> This is wrong. mask_irq() does not work for percpu interrupts. Aside of that
> this completely lacks any debug information which tells us that there is
> something wrong in the system. I'm going to apply the patch below for
> robustness sake.
> 
> Thanks,
> 
> 	tglx
> 
> 8<----------------------
> Subject: genirq: Robustify handle_percpu_devid_irq()
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Fri, 02 Sep 2016 14:45:19 +0200
> 
> The percpu_devid handler is not robust against spurious interrupts. If a
> spurious interrupt happens and no action is installed then the handler crashes
> with a NULL pointer dereference.
> 
> Add a sanity check for this and log the wreckage once in dmesg.
> 
> Reported-by: Majun <majun258@huawei.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Looks fine to me.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [tip:irq/core] genirq: Robustify handle_percpu_devid_irq()
  2016-09-02 13:08                   ` Thomas Gleixner
  (?)
  (?)
@ 2016-09-02 16:13                   ` tip-bot for Thomas Gleixner
  -1 siblings, 0 replies; 23+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-09-02 16:13 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, mark.rutland, marc.zyngier, mingo, tglx, linux-kernel, majun258

Commit-ID:  fc590c22f9f056ab50190b797f6cacead29f9b75
Gitweb:     http://git.kernel.org/tip/fc590c22f9f056ab50190b797f6cacead29f9b75
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 2 Sep 2016 14:45:19 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 2 Sep 2016 18:06:49 +0200

genirq: Robustify handle_percpu_devid_irq()

The percpu_devid handler is not robust against spurious interrupts. If a
spurious interrupt happens and no action is installed then the handler
crashes with a NULL pointer dereference.

Add a sanity check for this and log the wreckage once in dmesg.

Reported-by: Majun <majun258@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: guohanjun@huawei.com
Cc: dingtianhong@huawei.com
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1609021436160.5647@nanos

---
 kernel/irq/chip.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index b4c1bc7..93c373a 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -756,7 +756,6 @@ void handle_percpu_devid_irq(struct irq_desc *desc)
 {
 	struct irq_chip *chip = irq_desc_get_chip(desc);
 	struct irqaction *action = desc->action;
-	void *dev_id = raw_cpu_ptr(action->percpu_dev_id);
 	unsigned int irq = irq_desc_get_irq(desc);
 	irqreturn_t res;
 
@@ -765,9 +764,20 @@ void handle_percpu_devid_irq(struct irq_desc *desc)
 	if (chip->irq_ack)
 		chip->irq_ack(&desc->irq_data);
 
-	trace_irq_handler_entry(irq, action);
-	res = action->handler(irq, dev_id);
-	trace_irq_handler_exit(irq, action, res);
+	if (likely(action)) {
+		trace_irq_handler_entry(irq, action);
+		res = action->handler(irq, raw_cpu_ptr(action->percpu_dev_id));
+		trace_irq_handler_exit(irq, action, res);
+	} else {
+		unsigned int cpu = smp_processor_id();
+		bool enabled = cpumask_test_cpu(cpu, desc->percpu_enabled);
+
+		if (enabled)
+			irq_percpu_disable(desc, cpu);
+
+		pr_err_once("Spurious%s percpu IRQ%u on CPU%u\n",
+			    enabled ? " and unmasked" : "", irq, cpu);
+	}
 
 	if (chip->irq_eoi)
 		chip->irq_eoi(&desc->irq_data);

^ permalink raw reply related	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2016-09-02 16:13 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-30  4:17 [PATCH] generic: Add the exception case checking routine for ppi interrupt MaJun
2016-08-30  4:17 ` MaJun
2016-08-30  8:50 ` Marc Zyngier
2016-08-30  8:50   ` Marc Zyngier
2016-08-30 10:35   ` majun (F)
2016-08-30 10:35     ` majun (F)
2016-08-30 11:07     ` Marc Zyngier
2016-08-30 11:07       ` Marc Zyngier
2016-08-30 11:21       ` Mark Rutland
2016-08-30 11:21         ` Mark Rutland
2016-08-31  6:35         ` majun (F)
2016-08-31  6:35           ` majun (F)
2016-08-31  8:35           ` Marc Zyngier
2016-08-31  8:35             ` Marc Zyngier
2016-09-01  8:15             ` majun (F)
2016-09-01  8:15               ` majun (F)
2016-09-01  9:03               ` Marc Zyngier
2016-09-01  9:03                 ` Marc Zyngier
2016-09-02 13:08                 ` Thomas Gleixner
2016-09-02 13:08                   ` Thomas Gleixner
2016-09-02 15:49                   ` Marc Zyngier
2016-09-02 15:49                     ` Marc Zyngier
2016-09-02 16:13                   ` [tip:irq/core] genirq: Robustify handle_percpu_devid_irq() tip-bot for Thomas Gleixner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.