* ARC770: "unexpected IRQ trap at vector 00" during boot
@ 2017-07-24 17:38 Alexandru Gagniuc
  2017-07-24 20:39 ` Alexey Brodkin
  0 siblings, 1 reply; 9+ messages in thread
From: Alexandru Gagniuc @ 2017-07-24 17:38 UTC (permalink / raw)
  To: linux-snps-arc

Hi,

I'm getting a storm of these messages when trying to boot an in-house 
ASIC with an ARC770. This only happens with an ethernet cable plugged 
in. I've learned that the actual interrupt number is 21. The issue is 
that the irq_find_mapping() in __handle_domain_irq() fails to find a 
mapping for vector 21, and the remaining logic will brainlessly print 
out '0' as the interrupt number (which is of course, bass-ackwards).
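
A minimal user-space sketch of why the message says "vector 00", assuming the
failing lookup returns 0 and only that return value gets printed. This is a toy
model, not the kernel code: revmap, toy_find_mapping() and
toy_handle_domain_irq() are hypothetical names made up for illustration.

```c
#include <stdio.h>

#define NR_HWIRQS 32

/* Toy reverse map: hwirq -> Linux virq; 0 means "no mapping". */
static unsigned int revmap[NR_HWIRQS];

/* Stand-in for irq_find_mapping(): returns 0 when hwirq is unmapped. */
static unsigned int toy_find_mapping(unsigned int hwirq)
{
	return hwirq < NR_HWIRQS ? revmap[hwirq] : 0;
}

/*
 * Stand-in for the dispatch path: writes the message that would be logged
 * into buf and returns -1 for an unmapped hwirq, 0 otherwise.
 */
static int toy_handle_domain_irq(unsigned int hwirq, char *buf, size_t len)
{
	unsigned int virq = toy_find_mapping(hwirq);

	if (!virq) {
		/* Only virq (0) is reported; the real hwirq is lost here. */
		snprintf(buf, len, "unexpected IRQ trap at vector %02x", virq);
		return -1;
	}
	snprintf(buf, len, "handled virq %u", virq);
	return 0;
}
```

With an empty table, toy_handle_domain_irq(21, ...) fails and produces the
"vector 00" text, even though the hardware vector was 21.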

This happens very early in the boot process, right after interrupts are 
globally enabled. IRQ 21 is the correct IRQ vector for the ethernet 
controller, but I don't understand why the IRQ vector is unmasked before 
the ethernet driver is loaded. This is a chicken and egg problem, since 
we have no control over the state of the ethernet before the driver is 
actually loaded.

I'm hoping someone might be able to point me in the right direction,
since at this point, I'm not sure if this is a devicetree problem,
a hardware bug, or a Linux bug.

Alex


# Appendix A: Relevant devicetree bindings:

/ {
	model = "adaptrum,anarion";
	compatible = "snps,nsim";
	#address-cells = <1>;
	#size-cells = <1>;
	interrupt-parent = <&core_intc>;

	chosen {
		bootargs = "earlycon console=ttyS0,115200n8";
		stdout-path = "serial0:115200n8";
	};

	aliases {
		serial0 = &uart0;
	};

	soc {
		compatible = "simple-bus";
		device_type = "soc";
		#address-cells = <1>;
		#size-cells = <1>;
		ranges;

		core_clk: core_clk {
			#clock-cells = <0>;
			compatible = "fixed-clock";
			clock-frequency = <12000000>;
		};

		core_intc: interrupt-controller {
			compatible = "snps,arc700-intc";
			interrupt-controller;
			#interrupt-cells = <1>;
		};

		uart0: serial@f2202100 {
			compatible = "ns16550";
			reg = <0xf2202100 0x20>;
			interrupts = <8>;
			reg-shift = <2>;
			reg-io-width = <4>;
			clock-frequency = <192000000>;
		};


		gmac1: ethernet@f2014000 {
			compatible = "snps,dwmac";
			reg = <0xf2014000 0x4000>;

			interrupt-parent = <&core_intc>;
			interrupts = <21>;
			interrupt-names = "macirq";

			clocks = <&core_clk>;
			clock-names = "stmmaceth";

			snps,pbl = <32>;
			status = "disabled";
		};
	};

};

&gmac1 {
	phy-mode = "rgmii";
	status = "okay";
};


* ARC770: "unexpected IRQ trap at vector 00" during boot
  2017-07-24 17:38 ARC770: "unexpected IRQ trap at vector 00" during boot Alexandru Gagniuc
@ 2017-07-24 20:39 ` Alexey Brodkin
  2017-07-25  4:04   ` Alex
  0 siblings, 1 reply; 9+ messages in thread
From: Alexey Brodkin @ 2017-07-24 20:39 UTC (permalink / raw)
  To: linux-snps-arc

Hi Alexandru,

> I'm getting a storm of these messages when trying to boot an in-house
> ASIC with an ARC770. This only happens with an ethernet cable plugged
> in. I've learned that the actual interrupt number is 21. The issue is
> that the irq_find_mapping() in __handle_domain_irq() fails to find a
> mapping for vector 21, and the remaining logic will brainlessly print
> out '0' as the interrupt number (which is of course, bass-ackwards).
>
> This happens very early in the boot process, right after interrupts are
> globally enabled. IRQ 21 is the correct IRQ vector for the ethernet
> controller, but I don't understand why the IRQ vector is unmasked before
> the ethernet driver is loaded. This is a chicken and egg problem, since
> we have no control over the state of the ethernet before the driver is
> actually loaded.

That's interesting! I saw exactly the same issue with one of our devboards.
What happens here is GMAC generates output interrupt because of some
condition(s), and since GMAC's IRQ line is wired directly to the
ARC700 interrupt controller, all lines of which get enabled early on boot
(as you correctly mentioned above), you're getting an interrupt the kernel cannot
service and clear, as it doesn't know how to "please" the interrupt source.

Answering your question why interrupt from GMAC happens before its driver
is probed:
 1) I need to look at the notes I made when I was fighting with the same problem,
     but for some reason DW GMAC seems to have interrupts enabled on reset,
     which is indeed a bit unexpected and might lead to the behavior you and I saw.
 2) Historically we have enabled all possible core IRQ lines early on boot, as opposed
     to per-line init at the request of each particular driver. We have this on our to-do list
     as one of the important improvements, but that's not a short-term fix for sure.

A work-around that I made, and which I may recommend to you, is to figure out what
condition in GMAC leads to generation of an interrupt on its output and then reset that
condition in GMAC from your early platform boot code.

Let me know if it helps.

-Alexey


* ARC770: "unexpected IRQ trap at vector 00" during boot
  2017-07-24 20:39 ` Alexey Brodkin
@ 2017-07-25  4:04   ` Alex
  2017-07-25 20:11     ` Alexey Brodkin
  0 siblings, 1 reply; 9+ messages in thread
From: Alex @ 2017-07-25  4:04 UTC (permalink / raw)
  To: linux-snps-arc

On 07/24/2017 01:39 PM, Alexey Brodkin wrote:
> Hi Alexandru,
>
>> I'm getting a storm of these messages when trying to boot an in-house
>> ASIC with an ARC770. This only happens with an ethernet cable plugged
>> in. I've learned that the actual interrupt number is 21. The issue is
>> that the irq_find_mapping() in __handle_domain_irq() fails to find a
>> mapping for vector 21, and the remaining logic will brainlessly print
>> out '0' as the interrupt number (which is of course, bass-ackwards).
[snip]


> That's interesting! I saw exactly the same issue with one of our devboards.
> What happens here is GMAC generates output interrupt because of some
> condition(s),

I'm seeing this when using U-boot to load the kernel over ethernet. I 
think it's enough to have the PHY link autonegotiated to get the 
problem, but I didn't verify this.

> Answering your question why interrupt from GMAC happens before its driver
> is probed:
>  1) I need to look at my notes I made when was fighting with the same problem,
>      but for some reason DW GMAC seems to have interrupts enabled on reset
>      which is indeed a bit unexpected and might lead to a behavior you and I saw.

I'm curious to know what your notes say.

>  2) Historically we used to enable all possible core IRQ lines early on boot as opposed
>      to per-line init by request of each particular driver. We have this on our to-do list
>      as one of important improvements but that's not a short-term fix for sure.


>
> A work-around that I made and which may recommend to you is to figure out what
> condition in GMAC leads to generation of interrupt on its out and then resetting it in
> GMAC in your early platform boot code.

I was afraid that might be the only way. I can keep the GMAC logic in 
reset, but that requires a custom platform early_init(), and a glue 
driver for snps,dwmac.

> Let me know if it helps.

Yes. I implemented the proof-of-concept today. It's a very "interesting"
balancing act on when exactly to release the reset on the GMAC. I'm able
to boot to a shell with ethernet plugged in.

Alex


* ARC770: "unexpected IRQ trap at vector 00" during boot
  2017-07-25  4:04   ` Alex
@ 2017-07-25 20:11     ` Alexey Brodkin
  2017-07-26  3:08       ` Vineet Gupta
  0 siblings, 1 reply; 9+ messages in thread
From: Alexey Brodkin @ 2017-07-25 20:11 UTC (permalink / raw)
  To: linux-snps-arc

Hi Alex,

> -----Original Message-----
> From: Alex [mailto:alex.g at adaptrum.com]
> Sent: Tuesday, July 25, 2017 7:05 AM
> To: Alexey Brodkin <Alexey.Brodkin at synopsys.com>; linux-snps-arc at lists.infradead.org
> Cc: Gokhan Cosgul <gokhan at adaptrum.com>; Vineet.Gupta1 at synopsys.com
> Subject: Re: ARC770: "unexpected IRQ trap at vector 00" during boot
> 
> On 07/24/2017 01:39 PM, Alexey Brodkin wrote:
> > Hi Alexandru,
> >
> >> I'm getting a storm of these messages when trying to boot an in-house
> >> ASIC with an ARC770. This only happens with an ethernet cable plugged
> >> in. I've learned that the actual interrupt number is 21. The issue is
> >> that the irq_find_mapping() in __handle_domain_irq() fails to find a
> >> mapping for vector 21, and the remaining logic will brainlessly print
> >> out '0' as the interrupt number (which is of course, bass-ackwards).
> [snip]
> 
> 
> > That's interesting! I saw exactly the same issue with one of our devboards.
> > What happens here is GMAC generates output interrupt because of some
> > condition(s),
> 
> I'm seeing this when using U-boot to load the kernel over ethernet. I
> think it's enough to have the PHY link autonegotiated to get the
> problem, but I didn't verify this.

Right, that matches my observations made back in the day.
It was the link status bit (0x1) in the "INT STS" register (offset 0x38). And indeed, with the cable
disconnected this problem never happened.
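
A minimal sketch of checking for that condition from early platform code or a
bootloader. The status offset (0x38) and the bit (0x1) come from the message
above; the mask register at offset 0x3c and all helper names are assumptions
made for illustration, so check them against the DW GMAC databook for your IP
version.

```c
#include <stdint.h>

/* From the message above: "INT STS" register offset and link status bit. */
#define GMAC_INT_STATUS      0x38u
#define GMAC_INT_LINK_STATUS 0x1u

/*
 * ASSUMPTION (not stated in this thread): interrupt mask register at
 * offset 0x3c, where setting bit 0 masks the link status interrupt.
 */
#define GMAC_INT_MASK        0x3cu

/* Read a 32-bit register at byte offset off from the GMAC base. */
static inline uint32_t gmac_read(volatile uint32_t *base, uint32_t off)
{
	return base[off / 4];
}

/* Write a 32-bit register at byte offset off from the GMAC base. */
static inline void gmac_write(volatile uint32_t *base, uint32_t off,
			      uint32_t val)
{
	base[off / 4] = val;
}

/* Returns nonzero if the link status interrupt is pending. */
static int gmac_link_irq_pending(volatile uint32_t *base)
{
	return (gmac_read(base, GMAC_INT_STATUS) & GMAC_INT_LINK_STATUS) != 0;
}

/* Hypothetical early-boot helper: mask the link status interrupt. */
static void gmac_mask_link_irq(volatile uint32_t *base)
{
	uint32_t mask = gmac_read(base, GMAC_INT_MASK);

	gmac_write(base, GMAC_INT_MASK, mask | GMAC_INT_LINK_STATUS);
}
```

On real hardware base would be the mapped GMAC register window (0xf2014000 in
the devicetree earlier in this thread), and the block has to be out of reset
for the accesses to take effect.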

> > Answering your question why interrupt from GMAC happens before its driver
> > is probed:
> >  1) I need to look at my notes I made when was fighting with the same problem,
> >      but for some reason DW GMAC seems to have interrupts enabled on reset
> >      which is indeed a bit unexpected and might lead to a behavior you and I saw.
> 
> I'm curious to know what your notes say.

See above.

> 
> >  2) Historically we used to enable all possible core IRQ lines early on boot as opposed
> >      to per-line init by request of each particular driver. We have this on our to-do list
> >      as one of important improvements but that's not a short-term fix for sure.
> 
> 
> >
> > A work-around that I made and which may recommend to you is to figure out what
> > condition in GMAC leads to generation of interrupt on its out and then resetting it in
> > GMAC in your early platform boot code.
> 
> I was afraid that might be the only way. I can keep the GMAC logic in
> reset, but that requires a custom platform early_init(), and a glue
> driver for snps,dwmac.

Well, in case you have U-Boot, what you may do (and that really makes sense) is to
clear GMAC's link state flag in _dw_eth_halt() in U-Boot, see
http://git.denx.de/?p=u-boot.git;a=blob;f=drivers/net/designware.c#l264

> > Let me know if it helps.
> 
> Yes. I implemented the proof-of-concept today. It's a very "interesting"
> balancing act on when exactly to release the reset on the GMAC. I'm able
> to get to boot to a shell with ethernet plugged in.

Agreed, that's all not as it should be. So I advise you to report that issue to the developers of
DW GMAC via Solvnet (https://solvnet.synopsys.com). But if you already have silicon,
then your rants might not be of much help.

BTW what is your exact kernel version?

In the meantime we'll try to revisit rework of ARC's INTC init procedure  but
I cannot promise anything very soon as I'm on ETO this week but we'll see how it goes.

-Alexey


* ARC770: "unexpected IRQ trap at vector 00" during boot
  2017-07-25 20:11     ` Alexey Brodkin
@ 2017-07-26  3:08       ` Vineet Gupta
  2017-08-01 21:33         ` snps,dwmac interrupt storm (Was: ARC770: "unexpected IRQ trap at vector 00" during boot) Alex
  0 siblings, 1 reply; 9+ messages in thread
From: Vineet Gupta @ 2017-07-26  3:08 UTC (permalink / raw)
  To: linux-snps-arc

On 07/26/2017 01:41 AM, Alexey Brodkin wrote:
> BTW what is your exact kernel version?
>
> In the meantime we'll try to revisit rework of ARC's INTC init procedure  but
> I cannot promise anything very soon as I'm on ETO this week but we'll see how it goes.

And what exactly do we intend to rework? AFAIK nothing's really broken at the moment in
ARC intc handling!

-Vineet


* snps,dwmac interrupt storm (Was: ARC770: "unexpected IRQ trap at vector 00" during boot)
  2017-07-26  3:08       ` Vineet Gupta
@ 2017-08-01 21:33         ` Alex
  2017-08-02  6:23           ` snps, dwmac " Vineet Gupta
  0 siblings, 1 reply; 9+ messages in thread
From: Alex @ 2017-08-01 21:33 UTC (permalink / raw)
  To: linux-snps-arc

On 07/25/2017 08:08 PM, Vineet Gupta wrote:

Hi Vineet,

> On 07/26/2017 01:41 AM, Alexey Brodkin wrote:
>> BTW what is your exact kernel version?
>>
>> In the meantime we'll try to revisit rework of ARC's INTC init
>> procedure  but
>> I cannot promise anything very soon as I'm on ETO this week but we'll
>> see how it goes.
>
> And exactly do we intend to rework - AFAIK nothings really broken at the
> moment in ARC intc handling !

I have tried the workarounds I mentioned on top of Linux 4.9.34, and they
work exactly as expected. However, on top of 4.13-rc3 [1], the story is
a lot different. As soon as I release the GMAC from reset, the boot 
stops. I can single-step through JTAG, and see that the GMAC sends an 
interrupt storm. The kernel doesn't have time to move on with the dwmac 
initialization and register the interrupt, and that's that.

I'd file this under both 'regression' and 'bug' categories.

Not sure what changed under the hood from 4.9 to 4.13-rc3 to cause such 
a drastically different behavior. I can't really do much else as 
workarounds, since the GMAC registers are not writable while the GMAC is 
in reset.

Alex

[1] https://github.com/mrnuke-adaptrum/linux/commits/patch-v2-anarion-wip


> -Vineet


* snps, dwmac interrupt storm (Was: ARC770: "unexpected IRQ trap at vector 00" during boot)
  2017-08-01 21:33         ` snps,dwmac interrupt storm (Was: ARC770: "unexpected IRQ trap at vector 00" during boot) Alex
@ 2017-08-02  6:23           ` Vineet Gupta
  2017-08-02  7:16             ` snps,dwmac " Alexey Brodkin
  2017-08-02 17:20             ` snps, dwmac " Alexandru Gagniuc
  0 siblings, 2 replies; 9+ messages in thread
From: Vineet Gupta @ 2017-08-02  6:23 UTC (permalink / raw)
  To: linux-snps-arc

On 08/02/2017 03:03 AM, Alex wrote:
> On 07/25/2017 08:08 PM, Vineet Gupta wrote:
> 
> Hi Vineet,
> 
>> On 07/26/2017 01:41 AM, Alexey Brodkin wrote:
>>> BTW what is your exact kernel version?
>>>
>>> In the meantime we'll try to revisit rework of ARC's INTC init
>>> procedure  but
>>> I cannot promise anything very soon as I'm on ETO this week but we'll
>>> see how it goes.
>>
>> And exactly do we intend to rework - AFAIK nothings really broken at the
>> moment in ARC intc handling !
> 
> I have tried the workarouns I mentioned on top of linux 4.9.34, and it works 
> exactly as expected. however, on top of 4.13-rc3 [1], the story is a lot 
> different. As soon as I release the GMAC from reset, the boot stops. I can 
> single-step through JTAG, and see that the GMAC sends an interrupt storm. The 
> kernel doesn't have time to move on with the dwmac initialization and register the 
> interrupt, and that's that.

I'm a bit confused here. Are you saying that your current patchset for ARC is
broken on 4.13.x due to "something" while it was working with 4.9?

> I'd file this under both 'regression' and 'bug' categories.

Sure - the question is where the bug/regression is: in the ARC port, in driver updates,
or in something else in the kernel.

> 
> Not sure what changed under the hood from 4.9 to 4.13-rc3 to cause such a 
> drastically different behavior. I can't really do much else as workarounds, since 
> the GMAC registers are not writable while the GMAC is in reset.

We had a fair bit of churn in the intc department in 4.10 and 4.11, but most of those
changes were related to the IDU intc found only on HS38x cores, not on ARC700. To really
narrow down the regression, perhaps try a dirty bisect trick (which works for me
sometimes). Squash all the Adaptrum changes into 1 patch - I presume that the same
patch applies to 4.9 as to 4.13 (otherwise you need to improvise). Then git bisect
between 4.9 (good) and 4.13-rcx (bad), and patch -p1 < ur-patch at each stage.

-Vineet


* snps,dwmac interrupt storm (Was: ARC770: "unexpected IRQ trap at vector 00" during boot)
  2017-08-02  6:23           ` snps, dwmac " Vineet Gupta
@ 2017-08-02  7:16             ` Alexey Brodkin
  2017-08-02 17:20             ` snps, dwmac " Alexandru Gagniuc
  1 sibling, 0 replies; 9+ messages in thread
From: Alexey Brodkin @ 2017-08-02  7:16 UTC (permalink / raw)
  To: linux-snps-arc

Hi Alex,

On Wed, 2017-08-02 at 11:53 +0530, Vineet Gupta wrote:
> On 08/02/2017 03:03 AM, Alex wrote:
> > 
> > On 07/25/2017 08:08 PM, Vineet Gupta wrote:
> > 
> > Hi Vineet,
> > 
> > > 
> > > On 07/26/2017 01:41 AM, Alexey Brodkin wrote:
> > > > 
> > > > BTW what is your exact kernel version?
> > > > 
> > > > In the meantime we'll try to revisit rework of ARC's INTC init
> > > > procedure  but
> > > > I cannot promise anything very soon as I'm on ETO this week but we'll
> > > > see how it goes.
> > > 
> > > And exactly do we intend to rework - AFAIK nothings really broken at the
> > > moment in ARC intc handling !
> > 
> > I have tried the workarouns I mentioned on top of linux 4.9.34, and it works
> > exactly as expected. however, on top of 4.13-rc3 [1], the story is a lot
> > different. As soon as I release the GMAC from reset, the boot stops. I can
> > single-step through JTAG, and see that the GMAC sends an interrupt storm. The
> > kernel doesn't have time to move on with the dwmac initialization and register the
> > interrupt, and that's that.

The only guess I have, given the symptoms mentioned, is that you're somehow affected by
90f522a20e3d "NET: dwmac: Make dwmac reset unconditional", see:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=90f522a20e3d16d153e5a5f84cf4ff92281ee417

I guess what happens in 4.13-rc3 is that, with that patch in place, your previous
work-around gets reset together with GMAC on driver probe.

So just try to revert that patch for a moment and let me know if that was the case.

-Alexey


* snps, dwmac interrupt storm (Was: ARC770: "unexpected IRQ trap at vector 00" during boot)
  2017-08-02  6:23           ` snps, dwmac " Vineet Gupta
  2017-08-02  7:16             ` snps,dwmac " Alexey Brodkin
@ 2017-08-02 17:20             ` Alexandru Gagniuc
  1 sibling, 0 replies; 9+ messages in thread
From: Alexandru Gagniuc @ 2017-08-02 17:20 UTC (permalink / raw)
  To: linux-snps-arc

On 08/01/2017 11:23 PM, Vineet Gupta wrote:
> On 08/02/2017 03:03 AM, Alex wrote:
>> On 07/25/2017 08:08 PM, Vineet Gupta wrote:
>> I have tried the workarouns I mentioned on top of linux 4.9.34, and it
>> works exactly as expected. however, on top of 4.13-rc3 [1], the story
>> is a lot different. As soon as I release the GMAC from reset, the boot
>> stops. I can single-step through JTAG, and see that the GMAC sends an
>> interrupt storm. The kernel doesn't have time to move on with the
>> dwmac initialization and register the interrupt, and that's that.
>
> I'm a bit confused here. Are you saying that your current patchset for
> ARC is broken on 4.13.x due to "something" while it was working with 4.9.

4.9: GOOD
4.13-rc3: BAD

>> I'd file this under both 'regression' and 'bug' categories.
>
> Sure - the question where is the bug/regression, is it in ARC port,
> driver updates or yet something else in the kernel.

Something else.

>> Not sure what changed under the hood from 4.9 to 4.13-rc3 to cause
>> such a drastically different behavior. I can't really do much else as
>> workarounds, since the GMAC registers are not writable while the GMAC
>> is in reset.
>
> We had a fair bit of churn in intc department in 4.10 and 4.11 but most
> of those were related to the IDU intc found only on HS38x cores, not on
> ARC700. To really narrow down the regression, perhaps try a dirty bisect
> trick (which works for me sometimes). Squash all the Adaptrum changes
> into 1 patch - I presume that same patch applies to 4.9 as to 4.13
> (otherwise u need to improvise). git bisect between 4.9 (good) and
> 4.13-rcx (bad) and patch -p1 < ur-patch at each stage.

I found the culprit, as evidenced in [Exhibit A]. I'm not really sure
how that code is designed to work, but I suspect that before the change
the IRQ would get masked on the first hit, whereas now it is no longer masked.

I have reverted the patch in question on top of my 4.13 development 
branch and I can confirm that the issue is resolved.
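
A toy model of that suspicion, based only on the commit message in
[Exhibit A] ("Check irq state in enable/disable/unmask/mask_irq to avoid
unnecessary low level irq function calls"). It is not the kernel code; struct
toy_irq and both functions are hypothetical. The starting mismatch (line
enabled in hardware while the core's bookkeeping says "masked") is an
assumption and has not been verified here.

```c
#include <stdbool.h>

/* One interrupt line: hardware state vs. the IRQ core's bookkeeping. */
struct toy_irq {
	bool hw_masked;         /* what the interrupt controller actually does */
	bool state_masked;      /* what the IRQ core believes */
	unsigned int hw_writes; /* number of low level mask calls issued */
};

/* Behaviour without the state check: always touch the hardware. */
static void toy_mask_irq_old(struct toy_irq *irq)
{
	irq->hw_masked = true;
	irq->hw_writes++;
	irq->state_masked = true;
}

/* Behaviour with the state check: skip hardware if state says masked. */
static void toy_mask_irq_new(struct toy_irq *irq)
{
	if (irq->state_masked)
		return;
	irq->hw_masked = true;
	irq->hw_writes++;
	irq->state_masked = true;
}
```

Starting from hw_masked = false and state_masked = true, the old variant masks
the line on the first hit, while the new variant returns early and the line
keeps firing.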

Alex


# [Exhibit A]: Git output after two hours of hardcore bisecting:

bf22ff45bed664aefb5c4e43029057a199b7070c is the first bad commit
commit bf22ff45bed664aefb5c4e43029057a199b7070c
Author: Jeffy Chen <jeffy.chen at rock-chips.com>
Date:   Mon Jun 26 19:33:34 2017 +0800

     genirq: Avoid unnecessary low level irq function calls

     Check irq state in enable/disable/unmask/mask_irq to avoid unnecessary
     low level irq function calls.

     This has two advantages:
         - Conditionals are faster than hardware access

         - Solves issues with the underlying refcounting of the pinctrl
           infrastructure

     Suggested-by: Thomas Gleixner <tglx at linutronix.de>
     Signed-off-by: Jeffy Chen <jeffy.chen at rock-chips.com>
     Signed-off-by: Thomas Gleixner <tglx at linutronix.de>
     Cc: tfiga at chromium.org
     Cc: briannorris at chromium.org
     Cc: dianders at chromium.org
     Link: http://lkml.kernel.org/r/1498476814-12563-2-git-send-email-jeffy.chen@rock-chips.com

:040000 040000 ec5072725f8be0a3906e949aa0172cb3e00729d6 27847e81e1c424a62938404fd48bea3c439d74c0 M      kernel
