kernelnewbies.kernelnewbies.org archive mirror
* read the memory mapped address - pcie - kernel hangs
From: Muni Sekhar @ 2020-01-08 19:00 UTC
  To: kernelnewbies

Hi All,

I have a module with a Xilinx FPGA. It implements UARTs, SPIs and
parallel I/O, and interfaces them to the host CPU via the PCI Express bus.
I see that my system freezes without capturing a crash dump during certain tests.
I debugged this issue and tracked it down to the ‘readl()’ in the
interrupt handler code.

In the ISR, the driver first reads the Interrupt Status register using ‘readl()’
as given below.
    status = readl(ctrl->reg + INT_STATUS);

It then clears the pending interrupts using ‘writel()’ as given below.
    writel(status, ctrl->reg + INT_STATUS);

I've noticed a kernel hang if the INT_STATUS register is read again after
clearing the pending interrupts.

My system freezes only after the same ISR code has executed across
millions of interrupts. Reading the memory-mapped register a second time
in the ISR is what triggers this behavior.
If I comment out “status = readl(ctrl->reg + INT_STATUS);” after clearing
the pending interrupts, the system is stable.

As a temporary workaround I avoided reading the INT_STATUS register
after clearing the pending bits, and this code change works fine.

Can someone explain why the kernel hangs without a crash dump when
I read the INT_STATUS register using readl() after
clearing (writel()) the pending bits?

To read memory-mapped I/O, the kernel provides the read{b,w,l,q}() APIs.
If the PCIe card is not responsive, can a call to readl() from interrupt
context make the system freeze?

Thanks for any suggestions and solutions to this problem!

Snippet of the ISR code is given below:
https://pastebin.com/as2tSPwE


static irqreturn_t pcie_isr(int irq, void *data)
{
        struct test_device *ctrl = (struct test_device *)data;
        u32 status;
        …

        status = readl(ctrl->reg + INT_STATUS);

        /*
         * Check to see if it was our interrupt
         */
        if (!(status & 0x000C))
                return IRQ_NONE;

        /* Clear the interrupt */
        writel(status, ctrl->reg + INT_STATUS);

        if (status & 0x0004) {
                /*
                 * Tx interrupt pending.
                 */
                ....
        }

        if (status & 0x0008) {
                /* Rx interrupt Pending */

                /* The system freezes if I read again the INT_STATUS
                 * register as given below */
                status = readl(ctrl->reg + INT_STATUS);
                ....
        }
        ..
        return IRQ_HANDLED;
}
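The ack step above only makes sense if INT_STATUS is a write-1-to-clear (W1C) register, which the snippet implies but the thread never confirms. The minimal userspace model below assumes W1C semantics; struct w1c_reg and the w1c_* helpers are hypothetical stand-ins for readl()/writel() on the real BAR. It shows that writing back the value just read clears only the causes that were observed, so the saved copy is all the handler needs to dispatch on.

```c
#include <assert.h>
#include <stdint.h>

/* Bit meanings taken from the ISR snippet above. */
#define INT_TX 0x0004u   /* Tx interrupt pending */
#define INT_RX 0x0008u   /* Rx interrupt pending */

/* Hypothetical software model of INT_STATUS, ASSUMING write-1-to-clear
 * (W1C) semantics. A real driver uses readl()/writel() on the mapped BAR. */
struct w1c_reg {
	uint32_t latched;   /* causes raised by the device and not yet acked */
};

/* Device side: latch new interrupt causes. */
static void w1c_raise(struct w1c_reg *r, uint32_t bits)
{
	r->latched |= bits;
}

/* Host read: return the currently latched causes. */
static uint32_t w1c_read(const struct w1c_reg *r)
{
	return r->latched;
}

/* Host write: every 1 bit clears that cause; 0 bits are left untouched. */
static void w1c_write(struct w1c_reg *r, uint32_t val)
{
	r->latched &= ~val;
}

/* Read once and acknowledge exactly what was read. The returned copy is
 * what the handler should dispatch on; no second read is needed. */
static uint32_t w1c_read_and_ack(struct w1c_reg *r)
{
	uint32_t status = w1c_read(r);

	w1c_write(r, status);
	assert((w1c_read(r) & status) == 0);   /* all observed causes cleared */
	return status;
}
```

Walking through it: raise Tx, read 0x0004, raise Rx, write back 0x0004; the model then holds 0x0008, so the Rx cause is still latched rather than lost (whether the device re-signals it depends on the FPGA design). If the real register is clear-on-read or uses some other scheme, this reasoning does not apply and the FPGA documentation has to be checked.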

-- 
Thanks,
Sekhar

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


* Re: read the memory mapped address - pcie - kernel hangs
From: Greg KH @ 2020-01-08 19:45 UTC
  To: Muni Sekhar; +Cc: kernelnewbies

On Thu, Jan 09, 2020 at 12:30:20AM +0530, Muni Sekhar wrote:
> Hi All,
> 
> I have a module with a Xilinx FPGA. It implements UARTs, SPIs and
> parallel I/O, and interfaces them to the host CPU via the PCI Express bus.
> I see that my system freezes without capturing a crash dump during certain tests.
> I debugged this issue and tracked it down to the ‘readl()’ in the
> interrupt handler code.
> 
> In the ISR, the driver first reads the Interrupt Status register using ‘readl()’
> as given below.
>     status = readl(ctrl->reg + INT_STATUS);
> 
> It then clears the pending interrupts using ‘writel()’ as given below.
>     writel(status, ctrl->reg + INT_STATUS);
> 
> I've noticed a kernel hang if the INT_STATUS register is read again after
> clearing the pending interrupts.

Why would you read that register again after writing to it?

And are you sure you are reading/writing the correct size of the irq
field?  I thought it was a "word" not "long"?  But that might depend on
your hardware, do you have a pointer to the kernel driver source you are
using for all of this?

thanks,

greg k-h


* Re: read the memory mapped address - pcie - kernel hangs
From: Muni Sekhar @ 2020-01-09 11:14 UTC
  To: Greg KH; +Cc: kernelnewbies

On Thu, Jan 9, 2020 at 1:15 AM Greg KH <greg@kroah.com> wrote:
>
> On Thu, Jan 09, 2020 at 12:30:20AM +0530, Muni Sekhar wrote:
> > [...]
> > I've noticed a kernel hang if the INT_STATUS register is read again after
> > clearing the pending interrupts.
>
> Why would you read that register again after writing to it?
>
> And are you sure you are reading/writing the correct size of the irq
> field?  I thought it was a "word" not "long"?  But that might depend on
> your hardware, do you have a pointer to the kernel driver source you are
> using for all of this?

Actually, there is no need to read that register again. But reading that
register again should not freeze the system, right?

The INT_STATUS register is 32 bits wide, so readl() is used (my system
is x86_64, an Intel(R) Atom(TM) CPU). Instead of readl(), do I need to
call readw() twice? If so, what is the reason for this code change?

I'm trying to understand why the system freezes without any crash dump
while reading memory-mapped I/O from interrupt context.

The FPGA code might be buggy; it may not send the completion for the Memory
Read request. But the CPU should not get stuck at the load instruction...

When it hangs, it does not even respond to the SysRq key (SysRq is
enabled, and in the normal scenario it works); the only way to recover is
to reboot the system. I enabled almost all of the kernel.panic* variables.
I set kernel.panic to a positive value, so it should reboot after a panic
instead of just hanging. But it does not reboot by itself. Even
pstore/ramoops did not help.
After the reboot I looked at kern.log, and most of the time it has a
“^@^@^@^ ...“ line just before the reboot.

Okay, I will write minimal code to reproduce this and
then share it with you guys.

>
> thanks,
>
> greg k-h



-- 
Thanks,
Sekhar


* Re: read the memory mapped address - pcie - kernel hangs
From: Greg KH @ 2020-01-09 11:37 UTC
  To: Muni Sekhar; +Cc: kernelnewbies

On Thu, Jan 09, 2020 at 04:44:16PM +0530, Muni Sekhar wrote:
> On Thu, Jan 9, 2020 at 1:15 AM Greg KH <greg@kroah.com> wrote:
> >
> > On Thu, Jan 09, 2020 at 12:30:20AM +0530, Muni Sekhar wrote:
> > > [...]
> > > I've noticed a kernel hang if the INT_STATUS register is read again after
> > > clearing the pending interrupts.
> >
> > Why would you read that register again after writing to it?
> >
> > And are you sure you are reading/writing the correct size of the irq
> > field?  I thought it was a "word" not "long"?  But that might depend on
> > your hardware, do you have a pointer to the kernel driver source you are
> > using for all of this?
> Actually, there is no need to read that register again. But reading that
> register again should not freeze the system, right?

It might; it depends on your hardware.  Go talk to the hardware vendor if
you have questions about this.

> The INT_STATUS register is 32 bits wide, so readl() is used (my system
> is x86_64, an Intel(R) Atom(TM) CPU). Instead of readl(), do I need to
> call readw() twice? If so, what is the reason for this code change?

Ok, if that register is 32 bits, that's fine.  It all depends on your
hardware.

> I'm trying to understand why the system freezes without any crash dump
> while reading memory-mapped I/O from interrupt context.

Because your hardware locked things up?

> The FPGA code might be buggy; it may not send the completion for the Memory
> Read request. But the CPU should not get stuck at the load instruction...

PCI hardware can do lots of bad things to a system, it _IS_ part of the
memory bus, right?  So of course it can lock the CPU at a read.

> [...]
> 
> Okay, I will write minimal code to reproduce this and
> then share it with you guys.

What's wrong with the real/full driver source?

And again, why are you trying to read the register twice?

thanks,

greg k-h


* Re: read the memory mapped address - pcie - kernel hangs
From: Muni Sekhar @ 2020-01-09 12:20 UTC
  To: Greg KH; +Cc: kernelnewbies

On Thu, Jan 9, 2020 at 5:07 PM Greg KH <greg@kroah.com> wrote:
>
> On Thu, Jan 09, 2020 at 04:44:16PM +0530, Muni Sekhar wrote:
> > [...]
> > I'm trying to understand why the system freezes without any crash dump
> > while reading memory-mapped I/O from interrupt context.
>
> Because your hardware locked things up?
Here, does "hardware" mean the PCI controller on the host side or the PCI endpoint (FPGA) device?

> [...]
>
> And again, why are you trying to read the register twice?

I'm not the original author of this driver, so I have no idea why it was
implemented like that. Maybe to verify the register contents after
clearing the bits…


>
> thanks,
>
> greg k-h



-- 
Thanks,
Sekhar


* Re: read the memory mapped address - pcie - kernel hangs
From: Greg KH @ 2020-01-09 18:12 UTC
  To: Muni Sekhar; +Cc: kernelnewbies

On Thu, Jan 09, 2020 at 05:50:30PM +0530, Muni Sekhar wrote:
> On Thu, Jan 9, 2020 at 5:07 PM Greg KH <greg@kroah.com> wrote:
> > [...]
> > Because your hardware locked things up?
> Here, does "hardware" mean the PCI controller on the host side or the PCI endpoint (FPGA) device?

Your PCI endpoint device.

> > [...]
> > And again, why are you trying to read the register twice?
>
> I'm not the original author of this driver, so I have no idea why it was
> implemented like that. Maybe to verify the register contents after
> clearing the bits…

That sounds really really odd.

The normal way to handle PCI irqs is:
	- read the status register to see if this is your irq or not
	- write the "irq was handled" bit back

and that's it.
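Those two steps can be sketched as a userspace simulation. The register model, fake_readl()/fake_writel() and enum irq_result are hypothetical stand-ins (a real driver would use readl(), writel() and irqreturn_t), and INT_STATUS is assumed to be write-1-to-clear. The point is structural: the handler reads INT_STATUS exactly once and branches on the saved copy, which is also what Muni's workaround ends up doing.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's IRQ_NONE / IRQ_HANDLED results. */
enum irq_result { IRQ_NONE, IRQ_HANDLED };

#define INT_TX   0x0004u            /* Tx pending (from the snippet) */
#define INT_RX   0x0008u            /* Rx pending (from the snippet) */
#define INT_MINE (INT_TX | INT_RX)  /* the 0x000C "is it ours" mask */

/* Hypothetical device model; INT_STATUS is ASSUMED write-1-to-clear. */
struct fake_dev {
	uint32_t int_status;      /* stands in for ctrl->reg + INT_STATUS */
	unsigned int reads;       /* how many times the register was read */
	unsigned int tx_handled;
	unsigned int rx_handled;
};

static uint32_t fake_readl(struct fake_dev *d)
{
	d->reads++;
	return d->int_status;
}

static void fake_writel(struct fake_dev *d, uint32_t val)
{
	d->int_status &= ~val;    /* W1C: 1 bits clear, 0 bits untouched */
}

/* Read once, bail out if not ours, write the handled bits back, then
 * dispatch on the saved copy without touching the register again. */
static enum irq_result sketch_isr(struct fake_dev *d)
{
	uint32_t status = fake_readl(d);

	if (!(status & INT_MINE))
		return IRQ_NONE;          /* e.g. another device on a shared line */

	fake_writel(d, status);       /* acknowledge, as the original ISR does */

	if (status & INT_TX)
		d->tx_handled++;
	if (status & INT_RX)
		d->rx_handled++;
	return IRQ_HANDLED;
}
```

With both causes latched, one call returns IRQ_HANDLED, services Tx and Rx, leaves the register at zero, and performs a single read.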

Again, pointers to your source code would be appreciated, but you might
want to just go ask the people who wrote the driver and who made the
hardware, as this sounds like their issue, not ours :)

good luck!

greg k-h


* Re: read the memory mapped address - pcie - kernel hangs
From: Primoz Beltram @ 2020-01-10 11:15 UTC
  To: Muni Sekhar; +Cc: kernelnewbies

Hi,
I have also read the other replies to this topic.
I have seen and debugged such deadlock problems with FPGA-based PCIe
endpoint devices (Xilinx chips), and usually (if it was not a signal
integrity problem) the cause was incorrect AXI master/slave bus handling
in the FPGA design.
I guess you have the Xilinx PCIe endpoint IP core attached as an AXI
master to the FPGA's internal AXI bus (to access the AXI slaves inside
the FPGA design).
If the FPGA code in your design does not correctly handle AXI master
read/write requests, e.g. an FPGA AXI slave does not generate the bus
ACK correctly, the PCIe bus stays locked (no PCIe completion is sent
back), resulting in a complete system lock. Some PCIe root chips have
diagnostic LEDs to help decode PCIe problems.
From your note about doing two 32-bit reads on a 64-bit CPU, I would
guess the problem is in the handling of the AXI transfer size signals in
the FPGA slave code.
I would suggest checking the code in the FPGA design. You can use an FPGA
test bench simulation to check the behaviour of AXI read/write requests
originating from the PCIe endpoint.
Xilinx provides test bench simulation code for their PCIe IPs.
They also provide a PCIe root port model, so you can simulate AXI
read/write accesses as they would arrive from CPU I/O memory requests via
PCIe TLPs.
WBR Primoz




* Re: read the memory mapped address - pcie - kernel hangs
From: Muni Sekhar @ 2020-01-10 14:58 UTC
  To: primoz.beltram; +Cc: kernelnewbies

On Fri, Jan 10, 2020 at 4:46 PM Primoz Beltram <primoz.beltram@kate.si> wrote:
>
> [...]
Thank you so much for sharing this valuable information; I will work on this.



-- 
Thanks,
Sekhar


* Re: read the memory mapped address - pcie - kernel hangs
From: Onur Atilla @ 2020-01-10 23:03 UTC
  To: Muni Sekhar; +Cc: kernelnewbies, primoz.beltram



On 10.01.20 15:58, Muni Sekhar wrote:
> On Fri, Jan 10, 2020 at 4:46 PM Primoz Beltram <primoz.beltram@kate.si> wrote:
>> [...]
>> If the FPGA code in your design does not correctly handle AXI master
>> read/write requests, e.g. an FPGA AXI slave does not generate the bus
>> ACK correctly, the PCIe bus stays locked (no PCIe completion is sent
>> back), resulting in a complete system lock.
>> [...]

Hi,

You may also want to have a look at the AXI Timeout Block (ATB) to
prevent system/core lockups caused by a missing ACK from a slave. If the
hardware provides it, the ATB generates an alternative response when the
slave fails to respond within a given time. It may also trigger an
interrupt to help handle/debug the error.
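With a timeout block in place, the failure mode changes from "the CPU load never completes" to "the read completes with an error". How that error reaches the driver depends on how the Xilinx PCIe IP maps an AXI error response; on many x86 systems a PCIe read that completes with an error status returns all-ones data to the CPU, but this is platform dependent. Under those assumptions, a handler could treat 0xFFFFFFFF as a suspect value rather than a valid status word. A minimal sketch; classify_status() and its enum are hypothetical, and it also assumes 0xFFFFFFFF is never a legitimate INT_STATUS value.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: a read that ends in an error completion reaches the CPU
 * as all-ones data. Common on x86, but platform dependent. */
#define MMIO_ALL_ONES 0xFFFFFFFFu

#define INT_MINE 0x000Cu   /* Tx | Rx mask from the ISR snippet */

enum status_kind { STATUS_NOT_OURS, STATUS_OURS, STATUS_SUSPECT_ERROR };

/* Hypothetical helper: classify a freshly read INT_STATUS value.
 * ASSUMES 0xFFFFFFFF is never a legitimate status word. */
static enum status_kind classify_status(uint32_t status)
{
	/* Do NOT write an all-ones value back: on a W1C register that
	 * would clear every cause, including ones not yet serviced. */
	if (status == MMIO_ALL_ONES)
		return STATUS_SUSPECT_ERROR;
	if (!(status & INT_MINE))
		return STATUS_NOT_OURS;
	return STATUS_OURS;
}
```

This check can only help once the hardware actually returns something; if no response is ever generated, the load itself may never finish and no driver code gets a chance to run.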

Regards,
Onur



* Re: read the memory mapped address - pcie - kernel hangs
From: Muni Sekhar @ 2020-01-11  3:13 UTC
  To: Onur Atilla; +Cc: kernelnewbies, primoz.beltram



Thanks, I'll check it out.



end of thread

Thread overview: 10 messages
2020-01-08 19:00 read the memory mapped address - pcie - kernel hangs Muni Sekhar
2020-01-08 19:45 ` Greg KH
2020-01-09 11:14   ` Muni Sekhar
2020-01-09 11:37     ` Greg KH
2020-01-09 12:20       ` Muni Sekhar
2020-01-09 18:12         ` Greg KH
2020-01-10 11:15 ` Primoz Beltram
2020-01-10 14:58   ` Muni Sekhar
2020-01-10 23:03     ` Onur Atilla
2020-01-11  3:13       ` Muni Sekhar
