linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* FPGA device behaves strangely with Linux
@ 2019-11-04 14:35 Ranran
  2019-11-18 22:50 ` Bjorn Helgaas
  0 siblings, 1 reply; 2+ messages in thread
From: Ranran @ 2019-11-04 14:35 UTC (permalink / raw)
  To: linux-pci

Hello,

I use x86 device with FPGA device.
The FPGA device acts strangely with Linux, while with vother OS on
same HW there is no issue.
The Other PCIe device acts find without any issues.
Doing lspci after reset, sometimes the device appear and other times
not enumerated at all.

After reset it is almost always missing.
Then I force rescan several times, until it appears in lspci:
03:00.0 RAM memory: Xilinx Corporation Default PCIe endpoint ID (rev ff)

After it appears there is still inconsistency when reading
configuration BAR with lspci -vv:
03:00.0 RAM memory: Xilinx Corporation Default PCIe endpoint ID
        Subsystem: Xilinx Corporation Default PCIe endpoint ID
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 255
        Region 0: [virtual] Memory at 91500000 (32-bit,
non-prefetchable) [size=1M]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [58] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 1, Latency L0s
<64ns, L1 <1us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+
FLReset- SlotPowerLimit 10.000W
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq-
AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s,
Exit Latency L0s unlimited
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train-
SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100 v1] Device Serial Number 00-00-00-00-00-00-00-00

[root@localhost ~]# lspci -xx -s 03:00.00
03:00.0 RAM memory: Xilinx Corporation Default PCIe endpoint ID
00: ee 10 07 00 00 00 10 00 00 00 00 05 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 ee 10 07 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00


Other tries of reading (without reseting in between):
=========
03:00.0 RAM memory: Xilinx Corporation Default PCIe endpoint ID (rev
ff) (prog-if ff)
        !!! Unknown header type 7f

root@localhost ~]# lspci -xx -s 03:00.00
03:00.0 RAM memory: Xilinx Corporation Default PCIe endpoint ID (rev ff)
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff


[root@localhost pcimem]# ./pcimem
/sys/bus/pci/devices/0000\:03\:00.0/resource0  0 w*100
/sys/bus/pci/devices/0000:03:00.0/resource0 opened.
Target offset is 0x0, page size is 4096
mmap(0, 4096, 0x3, 0x1, 3, 0x0)
PCI Memory mapped to address 0x7faf91256000.
0x0000: 0xFFFFFFFF
...

The BIOS is also different between Linux and the other OS on same HW.

Any idea what configuration can cause this behavior ?

Thank you,
Ran

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: FPGA device behaves strangely with Linux
  2019-11-04 14:35 FPGA device behaves strangely with Linux Ranran
@ 2019-11-18 22:50 ` Bjorn Helgaas
  0 siblings, 0 replies; 2+ messages in thread
From: Bjorn Helgaas @ 2019-11-18 22:50 UTC (permalink / raw)
  To: Ranran; +Cc: linux-pci, linux-fpga, Alex Williamson

[+cc Alex, linux-fpga]

Hi Ran, sorry for the delay; I overlooked this until now.

On Mon, Nov 04, 2019 at 04:35:00PM +0200, Ranran wrote:
> Hello,
> 
> I use x86 device with FPGA device.
> The FPGA device acts strangely with Linux, while with vother OS on
> same HW there is no issue.
> The Other PCIe device acts find without any issues.

If I understand correctly, there is no issue with other PCIe devices
in the system, but the FPGA device doesn't work correctly.

How long does it take the FPGA to initialize after reset?

> Doing lspci after reset, sometimes the device appear and other times
> not enumerated at all.

How are you doing the reset?  Writing to "1"  to
/sys/bus/pci/devices/.../reset?  Can you tell what kind of reset we're
doing (maybe you can instrument __pci_reset_function_locked()?)

It's possible we're missing a delay after doing the reset and before
restoring the device state, e.g., before calling pci_dev_restore() in
pci_reset_function().  You could add "msleep(5000)" there to see if it
makes any difference.

If we do config reads or MMIO reads to a device too soon after reset
and the device isn't ready to respond yet, we'll get 0xffffffff data,
which could explain some of what you're seeing.

> After reset it is almost always missing.
> Then I force rescan several times, until it appears in lspci:
> 03:00.0 RAM memory: Xilinx Corporation Default PCIe endpoint ID (rev ff)
> 
> After it appears there is still inconsistency when reading
> configuration BAR with lspci -vv:

What specific inconsistencies are you seeing here?  I see that "lspci
-xx" doesn't show anything in BAR 0, but the "lspci -vv" says it's
"virtual", which means it isn't a real BAR (I can't remember exactly
what it *does* mean, but the lspci source would tell you).

> 03:00.0 RAM memory: Xilinx Corporation Default PCIe endpoint ID
>         Subsystem: Xilinx Corporation Default PCIe endpoint ID
>         Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR- FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Interrupt: pin A routed to IRQ 255
>         Region 0: [virtual] Memory at 91500000 (32-bit,
> non-prefetchable) [size=1M]
>         Capabilities: [40] Power Management version 3
>                 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
> PME(D0-,D1-,D2-,D3hot-,D3cold-)
>                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [48] MSI: Enable- Count=1/1 Maskable- 64bit+
>                 Address: 0000000000000000  Data: 0000
>         Capabilities: [58] Express (v1) Endpoint, MSI 00
>                 DevCap: MaxPayload 256 bytes, PhantFunc 1, Latency L0s
> <64ns, L1 <1us
>                         ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+
> FLReset- SlotPowerLimit 10.000W
>                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
> Unsupported-
>                         RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>                         MaxPayload 128 bytes, MaxReadReq 512 bytes
>                 DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq-
> AuxPwr- TransPend-
>                 LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s,
> Exit Latency L0s unlimited
>                         ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
>                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                 LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train-
> SlotClk+ DLActive- BWMgmt- ABWMgmt-
>         Capabilities: [100 v1] Device Serial Number 00-00-00-00-00-00-00-00
> 
> [root@localhost ~]# lspci -xx -s 03:00.00
> 03:00.0 RAM memory: Xilinx Corporation Default PCIe endpoint ID
> 00: ee 10 07 00 00 00 10 00 00 00 00 05 00 00 00 00
> 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 ee 10 07 00
> 30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
> 
> 
> Other tries of reading (without reseting in between):
> =========
> 03:00.0 RAM memory: Xilinx Corporation Default PCIe endpoint ID (rev
> ff) (prog-if ff)
>         !!! Unknown header type 7f
> 
> root@localhost ~]# lspci -xx -s 03:00.00
> 03:00.0 RAM memory: Xilinx Corporation Default PCIe endpoint ID (rev ff)
> 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 
> 
> [root@localhost pcimem]# ./pcimem
> /sys/bus/pci/devices/0000\:03\:00.0/resource0  0 w*100
> /sys/bus/pci/devices/0000:03:00.0/resource0 opened.
> Target offset is 0x0, page size is 4096
> mmap(0, 4096, 0x3, 0x1, 3, 0x0)
> PCI Memory mapped to address 0x7faf91256000.
> 0x0000: 0xFFFFFFFF
> ...
> 
> The BIOS is also different between Linux and the other OS on same HW.

Not sure what this means.  Are you saying you need a different BIOS to
run Linux than the BIOS you need for the other OS?

> Any idea what configuration can cause this behavior ?

Most likely the device isn't responding for some reason, and we get ~0
data (0xffffffff) in that error case.

Bjorn

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2019-11-18 22:50 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-04 14:35 FPGA device behaves strangely with Linux Ranran
2019-11-18 22:50 ` Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).