From: Bjorn Helgaas <helgaas@kernel.org>
To: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
Cc: "linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
Alex Williamson <alex.williamson@redhat.com>,
Leon Romanovsky <leon@kernel.org>, Jason Gunthorpe <jgg@ziepe.ca>
Subject: Re: Write to srvio_numvfs triggers kernel panic
Date: Mon, 9 May 2022 11:49:29 -0500
Message-ID: <20220509164929.GA602900@bhelgaas>
In-Reply-To: <87ee14l1tx.fsf@epam.com>
On Sun, May 08, 2022 at 11:07:40AM +0000, Volodymyr Babchuk wrote:
> I had another crash in nvme_pci_enable(), for which I made quick
> workaround. And now yeah, it looks like I have some issues with
> my root complex HW:
Please point to the root complex issue you see.
> 01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a824 (prog-if 02 [NVM Express])
> Subsystem: Samsung Electronics Co Ltd Device a809
> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 0
> Interrupt: pin A routed to IRQ 0
> NUMA node: 0
> Region 0: Memory at 30010000 (64-bit, non-prefetchable) [size=32K]
> Expansion ROM at 30000000 [virtual] [disabled] [size=64K]
> Capabilities: [40] Power Management version 3
> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [70] Express (v2) Endpoint, MSI 00
> DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
> ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
> MaxPayload 128 bytes, MaxReadReq 512 bytes
> DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
> LnkCap: Port #0, Speed 16GT/s, Width x4, ASPM not supported
> ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
> LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> LnkSta: Speed 5GT/s (downgraded), Width x2 (downgraded)
> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR-
> 10BitTagComp+, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
> EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
> FRS-, TPHComp-, ExtTPHComp-
> AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
> AtomicOpsCtl: ReqEn-
> LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
> Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> Compliance De-emphasis: -6dB
> LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> Capabilities: [b0] MSI-X: Enable- Count=64 Masked-
> Vector table: BAR=0 offset=00004000
> PBA: BAR=0 offset=00003000
> Capabilities: [100 v2] Advanced Error Reporting
> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
> MultHdrRecCap+ MultHdrRecEn- TLPPfxPres- HdrLogCap-
> HeaderLog: 00000000 00000000 00000000 00000000
> Capabilities: [148 v1] Device Serial Number d3-42-50-11-99-38-25-00
> Capabilities: [168 v1] Alternative Routing-ID Interpretation (ARI)
> ARICap: MFVC- ACS-, Next Function: 0
> ARICtl: MFVC- ACS-, Function Group: 0
> Capabilities: [178 v1] Secondary PCI Express
> LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
> LaneErrStat: 0
> Capabilities: [198 v1] Physical Layer 16.0 GT/s <?>
> Capabilities: [1c0 v1] Lane Margining at the Receiver <?>
> Capabilities: [1e8 v1] Single Root I/O Virtualization (SR-IOV)
> IOVCap: Migration-, Interrupt Message Number: 000
> IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy-
> IOVSta: Migration-
> Initial VFs: 32, Total VFs: 32, Number of VFs: 1, Function Dependency Link: 00
> VF offset: 2, stride: 1, Device ID: a824
> Supported Page Size: 00000553, System Page Size: 00000001
> Region 0: Memory at 0000000030018000 (64-bit, non-prefetchable)
> VF Migration: offset: 00000000, BIR: 0
> Capabilities: [3a4 v1] Data Link Feature <?>
> Kernel driver in use: nvme
> Kernel modules: nvme
>
> 01:00.2 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a824 (prog-if 02 [NVM Express])
> Subsystem: Samsung Electronics Co Ltd Device a809
> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 0
> Interrupt: pin A routed to IRQ 0
> NUMA node: 0
> Region 0: Memory at 30018000 (64-bit, non-prefetchable) [size=32K]
> Capabilities: [40] Power Management version 3
> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [70] Express (v2) Endpoint, MSI 00
> DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
> ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
> MaxPayload 128 bytes, MaxReadReq 512 bytes
> DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
> LnkCap: Port #0, Speed 16GT/s, Width x4, ASPM not supported
> ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
> LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> LnkSta: Speed 5GT/s (downgraded), Width x2 (downgraded)
> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR-
> 10BitTagComp+, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
> EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
> FRS-, TPHComp-, ExtTPHComp-
> AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
> AtomicOpsCtl: ReqEn-
> LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> Capabilities: [b0] MSI-X: Enable- Count=64 Masked-
> Vector table: BAR=0 offset=00004000
> PBA: BAR=0 offset=00003000
> Capabilities: [100 v2] Advanced Error Reporting
> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
> MultHdrRecCap+ MultHdrRecEn- TLPPfxPres- HdrLogCap-
> HeaderLog: 00000000 00000000 00000000 00000000
> Capabilities: [148 v1] Device Serial Number d3-42-50-11-99-38-25-00
> Capabilities: [168 v1] Alternative Routing-ID Interpretation (ARI)
> ARICap: MFVC- ACS-, Next Function: 0
> ARICtl: MFVC- ACS-, Function Group: 0
> Capabilities: [178 v1] Secondary PCI Express
> LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
> LaneErrStat: 0
> Capabilities: [198 v1] Physical Layer 16.0 GT/s <?>
> Capabilities: [1c0 v1] Lane Margining at the Receiver <?>
> Capabilities: [1e8 v1] Single Root I/O Virtualization (SR-IOV)
> IOVCap: Migration-, Interrupt Message Number: 000
> IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy-
> IOVSta: Migration-
> Initial VFs: 32, Total VFs: 32, Number of VFs: 1, Function Dependency Link: 00
> VF offset: 2, stride: 1, Device ID: a824
> Supported Page Size: 00000553, System Page Size: 00000001
> Region 0: Memory at 0000000030018000 (64-bit, non-prefetchable)
> VF Migration: offset: 00000000, BIR: 0
> Capabilities: [3a4 v1] Data Link Feature <?>
> Kernel modules: nvme
>
> As you can see, output for func 0 and func 2 is identical, so yeah,
> looks like my system reads config space for func 0 in both cases.
They are not identical:

  01:00.0 Non-Volatile memory controller
    Region 0: Memory at 30010000

  01:00.2 Non-Volatile memory controller
    Region 0: Memory at 30018000
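
The two functions can also be cross-checked against the SR-IOV capability quoted above. The following is a minimal Python sketch, not kernel code. It assumes the usual SR-IOV rule that VF n (0-based) sits at the PF routing ID plus "VF offset" plus n times "stride", and that each VF's BAR follows the previous one at a fixed aperture size. The helper names (vf_routing_id, vf_bar_address, format_bdf) are made up for illustration, and the 32K aperture is taken from the [size=32K] shown for 01:00.2.

```python
def vf_routing_id(pf_rid: int, first_vf_offset: int, vf_stride: int, vf_index: int) -> int:
    """Routing ID of VF number vf_index (0-based), per the assumed SR-IOV rule."""
    return pf_rid + first_vf_offset + vf_stride * vf_index


def vf_bar_address(vf_bar_base: int, vf_bar_size: int, vf_index: int) -> int:
    """BAR address of VF number vf_index (0-based), assuming contiguous apertures."""
    return vf_bar_base + vf_bar_size * vf_index


def format_bdf(rid: int) -> str:
    """Format a 16-bit routing ID the way lspci prints it (bus:dev.fn)."""
    bus = (rid >> 8) & 0xFF
    dev = (rid >> 3) & 0x1F
    fn = rid & 0x7
    return f"{bus:02x}:{dev:02x}.{fn}"


# Values copied from the quoted lspci output for 01:00.0.
PF_RID = 0x0100          # bus 01, device 00, function 0
FIRST_VF_OFFSET = 2      # "VF offset: 2"
VF_STRIDE = 1            # "stride: 1"
VF_BAR_BASE = 0x30018000 # SR-IOV "Region 0: Memory at 0000000030018000"
VF_BAR_SIZE = 32 * 1024  # assumption: [size=32K] from the 01:00.2 dump

first_vf = vf_routing_id(PF_RID, FIRST_VF_OFFSET, VF_STRIDE, 0)
print(format_bdf(first_vf))                              # 01:00.2
print(f"{vf_bar_address(VF_BAR_BASE, VF_BAR_SIZE, 0):08x}")  # 30018000
```

With offset 2 and stride 1, the first VF of 01:00.0 lands on 01:00.2 and its BAR starts at 30018000. That matches the Region 0 lines above, which is what one would expect if the two dumps describe different functions.
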
> On the other hand, I'm wondering if it is correct to have both is_virtfn and
> is_physfn in the first place, as there can be 4 combinations and only two
> (or three?) of them are valid. Maybe it is worth replacing them with
> an enum?
Good question. I think there was a reason, but I can't remember it
right now.
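
For illustration only, here is roughly what the enum idea amounts to, sketched in Python rather than kernel C so it can be run directly. FnKind and classify are hypothetical names, and the sketch assumes that the one invalid combination is both flags set at once. The question above leaves open whether two or three of the four states are valid.

```python
from enum import Enum


class FnKind(Enum):
    """Hypothetical replacement for the is_physfn/is_virtfn flag pair."""
    NORMAL = "normal"   # neither flag set
    PHYSFN = "physfn"   # is_physfn only
    VIRTFN = "virtfn"   # is_virtfn only


def classify(is_physfn: bool, is_virtfn: bool) -> FnKind:
    """Map the two flags onto one enum value; reject the assumed-invalid pair."""
    if is_physfn and is_virtfn:
        raise ValueError("function cannot be both a PF and a VF")
    if is_physfn:
        return FnKind.PHYSFN
    if is_virtfn:
        return FnKind.VIRTFN
    return FnKind.NORMAL


# Walk all four flag combinations and show which ones the enum can represent.
for physfn in (False, True):
    for virtfn in (False, True):
        try:
            kind = classify(physfn, virtfn).name
        except ValueError as err:
            kind = f"invalid ({err})"
        print(f"is_physfn={int(physfn)} is_virtfn={int(virtfn)} -> {kind}")
```

An enum makes the fourth combination unrepresentable, whereas two independent flags rely on every writer keeping them consistent.
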
Bjorn