* blktests block/019 lead system hang
       [not found] <838678680.4693215.1527664726174.JavaMail.zimbra@redhat.com>
@ 2018-05-30  7:26   ` Yi Zhang
  0 siblings, 0 replies; 18+ messages in thread
From: Yi Zhang @ 2018-05-30  7:26 UTC (permalink / raw)
  To: Keith Busch; +Cc: ming.lei, linux-nvme, linux-block, osandov

Hi Keith
I found blktest block/019 also can lead my NVMe server hang with 4.17.0-rc7, let me know if you need more info, thanks. 

Server: Dell R730xd
NVMe SSD: 85:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller 172X (rev 01)

Console log:
Kernel 4.17.0-rc7 on an x86_64

storageqe-62 login: [ 6043.121834] run blktests block/019 at 2018-05-30 03:16:34
[ 6049.108476] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 3
[ 6049.108478] {1}[Hardware Error]: event severity: fatal
[ 6049.108479] {1}[Hardware Error]:  Error 0, type: fatal
[ 6049.108481] {1}[Hardware Error]:   section_type: PCIe error
[ 6049.108482] {1}[Hardware Error]:   port_type: 6, downstream switch port
[ 6049.108483] {1}[Hardware Error]:   version: 1.16
[ 6049.108484] {1}[Hardware Error]:   command: 0x0407, status: 0x0010
[ 6049.108485] {1}[Hardware Error]:   device_id: 0000:83:05.0
[ 6049.108486] {1}[Hardware Error]:   slot: 0
[ 6049.108487] {1}[Hardware Error]:   secondary_bus: 0x85
[ 6049.108488] {1}[Hardware Error]:   vendor_id: 0x10b5, device_id: 0x8734
[ 6049.108489] {1}[Hardware Error]:   class_code: 000406
[ 6049.108489] {1}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0003
[ 6049.108491] Kernel panic - not syncing: Fatal hardware error!
[ 6049.108514] Kernel Offset: 0x25800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)


Best Regards,
  Yi Zhang

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: blktests block/019 lead system hang
  2018-05-30  7:26   ` Yi Zhang
@ 2018-06-05 16:18     ` Keith Busch
  -1 siblings, 0 replies; 18+ messages in thread
From: Keith Busch @ 2018-06-05 16:18 UTC (permalink / raw)
  To: Yi Zhang; +Cc: Keith Busch, linux-block, osandov, linux-nvme, ming.lei

On Wed, May 30, 2018 at 03:26:54AM -0400, Yi Zhang wrote:
> Hi Keith
> I found blktest block/019 also can lead my NVMe server hang with 4.17.0-rc7, let me know if you need more info, thanks. 
> 
> Server: Dell R730xd
> NVMe SSD: 85:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller 172X (rev 01)
> 
> Console log:
> Kernel 4.17.0-rc7 on an x86_64
> 
> storageqe-62 login: [ 6043.121834] run blktests block/019 at 2018-05-30 03:16:34
> [ 6049.108476] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 3
> [ 6049.108478] {1}[Hardware Error]: event severity: fatal
> [ 6049.108479] {1}[Hardware Error]:  Error 0, type: fatal
> [ 6049.108481] {1}[Hardware Error]:   section_type: PCIe error
> [ 6049.108482] {1}[Hardware Error]:   port_type: 6, downstream switch port
> [ 6049.108483] {1}[Hardware Error]:   version: 1.16
> [ 6049.108484] {1}[Hardware Error]:   command: 0x0407, status: 0x0010
> [ 6049.108485] {1}[Hardware Error]:   device_id: 0000:83:05.0
> [ 6049.108486] {1}[Hardware Error]:   slot: 0
> [ 6049.108487] {1}[Hardware Error]:   secondary_bus: 0x85
> [ 6049.108488] {1}[Hardware Error]:   vendor_id: 0x10b5, device_id: 0x8734
> [ 6049.108489] {1}[Hardware Error]:   class_code: 000406
> [ 6049.108489] {1}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0003
> [ 6049.108491] Kernel panic - not syncing: Fatal hardware error!
> [ 6049.108514] Kernel Offset: 0x25800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Sounds like your platform fundamentally doesn't support surprise link
down if it considers the event a fatal error. That's sort of what this
test was supposed to help catch so we know what platforms can do this
vs ones that can't.

The test does check that the slot is hotplug capable before running,
so it's supposed to only run the test on slots that claim to be capable
of handling the event. I just don't know of a good way to query platform
firmware to know what it will do in response to such an event.
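
Checking the slot capability by hand amounts to reading SltCap on the
port above the device; roughly (a sketch, not the literal test code,
using the device from this report as an example):

  # find the bridge/port directly above the NVMe device
  port=$(basename $(dirname $(readlink /sys/bus/pci/devices/0000:85:00.0)))
  # the test wants to see HotPlug+ and Surprise+ here
  lspci -s "$port" -vv | grep SltCap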

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: blktests block/019 lead system hang
  2018-06-05 16:18     ` Keith Busch
@ 2018-06-05 17:21       ` Keith Busch
  -1 siblings, 0 replies; 18+ messages in thread
From: Keith Busch @ 2018-06-05 17:21 UTC (permalink / raw)
  To: Yi Zhang; +Cc: Keith Busch, linux-block, osandov, linux-nvme, ming.lei

On Tue, Jun 05, 2018 at 10:18:53AM -0600, Keith Busch wrote:
> On Wed, May 30, 2018 at 03:26:54AM -0400, Yi Zhang wrote:
> > Hi Keith
> > I found blktest block/019 also can lead my NVMe server hang with 4.17.0-rc7, let me know if you need more info, thanks. 
> > 
> > Server: Dell R730xd
> > NVMe SSD: 85:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller 172X (rev 01)
> > 
> > Console log:
> > Kernel 4.17.0-rc7 on an x86_64
> > 
> > storageqe-62 login: [ 6043.121834] run blktests block/019 at 2018-05-30 03:16:34
> > [ 6049.108476] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 3
> > [ 6049.108478] {1}[Hardware Error]: event severity: fatal
> > [ 6049.108479] {1}[Hardware Error]:  Error 0, type: fatal
> > [ 6049.108481] {1}[Hardware Error]:   section_type: PCIe error
> > [ 6049.108482] {1}[Hardware Error]:   port_type: 6, downstream switch port
> > [ 6049.108483] {1}[Hardware Error]:   version: 1.16
> > [ 6049.108484] {1}[Hardware Error]:   command: 0x0407, status: 0x0010
> > [ 6049.108485] {1}[Hardware Error]:   device_id: 0000:83:05.0
> > [ 6049.108486] {1}[Hardware Error]:   slot: 0
> > [ 6049.108487] {1}[Hardware Error]:   secondary_bus: 0x85
> > [ 6049.108488] {1}[Hardware Error]:   vendor_id: 0x10b5, device_id: 0x8734
> > [ 6049.108489] {1}[Hardware Error]:   class_code: 000406
> > [ 6049.108489] {1}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0003
> > [ 6049.108491] Kernel panic - not syncing: Fatal hardware error!
> > [ 6049.108514] Kernel Offset: 0x25800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Could you attach 'lspci -vvv -s 0000:83:05.0'? Just want to see
your switch's capabilities to confirm the pre-test checks are really
sufficient.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: blktests block/019 lead system hang
  2018-06-05 17:21       ` Keith Busch
@ 2018-06-06  5:42         ` Yi Zhang
  -1 siblings, 0 replies; 18+ messages in thread
From: Yi Zhang @ 2018-06-06  5:42 UTC (permalink / raw)
  To: Keith Busch; +Cc: Keith Busch, linux-block, osandov, linux-nvme, ming.lei

Here is the output, and I can see "HotPlug+ Surprise+" on SltCap

# lspci -vvv -s 0000:83:05.0
83:05.0 PCI bridge: PLX Technology, Inc. PEX 8734 32-lane, 8-Port PCI 
Express Gen 3 (8.0GT/s) Switch (rev ab) (prog-if 00 [Normal decode])
     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B- DisINTx+
     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- 
<TAbort- <MAbort- >SERR- <PERR- INTx-
     Latency: 0, Cache Line Size: 32 bytes
     Interrupt: pin A routed to IRQ 40
     NUMA node: 1
     Bus: primary=83, secondary=85, subordinate=85, sec-latency=0
     I/O behind bridge: 00009000-00009fff
     Memory behind bridge: c8600000-c86fffff
     Prefetchable memory behind bridge: 000003c000200000-000003c0003fffff
     Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- 
<TAbort- <MAbort- <SERR- <PERR-
     BridgeCtl: Parity+ SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
         PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
     Capabilities: [40] Power Management version 3
         Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
PME(D0+,D1-,D2-,D3hot+,D3cold+)
         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
     Capabilities: [48] MSI: Enable+ Count=1/8 Maskable+ 64bit+
         Address: 00000000fee00118  Data: 0000
         Masking: 000000fe  Pending: 00000000
     Capabilities: [68] Express (v2) Downstream Port (Slot+), MSI 00
         DevCap:    MaxPayload 512 bytes, PhantFunc 0
             ExtTag- RBE+
         DevCtl:    Report errors: Correctable- Non-Fatal+ Fatal+ 
Unsupported+
             RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
             MaxPayload 128 bytes, MaxReadReq 128 bytes
         DevSta:    CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- 
TransPend-
         LnkCap:    Port #5, Speed 8GT/s, Width x4, ASPM L1, Exit 
Latency L0s <4us, L1 <4us
             ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
         LnkCtl:    ASPM Disabled; Disabled- CommClk-
             ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
         LnkSta:    Speed 8GT/s, Width x4, TrErr- Train- SlotClk- 
DLActive+ BWMgmt- ABWMgmt-
         SltCap:    AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ 
Surprise+
             Slot #181, PowerLimit 25.000W; Interlock- NoCompl-
         SltCtl:    Enable: AttnBtn- PwrFlt- MRL- PresDet+ CmdCplt+ 
HPIrq+ LinkChg+
             Control: AttnInd Unknown, PwrInd On, Power- Interlock-
         SltSta:    Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ 
Interlock-
             Changed: MRL- PresDet- LinkState-
         DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR+, 
OBFF Via message ARIFwd+
         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, 
OBFF Disabled ARIFwd+
         LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-, 
Selectable De-emphasis: -6dB
              Transmit Margin: Normal Operating Range, 
EnterModifiedCompliance- ComplianceSOS-
              Compliance De-emphasis: -6dB
         LnkSta2: Current De-emphasis Level: -6dB, 
EqualizationComplete+, EqualizationPhase1+
              EqualizationPhase2+, EqualizationPhase3+, 
LinkEqualizationRequest-
     Capabilities: [a4] Subsystem: Dell Device 1f84
     Capabilities: [100 v1] Device Serial Number ab-87-00-10-b5-df-0e-00
     Capabilities: [fb4 v1] Advanced Error Reporting
         UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- 
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
         UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ 
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol+
         UESvrt:    DLP+ SDES+ TLP+ FCP+ CmpltTO- CmpltAbrt- UnxCmplt- 
RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
         CESta:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
         CEMsk:    RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
         AERCap:    First Error Pointer: 1f, GenCap+ CGenEn+ ChkCap+ ChkEn+
     Capabilities: [138 v1] Power Budgeting <?>
     Capabilities: [10c v1] #19
     Capabilities: [148 v1] Virtual Channel
         Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
         Arb:    Fixed- WRR32- WRR64- WRR128-
         Ctrl:    ArbSelect=Fixed
         Status:    InProgress-
         VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
             Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
             Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
             Status:    NegoPending- InProgress-
     Capabilities: [e00 v1] #12
     Capabilities: [f24 v1] Access Control Services
         ACSCap:    SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ 
UpstreamFwd+ EgressCtrl+ DirectTrans+
         ACSCtl:    SrcValid- TransBlk- ReqRedir- CmpltRedir- 
UpstreamFwd- EgressCtrl- DirectTrans-
     Capabilities: [b70 v1] Vendor Specific Information: ID=0001 Rev=0 
Len=010 <?>
     Kernel driver in use: pcieport
     Kernel modules: shpchp

Thanks

Yi


On 06/06/2018 01:21 AM, Keith Busch wrote:
> On Tue, Jun 05, 2018 at 10:18:53AM -0600, Keith Busch wrote:
>> On Wed, May 30, 2018 at 03:26:54AM -0400, Yi Zhang wrote:
>>> Hi Keith
>>> I found blktest block/019 also can lead my NVMe server hang with 4.17.0-rc7, let me know if you need more info, thanks.
>>>
>>> Server: Dell R730xd
>>> NVMe SSD: 85:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller 172X (rev 01)
>>>
>>> Console log:
>>> Kernel 4.17.0-rc7 on an x86_64
>>>
>>> storageqe-62 login: [ 6043.121834] run blktests block/019 at 2018-05-30 03:16:34
>>> [ 6049.108476] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 3
>>> [ 6049.108478] {1}[Hardware Error]: event severity: fatal
>>> [ 6049.108479] {1}[Hardware Error]:  Error 0, type: fatal
>>> [ 6049.108481] {1}[Hardware Error]:   section_type: PCIe error
>>> [ 6049.108482] {1}[Hardware Error]:   port_type: 6, downstream switch port
>>> [ 6049.108483] {1}[Hardware Error]:   version: 1.16
>>> [ 6049.108484] {1}[Hardware Error]:   command: 0x0407, status: 0x0010
>>> [ 6049.108485] {1}[Hardware Error]:   device_id: 0000:83:05.0
>>> [ 6049.108486] {1}[Hardware Error]:   slot: 0
>>> [ 6049.108487] {1}[Hardware Error]:   secondary_bus: 0x85
>>> [ 6049.108488] {1}[Hardware Error]:   vendor_id: 0x10b5, device_id: 0x8734
>>> [ 6049.108489] {1}[Hardware Error]:   class_code: 000406
>>> [ 6049.108489] {1}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0003
>>> [ 6049.108491] Kernel panic - not syncing: Fatal hardware error!
>>> [ 6049.108514] Kernel Offset: 0x25800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> Could you attach 'lspci -vvv -s 0000:83:05.0'? Just want to see
> your switch's capabilities to confirm the pre-test checks are really
> sufficient.
>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: blktests block/019 lead system hang
  2018-06-06  5:42         ` Yi Zhang
@ 2018-06-06 14:28           ` Keith Busch
  -1 siblings, 0 replies; 18+ messages in thread
From: Keith Busch @ 2018-06-06 14:28 UTC (permalink / raw)
  To: Yi Zhang; +Cc: Keith Busch, linux-block, osandov, linux-nvme, ming.lei

On Wed, Jun 06, 2018 at 01:42:15PM +0800, Yi Zhang wrote:
> Here is the output, and I can see "HotPlug+ Surprise+" on SltCap

Thanks. That looks like a perfectly capable port. I even have the same
switch in one of my machines, but the test doesn't trigger fatal
firmware-first errors.

Might need to query something about the platform to know how it treats
link-downs before proceeding with the test (don't know off the top of
my head; will do some digging).
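
One cheap thing to look at in the meantime is whether the platform is
doing firmware-first error reporting at all, since that is what turns
these link-downs into APEI events; roughly (exact messages differ
between kernel versions):

  # firmware-first / APEI setup is logged at boot
  dmesg | grep -iE 'HEST|GHES|APEI'
  # _OSC shows whether the OS was granted native AER/hotplug control
  dmesg | grep -i '_OSC'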

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: blktests block/019 lead system hang
  2018-06-05 16:18     ` Keith Busch
@ 2018-06-12 23:41       ` Austin.Bolen
  -1 siblings, 0 replies; 18+ messages in thread
From: Austin.Bolen @ 2018-06-12 23:41 UTC (permalink / raw)
  To: keith.busch, yi.zhang
  Cc: keith.busch, linux-block, osandov, linux-nvme, ming.lei

On 6/5/2018 11:16 AM, Keith Busch wrote:
> On Wed, May 30, 2018 at 03:26:54AM -0400, Yi Zhang wrote:
>> Hi Keith
>> I found blktest block/019 also can lead my NVMe server hang with 4.17.0-rc7, let me know if you need more info, thanks. 
>>
>> Server: Dell R730xd
>> NVMe SSD: 85:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller 172X (rev 01)
>>
>> Console log:
>> Kernel 4.17.0-rc7 on an x86_64
>>
>> storageqe-62 login: [ 6043.121834] run blktests block/019 at 2018-05-30 03:16:34
>> [ 6049.108476] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 3
>> [ 6049.108478] {1}[Hardware Error]: event severity: fatal
>> [ 6049.108479] {1}[Hardware Error]:  Error 0, type: fatal
>> [ 6049.108481] {1}[Hardware Error]:   section_type: PCIe error
>> [ 6049.108482] {1}[Hardware Error]:   port_type: 6, downstream switch port
>> [ 6049.108483] {1}[Hardware Error]:   version: 1.16
>> [ 6049.108484] {1}[Hardware Error]:   command: 0x0407, status: 0x0010
>> [ 6049.108485] {1}[Hardware Error]:   device_id: 0000:83:05.0
>> [ 6049.108486] {1}[Hardware Error]:   slot: 0
>> [ 6049.108487] {1}[Hardware Error]:   secondary_bus: 0x85
>> [ 6049.108488] {1}[Hardware Error]:   vendor_id: 0x10b5, device_id: 0x8734
>> [ 6049.108489] {1}[Hardware Error]:   class_code: 000406
>> [ 6049.108489] {1}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0003
>> [ 6049.108491] Kernel panic - not syncing: Fatal hardware error!
>> [ 6049.108514] Kernel Offset: 0x25800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> Sounds like your platform fundamentally doesn't support surprise link
> down if it considers the event a fatal error. That's sort of what this
> test was supposed to help catch so we know what platforms can do this
> vs ones that can't.
>
> The test does check that the slot is hotplug capable before running,
> so it's supposed to only run the test on slots that claim to be capable
> of handling the event. I just don't know of a good way to query platform
> firmware to know what it will do in response to such an event.
It looks like the test is setting the Link Disable bit.  But this is not
a good simulation for hot-plug surprise removal testing or surprise link
down (SLD) testing, if that is the intent.  One reason is that Link
Disable does not invoke SLD semantics per PCIe spec.  This is somewhat
of a moot point in this case since the switch has Hot-Plug Surprise bit
set which also masks the SLD semantics in PCIe.

Also, the Hot-Plug Capable + Surprise Hot-Plug bits set means the
platform can tolerate the case where "an adapter present in this slot
might be removed from the system without any prior notification".  It
does not mean that a system can survive link down under any other
circumstances such as setting Link Disable or generating a Secondary Bus
Reset or a true surprise link down event.  To the earlier point, I also
do not know of any way the OS can know a priori if the platform can
handle surprise link down outside of surprise remove case.  We can look
at standardizing a way to do that if OSes find it useful to know.

Relative to this particular error, Link Disable doesn't clear Presence
Detect State which would happen on a real Surprise Hot-Plug removal
event and this is probably why the system crashes.  What will happen is
that after the link goes to disabled state, the ongoing I/O will cause
MMIO accesses on the drive and that will cause a UR which is an
uncorrectable PCIe error (ERR_FATAL on R730).  The BIOS on the R730 is
surprise remove aware (Surprise Hot-Plug = 1) and so it will check if
the device is still present by checking Presence Detect State.  If the
device is not present it will mask the error and let the OS handle the
device removal due to hot-plug interrupt(s).  If the device is present,
as in this case, then the BIOS will escalate to OS as a fatal NMI
(current R730 platform policy is to only mask errors due to removal).
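
For what it's worth, one quick way to see that from the OS side is to
look at the slot status on the downstream port while the link is
disabled; roughly (exact lspci output formatting varies by pciutils
version):

  # Presence Detect State stays asserted (PresDet+) with Link Disable set,
  # unlike a physical surprise removal where it would drop to PresDet-
  lspci -s 0000:83:05.0 -vv | grep -E 'SltSta|LnkSta'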

For future, these servers may report these sort of errors as recoverable
via the GHES structures in APEI which will allow the OS to recover from
this non-surprise remove class of error as well.  In the (hopefully
near) future, the industry will move to DPC as the framework for this
sort of generic PCIe error handling/recovery but there are architectural
changes needed that are currently being defined in the relevant
standards bodies.  Once the architecture is defined it can be
implemented and tested to verify these sort of test cases pass.

-Austin


>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: blktests block/019 lead system hang
  2018-06-12 23:41       ` Austin.Bolen
@ 2018-06-13 15:44         ` Keith Busch
  -1 siblings, 0 replies; 18+ messages in thread
From: Keith Busch @ 2018-06-13 15:44 UTC (permalink / raw)
  To: Austin.Bolen
  Cc: keith.busch, yi.zhang, linux-block, osandov, linux-nvme, ming.lei

On Tue, Jun 12, 2018 at 04:41:54PM -0700, Austin.Bolen@dell.com wrote:
> It looks like the test is setting the Link Disable bit.  But this is not
> a good simulation for hot-plug surprise removal testing or surprise link
> down (SLD) testing, if that is the intent.  One reason is that Link
> Disable does not invoke SLD semantics per PCIe spec.  This is somewhat
> of a moot point in this case since the switch has Hot-Plug Surprise bit
> set which also masks the SLD semantics in PCIe.
> 
> Also, the Hot-Plug Capable + Surprise Hot-Plug bits set means the
> platform can tolerate the case where "an adapter present in this slot
> might be removed from the system without any prior notification".  It
> does not mean that a system can survive link down under any other
> circumstances such as setting Link Disable or generating a Secondary Bus
> Reset or a true surprise link down event.  To the earlier point, I also
> do not know of any way the OS can know a priori if the platform can
> handle surprise link down outside of surprise remove case.  We can look
> at standardizing a way to do that if OSes find it useful to know.
> 
> Relative to this particular error, Link Disable doesn't clear Presence
> Detect State which would happen on a real Surprise Hot-Plug removal
> event and this is probably why the system crashes.  What will happen is
> that after the link goes to disabled state, the ongoing I/O will cause
> MMIO accesses on the drive and that will cause a UR which is an
> uncorrectable PCIe error (ERR_FATAL on R730).  The BIOS on the R730 is
> surprise remove aware (Surprise Hot-Plug = 1) and so it will check if
> the device is still present by checking Presence Detect State.  If the
> device is not present it will mask the error and let the OS handle the
> device removal due to hot-plug interrupt(s).  If the device is present,
> as in this case, then the BIOS will escalate to OS as a fatal NMI
> (current R730 platform policy is to only mask errors due to removal).
> 
> For future, these servers may report these sort of errors as recoverable
> via the GHES structures in APEI which will allow the OS to recover from
> this non-surprise remove class of error as well.  In the (hopefully
> near) future, the industry will move to DPC as the framework for this
> sort of generic PCIe error handling/recovery but there are architectural
> changes needed that are currently being defined in the relevant
> standards bodies.  Once the architecture is defined it can be
> implemented and tested to verify these sort of test cases pass.

Thanks for the feedback!

This test does indeed toggle the Link Control Link Disable bit to simulate
the link failure. The PCIe specification specifically covers this case
in Section 3.2.1, Data Link Control and Management State Machine Rules:

  If the Link Disable bit has been Set by software, then the subsequent
  transition to DL_Inactive must not be considered an error.
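
In practice that toggle amounts to flipping the Link Disable bit (bit 4)
of the Link Control register on the downstream port; roughly, with
setpci (a sketch rather than the literal test script, using the port
from this report):

  port=0000:83:05.0
  lnkctl=$(setpci -s "$port" CAP_EXP+0x10.w)       # read Link Control
  setpci -s "$port" CAP_EXP+0x10.w=$(printf '%04x' $((0x$lnkctl | 0x0010)))
  sleep 1                                          # link is down here
  setpci -s "$port" CAP_EXP+0x10.w=$lnkctl         # clear Link Disable again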

So this test should suppress any Surprise Down Error events, but handling
that particular event wasn't the intent of the test (and as you mentioned,
it ought not occur anyway since the slot is HP Surprise capable).

The test should not suppress reporting the Data Link Layer State Changed
slot status. And while this doesn't trigger a Slot PDC status, triggering
a DLLSC should occur since the Link Status DLLLA should go to 0 when
state machine goes from DL_Active to DL_Down, regardless of if a Surprise
Down Error was detected.

The Linux PCIEHP driver handles a DLLSC link-down event the same as
a presence detect remove event, and that's part of what this test was
trying to cover.
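
On a platform that survives the link-down you can watch pciehp react
from the logs; for example (device addresses from this report, exact
log wording varies by kernel version):

  dmesg | grep -i pciehp | tail -n 5        # link-down / card-not-present events
  ls /sys/bus/pci/devices/ | grep 0000:85   # NVMe function gone until the link returns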

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: blktests block/019 lead system hang
  2018-06-13 15:44         ` Keith Busch
@ 2018-06-13 17:17           ` Austin.Bolen
  -1 siblings, 0 replies; 18+ messages in thread
From: Austin.Bolen @ 2018-06-13 17:17 UTC (permalink / raw)
  To: keith.busch, Austin.Bolen
  Cc: keith.busch, yi.zhang, linux-block, osandov, linux-nvme, ming.lei

On 6/13/2018 10:41 AM, Keith Busch wrote:
> Thanks for the feedback!
> This test does indeed toggle the Link Control Link Disable bit to simulate
> the link failure. The PCIe specification specifically covers this case
> in Section 3.2.1, Data Link Control and Management State Machine Rules:
>
>   If the Link Disable bit has been Set by software, then the subsequent
>   transition to DL_Inactive must not be considered an error.
>
> So this test should suppress any Surprise Down Error events, but handling
> that particular event wasn't the intent of the test (and as you mentioned,
> it ought not occur anyway since the slot is HP Surprise capable).
>
> The test should not suppress reporting the Data Link Layer State Changed
> slot status. And while this doesn't trigger a Slot PDC status, triggering
> a DLLSC should occur since the Link Status DLLLA should go to 0 when
> state machine goes from DL_Active to DL_Down, regardless of if a Surprise
> Down Error was detected.
>
> The Linux PCIEHP driver handles a DLLSC link-down event the same as
> a presence detect remove event, and that's part of what this test was
> trying to cover.

Yes, the R730 could mask the error if OS sets Data Link Layer State
Changed Enable = 1 and could let the OS handle the hot-plug event
similar to what is done for surprise removal.  Current platform policy
on R730 is to not do that and only suppress errors related to physical
surprise removal (PDS = 0).  We'll probably forgo the option of
suppressing any non-surprise remove link down errors even if OS sets
Data Link Layer State Changed Enable = 1 and go straight to the
containment error recovery model for DPC once the architecture is
finalized to handle these non-surprise remove related errors.  In the
meantime, it is expected (though not ideal) that this family of servers
will crash for this particular test.  Ditto for the test that disables
Memory Space Enable bit in the command register.

-Austin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: blktests block/019 lead system hang
  2018-06-13 15:44         ` Keith Busch
@ 2018-06-13 18:24           ` Austin.Bolen
  -1 siblings, 0 replies; 18+ messages in thread
From: Austin.Bolen @ 2018-06-13 18:24 UTC (permalink / raw)
  To: keith.busch, Austin.Bolen
  Cc: keith.busch, yi.zhang, linux-block, osandov, linux-nvme, ming.lei

On 6/13/2018 10:41 AM, Keith Busch wrote:
> Thanks for the feedback!
>
> This test does indeed toggle the Link Control Link Disable bit to simulate
> the link failure. The PCIe specification specifically covers this case
> in Section 3.2.1, Data Link Control and Management State Machine Rules:
>
>   If the Link Disable bit has been Set by software, then the subsequent
>   transition to DL_Inactive must not be considered an error.
Forgot to mention... this PCIe requirement to not treat Link Disable = 1
as an error is a requirement on PCIe hardware and not from platform
side.  So if you want to test this PCIe spec requirement then you need a
way to ascertain whether PCIe hardware or platform is causing the
error.  Additionally, setting Link Disable is not what is causing the
error in this specific case.  The error is coming from a subsequent MMIO
that is causing UR since link is down.

-Austin


> So this test should suppress any Surprise Down Error events, but handling
> that particular event wasn't the intent of the test (and as you mentioned,
> it ought not occur anyway since the slot is HP Surprise capable).
>
> The test should not suppress reporting the Data Link Layer State Changed
> slot status. And while this doesn't trigger a Slot PDC status, triggering
> a DLLSC should occur since the Link Status DLLLA should go to 0 when
> state machine goes from DL_Active to DL_Down, regardless of if a Surprise
> Down Error was detected.
>
> The Linux PCIEHP driver handles a DLLSC link-down event the same as
> a presence detect remove event, and that's part of what this test was
> trying to cover.
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2018-06-13 18:24 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <838678680.4693215.1527664726174.JavaMail.zimbra@redhat.com>
2018-05-30  7:26 ` blktests block/019 lead system hang Yi Zhang
2018-06-05 16:18   ` Keith Busch
2018-06-05 17:21     ` Keith Busch
2018-06-06  5:42       ` Yi Zhang
2018-06-06 14:28         ` Keith Busch
2018-06-12 23:41     ` Austin.Bolen
2018-06-13 15:44       ` Keith Busch
2018-06-13 17:17         ` Austin.Bolen
2018-06-13 18:24         ` Austin.Bolen
