* REGRESSION in c5552fde102f ("nvme: Enable autonomous power state transitions")
@ 2018-01-24 11:42 ` Jani Nikula
  0 siblings, 0 replies; 8+ messages in thread
From: Jani Nikula @ 2018-01-24 11:42 UTC (permalink / raw)
  To: Andy Lutomirski, Keith Busch, Jens Axboe, Christoph Hellwig,
	Sagi Grimberg, linux-nvme
  Cc: intel-gfx, ville.syrjala


Hi Andy, all -

So this is an odd one.

I'm getting display FIFO underruns in a very specific setting: Laptop
display switched off, and an external display connected. Other
combinations work fine.

I've bisected this to c5552fde102f ("nvme: Enable autonomous power state
transitions"), and, being baffled by the result, carefully checked
this. There are no problems when running c5552fde102f^, with
nvme_core.default_ps_max_latency_us=0, or after 'echo 0 >
pm_qos_latency_tolerance_us'. With the last one, restoring the original
value of 100000 brings the underruns back.
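For anyone wanting to repeat the runtime toggle described above, here is a minimal sketch. The sysfs location is an assumption (the message names only the attribute, not its directory), the controller may not be nvme0 on other machines, and the helper name is made up for illustration. Writing the attribute needs root.

```python
from pathlib import Path

# Assumed location of the attribute; the report does not give the full path,
# and the controller name (nvme0) may differ between systems.
ATTR = Path("/sys/class/nvme/nvme0/power/pm_qos_latency_tolerance_us")


def set_latency_tolerance(value_us, attr=ATTR):
    """Write a new latency tolerance (in microseconds) to the attribute.

    Returns the previous contents so the original value can be restored.
    """
    previous = attr.read_text().strip()
    attr.write_text(f"{value_us}\n")
    return previous
```

Calling set_latency_tolerance(0) corresponds to the 'echo 0' case in the report, and passing the returned value back in restores the original setting (100000 on the reporter's machine).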

I have no idea what the root cause mechanism here is, but the bisect is
correct. Perhaps something to do with timing. I'd be happy to provide
further details.

I see that you have quirked one Samsung device. Incidentally, this
Lenovo Yoga 910 (Kabylake, SunrisePoint LP PCH) also has a Samsung NVMe
device, just a different one. Details below. I don't know what the
failure mode in the quirked one is, so I don't know if this could be the
same issue.

BR,
Jani.


$ lspci -vvnn -s 02:00.0
02:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd Device [144d:a804] (prog-if 02 [NVM Express])
	Subsystem: Samsung Electronics Co Ltd Device [144d:a801]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 16
	NUMA node: 0
	Region 0: Memory at a1200000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/32 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [70] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L0s unlimited, L1 <64us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [b0] MSI-X: Enable+ Count=33 Masked-
		Vector table: BAR=0 offset=00003000
		PBA: BAR=0 offset=00002000
	Capabilities: [100 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [148 v1] Device Serial Number 00-00-00-00-00-00-00-00
	Capabilities: [158 v1] Power Budgeting <?>
	Capabilities: [168 v1] #19
	Capabilities: [188 v1] Latency Tolerance Reporting
		Max snoop latency: 3145728ns
		Max no snoop latency: 3145728ns
	Capabilities: [190 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
		L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
			   T_CommonMode=0us LTR1.2_Threshold=163840ns
		L1SubCtl2: T_PwrOn=44us
	Kernel driver in use: nvme
	Kernel modules: nvme


-- 
Jani Nikula, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: REGRESSION in c5552fde102f ("nvme: Enable autonomous power state transitions")
@ 2018-01-24 11:53   ` Jani Nikula
  0 siblings, 0 replies; 8+ messages in thread
From: Jani Nikula @ 2018-01-24 11:53 UTC (permalink / raw)
  To: Andy Lutomirski, Keith Busch, Jens Axboe, Christoph Hellwig,
	Sagi Grimberg, linux-nvme
  Cc: intel-gfx


[Fixed Ville's address, sorry for the extra noise.]

On Wed, 24 Jan 2018, Jani Nikula <jani.nikula@intel.com> wrote:
> Hi Andy, all -
>
> So this is an odd one.
>
> I'm getting display FIFO underruns in a very specific setting: Laptop
> display switched off, and an external display connected. Other
> combinations work fine.
>
> I've bisected this to c5552fde102f ("nvme: Enable autonomous power state
> transitions"), and, being baffled by the result, carefully checked
> this. There are no problems when running c5552fde102f^, with
> nvme_core.default_ps_max_latency_us=0, or after 'echo 0 >
> pm_qos_latency_tolerance_us'. With the last one, restoring the original
> value of 100000 brings the underruns back.
>
> I have no idea what the root cause mechanism here is, but the bisect is
> correct. Perhaps something to do with timing. I'd be happy to provide
> further details.
>
> I see that you have quirked one Samsung device. Incidentally, this
> Lenovo Yoga 910 (Kabylake, SunrisePoint LP PCH) also has a Samsung NVMe
> device, just a different one. Details below. I don't know what the
> failure mode in the quirked one is, so I don't know if this could be the
> same issue.
>
> BR,
> Jani.
>
>
> $ lspci -vvnn -s 02:00.0
> 02:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd Device [144d:a804] (prog-if 02 [NVM Express])
> 	Subsystem: Samsung Electronics Co Ltd Device [144d:a801]
> 	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 64 bytes
> 	Interrupt: pin A routed to IRQ 16
> 	NUMA node: 0
> 	Region 0: Memory at a1200000 (64-bit, non-prefetchable) [size=16K]
> 	Capabilities: [40] Power Management version 3
> 		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> 		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> 	Capabilities: [50] MSI: Enable- Count=1/32 Maskable- 64bit+
> 		Address: 0000000000000000  Data: 0000
> 	Capabilities: [70] Express (v2) Endpoint, MSI 00
> 		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
> 			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W
> 		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
> 			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
> 			MaxPayload 256 bytes, MaxReadReq 512 bytes
> 		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
> 		LnkCap:	Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L0s unlimited, L1 <64us
> 			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> 		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
> 			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> 		LnkSta:	Speed 8GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> 		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported
> 		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
> 		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> 			 Compliance De-emphasis: -6dB
> 		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
> 			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
> 	Capabilities: [b0] MSI-X: Enable+ Count=33 Masked-
> 		Vector table: BAR=0 offset=00003000
> 		PBA: BAR=0 offset=00002000
> 	Capabilities: [100 v2] Advanced Error Reporting
> 		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> 		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
> 		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> 		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
> 	Capabilities: [148 v1] Device Serial Number 00-00-00-00-00-00-00-00
> 	Capabilities: [158 v1] Power Budgeting <?>
> 	Capabilities: [168 v1] #19
> 	Capabilities: [188 v1] Latency Tolerance Reporting
> 		Max snoop latency: 3145728ns
> 		Max no snoop latency: 3145728ns
> 	Capabilities: [190 v1] L1 PM Substates
> 		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> 			  PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
> 		L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> 			   T_CommonMode=0us LTR1.2_Threshold=163840ns
> 		L1SubCtl2: T_PwrOn=44us
> 	Kernel driver in use: nvme
> 	Kernel modules: nvme

-- 
Jani Nikula, Intel Open Source Technology Center


* Re: REGRESSION in c5552fde102f ("nvme: Enable autonomous power state transitions")
@ 2018-01-24 13:35   ` Ville Syrjälä
  0 siblings, 0 replies; 8+ messages in thread
From: Ville Syrjälä @ 2018-01-24 13:35 UTC (permalink / raw)
  To: Jani Nikula
  Cc: Jens Axboe, Sagi Grimberg, intel-gfx, linux-nvme, ville.syrjala,
	Keith Busch, Andy Lutomirski, Christoph Hellwig

On Wed, Jan 24, 2018 at 01:42:08PM +0200, Jani Nikula wrote:
> 
> Hi Andy, all -
> 
> So this is an odd one.
> 
> I'm getting display FIFO underruns in a very specific setting: Laptop
> display switched off, and an external display connected. Other
> combinations work fine.
> 
> I've bisected this to c5552fde102f ("nvme: Enable autonomous power state
> transitions"), and, being baffled by the result, carefully checked
> this. There are no problems when running c5552fde102f^, with
> nvme_core.default_ps_max_latency_us=0, or after 'echo 0 >
> pm_qos_latency_tolerance_us'. With the last one, restoring the original
> value of 100000 brings the underruns back.
> 
> I have no idea what the root cause mechanism here is, but the bisect is
> correct. Perhaps something to do with timing. I'd be happy to provide
> further details.
> 
> I see that you have quirked one Samsung device. Incidentally, this
> Lenovo Yoga 910 (Kabylake, SunrisePoint LP PCH) also has a Samsung NVMe
> device, just a different one. Details below. I don't know what the
> failure mode in the quirked one is, so I don't know if this could be the
> same issue.

My first gut feeling would be that by allowing the nvme to go to sleep
we're getting into some deeper power saving state, which then causes
display underruns. How does the package c-state residency look
before/after the commit?
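One way to compare package C-state residency before and after the commit is to capture a turbostat summary under each kernel and diff the package columns. The sketch below is only an illustration: it assumes the captured text has a header line followed by a summary row, and that the package columns are labelled Pkg%pcN (or Pk%pc10), which is how turbostat is believed to name them. The function names are hypothetical.

```python
def package_cstate_residency(turbostat_text):
    """Return {column: percent} for the package C-state columns.

    Assumes the first non-empty line is the header and the second is the
    summary row, with whitespace-separated fields.
    """
    rows = [ln.split() for ln in turbostat_text.splitlines() if ln.strip()]
    header, summary = rows[0], rows[1]
    return {
        name: float(value)
        for name, value in zip(header, summary)
        if name.startswith(("Pkg%pc", "Pk%pc"))
    }


def residency_delta(before, after):
    """Per-column change in residency (after minus before)."""
    return {name: after[name] - before.get(name, 0.0) for name in after}
```

A column that is near zero before the commit and clearly non-zero after it would support the idea that the system is reaching a deeper package state.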

I might be wrong too of course. IIRC there were plenty of display
flicker issues on SKL at least that were magically fixed by unknown
magic in BIOS updates.

> 
> BR,
> Jani.
> 
> 
> $ lspci -vvnn -s 02:00.0
> 02:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd Device [144d:a804] (prog-if 02 [NVM Express])
> 	Subsystem: Samsung Electronics Co Ltd Device [144d:a801]
> 	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 64 bytes
> 	Interrupt: pin A routed to IRQ 16
> 	NUMA node: 0
> 	Region 0: Memory at a1200000 (64-bit, non-prefetchable) [size=16K]
> 	Capabilities: [40] Power Management version 3
> 		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> 		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> 	Capabilities: [50] MSI: Enable- Count=1/32 Maskable- 64bit+
> 		Address: 0000000000000000  Data: 0000
> 	Capabilities: [70] Express (v2) Endpoint, MSI 00
> 		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
> 			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W
> 		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
> 			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
> 			MaxPayload 256 bytes, MaxReadReq 512 bytes
> 		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
> 		LnkCap:	Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L0s unlimited, L1 <64us
> 			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> 		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
> 			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> 		LnkSta:	Speed 8GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> 		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported
> 		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
> 		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> 			 Compliance De-emphasis: -6dB
> 		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
> 			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
> 	Capabilities: [b0] MSI-X: Enable+ Count=33 Masked-
> 		Vector table: BAR=0 offset=00003000
> 		PBA: BAR=0 offset=00002000
> 	Capabilities: [100 v2] Advanced Error Reporting
> 		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> 		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
> 		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> 		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
> 	Capabilities: [148 v1] Device Serial Number 00-00-00-00-00-00-00-00
> 	Capabilities: [158 v1] Power Budgeting <?>
> 	Capabilities: [168 v1] #19
> 	Capabilities: [188 v1] Latency Tolerance Reporting
> 		Max snoop latency: 3145728ns
> 		Max no snoop latency: 3145728ns
> 	Capabilities: [190 v1] L1 PM Substates
> 		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> 			  PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
> 		L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> 			   T_CommonMode=0us LTR1.2_Threshold=163840ns
> 		L1SubCtl2: T_PwrOn=44us
> 	Kernel driver in use: nvme
> 	Kernel modules: nvme
> 
> 
> -- 
> Jani Nikula, Intel Open Source Technology Center
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC


* [Intel-gfx] REGRESSION in c5552fde102f ("nvme: Enable autonomous power state transitions")
  2018-01-24 13:35   ` Ville Syrjälä
@ 2018-01-24 17:00     ` Andy Lutomirski
  -1 siblings, 0 replies; 8+ messages in thread
From: Andy Lutomirski @ 2018-01-24 17:00 UTC (permalink / raw)


On Wed, Jan 24, 2018 at 5:35 AM, Ville Syrjälä
<ville.syrjala@linux.intel.com> wrote:
> On Wed, Jan 24, 2018 at 01:42:08PM +0200, Jani Nikula wrote:
>>
>> Hi Andy, all -
>>
>> So this is an odd one.
>>
>> I'm getting display FIFO underruns in a very specific setting: Laptop
>> display switched off, and an external display connected. Other
>> combinations work fine.
>>
>> I've bisected this to c5552fde102f ("nvme: Enable autonomous power state
>> transitions"), and, being baffled by the result, carefully checked
>> this. There are no problems when running c5552fde102f^, with
>> nvme_core.default_ps_max_latency_us=0, or after 'echo 0 >
>> pm_qos_latency_tolerance_us'. With the last one, restoring the original
>> value of 100000 brings the underruns back.
>>
>> I have no idea what the root cause mechanism here is, but the bisect is
>> correct. Perhaps something to do with timing. I'd be happy to provide
>> further details.
>>
>> I see that you have quirked one Samsung device. Incidentally, this
>> Lenovo Yoga 910 (Kabylake, SunrisePoint LP PCH) also has a Samsung NVMe
>> device, just a different one. Details below. I don't know what the
>> failure mode in the quirked one is, so I don't know if this could be the
>> same issue.
>
> My first gut feeling would be that by allowing the nvme to go to sleep
> we're getting into some deeper power saving state, which then causes
> display underruns. How does the package c-state residency look
> before/after the commit?

I know approximately nothing about how package C-states work and what
exactly triggers ASPM low-power state entry, but I've seen reports
that APST is required to get ASPM L1 and that ASPM L1 is needed to get
to the deep PC states.  And deep PC states can surely trigger i915
issues...

--Andy


* Re: REGRESSION in c5552fde102f ("nvme: Enable autonomous power state transitions")
@ 2018-01-24 17:00     ` Andy Lutomirski
  0 siblings, 0 replies; 8+ messages in thread
From: Andy Lutomirski @ 2018-01-24 17:00 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Jens Axboe, Sagi Grimberg, Jani Nikula,
	Intel Graphics Development, linux-nvme, ville.syrjala,
	Keith Busch, Andy Lutomirski, Christoph Hellwig

On Wed, Jan 24, 2018 at 5:35 AM, Ville Syrjälä
<ville.syrjala@linux.intel.com> wrote:
> On Wed, Jan 24, 2018 at 01:42:08PM +0200, Jani Nikula wrote:
>>
>> Hi Andy, all -
>>
>> So this is an odd one.
>>
>> I'm getting display FIFO underruns in a very specific setting: Laptop
>> display switched off, and an external display connected. Other
>> combinations work fine.
>>
>> I've bisected this to c5552fde102f ("nvme: Enable autonomous power state
>> transitions"), and, being baffled by the result, carefully checked
>> this. There are no problems when running c5552fde102f^, with
>> nvme_core.default_ps_max_latency_us=0, or after 'echo 0 >
>> pm_qos_latency_tolerance_us'. With the last one, restoring the original
>> value of 100000 brings the underruns back.
>>
>> I have no idea what the root cause mechanism here is, but the bisect is
>> correct. Perhaps something to do with timing. I'd be happy to provide
>> further details.
>>
>> I see that you have quirked one Samsung device. Incidentally, this
>> Lenovo Yoga 910 (Kabylake, SunrisePoint LP PCH) also has a Samsung NVMe
>> device, just a different one. Details below. I don't know what the
>> failure mode in the quirked one is, so I don't know if this could be the
>> same issue.
>
> My first gut feeling would be that by allowing the nvme to go to sleep
> we're getting into some deeper power saving state, which then causes
> display underruns. How does the package c-state residency look
> before/after the commit?

I know approximately nothing about how package C-states work and what
exactly triggers ASPM low-power state entry, but I've seen reports
that APST is required to get ASPM L1 and that ASPM L1 is needed to get
to the deep PC states.  And deep PC states can surely trigger i915
issues...

--Andy


Thread overview: 8+ messages
2018-01-24 11:42 REGRESSION in c5552fde102f ("nvme: Enable autonomous power state transitions") Jani Nikula
2018-01-24 11:42 ` Jani Nikula
2018-01-24 11:53 ` Jani Nikula
2018-01-24 11:53   ` Jani Nikula
2018-01-24 13:35 ` [Intel-gfx] " Ville Syrjälä
2018-01-24 13:35   ` Ville Syrjälä
2018-01-24 17:00   ` [Intel-gfx] " Andy Lutomirski
2018-01-24 17:00     ` Andy Lutomirski
