* Question about cacheline size in PCIe SAS card
From: wangyijing @ 2016-07-28  8:15 UTC
  To: linux-pci; +Cc: jianghong011, wangyijing

Hi all, we have a question about the PCIe Cache Line Size register, i.e. the
configuration space register at offset 0x0C in the type 0 and type 1 configuration space headers.
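For reference, the register can be read from userspace with setpci; the value is in
DWORD units, so 64 bytes reads back as 0x10 (13:00.0 is our SAS controller):

  setpci -s 13:00.0 CACHE_LINE_SIZE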

We hot-plugged a PCIe SAS controller on our platform; this SAS controller has
SSD disks attached and the disk sector size is 520 bytes.  By default, the BIOS sets the
Cache Line Size to 64 bytes; when we test IO reads (IO size 128K/256K), the bandwidth is 6G.
After the hotplug, the Cache Line Size in the SAS controller changes to 0 (the default after #RST),
and when we test the IO reads again, the bandwidth drops to 5.2G.

We tested another SAS controller that does not use 520-byte sectors and did not see this issue.
I also grepped for PCI_CACHE_LINE_SIZE in the kernel and found that most of the code that changes
PCI_CACHE_LINE_SIZE is in device drivers, e.g. net, ata, and some ARM PCI host controller drivers.
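The writes that the grep turns up are typically of roughly this form (the register is in
DWORD units, so a 64-byte cache line is written as 16; details vary per driver):

  /* program the legacy Cache Line Size hint, in DWORD units */
  pci_write_config_byte(pdev, PCI_CACHE_LINE_SIZE, L1_CACHE_BYTES / 4);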

In the PCI 3.0 spec I found descriptions of how the Cache Line Size relates to performance,
but in the PCIe 3.0 spec I could not find anything related to it.

I wonder what role the Cache Line Size register plays in the PCIe spec, and how we should use it correctly?

Thanks!
Yijing.



* Re: Question about cacheline size in PCIe SAS card
From: Bjorn Helgaas @ 2016-07-28 18:43 UTC
  To: wangyijing; +Cc: linux-pci, jianghong011

On Thu, Jul 28, 2016 at 04:15:31PM +0800, wangyijing wrote:
> Hi all, we have a question about the PCIe Cache Line Size register, i.e. the
> configuration space register at offset 0x0C in the type 0 and type 1 configuration space headers.
> 
> We hot-plugged a PCIe SAS controller on our platform; this SAS controller has
> SSD disks attached and the disk sector size is 520 bytes.  By default, the BIOS sets the
> Cache Line Size to 64 bytes; when we test IO reads (IO size 128K/256K), the bandwidth is 6G.
> After the hotplug, the Cache Line Size in the SAS controller changes to 0 (the default after #RST),
> and when we test the IO reads again, the bandwidth drops to 5.2G.
> 
> We tested another SAS controller that does not use 520-byte sectors and did not see this issue.
> I also grepped for PCI_CACHE_LINE_SIZE in the kernel and found that most of the code that changes
> PCI_CACHE_LINE_SIZE is in device drivers, e.g. net, ata, and some ARM PCI host controller drivers.
> 
> In the PCI 3.0 spec I found descriptions of how the Cache Line Size relates to performance,
> but in the PCIe 3.0 spec I could not find anything related to it.

Not quite true: sec 7.5.1.3 of PCIe r3.0 says:

  This field [Cache Line Size] is implemented by PCI Express devices
  as a read-write field for legacy compatibility purposes but has no
  effect on any PCI Express device behavior.

Unless your SAS controller is doing something wrong, I suspect
something other than Cache Line Size is responsible for the difference
in performance.

After hot-add of your controller, Cache Line Size is probably zero
because Linux doesn't set it.  What happens if you set it manually
using "setpci"?  Does that affect the performance?

You might look at the MPS and MRRS settings in the two scenarios also.
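Both show up in the "lspci -vv" output, e.g.:

  lspci -vv | grep -E 'MaxPayload|MaxReadReq'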

You could try collecting the output of "lspci -vvxxx" for the whole
system in the default case and again after the hotplug, and then
compare the two for differences.
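For instance:

  lspci -vvxxx > lspci-before.txt
  # hot-remove and hot-add the card, then re-run the IO test
  lspci -vvxxx > lspci-after.txt
  diff -u lspci-before.txt lspci-after.txt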

Bjorn


* Re: Question about cacheline size in PCIe SAS card
From: wangyijing @ 2016-07-29  2:53 UTC
  To: Bjorn Helgaas; +Cc: linux-pci, jianghong011

Hi Bjorn, thanks for your comment!

On 2016/7/29 2:43, Bjorn Helgaas wrote:
> On Thu, Jul 28, 2016 at 04:15:31PM +0800, wangyijing wrote:
>> Hi all, we have a question about the PCIe Cache Line Size register, i.e. the
>> configuration space register at offset 0x0C in the type 0 and type 1 configuration space headers.
>>
>> We hot-plugged a PCIe SAS controller on our platform; this SAS controller has
>> SSD disks attached and the disk sector size is 520 bytes.  By default, the BIOS sets the
>> Cache Line Size to 64 bytes; when we test IO reads (IO size 128K/256K), the bandwidth is 6G.
>> After the hotplug, the Cache Line Size in the SAS controller changes to 0 (the default after #RST),
>> and when we test the IO reads again, the bandwidth drops to 5.2G.
>>
>> We tested another SAS controller that does not use 520-byte sectors and did not see this issue.
>> I also grepped for PCI_CACHE_LINE_SIZE in the kernel and found that most of the code that changes
>> PCI_CACHE_LINE_SIZE is in device drivers, e.g. net, ata, and some ARM PCI host controller drivers.
>>
>> In the PCI 3.0 spec I found descriptions of how the Cache Line Size relates to performance,
>> but in the PCIe 3.0 spec I could not find anything related to it.
> 
> Not quite true: sec 7.5.1.3 of PCIe r3.0 says:
> 
>   This field [Cache Line Size] is implemented by PCI Express devices
>   as a read-write field for legacy compatibility purposes but has no
>   effect on any PCI Express device behavior.

Oh, sorry, I had only searched for the keyword "cacheline" in the PCIe spec.  According
to this description, the register has no effect on any PCIe device.

> 
> Unless your SAS controller is doing something wrong, I suspect
> something other than Cache Line Size is responsible for the difference
> in performance.
> 
> After hot-add of your controller, Cache Line Size is probably zero
> because Linux doesn't set it.  What happens if you set it manually
> using "setpci"?  Does that affect the performance?

Yes, after the hotplug the Cache Line Size is reset to 0 and Linux doesn't
touch it.  We tried changing the Cache Line Size to 64 bytes with setpci;
if we test IO at that point, the IO bandwidth is still 5.2G,
but if we reset the firmware after changing the Cache Line Size to 64 bytes
and then test the IO bandwidth again, it reaches 6G again.

> 
> You might look at the MPS and MRRS settings in the two scenarios also.

There is no difference in MPS and MRRS between the two scenarios; the hotplug driver
restores them to their original values.

> 
> You could try collecting the output of "lspci -vvxxx" for the whole
> system in the default case and again after the hotplug, and then
> compare the two for differences.

Yes, I did, and I found no significant difference other than the Cache Line Size.

I suspect an internal issue in the SAS controller is hurting the performance.

The normal config space after system boot-up:

13:00.0 Serial Attached SCSI controller: PMC-Sierra Inc. Device 8072 (rev 06)
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 34
	Region 0: Memory at 97000000 (64-bit, non-prefetchable) [size=64K]
	Region 2: Memory at 97010000 (64-bit, non-prefetchable) [size=64K]
	Expansion ROM at 97100000 [disabled] [size=1M]
	Capabilities: [80] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [88] Vital Product Data
		Unknown small resource type 00, will not decode more.
	Capabilities: [90] MSI: Enable- Count=1/32 Maskable+ 64bit+
		Address: 0000000000000000  Data: 0000
		Masking: 00000000  Pending: 00000000
	Capabilities: [b0] MSI-X: Enable+ Count=64 Masked-
		Vector table: BAR=0 offset=00000400
		PBA: BAR=0 offset=00000800
	Capabilities: [c0] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <4us, L1 <1us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x8, ASPM unknown, Latency L0 <1us, L1 <1us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s, Width x8, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range B, TimeoutDis+
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [100 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP+ FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [300 v1] #19
	Kernel driver in use: quark

The config space after the hotplug:

13:00.0 Serial Attached SCSI controller: PMC-Sierra Inc. Device 8072 (rev 06)
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 34
	Region 0: Memory at 97100000 (64-bit, non-prefetchable) [size=64K]
	Region 2: Memory at 97110000 (64-bit, non-prefetchable) [size=64K]
	Expansion ROM at 97000000 [size=1M]
	Capabilities: [80] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [88] Vital Product Data
		Unknown small resource type 00, will not decode more.
	Capabilities: [90] MSI: Enable- Count=1/32 Maskable+ 64bit+
		Address: 0000000000000000  Data: 0000
		Masking: 00000000  Pending: 00000000
	Capabilities: [b0] MSI-X: Enable+ Count=64 Masked-
		Vector table: BAR=0 offset=00000400
		PBA: BAR=0 offset=00000800
	Capabilities: [c0] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <4us, L1 <1us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x8, ASPM unknown, Latency L0 <1us, L1 <1us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s, Width x8, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range B, TimeoutDis+
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [100 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP+ FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [300 v1] #19
	Kernel driver in use: quark

Thanks!
Yijing.


> 
> Bjorn
> 



* Re: Question about cacheline size in PCIe SAS card
From: Bjorn Helgaas @ 2016-07-29 12:41 UTC
  To: wangyijing; +Cc: linux-pci, jianghong011

On Fri, Jul 29, 2016 at 10:53:57AM +0800, wangyijing wrote:
> Hi Bjorn, thanks for your comment!
> 
> On 2016/7/29 2:43, Bjorn Helgaas wrote:
> > On Thu, Jul 28, 2016 at 04:15:31PM +0800, wangyijing wrote:
> >> Hi all, we have a question about the PCIe Cache Line Size register, i.e. the
> >> configuration space register at offset 0x0C in the type 0 and type 1 configuration space headers.
> >>
> >> We hot-plugged a PCIe SAS controller on our platform; this SAS controller has
> >> SSD disks attached and the disk sector size is 520 bytes.  By default, the BIOS sets the
> >> Cache Line Size to 64 bytes; when we test IO reads (IO size 128K/256K), the bandwidth is 6G.
> >> After the hotplug, the Cache Line Size in the SAS controller changes to 0 (the default after #RST),
> >> and when we test the IO reads again, the bandwidth drops to 5.2G.
> >>
> >> We tested another SAS controller that does not use 520-byte sectors and did not see this issue.
> >> I also grepped for PCI_CACHE_LINE_SIZE in the kernel and found that most of the code that changes
> >> PCI_CACHE_LINE_SIZE is in device drivers, e.g. net, ata, and some ARM PCI host controller drivers.
> >>
> >> In the PCI 3.0 spec I found descriptions of how the Cache Line Size relates to performance,
> >> but in the PCIe 3.0 spec I could not find anything related to it.
> > 
> > Not quite true: sec 7.5.1.3 of PCIe r3.0 says:
> > 
> >   This field [Cache Line Size] is implemented by PCI Express devices
> >   as a read-write field for legacy compatibility purposes but has no
> >   effect on any PCI Express device behavior.
> 
> Oh, sorry, I had only searched for the keyword "cacheline" in the PCIe spec.  According
> to this description, the register has no effect on any PCIe device.
> 
> > 
> > Unless your SAS controller is doing something wrong, I suspect
> > something other than Cache Line Size is responsible for the difference
> > in performance.
> > 
> > After hot-add of your controller, Cache Line Size is probably zero
> > because Linux doesn't set it.  What happens if you set it manually
> > using "setpci"?  Does that affect the performance?
> 
> Yes, after the hotplug the Cache Line Size is reset to 0 and Linux doesn't
> touch it.  We tried changing the Cache Line Size to 64 bytes with setpci;
> if we test IO at that point, the IO bandwidth is still 5.2G,
> but if we reset the firmware after changing the Cache Line Size to 64 bytes
> and then test the IO bandwidth again, it reaches 6G again.

OK, that sounds like the category of "your controller doing something
wrong," namely, it is somehow dependent on the Cache Line Size when it
shouldn't be.

If you change Linux to set the Cache Line Size during hot-add, does
that fix it?  I assume you might still need to reset the firmware to
make the card notice the change?

I wonder how this works in the non-hotplug case.  Does the BIOS reset
the firmware somehow after setting Cache Line Size?  Is there an
option ROM that might do this?

Maybe the quark driver needs a quirk in its probe routine that sets
the Cache Line Size and resets the firmware?
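Purely as a sketch of what I mean, called from the driver's probe path
(quark_reset_firmware() is just a stand-in for whatever mechanism the driver
actually has to kick its firmware):

  static void quark_fix_cache_line_size(struct pci_dev *pdev)
  {
          u8 cls;

          /* hot-added devices come up with Cache Line Size == 0 after #RST */
          pci_read_config_byte(pdev, PCI_CACHE_LINE_SIZE, &cls);
          if (!cls) {
                  /* the register is in DWORD units, so 64 bytes == 0x10 */
                  pci_write_config_byte(pdev, PCI_CACHE_LINE_SIZE,
                                        L1_CACHE_BYTES / 4);
                  quark_reset_firmware(pdev);     /* hypothetical helper */
          }
  }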

Bjorn


* Re: Question about cacheline size in PCIe SAS card
From: wangyijing @ 2016-07-30  1:49 UTC
  To: Bjorn Helgaas; +Cc: linux-pci, jianghong011

Hi Bjorn, we have confirmed this issue; it is caused by the SAS controller's internal design.
The following is the mail from the SAS FAE:

=================================================================================
Hi Jianghong,

I have confirmed that the BDMA module does use this register; it gives a performance improvement for 520B sectors.

If the sector size is 512B, then you have no issue with TLP alignment (because it is a multiple of the TLP sizes).
The problem happens when the sector size is not a multiple of the TLP sizes, which means the TLPs are not aligned.
To get good performance, the BDMA algorithm determines how the engine aligns the TLPs and whether or not the
enhanced BDMA algorithm is turned on; this depends on the cache line size, which the engine uses as the TLP
alignment, and so the IO performance differs.
=================================================================================

So it's not a Linux PCIe issue.  (This also fits the numbers: with the 256-byte MaxPayload shown above,
a 512B sector splits evenly into 256-byte TLPs, while a 520B sector leaves an unaligned 8-byte tail.)
Thanks very much for your comments and analysis!

Thanks!
Yijing.

On 2016/7/29 20:41, Bjorn Helgaas wrote:
> On Fri, Jul 29, 2016 at 10:53:57AM +0800, wangyijing wrote:
>> Hi Bjorn, thanks for your comment!
>>
>> On 2016/7/29 2:43, Bjorn Helgaas wrote:
>>> On Thu, Jul 28, 2016 at 04:15:31PM +0800, wangyijing wrote:
>>>> Hi all, we have a question about the PCIe Cache Line Size register, i.e. the
>>>> configuration space register at offset 0x0C in the type 0 and type 1 configuration space headers.
>>>>
>>>> We hot-plugged a PCIe SAS controller on our platform; this SAS controller has
>>>> SSD disks attached and the disk sector size is 520 bytes.  By default, the BIOS sets the
>>>> Cache Line Size to 64 bytes; when we test IO reads (IO size 128K/256K), the bandwidth is 6G.
>>>> After the hotplug, the Cache Line Size in the SAS controller changes to 0 (the default after #RST),
>>>> and when we test the IO reads again, the bandwidth drops to 5.2G.
>>>>
>>>> We tested another SAS controller that does not use 520-byte sectors and did not see this issue.
>>>> I also grepped for PCI_CACHE_LINE_SIZE in the kernel and found that most of the code that changes
>>>> PCI_CACHE_LINE_SIZE is in device drivers, e.g. net, ata, and some ARM PCI host controller drivers.
>>>>
>>>> In the PCI 3.0 spec I found descriptions of how the Cache Line Size relates to performance,
>>>> but in the PCIe 3.0 spec I could not find anything related to it.
>>>
>>> Not quite true: sec 7.5.1.3 of PCIe r3.0 says:
>>>
>>>   This field [Cache Line Size] is implemented by PCI Express devices
>>>   as a read-write field for legacy compatibility purposes but has no
>>>   effect on any PCI Express device behavior.
>>
>> Oh, sorry, I had only searched for the keyword "cacheline" in the PCIe spec.  According
>> to this description, the register has no effect on any PCIe device.
>>
>>>
>>> Unless your SAS controller is doing something wrong, I suspect
>>> something other than Cache Line Size is responsible for the difference
>>> in performance.
>>>
>>> After hot-add of your controller, Cache Line Size is probably zero
>>> because Linux doesn't set it.  What happens if you set it manually
>>> using "setpci"?  Does that affect the performance?
>>
>> Yes, after the hotplug the Cache Line Size is reset to 0 and Linux doesn't
>> touch it.  We tried changing the Cache Line Size to 64 bytes with setpci;
>> if we test IO at that point, the IO bandwidth is still 5.2G,
>> but if we reset the firmware after changing the Cache Line Size to 64 bytes
>> and then test the IO bandwidth again, it reaches 6G again.
> 
> OK, that sounds like the category of "your controller doing something
> wrong," namely, it is somehow dependent on the Cache Line Size when it
> shouldn't be.
> 
> If you change Linux to set the Cache Line Size during hot-add, does
> that fix it?  I assume you might still need to reset the firmware to
> make the card notice the change?
> 
> I wonder how this works in the non-hotplug case.  Does the BIOS reset
> the firmware somehow after setting Cache Line Size?  Is there an
> option ROM that might do this?
> 
> Maybe the quark driver needs a quirk in its probe routine that sets
> the Cache Line Size and resets the firmware?
> 
> Bjorn
> 


